diff --git a/docs-website/sidebars.js b/docs-website/sidebars.js index 5c71e79a101728..786abb62bc97b1 100644 --- a/docs-website/sidebars.js +++ b/docs-website/sidebars.js @@ -106,6 +106,7 @@ module.exports = { type: "doc", id: "docs/features/dataset-usage-and-query-history", }, + "docs/features/feature-guides/documentation-forms", { label: "Domains", type: "doc", @@ -162,6 +163,7 @@ module.exports = { type: "doc", id: "docs/posts", }, + "docs/features/feature-guides/properties", { label: "Schema history", type: "doc", @@ -676,11 +678,6 @@ module.exports = { label: "OpenAPI", id: "docs/api/openapi/openapi-usage-guide", }, - { - type: "doc", - label: "Structured Properties", - id: "docs/api/openapi/openapi-structured-properties", - }, ], }, "docs/dev-guides/timeline", @@ -810,6 +807,8 @@ module.exports = { "docs/api/tutorials/descriptions", "docs/api/tutorials/custom-properties", "docs/api/tutorials/ml", + "docs/api/tutorials/structured-properties", + "docs/api/tutorials/forms", ], }, { diff --git a/docs/api/openapi/openapi-structured-properties.md b/docs/api/openapi/openapi-structured-properties.md deleted file mode 100644 index 8dd660698a0e8f..00000000000000 --- a/docs/api/openapi/openapi-structured-properties.md +++ /dev/null @@ -1,328 +0,0 @@ -# Structured Properties - DataHub OpenAPI v2 Guide - -This guides walks through the process of creating and using a Structured Property using the `v2` version -of the DataHub OpenAPI implementation. Note that this refers to DataHub's OpenAPI version and not the version of OpenAPI itself. - -Requirements: -* curl -* jq - -## Structured Property Definition - -Before a structured property can be added to an entity it must first be defined. Here is an example -structured property being created against a local quickstart instance. 
- -### Create Property Definition - -Example Request: - -```shell -curl -X 'POST' -v \ - 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/propertyDefinition' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "qualifiedName": "my.test.MyProperty01", - "displayName": "MyProperty01", - "valueType": "urn:li:dataType:datahub.string", - "allowedValues": [ - { - "value": {"string": "foo"}, - "description": "test foo value" - }, - { - "value": {"string": "bar"}, - "description": "test bar value" - } - ], - "cardinality": "SINGLE", - "entityTypes": [ - "urn:li:entityType:datahub.dataset" - ], - "description": "test description" -}' | jq -``` - -### Read Property Definition - -Example Request: - -```shell -curl -X 'GET' -v \ - 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/propertyDefinition' \ - -H 'accept: application/json' | jq -``` - -Example Response: - -```json -{ - "value": { - "allowedValues": [ - { - "value": { - "string": "foo" - }, - "description": "test foo value" - }, - { - "value": { - "string": "bar" - }, - "description": "test bar value" - } - ], - "qualifiedName": "my.test.MyProperty01", - "displayName": "MyProperty01", - "valueType": "urn:li:dataType:datahub.string", - "description": "test description", - "entityTypes": [ - "urn:li:entityType:datahub.dataset" - ], - "cardinality": "SINGLE" - } -} -``` - -### Delete Property Definition - -There are two types of deletion present in DataHub: `hard` and `soft` delete. As of the current release only the `soft` delete -is supported for Structured Properties. See the subsections below for more details. - -#### Soft Delete - -A `soft` deleted Structured Property does not remove any underlying data on the Structured Property entity -or the Structured Property's values written to other entities. The `soft` delete is 100% reversible with zero data loss. 
-When a Structured Property is `soft` deleted, a few operations are not available. - -Structured Property Soft Delete Effects: - -* Entities with a `soft` deleted Structured Property value will not return the `soft` deleted properties -* Updates to a `soft` deleted Structured Property's definition are denied -* Adding a `soft` deleted Structured Property's value to an entity is denied -* Search filters using a `soft` deleted Structured Property will be denied - -The following command will `soft` delete the test property `MyProperty01` created in this guide by writing -to the `status` aspect. - -```shell -curl -X 'POST' \ - 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/status?systemMetadata=false' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ -"removed": true -}' | jq -``` - -Removing the `soft` delete from the Structured Property can be done by either `hard` deleting the `status` aspect or -changing the `removed` boolean to `false. - -```shell -curl -X 'POST' \ - 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty01/status?systemMetadata=false' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ -"removed": false -}' | jq -``` - -#### Hard Delete - -⚠ **Not Implemented** ⚠ - -## Applying Structured Properties - -Structured Properties can now be added to entities which have the `structuredProperties` as aspect. In the following -example we'll attach and remove properties to an example dataset entity with urn `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`. - -### Set Structured Property Values - -This will set/replace all structured properties on the entity. See `PATCH` operations to add/remove a single property. 
- -```shell -curl -X 'POST' -v \ - 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "properties": [ - { - "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01", - "values": [ - {"string": "foo"} - ] - } - ] -}' | jq -``` - -### Patch Structured Property Value - -For this example, we'll extend create a second structured property and apply both properties to the same -dataset used previously. After this your system should include both `my.test.MyProperty01` and `my.test.MyProperty02`. - -```shell -curl -X 'POST' -v \ - 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Amy.test.MyProperty02/propertyDefinition' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "qualifiedName": "my.test.MyProperty02", - "displayName": "MyProperty02", - "valueType": "urn:li:dataType:datahub.string", - "allowedValues": [ - { - "value": {"string": "foo2"}, - "description": "test foo2 value" - }, - { - "value": {"string": "bar2"}, - "description": "test bar2 value" - } - ], - "cardinality": "SINGLE", - "entityTypes": [ - "urn:li:entityType:datahub.dataset" - ] -}' | jq -``` - -This command will attach one of each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`. 
- -```shell -curl -X 'POST' -v \ - 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json' \ - -d '{ - "properties": [ - { - "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01", - "values": [ - {"string": "foo"} - ] - }, - { - "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty02", - "values": [ - {"string": "bar2"} - ] - } - ] -}' | jq -``` - -#### Remove Structured Property Value - -The expected state of our test dataset include 2 structured properties. We'd like to remove the first one and preserve -the second property. - -```shell -curl -X 'PATCH' -v \ - 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json-patch+json' \ - -d '{ - "patch": [ - { - "op": "remove", - "path": "/properties/urn:li:structuredProperty:my.test.MyProperty01" - } - ], - "arrayPrimaryKeys": { - "properties": [ - "propertyUrn" - ] - } - }' | jq -``` - -The response will show that the expected property has been removed. - -```json -{ - "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", - "aspects": { - "structuredProperties": { - "value": { - "properties": [ - { - "values": [ - { - "string": "bar2" - } - ], - "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty02" - } - ] - } - } - } -} -``` - -#### Add Structured Property Value - -In this example, we'll add the property back with a different value, preserving the existing property. 
- -```shell -curl -X 'PATCH' -v \ - 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ - -H 'accept: application/json' \ - -H 'Content-Type: application/json-patch+json' \ - -d '{ - "patch": [ - { - "op": "add", - "path": "/properties/urn:li:structuredProperty:my.test.MyProperty01", - "value": { - "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01", - "values": [ - { - "string": "bar" - } - ] - } - } - ], - "arrayPrimaryKeys": { - "properties": [ - "propertyUrn" - ] - } - }' | jq -``` - -The response shows that the property was re-added with the new value `bar` instead of the previous value `foo`. - -```json -{ - "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", - "aspects": { - "structuredProperties": { - "value": { - "properties": [ - { - "values": [ - { - "string": "bar2" - } - ], - "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty02" - }, - { - "values": [ - { - "string": "bar" - } - ], - "propertyUrn": "urn:li:structuredProperty:my.test.MyProperty01" - } - ] - } - } - } -} -``` diff --git a/docs/api/tutorials/forms.md b/docs/api/tutorials/forms.md new file mode 100644 index 00000000000000..f60699ffebab58 --- /dev/null +++ b/docs/api/tutorials/forms.md @@ -0,0 +1,148 @@ +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +# Documentation Forms + +## Why Would You Use Documentation Forms? + +Documentation Forms are a way for end-users to fill out all mandatory attributes associated with a data asset. The form will be dynamically generated based on the definitions provided by administrators and stewards and matching rules. + +Learn more about forms in the [Documentation Forms Feature Guide](../../../docs/features/feature-guides/documentation-forms.md). + + +### Goal Of This Guide +This guide will show you how to create and read forms. 
+ +## Prerequisites + +For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. +For detailed information, please refer to [Datahub Quickstart Guide](/docs/quickstart.md). + + + + + +Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding SaaS release version is `v0.2.16.5` +Connect to your instance via [init](https://datahubproject.io/docs/cli/#init): + +1. Run `datahub init` to update the instance you want to load into +2. Set the server to your sandbox instance, `https://{your-instance-address}/gms` +3. Set the token to your access token + + + + + + +## Create a Form + + + + +Create a yaml file representing the forms you’d like to load. +For example, below file represents a form `123456` You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/forms/forms.yaml). + + +```yaml +- id: 123456 + # urn: "urn:li:form:123456" # optional if id is provided + type: VERIFICATION # Supported Types: DOCUMENTATION, VERIFICATION + name: "Metadata Initiative 2023" + description: "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out" + prompts: + - id: "123" + title: "Retention Time" + description: "Apply Retention Time structured property to form" + type: STRUCTURED_PROPERTY + structured_property_id: io.acryl.privacy.retentionTime + required: True # optional, will default to True + entities: # Either pass a list of urns or a group of filters. 
This example shows a list of urns + urns: + - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD) + # optionally assign the form to a specific set of users and/or groups + # when omitted, form will be assigned to Asset owners + actors: + users: + - urn:li:corpuser:jane@email.com # note: these should be urns + - urn:li:corpuser:john@email.com + groups: + - urn:li:corpGroup:team@email.com # note: these should be urns +``` + +:::note +Note that the structured properties and related entities should be created before you create the form. +Please refer to the [Structured Properties Tutorial](/docs/api/tutorials/structured-properties.md) for more information. +::: + + +You can apply forms to either a list of entity urns or a list of filters. For a list of entity urns, use this structure: + +``` +entities: +urns: + - urn:li:dataset:... +``` + +For a list of filters, use this structure: + +``` +entities: +filters: + types: + - dataset # you can use entity type name or urn + platforms: + - snowflake # you can use platform name or urn + domains: + - urn:li:domain:finance # you must use domain urn + containers: + - urn:li:container:my_container # you must use container urn +``` + +Note that you can filter by entity types, platforms, domains, and/or containers. + +Use the CLI to create your forms: + +```commandline +datahub forms upsert -f {forms_yaml} +``` + +If successful, you should see `Created form urn:li:form:...` + + + + +## Read a Form + + + + + +You can see the forms you created by running the following command: + +```commandline +datahub forms get --urn {urn} +``` +For example, you can run `datahub forms get --urn urn:li:form:123456`. + +If successful, you should see metadata about your form returned, as shown below. 
+ +```json +{ + "urn": "urn:li:form:123456", + "name": "Metadata Initiative 2023", + "description": "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out", + "prompts": [ + { + "id": "123", + "title": "Retention Time", + "description": "Apply Retention Time structured property to form", + "type": "STRUCTURED_PROPERTY", + "structured_property_urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime" + } + ], + "type": "VERIFICATION" +} +``` + + + diff --git a/docs/api/tutorials/structured-properties.md b/docs/api/tutorials/structured-properties.md new file mode 100644 index 00000000000000..c32e92e58e8c71 --- /dev/null +++ b/docs/api/tutorials/structured-properties.md @@ -0,0 +1,567 @@ +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +# Structured Properties + +## Why Would You Use Structured Properties? + + Structured properties are a structured, named set of properties that can be attached to logical entities like Datasets, DataJobs, etc. +Structured properties have values that are types. Conceptually, they are like “field definitions”. + +Learn more about structured properties in the [Structured Properties Feature Guide](../../../docs/features/feature-guides/properties.md). + + +### Goal Of This Guide + +This guide will show you how to execute the following actions with structured properties. +- Create structured properties +- Read structured properties +- Delete structured properties (soft delete) +- Add structured properties to a dataset +- Patch structured properties (add / remove / update a single property) + +## Prerequisites + +For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. +For detailed information, please refer to [Datahub Quickstart Guide](/docs/quickstart.md). 
+ +Additionally, you need to have the following tools installed according to the method you choose to interact with DataHub: + + + + +Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding SaaS release version is `v0.2.16.5` +Connect to your instance via [init](https://datahubproject.io/docs/cli/#init): + +- Run `datahub init` to update the instance you want to load into. +- Set the server to your sandbox instance, `https://{your-instance-address}/gms`. +- Set the token to your access token. + + + + + +Requirements for OpenAPI are: +* curl +* jq + + + + + +## Create Structured Properties + +The following code will create a structured property `io.acryl.privacy.retentionTime`. + + + + +Create a yaml file representing the properties you’d like to load. +For example, below file represents a property `io.acryl.privacy.retentionTime`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/struct_props.yaml). 
+ +```yaml +- id: io.acryl.privacy.retentionTime + # - urn: urn:li:structuredProperty:io.acryl.privacy.retentionTime # optional if id is provided + qualified_name: io.acryl.privacy.retentionTime # required if urn is provided + type: number + cardinality: MULTIPLE + display_name: Retention Time + entity_types: + - dataset # or urn:li:entityType:datahub.dataset + - dataFlow + description: "Retention Time is used to figure out how long to retain records in a dataset" + allowed_values: + - value: 30 + description: 30 days, usually reserved for datasets that are ephemeral and contain pii + - value: 90 + description: Use this for datasets that drive monthly reporting but contain pii + - value: 365 + description: Use this for non-sensitive data that can be retained for longer +``` + +Use the CLI to create your properties: +```commandline +datahub properties upsert -f {properties_yaml} +``` + +If successful, you should see `Created structured property urn:li:structuredProperty:...` + + + + +```commandline +curl -X 'POST' -v \ + 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ + "qualifiedName": "io.acryl.privacy.retentionTime", + "valueType": "urn:li:dataType:datahub.number", + "description": "Retention Time is used to figure out how long to retain records in a dataset", + "displayName": "Retention Time", + "cardinality": "MULTIPLE", + "entityTypes": [ + "urn:li:entityType:datahub.dataset", + "urn:li:entityType:datahub.dataFlow" + ], + "allowedValues": [ + { + "value": {"double": 30}, + "description": "30 days, usually reserved for datasets that are ephemeral and contain pii" + }, + { + "value": {"double": 60}, + "description": "Use this for datasets that drive monthly reporting but contain pii" + }, + { + "value": {"double": 365}, + "description": "Use this for non-sensitive data that can be 
retained for longer" + } + ] +}' | jq +``` + + + +## Read Structured Properties + +You can see the properties you created by running the following command: + + + + + +```commandline +datahub properties get --urn {urn} +``` +For example, you can run `datahub properties get --urn urn:li:structuredProperty:io.acryl.privacy.retentionTime`. +If successful, you should see metadata about your properties returned. + +```commandline +{ + "urn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime", + "qualified_name": "io.acryl.privacy.retentionTime", + "type": "urn:li:dataType:datahub.number", + "description": "Retention Time is used to figure out how long to retain records in a dataset", + "display_name": "Retention Time", + "entity_types": [ + "urn:li:entityType:datahub.dataset", + "urn:li:entityType:datahub.dataFlow" + ], + "cardinality": "MULTIPLE", + "allowed_values": [ + { + "value": "30", + "description": "30 days, usually reserved for datasets that are ephemeral and contain pii" + }, + { + "value": "90", + "description": "Use this for datasets that drive monthly reporting but contain pii" + }, + { + "value": "365", + "description": "Use this for non-sensitive data that can be retained for longer" + } + ] +} +``` + + + + +Example Request: +``` +curl -X 'GET' -v \ + 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/propertyDefinition' \ + -H 'accept: application/json' | jq +``` + +Example Response: + +```commandline +{ + "value": { + "allowedValues": [ + { + "value": { + "double": 30.0 + }, + "description": "30 days, usually reserved for datasets that are ephemeral and contain pii" + }, + { + "value": { + "double": 60.0 + }, + "description": "Use this for datasets that drive monthly reporting but contain pii" + }, + { + "value": { + "double": 365.0 + }, + "description": "Use this for non-sensitive data that can be retained for longer" + } + ], + "qualifiedName": 
"io.acryl.privacy.retentionTime", + "displayName": "Retention Time", + "valueType": "urn:li:dataType:datahub.number", + "description": "Retention Time is used to figure out how long to retain records in a dataset", + "entityTypes": [ + "urn:li:entityType:datahub.dataset", + "urn:li:entityType:datahub.dataFlow" + ], + "cardinality": "MULTIPLE" + } +} +``` + + + + + +## Set Structured Property To a Dataset + +This action will set/replace all structured properties on the entity. See PATCH operations to add/remove a single property. + + + + +You can set structured properties to a dataset by creating a dataset yaml file with structured properties. For example, below is a dataset yaml file with structured properties in both the field and dataset level. + +Please refer to the [full example here.](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/datasets.yaml) + +```yaml +- id: user_clicks_snowflake + platform: snowflake + schema: + fields: + - id: user_id + structured_properties: + io.acryl.dataManagement.deprecationDate: "2023-01-01" + structured_properties: + io.acryl.dataManagement.replicationSLA: 90 +``` + +Use the CLI to upsert your dataset yaml file: +```commandline +datahub dataset upsert -f {dataset_yaml} +``` +If successful, you should see `Update succeeded for urn:li:dataset:...` + + + + + + +Following command will set structured properties `retentionTime` as `90` to a dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`. +Please note that the structured property and the dataset must exist before executing this command. 
(You can create sample datasets using the `datahub docker ingest-sample-data`) + +```commandline +curl -X 'POST' -v \ + 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ + "properties": [ + { + "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime", + "values": [ + {"string": "90"} + ] + } + ] +}' | jq +``` + + + + +#### Expected Outcomes + +Once your datasets are uploaded, you can view them in the UI and view the properties associated with them under the Properties tab. + +

+ +

+ +Or you can run the following command to view the properties associated with the dataset: + +```commandline +datahub dataset get --urn {urn} +``` + +## Patch Structured Property Value + +This section will show you how to patch a structured property value - either by removing, adding, or upserting a single property. + +### Add Structured Property Value + +For this example, we'll create a second structured property and apply both properties to the same dataset used previously. +After this, your system should include both `io.acryl.privacy.retentionTime` and `io.acryl.privacy.retentionTime02`. + + + + +Let's start by creating the second structured property. + +``` +curl -X 'POST' -v \ + 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime02/propertyDefinition' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ + "qualifiedName": "io.acryl.privacy.retentionTime02", + "displayName": "Retention Time 02", + "valueType": "urn:li:dataType:datahub.string", + "allowedValues": [ + { + "value": {"string": "foo2"}, + "description": "test foo2 value" + }, + { + "value": {"string": "bar2"}, + "description": "test bar2 value" + } + ], + "cardinality": "SINGLE", + "entityTypes": [ + "urn:li:entityType:datahub.dataset" + ] +}' | jq + +``` + +This command will attach one of each of the two properties to our test dataset `urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)`. +Specifically, this will set `io.acryl.privacy.retentionTime` as `90` and `io.acryl.privacy.retentionTime02` as `bar2`. 
+ + +``` +curl -X 'POST' -v \ + 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ + "properties": [ + { + "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime", + "values": [ + {"string": "90"} + ] + }, + { + "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02", + "values": [ + {"string": "bar2"} + ] + } + ] +}' | jq +``` + + + + +#### Expected Outcomes +You can see that the dataset now has two structured properties attached to it. + +

+ +

+ + + +### Remove Structured Property Value + +The expected state of our test dataset includes 2 structured properties. +We'd like to remove the first one (`io.acryl.privacy.retentionTime`) and preserve the second property (`io.acryl.privacy.retentionTime02`). + + + + +``` +curl -X 'PATCH' -v \ + 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json-patch+json' \ + -d '{ + "patch": [ + { + "op": "remove", + "path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime" + } + ], + "arrayPrimaryKeys": { + "properties": [ + "propertyUrn" + ] + } + }' | jq +``` +The response will show that the expected property has been removed. + +``` +{ + "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", + "aspects": { + "structuredProperties": { + "value": { + "properties": [ + { + "values": [ + { + "string": "bar2" + } + ], + "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02" + } + ] + } + } + } +} +``` + + + +#### Expected Outcomes +You can see that the first property has been removed and the second property is still present. +

+ +

+ + + +### Upsert Structured Property Value + +In this example, we'll add the property back with a different value, preserving the existing property. + + + + +``` +curl -X 'PATCH' -v \ + 'http://localhost:8080/openapi/v2/entity/dataset/urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Ahive%2CSampleHiveDataset%2CPROD%29/structuredProperties' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json-patch+json' \ + -d '{ + "patch": [ + { + "op": "add", + "path": "/properties/urn:li:structuredProperty:io.acryl.privacy.retentionTime", + "value": { + "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime", + "values": [ + { + "string": "365" + } + ] + } + } + ], + "arrayPrimaryKeys": { + "properties": [ + "propertyUrn" + ] + } + }' | jq +``` + +Below is the expected response: +``` +{ + "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", + "aspects": { + "structuredProperties": { + "value": { + "properties": [ + { + "values": [ + { + "string": "bar2" + } + ], + "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime02" + }, + { + "values": [ + { + "string": "365" + } + ], + "propertyUrn": "urn:li:structuredProperty:io.acryl.privacy.retentionTime" + } + ] + } + } + } +} +``` + +The response shows that the property was re-added with the new value `365` instead of the previous value `90`. + + + + +#### Expected Outcomes +You can see that the first property has been added back with a new value and the second property is still present. +

+ +

+ + + +## Delete Structured Properties + +There are two types of deletion present in DataHub: hard and soft delete. As of the current release only the soft delete is supported for Structured Properties. + +:::note SOFT DELETE +A soft deleted Structured Property does not remove any underlying data on the Structured Property entity or the Structured Property's values written to other entities. The soft delete is 100% reversible with zero data loss. When a Structured Property is soft deleted, a few operations are not available. + +Structured Property Soft Delete Effects: + +- Entities with a soft deleted Structured Property value will not return the soft deleted properties +- Updates to a soft deleted Structured Property's definition are denied +- Adding a soft deleted Structured Property's value to an entity is denied +- Search filters using a soft deleted Structured Property will be denied +::: + + + + + +The following command will soft delete the test property. + +```commandline +datahub delete --urn {urn} +``` + + + + +The following command will soft delete the test property by writing to the status aspect. + +``` +curl -X 'POST' \ + 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ +"removed": true +}' | jq +``` + +If you want to **remove the soft delete**, you can do so by either hard deleting the status aspect or changing the removed boolean to `false` like below. 
+ +``` +curl -X 'POST' \ + 'http://localhost:8080/openapi/v2/entity/structuredProperty/urn%3Ali%3AstructuredProperty%3Aio.acryl.privacy.retentionTime/status?systemMetadata=false' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ +"removed": false +}' | jq +``` + + + diff --git a/docs/features/feature-guides/documentation-forms.md b/docs/features/feature-guides/documentation-forms.md new file mode 100644 index 00000000000000..8b2966810de7c0 --- /dev/null +++ b/docs/features/feature-guides/documentation-forms.md @@ -0,0 +1,113 @@ +import FeatureAvailability from '@site/src/components/FeatureAvailability'; + +# About DataHub Documentation Forms + + +DataHub Documentation Forms streamline the process of setting documentation requirements and delegating annotation responsibilities to the relevant data asset owners, stewards, and subject matter experts. + +Forms are highly configurable, making it easy to ask the right questions of the right people, for a specific set of assets. + +## What are Documentation Forms? + +You can think of Documentation Forms as a survey for your data assets: a set of questions that must be answered in order for an asset to be considered properly documented. + +Verification Forms are an extension of Documentation Forms, requiring a final verification, or sign-off, on all responses before the asset can be considered Verified. This is useful for compliance and/or governance annotation initiatives where you want assignees to provide a final acknowledgement that the information provided is correct. + +## Creating and Assigning Documentation Forms + +Documentation Forms are defined via YAML with the following details: + +- Name and Description to help end-users understand the scope and use case +- Form Type, either Documentation or Verification + - Verification Forms require a final signoff, i.e. 
Verification, of all required questions before the Form can be considered complete +- Form Questions (aka "prompts") for end-users to complete + - Questions can be assigned at the asset-level and/or the field-level + - Asset-level questions can be configured to be required; by default, all questions are optional +- Assigned Assets, defined by: + - A set of specific asset URNs, OR + - Assets related to a set of filters, such as Type (Datasets, Dashboards, etc.), Platform (Snowflake, Looker, etc.), Domain (Product, Marketing, etc.), or Container (Schema, Folder, etc.) +- Optional: Form Assignees + - Optionally assign specific DataHub users/groups to complete the Form for all relevant assets + - If omitted, any Owner of an Asset can complete Forms assigned to that Asset + +Here's an example of defining a Documentation Form via YAML: +```yaml +- id: 123456 + # urn: "urn:li:form:123456" # optional if id is provided + type: VERIFICATION # Supported Types: DOCUMENTATION, VERIFICATION + name: "Metadata Initiative 2024" + description: "How we want to ensure the most important data assets in our organization have all of the most important and expected pieces of metadata filled out" + prompts: # Questions for Form assignees to complete + - id: "123" + title: "Data Retention Time" + description: "Apply Retention Time structured property to form" + type: STRUCTURED_PROPERTY + structured_property_id: io.acryl.privacy.retentionTime + required: True # optional; default value is False + entities: # Either pass a list of urns or a group of filters. 
This example shows a list of urns + urns: + - urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD) + # optionally assign the form to a specific set of users and/or groups + # when omitted, form will be assigned to Asset owners + actors: + users: + - urn:li:corpuser:jane@email.com # note: these should be URNs + - urn:li:corpuser:john@email.com + groups: + - urn:li:corpGroup:team@email.com # note: these should be URNs + +``` + +:::note +Documentation Forms currently only support defining Structured Properties as Form Questions +::: + + + + + +## Additional Resources + +### Videos + +**Asset Verification in Acryl Cloud** + +

+ +

+ +## FAQ and Troubleshooting + +**What is the difference between Documentation and Verification Forms?** + +Both form types are a way to configure a set of optional and/or required questions for DataHub users to complete. When using Verification Forms, users will be presented with a final verification step once all required questions have been completed; you can think of this as a final acknowledgement of the accuracy of information submitted. + +**Who is able to complete Forms in DataHub?** + +By default, any owner of an Asset will be able to respond to questions assigned via a Form. + +When assigning a Form to an Asset, you can optionally assign specific DataHub users/groups to fill it out. + +**Can I assign multiple Forms to a single asset?** + +You sure can! Please keep in mind that an Asset will only be considered Documented or Verified if all required questions are completed on all assigned Forms. + +### API Tutorials + +- [Create a Documentation Form](../../../docs/api/tutorials/forms.md) + +:::note +You must create a Structured Property before including it in a Documentation Form. +To learn more about creating Structured Properties via CLI, please see the [Create Structured Properties](/docs/api/tutorials/structured-properties.md) tutorial. 
+::: + +### Related Features + +- [DataHub Properties](/docs/features/feature-guides/properties.md) \ No newline at end of file diff --git a/docs/features/feature-guides/properties.md b/docs/features/feature-guides/properties.md new file mode 100644 index 00000000000000..0d961b9ceac4ff --- /dev/null +++ b/docs/features/feature-guides/properties.md @@ -0,0 +1,158 @@ +import FeatureAvailability from '@site/src/components/FeatureAvailability'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +# About DataHub Properties + + +DataHub Custom Properties and Structured Properties are powerful tools to collect meaningful metadata for Assets that might not perfectly fit into other Aspects within DataHub, such as Glossary Terms, Tags, etc. Both types can be found in an Asset's Properties tab: + +

+ +

+ +This guide will explain the differences and use cases of each property type. + +## What are Custom Properties and Structured Properties? +Here are the differences between the two property types at a glance: + +| Custom Properties | Structured Properties | +| --- | --- | +| Map of key-value pairs stored as strings | Validated namespaces and data types | +| Added to assets during ingestion and via API | Defined via YAML; created and added to assets via CLI | +| No support for UI-based Edits | Support for UI-based edits | + +**Custom Properties** are key-value pairs of strings that capture additional information about assets that is not readily available in standard metadata fields. Custom Properties can be added to assets automatically during ingestion or programmatically via API and *cannot* be edited via the UI. +

+ +

+

Example of Custom Properties assigned to a Dataset

+ +**Structured Properties** are an extension of Custom Properties, providing a structured and validated way to attach metadata to DataHub Assets. Available as of v0.13.1, Structured Properties have a pre-defined type (Date, Integer, URN, String, etc.). They can be configured to only accept a specific set of allowed values, making it easier to ensure high levels of data quality and consistency. Structured Properties are defined via YAML, added to assets via CLI, and can be edited via the UI. +

+ +

+

Example of Structured Properties assigned to a Dataset

+ +## Use Cases for Custom Properties and Structured Properties +**Custom Properties** are useful for capturing raw metadata from source systems during ingestion or programmatically via API. Some examples include: + +- GitHub file location of code which generated a dataset +- Data encoding type +- Account ID, cluster size, and region where a dataset is stored + +**Structured Properties** are useful for setting and enforcing standards of metadata collection, particularly in support of compliance and governance initiatives. Values can be added programmatically via API, then manually via the DataHub UI as necessary. Some examples include: + +- Deprecation Date + - Type: Date, Single Select + - Validation: Must be formatted as 'YYYY-MM-DD' +- Data Retention Period + - Type: String, Single Select + - Validation: Adheres to allowed values "30 Days", "90 Days", "365 Days", or "Indefinite" +- Consulted Compliance Officer, chosen from a list of DataHub users + - Type: DataHub User, Multi-Select + - Validation: Must be valid DataHub User URN + +By using Structured Properties, compliance and governance officers can ensure consistency in data collection across assets. + +## Creating, Assigning, and Editing Structured Properties + +Structured Properties are defined via YAML, then created and assigned to DataHub Assets via the DataHub CLI. 
+ +Here's how we would define the above examples in YAML: + + + + +```yaml +- id: deprecation_date + qualified_name: deprecation_date + type: date # Supported types: date, string, number, urn, rich_text + cardinality: SINGLE # Supported options: SINGLE, MULTIPLE + display_name: Deprecation Date + description: "Scheduled date when resource will be deprecated in the source system" + entity_types: # Define which types of DataHub Assets the Property can be assigned to + - dataset +``` + + + + +```yaml +- id: retention_period + qualified_name: retention_period + type: string # Supported types: date, string, number, urn, rich_text + cardinality: SINGLE # Supported options: SINGLE, MULTIPLE + display_name: Data Retention Period + description: "Predetermined storage duration before being deleted or archived + based on legal, regulatory, or organizational requirements" + entity_types: # Define which types of DataHub Assets the Property can be assigned to + - dataset + allowed_values: + - value: "30 Days" + description: "Use this for datasets that are ephemeral and contain PII" + - value: "90 Days" + description: "Use this for datasets that drive monthly reporting but contain PII" + - value: "365 Days" + description: "Use this for non-sensitive data that can be retained for longer" + - value: "Indefinite" + description: "Use this for non-sensitive data that can be retained indefinitely" +``` + + + + +```yaml +- id: compliance_officer + qualified_name: compliance_officer + type: urn # Supported types: date, string, number, urn, rich_text + cardinality: MULTIPLE # Supported options: SINGLE, MULTIPLE + display_name: Consulted Compliance Officer(s) + description: "Member(s) of the Compliance Team consulted/informed during audit" + type_qualifier: # Define the type of Asset URNs to allow + - corpuser + - corpGroup + entity_types: # Define which types of DataHub Assets the Property can be assigned to + - dataset +``` + + + + +:::note +To learn more about creating and assigning 
Structured Properties via CLI, please see the [Create Structured Properties](/docs/api/tutorials/structured-properties.md) tutorial. +::: + +Once a Structured Property is assigned to an Asset, Users with the `Edit Properties` Metadata Privilege will be able to change Structured Property values via the DataHub UI. +

+ +

+

Example of editing the value of a Structured Property via the UI

+ +### Videos + +**Deep Dive: UI-Editable Properties** + +

+ +

+ + +### API + +Please see the following API guides related to Custom and Structured Properties: + +- [Custom Properties API Guide](/docs/api/tutorials/custom-properties.md) +- [Structured Properties API Guide](/docs/api/tutorials/structured-properties.md) + + +## FAQ and Troubleshooting + +**Why can't I edit the value of a Structured Property from the DataHub UI?** +1. Your version of DataHub does not support UI-based edits of Structured Properties. Confirm you are running DataHub v0.13.1 or later. +2. You are attempting to edit a Custom Property, not a Structured Property. Confirm you are trying to edit a Structured Property, which will have an "Edit" button visible. Please note that Custom Properties are not eligible for UI-based edits to minimize overwrites during recurring ingestion. +3. You do not have the necessary privileges. Confirm with your Admin that you have the `Edit Properties` Metadata Privilege. + +### Related Features + +- [Documentation Forms](/docs/features/feature-guides/documentation-forms.md) \ No newline at end of file