forked from datahub-project/datahub
feat(ingestion/bigquery): BigQuery Owner Label to Datahub Ownership (d…
1 parent
32a2de4
commit 9f2c5d3
Showing 5 changed files with 236 additions and 32 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,24 +20,70 @@ The below table shows transformer which can transform aspects of entity [Dataset | |
### Config Details
| Field | Required | Type | Default | Description |
|-----------------------------|----------|---------|---------------|---------------------------------------------|
| `semantics` | | enum | `OVERWRITE` | Whether to OVERWRITE or PATCH the entity present on DataHub GMS. |
| `tag_prefix` | | str | | Regex to match tags against. The matched prefix is removed, and the rest of the string is treated as the owner ID used to create the owner URN. |
| `is_user` | | bool | `true` | Whether the owner should be considered a user. If `false`, the owner is considered a group. |
| `tag_pattern` | | str | | Regex to match tags against. The matched pattern is removed, and the rest of the string is treated as the owner ID used to create the owner URN. |
| `is_user` | | bool | `true` | Whether the owner should be considered a user. If `false`, the owner is considered a group. |
| `owner_character_mapping` | | dict[str, str] | | A mapping from characters in the extracted owner string to the characters to use in the DataHub owner. |
| `email_domain` | | str | | If set, this domain is appended to the owner ID when creating the owner URN. |
| `extract_owner_type_from_tag_pattern` | | bool | `false` | Whether to extract the owner type from the first group of the provided tag pattern. If `true`, `owner_type` and `owner_type_urn` do not need to be provided. For example, if the tag pattern is `(.*)_owner_email:` and the actual tag is `developer_owner_email`, the extracted owner type is `developer`. |
| `owner_type` | | str | `TECHNICAL_OWNER` | Ownership type. |
| `owner_type_urn` | | str | `None` | Set to a custom ownership type's URN if using custom ownership. |
|
||
Matches against a tag prefix and treats the string that follows the prefix as the owner when creating ownership.
Let’s suppose we’d like to add dataset ownership based on part of the dataset tags. To do so, we can use the `extract_ownership_from_tags` transformer that’s included in the ingestion framework.
|
||
The config, which we’d append to our ingestion recipe YAML, would look like this: | ||
|
||
```yaml | ||
transformers: | ||
- type: "extract_ownership_from_tags" | ||
config: | ||
tag_prefix: "dbt:techno-genie:" | ||
is_user: true | ||
email_domain: "coolcompany.com" | ||
tag_pattern: "owner_email:" | ||
``` | ||
So if we have input dataset tags like:
- `urn:li:tag:dataset_owner_email:[email protected]` | ||
- `urn:li:tag:dataset_owner_email:[email protected]` | ||
|
||
The portion of the tag after the matched tag pattern will be converted into an owner. Hence users `[email protected]` and `[email protected]` will be added as owners. | ||
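The matching step described above can be sketched in Python. This is a minimal illustration, not the transformer's actual implementation: the helper name `owner_urn_from_tag` and the sample address `jane@example.com` are hypothetical, and the sketch assumes the pattern is searched for inside the tag name and that everything after the match is the owner ID.

```python
import re
from typing import Optional

TAG_URN_PREFIX = "urn:li:tag:"


def owner_urn_from_tag(tag_urn: str, tag_pattern: str, is_user: bool = True) -> Optional[str]:
    """Hypothetical helper: derive an owner URN from a tag URN."""
    # Drop the "urn:li:tag:" prefix to get the bare tag name.
    tag_name = tag_urn[len(TAG_URN_PREFIX):] if tag_urn.startswith(TAG_URN_PREFIX) else tag_urn
    match = re.search(tag_pattern, tag_name)
    if match is None:
        return None  # The tag does not describe an owner.
    # Everything after the matched pattern is treated as the owner ID.
    owner_id = tag_name[match.end():].strip()
    prefix = "urn:li:corpuser:" if is_user else "urn:li:corpGroup:"
    return prefix + owner_id


print(owner_urn_from_tag("urn:li:tag:dataset_owner_email:jane@example.com", "owner_email:"))
# urn:li:corpuser:jane@example.com
```

Tags that do not match the pattern yield `None` in this sketch, so they would simply be skipped when building the ownership aspect.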
|
||
### Examples | ||
|
||
- Add owners where the owner should be treated as a group and the email domain is not part of the tag string. For example, to turn tag URN `urn:li:tag:dataset_owner:abc` into owner URN `urn:li:corpGroup:abc@email.com`, the config would look like this:
```yaml
transformers:
  - type: "extract_ownership_from_tags"
    config:
      tag_pattern: "owner:"
      is_user: false
      email_domain: "email.com"
```
- Add owners with an externally provided owner type and owner type URN. For example, to give the owner extracted from tag URN `urn:li:tag:dataset_owner_email:[email protected]` the owner type `CUSTOM` and the owner type URN `urn:li:ownershipType:data_product`, the config would look like this:
```yaml
transformers:
  - type: "extract_ownership_from_tags"
    config:
      tag_pattern: "owner_email:"
      owner_type: "CUSTOM"
      owner_type_urn: "urn:li:ownershipType:data_product"
```
- Add owners where some characters in the owner string need to be replaced before ingestion. For example, to turn tag URN `urn:li:tag:dataset_owner_email:abc_xyz-email_com` into owner URN `urn:li:corpuser:abc.xyz@email.com`, the config would look like this:
```yaml
transformers:
  - type: "extract_ownership_from_tags"
    config:
      tag_pattern: "owner_email:"
      owner_character_mapping:
        "_": "."
        "-": "@"
```
- Add owners where the owner type also needs to be extracted from the tag pattern. For example, to extract owner type `data_producer` from tag URN `urn:li:tag:data_producer_owner_email:[email protected]`, the config would look like this:
```yaml
transformers:
  - type: "extract_ownership_from_tags"
    config:
      tag_pattern: "(.*)_owner_email:"
      extract_owner_type_from_tag_pattern: true
```
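To make the examples above concrete, here is a rough Python sketch of how these options could combine. It is an illustration under stated assumptions, not the transformer's code: the function `extract_owner` and its parameter `extract_owner_type` are hypothetical names, longer mapping keys are assumed to take precedence over shorter ones, and the owner type URN is assumed to be `urn:li:ownershipType:` followed by the first regex group.

```python
import re
from typing import Dict, Optional, Tuple


def extract_owner(
    tag_name: str,
    tag_pattern: str,
    is_user: bool = True,
    email_domain: Optional[str] = None,
    owner_character_mapping: Optional[Dict[str, str]] = None,
    extract_owner_type: bool = False,
) -> Optional[Tuple[str, Optional[str]]]:
    """Hypothetical helper: return (owner URN, owner type URN or None)."""
    match = re.search(tag_pattern, tag_name)
    if match is None:
        return None
    owner_id = tag_name[match.end():].strip()

    if owner_character_mapping:
        mapping = owner_character_mapping
        # Assumption: longer keys are tried first, so "__" wins over "_".
        keys = sorted(mapping, key=len, reverse=True)
        token = re.compile("|".join(re.escape(k) for k in keys))
        owner_id = token.sub(lambda m: mapping[m.group(0)], owner_id)

    if email_domain:
        owner_id = f"{owner_id}@{email_domain}"

    owner_type_urn = None
    if extract_owner_type and match.groups():
        # The first group of the pattern names the ownership type.
        owner_type_urn = f"urn:li:ownershipType:{match.group(1)}"

    prefix = "urn:li:corpuser:" if is_user else "urn:li:corpGroup:"
    return prefix + owner_id, owner_type_urn


print(extract_owner("dataset_owner:abc", "owner:", is_user=False, email_domain="email.com"))
# ('urn:li:corpGroup:abc@email.com', None)
print(extract_owner(
    "data_producer_owner_email:abc_xyz-email_com",
    "(.*)_owner_email:",
    owner_character_mapping={"_": ".", "-": "@"},
    extract_owner_type=True,
))
# ('urn:li:corpuser:abc.xyz@email.com', 'urn:li:ownershipType:data_producer')
```

Replacing all mapped tokens in a single regex pass avoids one replacement feeding into another, for example a `-` that was just produced from `--` being turned into `@`.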
|
||
## Clean suffix prefix from Ownership | ||
### Config Details | ||
| Field | Required | Type | Default | Description | | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -643,6 +643,7 @@ def _test_owner( | |
        config: Dict,
        expected_owner: str,
        expected_owner_type: Optional[str] = None,
        expected_owner_type_urn: Optional[str] = None,
    ) -> None:
        dataset = make_generic_dataset(
            aspects=[
|
@@ -682,6 +683,8 @@ def _test_owner( | |
|
||
        assert owner.owner == expected_owner

        assert owner.typeUrn == expected_owner_type_urn

    _test_owner(
        tag="owner:foo",
        config={
|
@@ -736,6 +739,25 @@ def _test_owner( | |
        },
        expected_owner="urn:li:corpuser:[email protected]",
        expected_owner_type=OwnershipTypeClass.CUSTOM,
        expected_owner_type_urn="urn:li:ownershipType:ad8557d6-dcb9-4d2a-83fc-b7d0d54f3e0f",
    )
    _test_owner(
        tag="data_producer_owner_email:abc_xyz-email_com",
        config={
            "tag_pattern": "(.*)_owner_email:",
            "owner_character_mapping": {
                "_": ".",
                "-": "@",
                "__": "_",
                "--": "-",
                "_-": "#",
                "-_": " ",
            },
            "extract_owner_type_from_tag_pattern": True,
        },
        expected_owner="urn:li:corpuser:abc.xyz@email.com",
        expected_owner_type=OwnershipTypeClass.CUSTOM,
        expected_owner_type_urn="urn:li:ownershipType:data_producer",
    )