From 1f7c92bb994fc8682f82aa65dd712ab62a6c062f Mon Sep 17 00:00:00 2001
From: Pedro Silva
Date: Fri, 26 Jul 2024 19:20:21 +0100
Subject: [PATCH] feat(docs): Document __DATAHUB_TO_FILE_ directive (#10968)

Co-authored-by: Harshal Sheth
---
 metadata-ingestion/recipe_overview.md | 29 +++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/metadata-ingestion/recipe_overview.md b/metadata-ingestion/recipe_overview.md
index a748edbf3bb44..27d5cb8c85f23 100644
--- a/metadata-ingestion/recipe_overview.md
+++ b/metadata-ingestion/recipe_overview.md
@@ -90,6 +90,35 @@
 similar to variable substitution in GNU bash or in docker-compose files. For details, see
 [variable-substitution](https://docs.docker.com/compose/compose-file/compose-file-v2/#variable-substitution). This environment variable substitution should be used to mask sensitive information in recipe files. As long as you can get env variables securely to the ingestion process there would not be any need to store sensitive information in recipes.
 
+### Loading Sensitive Data as Files in Recipes
+
+
+Some sources (e.g. kafka, bigquery, mysql) require paths to files on the local file system. This doesn't work for UI ingestion, where the recipe needs to be fully self-contained. To supply such files to the ingestion process, DataHub offers the `__DATAHUB_TO_FILE_` directive, which lets a recipe set the contents of a file.
+
+The syntax for this directive is `__DATAHUB_TO_FILE_<property>: <value>`, which gets turned into `<property>: <path to file containing value>`. Note that the value can be specified inline or using an env var/secret.
+
+For example:
+
+```yaml
+source:
+  type: mysql
+  config:
+    # Coordinates
+    host_port: localhost:3306
+    database: dbname
+
+    # Credentials
+    username: root
+    password: example
+    # If you need to use SSL with MySQL:
+    options:
+      connect_args:
+        __DATAHUB_TO_FILE_ssl_key: '${secret}' # use this for secrets that you need to mount to a file
+        # this will get converted into
+        # ssl_key: /tmp/path/to/file # where file contains the contents of ${secret}
+    ...
+```
+
 ### Transformations
 
 If you'd like to modify data before it reaches the ingestion sinks – for instance, adding additional owners or tags – you can use a transformer to write your own module and integrate it with DataHub. Transformers require extending the recipe with a new section to describe the transformers that you want to run.
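
As a rough illustration of the `__DATAHUB_TO_FILE_` directive documented in this patch, the Python sketch below shows one way such a directive could be resolved. It is not DataHub's implementation: the function name `resolve_file_directives`, the sample `recipe_config`, and the use of `tempfile` are assumptions made only for illustration.

```python
import tempfile
from typing import Any

# Prefix taken from the documentation in this patch.
PREFIX = "__DATAHUB_TO_FILE_"


def resolve_file_directives(config: Any) -> Any:
    """Hypothetical helper: return a copy of `config` in which every key
    starting with PREFIX is replaced by the bare property name, with the
    value written to a temporary file and swapped for that file's path."""
    if isinstance(config, list):
        return [resolve_file_directives(item) for item in config]
    if not isinstance(config, dict):
        return config

    resolved = {}
    for key, value in config.items():
        if isinstance(key, str) and key.startswith(PREFIX):
            # delete=False keeps the file on disk after the handle is closed.
            with tempfile.NamedTemporaryFile("w", delete=False) as fh:
                fh.write(str(value))
            resolved[key[len(PREFIX):]] = fh.name
        else:
            resolved[key] = resolve_file_directives(value)
    return resolved


# Illustrative config mirroring the MySQL example; the secret value is made up.
recipe_config = {
    "host_port": "localhost:3306",
    "options": {
        "connect_args": {"__DATAHUB_TO_FILE_ssl_key": "example key contents"}
    },
}
resolved_config = resolve_file_directives(recipe_config)
ssl_key_path = resolved_config["options"]["connect_args"]["ssl_key"]
```

In this sketch the temporary file is never removed; a real ingestion process would presumably need to clean it up and restrict its permissions, since it holds secret material.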