- Navigate to the Console in incognito mode. Ensure that you are logged in as your admin account (admin@<ldap>.altostrat.com).
- Open Cloud Shell while logged in as admin@. You will need to create Google Cloud Application Default Credentials:

  ```bash
  gcloud auth application-default login
  ```
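  A quick way to confirm that Application Default Credentials were created is to request a token with them. This is an optional sanity check, not part of the original steps:

  ```bash
  # If ADC is configured correctly, this prints a confirmation instead of an error.
  gcloud auth application-default print-access-token > /dev/null && echo "ADC is configured"
  ```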
- Clone this repository in Cloud Shell:

  ```bash
  git clone https://github.com/yadavj2008/datamesh-on-gcp
  cd datamesh-on-gcp
  ```
- Bootstrap the environment variables. Open the bootstrap-env.sh file and change the USERNAME based on your Argolis environment. Once edited, source the bootstrap-env.sh script to populate the environment variables:

  ```bash
  source bootstrap-env.sh
  ```
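  For reference, the script exports the variables used by the later steps. The sketch below is a hypothetical reconstruction based only on the variable names used in this README; the actual values and project-naming scheme may differ, so always defer to the real bootstrap-env.sh in the repo:

  ```bash
  # Hypothetical sketch of bootstrap-env.sh -- variable names match the later
  # steps, but the project-naming scheme here is an assumption.
  export USERNAME="<your-ldap>"                              # change to match your Argolis env
  export RAND_ID=$((RANDOM % 9000 + 1000))                   # random suffix to keep names unique
  export PROJECT_DATASTO="datasto-${USERNAME}-${RAND_ID}"    # data storage project
  export PROJECT_DATAGOV="datagov-${USERNAME}-${RAND_ID}"    # data governance project
  ```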
- Make sure your admin@<ldap>.altostrat.com account has the "Organization Administrator" and "Organization Policy Administrator" roles assigned at the organization level.
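  If the roles are missing, they can be granted with gcloud. The organization ID below is a placeholder you would look up yourself with `gcloud organizations list`:

  ```bash
  # Grant the two required roles at the organization level.
  # ORG_ID is a placeholder -- find yours with: gcloud organizations list
  ORG_ID=<your-org-id>
  gcloud organizations add-iam-policy-binding ${ORG_ID} \
    --member="user:admin@<ldap>.altostrat.com" \
    --role="roles/resourcemanager.organizationAdmin"
  gcloud organizations add-iam-policy-binding ${ORG_ID} \
    --member="user:admin@<ldap>.altostrat.com" \
    --role="roles/orgpolicy.policyAdmin"
  ```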
- Deploy the demo using the deploy-helper.sh script (NOTE: all required environment variables have already been set by the bootstrap-env.sh script in the previous step):

  ```bash
  cd ~/datamesh-on-gcp/oneclick/
  echo $PROJECT_DATASTO $PROJECT_DATAGOV $USERNAME $RAND_ID
  bash deploy-helper.sh ${PROJECT_DATASTO} ${PROJECT_DATAGOV} ${USERNAME} ${RAND_ID}
  ```
- Validate that the Dataplex lakes were created with the right number of assets. Go to Dataplex, then Manage. You should see 5 lakes.
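  You can also check this from Cloud Shell. A minimal hedged one-liner, assuming the lakes live in the governance project and assuming a us-central1 region (use whichever region the deploy script targeted):

  ```bash
  # List the Dataplex lakes (project and region are assumptions, not lab-verified values).
  gcloud dataplex lakes list --project=${PROJECT_DATAGOV} --location=us-central1
  ```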
- Go to Composer, then Environments. Click on the -composer link, then click on 'Environment Variables'.
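  The same details can be pulled from the CLI if you prefer. The location below is an assumption:

  ```bash
  # List Composer environments, then inspect one's environment variables
  # (location is assumed; <env-name> is the -composer environment from the console).
  gcloud composer environments list --locations=us-central1
  gcloud composer environments describe <env-name> --location=us-central1 \
    --format="yaml(config.softwareConfig.envVariables)"
  ```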
Managing data security is the main goal of this lab. You will learn how to design and manage security policies using Dataplex's UI and REST API, and how to handle distributed data security more effectively across data domains.

Make sure you run the security lab before moving on to the other labs.
Dataplex Security Lab Instructions
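To give a flavor of the REST-API side of the lab: a Dataplex security policy is an IAM policy set on a lake, zone, or asset. A minimal hedged sketch follows; the lake name, location, and user are illustrative placeholders, not the lab's actual values:

```bash
# Grant a user the Dataplex data reader role on a lake via the REST API.
# Lake ID, location, and user are placeholders.
# NOTE: setIamPolicy overwrites the existing policy; in practice, read-modify-write.
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://dataplex.googleapis.com/v1/projects/${PROJECT_DATAGOV}/locations/us-central1/lakes/<lake-id>:setIamPolicy" \
  -d '{"policy": {"bindings": [{"role": "roles/dataplex.dataReader", "members": ["user:<someuser>@<ldap>.altostrat.com"]}]}}'
```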
In the Data Curation lab, you will discover how to leverage common Dataplex templates to curate raw data and translate it into standardized formats like Parquet and Avro. This demonstrates how domain teams can quickly process data in a serverless manner and begin consuming it for testing purposes.
Data Curation Lab Instructions
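For orientation, a curation job of this kind is typically submitted as a serverless Spark task against a lake. A hedged sketch is shown below; the lake, service account, script path, and region are illustrative placeholders, not the lab's actual values:

```bash
# Hypothetical sketch: run a serverless Spark task that converts raw data to Parquet.
gcloud dataplex tasks create curate-to-parquet \
  --project=${PROJECT_DATAGOV} \
  --location=us-central1 \
  --lake=<lake-id> \
  --trigger-type=ON_DEMAND \
  --execution-service-account=<sa>@${PROJECT_DATAGOV}.iam.gserviceaccount.com \
  --spark-python-script-file=gs://<bucket>/curate_raw_to_parquet.py
```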
In the Data Quality lab, you will learn how to define and run Data Quality jobs on raw data, evaluate and understand the DQ findings, and build a dashboard to assess and monitor DQ.
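For context, Dataplex data quality checks are rule-driven. As a plainly swapped-in illustration (the lab drives DQ through its own templates, not necessarily this mechanism), Dataplex's newer data quality scans can be created like this; the table and rules file are placeholders:

```bash
# Hypothetical sketch using Dataplex data quality scans (not the lab's exact mechanism).
gcloud dataplex datascans create data-quality my-dq-scan \
  --project=${PROJECT_DATAGOV} \
  --location=us-central1 \
  --data-source-resource="//bigquery.googleapis.com/projects/${PROJECT_DATASTO}/datasets/<dataset>/tables/<table>" \
  --data-quality-spec-file=dq_rules.yaml
```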
In this lab, you will use the DLP Data Profiler to automatically classify the BigQuery data, which Dataplex will then use to provide business tags/annotations.
Data Classification Lab Instructions
In this lab, you will learn how to use Configuration-driven Dataproc Templates to move the data (with support for incremental loads) from GCS to BigQuery, and then use BigQuery through Composer to populate the data products using conventional SQL.
Building Data Products Lab Instructions
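To see the shape of a configuration-driven template run, here is a hedged sketch of submitting the GCSTOBIGQUERY Dataproc template as a serverless batch. The jar location, bucket, dataset, and table are placeholders, and in the lab the real invocation is driven by the Composer DAGs:

```bash
# Hypothetical sketch: Dataproc serverless batch running the GCSTOBIGQUERY template.
gcloud dataproc batches submit spark \
  --project=${PROJECT_DATASTO} \
  --region=us-central1 \
  --jars="gs://<path-to>/dataproc-templates.jar" \
  --class=com.google.cloud.dataproc.templates.main.DataProcTemplate \
  -- --template GCSTOBIGQUERY \
     --templateProperty gcs.bigquery.input.location="gs://<bucket>/raw/*" \
     --templateProperty gcs.bigquery.input.format=csv \
     --templateProperty gcs.bigquery.output.dataset=<dataset> \
     --templateProperty gcs.bigquery.output.table=<table> \
     --templateProperty gcs.bigquery.temp.bucket.name=<temp-bucket>
```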
In this lab, after the Data Products have been created in the previous lab, you will learn how to create bulk tags on the Dataplex Data Product entities across domains using Composer. You will also learn how to find data using Dataplex's logical structure and business annotations. Lineage is not enabled as part of the lab at the moment, but hopefully it can be in the future. You will use a custom metadata tag library to create 4 predefined tag templates: Data Classification, Data Quality, Data Exchange, and Data Product Info (Ownership).
Business Metadata tagging and discovery in Dataplex Lab Instructions
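For a feel of what one of those tag templates looks like, here is a hedged sketch of creating a simplified "Data Product Info" template with gcloud. The template ID and field names are illustrative, not the lab's exact schema:

```bash
# Hypothetical sketch: a simplified Data Product Info tag template.
gcloud data-catalog tag-templates create data_product_info_demo \
  --project=${PROJECT_DATAGOV} \
  --location=us-central1 \
  --display-name="Data Product Info (demo)" \
  --field=id=data_product_owner,display-name="Data Product Owner",type=string,required=TRUE \
  --field=id=data_domain,display-name="Data Domain",type=string
```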
- Create HMS (Hive Metastore) and attach it to the lake. Follow the instructions here (a minimal CLI sketch follows this list).
- Create multiple personas/roles in Cloud Identity and experiment with the security policies
- Get creative and share your ideas
- Don't forget the post-survey and feedback
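A minimal sketch of the HMS idea from the first item above, with placeholder names and an assumed region (the linked instructions are authoritative):

```bash
# Hypothetical sketch: create a Dataproc Metastore service and attach it to a lake.
gcloud metastore services create demo-hms \
  --project=${PROJECT_DATAGOV} \
  --location=us-central1 \
  --tier=DEVELOPER \
  --hive-metastore-version=3.1.2

# Attach the service to an existing lake (lake ID is a placeholder).
gcloud dataplex lakes update <lake-id> \
  --project=${PROJECT_DATAGOV} \
  --location=us-central1 \
  --metastore-service="projects/${PROJECT_DATAGOV}/locations/us-central1/services/demo-hms"
```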
Please make sure you clean up your environment:

```bash
# Remove lien, if any
gcloud alpha resource-manager liens list --project ${PROJECT_DATAGOV}
gcloud alpha resource-manager liens delete <your lien-id from previous step> --project ${PROJECT_DATAGOV}

# Delete both demo projects
gcloud projects delete ${PROJECT_DATAGOV}
gcloud projects delete ${PROJECT_DATASTO}
```
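If you want to confirm the deletions went through, describing a deleted project should report its lifecycle state as DELETE_REQUESTED:

```bash
# Optional check: a successfully deleted project reports DELETE_REQUESTED.
gcloud projects describe ${PROJECT_DATAGOV} --format="value(lifecycleState)"
```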