** THIS REPO HAS BEEN DEPRECATED. PLEASE USE THE GOOGLE CLOUD OFFICIAL VERSION - Official link **
-
Navigate to the Console in incognito mode. Ensure that you are logged in as admin@<ldap>.altostrat.com
-
Open Cloud Shell while logged in as admin@<ldap>.altostrat.com.
-
Clone this repository in Cloud Shell
git clone https://github.com/mansim07/datamesh-on-gcp
-
Set up the environment variables.
Run the RAND command below only once and capture the value it prints
echo $(((RND=RANDOM<<15|RANDOM)))
Replace the placeholder values before you execute the two commands below
echo "export RAND_ID=replace-value-from-above" >> ~/.profile
echo "export USERNAME=replace-with-your-username" >> ~/.profile
Copy and execute the below commands. No changes are needed.
source ~/.profile
echo "export PROJECT_DATAGOV=mbdatagov-${RAND_ID}" >> ~/.profile
echo "export PROJECT_DATASTO=mbdatastore-${RAND_ID}" >> ~/.profile
echo "export ORG_ID=$(gcloud organizations list --filter="displayName~${USERNAME}" --format='value(name)')" >> ~/.profile
echo "export BILLING_ID=$(gcloud beta billing accounts list --filter="displayName~${USERNAME}" --format='value(name)')" >> ~/.profile
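Running RAND only once matters because every run prints a different value, and a second `export RAND_ID=...` line in `~/.profile` would silently change the project names. A minimal sketch of a guard, assuming a hypothetical helper named `ensure_rand_id` (not part of this repo) that takes the profile path as its argument:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): append RAND_ID to a profile
# file only if that file does not already define it.
ensure_rand_id() {
  local profile="$1"
  if grep -q '^export RAND_ID=' "$profile" 2>/dev/null; then
    return 0  # already captured; keep the existing value
  fi
  # RANDOM is a 15-bit value (0..32767). Shifting one draw left by 15 bits
  # and OR-ing in a second draw yields a 30-bit number.
  local id=$(( RANDOM << 15 | RANDOM ))
  echo "export RAND_ID=${id}" >> "$profile"
}
```

It could be called as `ensure_rand_id ~/.profile` followed by `source ~/.profile`; calling it again leaves the stored value untouched.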
-
Validate the environment variables
cat ~/.profile
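Reading the file by eye can miss an empty `ORG_ID` or `BILLING_ID`, which is what you get when the `gcloud` filter matches nothing. A small sketch of an automated check, assuming a hypothetical function name `check_lab_env` and the six variable names used in the steps above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every required variable that is
# unset or empty, and return non-zero if at least one is missing.
check_lab_env() {
  local missing=0 name value
  for name in RAND_ID USERNAME PROJECT_DATAGOV PROJECT_DATASTO ORG_ID BILLING_ID; do
    # Look up the variable whose name is stored in $name.
    # eval is safe here because the names come from the fixed list above.
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "$missing"
}
```

After `source ~/.profile`, running `check_lab_env` prints nothing and returns 0 when every variable has a value.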
-
Create two new projects with the assigned billing account using the below commands:
-
Create the projects
source ~/.profile
gcloud projects create ${PROJECT_DATAGOV} \
--organization=${ORG_ID}
gcloud projects create ${PROJECT_DATASTO} \
--organization=${ORG_ID}
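Project creation fails if the ID is malformed, for instance `mbdatagov-` with a trailing hyphen when `RAND_ID` is empty. The sketch below checks an ID before any `gcloud` call, assuming the usual project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The function name `is_valid_project_id` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeed only if the argument looks like a valid
# project ID: 1 leading letter + 4..28 middle characters + 1 trailing
# letter or digit, i.e. 6..30 characters in total.
is_valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}
```

For example, `is_valid_project_id "${PROJECT_DATAGOV}" || echo "bad project ID"` would flag a missing `RAND_ID` before any project is created.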
-
Associate the project with the billing ID.
gcloud beta billing projects link ${PROJECT_DATAGOV} \
--billing-account=${BILLING_ID}
gcloud beta billing projects link ${PROJECT_DATASTO} \
--billing-account=${BILLING_ID}
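To confirm that the link took effect, one option is to read the project's billing info back. This is a sketch under assumptions: the function name `billing_enabled` is hypothetical, and it assumes that `gcloud beta billing projects describe` with `--format='value(billingEnabled)'` prints `True` for a linked project (the comparison is case-insensitive to hedge against that):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if billing is reported as enabled for the
# given project ID.
billing_enabled() {
  local enabled
  # Assumption: the command prints "True" or "False" for billingEnabled.
  enabled=$(gcloud beta billing projects describe "$1" \
    --format='value(billingEnabled)' | tr '[:upper:]' '[:lower:]')
  [ "$enabled" = "true" ]
}
```

It could be used as `billing_enabled "${PROJECT_DATAGOV}" && billing_enabled "${PROJECT_DATASTO}"` before moving on to the Terraform step.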
-
Install necessary python libraries
pip3 install google-cloud-storage
pip3 install numpy
pip3 install faker_credit_score
-
Make sure your admin@<ldap>.altostrat.com account has the "Organization Administrator" and "Organization Policy Administrator" roles assigned at the Organization Level.
-
Use Terraform to set up the rest of the environment
Optional - Use Terraform Setup Instructions
cd ~/datamesh-on-gcp/oneclick/
source ~/.profile
bash deploy-helper.sh ${PROJECT_DATASTO} ${PROJECT_DATAGOV} ${USERNAME} ${RAND_ID}
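Because the variables are passed unquoted, an empty one disappears from the argument list and the remaining values shift position. A minimal guard, assuming the script expects exactly the four positional arguments shown in the invocation (the function name `has_four_args` is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only when exactly four non-empty arguments
# are supplied.
has_four_args() {
  [ "$#" -eq 4 ] || return 1
  local arg
  for arg in "$@"; do
    [ -n "$arg" ] || return 1
  done
}
```

A possible use is `has_four_args ${PROJECT_DATASTO} ${PROJECT_DATAGOV} ${USERNAME} ${RAND_ID} || echo "check ~/.profile"` before running the deploy script.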
-
Validate that the Dataplex lakes were created with the right number of assets. Go to Dataplex… then Manage… You should see 5 lakes, as shown below
-
Go to Composer… then Environments… Click on the -composer link, then click on 'Environment Variables'
The main goal of this lab is managing data security. You will learn how to design and manage security policies using Dataplex's UI and REST API, and how to handle distributed data security more effectively across data domains.
Make sure you run the security lab before moving on to the other labs.
Dataplex Security Lab Instructions
In the Data Curation lane, you will learn how to use common Dataplex templates to curate raw data and convert it into standardized formats such as Parquet and Avro. This shows how domain teams can quickly process data in a serverless manner and begin consuming it for testing purposes.
Data Curation Lab Instructions
You will learn how to define and perform Data Quality jobs on raw data in the Data Quality lab, evaluate and understand the DQ findings, and construct a dashboard to assess and monitor DQ.
In this lab, you will use DLP Data Profiler to automatically classify the BQ data. Dataplex then uses that classification to provide business tags/annotations.
Data Classification Lab Instructions
In this lab, you will first use configuration-driven Dataproc Templates to migrate the data from GCS to BQ (incremental loads are supported). You will then use BigQuery through Composer to populate the data products using conventional SQL.
Building Data Products Lab Instructions
In this lab, which follows on from the Data Products created in the lab above, you will learn how to use Composer to create bulk tags on the Dataplex Data Product entity across domains. You will also learn how to find data using the logical structure and business annotations of Dataplex. You will use a custom metadata tag library to create 4 predefined tag templates: Data Classification, Data Quality, Data Exchange, and Data Product Info (Ownership). Lineage is not enabled as part of the lab at the moment, but it may be added in the future.
Business Metadata tagging and discovery in Dataplex Lab Instructions
- Create HMS and attach it to the lake. Follow the instructions here
- Create multiple personas/roles in Cloud Identity and play around with the security policies
- Get creative and share ideas
- Don't forget the post-survey and feedback
Please make sure you clean up your environment
#Remove lien if any
gcloud alpha resource-manager liens list --project ${PROJECT_DATAGOV}
gcloud alpha resource-manager liens delete <your lien-id from previous step> --project ${PROJECT_DATAGOV}
gcloud projects delete ${PROJECT_DATAGOV}
gcloud projects delete ${PROJECT_DATASTO}
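The lien step above requires copying each lien ID by hand. The sketch below loops over the list instead. It rests on assumptions: the function name `delete_project_liens` is hypothetical, and `liens list` with `--format='value(name)'` is assumed to print one lien name per line, possibly prefixed with `liens/` (the prefix is stripped either way):

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every lien attached to the given project.
delete_project_liens() {
  local project="$1" name
  gcloud alpha resource-manager liens list --project "$project" \
    --format='value(name)' |
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    # Strip an optional "liens/" prefix so only the lien ID is passed.
    # stdin is redirected so the delete call cannot consume the list.
    gcloud alpha resource-manager liens delete "${name#liens/}" \
      --project "$project" </dev/null
  done
}
```

Running `delete_project_liens "${PROJECT_DATAGOV}"` before the `gcloud projects delete` commands would replace the manual list-and-delete pair.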