Dataplex data profiling lets you identify common statistical characteristics of the columns of your BigQuery tables. This information helps data consumers understand their data better, which makes it possible to analyze data more effectively. Dataplex also uses this information to recommend rules for data quality.
- Auto Data Profiling
- User Configured Data Profiling
User Configured Data Profiling
- This feature is currently supported only for BigQuery tables.
- Data profiling runs on Google-managed compute, so you don't need to plan for or handle any infrastructure.
# | Step |
---|---|
1 | A User Managed Service Account is needed with roles/dataplex.dataScanAdmin to run the profiling job |
2 | A scan profile needs to be created against a table |
3 | In the scan profile creation step, you can select a full scan or an incremental scan |
4 | In the scan profile creation step, you can configure profiling to run on a schedule or on demand |
5 | Profiling results are visually displayed |
6 | Configure RBAC for running scan versus viewing results |
- roles/dataplex.dataScanAdmin: Full access to DataScan resources.
- roles/dataplex.dataScanEditor: Write access to DataScan resources.
- roles/dataplex.dataScanViewer: Read access to DataScan resources, excluding the results.
- roles/dataplex.dataScanDataViewer: Read access to DataScan resources, including the results.
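The split between viewing scan definitions and viewing scan results could be set up roughly as follows. This is a minimal sketch: both principals are hypothetical placeholders, and the script only prints the `gcloud` commands so they can be reviewed before being run.

```shell
# Hypothetical principals for illustration only; replace with real identities.
PROJECT_ID="${PROJECT_ID:-my-project}"
METADATA_VIEWER="user:steward@example.com"   # should see scan definitions, not results
RESULTS_VIEWER="user:analyst@example.com"    # should also see profiling results

# Format one project-level IAM binding command.
bind_cmd() {
  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
    "$PROJECT_ID" "$1" "$2"
}

METADATA_CMD=$(bind_cmd "$METADATA_VIEWER" "roles/dataplex.dataScanViewer")
RESULTS_CMD=$(bind_cmd "$RESULTS_VIEWER" "roles/dataplex.dataScanDataViewer")

# Printed for review; run the lines yourself once the principals are correct.
echo "$METADATA_CMD"
echo "$RESULTS_CMD"
```

Granting `dataScanViewer` rather than `dataScanDataViewer` keeps profiling results, which can reveal value distributions, away from users who only need to see that a scan exists.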
At the time of authoring this lab, data profiling is supported through the Console and the REST API only.
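As a rough illustration of the REST path, the sketch below assembles a create request for an on-demand, full-table profile scan. The region, dataset, table, and scan ID are hypothetical placeholders, and the body is a minimal shape based on the public `DataScan` resource, so check the field names against the current API reference before relying on it. The script only prints the request; the `curl` call is left commented out.

```shell
# Hypothetical placeholders: adjust to your project, region, and table.
PROJECT_ID="${PROJECT_ID:-my-project}"
LOCATION="us-central1"
DATASET="my_dataset"
TABLE="my_table"
SCAN_ID="my-profile-scan"

# Endpoint for creating a DataScan; the scan ID is passed as a query parameter.
URL="https://dataplex.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/dataScans?dataScanId=${SCAN_ID}"

# Minimal body: the BigQuery table to profile, an empty profile spec
# (service defaults), and an on-demand trigger.
BODY=$(cat <<EOF
{
  "data": {
    "resource": "//bigquery.googleapis.com/projects/${PROJECT_ID}/datasets/${DATASET}/tables/${TABLE}"
  },
  "dataProfileSpec": {},
  "executionSpec": { "trigger": { "onDemand": {} } }
}
EOF
)

# Print the request for review.
echo "POST ${URL}"
echo "${BODY}"

# To send the request (requires gcloud credentials):
# curl -X POST "${URL}" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${BODY}"
```

For a scheduled scan, the trigger would presumably carry a `schedule` object with a `cron` expression in place of `onDemand`; treat that as an assumption to verify against the API reference.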
The User Managed Service Account customer-sa@ needs privileges to create and run profiling scans. From a Cloud Shell session scoped to your project, run the following commands:
PROJECT_ID=$(gcloud config list --format "value(core.project)" 2>/dev/null)
CUSTOMER_UMSA_FQN="customer-sa@${PROJECT_ID}.iam.gserviceaccount.com"
gcloud projects add-iam-policy-binding "$PROJECT_ID" --member="serviceAccount:$CUSTOMER_UMSA_FQN" \
--role="roles/dataplex.dataScanAdmin"
Note how you cannot switch to incremental mode.
Create a partitioned BigQuery table, run profiling in incremental mode, add some data, then run profiling again and observe the results.
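One possible starting point for this exercise is sketched below. The dataset, table, and column names are hypothetical, and the `executionSpec` fragment assumes that an incremental scan names its tracking column in the `field` property of the `DataScan` resource (a DATE or TIMESTAMP column). The script prints the `bq` commands and the spec fragment rather than running anything.

```shell
# Hypothetical names for illustration only.
DATASET="my_dataset"
TABLE="events_partitioned"

# DDL for a table partitioned by a DATE column.
CREATE_SQL="CREATE TABLE ${DATASET}.${TABLE} (event_date DATE, amount INT64) PARTITION BY event_date"

# Rows added after the first profiling run land in today's partition.
INSERT_SQL="INSERT INTO ${DATASET}.${TABLE} (event_date, amount) VALUES (CURRENT_DATE(), 10)"

# Assumed shape: an incremental scan tracks the partitioning column via "field".
EXECUTION_SPEC='{ "trigger": { "onDemand": {} }, "field": "event_date" }'

# Printed for review; run each bq command yourself in Cloud Shell.
printf "bq query --use_legacy_sql=false '%s'\n" "$CREATE_SQL"
printf "bq query --use_legacy_sql=false '%s'\n" "$INSERT_SQL"
echo "executionSpec: ${EXECUTION_SPEC}"
```

Run the create statement, profile the table once, run the insert statement, and then profile again to compare the two sets of results.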
This concludes the lab module. Proceed to the main menu.