Providing notebooks and code for synthetic data generation: Update tutorials new data #233

komashk · 2024-08-23T22:31:49Z

Description of changes:

Two AWS blogs describing PyDeequ capabilities have been updated to use synthetic data instead of the original Amazon Reviews dataset:

- Testing data quality at scale with PyDeequ (updates are public), along with its tutorial.
- Monitor data quality in your data lake using PyDeequ and AWS Glue (blog reviews have been completed, awaiting publication).

Created a new folder, ./tutorials/synthetic_data, to host Jupyter notebooks that describe data generation for the blogs mentioned above (datasets for products in the Electronics and Jewelry categories) and for 18 other product categories.

The synthetic data is hosted publicly in s3://aws-bigdata-blog/generated_synthetic_reviews/data/.
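
For readers who want a feel for what generating synthetic review data can look like, here is a minimal sketch using only the Python standard library. It is not the code from the notebooks in this PR. The column names (`review_id`, `product_category`, `star_rating`, `helpful_votes`, `review_date`), the `generate_reviews` and `completeness` helpers, and the `null_rate` parameter are all hypothetical and chosen for illustration only. A small share of dates is left empty so that a data quality check has something to detect.

```python
import random
from datetime import date, timedelta


def generate_reviews(category, n, seed=0, null_rate=0.05):
    """Return n synthetic review records for one product category.

    The schema is hypothetical and is not taken from the PR's notebooks.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    start = date(2020, 1, 1)
    rows = []
    for i in range(n):
        # Occasionally leave the date empty to simulate incomplete data.
        if rng.random() < null_rate:
            review_date = None
        else:
            review_date = (start + timedelta(days=rng.randrange(365))).isoformat()
        rows.append({
            "review_id": f"{category[:3].upper()}-{i:06d}",
            "product_category": category,
            "star_rating": rng.randint(1, 5),    # inclusive bounds
            "helpful_votes": rng.randint(0, 50),
            "review_date": review_date,
        })
    return rows


def completeness(rows, column):
    """Fraction of rows whose value in `column` is not None."""
    return sum(r[column] is not None for r in rows) / len(rows)


reviews = generate_reviews("Electronics", 1000, seed=42)
print(reviews[0])
print("review_date completeness:", completeness(reviews, "review_date"))
```

Seeding a local `random.Random` instance, rather than the module-level generator, keeps each category's output reproducible without affecting other code that uses `random`.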

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


review-notebook-app · 2024-08-23T22:31:54Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

tutorials/synthetic_data/01-synthetic-data-electronics.ipynb

…hetic data used for 2 AWS blogs on PyDeequ

updated the notebooks to use a new synthetic data for demonstration; addressed PR comments awslabs#208 and awslabs#230

9672b87

chenliu0831 reviewed Aug 29, 2024


tutorials/synthetic_data/01-synthetic-data-electronics.ipynb (resolved)

tutorials/synthetic_data/01-synthetic-data-electronics.ipynb (resolved)

added notebooks and python module that outline generation of the synthetic data used for 2 AWS blogs on PyDeequ

728b9d1

komashk force-pushed the update-tutorials-new-data branch from 81af344 to 728b9d1 on September 5, 2024 23:44

komashk requested a review from chenliu0831 September 5, 2024 23:46

chenliu0831 approved these changes Sep 6, 2024


chenliu0831 merged commit 48ed442 into awslabs:master Sep 6, 2024
4 checks passed
