Providing notebooks and code for synthetic data generation: Update tutorials new data #233

Merged: 2 commits, Sep 6, 2024

@komashk (Contributor) commented Aug 23, 2024

Description of changes:

Two AWS blogs describing PyDeequ capabilities have been updated to use synthetic data instead of the original Amazon Reviews dataset:

Testing data quality at scale with PyDeequ (updates are public), along with its tutorial.
Monitor data quality in your data lake using PyDeequ and AWS Glue (blog reviews are complete; awaiting publication).

A new folder, ./tutorials/synthetic_data, hosts the Jupyter notebooks that describe data generation for the blogs mentioned above (datasets for products in the Electronics and Jewelry categories) and for 18 other product categories.

The synthetic data is hosted publicly in s3://aws-bigdata-blog/generated_synthetic_reviews/data/.
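
For anyone who wants to browse that location programmatically, the following is a minimal sketch rather than anything shipped in this PR. It assumes `boto3` is installed and that the bucket permits anonymous (unsigned) listing; both helper names are hypothetical, and the layout beneath `data/` is not stated here, so the code only lists whatever top-level prefixes exist.

```python
from urllib.parse import urlparse

# Location quoted in the PR description.
SYNTHETIC_DATA_URI = "s3://aws-bigdata-blog/generated_synthetic_reviews/data/"


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_top_level_prefixes(uri=SYNTHETIC_DATA_URI):
    """Return the immediate sub-prefixes under the URI.

    Assumption: the bucket allows unsigned ListObjectsV2 requests.
    boto3 is imported lazily so split_s3_uri works without it.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, prefix = split_s3_uri(uri)
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    return [entry["Prefix"] for entry in response.get("CommonPrefixes", [])]
```

Calling `list_top_level_prefixes()` needs network access; if anonymous listing is not allowed, the same call with regular AWS credentials (dropping the `UNSIGNED` config) would be the fallback.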

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@komashk force-pushed the update-tutorials-new-data branch from 81af344 to 728b9d1 on September 5, 2024 23:44
@komashk requested a review from chenliu0831 on September 5, 2024 23:46
@chenliu0831 merged commit 48ed442 into awslabs:master on Sep 6, 2024
4 checks passed