Releases: hitsz-ids/synthetic-data-generator
0.2.4
What's Changed
- Docs - Update README News and fix the link of python package badge. by @jalr4ever in #243
- Bugfix - Datetime formatter in small datasets and improved performance by @cyantangerine in #244
- Bugfix - Fixed numeric inspector error for int32/float32 types, by @cyantangerine in #247
- Feature - Support more encoders, NormalizedFrequencyEncoder and NormalizedLabelEncoder, by @cyantangerine in #247
- Feature - Integrate GaussianCopula model into the Synthesizer by @jalr4ever in #241
- Feature - Support DataFrameConnector for in-memory dataset processing by @cyantangerine in #247
- Feature - Support NormalizedFrequencyEncoder and NormalizedLabelEncoder for categorical encoding by @cyantangerine in #247
- Enhancement - Support CTGAN sample with drop_more parameter for better generation efficiency by @cyantangerine in #247
- Enhancement - Improved Disk_cache performance by avoiding pd iterative connections by @cyantangerine in #247
Full Changelog: 0.2.3...0.2.4
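The notes for 0.2.4 name NormalizedFrequencyEncoder and NormalizedLabelEncoder but do not say how they normalize. The sketch below shows one common interpretation, assuming label codes are scaled into [0, 1] by their sorted position and frequency encoding maps each category to its relative frequency. The function names are hypothetical and are not the sdgx API.

```python
from collections import Counter


def normalized_label_encode(values):
    """Map each category to its sorted-label index scaled into [0, 1]."""
    categories = sorted(set(values))  # assumes categories are mutually comparable
    n = len(categories)
    scale = (n - 1) if n > 1 else 1  # a single category maps to 0.0
    mapping = {cat: i / scale for i, cat in enumerate(categories)}
    return [mapping[v] for v in values], categories


def normalized_label_decode(encoded, categories):
    """Invert the label encoding by rounding to the nearest label index."""
    n = len(categories)
    scale = (n - 1) if n > 1 else 1
    decoded = []
    for x in encoded:
        # Clamp so out-of-range model outputs still decode to a valid label.
        idx = min(max(round(x * scale), 0), n - 1)
        decoded.append(categories[idx])
    return decoded


def normalized_frequency_encode(values):
    """Map each category to its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


codes, cats = normalized_label_encode(["b", "a", "c", "a"])
print(codes, cats)  # [0.5, 0.0, 1.0, 0.0] ['a', 'b', 'c']
print(normalized_frequency_encode(["b", "a", "c", "a"]))  # [0.25, 0.5, 0.25, 0.5]
```

Label encoding is invertible, which suits generative models that must decode their output back to categories; a plain frequency encoding is not invertible when two categories share a frequency, so a real encoder would need extra bookkeeping.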
0.2.3
What's Changed
- Enhance - Handling Fixed Column Relationships using FixedCombinationInspector and FixedCombinationTransformer by @MooooCat @jalr4ever in #219
- BugFix - Fix the type error in the query function of Metadata by @jalr4ever in #235
- Enhance - Handling fixed column relationships by specific_combinations and SpecificCombinationTransformer by @jalr4ever in #236
- chore: Drop Python 3.8 support and improve CI file name by @Wh1isper in #237
Full Changelog: 0.2.2...0.2.3
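0.2.3 adds FixedCombinationInspector and FixedCombinationTransformer for fixed column relationships. As an illustration only, the sketch below detects one kind of fixed relationship: ordered column pairs (a, b) where every value of a always appears with the same value of b. The function name and the pairwise rule are assumptions, not the sdgx implementation.

```python
from itertools import permutations


def find_fixed_combinations(rows, columns):
    """Return (a, b) pairs where each value of column a determines column b."""
    fixed = []
    for a, b in permutations(columns, 2):
        seen = {}
        determined = True
        for row in rows:
            key, value = row[a], row[b]
            # setdefault stores the first b seen for this a; a later mismatch breaks the rule.
            if seen.setdefault(key, value) != value:
                determined = False
                break
        if determined:
            fixed.append((a, b))
    return fixed


rows = [
    {"code": "BJ", "city": "Beijing", "amount": 10},
    {"code": "SZ", "city": "Shenzhen", "amount": 10},
    {"code": "BJ", "city": "Beijing", "amount": 7},
]
print(find_fixed_combinations(rows, ["code", "city", "amount"]))
# [('code', 'city'), ('city', 'code')]
```

A constant column is trivially determined by every other column, so a practical inspector would probably filter those out (0.2.1 already handles constant columns separately) before treating a pair as a fixed combination.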
0.2.2
What's Changed
- Feature: Add progress bar for CTGAN when fitting and sampling by @cyantangerine in #228
- Enhance: Check the type of foreign key by @Z712023 in #229
- BugFix: Parallel Data Processing by @cyantangerine in #227
- Enhance: Improved CONTRIBUTING Docs with 4+1 view and Overview Diagram by @jalr4ever in #226
- BugFix: Regulate positive-negative values in the generated data by @jalr4ever in #232
- Enhance: Tenfold performance boost by reducing the memory usage of Gaussian Copula training by @jalr4ever in #233
New Contributors
- @cyantangerine made their first contribution in #228
Full Changelog: 0.2.1...0.2.2
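0.2.2 regulates positive and negative values in generated data (#232), building on the positive/negative filtering introduced in 0.2.1 (#217). A minimal sketch of the idea, assuming rows are dicts and that synthetic rows violating a sign observed in the real data are dropped rather than clipped; the function names are hypothetical.

```python
def infer_sign_constraints(rows, numeric_columns):
    """Record which numeric columns were never negative or never positive in real data."""
    constraints = {}
    for col in numeric_columns:
        values = [row[col] for row in rows]
        if all(v >= 0 for v in values):
            constraints[col] = "non-negative"
        elif all(v <= 0 for v in values):
            constraints[col] = "non-positive"
        # Columns with mixed signs get no constraint.
    return constraints


def filter_by_sign(rows, constraints):
    """Keep only synthetic rows that respect every recorded sign constraint."""
    def respects(row):
        for col, rule in constraints.items():
            if rule == "non-negative" and row[col] < 0:
                return False
            if rule == "non-positive" and row[col] > 0:
                return False
        return True

    return [row for row in rows if respects(row)]


real = [{"price": 3.0, "delta": -1.0}, {"price": 0.0, "delta": 2.0}]
constraints = infer_sign_constraints(real, ["price", "delta"])
print(constraints)  # {'price': 'non-negative'}
synthetic = [{"price": 1.5, "delta": -4.0}, {"price": -0.2, "delta": 1.0}]
print(filter_by_sign(synthetic, constraints))  # [{'price': 1.5, 'delta': -4.0}]
```

Filtering keeps every surviving row as a genuine model sample, at the cost of generating extra rows to reach the requested count; clipping to zero would instead keep the count fixed but pile probability mass onto 0.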
0.2.1
What's Changed
- Add CHN address inspector by @MooooCat in #158
- Update inspector part in Doc(API Reference) by @MooooCat in #159
- Add dotenv in single-table gpt model by @MooooCat in #161
- Speed up regex inspector, Add chn/eng name inspectors by @MooooCat in #162
- Add single table metadata example by @MooooCat in #166
- bugfix: SingleTableGPTModel._sample_with_data "has no attribute 'result'" by @aaronrmm in #174
- Change Metadata.column_list from Set to List by @MooooCat in #176
- Remove unnecessary dependency torchvision by @Guo-Yunzhe in #177
- Update pyproject.toml (joblib version) by @MooooCat in #175
- Bugfix: fix Gaussian Copula segmentation fault error by @MooooCat in #180
- Bugfix: fix division by zero error in numeric inspector, add comments by @MooooCat in #181
- Intro data processor in sdgx by @MooooCat in #171
- Intro data processor in Readme by @MooooCat in #182
- Fix View GFI Link in Readme by @MooooCat in #183
- Fix precision problem in metric's testcases by @MooooCat in #185
- Use GLM-4 by @TracyWang95 in #188
- Pin numpy<2 by @Wh1isper in #190
- Feature: Add Email Generator (a new type of sdgx.data_processor) by @MooooCat in #184
- Add ChnPiiGenerator and Enhance Models by @MooooCat in #191
- Update documentation and docstrings for DataProcessors by @MooooCat in #186
- Add live QR code by @MooooCat in #198
- Enhance Data Handling with Empty Column Inspector and Transformer by @MooooCat in #197
- Update NonValueTransformer's Default Setting and Handle Custom Fill Values by @MooooCat in #199
- Enhance Chinese Name Inspector by @MooooCat in #200
- Add Chinese Company Name Support and Inspector by @MooooCat in #201
- Update Live QR Code Image by @MooooCat in #203
- BugFix: base_url not included in requests to GPT in SingleTableGPTModel by @jalr4ever in #205
- Enhance: Fix Data Quality with Outlier Handling and Improved Missing Value Treatment by @MooooCat in #207
- Typo Fix: Unified Logger Usage by @MooooCat in #209
- Update Live QR Code Image 0730 by @MooooCat in #210
- Bugfix: Update Fit Methods in Data Processors by @MooooCat in #211
- Add ConstInspector and ConstValueTransformer for Handling Constant Columns by @MooooCat in #202
- Enhance: Add NonValueTransformer Reverse Conversion with NAN_VALUE Replacement by @MooooCat in #212
- Maintenance: Update CTGAN Example to Use Latest SDG by @MooooCat in #213
- Fix Minor Typo by @MooooCat in #216
- Enhance Numeric Data Inspection and Introduce Positive/Negative Filtering by @MooooCat in #217
- Fix Division by Zero Error in Numeric Column Inspection by @MooooCat in #220
New Contributors
- @aaronrmm made their first contribution in #174
- @Guo-Yunzhe made their first contribution in #177
- @TracyWang95 made their first contribution in #188
- @jalr4ever made their first contribution in #205
Full Changelog: 0.2.0...0.2.1
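0.2.1 adds an Empty Column Inspector (#197) and ConstInspector with ConstValueTransformer (#202). The sketch below shows how such columns might be detected, assuming columns are passed as a dict of lists and that missing values (None or NaN) are ignored when deciding whether a column is constant. The function names and that rule are assumptions, not the sdgx implementation.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def inspect_columns(columns):
    """Split column names into empty (all missing) and constant (one distinct value)."""
    empty, constant = [], []
    for name, values in columns.items():
        present = [v for v in values if not is_missing(v)]
        if not present:
            empty.append(name)
        elif len(set(present)) == 1:
            constant.append(name)
    return empty, constant


columns = {"a": [None, float("nan")], "b": [1, 1, None], "c": [1, 2, 3]}
print(inspect_columns(columns))  # (['a'], ['b'])
```

A transformer in this style could drop such columns before training and restore them after sampling (refilling the constant value, or leaving the column empty), so the model never spends capacity learning them.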
0.2.0
What's Changed
LLM-Based SingleTable Model
A single-table data synthesis model based on LLMs is included; view the Colab example:
Commits:
- Introduce LLM-based single-table model. by @MooooCat in #129
- Bugfix: fix model type typo by @MooooCat in #144
- Bugfix: fix return datatype in _sample_with_metadata by @MooooCat in #145
- Bugfix: fix LLM result typo by @MooooCat in #146
Improvements on Inspectors
- Add Regex Inspector and Email Inspector example. by @MooooCat in #115
- Implement datetime_formats in DatetimeInspector by @Femi-lawal in #125
- Distinguish int/float in NumericInspector by @MooooCat in #133
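The Regex Inspector and Email Inspector example (#115) suggest matching column values against a pattern. A minimal sketch, assuming a column is tagged when at least a hypothetical 90% of its non-missing values fully match; the pattern is deliberately simple and is not a complete email grammar, and the function names are not the sdgx API.

```python
import re

# Simplified email pattern for illustration only; not RFC 5322 complete.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def match_ratio(values, pattern):
    """Fraction of non-missing values that fully match the pattern."""
    present = [str(v) for v in values if v is not None]
    if not present:
        return 0.0
    hits = sum(1 for v in present if pattern.fullmatch(v))
    return hits / len(present)


def regex_inspect(columns, pattern, threshold=0.9):
    """Return names of columns whose match ratio reaches the threshold."""
    return [name for name, values in columns.items()
            if match_ratio(values, pattern) >= threshold]


columns = {
    "contact": ["alice@example.com", "bob.smith@mail.example.org", None],
    "note": ["hi", "a@b", None],
}
print(regex_inspect(columns, EMAIL_PATTERN))  # ['contact']
```

Using a ratio rather than requiring every value to match tolerates a few malformed entries in otherwise clean real data.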
Metadata
- Bugfix: fix KeyError when metadata raises a MetadataInvalidError. by @MooooCat in #134
- Add dict support on metadata, optimize datetime format judgment rules, add eq for combiner by @MooooCat in #135
Python 3.12 Support
Readme and Docs
- Update README.md by @iokk3732 in #123
- docs: add iokk3732 as a contributor for code by @allcontributors in #127
- docs: add Femi-lawal as a contributor for code by @allcontributors in #128
- Add language switch on Readme.md by @MooooCat in #130
- Minor modifications on readme by @MooooCat in #131
- Update SDG Readme by @MooooCat in #139
- Update doc readme by @MooooCat in #140
- Add Colab Examples, Update Readme by @MooooCat in #147
- Update readme.md by @MooooCat in #150
- Add ctgan description on Readme.md by @MooooCat in #151
Others
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #124
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #138
New Contributors
- @iokk3732 made their first contribution in #123
- @Femi-lawal made their first contribution in #125
Full Changelog: 0.1.5...0.2.0
0.1.5
What's Changed
- docs: add Z712023 as a contributor for code by @allcontributors in #112
- Bugfix metric mutual information by @Z712023 in #118
- [Bugfix] Temporarily modify single table demo data link by @MooooCat in #121
- Introduce inspect_level in inspector and metadata by @MooooCat in #113
- Add star history chart in README by @Wh1isper in #122
New Contributors
Full Changelog: 0.1.4...0.1.5
0.1.4
What's Changed
- [Bugfix] Add future annotations by @MooooCat in #106
- Add testing for JSD metrics by @sjh120 in #100
- Add base model for multi-table statistic model, change single-table base class location by @MooooCat in #102
- Add mutual information metric by @Z712023 in #101
Full Changelog: 0.1.3...0.1.4
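0.1.4 adds tests for JSD metrics (#100) and a mutual information metric (#101). For reference, the sketch below computes both for categorical columns using base-2 logarithms, so JSD lies in [0, 1] and mutual information is in bits. The function names are hypothetical, and sdgx's metrics may discretize, normalize, or weight columns differently.

```python
import math
from collections import Counter


def _distribution(values):
    """Empirical probabilities of each category (assumes a non-empty sample)."""
    counts = Counter(values)
    total = len(values)
    return {k: c / total for k, c in counts.items()}


def js_divergence(real, synthetic):
    """Jensen-Shannon divergence (base 2) between two categorical samples."""
    p, q = _distribution(real), _distribution(synthetic)
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(a):
        # Only observed categories appear in a, and m[k] > 0 wherever a[k] > 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def mutual_information(xs, ys):
    """Mutual information (bits) between two paired categorical columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


print(js_divergence(["a"], ["b"]))  # 1.0
print(mutual_information(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0
```

A JSD of 0 means the real and synthetic marginals coincide; comparing mutual information between column pairs in real and synthetic data indicates whether dependencies between columns were preserved.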
0.1.3
What's Changed
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #87
- [0.2.0] Metadata Implementation by @MooooCat in #81
- Patch on multi table combiner and test case by @Wh1isper in #89
- Fix typo _dumo_json by @Wh1isper in #90
- Intro dummy table for speedup models case by @Wh1isper in #92
- Intro torchrun in CLI by @Wh1isper in #88
- Implement MetadataCombiner, partial refactoring on Metadata by @Wh1isper in #96
- Add mock data and testing for multi tables' related imp by @Wh1isper in #97
- Intro SubsetRelationshipInspector by @Wh1isper in #99
- Add demo data for multi-table scenario by @MooooCat in #98
Full Changelog: 0.1.2...0.1.3
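0.1.3 introduces SubsetRelationshipInspector (#99) for multi-table data. A plausible reading is that a column whose values are a subset of another table's column is a candidate foreign key. The sketch below implements only that subset test, assuming tables are given as dicts of column lists; the function name is hypothetical and this is not the sdgx implementation.

```python
def find_subset_relationships(tables):
    """Return (child_table, child_col, parent_table, parent_col) candidates
    where every non-missing child value also appears in the parent column."""
    column_values = {
        (table, col): {v for v in values if v is not None}
        for table, cols in tables.items()
        for col, values in cols.items()
    }
    found = []
    for (ct, cc), child in column_values.items():
        for (pt, pc), parent in column_values.items():
            # Only compare across tables, and skip columns with no values.
            if ct == pt or not child:
                continue
            if child <= parent:
                found.append((ct, cc, pt, pc))
    return found


tables = {
    "users": {"user_id": [1, 2, 3], "age": [30, 41, 25]},
    "orders": {"order_id": [10, 11], "user_id": [1, 3]},
}
print(find_subset_relationships(tables))
# [('orders', 'user_id', 'users', 'user_id')]
```

Pure subset checks produce false positives on small integer columns (for example, a rating column that happens to fall inside an ID range), so a practical inspector would likely combine this test with column names, types, or uniqueness of the parent column.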