Is your feature request related to a problem? Please describe.
It would be nice to be able to express the desire for the new data to "look like" the old data (in terms of distribution).
Describe the solution you'd like
Since spark expectations already collects summary stats, a good start could be adding validation rules that allow tolerances on the difference between today's summary and the previous one.
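As a rough sketch of what such a tolerance rule might check, here is a plain-Python comparison of one run's summary stats against the previous run's. The stat names and the 10% threshold are illustrative assumptions, not anything spark-expectations defines today:

```python
# Illustrative sketch only: flag any summary stat whose relative change
# versus the previous run exceeds a tolerance. Stat names and the 10%
# default threshold are hypothetical examples.

def drifted_stats(current: dict, previous: dict, rel_tol: float = 0.10) -> list:
    """Return names of stats whose relative change exceeds rel_tol."""
    drifted = []
    for name, prev_value in previous.items():
        curr_value = current.get(name)
        if curr_value is None:
            drifted.append(name)  # stat disappeared entirely
        elif prev_value == 0:
            if curr_value != 0:
                drifted.append(name)
        elif abs(curr_value - prev_value) / abs(prev_value) > rel_tol:
            drifted.append(name)
    return drifted

# Example: row count dropped ~30% while the null rate held steady,
# so only the row count trips the 10% tolerance.
previous = {"input_count": 1_000_000, "null_pct": 0.02}
current = {"input_count": 700_000, "null_pct": 0.021}
print(drifted_stats(current, previous))  # → ['input_count']
```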
Describe alternatives you've considered
I suppose we could write a query rule where folks just manually write the SQL query.
@holdenk This is a good idea. We could add another rule_type stats_dq which would complement our existing rule types row_dq, agg_dq and query_dq.
By default, we can offer a view derived from the stats_df we generate. Currently, users specify a stats_table. We can propose a new stats_table_view constructed from the job's stats_df. Additionally, we can read from the current stats_table to establish a stats_table_existing_view.
Leveraging these two views, users can craft queries tailored to their validation needs. For added convenience, we'll include standard query examples in our documentation.
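For example, a user-crafted query against the two proposed views might look like the following. The view names come from the comment above, but the column names (`input_count`, `meta_dq_run_date`) are illustrative assumptions about the stats_df schema:

```python
# Hypothetical query against the proposed stats_table_view (current run)
# and stats_table_existing_view (prior runs). Column names are assumed
# for illustration; the real stats_df schema may differ.
tolerance_query = """
SELECT cur.input_count, prev.input_count AS previous_input_count
FROM stats_table_view cur
CROSS JOIN (
    SELECT input_count
    FROM stats_table_existing_view
    ORDER BY meta_dq_run_date DESC
    LIMIT 1
) prev
WHERE ABS(cur.input_count - prev.input_count) / prev.input_count > 0.10
"""
# A rule built on this query could fail the run whenever it returns rows,
# i.e. whenever today's row count drifts more than 10% from the last run's.
```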
@holdenk @asingamaneni Great feature.
I think at some point we need to provide an interface for users to implement custom rule types that can be integrated with spark-expectations.
Additional context
TFDV goes above and beyond with its historic views -- https://www.tensorflow.org/tfx/data_validation/get_started