Open Source Data Quality Monitoring.
Datachecks is an open-source data monitoring tool that monitors the quality of data in databases and data pipelines. It identifies potential issues in those databases and pipelines, helps find the root cause of data quality problems, and helps improve data quality.
Datachecks can generate reliability, uniqueness, completeness, and other metrics from a variety of data sources.
APM (Application Performance Monitoring) tools monitor the performance of applications. They are a mandatory part of the development stack: without APM tools, it is very difficult to monitor how applications perform.
For data products, however, regular APM tools are not enough; data applications need a new kind of monitoring tool. Data monitoring tools track the quality of data in databases and data pipelines, identify potential issues, and help trace those issues to their root cause so that data quality can be improved.
Install datachecks
Install datachecks with the command that is specific to your database.
To install datachecks with all of its dependencies, use the following command:
```shell
pip install datachecks -U
```
Please visit the Quick Start Guide to get started.
Datachecks supports SQL and search data sources. The supported data sources are listed below (👍 = supported, 🔜 = coming soon).
Data Source | Type | Supported |
---|---|---|
Postgres | Transactional Database | 👍 |
MySQL | Transactional Database | 👍 |
MS SQL Server | Transactional Database | 🔜 |
OpenSearch | Search Engine | 👍 |
Elasticsearch | Search Engine | 👍 |
GCP BigQuery | Data Warehouse | 👍 |
Databricks | Data Warehouse | 👍 |
Snowflake | Data Warehouse | 🔜 |
AWS Redshift | Data Warehouse | 🔜 |
Metric | Description |
---|---|
Reliability Metrics | Reliability metrics detect whether tables/indices/collections are updating with timely data |
Numeric Distribution Metrics | Numeric distribution metrics detect changes in numeric distributions, such as changes in average values, variance, and skew |
Uniqueness Metrics | Uniqueness metrics detect when data constraints are breached, such as duplicate values or an unexpected number of distinct values |
Completeness Metrics | Completeness metrics detect missing values in datasets, such as null or empty values |
Validity Metrics | Validity metrics detect whether data is formatted correctly and represents a valid value |
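To make the metric types above concrete, here is a minimal, self-contained Python sketch that computes one example metric of each type with plain SQL against an in-memory SQLite table standing in for a SQL data source. It does not use the datachecks API. The table `orders`, its columns, the email pattern, and the helpers `build_sample_db`, `scalar`, and `compute_metrics` are all hypothetical, and datachecks' own metric definitions and queries may differ.

```python
import re
import sqlite3
import statistics
from datetime import datetime, timezone

# Illustrative sketch only: this is NOT the datachecks API. The table `orders`,
# its columns, and the helper functions below are hypothetical.

# Simple (deliberately loose) email pattern used for the validity check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def build_sample_db():
    """Create an in-memory SQLite table standing in for a SQL data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, email TEXT, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "a@example.com", 10.0, "2024-01-01T10:00:00+00:00"),
            (2, "b@example.com", 20.0, "2024-01-01T11:00:00+00:00"),
            (2, None, 30.0, "2024-01-01T12:00:00+00:00"),  # duplicate id, null email
            (3, "not-an-email", 40.0, "2024-01-01T13:00:00+00:00"),  # invalid email
        ],
    )
    return conn


def scalar(conn, sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


def compute_metrics(conn, now):
    """Compute one example metric of each type for the `orders` table."""
    # Reliability: row count and freshness (seconds since the latest update).
    # All timestamps share the same UTC offset, so MAX() on the ISO strings works.
    row_count = scalar(conn, "SELECT COUNT(*) FROM orders")
    latest = datetime.fromisoformat(scalar(conn, "SELECT MAX(updated_at) FROM orders"))
    freshness_seconds = (now - latest).total_seconds()

    # Uniqueness: distinct ids; duplicates = extra rows per id (ids are never null here).
    distinct_ids = scalar(conn, "SELECT COUNT(DISTINCT id) FROM orders")

    # Completeness: null or empty values in the `email` column.
    missing_emails = scalar(
        conn, "SELECT COUNT(*) FROM orders WHERE email IS NULL OR email = ''"
    )

    # Numeric distribution: values of `amount`, summarized as mean and variance below.
    amounts = [row[0] for row in conn.execute("SELECT amount FROM orders")]

    # Validity: present, non-empty emails that do not match the expected format.
    emails = [
        row[0]
        for row in conn.execute(
            "SELECT email FROM orders WHERE email IS NOT NULL AND email != ''"
        )
    ]
    invalid_emails = sum(1 for e in emails if not EMAIL_RE.match(e))

    return {
        "row_count": row_count,
        "freshness_seconds": freshness_seconds,
        "distinct_count_id": distinct_ids,
        "duplicate_count_id": row_count - distinct_ids,
        "null_or_empty_count_email": missing_emails,
        "mean_amount": statistics.mean(amounts),
        "variance_amount": statistics.pvariance(amounts),  # population variance
        "invalid_count_email": invalid_emails,
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)
    print(compute_metrics(build_sample_db(), now))
```

The counting queries push aggregation into the database instead of loading every row; the distribution and validity checks pull values into Python only to keep the sketch short.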
For additional information and help, you can use one of these channels:
- Discord (Live chat with the team, support, discussions, etc.)
- GitHub issues (Bug reports, feature requests)
🙌 We greatly appreciate contributions, whether it's a bug fix, a new feature, or documentation!
Check out the contribution guide and the open issues.
Datachecks contributors: 💙
Usage Analytics & Data Privacy
This project is licensed under the terms of the Apache License 2.0.