
Welcome to the DQW Structured data repository! 🏗️

This repo contains the code for the DQW structured data Streamlit app. For maintenance purposes, however, the Streamlit apps have been split into 5:

The packages used in the application are in the table below.

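| App section       | Description | Visualisation | Selection | Package          |
|-------------------|-------------|---------------|-----------|------------------|
| Synthetic tabular | x           | x             |           | table-evaluator  |
| Tabular           | x           | x             |           | sweetviz         |
| Tabular           | x           | x             |           | pandas-profiling |
| Tabular, text     |             |               | x         | PyCaret          |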

Structured (tabular) data

Key points addressed:

  • Quantitative measures – number of rows and columns.
  • Qualitative measures – column types.
  • Descriptive statistics with NumPy: for numeric columns, for example, count, mean, percentiles and standard deviation; for discrete columns, count, unique, top and frequency.
  • Explore missing data.
  • Examine outliers.
  • Mitigate class imbalance.
  • Compare datasets, such as training, test and evaluation data.
  • Evaluate synthetic datasets.
  • Create a quality report.
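Several of the measures listed above map closely onto what pandas already reports. The block below is a minimal sketch, assuming the data is loaded into a pandas DataFrame; it is not the app's own code. The helper names `summarize`, `iqr_outliers` and `oversample` are hypothetical, the 1.5 × IQR rule is just one common outlier heuristic, and random oversampling is one simple way to mitigate class imbalance (the app itself may rely on PyCaret or other methods instead).

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect basic quality measures for a DataFrame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    discrete = df.select_dtypes(include=["object", "category"])
    return {
        # Quantitative measures: (number of rows, number of columns)
        "shape": df.shape,
        # Qualitative measures: the dtype of each column
        "dtypes": df.dtypes.astype(str).to_dict(),
        # count, mean, std, min, percentiles and max per numeric column
        "numeric_stats": numeric.describe() if numeric.shape[1] else None,
        # count, unique, top and freq per discrete column
        "discrete_stats": discrete.describe() if discrete.shape[1] else None,
        # Number of missing values per column
        "missing": df.isna().sum().to_dict(),
    }


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are never flagged."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]


def oversample(df: pd.DataFrame, target: str, random_state: int = 0) -> pd.DataFrame:
    """Randomly resample minority classes (with replacement) up to the majority size."""
    majority = df[target].value_counts().max()
    parts = [
        group.sample(n=majority, replace=True, random_state=random_state)
        if len(group) < majority
        else group
        for _, group in df.groupby(target)
    ]
    return pd.concat(parts, ignore_index=True)


# Toy data: one missing age, one extreme age and an imbalanced label
df = pd.DataFrame(
    {
        "age": [23, 25, 24, 26, 25, 95, None],
        "city": ["A", "B", "A", "A", "B", "A", "A"],
        "label": [0, 0, 0, 0, 0, 1, 1],
    }
)

info = summarize(df)
print(info["shape"], info["missing"])
print(info["discrete_stats"])
print(iqr_outliers(df["age"]))
print(oversample(df, "label")["label"].value_counts())
```

`DataFrame.describe()` reports count, mean, std, min, the 25/50/75 % percentiles and max for numeric columns, and count, unique, top and freq for object or categorical columns, which matches the statistics listed above.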

To address these key points, the app is organised into 4 subsections:

  • Single-file EDA with pandas-profiling
  • Single-file preprocessing with PyCaret
  • Two-file comparison with Sweetviz
  • Synthetic data evaluation with table-evaluator

Every section offers an option to download the results as a PDF or ZIP file.
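The ZIP download option could be implemented by packing the generated result files into an in-memory archive. This is a minimal sketch using only the Python standard library; `bundle_results` and the file names are hypothetical, and the app may build its PDF/ZIP output differently.

```python
import io
import zipfile
from typing import Dict, Union


def bundle_results(files: Dict[str, Union[str, bytes]]) -> bytes:
    """Pack named result files into a ZIP archive held in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            # writestr accepts str (encoded as UTF-8) or bytes
            zf.writestr(name, content)
    # The archive is finalised when the with-block closes, so read it afterwards
    return buffer.getvalue()


payload = bundle_results(
    {
        "report.html": "<html><body>EDA report</body></html>",
        "summary.csv": "column,missing\nage,1\n",
    }
)
print(len(payload), "bytes")
```

In a Streamlit app, the returned bytes could then be passed to `st.download_button(label="Download results", data=payload, file_name="results.zip", mime="application/zip")`. Building the archive in memory avoids writing temporary files on the server.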

How to run locally

  1. Installation process:

    Create a virtual environment and activate it (see https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).

    Clone or download the files from this repo.

    Run pip install -r requirements.txt to install the dependencies.

    Run streamlit run app.py to launch the app.

  2. Software dependencies:

    Listed in requirements.txt.

  3. Latest releases:

    Use app.py