AutoML pipeline for imbalanced classification

A project for the AutoML course WS22/23 by Stefan Solarski at LMU.

Contents

The ImbalancedAutoMLPipeline class, which contains the core logic, is defined in the ImbalancedAutoML.py file. It reads its hyperparameter search spaces from the configuration.py file.
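The layout of configuration.py is not reproduced in this README. As a rough illustration only, a search-space module could be organized as in the sketch below. Everything in it is an assumption: the names SEARCH_SPACES and get_search_space, the "classifier__" step prefix, and all parameter ranges are hypothetical and not taken from the repository. The values use the tuple and list shorthand that scikit-optimize accepts for search dimensions.

```python
# Hypothetical sketch of a search-space module; NOT the repository's actual file.
# Values use the shorthand that scikit-optimize accepts for dimensions:
#   (low, high)           -> integer or real range
#   (low, high, "prior")  -> real range sampled with the given prior
#   [a, b, c]             -> categorical choices
# The "classifier__" prefix assumes a pipeline step named "classifier".
SEARCH_SPACES = {
    "random_forest": {
        "classifier__n_estimators": (50, 500),
        "classifier__max_depth": (2, 20),
        "classifier__class_weight": ["balanced", "balanced_subsample"],
    },
    "xgboost": {
        "classifier__learning_rate": (1e-3, 0.3, "log-uniform"),
        "classifier__max_depth": (2, 10),
        "classifier__scale_pos_weight": (1.0, 50.0),
    },
}


def get_search_space(model_name):
    """Return a copy of the search space for model_name, or raise a clear error."""
    try:
        return dict(SEARCH_SPACES[model_name])
    except KeyError:
        known = ", ".join(sorted(SEARCH_SPACES))
        raise ValueError(
            f"Unknown model {model_name!r}; expected one of: {known}"
        ) from None
```

Keeping the ranges in a separate module like this means they can be changed without touching the pipeline class; a dictionary of this shape could then be handed to a Bayesian search over the matching pipeline.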

The Jupyter notebook runs the benchmarks for the AutoML system, produces the visualizations, and performs the initial benchmarking of classifiers with default parameters.

The final report explains the approaches used to tackle the problem of imbalanced learning, the experiments we conducted, the resulting pipeline, and results.

The backup folder holds results from previous benchmark runs, and the saved folder will contain results from future runs of the benchmarks.

Setup

Clone the repository and use pip, or another package manager, to install the requirements.

git clone https://github.com/SSolarski/automl_imbalanced.git
cd automl_imbalanced
pip install -r requirements.txt

The necessary packages are listed in requirements.txt. We used Python 3.8.16.

Packages

We used the following packages:

  1. pandas -> manipulating and displaying the datasets and results
  2. jinja2 -> improves the style of pandas dataframes
  3. numpy -> numerical calculations
  4. openml -> importing datasets from openml
  5. scikit-learn -> basic classifiers, pipelines and preprocessing
  6. imbalanced-learn -> classifiers, pipelines and preprocessing for imbalanced datasets
  7. xgboost -> XGBoost classifier
  8. scikit-optimize -> hyperparameter tuning using Bayesian optimization
  9. matplotlib -> visualizing performance of the AutoML system
  10. jupyter -> running Jupyter notebooks
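
After installing, it can be useful to confirm that every package in the list above is present in the active environment. The snippet below is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the distribution names are assumed to match the names in the list.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8

# Distribution names taken from the package list above (assumed to match PyPI).
REQUIRED = [
    "pandas", "jinja2", "numpy", "openml", "scikit-learn",
    "imbalanced-learn", "xgboost", "scikit-optimize", "matplotlib", "jupyter",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:18} {version or 'MISSING'}")
```

Any package reported as MISSING can be installed by rerunning pip install -r requirements.txt inside the same environment.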
