A project for the AutoML course WS22/23 by Stefan Solarski at LMU.
The ImbalancedAutoMLPipeline class in the ImbalancedAutoML.py file contains the core code. It reads hyperparameter search spaces from the configuration.py file.
The Jupyter notebook runs the benchmarks for the AutoML system and also contains the visualizations and the initial benchmarking of classifiers with default parameters.
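To illustrate the idea of keeping search spaces in a separate configuration module, here is a minimal sketch. It is not the repository's actual configuration.py: the `SEARCH_SPACES` dictionary, the hyperparameter ranges, and the `sample_configuration` helper are all hypothetical. It also uses plain random sampling to stay dependency-free, whereas the project itself tunes with scikit-optimize.

```python
import random

# Hypothetical stand-in for configuration.py: one search space per classifier.
# Each entry maps a hyperparameter name to either a (low, high) numeric range
# or a list of categorical choices. None of these values come from the repository.
SEARCH_SPACES = {
    "random_forest": {
        "n_estimators": (50, 500),
        "max_depth": (2, 20),
        "class_weight": [None, "balanced"],
    },
    "xgboost": {
        "learning_rate": (0.01, 0.3),
        "max_depth": (2, 10),
    },
}


def sample_configuration(space, rng):
    """Draw one random configuration from a search space dictionary."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            # Categorical hyperparameter: pick one of the listed choices.
            config[name] = rng.choice(domain)
        else:
            low, high = domain
            if isinstance(low, int) and isinstance(high, int):
                # Integer range, both bounds inclusive.
                config[name] = rng.randint(low, high)
            else:
                # Continuous range.
                config[name] = rng.uniform(low, high)
    return config


rng = random.Random(0)
for model_name, space in SEARCH_SPACES.items():
    print(model_name, sample_configuration(space, rng))
```

Keeping the ranges in one module means the pipeline code does not need to change when a search space is widened or a classifier is added.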
The final report explains the approaches used to tackle the problem of imbalanced learning, the experiments we conducted, the resulting pipeline, and results.
The backup folder holds results from previous benchmark runs, and the saved folder will contain results from new runs of the benchmarks.
Clone the repository and use pip, or another package manager, to install the requirements.
git clone https://github.com/SSolarski/automl_imbalanced.git
cd automl_imbalanced
pip install -r requirements.txt
The necessary packages are listed in requirements.txt. We used Python v3.8.16.
We used the following packages:
- pandas -> manipulating and displaying the datasets and results
- jinja2 -> improves the style of pandas dataframes
- numpy -> numerical calculations
- openml -> importing datasets from openml
- scikit-learn -> basic classifiers, pipelines and preprocessing
- imbalanced-learn -> classifiers, pipelines and preprocessing for imbalanced datasets
- xgboost -> XGBoost classifier
- scikit-optimize -> hyperparameter tuning using Bayesian optimization
- matplotlib -> visualizing performance of the automl system
- jupyter -> running Jupyter notebooks
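After installing, a quick check like the one below can confirm that the listed libraries are available in the active environment. This is a sketch and not part of the repository: `REQUIRED_MODULES` and `missing_packages` are names made up for this example. jupyter is left out because it is only needed to open the notebook.

```python
import importlib.util
import sys

# Import names for the libraries listed above. The distribution name and the
# import name differ for scikit-learn, imbalanced-learn and scikit-optimize.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "jinja2": "jinja2",
    "numpy": "numpy",
    "openml": "openml",
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
    "xgboost": "xgboost",
    "scikit-optimize": "skopt",
    "matplotlib": "matplotlib",
}


def missing_packages(modules):
    """Return the package names whose import module cannot be located."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages can be imported.")
```

The check only locates each module without importing it, so it runs quickly and does not fail on version incompatibilities. Those would surface only when the notebook is executed.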