We won 2nd place in the NeurIPS 2020 competition: Find the best black-box optimizer (BBO) for machine learning. 🎉 We propose a simple ensemble of black-box optimizers that outperforms any single optimizer within the same time budget. Evaluating optimizers is a compute-intensive and time-consuming task, since the number of test cases grows exponentially with the number of models, datasets, and metrics. In our case, we had to evaluate 15 optimizers, execute 4,230 jobs, train 2.7 million models, and run 541,440 optimizations (suggest-observe iterations). Using the RAPIDS libraries cuDF and cuML, our GPU-accelerated exhaustive search finds the best ensemble in a reasonable amount of time: on a DGX-1, the search time drops from more than 10 days on two 20-core CPUs to less than 24 hours on 8 GPUs.
Our paper is available on arXiv: GPU Accelerated Exhaustive Search for Optimal Ensemble of Black-Box Optimization Algorithms.
Please use the following BibTeX if you want to cite our work:
@misc{liu2020gpu,
title={GPU Accelerated Exhaustive Search for Optimal Ensemble of Black-Box Optimization Algorithms},
author={Jiwei Liu and Bojan Tunguz and Gilberto Titericz},
year={2020},
eprint={2012.04201},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
In this competition, black-box optimization algorithms are evaluated on real-world objective functions, using a benchmark system built on top of the AutoML challenge workflow and the Bayesmark package. The competition has widespread impact, as black-box optimization is relevant to hyperparameter tuning in almost every machine learning project (especially deep learning). The leaderboard is determined by optimization performance on held-out (hidden) objective functions, on which each optimizer must run without human intervention.
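For context, Bayesmark drives every optimizer through a suggest-observe interface. Below is a minimal sketch of a submission built on that interface; the random sampling inside `suggest` is purely illustrative and not our method, and only the `AbstractOptimizer` / `experiment_main` entry points come from the Bayesmark package.

```python
import numpy as np

from bayesmark.abstract_optimizer import AbstractOptimizer
from bayesmark.experiment import experiment_main


class RandomOptimizer(AbstractOptimizer):
    # Bayesmark uses this to track which package an optimizer wraps.
    primary_import = "bayesmark"

    def __init__(self, api_config):
        super().__init__(api_config)

    def suggest(self, n_suggestions=1):
        """Return `n_suggestions` hyper-parameter configurations (a list of dicts)."""
        guesses = []
        for _ in range(n_suggestions):
            guess = {}
            for name, conf in self.api_config.items():
                if conf["type"] in ("real", "int"):
                    lo, hi = conf["range"]
                    val = np.random.uniform(lo, hi)
                    guess[name] = int(round(val)) if conf["type"] == "int" else val
                elif conf["type"] == "bool":
                    guess[name] = bool(np.random.choice([True, False]))
                else:  # categorical
                    guess[name] = np.random.choice(conf["values"])
            guesses.append(guess)
        return guesses

    def observe(self, X, y):
        """Receive evaluated configurations; a real optimizer updates its model here."""
        pass


if __name__ == "__main__":
    experiment_main(RandomOptimizer)
```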
Our final submission is an ensemble of the optimizers TuRBO and scikit-optimize. The code is in `example_submissions/turbosk`.
Our ensemble method `turbo-skopt` (LB 92.9, ranked 2nd) improves significantly upon the single optimizers it is built from, namely `turbo` (LB 88.9, ranked 24th) and `skopt` (LB 88.08, ranked 36th) on the final leaderboard. We see a similar improvement in our local validation.
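For intuition, one simple way to ensemble two optimizers inside the suggest-observe loop is to split each suggestion batch between them and let both members observe every evaluation. The sketch below only illustrates that idea with a hypothetical `EnsembleOptimizer` class; the actual submission is in `example_submissions/turbosk`.

```python
from bayesmark.abstract_optimizer import AbstractOptimizer


class EnsembleOptimizer(AbstractOptimizer):
    primary_import = "bayesmark"

    def __init__(self, api_config, members):
        super().__init__(api_config)
        # `members` are already-constructed optimizers sharing the same api_config,
        # e.g. a TuRBO wrapper and a scikit-optimize wrapper.
        self.members = members

    def suggest(self, n_suggestions=1):
        # Split the batch across members, e.g. 4 + 4 for a batch of 8.
        per_member = max(1, n_suggestions // len(self.members))
        suggestions = []
        for opt in self.members:
            suggestions += opt.suggest(per_member)
        return suggestions[:n_suggestions]

    def observe(self, X, y):
        # Every member sees every observation, including points it did not propose,
        # so each optimizer can exploit the good regions found by the other.
        for opt in self.members:
            opt.observe(X, y)
```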
Our solution consists of two parts:
- A multi-GPU optimized exhaustive search algorithm (this repo); a sketch of the enumerate-and-score idea follows this list.
- RAPIDS-enabled Bayesmark (`rapids` branch)
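To give a flavor of the enumerate-and-score part, the sketch below scores every 2-optimizer candidate from a table of per-run results using cuDF, whose DataFrame API mirrors pandas but runs on the GPU. The column names and the min-based scoring rule are illustrative assumptions, not the exact logic of `run_exhaustive_search.py`.

```python
import itertools

import cudf  # GPU DataFrame library from RAPIDS; API mirrors pandas


def score_all_pairs(results: cudf.DataFrame, optimizers):
    """Score every 2-optimizer ensemble from per-run benchmark results.

    `results` is assumed to hold one row per (optimizer, problem) run with the
    best objective value that run reached; a candidate pair is scored by taking
    the better of its two members on each problem and averaging over problems.
    """
    scores = {}
    for a, b in itertools.combinations(optimizers, 2):
        pair = results[results["optimizer"].isin([a, b])]
        best_per_problem = pair.groupby("problem")["best_value"].min()
        scores[(a, b)] = float(best_per_problem.mean())
    # Lower is better for a minimization benchmark.
    return scores
```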
- conda create -n bbo_rapids python=3.7
- conda activate bbo_rapids
- conda install "pytorch=1.6" "cudf=0.16" "cuml=0.16" cudatoolkit=10.2.89 -c pytorch -c rapidsai -c nvidia -c conda-forge -c defaults
- pip install gpytorch==1.2.1
- pip install git+https://github.com/uber-research/TuRBO.git@master
- pip install pySOT==0.2.3 opentuner==0.8.2 nevergrad==0.1.4 hyperopt==0.1.1 scikit-optimize==0.5.2 scikit-learn==0.20.2 xgboost==1.2.1
- git clone https://github.com/daxiongshu/bayesmark
- cd bayesmark
- git checkout rapids
- ./build_wheel.sh
- python setup.py install
- Please change the global variable `NUM_GPUS` in `run_one_opt.py` accordingly; a sketch of how jobs can be spread over the GPUs follows this list.
- Run a quick sanity-check experiment with `python run_one_opt.py`, which takes 6 minutes on a DGX-1.
- Run the exhaustive search with `python run_exhaustive_search.py`, which takes less than 24 hours on a DGX-1.
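For reference, a common way to use a `NUM_GPUS`-style setting is to pin each worker process to one device through `CUDA_VISIBLE_DEVICES` and round-robin the benchmark jobs over the workers. The sketch below shows that pattern; it is an assumption about the dispatch style, not a copy of `run_one_opt.py`.

```python
import os
import subprocess
from multiprocessing import Pool

NUM_GPUS = 8  # e.g. 8 on a DGX-1; set to the number of GPUs on your machine


def run_job(args):
    """Run one benchmark command with the process pinned to a single GPU."""
    gpu_id, cmd = args
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return subprocess.call(cmd, shell=True, env=env)


def run_all(commands):
    """Round-robin a list of shell commands over the available GPUs."""
    jobs = [(i % NUM_GPUS, cmd) for i, cmd in enumerate(commands)]
    with Pool(NUM_GPUS) as pool:
        return pool.map(run_job, jobs)
```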
- The ensemble of optimizers outperforms single optimizers in terms of generalization performance.
Figure: performance of optimization algorithms in terms of (a) the cross-validation score, which is visible to and minimized by the optimizers, and (b) the holdout-validation score, which represents the generalization ability of an optimizer. The y-axis is the normalized mean score; lower is better. The top 5 optimizers are highlighted in each sub-figure.
- Different optimizers are good at different machine learning models.
- The overall execution time is dominated by model evaluation rather than optimization.
We chose `turbo-skopt` as our final submission because:
- it has a top-3 generalization score,
- it converges faster than the single optimizers, and
- it achieves the best performance on a representative deep learning model.