This is the source code for the paper "Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation". Please refer to the paper for the experimental details.
- An Efficient Database Configuration Tuning Benchmark via Surrogate
- Experimental Evaluation
- Environment Installation
- Workload Preparation
- Environment Variables
- Experiments Design
- Exp.1: Tuning improvement over knob sets generated by different importance measurements.
- Exp.2: Performance improvement and tuning cost when increasing the number of tuned knobs.
- Exp.3: Incremental Knob Selection.
- Exp.4: Optimizer comparison on different configuration spaces.
- Exp.5: Comparison experiment for knob heterogeneity.
- Exp.6: Algorithm overhead comparison.
- Exp.7: Transferring methods comparison.
Through the benchmark, you can evaluate the performance of tuning optimizers with minimal overhead.
- Preparations: Python == 3.7
- Install packages and download the surrogate model
pip install -r requirements.txt
pip install .
The surrogate models can be found on Google Drive. To easily run the tuning benchmark, download the surrogate models and place them in the folder autotune/tuning_benchmark/surrogate.
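For example, a minimal sketch of placing one downloaded model (the file name matches the `--model_path` arguments used below; the download location is an assumption):

# Create the expected folder and move the downloaded surrogate model into it
mkdir -p autotune/tuning_benchmark/surrogate
mv ~/Downloads/RF_SYSBENCH_5knob.joblib autotune/tuning_benchmark/surrogate/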
- Run the benchmark for knob selection. As an example, we select 5 knobs and tune SYSBENCH via SMAC.
bash experiment/compare_knob_selection.sh SYSBENCH SMAC 5
- Run the benchmark for optimizer. We use optimization over the configuration space of SYSBENCH as an example.
python run_benchmark.py --method=VBO --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --knobs_num=5 --workload=sysbench --lhs_log=result/sysbench_5knobs_vbo.res --model_path=../tuning_benchmark/surrogate/RF_SYSBENCH_5knob.joblib
python run_benchmark.py --method=MBO --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --knobs_num=5 --workload=sysbench --lhs_log=result/sysbench_5knobs_mbo.res --model_path=../tuning_benchmark/surrogate/RF_SYSBENCH_5knob.joblib
python run_benchmark.py --method=SMAC --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --knobs_num=5 --workload=sysbench --lhs_log=result/sysbench_5knobs_smac.res --model_path=../tuning_benchmark/surrogate/RF_SYSBENCH_5knob.joblib
python run_benchmark.py --method=TPE --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --knobs_num=5 --workload=sysbench --lhs_log=result/sysbench_5knobs_tpe.res --model_path=../tuning_benchmark/surrogate/RF_SYSBENCH_5knob.joblib
python run_benchmark.py --method=TURBO --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --knobs_num=5 --workload=sysbench --lhs_log=result/sysbench_5knobs_turbo.res --tr_init --model_path=../tuning_benchmark/surrogate/RF_SYSBENCH_5knob.joblib
python run_benchmark.py --method=GA --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --knobs_num=5 --workload=sysbench --lhs_log=result/sysbench_5knobs_ga.res --model_path=../tuning_benchmark/surrogate/RF_SYSBENCH_5knob.joblib
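To sweep all of the optimizers above in one go, a small helper loop may help (a sketch; the result file names are arbitrary, and TURBO additionally takes `--tr_init` as shown above):

# Run every optimizer on the 5-knob SYSBENCH surrogate
for method in VBO MBO SMAC TPE TURBO GA; do
    extra=""
    if [ "$method" = "TURBO" ]; then extra="--tr_init"; fi
    python run_benchmark.py --method=$method $extra \
        --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --knobs_num=5 \
        --workload=sysbench \
        --lhs_log=result/sysbench_5knobs_$(echo $method | tr '[:upper:]' '[:lower:]').res \
        --model_path=../tuning_benchmark/surrogate/RF_SYSBENCH_5knob.joblib
done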
You can find all the training data for the tuning benchmark in autotune/tuning_benchmark/data.
In our experiments, the operating system is Linux 4.9, and we conduct the experiments on MySQL 5.7.19.
- Preparations: Python == 3.7
- Install packages
pip install -r requirements.txt
pip install .
- Download and install MySQL 5.7.19 and boost
wget http://sourceforge.net/projects/boost/files/boost/1.59.0/boost_1_59_0.tar.gz
wget https://dev.mysql.com/get/Downloads/MySQL-5.7/mysql-boost-5.7.19.tar.gz
sudo cmake . -DCMAKE_INSTALL_PREFIX=PATH_TO_INSTALL -DMYSQL_DATADIR=PATH_TO_DATA -DDEFAULT_CHARSET=utf8 -DDEFAULT_COLLATION=utf8_general_ci -DMYSQL_TCP_PORT=3306 -DWITH_MYISAM_STORAGE_ENGINE=1 -DWITH_INNOBASE_STORAGE_ENGINE=1 -DWITH_ARCHIVE_STORAGE_ENGINE=1 -DWITH_BLACKHOLE_STORAGE_ENGINE=1 -DWITH_MEMORY_STORAGE_ENGINE=1 -DENABLE_DOWNLOADS=1 -DDOWNLOAD_BOOST=1 -DWITH_BOOST=PATH_TO_BOOST
sudo make -j 16
sudo make install
Download and install SYSBENCH
git clone https://github.com/akopytov/sysbench.git
cd sysbench
./autogen.sh
./configure
make && make install
Load data
sysbench --db-driver=mysql --mysql-host=$HOST --mysql-socket=$SOCK --mysql-port=$MYSQL_PORT --mysql-user=root --mysql-password=$PASSWD --mysql-db=sbtest --table_size=800000 --tables=150 --events=0 --threads=32 oltp_read_write prepare > sysbench_prepare.out
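After the prepare phase completes, a run phase with the same connection settings might look like this (a sketch; --time=60 is illustrative, not a value from the paper):

sysbench --db-driver=mysql --mysql-host=$HOST --mysql-socket=$SOCK --mysql-port=$MYSQL_PORT --mysql-user=root --mysql-password=$PASSWD --mysql-db=sbtest --table_size=800000 --tables=150 --time=60 --threads=32 oltp_read_write run > sysbench_run.out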
We install OLTP-Bench to run the following workloads: TPC-C, SEATS, Smallbank, TATP, Voter, Twitter, and SIBench.
- Download
git clone https://github.com/oltpbenchmark/oltpbench.git
- To run oltpbenchmark outside its folder, modify the following file:

./src/com/oltpbenchmark/DBWorkload.java (Line 85)

pluginConfig = new XMLConfiguration("PATH_TO_OLTPBENCH/config/plugin.xml"); // modify this path
- ./oltpbenchmark

#!/bin/bash
java -Xmx8G -cp `$OLTPBENCH_HOME/classpath.sh bin` -Dlog4j.configuration=$OLTPBENCH_HOME/log4j.properties com.oltpbenchmark.DBWorkload $@
- ./classpath.sh

#!/bin/bash
echo -ne "$OLTPBENCH_HOME/build"
for i in `ls $OLTPBENCH_HOME/lib/*.jar`; do
    # IMPORTANT: Make sure that we do not include hsqldb v1
    if [[ $i =~ .*hsqldb-1.* ]]; then
        continue
    fi
    echo -ne ":$i"
done
- Install

ant bootstrap
ant resolve
ant build
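With the wrappers above in place, a standalone invocation would look like the following (a sketch based on OLTP-Bench's documented flags; the sample TPC-C config ships with the repository):

export OLTPBENCH_HOME=PATH_TO_oltpbench
$OLTPBENCH_HOME/oltpbenchmark -b tpcc -c $OLTPBENCH_HOME/config/sample_tpcc_config.xml --create=true --load=true --execute=true -o tpcc_out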
Download the IMDB dataset from http://homepages.cwi.nl/~boncz/job/imdb.tgz.
Follow the instructions of https://github.com/winkyao/join-order-benchmark to load data into MySQL.
Before running the experiments, the following environment variables need to be set.
export SYSBENCH_BIN=PATH_TO_sysbench/src/sysbench
export OLTPBENCH_BIN=PATH_TO_oltpbench/oltpbenchmark
export MYSQLD=PATH_TO_mysqlInstall/bin/mysqld
export MYSQL_SOCK=PATH_TO_mysql/base/mysql.sock
export MYCNF=PATH_TO_autotune/template/experiment_normandy.cnf
export DATADST=PATH_TO_mysql/data
export DATASRC=PATH_TO_mysql/data_copy
All optimization methods are listed as follows:
| Method | String of ${METHOD} |
|---|---|
| Vanilla BO | VBO |
| Mixed-Kernel BO | MBO |
| Sequential Model-based Algorithm Configuration | SMAC |
| Tree-structured Parzen Estimator | TPE |
| Trust-Region BO | TURBO |
| Deep Deterministic Policy Gradient | DDPG |
| Genetic Algorithm | GA |
Compared importance measurements: `lasso`, `gini`, `fanova`, `ablation`, `shap`.
To conduct the experiment shown in Figure 3(a), the script is as follows. Please specify `${lhs_log}`.
python train.py --knobs_config=experiment/gen_knobs/JOB_lasso.json --knobs_num=5 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_gini.json --knobs_num=5 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_fanova.json --knobs_num=5 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_ablation.json --knobs_num=5 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=5 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_lasso.json --knobs_num=20 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_gini.json --knobs_num=20 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_fanova.json --knobs_num=20 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_ablation.json --knobs_num=20 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
python train.py --knobs_config=experiment/gen_knobs/JOB_shap.json --knobs_num=20 --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=${lhs_log}
To conduct the experiments in (b), (c), and (d), modify `${knobs_num}`, `${method}`, `${workload}`, `${dbname}`, and `${y_variable}`, where

- `${knobs_num}` = 5, 20
- `${method}` = VBO, DDPG
- `${workload}` = job, sysbench
- if `${workload}` == job, then `${dbname}` = imdbload, `${y_variable}` = lat
- if `${workload}` == sysbench, then `${dbname}` = sbtest, `${y_variable}` = tps
Note that `${knobs_config}` indicates the configuration file where knobs are ranked by importance.

- We provide the configuration files generated on our VM: `experiment/gen_knobs/${workload}_${measure}.json`.
- You can also generate new configuration files with samples collected in your own environment.
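Putting this together, the Figure 3(a) runs can be scripted as a sweep over the measurements and knob counts (a sketch; the log file names are arbitrary):

for measure in lasso gini fanova ablation shap; do
    for knobs_num in 5 20; do
        python train.py --knobs_config=experiment/gen_knobs/JOB_${measure}.json --knobs_num=${knobs_num} --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --lhs_log=result/job_${measure}_${knobs_num}knobs.res
    done
done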
To conduct the experiment shown in Figures 5(a) and 5(b), the script is as follows.
python train.py --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --knobs_num=${knobs_num} --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=VBO --workload=sysbench --dbname=sbtest --y_variable=tps --lhs_num=10 --knobs_num=${knobs_num} --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --lhs_log=${lhs_log}
Please specify `${knobs_num}` and `${lhs_log}`, where `${knobs_num}` = 5, 10, 15, 20, 30, 50, 70, 90, 197.
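For example, the full sweep over knob counts can be scripted as follows (a sketch for the JOB workload; log file names are arbitrary):

for knobs_num in 5 10 15 20 30 50 70 90 197; do
    python train.py --method=VBO --workload=job --dbname=imdbload --y_variable=lat --lhs_num=10 --knobs_num=${knobs_num} --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=result/job_${knobs_num}knobs.res
done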
Compared methods: `5 Knobs`, `20 Knobs`, `increase`, `decrease`.
To conduct the experiment shown in Figure 6(a), the script is as follows. Please specify `${lhs_log}`.
python train.py --method=VBO --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=VBO --knobs_num=20 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=increase --knobs_num=-1 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=decrease --knobs_num=-1 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
To conduct the experiment shown in (b), replace `--workload=job --y_variable=lat --dbname=imdbload` with `--workload=sysbench --y_variable=tps --dbname=sbtest`.
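For instance, the incremental-selection run on SYSBENCH would become (a sketch; using the SYSBENCH ranking file is an assumption, mirroring the other SYSBENCH commands in this README):

python train.py --method=increase --knobs_num=-1 --workload=sysbench --y_variable=tps --dbname=sbtest --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --lhs_log=${lhs_log}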
Compared optimizers: `VBO`, `MBO`, `SMAC`, `TPE`, `TURBO`, `DDPG`, `GA`.
To conduct the experiment shown in Figure 7(a), the script is as follows. Please specify `${lhs_log}`.
python train.py --method=VBO --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=MBO --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=SMAC --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=TPE --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=TURBO --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=DDPG --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
python train.py --method=GA --knobs_num=5 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log}
To conduct the experiments shown in (b), (c), (d), (e), (f), and (g), you can

- replace `--knobs_num=5` with `--knobs_num=20` or `--knobs_num=197`
- replace `--workload=job --y_variable=lat --dbname=imdbload` with `--workload=sysbench --y_variable=tps --dbname=sbtest`
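For example, the SYSBENCH variant of the SMAC run with 20 knobs would be (a sketch; the SYSBENCH ranking file is assumed, as elsewhere in this README):

python train.py --method=SMAC --knobs_num=20 --workload=sysbench --y_variable=tps --dbname=sbtest --knobs_config=experiment/gen_knobs/SYSBENCH_shap.json --lhs_log=${lhs_log}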
Compared optimizers: `VBO`, `MBO`, `SMAC`, `DDPG`.
To conduct the experiment shown in Figure 8(a) and (b), the script is as follows.
python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=${dbname} --knobs_config=experiment/gen_knobs/JOB_continuous.json --lhs_log=${lhs_log} --lhs_num=10
python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=${dbname} --knobs_config=experiment/gen_knobs/JOB_heterogeneous.json --lhs_log=${lhs_log} --lhs_num=10
Please specify `${method}`, `${dbname}`, and `${lhs_log}`, where `${method}` is one of `VBO`, `MBO`, `SMAC`, `DDPG`.
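A sketch covering all four optimizers on both configuration spaces (assuming `--dbname=imdbload` for the JOB workload, as elsewhere; log file names are arbitrary):

for method in VBO MBO SMAC DDPG; do
    for space in continuous heterogeneous; do
        python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_${space}.json --lhs_log=result/job_${space}_${method}.res --lhs_num=10
    done
done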
Compared optimizers: `MBO`, `SMAC`, `TPE`, `TURBO`, `DDPG`, `GA`.
To conduct the experiment shown in Figure 8(a) and (b), the script is as follows.
python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=${dbname} --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=${lhs_log} --lhs_num=10
Please specify `${method}`, `${dbname}`, and `${lhs_log}`, where `${method}` is one of `MBO`, `SMAC`, `TPE`, `TURBO`, `DDPG`, `GA`.
Note that if you have already run Exp.4, you can skip the above script and analyze the log files in `script/log/`.
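Otherwise, a sketch looping over the six optimizers (again assuming `--dbname=imdbload`; log file names are arbitrary):

for method in MBO SMAC TPE TURBO DDPG GA; do
    python train.py --method=${method} --knobs_num=20 --workload=job --y_variable=lat --dbname=imdbload --knobs_config=experiment/gen_knobs/JOB_shap.json --lhs_log=result/overhead_${method}.res --lhs_num=10
done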
Compared methods: `RGPE-MBO`, `RGPE-SMAC`, `MAP-MBO`, `MAP-SMAC`, `FineTune-DDPG`.
To conduct the experiment shown in Table 9, there are two steps:

- Pre-train on source workloads (`Smallbank`, `SIBench`, `Voter`, `Seats`, `TATP`);
- Validate on target workloads (`TPCC`, `SYSBENCH`, `Twitter`).
The scripts for pre-training are similar to those for Exp.4.
To validate on target workloads, the scripts are as follows.
python train.py --method=MBO --RGPE --source_repo=${repo} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10
python train.py --method=SMAC --RGPE --source_repo=${repo} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10
python train.py --method=MBO --workload_map --source_repo=${repo} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10
python train.py --method=SMAC --workload_map --source_repo=${repo} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10
python train.py --method=DDPG --params=model_params/${ddpg_params} --knobs_num=20 --workload=job --y_variable=lat --dbname=tpcc --knobs_config=experiment/gen_knobs/oltp.json --lhs_log=${lhs_log} --lhs_num=10
Note that

- for `RGPE-` methods, you should specify `--RGPE --source_repo=${repo}`
- for `MAP-` methods, you should specify `--workload_map --source_repo=${repo}`
- for `FineTune-DDPG`, you should specify `--params=model_params/${ddpg_params}`
- `autotune/tuner.py`: the implemented optimization methods.
- `autotune/dbenv.py`: the functions for interacting with the database.
- `script/train.py`: the Python script to start an experiment.
- `script/experiment/gen_knobs`: the knob importance ranking files generated by different methods.