- This repository hosts the source code and supplementary materials for our:
- VLDB 2024 submission, GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization
- SIGMOD 2024 Demo submission, A Demonstration of GPTuner: A GPT-Based Manual-Reading Database Tuning System
- GPTuner collects and refines heterogeneous domain knowledge, unifies a structured view of the refined knowledge, and uses the knowledge to (1) select important knobs, (2) optimize the value range of each knob, and (3) explore the optimized space with a novel Coarse-to-Fine Bayesian Optimization framework.
- A video demonstration is available on YouTube!
Stay tuned for the latest updates and enhancements in this project! 🚀
Remember to star ⭐ and subscribe 🔔 for the newest features and improvements!
GPTuner is a manual-reading database tuning system that suggests satisfactory knob configurations at a reduced tuning cost. The figure above presents the tuning workflow, which involves seven steps:
- 📌 User provides the DBMS to be tuned (e.g., PostgreSQL or MySQL), the target workload, and the optimization objective (e.g., latency or throughput).
- 📌 GPTuner collects and refines the heterogeneous knowledge from different sources (e.g., GPT-4, DBMS manuals, and web forums) to construct Tuning Lake, a collection of DBMS tuning knowledge.
- 📌 GPTuner unifies the refined tuning knowledge from Tuning Lake into a structured view accessible to machines (e.g., JSON); a sketch of such an entry follows this list.
- 📌 GPTuner reduces the search space dimensionality by selecting important knobs to tune (i.e., fewer knobs to tune means fewer dimensions).
- 📌 GPTuner optimizes the search space in terms of the value range for each knob based on structured knowledge.
- 📌 GPTuner explores the optimized space via a novel Coarse-to-Fine Bayesian Optimization framework.
- 📌 Finally, GPTuner identifies satisfactory knob configurations within resource limits (e.g., the maximum optimization time or iterations specified by users).
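To make the structured view in step 3 concrete, here is a minimal sketch of what a machine-readable entry for a single knob could look like. The field names below are illustrative assumptions rather than GPTuner's exact schema; the real structured knowledge lives under `knowledge_collection/postgres/structured_knowledge/`.

```python
# Hypothetical structured-knowledge entry for one PostgreSQL knob.
# Field names are illustrative assumptions; consult
# knowledge_collection/postgres/structured_knowledge/ for the real schema.
structured_knowledge = {
    "shared_buffers": {
        "suggested_values": ["25% of system memory"],
        "min_value": "16MB",
        "max_value": "40% of system memory",
        "special_value": None,  # placeholder for a sentinel with special semantics
    }
}

print(structured_knowledge["shared_buffers"]["suggested_values"])
```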
Step 1: Install PostgreSQL
The following instructions have been tested on Ubuntu 20.04 and PostgreSQL v14.9:
sudo apt-get update
sudo apt-get install postgresql-14
Step 2: Install BenchBase with our script
- Note: the script is tested on openjdk version "17.0.8.1" 2023-08-24; please prepare your Java environment first
cd ./scripts
sh install_benchbase.sh postgres
Step 3: Build the target benchmark
- Note: modify `./benchbase/target/benchbase-postgres/config/postgres/sample_{your_target_benchmark}_config.xml` to customize your tuning settings first
sh build_benchmark.sh postgres tpch
Step 4: Install Python dependencies
sudo pip install -r requirements.txt
Step 5: Run GPTuner
- Note: modify `configs/postgres.ini` to specify the target DBMS first; the `restart` and `recover` commands depend on your environment, and we provide a Docker version
- Note: modify `src/run_gptuner.py` to set up your `api_base`, `api_key` and `model` first (a minimal sketch follows)
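For reference, wiring up the model credentials looks roughly like the sketch below (pre-1.0 `openai` package style). The endpoint, key, and prompt are placeholders, and the exact variable names inside `src/run_gptuner.py` may differ.

```python
# Minimal sketch with placeholder credentials; not GPTuner's exact code.
import openai

openai.api_base = "https://api.openai.com/v1"  # your api_base (e.g., a proxy endpoint)
openai.api_key = "sk-..."                      # your api_key
model = "gpt-4"                                # or "gpt-3.5-turbo"

response = openai.ChatCompletion.create(
    model=model,
    messages=[{"role": "user", "content": "Suggest a value range for shared_buffers."}],
)
print(response["choices"][0]["message"]["content"])
```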
# PYTHONPATH=src python src/run_gptuner.py <dbms> <benchmark> <timeout> <seed>
PYTHONPATH=src python src/run_gptuner.py postgres tpch 180 -seed=100
where `<dbms>` specifies the DBMS (e.g., postgres or mysql), `<benchmark>` is the target workload (e.g., tpch or tpcc), `<timeout>` is the maximum time allowed to stress-test the benchmark, and `<seed>` is the random seed used by the optimizer.
The optimization result is stored in `optimization_results/{dbms}/{stage}/{seed}/runhistory.json`, where `{dbms}` is the target DBMS, `{stage}` is coarse or fine, and `{seed}` is the random seed given by the user.
- The `data` block contains the following fields; we explain the project-related ones below. For more details, please refer to the SMAC3 library.
  - `config_id`: `i` is the identifier for the knob configuration evaluated in the i-th iteration
  - `instance`
  - `budget`
  - `seed`
  - `cost`: the optimization objective (e.g., throughput or latency)
  - `time`
  - `status`
  - `starttime`
  - `endtime`
  - `additional_info`
- The `configs` block contains the knob configuration of the i-th iteration, for example:
"configs": {
"1": {
"effective_io_concurrency": 200,
"random_page_cost": 1.2
},
}
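As a convenience, the sketch below shows one way to pull the best configuration out of a `runhistory.json`. It assumes the field order listed above (with `cost` as the fifth entry of each `data` record) and a hypothetical result path; adjust both to your own run.

```python
import json

# Hedged sketch: locate the lowest-cost configuration of a GPTuner run.
path = "optimization_results/postgres/fine/100/runhistory.json"  # hypothetical run
with open(path) as f:
    history = json.load(f)

# Assumed record layout: (config_id, instance, budget, seed, cost, time, ...).
best = min(history["data"], key=lambda record: record[4])
best_id, best_cost = best[0], best[4]
print(f"best cost: {best_cost}")
print(f"best knob configuration: {history['configs'][str(best_id)]}")
```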
Step 1: Complete Steps 1 to 4 in the Quick Start section
Step 2: Launch the GUI with Streamlit
PYTHONPATH=src python -m streamlit run src/demo/entrypage.py
Step 3: Follow our video demonstration to use the GUI
We compare GPTuner with state-of-the-art methods, both with and without natural language knowledge as input:
- DB-BERT (SIGMOD '22): a DBMS tuning tool that uses BERT to read the manuals and uses the gained information to guide Reinforcement Learning (RL)
- SMAC: the best Bayesian Optimization (BO)-based method evaluated in an experimental evaluation (VLDB '22)
- GP: the classic Gaussian Process-based BO approach used in iTuned (VLDB '09) and OtterTune (SIGMOD '17)
- DDPG++: an RL-based tuning method proposed in CDBTune (SIGMOD '19) and improved in Inquiry (VLDB '21)
We compare GPTuner with the baselines on different DBMSs (PostgreSQL and MySQL), benchmarks (TPC-H and TPC-C), and metrics (throughput and latency). We present the results on PostgreSQL in this repository; for more details, please refer to our paper.
- `configs/`
  - `postgres.ini`: Configuration file to optimize PostgreSQL
  - `mysql.ini`: Configuration file to optimize MySQL
- `optimization_results/`
  - `temp_results/`: Temporary storage for optimization results
  - `postgres/`
    - `coarse/`: Coarse-stage optimization results for PostgreSQL
    - `fine/`: Fine-stage optimization results for PostgreSQL
- `scripts/`
  - `install_benchbase.sh`: Script to install the BenchBase benchmark tool
  - `build_benchmark.sh`: Script to build benchmark environments
  - `recover_postgres.sh`: Script to recover the state of the PostgreSQL database
  - `recover_mysql.sh`: Script to recover the state of the MySQL database
- `knowledge_collection/`
  - `postgres/`
    - `target_knobs.txt`: List of target knobs for PostgreSQL tuning
    - `knob_info/`
      - `system_view.json`: Information from PostgreSQL system views (pg_settings)
      - `official_document.json`: Information from the PostgreSQL official documentation
    - `knowledge_sources/`
      - `gpt/`: Knowledge sourced from GPT models
      - `manual/`: Knowledge from DBMS manuals
      - `web/`: Knowledge extracted from web sources
      - `dba/`: Knowledge from database administrators
    - `tuning_lake/`: Data lake for DBMS tuning knowledge
    - `structured_knowledge/`
      - `special/`: Specialized structured knowledge
      - `normal/`: General structured knowledge
    - `example_pool/`: Pool of examples for the prompt ensemble algorithm
- `sql`: Provide SQL statements if you need query-level knob selection
- `src/`: Source code
  - `demo/`: Module to execute the GUI (demonstration code)
  - `dbms/`
    - `dbms_template.py`: Template for database management systems
    - `postgres.py`: Implementation for PostgreSQL
    - `mysql.py`: Implementation for MySQL
  - `knowledge_handler/`
    - `gpt.py`: Module for interactions with GPT
    - `knowledge_preparation.py`: Module for knowledge preparation (Sec. 5.1)
    - `knowledge_transformation.py`: Module for knowledge transformation (Sec. 5.2)
  - `space_optimizer/`
    - `knob_selection.py`: Module for knob selection (Sec. 6.1)
    - `default_space.py`: Definition of the default search space
    - `coarse_space.py`: Definition of the coarse search space (Sec. 6.2)
    - `fine_space.py`: Definition of the fine search space (Sec. 6.2)
  - `config_recommender/`
    - `workload_runner.py`: Module to run workloads
    - `coarse_stage.py`: Recommender for the coarse-stage configuration (Sec. 7)
    - `fine_stage.py`: Recommender for the fine-stage configuration (Sec. 7)
  - `run_gptuner.py`: Main script to run GPTuner
- Paper version
  - GPTuner uses the OpenAI completion API with `gpt-4` or `gpt-3.5-turbo`
  - GPTuner leverages tuning knowledge from `GPT-4`, `DBMS official manuals` and `web contents`
  - GPTuner supports `PostgreSQL` and `MySQL`
  - GPTuner stress-tests workloads through the `BenchBase` tool
- Future implementation (We warmly invite and appreciate your contributions! 👫)
  - GPTuner employs `locally deployed large language models` as well
  - GPTuner collects web contents through `web-gpt` and `web-crawler`
  - GPTuner uses a `generic` stress-test tool, supporting `any given workload` optimization
  - GPTuner refines its `knowledge_collection` with a `human-in-the-loop` mechanism
  - GPTuner supports more `DBMS`
  - to be continued...
If you use this codebase, or otherwise find our work valuable, please cite 📒:
@misc{lao2023gptuner,
title={GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization},
author={Jiale Lao and Yibo Wang and Yufei Li and Jianping Wang and Yunjia Zhang and Zhiyuan Cheng and Wanghu Chen and Mingjie Tang and Jianguo Wang},
year={2023},
eprint={2311.03157},
archivePrefix={arXiv},
primaryClass={cs.DB}
}