This repository contains all the scripts used in the ICPE'24 paper "BFQ, Multiqueue-Deadline, or Kyber? Performance Characterization of Linux Storage Schedulers in the NVMe Era".
Before running the experiments, build and install Linux 6.3.8 and build the tools by following the instructions below. The scripts take the paths to the tool binaries as arguments, so note where each binary is built. Always use absolute paths.
Clone the source code
git clone https://github.com/stonet-research/icpe24_io_scheduler_study_artifact
Download Linux 6.3.8
# Install dependencies for building the Linux kernel
sudo apt-get update
sudo apt-get install git fakeroot build-essential libncurses-dev bison flex libssl-dev libelf-dev debhelper zstd bc pahole
mkdir linux_build; cd linux_build
wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.3.8.tar.xz
tar -xf linux-6.3.8.tar.xz
# copy the updated fops.c
cp ../icpe24_io_scheduler_study_artifact/linux/fops.c linux-6.3.8/block
cd linux-6.3.8
make olddefconfig
# disable keys
scripts/config --disable SYSTEM_TRUSTED_KEYS
scripts/config --disable SYSTEM_REVOCATION_KEYS
# make the linux kernel
make -j $(getconf _NPROCESSORS_ONLN) bindeb-pkg LOCALVERSION=-sched-expr
# After make has finished
cd ..
sudo dpkg -i *deb
# update grub and reboot
sudo update-grub2
sudo sync
sudo reboot now
# After reboot, check the kernel version; it should be 6.3.8-sched-expr
uname -r
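The version check above can also be scripted, for example at the start of an experiment run. The following is a minimal sketch, assuming the kernel was built with the LOCALVERSION shown above; the function name is hypothetical and not part of the artifact.

```python
import platform

# Base version and LOCALVERSION suffix taken from the build commands above.
EXPECTED_RELEASE = "6.3.8" + "-sched-expr"


def is_experiment_kernel(release: str, expected: str = EXPECTED_RELEASE) -> bool:
    """Return True if `release` (the output of `uname -r`) is the custom kernel."""
    return release == expected


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    release = platform.release()
    status = "OK" if is_experiment_kernel(release) else "unexpected kernel"
    print(f"{release}: {status}")
```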
Enable all the I/O schedulers:
sudo modprobe kyber-iosched
sudo modprobe bfq
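To confirm that the modules were loaded, you can inspect the scheduler list that the kernel exposes in sysfs for each block device, where the active scheduler is shown in square brackets. The sketch below parses such a line; the device name nvme0n1 and the helper names are assumptions for illustration and are not part of the artifact.

```python
from pathlib import Path

# Schedulers studied in this artifact, as named by the kernel.
WANTED = ("none", "mq-deadline", "kyber", "bfq")


def parse_scheduler_line(line: str):
    """Split a sysfs scheduler line into (active, available).

    Example input: "[none] mq-deadline kyber bfq"
    """
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def missing_schedulers(line: str, wanted=WANTED):
    """Return the wanted schedulers that the line does not list."""
    _, available = parse_scheduler_line(line)
    return [name for name in wanted if name not in available]


if __name__ == "__main__":
    # "nvme0n1" is a placeholder; substitute your own device name.
    path = Path("/sys/block/nvme0n1/queue/scheduler")
    if path.exists():
        line = path.read_text().strip()
        print("active:", parse_scheduler_line(line)[0])
        print("missing:", missing_schedulers(line))
```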
Force synchronous dispatching:
echo 2 | sudo tee /sys/module/fops/parameters/force_sync_submission
Note: During the experiments, we found that with None, I/O requests are processed by the same process that issues them, whereas with BFQ, Kyber, and MQ-Deadline they are processed by a kernel worker, which causes a nearly 50% performance drop. Since this work focuses on the I/O schedulers rather than the Linux storage stack, we force all requests to be processed in the same way.
Download and install fio
cd ~
mkdir local; cd local
git clone https://github.com/axboe/fio
cd fio
git checkout fio-3.35
./configure
make -j $(getconf _NPROCESSORS_ONLN)
The fio binary is located at fio/fio. This binary should be provided to the scripts as the --fio option.
sudo apt-get install -y pkg-config meson python3-pyelftools uuid-dev libpcap-dev libssl-dev libncurses5 libncurses5-dev
cd ~/local
git clone https://github.com/spdk/spdk.git
cd spdk
git checkout aed4ece93c659195d4b56399a181f41e00a7a25e
git submodule update --init
sudo scripts/pkgdep.sh
./configure --with-fio=/path_to_fio_repo
# If you follow the paths used in this manual:
# ./configure --with-fio=$HOME/local/fio
make
The SPDK directory should be provided to the scripts as the --spdk_dir option.
SPDK needs the PCIe address to access the storage devices. To get the PCIe address:
ls -l /sys/block/nvme0n1/device/device
# output: lrwxrwxrwx 1 root root 0 Jan 27 02:00 /sys/block/nvme0n1/device/device -> ../../../0000:00:04.0
The PCIe address of this storage device is then 0000:00:04.0. This address is used in the following experiments with the --spdk_dev option; note that the examples below write it with dots instead of colons (for example, traddr=0000.00.09.0).
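The lookup above and the string format used by the --spdk_dev examples later in this document can be combined in a small helper. This is a minimal sketch: the function names and the default namespace of 1 are assumptions for illustration, and the output format simply mirrors the single-device examples in this document.

```python
import os


def pcie_address_from_link(link_target: str) -> str:
    """Return the PCIe address, i.e. the last component of the sysfs link target."""
    return os.path.basename(link_target.rstrip("/"))


def spdk_device_string(pcie_address: str, namespace: int = 1) -> str:
    """Build a single-device --spdk_dev string, writing the address with dots."""
    return f"trtype=PCIe traddr={pcie_address.replace(':', '.')} ns={namespace}"


if __name__ == "__main__":
    # "nvme0n1" is a placeholder; substitute your own device name.
    link = "/sys/block/nvme0n1/device/device"
    if os.path.islink(link):
        address = pcie_address_from_link(os.readlink(link))
        print(spdk_device_string(address))
```

Run it while the device is still bound to the kernel NVMe driver, since the sysfs entry it reads belongs to the kernel block device.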
sudo apt-get install asciidoctor binutils-dev bison build-essential clang cmake flex libbpf-dev libbpfcc-dev libcereal-dev libdw-dev libelf-dev libiberty-dev libpcap-dev llvm-dev libclang-dev systemtap-sdt-dev zlib1g-dev
cd ~/local
git clone https://github.com/iovisor/bpftrace.git
cd bpftrace
git checkout v0.19.0
git submodule init
git submodule update
mkdir build
./build-libs.sh
cmake -B ./build -DBUILD_TESTING=OFF
make -C ./build -j$(nproc)
The bpftrace binary is located at //bpftrace/build/src/bpftrace; pass this path to the scripts with the --bpftrace option.
# Install dependencies for building perf
sudo apt install libdw-dev libunwind-dev libiberty-dev libzstd-dev libperl-dev libbfd-dev libcap-dev libnuma-dev libaudit-dev systemtap-sdt-dev libgtk2.0-dev
cd ~/linux_build/linux-6.3.8/tools/perf
make -j$(getconf _NPROCESSORS_ONLN)
After make finishes, the perf binary is located at //linux_build/linux-6.3.8/tools/perf/perf; pass this path to the scripts with the --perf option.
We rely on Python 3 to run the scripts; there is no specific version requirement. matplotlib is used to plot the figures. To install matplotlib:
sudo apt-get install python3-pip
sudo pip install matplotlib
Since the experiment results vary with different hardware, we also provide all the original data used in this paper, such as fio output and fio traces. If you want to plot with our datasets, please see Artifact Evaluation with Existing Outputs.
All the experiments in our paper were carried out on a single-socket server with a 10-core CPU and 8 Samsung 980 PRO 1 TB SSDs. The results may vary with different hardware (such as the CPU) or different storage devices. If you use a different environment, check the following:
- We disable Hyper-Threading and Turbo in all the experiments.
- Please make sure that all the fio processes/threads run on CPU cores in the same socket. We have noticed that some I/O schedulers (BFQ and MQ-Deadline) incur significant lock contention in cross-NUMA settings.
- We use CPU pinning for figures 1, 2, and 3 and tables 2, 3, and 4. Please make sure CPUs 1 and 2 are available. If CPU 1 or 2 is not available to the experiments, change the pinned CPU cores in the scripts.
- Since different hardware leads to different absolute performance (throughput and latency), you might need to change the x-axis and y-axis limits used in the plots (or delete them).
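The CPU pinning requirement in the checklist above can be verified before starting a run. The sketch below reports which of the required cores the current process is not allowed to run on; the helper name is hypothetical, and the check only covers the affinity mask of the calling process, not whether other workloads are using those cores.

```python
import os

# The cores that this document says are used for pinning.
REQUIRED_CPUS = {1, 2}


def unavailable_cpus(required, available):
    """Return the required CPU ids that are not in the available set, sorted."""
    return sorted(set(required) - set(available))


if __name__ == "__main__":
    # os.sched_getaffinity is Linux-only; it returns the CPUs this process may use.
    if hasattr(os, "sched_getaffinity"):
        missing = unavailable_cpus(REQUIRED_CPUS, os.sched_getaffinity(0))
        if missing:
            print(f"CPUs not available for pinning: {missing}")
        else:
            print("CPUs 1 and 2 are available")
```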
NOTE: Please make sure that the device used (the --dev option) does not contain any useful data. The experiments will destroy the data on the storage devices used.
Each time a script is run, it creates a directory in the current directory to store the results. If the directory already exists, the script exits; old data is never deleted automatically, to prevent data loss. To re-run an experiment, delete the old results first.
Every time the machine is restarted, force all the I/O requests to be dispatched synchronously again before running the experiments.
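One simple precaution before passing a device to the scripts is to check that neither the device nor any of its partitions is currently mounted. The following sketch parses the text of /proc/mounts; the helper name and the example device path are assumptions for illustration. It does not detect other uses of the device, such as swap, LVM, or RAID membership, so treat a clean result as necessary but not sufficient.

```python
from pathlib import Path


def mounted_sources(device: str, mounts_text: str):
    """Return (source, mount point) pairs for `device` and its partitions.

    NVMe partitions are named <device>p<N>, e.g. /dev/nvme1n1p1.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        source, mount_point = fields[0], fields[1]
        if source == device or source.startswith(device + "p"):
            found.append((source, mount_point))
    return found


if __name__ == "__main__":
    # "/dev/nvme1n1" is a placeholder; substitute the device you plan to use.
    mounts = Path("/proc/mounts")
    if mounts.exists():
        in_use = mounted_sources("/dev/nvme1n1", mounts.read_text())
        for source, mount_point in in_use:
            print(f"{source} is mounted at {mount_point}; do not use this device")
```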
echo 2 | sudo tee /sys/module/fops/parameters/force_sync_submission
Before the experiments, we need to precondition the device. WARNING: This will erase all the data on the device.
cd precondition
sudo ./fill_nvme.sh YOUR_DEVICE
sudo ./fill_random.sh YOUR_DEVICE
Note: In figure 1, we measure the random read throughput and latency with request sizes starting from 512 B. If you see the following error message, your device does not support a 512 B block size; please remove it from line 8 of qd_iops_vary_bs.py.
# fio error message
fio: io_u error on file /dev/nvme1n1: Invalid argument: read offset=537125032448, buflen=512
fio: first direct IO errored. File system may not support direct IO, or iomem_align= is bad, or invalid block size. Try setting direct=0.
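You can check in advance whether this error will occur by reading the logical block size of the device from sysfs: direct I/O request sizes must be a multiple of it. The sketch below filters a list of request sizes accordingly. The candidate list and the device name are hypothetical and are not the list used in qd_iops_vary_bs.py.

```python
from pathlib import Path


def usable_request_sizes(candidates, logical_block_size: int):
    """Keep only the request sizes (in bytes) that are a multiple of the block size."""
    return [size for size in candidates if size % logical_block_size == 0]


if __name__ == "__main__":
    # "nvme1n1" is a placeholder; substitute your own device name.
    path = Path("/sys/block/nvme1n1/queue/logical_block_size")
    if path.exists():
        block_size = int(path.read_text().strip())
        # Hypothetical candidate sizes, for illustration only.
        sizes = [512, 1024, 2048, 4096, 8192]
        print(f"logical block size: {block_size} B")
        print("usable request sizes:", usable_request_sizes(sizes, block_size))
```

If 512 is missing from the printed list, the device is formatted with a larger logical block size and the 512 B data point has to be removed as described above.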
Figures 1a and 1b need a single device; figure 1c needs all the devices. When more than one device is used, the device paths are joined with ':'; please see the examples below.
cd fig-1-samsung-baseline
# figure 1a, use a single storage device.
sudo python3 qd_iops_vary_bs.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 qd_iops_vary_bs.py -r --fio /home/user/local/fio/fio --dev /dev/nvme1n1
# figure 1b, use a single storage device.
sudo python3 qd_iops_inc_proc_4kb.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE --spdk_dir YOUR_SPDK_DIR --spdk_dev YOUR_SPDK_DEVICE
# Example: sudo python3 qd_iops_inc_proc_4kb.py -r --fio /home/user/local/fio/fio --dev /dev/nvme1n1 --spdk_dir /home/user/local/spdk --spdk_dev 'trtype=PCIe traddr=0000.00.09.0 ns=1'
# figure 1c, use all 8 storage devices.
sudo python3 qd_iops_inc_proc_4kb_8_dev.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE --spdk_dir YOUR_SPDK_DIR --spdk_dev YOUR_SPDK_DEVICE
# Example: sudo python3 qd_iops_inc_proc_4kb_8_dev.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1 --spdk_dir /home/user/local/spdk --spdk_dev 'trtype=PCIe traddr=0000.00.09.0 ns=1:traddr=0000.00.0a.0:traddr=0000.00.0b.0:traddr=0000.00.08.0:traddr=0000.00.05.0:traddr=0000.00.04.0:traddr=0000.00.06.0:traddr=0000.00.07.0 ns=1'
# figure 2, 4a and 6a, use a single storage device
cd ../fig-2-4a-6a-intra-proc-scal
sudo python3 lapp_cdf_inc_qd_1_dev_1_core.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 lapp_cdf_inc_qd_1_dev_1_core.py -r --fio /home/user/local/fio/fio --dev /dev/nvme1n1
# figure 3, 4b and 6b, use a single storage device
cd ../fig-3-4b-6b-inter-proc-scal
sudo python3 lapp_cdf_inc_proc_1_dev_1_core.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 lapp_cdf_inc_proc_1_dev_1_core.py -r --fio /home/user/local/fio/fio --dev /dev/nvme1n1
# figure 5 and 6c, use a single storage device
cd ../fig-5-6c-inter-proc-scal-all-core
sudo python3 lapp_cdf_inc_proc_1_dev_10_core.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 lapp_cdf_inc_proc_1_dev_10_core.py -r --fio /home/user/local/fio/fio --dev /dev/nvme1n1
# figure 7, use a single storage device
cd ../fig-7-tapp-scal-1-ssd/
sudo python3 tapp_inc_proc_1_dev_10_core.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 tapp_inc_proc_1_dev_10_core.py -r --fio /home/user/local/fio/fio --dev /dev/nvme1n1 --spdk_dir /home/user/local/spdk --spdk_dev 'trtype=PCIe traddr=0000.00.09.0 ns=1'
cd ../fig-8-lock-overhead/
# figure 8-a, use a single device
sudo python3 breakdown_1_dev_global.py -r --fio YOUR_FIO_PATH --perf YOUR_PERF_PATH --vmlinux YOUR_VMLINUX_PATH --dev YOUR_DEVICE
# Example: sudo python3 breakdown_1_dev_global.py -r --fio /home/user/local/fio/fio --perf /home/user/linux_build/linux-6.3.8/tools/perf/perf --vmlinux /home/user/linux_build/linux-6.3.8/vmlinux --dev /dev/nvme1n1
python3 plot_lock_contention.py
# figure 8-b, use all the storage devices
sudo python3 breakdown_8_dev_global.py -r --fio YOUR_FIO_PATH --perf YOUR_PERF_PATH --vmlinux YOUR_VMLINUX_PATH --dev YOUR_DEVICE
# Example: sudo python3 breakdown_8_dev_global.py -r --fio /home/user/local/fio/fio --perf /home/user/linux_build/linux-6.3.8/tools/perf/perf --vmlinux /home/user/linux_build/linux-6.3.8/vmlinux --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1
python3 plot_lock_contention_8_dev.py
The devices are set in the source code, at line 8 of fig-9-ssd-scala/tapp_inc_dev_10_proc.py. Please provide the devices used in the experiments, following the examples given at lines 9 to 12.
cd fig-9-ssd-scala
sudo python3 tapp_inc_dev_10_proc.py -r --fio YOUR_FIO_PATH
# Example: sudo python3 tapp_inc_dev_10_proc.py -r --fio /home/user/local/fio/fio
# Figure 10: use all 8 devices
cd ../fig-10-tapp-scal-8-ssd/
sudo python3 tapp_inc_proc_8_dev_10_core.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE --spdk_dir YOUR_SPDK_DIR --spdk_dev YOUR_SPDK_DEVICE
# Example: sudo python3 tapp_inc_proc_8_dev_10_core.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1 --spdk_dir /home/user/local/spdk --spdk_dev 'trtype=PCIe traddr=0000.00.09.0 ns=1:traddr=0000.00.0a.0:traddr=0000.00.0b.0:traddr=0000.00.08.0:traddr=0000.00.05.0:traddr=0000.00.04.0:traddr=0000.00.06.0:traddr=0000.00.07.0 ns=1'
There are four directories for figure 11; please follow the same steps in each.
cd ../fig-11-a-lapp-t4kb-mix/
sudo python3 1_lapp_inc_rtapp_8_dev_10_cpu.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 1_lapp_inc_rtapp_8_dev_10_cpu.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1
cd ../fig-11-b-lapp-t64kb-mix/
sudo python3 1_lapp_inc_rtapp_64k_8_dev_10_cpu.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 1_lapp_inc_rtapp_64k_8_dev_10_cpu.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1
cd ../fig-11-c-lapp-t4kbw-mix/
sudo python3 1_lapp_inc_wtapp_8_dev_10_cpu.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 1_lapp_inc_wtapp_8_dev_10_cpu.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1
cd ../fig-11-d-lapp-t64kbw-mix/
sudo python3 1_lapp_inc_wtapp_64k_8_dev_10_cpu.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 1_lapp_inc_wtapp_64k_8_dev_10_cpu.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1
There are two directories for figure 12; please follow the same steps in each.
cd ../fig-12-a-t4kb-t64kb-mix/
sudo python3 1_tapp_4k__inc_rtapp_64k_8_dev_10_cpu.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 1_tapp_4k__inc_rtapp_64k_8_dev_10_cpu.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1
cd ../fig-12-b-t4kb-t64kbw-mix/
sudo python3 1_tapp_4k__inc_wtapp_64k_8_dev_10_cpu.py -r --fio YOUR_FIO_PATH --dev YOUR_DEVICE
# Example: sudo python3 1_tapp_4k__inc_wtapp_64k_8_dev_10_cpu.py -r --fio /home/user/local/fio/fio --dev /dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1
cd ../table-2
sudo python3 kyber_vary_target_lat.py -r --fio YOUR_FIO_PATH --bpftrace YOUR_BPFTRACE_PATH --dev YOUR_DEVICE --dev_name YOUR_DEVICE_NAME
# Example: sudo python3 kyber_vary_target_lat.py -r --fio /home/user/local/fio/fio --bpftrace /home/user/local/bpftrace/build/src/bpftrace --dev /dev/nvme1n1 --dev_name nvme1n1
cd ../table-3-4
sudo python3 kyber_l_t_mix_2_core.py -r --fio YOUR_FIO_PATH --bpftrace YOUR_BPFTRACE_PATH --dev YOUR_DEVICE --dev_name YOUR_DEVICE_NAME
# Example: sudo python3 kyber_l_t_mix_2_core.py -r --fio /home/user/local/fio/fio --bpftrace /home/user/local/bpftrace/build/src/bpftrace --dev /dev/nvme1n1 --dev_name nvme1n1
Change to the results directory:
cd results
cd fig-1-samsung-baseline
# figure 1a
python3 qd_iops_vary_bs.py
# figure 1b
python3 qd_iops_inc_proc_4kb.py
# figure 1c
python3 qd_iops_inc_proc_4kb_8_dev.py
# figure 2, 4a and 6a
cd ../fig-2-4a-6a-intra-proc-scal
python3 lapp_cdf_inc_qd_1_dev_1_core.py
# figure 3, 4b and 6b
cd ../fig-3-4b-6b-inter-proc-scal
python3 lapp_cdf_inc_proc_1_dev_1_core.py
# figure 5 and 6c
cd ../fig-5-6c-inter-proc-scal-all-core
python3 lapp_cdf_inc_proc_1_dev_10_core.py
# figure 7
cd ../fig-7-tapp-scal-1-ssd/
python3 tapp_inc_proc_1_dev_10_core.py
cd ../fig-8-lock-overhead/
# figure 8-a
python3 plot_lock_contention.py
# figure 8-b
python3 plot_lock_contention_8_dev.py
cd ../fig-9-ssd-scala
python3 tapp_inc_dev_10_proc.py
cd ../fig-10-tapp-scal-8-ssd/
python3 tapp_inc_proc_8_dev_10_core.py
There are four directories for figure 11; please follow the same steps in each.
cd ../fig-11-a-lapp-t4kb-mix/
python3 1_lapp_inc_rtapp_8_dev_10_cpu.py
cd ../fig-11-b-lapp-t64kb-mix/
python3 1_lapp_inc_rtapp_64k_8_dev_10_cpu.py
cd ../fig-11-c-lapp-t4kbw-mix/
python3 1_lapp_inc_wtapp_8_dev_10_cpu.py
cd ../fig-11-d-lapp-t64kbw-mix/
python3 1_lapp_inc_wtapp_64k_8_dev_10_cpu.py
There are two directories for figure 12; please follow the same steps in each.
cd ../fig-12-a-t4kb-t64kb-mix/
python3 1_tapp_4k__inc_rtapp_64k_8_dev_10_cpu.py
cd ../fig-12-b-t4kb-t64kbw-mix/
python3 1_tapp_4k__inc_wtapp_64k_8_dev_10_cpu.py
cd ../table-2
python3 kyber_vary_target_lat.py
cd ../table-3-4
python3 kyber_l_t_mix_2_core.py
# License
This code and artifact are distributed under the MIT license.
MIT License
Copyright (c) 2023 @Large Research
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.