Research Artifact of USENIX Security 2023 Paper: CacheQL: Quantifying and Localizing Cache Side-Channel Vulnerabilities in Production Software
Preprint: https://arxiv.org/pdf/2209.14952.pdf
Warning: This repo is provided as-is and is only for research purposes. Please use it only on test systems with no sensitive data. You are responsible for protecting yourself, your data, and others from potential risks caused by this repo.
-
Build from source code
git clone https://github.com/Yuanyuan-Yuan/CacheQL cd CacheQL pip install -r requirements.txt
This repo is organized as follows:
-
data
- This folder provides scripts for processing data. -
pin
- This folder provides our pintools for logging secret-dependent data access (SDA) and control branch (SCB). -
pp
- This folder provides scripts for logging cache set accesses with Prime+Probe. -
software
- Our evaluated software. -
config.py
- This script sets all global paths for different experiments. -
params.py
- This script includes all parameters in our experiments. -
model.py
- This script includes implementations of our encoder, compressor, and classifier. -
dataset.py
- This script implements Pytorch dataset classes for different side channel traces and secrets. -
utils.py
- Some utility tools. -
MI*.py
- The script for quantifying side channel leaks under different scenarios. -
shapley.py
- The script for localizing leakage sites in software.
Our evaluated software can be downloaded here. Once downloaded, put everything into the software
folder.
See SOFTWARE for detailed documents.
You need to first set up all related paths in config.py
. We have set some default paths w.r.t. this project; you may need to update them if you want to analyze your own data and software.
See DATA for how the data
folder is organized according to the current config.py
.
Each dataset is implemented as one Pytorch Dataset class in dataset.py
.
cd data
python generate_key.py
Run the above commands to generate private keys for OpenSSL, MbedTLS, and Libgcrypt.
generate_key.py
also converts the generated keys into .npz
format for support of being loaded by Pytorch.
dataset.py
implements the following four datasets for crypto software. You can use them as Pytorch Datasets.
-
RSADatasetMulti
- For analyzing the whole trace of crypto software (logged by Pin). -
RSADatasetMultiDec
- For the decryption phase (logged by Pin). -
RSADatasetMultiPre
-For the pre-processing phase (logged by Pin). -
RSADatasetMultiPP
- For traces collected using Prime+Probe.
We use CelebA as one representative media dataset, which contains human face photos. The official dataset can be downloaded here.
After downloading the dataset, run the following commonds:
cd data
python celeba_process.py
The dataset will be splitted into different subfolders and be cropped and resized into data/celeba_crop128/fit
folder.
The dataset class for CelebA is implemented using the ImageDatasetMulti
class in dataset.py
. You can use it as one Pytorch Dataset.
We provide our pintools in the pin
folder.
-
SCB.cpp
- Logging records of secret-dependent control branch (SCB) for the whole execution. -
SDA.cpp
- Logging records of secret-dependent data access (SDA) for the whole execution. -
SCB_dec.cpp
- Logging records of SCB for the decryption phase. -
SDA_dec.cpp
- Logging records of SDA for the decryption phase.
Download Pin from here and unzip it to PIN_ROOT
(specify this path by yourself). Then, run the following commands:
cp -r pin/*.cpp /PIN_ROOT/source/tools/ManualExamples/
cd /PIN_ROOT/source/tools/ManualExamples/
make obj-intel64/SCB.so TARGET=intel64
make obj-intel64/SDA.so TARGET=intel64
make obj-intel64/SCB_dec.so TARGET=intel64
make obj-intel64/SDA_dec.so TARGET=intel64
Before collecting side channel traces, run the following command to disable ASLR.
setarch $(uname -m) -R /bin/bash
When collecting traces for the decryption phase, you need to specify any one of the following arguments:
-
sr
- The name of the function starting decryption. -
sa
- The address of the function starting decryption.
You can use the scripts provided here (following the instructions here) to automate the collection of side channel traces.
We use Mastik (Ver. 0.02) to launch Prime+Probe on L1 cache to collect the cache set accesses. We provide our scripts in pp/Mastik
.
We recommend you setting the cache miss threshold in these scripts according to your machines.
Suppose Mastik is downloaded into the path /MASTIK
. Run the following commands:
cp -r pp/Mastik/*.c /MASTIK/demo
cd /MASTIK
make
We assume victim and spy are on the same CPU core and no other process is runing on this CPU core.
First, run the following command to isolate the cpu_id
-th CPU core .
sudo cset shield --cpu {cpu_id}
Then collect side channel records as the follow.
cd Mastik
sudo cset shield --exec python run_pp.py -- {pp_crypto OR pp_media} {cpu_id} {segment_id}
Mastik/pp_crypto.py
/Mastik/pp_media.py
is the coordinator which runs spy and victim on the same CPU core simultaneously and saves the collected cache set access.
Run python MI.py --software XXX --side YYY --setting ZZZ
to train a model and quantify leaks for the whole trace.
--software
- The analyzed software.
choices = [
'rsa_openssl_0.9.7c', 'rsa_openssl_3.0.0',
'rsa_mbedtls_2.15.0', 'rsa_mbedtls_3.0.0',
'rsa_sign_libgcrypt_1.6.1', 'rsa_sign_libgcrypt_1.9.4',
'aes_openssl_0.9.7c', 'aes_openssl_3.0.0',
'aes_mbedtls_2.15.0', 'aes_mbedtls_3.0.0',
]
-
--side
- The type of side channels logged via Pin.
choices = [cacheline
,cachebank
] -
--setting
- The setting of the leakage mode.
choices = [SDA
,SCB
]
Run python MI_det.py --software XXX --side YYY --setting ZZZ
to quantify leaks for crypto software whose blinding is disabled. The choices of parameters are same as the above.
Since side channel traces are deterministic, validation is not needed.
Run python MI_dec.py --software XXX --side YYY --setting ZZZ
to quantify leaks for the decryption phase.
Run python MI_dec_det.py --software XXX --side YYY --setting ZZZ
to quantify leaks for the decryption phase with blinding disabled.
The choices of parameters are same as the above.
Run python MI_image.py --software libjpeg-turbo-2.1.2 --side YYY --setting ZZZ
to train a model and quantify leaks for the whole trace of libjpeg
.
-
--side
- The type of side channels logged via Pin.
choices = [cacheline
,cachebank
] -
--setting
- The setting of leakage mode.
choices = [SDA
,SCB
]
Run python MI_image.py --software XXX --setting ZZZ --repeat_num N
to train a model and quantify leaks in side channel records logged via Prime+Probe.
--software
- The analyzed software.
choices = [
'rsa_openssl_0.9.7c', 'rsa_openssl_3.0.0',
'rsa_mbedtls_2.15.0', 'rsa_mbedtls_3.0.0',
'rsa_sign_libgcrypt_1.6.1', 'rsa_sign_libgcrypt_1.9.4',
'aes_openssl_0.9.7c', 'aes_openssl_3.0.0',
'aes_mbedtls_2.15.0', 'aes_mbedtls_3.0.0',
]
-
--setting
- The setting of leakage.
choices = [pp_dcache
,pp_icache
] -
--repeat_num
- The number of repeats when performing Prime+Probe.
choices = [1
,2
,4
,8
,16
]
Run python shapley.py --software XXX --side YYY --setting ZZZ --use_IG 0
to localize the leakage sites.
-
--use_IG
- Whether using Integrated Gradient to compute the gradients.
choices = [0
,1
]. Default is0
, which uses the conventional method to compute gradients. -
--software
- The analyzed software.
choices = [
'rsa_openssl_0.9.7c', 'rsa_openssl_3.0.0',
'rsa_mbedtls_2.15.0', 'rsa_mbedtls_3.0.0',
'rsa_sign_libgcrypt_1.6.1', 'rsa_sign_libgcrypt_1.9.4',
'aes_openssl_0.9.7c', 'aes_openssl_3.0.0',
'aes_mbedtls_2.15.0', 'aes_mbedtls_3.0.0',
]
-
--side
- The type of side channels logged via Pin.
choices = [cacheline
,cachebank
] -
--setting
- The setting of the leakage mode.
choices = [SDA
,SCB
]
The full report of our localized side channel vulnerabilities is provided at the project page.
We sincerely thank Janos Follath (developer of MbedTLS), Matt Caswell (developer of OpenSSL), and developers of Libjpeg-turbo for their prompt responses and comments on our reported vulnerabilities.
If CacheQL is helpful for your research, please consider cite our work as follows:
@inproceedings{yuan2023cacheql,
title={CacheQL: Quantifying and Localizing Cache Side-Channel Vulnerabilities in Production Software},
author={Yuan, Yuanyuan and Liu, Zhibo and Wang, Shuai},
booktitle={32nd USENIX Security Symposium (USENIX Security 23)},
year={2023}
}
If you have any questions, feel free to contact Yuanyuan ([email protected]).