This project demonstrates arithmetic-coding compression of High-Frequency Trading (HFT) tick data, combined with denoising by Haar Wavelet Transforms. The wavelet transform reduces noise in the tick data, and arithmetic coding compresses it.
- Introduction
- Mathematical Background
- Project Structure
- How to Run
- Requirements
- Step-by-Step Instructions
- Sample Usage
High-Frequency Trading (HFT) generates massive amounts of tick data in real time. Compressing this data efficiently while maintaining its integrity is crucial. Arithmetic Coding is a variable-length entropy encoding method that compresses data by encoding an entire message as a single interval of real numbers within $[0, 1)$.
To reduce noise in the tick data, Haar Wavelet Transforms are applied. The goal is to preserve the relevant data while removing noise.
This project combines arithmetic coding with wavelet-based denoising to compress tick data.
Arithmetic Coding works by encoding a message into a single number between 0 and 1. The core idea is to represent the message as an interval $[low, high) \subseteq [0, 1)$ that narrows with each encoded symbol.
Let:
- $P(x)$ be the probability of symbol $x$,
- $C(x)$ be the cumulative probability of symbols less than $x$.
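As an illustration of these two definitions, the sketch below estimates $P(x)$ and $C(x)$ from symbol frequencies. The names `SymbolModel` and `build_model` are hypothetical, and treating symbols as `char` values is an assumption made for brevity; the model in `ae.cpp` may be built differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper type: P holds the probability of each symbol,
// C holds the cumulative probability of all symbols that sort before it.
struct SymbolModel {
    std::map<char, double> P;
    std::map<char, double> C;
};

// Build the model from the empirical symbol frequencies of `message`.
SymbolModel build_model(const std::string& message) {
    assert(!message.empty());
    std::map<char, std::size_t> counts;
    for (char ch : message) ++counts[ch];

    SymbolModel model;
    double cumulative = 0.0;
    for (const auto& kv : counts) {      // std::map iterates in sorted key order
        double p = static_cast<double>(kv.second) / message.size();
        model.P[kv.first] = p;
        model.C[kv.first] = cumulative;  // sum of P over symbols less than kv.first
        cumulative += p;
    }
    return model;
}
```

For the message `AABC`, this gives $P(A) = 0.5$, $P(B) = P(C) = 0.25$ and $C(A) = 0$, $C(B) = 0.5$, $C(C) = 0.75$.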
For a message $x_1, x_2, \dots, x_n$:
- Initialize the interval as $[0, 1)$.
- For each symbol $x_i$, compute the new interval from the current one, with $range = high - low$ taken before either bound is updated:
  - $low_{new} = low + range \cdot C(x_i)$
  - $high_{new} = low_{new} + range \cdot P(x_i)$
- After encoding all symbols, any number within the final interval $[low, high)$ can be used as the encoded message.
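The interval-narrowing loop can be sketched as follows. This is a minimal illustration, not the code in `ae.cpp`: the function name `encode_interval` is hypothetical, symbols are assumed to be `char` values, and plain `double` arithmetic is used, which loses precision on long messages (practical coders typically use integer arithmetic with renormalization). The interval width is captured before either bound changes, so both bounds are derived from the same starting interval.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Narrow [low, high) once per symbol and return the final interval.
// P maps each symbol to its probability, C to its cumulative probability.
std::pair<double, double> encode_interval(const std::string& message,
                                          const std::map<char, double>& P,
                                          const std::map<char, double>& C) {
    double low = 0.0, high = 1.0;
    for (char ch : message) {
        assert(P.count(ch) && C.count(ch));  // every symbol must be in the model
        double range = high - low;           // width before either bound changes
        low  = low + range * C.at(ch);
        high = low + range * P.at(ch);       // uses the already-updated low
    }
    return std::make_pair(low, high);
}
```

With $P(A) = 0.5$, $P(B) = P(C) = 0.25$, encoding `AB` first narrows the interval to $[0, 0.5)$ and then to $[0.25, 0.375)$.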
Decoding is the reverse process. Given the encoded value, you retrieve each symbol by determining which interval the encoded number falls into.
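A matching decoder can be sketched under the same assumptions (`char` symbols, `double` arithmetic, hypothetical function name `decode_value`). The sketch also assumes the decoder is told the message length, which this README does not specify; an end-of-message symbol is a common alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Recover `length` symbols from `value`, a number inside the final interval.
std::string decode_value(double value, std::size_t length,
                         const std::map<char, double>& P,
                         const std::map<char, double>& C) {
    assert(value >= 0.0 && value < 1.0);
    std::string out;
    double low = 0.0, high = 1.0;
    for (std::size_t i = 0; i < length; ++i) {
        double range = high - low;
        double scaled = (value - low) / range;  // position of value inside [low, high)
        for (const auto& kv : P) {
            double c = C.at(kv.first);
            if (scaled >= c && scaled < c + kv.second) {
                out.push_back(kv.first);
                // Apply the same narrowing step the encoder used for this symbol.
                low  = low + range * c;
                high = low + range * kv.second;
                break;
            }
        }
    }
    return out;
}
```

For example, with $P(A) = 0.5$, $P(B) = P(C) = 0.25$, the value `0.3125` lies in $[0.25, 0.375)$ and decodes to `AB` when the length is 2.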
Haar Wavelet Transform (HWT) is used to decompose a signal into its approximation and detail components, which can then be used for denoising.
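Given a discrete signal $x[n]$, a single level of the transform produces two sets of coefficients:
- Approximation coefficients $a[n]$: represent the average of each pair of adjacent samples.
- Detail coefficients $d[n]$: represent the difference within each pair.

For a signal of even length, each pair of adjacent samples yields one approximation and one detail coefficient, so each output is half the length of the input.
The inverse transform reconstructs the original signal from the approximation and detail coefficients.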
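One common way to write a single level of the transform for a signal $x[n]$ of even length is given below. It assumes the unnormalized averaging convention, which matches the "average" and "difference" descriptions; many implementations use an orthonormal variant with a factor of $1/\sqrt{2}$ instead, and `haar_wavelet.cpp` may follow either.

$$a[n] = \frac{x[2n] + x[2n+1]}{2}, \qquad d[n] = \frac{x[2n] - x[2n+1]}{2}$$

Under this convention the inverse is exact:

$$x[2n] = a[n] + d[n], \qquad x[2n+1] = a[n] - d[n]$$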
The multi-level wavelet transform repeatedly applies this operation to the approximation coefficients to further decompose the signal.
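The multi-level decomposition and a simple denoising pass can be sketched as below. Several things here are assumptions rather than the repository's method: the averaging convention $a = (x_0 + x_1)/2$, $d = (x_0 - x_1)/2$, the hypothetical function names, and hard thresholding (zeroing detail coefficients whose magnitude is below a threshold) as the denoising rule. The signal length must be divisible by $2^{levels}$.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One forward level on the first n entries: averages go to the front half,
// differences to the back half.
void haar_step(std::vector<double>& data, std::size_t n) {
    assert(n % 2 == 0 && n <= data.size());
    std::vector<double> tmp(n);
    for (std::size_t i = 0; i < n / 2; ++i) {
        tmp[i]         = (data[2 * i] + data[2 * i + 1]) / 2.0;
        tmp[n / 2 + i] = (data[2 * i] - data[2 * i + 1]) / 2.0;
    }
    for (std::size_t i = 0; i < n; ++i) data[i] = tmp[i];
}

// Inverse of haar_step on the first n entries.
void inverse_haar_step(std::vector<double>& data, std::size_t n) {
    assert(n % 2 == 0 && n <= data.size());
    std::vector<double> tmp(n);
    for (std::size_t i = 0; i < n / 2; ++i) {
        tmp[2 * i]     = data[i] + data[n / 2 + i];
        tmp[2 * i + 1] = data[i] - data[n / 2 + i];
    }
    for (std::size_t i = 0; i < n; ++i) data[i] = tmp[i];
}

// Transform `levels` times, zero small detail coefficients, then invert.
std::vector<double> haar_denoise(std::vector<double> signal, int levels, double threshold) {
    std::size_t n = signal.size();
    assert(levels >= 0 && n % (std::size_t(1) << levels) == 0);
    std::size_t len = n;
    // Each level re-applies the step to the approximation part only.
    for (int l = 0; l < levels; ++l, len /= 2) haar_step(signal, len);
    // Entries [len, n) now hold the detail coefficients of all levels.
    for (std::size_t i = len; i < n; ++i)
        if (std::fabs(signal[i]) < threshold) signal[i] = 0.0;
    for (int l = 0; l < levels; ++l) { len *= 2; inverse_haar_step(signal, len); }
    return signal;
}
```

For the input `{4, 6, 10, 12}` with two levels, the coefficients are `{8, -3, -1, -1}`. A threshold of `1.5` zeroes the two finest details and reconstructs `{5, 5, 11, 11}`, while a threshold of `0` returns the input unchanged.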
.
├── README.md # Project README
├── tick_data.hpp # Tick data Header file
├── tick_data.cpp # Tick data implementation
├── ae.hpp # Arithmetic Encoding & Decoding Header
├── ae.cpp # Arithmetic Encoding & Decoding Implementation
├── haar_wavelet.hpp # Haar Wavelet Transform Header
├── haar_wavelet.cpp # Haar Wavelet Transform implementation
├── benchmark.cpp # Main execution file
├── generate.cpp # Generate random tick data
├── tick_data.csv # CSV file for tick data (generated)
├── important_tick_data.csv # CSV file filtered/denoised tick data
└── Makefile # Make file
- C++ compiler supporting C++11 or higher
- Python 3.x (for visualization and plotting)
- `g++` for compiling C++ code
- Python packages: `pandas`, `matplotlib` (specified in `requirements.txt`)
- Clone the Repository:

      git clone https://github.com/1618lip/hft_compression.git
      cd hft_compression

- Compile C++ Code: Compile the project files (`tick_data.cpp`, `benchmark.cpp`, `ae.cpp`, `haar_wavelet.cpp`) using the `Makefile`:

      make

  Any compiler warnings can be ignored.

- Generate Tick Data: To generate random tick data, build and run the `generate` program:

      g++ -o generate generate.cpp
      ./generate
      Enter the number of tick data points to generate: <enter an integer>
      Random tick data generated in tick_data.csv

  This creates a file called `tick_data.csv` with randomly generated tick data.

- Generate Denoised & Filtered Tick Data: Once you have `tick_data.csv`, generate the denoised and filtered tick data:

      g++ -o denoise generate_denoised_tick.cpp haar_wavelet.cpp
      ./denoise

- Run Compression and Denoising: After generating the data, use the `benchmark` program to compress and denoise the tick data:

      ./benchmark

  This reads the generated tick data, applies the Haar Wavelet Transform for denoising, compresses the data using arithmetic coding, and prints the compression ratios.

- See Filtering Effect (Optional): To visualize the original vs. filtered tick data using Python:

      python plot.py
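This README does not define how `benchmark` computes its compression ratios. As a rough aid for interpreting them, the sketch below assumes the common definition (original size divided by compressed size) and also computes the empirical Shannon entropy, which bounds from below the average bits per symbol that an arithmetic coder using the same order-0 symbol frequencies can achieve. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>

// Empirical Shannon entropy of `data`, in bits per symbol.
double entropy_bits_per_symbol(const std::string& data) {
    assert(!data.empty());
    std::map<char, std::size_t> counts;
    for (char ch : data) ++counts[ch];
    double h = 0.0;
    for (const auto& kv : counts) {
        double p = static_cast<double>(kv.second) / data.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Assumed definition of compression ratio: original bytes / compressed bytes.
double compression_ratio(std::size_t original_bytes, std::size_t compressed_bytes) {
    assert(compressed_bytes > 0);
    return static_cast<double>(original_bytes) / compressed_bytes;
}
```

For the string `AABC` the entropy is 1.5 bits per symbol, so an ideal coder with that model would need about 6 bits for the 4 symbols, compared with 32 bits at one byte per symbol.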
# Step 0: Compile all Files
make
# Step 1: Generate 100 random tick data points
./generate 100
# Step 2: Denoise and filter the generated tick data
./denoise
# Step 3: Compress the tick data with and without denoising
./benchmark
# Step 4: Visualize the tick data (optional, using Python)
python plot.py
This project demonstrates efficient compression for HFT tick data and integrates denoising using Haar Wavelet Transforms to preserve important data while removing noise.
This project is licensed under the MIT License.