Optiver - Trading at the Close | Kaggle

Links:

Repository Structure:

/images - contains images used in README.md
/reference_notebooks - contains public notebooks from the competition

Ignored data files:

The folder below should contain the unzipped data files from the competition. Git ignores this folder so that large files are not uploaded to the repository. The expected folder structure is:

/kaggle/input/optiver-trading-at-the-close/
- /example_test_files
- /optiver2023
- train.csv
- public_timeseries_testing_util.py

Missing Value Handling:

Interpolation: Fill missing values with interpolated data based on nearby points.
Forward Fill or Backward Fill: Fill gaps with the previous (or next) non-null value.
Drop: Simply remove missing values if they are few.
Impute: Use statistical methods (like mean, median) or models to estimate missing values.
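
The four options above can be compared side by side on a small series. This is a minimal sketch using pandas; the values are toy data, not competition data.

```python
import numpy as np
import pandas as pd

# Toy series with gaps (illustrative values only).
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

interpolated = s.interpolate(method="linear")  # fills gaps along a straight line
forward_filled = s.ffill()                     # carries the previous value forward
backward_filled = s.bfill()                    # pulls the next value backward
dropped = s.dropna()                           # keeps only the observed values
median_imputed = s.fillna(s.median())          # replaces gaps with the median
```

Backward fill and interpolation both look at later observations, so for forecasting features forward fill is usually the safer choice. With panel data, the fill would typically be applied per group (for example per stock_id and date_id) so values do not carry over between groups.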

Outlier Detection and Treatment:

Visual Inspection: Using plots to detect anomalies.
Statistical Tests: e.g., Z-scores, IQR.
Treatment: Cap, replace, or remove outliers, depending on the context.
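
As a rough illustration of the statistical tests and the capping treatment, the sketch below flags one planted outlier with both rules. The data and the thresholds (2.0 for the z-score, 1.5 for the IQR multiplier) are assumptions chosen for the example.

```python
import numpy as np

# Toy data; 50.0 is the planted outlier.
x = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 50.0])

# Z-score rule: distance from the mean in standard-deviation units.
z = (x - x.mean()) / x.std()
z_outliers = np.abs(z) > 2.0

# IQR rule: points beyond 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = (x < lower) | (x > upper)

# Treatment by capping values to the IQR fences.
capped = np.clip(x, lower, upper)
```

The z-score rule is itself sensitive to outliers because they inflate the mean and standard deviation; the IQR rule is more robust in that respect.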

Decomposition:

Trend: Removing or accounting for the underlying trend in the data.
Seasonality: Adjusting for recurring patterns or cycles.
Residual: The remainder of the time series after removing trend and seasonality.
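
The three components can be separated by hand for an additive model. The sketch below assumes a known period of 5 and a synthetic series; libraries such as statsmodels (`statsmodels.tsa.seasonal.seasonal_decompose`) offer a packaged version of the same idea.

```python
import numpy as np
import pandas as pd

period = 5  # assumed season length for this toy series
pattern = np.array([1.0, -1.0, 2.0, -2.0, 0.0])  # sums to zero over one period
t = np.arange(40)
phase = t % period
y = pd.Series(0.5 * t + pattern[phase])  # linear trend + repeating pattern

# Trend: centered moving average over one full period.
trend = y.rolling(period, center=True).mean()

# Seasonality: average detrended value at each position in the cycle.
detrended = y - trend
seasonal_means = detrended.groupby(phase).mean()
seasonal = pd.Series(seasonal_means.to_numpy()[phase], index=y.index)

# Residual: what is left after removing trend and seasonality.
residual = y - trend - seasonal
```

The centered window leaves the first and last two points without a trend estimate, so the residual is undefined there.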

Stationarity:

Differencing: Taking the difference with a previous observation.
Transformation: e.g., logarithm, square root.
Augmented Dickey-Fuller (ADF) Test: A unit-root test used to check for stationarity.
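
A short sketch of differencing and a log transform on toy prices follows. The unit-root test is not run here; statsmodels provides it as `statsmodels.tsa.stattools.adfuller`, and it would typically be applied before and after the transformation.

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 110.0])  # toy prices

# First difference: change from the previous observation.
first_diff = np.diff(prices)

# Log transform followed by differencing gives log returns,
# which stabilizes the variance for multiplicative price moves.
log_returns = np.diff(np.log(prices))
```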

Detrending:

Remove trends from data to make it more stationary. Common methods include differencing and regression.
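
Detrending by regression can be sketched as fitting a straight line and subtracting it. The series below is synthetic, and a linear trend is an assumption; a real series may need a higher-degree fit or differencing instead.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(50, dtype=float)
y = 3.0 + 0.2 * t + rng.normal(scale=0.5, size=t.size)  # synthetic trend + noise

# Fit a degree-1 polynomial (ordinary least squares line) and subtract it.
slope, intercept = np.polyfit(t, y, deg=1)
detrended = y - (slope * t + intercept)
```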

Normalization/Standardization:

Min-Max Scaling: Transforms data to range [0, 1].
Z-score Normalization: Mean of 0 and standard deviation of 1.
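
Both scalings are one-liners in NumPy, shown here on toy values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # toy feature values

# Min-max scaling: maps the minimum to 0 and the maximum to 1.
min_max = (x - x.min()) / (x.max() - x.min())

# Z-score normalization: subtract the mean, divide by the standard deviation.
z_score = (x - x.mean()) / x.std()
```

For time series, the minimum, maximum, mean, and standard deviation would normally be computed on the training period only and then reused on later data, so that no information from the future leaks into the features.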

Feature Engineering:

Lagged Features: Use previous time steps as features.
Rolling Window Statistics: E.g., rolling mean, rolling standard deviation.
Domain-specific Features: Extracted based on domain knowledge.
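
Lagged and rolling features need to be computed within each series so that one stock's history does not leak into another's. The sketch below uses the column names stock_id, seconds_in_bucket, and reference_price with made-up values; the new column names are hypothetical.

```python
import pandas as pd

# Toy frame with two stocks; the values are illustrative only.
df = pd.DataFrame({
    "stock_id": [0, 0, 0, 0, 1, 1, 1, 1],
    "seconds_in_bucket": [0, 10, 20, 30] * 2,
    "reference_price": [1.00, 1.01, 1.02, 1.03, 2.00, 1.99, 1.98, 1.97],
})

# Sort so that "previous row" means "previous time step" within each stock.
df = df.sort_values(["stock_id", "seconds_in_bucket"])
grouped = df.groupby("stock_id")["reference_price"]

# Lagged feature: value at the previous time step of the same stock.
df["ref_price_lag1"] = grouped.shift(1)

# Rolling statistic: trailing mean over the last two steps of the same stock.
df["ref_price_roll_mean2"] = grouped.transform(lambda s: s.rolling(2).mean())
```

On multi-day data the grouping would likely also include date_id, so that lags do not cross from one day into the next.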

Handling Unevenly Spaced Observations:

Resampling: Change the frequency of the data (e.g., from daily to monthly).
Aggregating: Summarizing data over a specific interval.
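
As a sketch of both ideas, irregular observations can be bucketed into fixed 10-second bins. The timestamps and values are invented for the example.

```python
import pandas as pd

# Irregularly spaced toy observations.
idx = pd.to_datetime([
    "2023-01-01 09:00:03",
    "2023-01-01 09:00:07",
    "2023-01-01 09:00:18",
    "2023-01-01 09:00:31",
])
ticks = pd.Series([10.0, 12.0, 11.0, 13.0], index=idx)

bin_size = pd.Timedelta(seconds=10)

# Aggregating: summarize each 10-second interval.
bars = ticks.resample(bin_size).agg(["mean", "count"])

# Resampling to a regular grid: last value per bin, empty bins forward-filled.
regular = ticks.resample(bin_size).last().ffill()
```

The bin starting at 09:00:20 contains no observations, so its count is 0 and its value on the regular grid is carried forward from the previous bin.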

Encoding Cyclical Features:

For time-based features like hour of the day, day of the week, or month of the year, use sin/cos transformations to encode them so the model captures the cyclicity.
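
A minimal sketch for hour of the day: each hour is mapped to a point on the unit circle, so hour 23 ends up as close to hour 0 as hour 1 is.

```python
import numpy as np

hour = np.arange(24)
angle = 2 * np.pi * hour / 24

hour_sin = np.sin(angle)
hour_cos = np.cos(angle)

def circle_distance(a, b):
    """Euclidean distance between two hours in (sin, cos) space."""
    return np.hypot(hour_sin[a] - hour_sin[b], hour_cos[a] - hour_cos[b])

# With raw integers, 23 and 0 are 23 apart; on the circle they are neighbors.
wraparound = circle_distance(23, 0)
adjacent = circle_distance(0, 1)
```

The same transformation applies to day of the week (period 7) or month of the year (period 12) by changing the divisor.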

Temporal Split:

When splitting data into training and test sets, always ensure it's done temporally. This means that the future is never used to predict the past.
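
One simple way to do this is to split on a time index rather than at random. The sketch assumes a date_id column and a hypothetical cutoff value; scikit-learn's `TimeSeriesSplit` offers a cross-validation variant of the same principle.

```python
import pandas as pd

# Toy frame: five days, two rows per day.
df = pd.DataFrame({
    "date_id": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "target": range(10),
})

cutoff = 3  # hypothetical: days before this go to training, the rest to testing

train = df[df["date_id"] < cutoff]
test = df[df["date_id"] >= cutoff]
```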

Removing Noise:

Smoothing: Techniques like moving average can help reduce noise.
Wavelet Denoising: Useful for certain types of data.
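
Moving-average smoothing can be sketched with a plain convolution. The signal, noise level, and window length below are assumptions for the example; wavelet denoising is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 2 * np.pi, 200))          # clean toy signal
noisy = signal + rng.normal(scale=0.3, size=signal.size)  # add Gaussian noise

window = 9
kernel = np.ones(window) / window

# "valid" keeps only positions where the full window fits,
# so the output is shorter by window - 1 points.
smoothed = np.convolve(noisy, kernel, mode="valid")

# Compare against the clean signal at the matching (centered) positions.
half = window // 2
mse_noisy = np.mean((noisy[half:-half] - signal[half:-half]) ** 2)
mse_smoothed = np.mean((smoothed - signal[half:-half]) ** 2)
```

A centered window like this one uses observations from both sides of each point. For features fed to a forecasting model, a trailing window avoids using future values.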

Characteristics

Trend: Over longer periods, stocks and indices may exhibit upward (bull market) or downward (bear market) trends.
Seasonality: Certain stocks or sectors might exhibit recurring patterns or cycles.
Volatility: Stock prices can be volatile.
Correlation: The movement of individual stocks in relation to the synthetic index.
Noise: Random fluctuations in stock prices.
Unexpected Events: Sudden events affecting stock prices.
Feedback Loops: Reinforcing price movements.
Liquidity: Variation in stock liquidity.
Earnings Reports & Dividends: Influence on stock prices.
Economic Indicators: Broader economic factors affecting stock prices.
Sector-Specific Movements: Unified sectoral price movements.
Lagged Reactions: Delayed stock or market reactions.

Correlations:

Strong Positive Correlations (close to 1):

reference_price has a strong positive correlation with matched_size, bid_price, ask_price, and vwap.
bid_price and bid_size show a strong positive correlation.
ask_price and ask_size are also positively correlated.
near_price and far_price have a strong positive correlation.

Strong Negative Correlations (close to -1):

reference_price and ask_size exhibit a strong negative correlation.

Moderate Correlations:

imbalance_size has a moderate positive correlation with variables like imbalance_buy_sell_flag and near_price.

Weak Correlations:

Variables like stock_id, seconds_in_bucket, date_id, and time_id show very weak correlations with most other variables.
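
Correlation groupings like those above are typically read off a pairwise correlation matrix. The sketch below shows the mechanics on synthetic data only; the column values are generated, `noise` is a hypothetical column, and the numbers say nothing about the actual competition data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic columns: ask_price is bid_price plus a small positive spread,
# noise is unrelated to both.
bid_price = rng.normal(loc=1.0, scale=0.01, size=n)
ask_price = bid_price + rng.uniform(0.0001, 0.0005, size=n)
noise = rng.normal(size=n)

df = pd.DataFrame({"bid_price": bid_price, "ask_price": ask_price, "noise": noise})

# Pearson correlation between every pair of columns.
corr = df.corr()

strong = corr.loc["bid_price", "ask_price"]
weak = corr.loc["bid_price", "noise"]
```

On the real data, the same call would be applied to the columns loaded from train.csv.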

