This project aims to forecast sales for thousands of product families sold at Favorita stores using historical data. The analysis is performed using XGBoost, a powerful machine learning algorithm. Through feature engineering and model tuning, the project provides both short-term and long-term sales predictions, with a focus on generating a 30-day forecast.
-
Data Collection and Preprocessing:
- Used historical sales data provided by Favorita stores.
- Processed the data to handle missing values and outliers.
- Performed feature engineering to create lag features and rolling statistics.
- Applied one-hot encoding for categorical variables and scaling for numerical variables.
-
Exploratory Data Analysis (EDA):
- Visualized sales trends over time.
- Analyzed seasonal effects, promotions, and other external factors influencing sales.
-
Feature Engineering:
- Created lag features and rolling windows to capture historical patterns.
- Engineered calendar-related features such as day of the week, month, and promotions.
-
Model Building:
- Implemented XGBoost for time series forecasting.
- Fine-tuned the model using RMSE and RMSLE as evaluation metrics.
- Generated a 30-day sales forecast.
-
Model Evaluation:
- Used RMSE and RMSLE to assess the model’s accuracy.
- Compared the model’s predictions against actual sales data for validation.
- XGBoost Model: Applied an advanced machine learning approach for time series forecasting.
- Feature Engineering: Integrated lag and rolling statistics to enhance forecast accuracy.
- Data Visualization: Plots and charts to illustrate key trends and model performance.
- 30-Day Sales Forecast: Focused on predicting future sales, with results analyzed for potential improvements.
- Current Challenge: The 30-day forecast results show a linear pattern, lacking the nuanced shape of actual sales data.
- Future Work: Exploring advanced techniques such as model stacking, ensemble methods, and deeper time-series models (e.g., ARIMA, LSTM) to improve forecast accuracy.
- Achieved a RMSLE of 0.409.
- Significant improvement in prediction accuracy through feature engineering.
You can also explore the project on Kaggle: https://www.kaggle.com/code/walidkw/store-sales-ts-forecasting-xgboost