Food-amenities-ordered-quantity-predictions

A repository documenting the step-by-step approach to improving the prediction accuracy of ordered quantities.

Business Problem

We need to accurately forecast the quantity of each SKU (food amenity) that customers will order in the future, based on past data.

Approach 1

Data Definition

Data variables and definitions

$ AvgSP - Average selling price time series

$ Wholesale - Wholesale price time series

$ RetailPrice - Retail price time series

$ FinalGRN - Aggregated cost price time series

$ TotalGTOrders - Time series of the total customers across all SKUs

Time period considered

$ Train data - Mar 9th, 2017 - May 7th, 2017

$ Test data - May 8th, 2017 - May 19th, 2017
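The split above can be sketched in a few lines. This is a minimal illustration, assuming one observation per calendar day (the source does not state the sampling frequency); the helper name `daily_range` is hypothetical.

```python
from datetime import date, timedelta

# Split boundaries taken from the list above.
TRAIN_START, TRAIN_END = date(2017, 3, 9), date(2017, 5, 7)
TEST_START, TEST_END = date(2017, 5, 8), date(2017, 5, 19)


def daily_range(start, end):
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Assumption: daily data, so each date corresponds to one row.
train_days = daily_range(TRAIN_START, TRAIN_END)
test_days = daily_range(TEST_START, TEST_END)

print(len(train_days), len(test_days))
```

Under the daily-data assumption, the test window starts on the day right after the training window ends, so no observation is shared between the two sets.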

Derived variables considered - Ordered Quantity for the SKU Carrot (local)

Data Understanding and Processing

Dealing with outliers

$ Heavy outliers were spotted in the Ordered Quantity of Carrot (local).

$ Values below 250 were raised to 250 and values above 900 were capped at 900 to simplify model building and testing.
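The capping step can be sketched as follows. The bounds 250 and 900 come from the text; the function name `clip_quantity` and the sample values are made up for illustration and are not taken from the repository.

```python
def clip_quantity(values, lower=250, upper=900):
    """Clamp every value into the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


# Made-up ordered quantities, for illustration only.
sample = [120, 250, 480, 905, 1500]
clipped = clip_quantity(sample)
print(clipped)  # [250, 250, 480, 900, 900]
```

Clamping keeps every observation in the series, which preserves the time index, but it also hides genuine demand spikes, so the bounds deserve a second look whenever the data changes.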

Summary statistics

Training and Test Dataset

$ Train - Mar 9th, 2017 - May 7th, 2017

$ Test - May 8th, 2017 - May 19th, 2017

Seasonal Effect

$ Seasonal effect is very clear from the visualizations

$ It is reduced by applying first-order differencing to the data.
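First-order differencing replaces each value with its change from the previous value. The sketch below shows the transform and its inverse; the function names and the sample series are hypothetical.

```python
def difference(series):
    """First-order differencing: d[t] = y[t+1] - y[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


# Made-up series, for illustration only.
series = [10, 12, 15, 11]
diffs = difference(series)
print(diffs)  # [2, 3, -4]
print(undifference(series[0], diffs))  # [10, 12, 15, 11]
```

The differenced series is one element shorter than the input, and forecasts made on it have to be cumulated back (as in `undifference`) before they are compared with actual values.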

Functions to create the model's input data

$ Input required: AvgSP, Wholesale, RetailPrice, FinalGRN, TotalGTOrders

$ The training data is split into 5 time series, one for each input variable.

$ ARIMA is used to forecast the test-period values of every input from its training time series.

$ Note: The forecasting is done on a logarithmic scale.

$ Accuracy metric used - RMSE; obtained RMSE values (original scale) - AvgSP: 2.3, Wholesale: 4.3, RetailPrice: 4.2, FinalGRN: 3.86, TotalGTOrders: 15.4
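Because the forecasts are produced on a logarithmic scale but the RMSE is reported on the original scale, the forecasts have to be transformed back before scoring. The sketch below assumes the natural logarithm (the source does not name the base) and uses made-up numbers; it does not reproduce the ARIMA step itself.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def to_original_scale(log_values):
    """Invert a natural-log transform (assumption: base e)."""
    return [math.exp(v) for v in log_values]


# Made-up values, for illustration only.
actual = [40.0, 42.0, 41.0]
log_forecast = [math.log(39.0), math.log(44.0), math.log(41.0)]

score = rmse(actual, to_original_scale(log_forecast))
print(round(score, 3))
```

Scoring on the original scale keeps the error in the same units as the variable, which is why the values listed above differ so much between the price series and TotalGTOrders.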

Data Modelling

Model name

$ Gradient Boosting Regressor

$ It is an ensemble model that starts from a simple initial prediction and then adds regression trees one at a time, up to 'n_estimators' trees.

$ Each new tree is fitted to the errors (residuals) of the current ensemble, and its scaled output is added to the prediction, so every step corrects part of the remaining error.
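The residual-fitting idea can be illustrated with a toy implementation. This is a minimal sketch for squared-error loss with one feature and one-split trees (stumps); it is not the repository's code or any library's implementation, and all names, defaults and data below are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals of a 1-D feature."""
    best = None
    # Try every split point except the largest value, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def fit_boosting(xs, ys, n_estimators=100, learning_rate=0.1):
    """Return (base, learning_rate, stumps) fitted by boosting on squared error."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # Each round fits a new stump to what the ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the initial prediction and the scaled output of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps)


def training_rmse(model, xs, ys):
    errors = [(y - predict(model, x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(errors) / len(errors)) ** 0.5


# Made-up data, for illustration only: one feature, one target.
xs = [1, 2, 3, 4, 5, 6]
ys = [300, 320, 310, 600, 620, 610]

weak = fit_boosting(xs, ys, n_estimators=1)
strong = fit_boosting(xs, ys, n_estimators=100)
print(training_rmse(weak, xs, ys), training_rmse(strong, xs, ys))
```

Adding more trees keeps lowering the training error, which is exactly why a very small training RMSE on its own says little about how the model behaves on unseen data.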

Model Accuracy on training and test dataset

$ Accuracy metric - RMSE

$ Training data - 0.009

$ Test data - 235.6

$ The model clearly overfits the training data: the training RMSE is far below the test RMSE. The overfitting is attributed to heavy multicollinearity among the inputs. Principal Component Analysis or another feature decomposition technique is needed.
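One simple way to check for multicollinearity before applying a decomposition technique is to look at the pairwise correlation between inputs. The sketch below computes the Pearson correlation by hand; the series are made up for illustration and are not the repository's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (a constant series has zero variance).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up series, for illustration only.
wholesale = [20.0, 22.0, 21.0, 25.0, 27.0]
retail = [30.0, 33.0, 31.5, 37.5, 40.5]  # exactly 1.5 x wholesale
orders = [500.0, 480.0, 530.0, 470.0, 510.0]

print(pearson(wholesale, retail))
print(pearson(wholesale, orders))
```

A correlation close to 1 or -1 between two inputs, as in the first pair here, means they carry almost the same information, and such pairs are natural candidates for merging or for a decomposition step such as PCA.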

Comparison study of model

$ At this stage, the model performs poorly compared to the existing technique based on the Seasonal Naive Bayes method. The hidden patterns haven't been completely detected and processed.

How will the model take care of customer addition input?

$ At this point, the model hasn't been calibrated to include customer addition input.
