Machine Learning and Statistical Linear Regression Analysis

Description

This project is a study of the relationship between simple statistics and machine learning. The main goal is to fit a simple linear regression model with the least squares method, fit the same model with PyTorch using its MSELoss (mean squared error) loss function, and compare the two resulting lines of best fit.

Project/Study Details

Data

The project has a data.py file that generates synthetic data based on a given slope and intercept. It creates a dataset composed of explanatory and response variables.

You can run data.py to see the generated data, which is plotted using matplotlib.
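The data generation described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of data.py: the function name make_linear_data, its parameters, the value range for x, and the Gaussian noise model are all assumptions.

```python
import numpy as np


def make_linear_data(slope, intercept, n_samples=100, noise_std=1.0, seed=42):
    """Generate (x, y) pairs scattered around the line y = slope * x + intercept.

    Hypothetical helper: the name, defaults, and noise model are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Explanatory variable: uniform samples on an arbitrary range.
    x = rng.uniform(0.0, 10.0, size=n_samples)
    # Response variable: the true line plus Gaussian noise.
    noise = rng.normal(0.0, noise_std, size=n_samples)
    y = slope * x + intercept + noise
    return x, y


x, y = make_linear_data(slope=2.0, intercept=1.0)
print(x.shape, y.shape)
```

Fixing the seed keeps the dataset reproducible, so the statistical and machine learning fits can be compared on exactly the same points.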

train and inference scripts

train.py trains a simple linear regression model on the synthetic data, and inference.py uses the trained model to predict the response variable from the explanatory variable.

As an example, sugar content is the explanatory variable and calories are the response variable, so the model predicts calories from sugar content. The data is synthetic, though, so these labels are purely illustrative.

Training loss: MSELoss (mean squared error)
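A training loop with this loss might look like the sketch below. It is not the project's train.py: the data, learning rate, epoch count, and choice of plain SGD are assumptions chosen so the example converges quickly. The PyTorch calls themselves (nn.Linear, nn.MSELoss, torch.optim.SGD) are standard.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Assumed synthetic data: y = 2x + 1 plus a little Gaussian noise.
x = torch.linspace(0.0, 1.0, 100).unsqueeze(1)  # shape (100, 1)
y = 2.0 * x + 1.0 + 0.1 * torch.randn_like(x)

# One weight (slope) and one bias (intercept): two parameters in total.
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(2000):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # mean squared error over the whole dataset
    loss.backward()              # backpropagate
    optimizer.step()             # gradient descent update

slope = model.weight.item()
intercept = model.bias.item()
n_params = sum(p.numel() for p in model.parameters())
print(f"Predicted Line: y = {slope}x + {intercept}")
print(f"Number of parameters: {n_params}")
```

With a single input feature and a single output, the learned weight and bias can be read directly as the slope and intercept of the fitted line.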

Results

When running the inference script after we have trained our model, we get the output:

Random seed: 42
Number of parameters: 2

Model's Predicted Line:
Predicted Line: y = 1.8597915172576904x + 1.0682505369186401
Slope: 1.8597915172576904, Intercept: 1.0682505369186401

Statistical Predicted Line (Least squares method):
Predicted Line: y = 1.8614728450775146x + 1.0692236423492432
Slope: 1.8614728450775146, Intercept: 1.0692236423492432

The two models agree closely: the slopes differ by about 0.0017 and the intercepts by about 0.001.

The results vary with the number of epochs, the learning rate, and other hyperparameters, including those that control the synthetic data generation.
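For reference, the least squares line can be computed in closed form. The sketch below uses the standard textbook formulas with NumPy; the function name least_squares_line is hypothetical and this is not necessarily how the project computes it.

```python
import numpy as np


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least squares line.

    Hypothetical helper using the standard closed-form solution.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Points that lie exactly on y = 3x - 2, so the fit recovers that line.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 3.0 * xs - 2.0
slope, intercept = least_squares_line(xs, ys)
print(f"Predicted Line: y = {slope}x + {intercept}")
```

Unlike gradient descent, this computation involves no learning rate or epochs, which is why it serves as the reference the trained model is compared against.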

During inference, the results are plotted using matplotlib and displayed. As an example with the seed set to 42:

Machine Learning model line:

Statistical linear regression line:

Then, we can also plot a graph displaying the machine learning model's predicted values alongside the actual data values:

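We can also read off the slope-intercept form (y = mx + b) of both the statistical linear regression model and the machine learning model.

With MSELoss, the fitted line closely matches the statistical linear regression line. This is expected: the least squares method minimizes the sum of squared residuals, and mean squared error is that same sum divided by the number of samples, so both objectives have the same minimizer. Gradient descent only approximates that minimizer, which accounts for the small remaining differences.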
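The link between the two approaches can be written out explicitly. The derivation below is the standard one for simple linear regression; the symbols m, b, n, x_i, and y_i are notation introduced here, not taken from the project's code.

```latex
% Mean squared error for the line y = m x + b over n samples
L(m, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2

% Setting both partial derivatives to zero
\frac{\partial L}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl( y_i - (m x_i + b) \bigr) = 0
\qquad
\frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr) = 0

% Solving gives the least squares estimates
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m \bar{x}
```

The factor 1/n does not change where the minimum lies, so minimizing the mean squared error and minimizing the sum of squared residuals lead to the same slope and intercept.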

Conclusion

This study was a fun one. Simple statistics and machine learning can be easily related, and this is a good way to learn such concepts. Machine learning can be used for more complex problems, but it is always good to understand the basics.

PyTorch also makes it easy to extend the model to multi-variable linear regression (more than one independent or dependent variable), which makes this approach more flexible than the simple closed-form method in many cases.
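As a rough illustration of the multi-variable case, the sketch below fits three independent variables with nn.Linear. It is not part of the project: the data, the true coefficients, and the hyperparameters are all made up for the example.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Assumed ground truth: three independent variables and one dependent variable.
true_w = torch.tensor([1.5, -2.0, 0.5])
true_b = 0.25
X = torch.randn(200, 3)
y = (X @ true_w + true_b + 0.05 * torch.randn(200)).unsqueeze(1)  # shape (200, 1)

# Only in_features changes compared with the single-variable model;
# out_features > 1 would handle several dependent variables.
model = nn.Linear(in_features=3, out_features=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

learned_w = model.weight.detach().squeeze(0)
learned_b = model.bias.item()
print("weights:", learned_w.tolist(), "bias:", learned_b)
```

The training loop is identical to the single-variable case, which is the practical appeal: the model definition changes, the optimization code does not.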

Note

Although there is a config file for this project, it is not yet fully wired into the code. Once it is, the config file will be used to set hyperparameters and other variables.

Also, for those looking at past commits: this repo was originally made to test out different relationships between machine learning and statistics. However, I narrowed the scope of the repo to focus solely on linear regression in machine learning versus statistical models. So yes, I am aware that past commits may have file names that don't apply directly to the current scope of the project.

Useful resources related to the topic:

Project Future

This project will probably be lightly maintained from here on out. Making the code clearer and more concise is a goal, as is implementing the config file to set hyperparameters and other variables. Putting the code in Jupyter notebooks for better visualization and understanding is another idea for the future.
