This repository has chapters, code, and organizational materials for the book "How to be a data scientist impostor?"
The book is not finished yet -- it is a work in progress...
The purpose of this book is to give an overview and examples of different philosophical and mathematical methodologies and software programming techniques that would allow the reader to practice Data Science almost as successfully as seasoned practitioners, who have solid backgrounds in Statistics or Machine Learning. (Or better than them.)
The programming languages used are Wolfram Language (WL) and R.
Almost all code is available in both WL and R.
(WL is the primary language. Also, "Wolfram Language" and "Mathematica" are used as synonyms in this book.)
- We start with Data Science market diagnosis and general strategies for problem solving.
  - Here we also discuss what kind of people we are going to collaborate with, argue with, be examined by, be hired by.
- Then we proceed with didactic chapters for:
  - doing data analysis, and
  - explanations of fundamental Machine Learning (ML) algorithms.
- Then we give practical know-how for tackling certain ML problems. Variations of those problems often occur in "real life."
- Finally, we show some "shock and awe" projects.
See this mind-map or this org-mode file for a more detailed order of the book's parts and chapters.
Generally speaking, I am very interested in comparisons of the abilities of theories, methodologies, programming languages, algorithms, and concrete implementations to solve problems encountered in practice. This book presents a fair amount of such comparisons.
(And yes, that is used to compare WL/Mathematica and R.)
This book is for the smart and audacious. (Definitely not for dummies…)
The reader is expected to have at least one fairly well developed relevant skill, such as one of the following:
- Programming ability.
- Mathematical maturity and reasoning abilities.
- Mathematical modeling abilities.
- Ability to express processes through equations and formulas.
- Systems operations knowledge.
- Strong Physics or Physical Sciences engineering background.
  - Like Mechanical Engineering, Electrical Engineering, Chemical Engineering …
  - (Software Engineering does not count here.)
We assume the reader is inquisitive and willing to jump into the water without knowing how to swim.
Most of the practical ML know-how projects come from MathematicaVsR at GitHub.
Many of the chapters were previously published in MathematicaForPrediction at WordPress or MathematicaForPrediction at GitHub.
This book outsources the detailed explanations of the core Machine Learning workflows to the book "Simplified Machine Learning Workflows", which in turn outsources the explanations of software architecture methods to the book "Software Design Methods with Wolfram Language".
Here is a diagram that shows the dependencies between the books and code repositories:
Below is a list of videos of my presentations that discuss some of the topics in this book. The videos most relevant to this book are listed first.
- "Exemplifying the Cultural Differences between Machine Learning and Statistics", (2016),
- "Quantile Regression - Theory, Implementations, and Applications", (2014),
- "Quantile Regression workflows", (2019),
- "Using Keras in R", (2018),
- "A Conversational Agent for Neural Networks: Construction, Training and Utilization", (2018).
Anton Antonov
Windermere, Florida, USA
2019-07-15