The task is a basic analysis of the classic Boston house prices dataset.
- understand the dataset and the problem associated with it
- examine the tools that help us describe and visualize the data
- produce some artifacts: a report of our findings, plots, and code
- be prepared for the next dataset with the right questions and code
Inspired by the Applied Machine Learning Process book.
- Problem definition
  - What is the problem?
    - Informal description
    - Formal description
    - Assumptions
    - Similar problems
    - Description of provided data
  - Why does the problem need to be solved?
    - Motivation
    - Benefits
    - Use
  - How would I solve the problem (manually)?
- Data analysis
  - Summarize data
    - Data structure
    - Data distributions
    - Correlations with target variable
    - Correlations among attributes (redundancy)
  - Visualize data
    - Attribute histograms
    - Pairwise scatter-plots
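
The "Summarize data" steps above could be sketched roughly as follows with pandas. This is only an illustration, not the contents of data_analysis.py: the data frame is synthetic stand-in data (not the real Boston values), the column names `RM`, `LSTAT` and `MEDV` are borrowed from the Boston attributes for familiarity, and the helper `summarize` is a hypothetical name introduced here.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the real Boston values). The columns are
# generated so that the relationships between them are known in advance.
rng = np.random.default_rng(0)
n = 200
rm = rng.normal(6.3, 0.7, n)                      # rooms per dwelling
lstat = 40.0 - 4.5 * rm + rng.normal(0, 2.0, n)   # made to depend on rm
medv = 9.0 * rm - 0.5 * lstat - 20.0 + rng.normal(0, 2.0, n)  # target
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})


def summarize(frame, target):
    """Hypothetical helper: collect the summaries from the outline."""
    corr = frame.corr()  # Pearson correlation matrix of all columns
    return {
        # Data structure: number of rows/columns and column types
        "shape": frame.shape,
        "dtypes": frame.dtypes,
        # Data distributions: count, mean, std, quartiles, plus skewness
        "distributions": frame.describe(),
        "skew": frame.skew(),
        # Correlations with the target, strongest (by magnitude) first
        "target_corr": corr[target]
        .drop(target)
        .sort_values(key=lambda s: s.abs(), ascending=False),
        # Correlations among attributes only (high values hint at redundancy)
        "attribute_corr": corr.drop(index=target, columns=target),
    }


summary = summarize(df, "MEDV")
print(summary["shape"])
print(summary["distributions"])
print(summary["target_corr"])
print(summary["attribute_corr"])
```

For the "Visualize data" steps, and assuming matplotlib is installed, `df.hist()` draws one histogram per attribute and `pandas.plotting.scatter_matrix(df)` draws the pairwise scatter-plots.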
See:
- problem_definition.md
- data_analysis.md
- data_analysis.py