boston_dataset_exploration

Dataset exploration: Boston house prices

The task is a basic analysis of the classic Boston house prices dataset.

Goals

  • understand the dataset and the problem associated with it
  • examine the tools that help us describe and visualize the data
  • produce some artifacts: a report of our findings, plots, and code
  • be prepared for the next dataset with the right questions and code

Questions

Inspired by the Applied Machine Learning Process book.

  • Problem definition
    • What is the problem?
      • Informal description
      • Formal description
      • Assumptions
      • Similar problems
      • Description of provided data
    • Why does the problem need to be solved?
      • Motivation
      • Benefits
      • Use
    • How would I solve the problem (manually)?
  • Data analysis
    • Summarize data
      • Data structure
      • Data distributions
        • Correlations with target variable
        • Correlations among attributes (redundancy)
    • Visualize data
      • Attribute histograms
      • Pairwise scatter-plots
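
The "Summarize data" questions above can be sketched with pandas roughly as follows. This is a minimal illustration, not the contents of data_analysis.py: the helper names `summarize` and `redundant_pairs`, the 0.8 redundancy threshold, and the synthetic `demo` frame with its column names are all assumptions made for the example, and the numbers in it are randomly generated rather than taken from the Boston data.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame, target: str) -> dict:
    """Collect the summaries listed under 'Summarize data' (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()  # Pearson correlation matrix of the numeric columns

    # Correlation of each attribute with the target, strongest first.
    target_corr = corr[target].drop(target)
    order = target_corr.abs().sort_values(ascending=False).index

    return {
        # Data structure: shape and column types.
        "structure": {
            "rows": df.shape[0],
            "columns": df.shape[1],
            "dtypes": df.dtypes.astype(str).to_dict(),
        },
        # Data distributions: count, mean, std, min, quartiles, max.
        "distributions": numeric.describe(),
        "target_correlations": target_corr.reindex(order),
        # Correlations among attributes only (target row and column removed).
        "attribute_correlations": corr.drop(index=target, columns=target),
    }


def redundant_pairs(attr_corr: pd.DataFrame, threshold: float = 0.8) -> list:
    """List attribute pairs whose absolute correlation exceeds the threshold.

    The 0.8 default is an arbitrary choice for illustration.
    """
    cols = list(attr_corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(attr_corr.loc[a, b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Synthetic stand-in data (NOT the Boston dataset); column names are made up.
rng = np.random.default_rng(0)
rooms = rng.normal(6.0, 0.7, size=200)
low_status = rng.normal(12.0, 5.0, size=200)
demo = pd.DataFrame({
    "rooms": rooms,
    "rooms_noisy": rooms + rng.normal(0.0, 0.05, size=200),  # near-duplicate attribute
    "low_status": low_status,
    "price": 5.0 * rooms - 0.5 * low_status + rng.normal(0.0, 1.0, size=200),
})

summary = summarize(demo, target="price")
pairs = redundant_pairs(summary["attribute_correlations"])
print(summary["structure"])
print(summary["target_correlations"])
print(pairs)
```

For the "Visualize data" items, pandas offers `DataFrame.hist()` for attribute histograms and `pandas.plotting.scatter_matrix(df)` for pairwise scatter-plots; both require matplotlib to be installed.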

Solution

See:

  • problem_definition.md
  • data_analysis.md
  • data_analysis.py