From 502aefc0c56c15be7f4de7bb2d649a16a4285d53 Mon Sep 17 00:00:00 2001 From: elysian Date: Wed, 28 Feb 2024 19:17:17 -0500 Subject: [PATCH 1/2] Replaced Boston dataset with California --- ...EY_Practice21_Basic_Stats_I_Averages.ipynb | 2395 +++-------------- 1 file changed, 401 insertions(+), 1994 deletions(-) diff --git a/Practices/_Keys/KEY_Practice21_Basic_Stats_I_Averages.ipynb b/Practices/_Keys/KEY_Practice21_Basic_Stats_I_Averages.ipynb index 4a96080..b7dd175 100644 --- a/Practices/_Keys/KEY_Practice21_Basic_Stats_I_Averages.ipynb +++ b/Practices/_Keys/KEY_Practice21_Basic_Stats_I_Averages.ipynb @@ -11,12 +11,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For this practice, let's use the Boston dataset." + "For this practice, let's use the California dataset." ] }, { "cell_type": "code", - "execution_count": 118, + "execution_count": 1, "metadata": { "collapsed": true }, @@ -28,41 +28,41 @@ }, { "cell_type": "code", - "execution_count": 119, + "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ - "# Import the load_boston method \n", - "from sklearn.datasets import load_boston" + "# Import the fetch_california_housing method to load the California data later on\n", + "from sklearn.datasets import fetch_california_housing" ] }, { "cell_type": "code", - "execution_count": 120, + "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ - "# Import pandas, so that we can work with the data frame version of the Boston data\n", + "# Import pandas, so that we can work with the data frame version of the California data\n", "import pandas as pd" ] }, { "cell_type": "code", - "execution_count": 121, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ - "# Load the Boston data\n", - "boston = load_boston()" + "# Load the California data\n", + "california = fetch_california_housing()" ] }, { "cell_type": "code", - "execution_count": 122, + "execution_count": 5, "metadata": { 
"scrolled": true }, @@ -71,631 +71,158 @@ "name": "stdout", "output_type": "stream", "text": [ - "Boston House Prices dataset\n", - "===========================\n", + ".. _california_housing_dataset:\n", "\n", - "Notes\n", - "------\n", - "Data Set Characteristics: \n", + "California Housing dataset\n", + "--------------------------\n", "\n", - " :Number of Instances: 506 \n", + "**Data Set Characteristics:**\n", "\n", - " :Number of Attributes: 13 numeric/categorical predictive\n", - " \n", - " :Median Value (attribute 14) is usually the target\n", + ":Number of Instances: 20640\n", "\n", - " :Attribute Information (in order):\n", - " - CRIM per capita crime rate by town\n", - " - ZN proportion of residential land zoned for lots over 25,000 sq.ft.\n", - " - INDUS proportion of non-retail business acres per town\n", - " - CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n", - " - NOX nitric oxides concentration (parts per 10 million)\n", - " - RM average number of rooms per dwelling\n", - " - AGE proportion of owner-occupied units built prior to 1940\n", - " - DIS weighted distances to five Boston employment centres\n", - " - RAD index of accessibility to radial highways\n", - " - TAX full-value property-tax rate per $10,000\n", - " - PTRATIO pupil-teacher ratio by town\n", - " - B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n", - " - LSTAT % lower status of the population\n", - " - MEDV Median value of owner-occupied homes in $1000's\n", + ":Number of Attributes: 8 numeric, predictive attributes and the target\n", "\n", - " :Missing Attribute Values: None\n", + ":Attribute Information:\n", + " - MedInc median income in block group\n", + " - HouseAge median house age in block group\n", + " - AveRooms average number of rooms per household\n", + " - AveBedrms average number of bedrooms per household\n", + " - Population block group population\n", + " - AveOccup average number of household members\n", + " - Latitude block 
group latitude\n", + " - Longitude block group longitude\n", "\n", - " :Creator: Harrison, D. and Rubinfeld, D.L.\n", + ":Missing Attribute Values: None\n", "\n", - "This is a copy of UCI ML housing dataset.\n", - "http://archive.ics.uci.edu/ml/datasets/Housing\n", + "This dataset was obtained from the StatLib repository.\n", + "https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html\n", "\n", + "The target variable is the median house value for California districts,\n", + "expressed in hundreds of thousands of dollars ($100,000).\n", "\n", - "This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.\n", + "This dataset was derived from the 1990 U.S. census, using one row per census\n", + "block group. A block group is the smallest geographical unit for which the U.S.\n", + "Census Bureau publishes sample data (a block group typically has a population\n", + "of 600 to 3,000 people).\n", "\n", - "The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic\n", - "prices and the demand for clean air', J. Environ. Economics & Management,\n", - "vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics\n", - "...', Wiley, 1980. N.B. Various transformations are used in the table on\n", - "pages 244-261 of the latter.\n", + "A household is a group of people residing within a home. Since the average\n", + "number of rooms and bedrooms in this dataset are provided per household, these\n", + "columns may take surprisingly large values for block groups with few households\n", + "and many empty houses, such as vacation resorts.\n", "\n", - "The Boston house-price data has been used in many machine learning papers that address regression\n", - "problems. 
\n", - " \n", - "**References**\n", + "It can be downloaded/loaded using the\n", + ":func:`sklearn.datasets.fetch_california_housing` function.\n", "\n", - " - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.\n", - " - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.\n", - " - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)\n", + ".. topic:: References\n", + "\n", + " - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,\n", + " Statistics and Probability Letters, 33 (1997) 291-297\n", "\n" ] } ], "source": [ - "# This will provide the characteristics for the Boston dataset\n", - "print(boston.DESCR)" - ] - }, - { - "cell_type": "code", - "execution_count": 123, - "metadata": {}, - "outputs": [], - "source": [ - "# Here, I'm including the prices of Boston's houses, which is boston['target'], as a column with the other \n", - "# features in the Boston dataset.\n", - "boston_data = np.concatenate((boston['data'], pd.DataFrame(boston['target'])), axis = 1)" + "# This will provide the characteristics for the California dataset\n", + "print(california.DESCR)" ] }, { "cell_type": "code", - "execution_count": 124, + "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
[Data frame preview table flattened in extraction. Removed rows: Boston, columns CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, MEDV. Added rows: California, columns MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude, HouseValue. The same rows appear in the text/plain output of this cell.]
\n", - "506 rows × 14 columns\n", + "20640 rows × 9 columns\n", "
" ], "text/plain": [ - " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \\\n", - "0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 \n", - "1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 \n", - "2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 \n", - "3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 \n", - "4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 \n", - "5 0.02985 0.0 2.18 0.0 0.458 6.430 58.7 6.0622 3.0 222.0 \n", - "6 0.08829 12.5 7.87 0.0 0.524 6.012 66.6 5.5605 5.0 311.0 \n", - "7 0.14455 12.5 7.87 0.0 0.524 6.172 96.1 5.9505 5.0 311.0 \n", - "8 0.21124 12.5 7.87 0.0 0.524 5.631 100.0 6.0821 5.0 311.0 \n", - "9 0.17004 12.5 7.87 0.0 0.524 6.004 85.9 6.5921 5.0 311.0 \n", - "10 0.22489 12.5 7.87 0.0 0.524 6.377 94.3 6.3467 5.0 311.0 \n", - "11 0.11747 12.5 7.87 0.0 0.524 6.009 82.9 6.2267 5.0 311.0 \n", - "12 0.09378 12.5 7.87 0.0 0.524 5.889 39.0 5.4509 5.0 311.0 \n", - "13 0.62976 0.0 8.14 0.0 0.538 5.949 61.8 4.7075 4.0 307.0 \n", - "14 0.63796 0.0 8.14 0.0 0.538 6.096 84.5 4.4619 4.0 307.0 \n", - "15 0.62739 0.0 8.14 0.0 0.538 5.834 56.5 4.4986 4.0 307.0 \n", - "16 1.05393 0.0 8.14 0.0 0.538 5.935 29.3 4.4986 4.0 307.0 \n", - "17 0.78420 0.0 8.14 0.0 0.538 5.990 81.7 4.2579 4.0 307.0 \n", - "18 0.80271 0.0 8.14 0.0 0.538 5.456 36.6 3.7965 4.0 307.0 \n", - "19 0.72580 0.0 8.14 0.0 0.538 5.727 69.5 3.7965 4.0 307.0 \n", - "20 1.25179 0.0 8.14 0.0 0.538 5.570 98.1 3.7979 4.0 307.0 \n", - "21 0.85204 0.0 8.14 0.0 0.538 5.965 89.2 4.0123 4.0 307.0 \n", - "22 1.23247 0.0 8.14 0.0 0.538 6.142 91.7 3.9769 4.0 307.0 \n", - "23 0.98843 0.0 8.14 0.0 0.538 5.813 100.0 4.0952 4.0 307.0 \n", - "24 0.75026 0.0 8.14 0.0 0.538 5.924 94.1 4.3996 4.0 307.0 \n", - "25 0.84054 0.0 8.14 0.0 0.538 5.599 85.7 4.4546 4.0 307.0 \n", - "26 0.67191 0.0 8.14 0.0 0.538 5.813 90.3 4.6820 4.0 307.0 \n", - "27 0.95577 0.0 8.14 0.0 0.538 6.047 88.8 4.4534 4.0 307.0 \n", - "28 0.77299 0.0 8.14 0.0 0.538 6.495 94.4 4.4547 4.0 307.0 \n", - "29 
1.00245 0.0 8.14 0.0 0.538 6.674 87.3 4.2390 4.0 307.0 \n", - ".. ... ... ... ... ... ... ... ... ... ... \n", - "476 4.87141 0.0 18.10 0.0 0.614 6.484 93.6 2.3053 24.0 666.0 \n", - "477 15.02340 0.0 18.10 0.0 0.614 5.304 97.3 2.1007 24.0 666.0 \n", - "478 10.23300 0.0 18.10 0.0 0.614 6.185 96.7 2.1705 24.0 666.0 \n", - "479 14.33370 0.0 18.10 0.0 0.614 6.229 88.0 1.9512 24.0 666.0 \n", - "480 5.82401 0.0 18.10 0.0 0.532 6.242 64.7 3.4242 24.0 666.0 \n", - "481 5.70818 0.0 18.10 0.0 0.532 6.750 74.9 3.3317 24.0 666.0 \n", - "482 5.73116 0.0 18.10 0.0 0.532 7.061 77.0 3.4106 24.0 666.0 \n", - "483 2.81838 0.0 18.10 0.0 0.532 5.762 40.3 4.0983 24.0 666.0 \n", - "484 2.37857 0.0 18.10 0.0 0.583 5.871 41.9 3.7240 24.0 666.0 \n", - "485 3.67367 0.0 18.10 0.0 0.583 6.312 51.9 3.9917 24.0 666.0 \n", - "486 5.69175 0.0 18.10 0.0 0.583 6.114 79.8 3.5459 24.0 666.0 \n", - "487 4.83567 0.0 18.10 0.0 0.583 5.905 53.2 3.1523 24.0 666.0 \n", - "488 0.15086 0.0 27.74 0.0 0.609 5.454 92.7 1.8209 4.0 711.0 \n", - "489 0.18337 0.0 27.74 0.0 0.609 5.414 98.3 1.7554 4.0 711.0 \n", - "490 0.20746 0.0 27.74 0.0 0.609 5.093 98.0 1.8226 4.0 711.0 \n", - "491 0.10574 0.0 27.74 0.0 0.609 5.983 98.8 1.8681 4.0 711.0 \n", - "492 0.11132 0.0 27.74 0.0 0.609 5.983 83.5 2.1099 4.0 711.0 \n", - "493 0.17331 0.0 9.69 0.0 0.585 5.707 54.0 2.3817 6.0 391.0 \n", - "494 0.27957 0.0 9.69 0.0 0.585 5.926 42.6 2.3817 6.0 391.0 \n", - "495 0.17899 0.0 9.69 0.0 0.585 5.670 28.8 2.7986 6.0 391.0 \n", - "496 0.28960 0.0 9.69 0.0 0.585 5.390 72.9 2.7986 6.0 391.0 \n", - "497 0.26838 0.0 9.69 0.0 0.585 5.794 70.6 2.8927 6.0 391.0 \n", - "498 0.23912 0.0 9.69 0.0 0.585 6.019 65.3 2.4091 6.0 391.0 \n", - "499 0.17783 0.0 9.69 0.0 0.585 5.569 73.5 2.3999 6.0 391.0 \n", - "500 0.22438 0.0 9.69 0.0 0.585 6.027 79.7 2.4982 6.0 391.0 \n", - "501 0.06263 0.0 11.93 0.0 0.573 6.593 69.1 2.4786 1.0 273.0 \n", - "502 0.04527 0.0 11.93 0.0 0.573 6.120 76.7 2.2875 1.0 273.0 \n", - "503 0.06076 0.0 11.93 0.0 0.573 6.976 91.0 
2.1675 1.0 273.0 \n", - "504 0.10959 0.0 11.93 0.0 0.573 6.794 89.3 2.3889 1.0 273.0 \n", - "505 0.04741 0.0 11.93 0.0 0.573 6.030 80.8 2.5050 1.0 273.0 \n", + " MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude \\\n", + "0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88 \n", + "1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86 \n", + "2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85 \n", + "3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85 \n", + "4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85 \n", + "... ... ... ... ... ... ... ... \n", + "20635 1.5603 25.0 5.045455 1.133333 845.0 2.560606 39.48 \n", + "20636 2.5568 18.0 6.114035 1.315789 356.0 3.122807 39.49 \n", + "20637 1.7000 17.0 5.205543 1.120092 1007.0 2.325635 39.43 \n", + "20638 1.8672 18.0 5.329513 1.171920 741.0 2.123209 39.43 \n", + "20639 2.3886 16.0 5.254717 1.162264 1387.0 2.616981 39.37 \n", "\n", - " PTRATIO B LSTAT MEDV \n", - "0 15.3 396.90 4.98 24.0 \n", - "1 17.8 396.90 9.14 21.6 \n", - "2 17.8 392.83 4.03 34.7 \n", - "3 18.7 394.63 2.94 33.4 \n", - "4 18.7 396.90 5.33 36.2 \n", - "5 18.7 394.12 5.21 28.7 \n", - "6 15.2 395.60 12.43 22.9 \n", - "7 15.2 396.90 19.15 27.1 \n", - "8 15.2 386.63 29.93 16.5 \n", - "9 15.2 386.71 17.10 18.9 \n", - "10 15.2 392.52 20.45 15.0 \n", - "11 15.2 396.90 13.27 18.9 \n", - "12 15.2 390.50 15.71 21.7 \n", - "13 21.0 396.90 8.26 20.4 \n", - "14 21.0 380.02 10.26 18.2 \n", - "15 21.0 395.62 8.47 19.9 \n", - "16 21.0 386.85 6.58 23.1 \n", - "17 21.0 386.75 14.67 17.5 \n", - "18 21.0 288.99 11.69 20.2 \n", - "19 21.0 390.95 11.28 18.2 \n", - "20 21.0 376.57 21.02 13.6 \n", - "21 21.0 392.53 13.83 19.6 \n", - "22 21.0 396.90 18.72 15.2 \n", - "23 21.0 394.54 19.88 14.5 \n", - "24 21.0 394.33 16.30 15.6 \n", - "25 21.0 303.42 16.51 13.9 \n", - "26 21.0 376.88 14.81 16.6 \n", - "27 21.0 306.38 17.28 14.8 \n", - "28 21.0 387.94 12.80 18.4 \n", - "29 21.0 380.23 11.98 21.0 \n", - ".. ... ... ... ... 
\n", - "476 20.2 396.21 18.68 16.7 \n", - "477 20.2 349.48 24.91 12.0 \n", - "478 20.2 379.70 18.03 14.6 \n", - "479 20.2 383.32 13.11 21.4 \n", - "480 20.2 396.90 10.74 23.0 \n", - "481 20.2 393.07 7.74 23.7 \n", - "482 20.2 395.28 7.01 25.0 \n", - "483 20.2 392.92 10.42 21.8 \n", - "484 20.2 370.73 13.34 20.6 \n", - "485 20.2 388.62 10.58 21.2 \n", - "486 20.2 392.68 14.98 19.1 \n", - "487 20.2 388.22 11.45 20.6 \n", - "488 20.1 395.09 18.06 15.2 \n", - "489 20.1 344.05 23.97 7.0 \n", - "490 20.1 318.43 29.68 8.1 \n", - "491 20.1 390.11 18.07 13.6 \n", - "492 20.1 396.90 13.35 20.1 \n", - "493 19.2 396.90 12.01 21.8 \n", - "494 19.2 396.90 13.59 24.5 \n", - "495 19.2 393.29 17.60 23.1 \n", - "496 19.2 396.90 21.14 19.7 \n", - "497 19.2 396.90 14.10 18.3 \n", - "498 19.2 396.90 12.92 21.2 \n", - "499 19.2 395.77 15.10 17.5 \n", - "500 19.2 396.90 14.33 16.8 \n", - "501 21.0 391.99 9.67 22.4 \n", - "502 21.0 396.90 9.08 20.6 \n", - "503 21.0 396.90 5.64 23.9 \n", - "504 21.0 393.45 6.48 22.0 \n", - "505 21.0 396.90 7.88 11.9 \n", + " Longitude HouseValue \n", + "0 -122.23 4.526 \n", + "1 -122.22 3.585 \n", + "2 -122.24 3.521 \n", + "3 -122.25 3.413 \n", + "4 -122.25 3.422 \n", + "... ... ... 
\n", + "20635 -121.09 0.781 \n", + "20636 -121.21 0.771 \n", + "20637 -121.22 0.923 \n", + "20638 -121.32 0.847 \n", + "20639 -121.24 0.894 \n", "\n", - "[506 rows x 14 columns]" + "[20640 rows x 9 columns]" ] }, - "execution_count": 124, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Convert the Boston data to a data frame format, so that it's easier to view and process\n", - "boston_df = pd.DataFrame(boston_updated, columns = np.concatenate((boston['feature_names'], 'MEDV'), axis = None))\n", - "boston_df" + "# Convert the housing object to a data frame format, so that it's easier to view and process\n", + "california_df = pd.DataFrame(california['data'], columns = california['feature_names'])\n", + "# Here, I'm including the prices of California's houses, which is california['target'], \n", + "# as a column with the other features in the California dataset.\n", + "california_df['HouseValue'] = california['target']\n", + "california_df" ] }, { "cell_type": "code", - "execution_count": 125, + "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "CRIM 3.593761\n", - "ZN 11.363636\n", - "INDUS 11.136779\n", - "CHAS 0.069170\n", - "NOX 0.554695\n", - "RM 6.284634\n", - "AGE 68.574901\n", - "DIS 3.795043\n", - "RAD 9.549407\n", - "TAX 408.237154\n", - "PTRATIO 18.455534\n", - "B 356.674032\n", - "LSTAT 12.653063\n", - "MEDV 22.532806\n", + "MedInc 3.870671\n", + "HouseAge 28.639486\n", + "AveRooms 5.429000\n", + "AveBedrms 1.096675\n", + "Population 1425.476744\n", + "AveOccup 3.070655\n", + "Latitude 35.631861\n", + "Longitude -119.569704\n", + "HouseValue 2.068558\n", "dtype: float64\n" ] } ], "source": [ "# Determine the mean of each feature\n", - "averages_column = np.mean(boston_df, axis = 0)\n", + "averages_column = np.mean(california_df, axis = 0)\n", "print(averages_column)" ] }, { "cell_type": "code", - "execution_count": 126, + "execution_count": 8, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "0 59.635666\n", - "1 56.235315\n", - "2 55.298456\n", - "3 52.585755\n", - "4 53.731875\n", - "5 53.256432\n", - "6 61.520342\n", - "7 64.543646\n", - "8 64.077024\n", - "9 62.390724\n", - "10 63.379471\n", - "11 62.601226\n", - "12 59.316977\n", - "13 59.951733\n", - "14 60.346704\n", - "15 59.437714\n", - "16 56.999681\n", - "17 60.880721\n", - "18 50.586658\n", - "19 60.061236\n", - "20 61.470549\n", - "21 61.904810\n", - "22 62.467812\n", - "23 62.892474\n", - "24 62.291561\n", - "25 55.078724\n", - "26 60.745351\n", - "27 55.671012\n", - "28 61.852906\n", - "29 60.935961\n", - " ... \n", - "476 90.554622\n", - "477 88.216579\n", - "478 89.752321\n", - "479 89.804136\n", - "480 88.547301\n", - "481 88.859420\n", - "482 89.237483\n", - "483 86.210763\n", - "484 84.816184\n", - "485 86.797169\n", - "486 89.342475\n", - "487 86.874712\n", - "488 92.280340\n", - "489 88.865841\n", - "490 87.484433\n", - "491 92.284703\n", - "492 91.821659\n", - "493 65.674786\n", - "494 65.189448\n", - "495 64.136614\n", - "496 67.542371\n", - "497 66.809291\n", - "498 66.533016\n", - "499 66.892266\n", - "500 67.353899\n", - "501 57.842659\n", - "502 58.516841\n", - "503 59.581947\n", - "504 59.144678\n", - "505 58.111815\n", - "Length: 506, dtype: float64\n" + "0 33.562744\n", + "1 262.094029\n", + "2 54.061360\n", + "3 60.454940\n", + "4 61.045845\n", + " ... \n", + "20635 88.830077\n", + "20636 34.017826\n", + "20637 105.942697\n", + "20638 76.494316\n", + "20639 148.160729\n", + "Length: 20640, dtype: float64\n" ] } ], "source": [ "# Determine the mean of each row\n", - "averages_row = np.mean(boston_df, axis = 1)\n", + "averages_row = np.mean(california_df, axis = 1)\n", "print(averages_row)" ] }, @@ -1508,802 +428,289 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We will determine the average price for houses along the Charles River and that for houses NOT along the river." 
+ "We will determine the average price for houses less than 20 years old and that for houses 20 years old or more." ] }, { "cell_type": "code", - "execution_count": 130, + "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
[HTML rendering of the subset table omitted: 5828 rows × 9 columns (MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude, HouseValue); the same rows appear in the text/plain output that follows]
" ], "text/plain": [ - " CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \\\n", - "142 3.32105 0.0 19.58 1.0 0.8710 5.403 100.0 1.3216 5.0 403.0 \n", - "152 1.12658 0.0 19.58 1.0 0.8710 5.012 88.0 1.6102 5.0 403.0 \n", - "154 1.41385 0.0 19.58 1.0 0.8710 6.129 96.0 1.7494 5.0 403.0 \n", - "155 3.53501 0.0 19.58 1.0 0.8710 6.152 82.6 1.7455 5.0 403.0 \n", - "160 1.27346 0.0 19.58 1.0 0.6050 6.250 92.6 1.7984 5.0 403.0 \n", - "162 1.83377 0.0 19.58 1.0 0.6050 7.802 98.2 2.0407 5.0 403.0 \n", - "163 1.51902 0.0 19.58 1.0 0.6050 8.375 93.9 2.1620 5.0 403.0 \n", - "208 0.13587 0.0 10.59 1.0 0.4890 6.064 59.1 4.2392 4.0 277.0 \n", - "209 0.43571 0.0 10.59 1.0 0.4890 5.344 100.0 3.8750 4.0 277.0 \n", - "210 0.17446 0.0 10.59 1.0 0.4890 5.960 92.1 3.8771 4.0 277.0 \n", - "211 0.37578 0.0 10.59 1.0 0.4890 5.404 88.6 3.6650 4.0 277.0 \n", - "212 0.21719 0.0 10.59 1.0 0.4890 5.807 53.8 3.6526 4.0 277.0 \n", - "216 0.04560 0.0 13.89 1.0 0.5500 5.888 56.0 3.1121 5.0 276.0 \n", - "218 0.11069 0.0 13.89 1.0 0.5500 5.951 93.8 2.8893 5.0 276.0 \n", - "219 0.11425 0.0 13.89 1.0 0.5500 6.373 92.4 3.3633 5.0 276.0 \n", - "220 0.35809 0.0 6.20 1.0 0.5070 6.951 88.5 2.8617 8.0 307.0 \n", - "221 0.40771 0.0 6.20 1.0 0.5070 6.164 91.3 3.0480 8.0 307.0 \n", - "222 0.62356 0.0 6.20 1.0 0.5070 6.879 77.7 3.2721 8.0 307.0 \n", - "234 0.44791 0.0 6.20 1.0 0.5070 6.726 66.5 3.6519 8.0 307.0 \n", - "236 0.52058 0.0 6.20 1.0 0.5070 6.631 76.5 4.1480 8.0 307.0 \n", - "269 0.09065 20.0 6.96 1.0 0.4640 5.920 61.5 3.9175 3.0 223.0 \n", - "273 0.22188 20.0 6.96 1.0 0.4640 7.691 51.8 4.3665 3.0 223.0 \n", - "274 0.05644 40.0 6.41 1.0 0.4470 6.758 32.9 4.0776 4.0 254.0 \n", - "276 0.10469 40.0 6.41 1.0 0.4470 7.267 49.0 4.7872 4.0 254.0 \n", - "277 0.06127 40.0 6.41 1.0 0.4470 6.826 27.6 4.8628 4.0 254.0 \n", - "282 0.06129 20.0 3.33 1.0 0.4429 7.645 49.7 5.2119 5.0 216.0 \n", - "283 0.01501 90.0 1.21 1.0 0.4010 7.923 24.8 5.8850 1.0 198.0 \n", - "356 8.98296 0.0 18.10 1.0 0.7700 6.212 97.4 2.1222 24.0 666.0 
\n", - "357 3.84970 0.0 18.10 1.0 0.7700 6.395 91.0 2.5052 24.0 666.0 \n", - "358 5.20177 0.0 18.10 1.0 0.7700 6.127 83.4 2.7227 24.0 666.0 \n", - "363 4.22239 0.0 18.10 1.0 0.7700 5.803 89.0 1.9047 24.0 666.0 \n", - "364 3.47428 0.0 18.10 1.0 0.7180 8.780 82.9 1.9047 24.0 666.0 \n", - "369 5.66998 0.0 18.10 1.0 0.6310 6.683 96.8 1.3567 24.0 666.0 \n", - "370 6.53876 0.0 18.10 1.0 0.6310 7.016 97.5 1.2024 24.0 666.0 \n", - "372 8.26725 0.0 18.10 1.0 0.6680 5.875 89.6 1.1296 24.0 666.0 \n", + " MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude \\\n", + "59 2.5625 2.0 2.771930 0.754386 94.0 1.649123 37.82 \n", + "75 0.9241 17.0 2.817768 1.052392 762.0 1.735763 37.81 \n", + "77 1.1111 19.0 5.830918 1.173913 721.0 3.483092 37.81 \n", + "80 1.5000 17.0 3.197232 1.000000 609.0 2.107266 37.81 \n", + "87 0.7600 10.0 2.651515 1.054545 546.0 1.654545 37.81 \n", + "... ... ... ... ... ... ... ... \n", + "20632 3.1250 15.0 6.023377 1.080519 1047.0 2.719481 39.26 \n", + "20636 2.5568 18.0 6.114035 1.315789 356.0 3.122807 39.49 \n", + "20637 1.7000 17.0 5.205543 1.120092 1007.0 2.325635 39.43 \n", + "20638 1.8672 18.0 5.329513 1.171920 741.0 2.123209 39.43 \n", + "20639 2.3886 16.0 5.254717 1.162264 1387.0 2.616981 39.37 \n", + "\n", + " Longitude HouseValue \n", + "59 -122.29 0.600 \n", + "75 -122.28 1.775 \n", + "77 -122.28 1.083 \n", + "80 -122.28 1.625 \n", + "87 -122.27 1.625 \n", + "... ... ... 
\n", + "20632 -121.45 1.156 \n", + "20636 -121.21 0.771 \n", + "20637 -121.22 0.923 \n", + "20638 -121.32 0.847 \n", + "20639 -121.24 0.894 \n", "\n", - " PTRATIO B LSTAT MEDV \n", - "142 14.7 396.90 26.82 13.4 \n", - "152 14.7 343.28 12.12 15.3 \n", - "154 14.7 321.02 15.12 17.0 \n", - "155 14.7 88.01 15.02 15.6 \n", - "160 14.7 338.92 5.50 27.0 \n", - "162 14.7 389.61 1.92 50.0 \n", - "163 14.7 388.45 3.32 50.0 \n", - "208 18.6 381.32 14.66 24.4 \n", - "209 18.6 396.90 23.09 20.0 \n", - "210 18.6 393.25 17.27 21.7 \n", - "211 18.6 395.24 23.98 19.3 \n", - "212 18.6 390.94 16.03 22.4 \n", - "216 16.4 392.80 13.51 23.3 \n", - "218 16.4 396.90 17.92 21.5 \n", - "219 16.4 393.74 10.50 23.0 \n", - "220 17.4 391.70 9.71 26.7 \n", - "221 17.4 395.24 21.46 21.7 \n", - "222 17.4 390.39 9.93 27.5 \n", - "234 17.4 360.20 8.05 29.0 \n", - "236 17.4 388.45 9.54 25.1 \n", - "269 18.6 391.34 13.65 20.7 \n", - "273 18.6 390.77 6.58 35.2 \n", - "274 17.6 396.90 3.53 32.4 \n", - "276 17.6 389.25 6.05 33.2 \n", - "277 17.6 393.45 4.16 33.1 \n", - "282 14.9 377.07 3.01 46.0 \n", - "283 13.6 395.52 3.16 50.0 \n", - "356 20.2 377.73 17.60 17.8 \n", - "357 20.2 391.34 13.27 21.7 \n", - "358 20.2 395.43 11.48 22.7 \n", - "363 20.2 353.04 14.64 16.8 \n", - "364 20.2 354.55 5.29 21.9 \n", - "369 20.2 375.33 3.73 50.0 \n", - "370 20.2 392.05 2.96 50.0 \n", - "372 20.2 347.88 8.88 50.0 " + "[5828 rows x 9 columns]" ] }, - "execution_count": 130, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Use the query method to define a subset of boston_df that only include houses are along the river (CHAS = 1). \n", - "along_river = boston_df.query('CHAS == 1')\n", - "along_river" + "# Use the query method to define a subset of california_df that only includes houses less than 20 years old. 
\n", + "newer_houses = california_df.query('HouseAge < 20')\n", + "newer_houses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "What do you notice about the CHAS column? " + "What do you notice about the HouseAge column? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "**Answer:** It's all 1.0! This means that we successfully subsetting all houses that are along the Charles River. Great work!" + "**Answer:** At a glance, the numbers seem to be less than 20! This is a good sign that we successfully subsetted all houses that are less than 20 years old. Great work!" ] }, { "cell_type": "code", - "execution_count": 128, + "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "28.44" + "1.9326925875085794" ] }, - "execution_count": 128, + "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Now determine the average price for these houses. 'MEDV' is the column name for the prices. \n", - "averages_price_along_river = np.mean(along_river['MEDV'])\n", - "averages_price_along_river" + "# Now determine the average price for these houses. 'HouseValue' is the column name for the prices. \n", + "averages_newer_houses = np.mean(newer_houses['HouseValue'])\n", + "averages_newer_houses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now try determining the average for houses NOT along the River." + "Now try determining the average for houses 20 years or older." ] }, { "cell_type": "code", - "execution_count": 129, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "22.093842887473482" + "2.122016487307589" ] }, - "execution_count": 129, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# Determine the average price for houses that are NOT along the Charles River (when CHAS = 0). 
\n", - "not_along_river = boston_df.query('CHAS == 0')\n", - "averages_price_not_along_river = np.mean(not_along_river['MEDV'])\n", - "averages_price_not_along_river" + "# Determine the average price for houses that are 20 years or older. \n", + "older_houses = california_df.query('HouseAge >= 20')\n", + "averages_older_houses = np.mean(older_houses['HouseValue'])\n", + "averages_older_houses" ] }, { @@ -2330,7 +737,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.1" + "version": "3.11.8" } }, "nbformat": 4, From bba9f9c949d2d298751d7f47c266318ac3c3fb45 Mon Sep 17 00:00:00 2001 From: elysian Date: Wed, 28 Feb 2024 19:22:30 -0500 Subject: [PATCH 2/2] Edited Practice 21 to match key --- .../Practice21_Basic_Stats_I_Averages.ipynb | 76 ++++++++----------- 1 file changed, 30 insertions(+), 46 deletions(-) diff --git a/Practices/Practice21_Basic_Stats_I_Averages.ipynb b/Practices/Practice21_Basic_Stats_I_Averages.ipynb index d20f401..1ce601f 100644 --- a/Practices/Practice21_Basic_Stats_I_Averages.ipynb +++ b/Practices/Practice21_Basic_Stats_I_Averages.ipynb @@ -4,7 +4,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "For this practice, let's use the Boston dataset." + "# Practice: Basic Statistics I: Averages" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this practice, let's use the California dataset." 
] }, { @@ -27,8 +34,8 @@ }, "outputs": [], "source": [ - "# Import the load_boston method \n", - "from sklearn.datasets import load_boston" + "# Import the fetch_california_housing method to load the California data later on\n", + "from sklearn.datasets import fetch_california_housing" ] }, { @@ -39,7 +46,7 @@ }, "outputs": [], "source": [ - "# Import pandas, so that we can work with the data frame version of the Boston data\n", + "# Import pandas, so that we can work with the data frame version of the California data\n", "import pandas as pd" ] }, @@ -51,8 +58,8 @@ }, "outputs": [], "source": [ - "# Load the Boston data\n", - "boston = load_boston()" + "# Load the California data\n", + "california = fetch_california_housing()" ] }, { @@ -61,21 +68,8 @@ "metadata": {}, "outputs": [], "source": [ - "# This will provide the characteristics for the Boston dataset\n", - "print(boston.DESCR)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "# Here, I'm including the prices of Boston's houses, which is boston['target'], as a column with the other \n", - "# features in the Boston dataset.\n", - "boston_data = np.concatenate((boston['data'], pd.DataFrame(boston['target'])), axis = 1)" + "# This will provide the characteristics for the California dataset\n", + "print(california.DESCR)" ] }, { @@ -86,9 +80,12 @@ }, "outputs": [], "source": [ - "# Convert the Boston data to a data frame format, so that it's easier to view and process\n", - "boston_df = pd.DataFrame(boston_data, columns = np.concatenate((boston['feature_names'], 'MEDV'), axis = None))\n", - "boston_df" + "# Convert the housing object to a data frame format, so that it's easier to view and process\n", + "california_df = pd.DataFrame(california['data'], columns = california['feature_names'])\n", + "# Here, I'm including the prices of California's houses, which is california['target'], \n", + "# as a column with the other 
features in the California dataset.\n", + "california_df['HouseValue'] = california['target']\n", + "california_df" ] }, { @@ -123,9 +120,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "" - ] + "source": [] }, { "cell_type": "markdown", @@ -138,7 +133,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We will determine the average price for houses along the Charles River and that for houses NOT along the river." + "We will determine the average price for houses less than 20 years old and that for houses 20 years old or more." ] }, { @@ -149,22 +144,20 @@ }, "outputs": [], "source": [ - "# Use the query method to define a subset of boston_df that only include houses are along the river (CHAS = 1). " + "# Use the query method to define a subset of california_df that only includes houses less than 20 years old. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "What do you notice about the CHAS column? " + "What do you notice about the HouseAge column? " ] }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "" - ] + "source": [] }, { "cell_type": "code", @@ -174,14 +167,14 @@ }, "outputs": [], "source": [ - "# Now determine the average price for these houses. 'MEDV' is the column name for the prices. " + "# Now determine the average price for these houses. 'HouseValue' is the column name for the prices. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now try determining the average for houses NOT along the River." + "Now try determining the average for houses 20 years or older." ] }, { @@ -192,7 +185,7 @@ }, "outputs": [], "source": [ - "# Determine the average price for houses that are NOT along the Charles River (when CHAS = 0). " + "# Determine the average price for houses that are 20 years or older. \n" ] }, { @@ -201,15 +194,6 @@ "source": [ "Good work! You're becoming an expert in subsetting and determining averages on subsetted data. 
This will be integral for your capstone projects and future careers as data scientists! " ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [] } ], "metadata": { @@ -228,7 +212,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.1" + "version": "3.1.0" } }, "nbformat": 4,