Commit

Merge pull request #60 from GWC-DCMB/elysia_lesson_practice_review
Practice 21
echou89 authored Jul 24, 2024
2 parents cee3fe7 + dbae4fb commit 9b264fd
Showing 2 changed files with 431 additions and 2,040 deletions.
76 changes: 30 additions & 46 deletions Practices/Practice21_Basic_Stats_I_Averages.ipynb
@@ -4,7 +4,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"For this practice, let's use the Boston dataset."
+"# Practice: Basic Statistics I: Averages"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"For this practice, let's use the California dataset."
 ]
 },
 {
@@ -27,8 +34,8 @@
 },
 "outputs": [],
 "source": [
-"# Import the load_boston method \n",
-"from sklearn.datasets import load_boston"
+"# Import the fetch_california_housing method to load the California data later on\n",
+"from sklearn.datasets import fetch_california_housing"
 ]
 },
 {
@@ -39,7 +46,7 @@
 },
 "outputs": [],
 "source": [
-"# Import pandas, so that we can work with the data frame version of the Boston data\n",
+"# Import pandas, so that we can work with the data frame version of the California data\n",
 "import pandas as pd"
 ]
 },
@@ -51,8 +58,8 @@
 },
 "outputs": [],
 "source": [
-"# Load the Boston data\n",
-"boston = load_boston()"
+"# Load the California data\n",
+"california = fetch_california_housing()"
 ]
 },
 {
@@ -61,21 +68,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# This will provide the characteristics for the Boston dataset\n",
-"print(boston.DESCR)"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": true
-},
-"outputs": [],
-"source": [
-"# Here, I'm including the prices of Boston's houses, which is boston['target'], as a column with the other \n",
-"# features in the Boston dataset.\n",
-"boston_data = np.concatenate((boston['data'], pd.DataFrame(boston['target'])), axis = 1)"
+"# This will provide the characteristics for the California dataset\n",
+"print(california.DESCR)"
 ]
 },
 {
@@ -86,9 +80,12 @@
 },
 "outputs": [],
 "source": [
-"# Convert the Boston data to a data frame format, so that it's easier to view and process\n",
-"boston_df = pd.DataFrame(boston_data, columns = np.concatenate((boston['feature_names'], 'MEDV'), axis = None))\n",
-"boston_df"
+"# Convert the housing object to a data frame format, so that it's easier to view and process\n",
+"california_df = pd.DataFrame(california['data'], columns = california['feature_names'])\n",
+"# Here, I'm including the prices of California's houses, which is california['target'], \n",
+"# as a column with the other features in the California dataset.\n",
+"california_df['HouseValue'] = california['target']\n",
+"california_df"
 ]
 },
 {
@@ -123,9 +120,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-""
-]
+"source": []
 },
 {
 "cell_type": "markdown",
@@ -138,7 +133,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We will determine the average price for houses along the Charles River and that for houses NOT along the river."
+"We will determine the average price for houses less than 20 years old and that for houses 20 years old or more."
 ]
 },
 {
@@ -149,22 +144,20 @@
 },
 "outputs": [],
 "source": [
-"# Use the query method to define a subset of boston_df that only include houses are along the river (CHAS = 1). "
+"# Use the query method to define a subset of california_df that only include houses less than 20 years old. "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"What do you notice about the CHAS column? "
+"What do you notice about the HouseAge column? "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-""
-]
+"source": []
 },
 {
 "cell_type": "code",
@@ -174,14 +167,14 @@
 },
 "outputs": [],
 "source": [
-"# Now determine the average price for these houses. 'MEDV' is the column name for the prices. "
+"# Now determine the average price for these houses. 'HouseValue' is the column name for the prices. "
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now try determining the average for houses NOT along the River."
+"Now try determining the average for houses 20 years or older."
 ]
 },
 {
@@ -192,7 +185,7 @@
 },
 "outputs": [],
 "source": [
-"# Determine the average price for houses that are NOT along the Charles River (when CHAS = 0). "
+"# Determine the average price for houses that are 20 years or older. \n"
 ]
 },
 {
@@ -201,15 +194,6 @@
 "source": [
 "Good work! You're becoming an expert in subsetting and determining averages on subsetted data. This will be integral for your capstone projects and future careers as data scientists! "
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": true
-},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
@@ -228,7 +212,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.1"
+"version": "3.1.0"
 }
 },
 "nbformat": 4,
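
Reviewer note: the revised practice asks for two subsets of `california_df` built with the query method, followed by the average of `HouseValue` in each. A minimal sketch of that pattern is below. It runs on a small made-up frame (`toy_df`, a hypothetical stand-in whose values are invented for illustration) rather than the real California data, so it shows the technique without giving away the practice answers. The column names `HouseAge` and `HouseValue` are the ones used in the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for california_df. The values are
# invented for illustration and are not taken from the real dataset.
toy_df = pd.DataFrame({
    "HouseAge": [5.0, 12.0, 20.0, 35.0, 52.0],
    "HouseValue": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Subset with DataFrame.query: rows where HouseAge is strictly below 20.
newer = toy_df.query("HouseAge < 20")

# The complementary subset: houses 20 years old or more.
older = toy_df.query("HouseAge >= 20")

# Average price within each subset.
newer_mean = newer["HouseValue"].mean()
older_mean = older["HouseValue"].mean()

print(newer_mean)  # 1.5
print(older_mean)  # 4.0
```

Using `<` for one subset and `>=` for the other keeps the two groups complementary, so every row lands in exactly one of them.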
