Replace boston #56

Merged
merged 2 commits into from
Jan 30, 2024
56 changes: 34 additions & 22 deletions Lessons/Lesson22_Basic_Stats_II_Percents.ipynb
@@ -83,7 +83,7 @@
"id": "5ADm2TV-s7VG"
},
"source": [
"**Example 2:** Let's learn to calculate percentages by using real world data. We will work with a dataset of Boston housing prices."
"**Example 2:** Let's learn to calculate percentages by using real-world data. We will work with a dataset of housing prices in Ames, Iowa."
]
},
{
@@ -96,8 +96,9 @@
},
"outputs": [],
"source": [
"# Import the load_boston method \n",
"from sklearn.datasets import load_boston"
"# Import the fetch_openml function and fetch the Ames house prices dataset\n",
"from sklearn.datasets import fetch_openml\n",
"housing = fetch_openml(name=\"house_prices\", as_frame=True, parser=\"auto\")"
]
},
{
@@ -110,7 +111,7 @@
},
"outputs": [],
"source": [
"# Import pandas, so that we can work with the data frame version of the Boston housing data\n",
"# Import pandas, so that we can work with the data frame version of the Ames housing data\n",
"import pandas as pd"
]
},
@@ -125,12 +126,10 @@
},
"outputs": [],
"source": [
"# Load the dataset of housing prices in Boston, and convert to\n",
"# Load the dataset of house prices in Ames, and convert to\n",
"# a data frame format so it's easier to view and process\n",
"boston = load_boston()\n",
"boston_df = pd.DataFrame(boston['data'], columns = boston['feature_names'])\n",
"boston_df['PRICE'] = boston.target\n",
"boston_df"
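"# housing['frame'] holds the features together with the SalePrice target,\n",
"# which the price questions below need; housing['data'] omits the target\n",
"ames_df = housing['frame']\n",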
"ames_df"
]
},
{
@@ -140,7 +139,20 @@
"id": "eyMUHGews7VZ"
},
"source": [
"CHAS is the indicator variable we used last week, where 1 indicates that the property (tract) is on the Charles River and 0 means otherwise."
"The `SaleCondition` column lists the condition of the house sale:\n",
"\n",
"\n",
"* `Normal`: Normal Sale \n",
"\n",
"* `Abnorml`: Abnormal Sale - trade, foreclosure, short sale\n",
"\n",
"* `AdjLand`: Adjoining Land Purchase\n",
"\n",
"* `Alloca`: Allocation - two linked properties with separate deeds, typically condo with a garage unit\n",
"\n",
"* `Family`: Sale between family members \n",
"\n",
"* `Partial`: Home was not completed when last assessed (associated with New Homes)\n"
]
},
{
@@ -150,7 +162,7 @@
"id": "IMpeHBEzs7VZ"
},
"source": [
"What percentage of the tracts bound the Charles River? We'll see how to do this using the query method AND using boolean indexing."
"What percentage of the houses were sold normally? We'll see how to do this using the query method AND using boolean indexing."
]
},
{
@@ -200,10 +212,10 @@
},
"outputs": [],
"source": [
"# Determine the total number of tracts in the dataset\n",
"# Determine the total number of houses in the dataset\n",
"\n",
"\n",
"# Now calculate the percentage of tracts that bounds the Charles River.\n"
"# Now calculate the percentage of houses sold normally.\n"
]
},
{
@@ -226,23 +238,23 @@
"id": "kFGToww_s7Vg"
},
"source": [
"What percentage of tracts have a median price less than $10,000?"
"What percentage of houses have a price less than $200,000?"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "xiZbDvpOs7Vh"
},
"outputs": [],
"source": [
"# Determine number of tracts that cost less than $10,000\n",
"# Determine number of houses that cost less than $200,000\n",
"\n",
"\n",
"# Calculate the percentage of tracts that cost less than $10k.\n"
"# Calculate the percentage of houses that cost less than $200k.\n"
]
},
{
@@ -252,7 +264,7 @@
"id": "RLZ-k3L7s7Vq"
},
"source": [
"What percentage of tracts have a median price **between** \\$10,000 and \\$30,000?"
"What percentage of houses have a price **between** \$200,000 and \$500,000?"
]
},
{
@@ -265,13 +277,13 @@
},
"outputs": [],
"source": [
"# Make an array of booleans with cost greater than $10,000 AND less than $30,000\n",
"# Make an array of booleans with cost greater than $200,000 AND less than $500,000\n",
"\n",
"\n",
"# Determine number of tracts that cost between $10,000 and $30,000\n",
"# Determine number of houses that cost between $200,000 and $500,000\n",
"\n",
"\n",
"# Calculate the percentage of tracts between $10,000 and $30,000\n"
"# Calculate the percentage of houses between $200,000 and $500,000\n"
]
},
{
@@ -301,7 +313,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.9.12"
}
},
"nbformat": 4,
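For reviewers who want to check that the reworded exercise cells are answerable with the new dataset, here is a minimal sketch of one way the three percentage questions could be completed. It is not the lesson's official solution. The small `ames_df` below is a hypothetical stand-in with made-up rows so the snippet runs without downloading anything; it assumes the real frame exposes `SaleCondition` and `SalePrice` columns, as the Ames house prices data does.

```python
import pandas as pd

# Hypothetical stand-in for ames_df: six made-up rows that reuse the column
# names SaleCondition and SalePrice, so the sketch runs without a download.
ames_df = pd.DataFrame({
    "SaleCondition": ["Normal", "Abnorml", "Normal", "Partial", "Normal", "Family"],
    "SalePrice": [150000, 120000, 250000, 510000, 199000, 200000],
})

# Determine the total number of houses in the dataset
n_houses = len(ames_df)

# Percentage of houses sold normally, using the query method
n_normal_query = len(ames_df.query("SaleCondition == 'Normal'"))
pct_normal_query = 100 * n_normal_query / n_houses

# The same percentage, using boolean indexing
is_normal = ames_df["SaleCondition"] == "Normal"
pct_normal_bool = 100 * is_normal.sum() / n_houses

# Percentage of houses that cost less than $200,000
pct_under_200k = 100 * (ames_df["SalePrice"] < 200000).sum() / n_houses

# Percentage of houses that cost more than $200,000 AND less than $500,000
in_range = (ames_df["SalePrice"] > 200000) & (ames_df["SalePrice"] < 500000)
pct_in_range = 100 * in_range.sum() / n_houses

print(pct_normal_query, pct_normal_bool, pct_under_200k, pct_in_range)
```

Both comparisons in the last question are strict, matching the "greater than ... AND less than" wording of the exercise comment, so a house priced at exactly $200,000 is counted in neither the second nor the third answer.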