In this assignment you will use a "hidden" API in the Virginia elections site to download county-level presidential election results and then analyze the results. Please commit python scripts a1.py, a2.py, b1.py, b2.py, b3.py, b4.py, ANSWERS>txt
and the images specified below.
Your task is to download all of the CSVs for all of the presidential general elections in Virginia from their Elections Database website and save them as data/1924.csv,...,data/2016.csv
.
First, visit the website in your browser and use the interface to find a list of all presidential general elections. The URL will look like this:
http://historical.elections.virginia.gov/elections/search/year_from:1924/year_to:2016/office_id:1/stage:General
Click on one of them, say 2016, and then click "See Details for this Election" and you'll be taken to this URL:
http://historical.elections.virginia.gov/elections/view/80871/
Click "Download this Election" on the left bar and then "Municipality Results" and you'll get a CSV at this URL:
http://historical.elections.virginia.gov/elections/download/80871/precincts_include:0/
Note that 80871
is the identifier for that the website is using for this election.
-
Create a CSV file called
election_ids.csv
with two columns:year
andelection_id
whereelection_id
is the id used by the website for the presidential election in the corresponding year. (4 points)Hint: The result should start like this:
election_id,year 80871,2016 ....
The HTML source for the 2016 election row looks like this:
<tr id="election-id-80871" class="election_item general_party"> <td class="year first" style="padding: 7px 4px;">2016</td> <td class="" style="padding: 7px 4px;">President</td> <td class="" style="padding: 7px 4px;">Statewide</td> <td class="party_border_top">General Election</td> ...
So you can find the election rows by looking for
tr
tags with a class ofelection_item
(not that spaces in the class string denote multiple classes). You can find their ids by grabbing theid
attribute and extracting the number from the end. You can find the year of the election in the text of the nexttd
tag. -
Get and save the CSV for each year to files
data/1924.csv,...,data/2016.csv
. (4 points)Hint: To iterate over the rows in a DataFrame you can use the following pattern:
for i, row in df.iterrows(): # now row is a Series with index ['election_id', 'year']
Note that when reading
election_ids.csv
, pandas will automatically use numeric types forelection_id
andyear
. To use them in a filename you'll need to convert them to strings, e.g.open('data/' + str(year) + '.csv', 'w')
.
-
Create a new CSV called
republican_shares.csv
with three columns: Year, County/City, andR_SHARE
where the last column is the proportion of all votes cast for the Republican candidate. Note that theCounty/City
column in the downloaded CSVs is not quite accurate because there are congressional district subdivisions (e.g.Chesterfield County (CD 4)
). So you'll have to aggregate the data up to the trueCounty/City
level. (8 points)Hints:
- For each year's DataFrame, find the Republican candidate's column.
- Drop the first row; subset to just the Republican's and Total Votes Cast columns, convert those to numbers (they are currently strings with commas in them)
- Remove the congressional districts in the
County/City
column, e.g. replaceChesterfield County (CD 4)
to justChesterfield County
. - Find the Republican and total votes cast in each county using a
groupby().sum()
. - Find the
R_SHARE
. - Subset to just the
R_SHARE
, and add a column called Year that is the year for this election. - Finally use
pd.concat()
to concatenate the DataFrames for each election and write the result torepublican_shares.csv
-
Create a plot with four time series of Republican vote shares in the following counties: Accomack, Amelia County, Amherst, Alleghany. Make sure your plot has meaningful axis labels and a title. (4 points)
Hint: Use set_index() to create a multi-index, select the
R_SHARE
column so you have a Series and then unstack. -
Select at least 3 variables from the 2016 5-year ACS whose relationship with 2016 Republican vote share you are interested in exploring. See the Variable Types table here for more information on variable types. Download them at the County level for Virginia using the Census API and rename the columns to something more meaningful before saving the data to a CSV called
acs.csv
. (4 points)Hint: Look at the examples (note the URL for 2016 is slightly different from 2014) and use the
in
parameter in the API to select Virginia and thefor
parameter to specify counties. Convert the result to a dataframe as in lecture and then useto_csv()
. -
Merge the ACS data from (B3) with the Republican vote share data from (B1) for the year 2016. Generate a pairplot using seaborn and save it as
pairplot.png
. Also run an OLS regression of the Republican vote share on the ACS variables and save the coefficients in a CSV calledcoefficients.csv
. Summarize (Do not worry about being rigorous here) your findings in a sentence or two inANSWERS.txt
. (4 points)Hint: For the merge, remove ', Virginia' from the ACS and deal with the mismatched cases, e.g. by making everything upper case.