For this assignment you should follow these steps.
- Create a new project on your computer (or the RStudio server). You can do this by forking this repository and then cloning your fork.
- Find a suitable dataset to analyze. It should be primarily quantitative. It would be best to use a dataset that you care about, but some suggested datasets include these:
  - Mapping Early American Elections, especially `congressional-candidate-totals.csv` or `congressional-counties-parties.csv`.
  - County-level data from the 1906, 1916, 1926, or 1936 Censuses of Religious Bodies.
  - Selected U.S. Census data of your choice from NHGIS.
- Add your data to this repository as a CSV file. If there is more than one file, they should probably go in a `data/` directory.
- Create a Quarto file in this directory in which you will do your analysis. Name it something sensible like `exploration.qmd`. This file should read in your data using the `read_csv()` function in the readr package. Here is some sample code for reading in a CSV file.
- Using both prose and code, create an exploratory data analysis of your dataset. Use the techniques in Grolemund and Wickham as well as in Peng to figure out what the dataset is, what could be learned from it, and what potential pitfalls there are in the data.
- Edit your document to get rid of visualizations and prose that proved not to be useful. In other words, you don't have to give me a final draft, but don't give me a completely rough draft either.
- Knit the document to HTML.
- Submit your HTML file the same way that you would a worksheet, but include a link to your repository in the comments.
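
As a starting point for the reading and exploration steps above, here is a minimal sketch in R. It assumes the readr, dplyr, and ggplot2 packages are installed. The toy data frame, its column names (`county`, `year`, `members`), and the temporary file are invented for illustration only; they stand in for your own CSV file and are not drawn from any of the suggested datasets.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Toy stand-in for a real dataset: these columns and values are invented
# for illustration. In your own analysis, point `csv_path` at your file,
# for example one inside the data/ directory.
csv_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    county = c("A", "B", "C", "D"),
    year = c(1906, 1906, 1916, 1916),
    members = c(120, NA, 340, 95)
  ),
  csv_path
)

# Read the CSV. readr guesses the column types and returns a tibble.
dataset <- read_csv(csv_path, show_col_types = FALSE)

# What is the dataset? Column names, types, and the first few values.
glimpse(dataset)

# Potential pitfalls: count the missing values in every column.
missing_counts <- dataset |>
  summarize(across(everything(), ~ sum(is.na(.x))))
print(missing_counts)

# What could be learned? Summarize a quantitative variable by group.
members_by_year <- dataset |>
  group_by(year) |>
  summarize(
    n = n(),
    mean_members = mean(members, na.rm = TRUE)
  )
print(members_by_year)

# Look at the distribution of a quantitative variable.
members_plot <- ggplot(dataset, aes(x = members)) +
  geom_histogram(bins = 10, na.rm = TRUE)
print(members_plot)
```

In a Quarto file, each of these steps would usually sit in its own code chunk, with a sentence or two of prose explaining what the output shows and what question it raises next.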