title | subtitle | week | type |
---|---|---|---|
One Script, Many Products | RMarkdown to create dynamic research outputs. Publishing to github/word/html/etc | 8 | Case Study |

Reading:
- Browse website about RMarkdown
- Browse R Markdown, the Definitive Guide

Tasks:
- Build an RMarkdown document that downloads a dataset, produces one graph and one table, and exports to four different formats (HTML, GitHub Markdown, Word, PowerPoint).
You are working on a team that needs to provide regular updates about a frequently updated dataset. Currently, an employee does the following steps once per week:
- Goes to a website with the source data
- Downloads the data as a text file
- Opens a graphing program and clicks through a set of procedures to make a particular set of graphs
- Saves the updated graphs in several formats, including:
  - PowerPoint presentation
  - Web page hosted on the team website
  - Word document that is included in company reports
  - A PDF document for downloading/printing
This takes the employee about three hours every week. You are a new member of the team and you confidently declare you could automate the procedure using R and RMarkdown (and that you could complete the automation in less than three hours!). The team looks at you with wide eyes. You realize you had better get working.
You can specify that RMarkdown should produce multiple outputs using the following syntax in the YAML header:
output:
  html_document: default
  github_document: default
  powerpoint_presentation: default
  word_document: default
You can read more about the YAML header and all the options here. Note that you can specify many options for each output format to change the theme, structure, etc.
However, if you click the "Knit" button in RStudio, it will only make one output. To 'render' all of them, you have to use a command like this in the R Console.
rmarkdown::render("path/to/file.Rmd",output_format = "all")
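As a minimal sketch of that call, the snippet below wraps it in a small helper. The helper name `render_all()` and the `outputs` folder are hypothetical choices made for illustration, not part of the assignment; `output_format` and `output_dir` are existing arguments of `rmarkdown::render()`.

```r
# Hypothetical helper: render every format listed in the YAML header.
render_all <- function(rmd_file, out_dir = "outputs") {
  # Fail early with a clear error if the path is wrong.
  stopifnot(file.exists(rmd_file))
  # output_format = "all" renders each format declared under `output:`;
  # output_dir collects the rendered files in one folder.
  rmarkdown::render(rmd_file, output_format = "all", output_dir = out_dir)
}

# Example call (the path is a placeholder):
# render_all("path/to/file.Rmd")
```

Keeping the rendered files in their own folder is one way to make it easier to see which files were generated and which are source.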
You will be working with the data available here. You can read it in using read_table(), but you will have to look at the text file and specify how many lines to skip.
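One possible way to find the number of lines to skip is to count the header lines rather than hard-code the value. The sketch below assumes the header lines all start with `#` and sit at the top of the file; check this against the real text file. The sample lines, their values, and the column names `year`, `mean`, and `unc` are made up for illustration and are not the actual data.

```r
library(readr)

# Made-up stand-in for the downloaded text file (NOT real values).
# Assumption: header lines start with "#" and all sit at the top.
sample_lines <- c(
  "# Illustrative header, line 1",
  "# Illustrative header, line 2",
  "# year   mean   unc",
  "2001   100.0   0.1",
  "2002   101.0   0.1",
  "2003   102.5   0.1"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Count the leading comment lines instead of hard-coding the number.
n_skip <- sum(startsWith(readLines(tmp), "#"))

# Skip the header and supply column names (assumed names).
co2 <- read_table(tmp, skip = n_skip,
                  col_names = c("year", "mean", "unc"))
```

With the real dataset, the temporary file would be replaced by the URL or a downloaded copy of the text file.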
Your objective is to automatically produce various outputs like MS Word, PPTX, and HTML.
The document should:
- Download the data (including correctly importing it)
- Make one timeseries plot (ggplot of CO2~time)
- Make one table. This table can summarize anything you want (top 5 years? Mean CO2 every decade?)
- Create a new RMarkdown document (possibly starting with the template in File -> New File -> R Markdown).
- Edit the YAML header at the top of the .Rmd file to specify the desired file types as noted above.
- Write code to read the "Mauna Loa CO2 annual mean data" from this website.
  - Check out readr::read_table() (part of the tidyverse) or similar to import the dataset into R directly from the URL (ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_annmean_mlo.txt). After you look at the format of the text file, you will want to check out the skip parameter of read_table().
- Use ggplot to plot a time series of CO2 levels through time
- Add an additional table below the graph and format it nicely with knitr::kable() or similar.
- Use rmarkdown::render(file, output_format = "all") to render all the outputs specified in the YAML.
- Consider changing the 'chunk' settings so that the underlying code (and any messages) are hidden in the output documents. For example, consider results='hide', message=FALSE, echo=FALSE (note that results='hide' also suppresses printed output such as tables, so use it only on chunks whose output you do not need).
- Tables can be a little tricky to embed in multiple formats. One approach is to use as_image() in the kableExtra package, which generates a PNG file that is easy to embed. You can use it like this:
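library(knitr)       # provides kable()
library(kableExtra)  # provides as_image() and the %>% pipe
# `data` stands for the data frame you want to show as a table
data %>%
  kable() %>%
  as_image(width = 10, file = "table.png")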
- Save the outputs in your repository.
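
The plotting and table steps above could be sketched as follows. This is only an illustration under stated assumptions: the data frame holds made-up values rather than the real CO2 record, and the column names `year` and `co2_mean`, as well as the choice of a per-decade mean for the table, are hypothetical.

```r
library(dplyr)
library(ggplot2)
library(knitr)

# Made-up values for illustration only (NOT real measurements).
co2 <- data.frame(
  year     = c(1998, 1999, 2000, 2001, 2002, 2003),
  co2_mean = c(100, 101, 102, 104, 105, 107)
)

# Time series plot of CO2 against year.
p <- ggplot(co2, aes(x = year, y = co2_mean)) +
  geom_line() +
  labs(x = "Year", y = "Annual mean CO2")

# Summary table: mean CO2 per decade.
by_decade <- co2 %>%
  mutate(decade = floor(year / 10) * 10) %>%
  group_by(decade) %>%
  summarize(mean_co2 = mean(co2_mean))

# Format the table for the rendered document.
kable(by_decade, digits = 1, col.names = c("Decade", "Mean CO2"))
```

In an .Rmd file, printing `p` and the `kable()` result in separate chunks places the graph and the table in each rendered output.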
Explore my version of the html output here.
Think about how you could use this "one document, several outputs" approach in a project.