# Data visualisation
Being able to look at the data is a key step in data exploration, during the analysis and in the communication of results. R has a range of powerful tools to create graphs quickly, and then to develop them into a publication-ready format where helpful.
For an introduction to the Grammar of Graphics and ggplot2, the package that we will be using for visualisation, watch this video:
`r video_code("sk7TT5qM5Hw")`
[Click here](https://drive.google.com/open?id=1UGp7own42PkPv-TLbX29sQUu2DfeaMUp) for the code used in the video
## `ggplot2`
We use the `ggplot2`-package because it offers a consistent way to create anything from simple exploratory plots to complex data visualisation. Each graph command needs certain parts:
* a call to the `ggplot`-function and the **data** as the first argument: `ggplot(gapminder, `
* a mapping of the **aesthetics**, i.e. of variables to visual elements. This uses the `aes`-function that is given to `ggplot` as the second argument: `aes(x=infant_mortality, y=fertility, col=continent))` (Note that two closing brackets are needed as this also completes the ggplot function call)
* a **geometry**, i.e. a type of chart, that is added with a plus-symbol and the function call, e.g., `+ geom_point()`
* optional elements such as labels that are included again with plus-symbols and function calls, e.g., `+ labs(title = "Association of infant mortality and fertility", subtitle="2010 data from gapminder.org")`
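Put together, the four parts listed above form a single command. The following is a minimal sketch, assuming the `gapminder` data from the `dslabs` package and adding a filter to 2010 so that the subtitle matches the data. The object names `gapminder2010` and `p` are chosen here for illustration only.

```r
library(dslabs)   # provides the gapminder data
library(dplyr)
library(ggplot2)

# Keep only the 2010 rows so that the subtitle is accurate
gapminder2010 <- gapminder %>% filter(year == 2010)

p <- ggplot(gapminder2010,                                       # 1. data
            aes(x = infant_mortality, y = fertility,
                col = continent)) +                              # 2. aesthetics
  geom_point() +                                                 # 3. geometry
  labs(title = "Association of infant mortality and fertility",  # 4. optional labels
       subtitle = "2010 data from gapminder.org")

p  # printing the object draws the chart
```

Storing the chart in an object is optional; running the `ggplot(...)` command on its own draws the chart straight away.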
Multiple geometries can be layered on top of each other, for example to add trend lines to scatterplots. In that case, `aes()` functions can be included in the `geom_x()`-functions to make some of the mappings specific to certain geometries. This is done in the example below to colour the points by continent without applying that aesthetic to the line - if it were included in the main `aes()`-function, the plot would contain a separate coloured line for each continent.
*Note:* Line breaks are completely up to you and can be used to make the code readable as long as the command does not appear complete too early - to keep it simple, there should always be an open bracket, a comma or a + before a line break within the command to create a ggplot-chart.
```{r ggplot-example, fig.cap="A simple ggplot example", message=FALSE}
pacman::p_load(dslabs, dplyr, ggplot2)

# Keep only the 2010 observations
gapminder2010 <- gapminder %>% filter(year == 2010)

ggplot(gapminder2010, aes(x = infant_mortality, y = fertility)) +
  geom_point(aes(col = continent)) +  # colour applies to the points only
  geom_smooth() +                     # one trend line across all continents
  ggtitle("Association of infant mortality and fertility",
          subtitle = "2010 data from gapminder.org")
```
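To see why the colour mapping sits inside `geom_point()` in the example above, it can help to compare it with a version where `col = continent` is moved into the main `aes()` call. The sketch below builds both variants from the same data. The names `p_points_only` and `p_all_layers` are made up for illustration.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

gapminder2010 <- gapminder %>% filter(year == 2010)

# Colour mapped inside geom_point(): coloured points, one overall trend line
p_points_only <- ggplot(gapminder2010,
                        aes(x = infant_mortality, y = fertility)) +
  geom_point(aes(col = continent)) +
  geom_smooth()

# Colour mapped in the main aes(): every layer inherits it,
# so geom_smooth() fits a separate coloured line for each continent
p_all_layers <- ggplot(gapminder2010,
                       aes(x = infant_mortality, y = fertility,
                           col = continent)) +
  geom_point() +
  geom_smooth()
```

Printing `p_points_only` and `p_all_layers` one after the other shows the difference between a layer-specific and a global mapping.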
## `esquisse`: using ggplot2 with your mouse
The `esquisse` package provides an RStudio add-in that lets you create `ggplot2`-charts interactively, without having to know the code in advance. If that sounds appealing, you can check out the [Getting Started guide](https://dreamrs.github.io/esquisse/articles/get-started.html){target="_blank"} to see what that would look like. To try it out, run `install.packages("esquisse")`, load the data you want to use, and type `esquisse::esquisser()` into your Console.
Try not to rely on `esquisse` *instead of* `ggplot2` - the aim in this module is to learn how to write code that is reproducible, not how to click boxes. However, `esquisse` shows you the code it generates (check `Export & Code` at the bottom right), so it can be very helpful when you are getting started, or when you are not quite sure how to do something.
## Opinionated visualisation
It is easy to find examples of misleading charts that should never have been published. However, even when all the elements of a chart are legitimately presented, the same data can still be used to suggest radically different conclusions. See the following example and follow the source link to read more about "opinionated" data visualisations. Apart from being critical when seeing charts, the take-away message from this is to be aware of the power of small design choices - that is a power you should wield deliberately when creating charts.
```{r two-messages, echo=FALSE, fig.cap='Same data, two messages (Source: <a href="https://www.infoworld.com/article/3088166/why-how-to-lie-with-statistics-did-us-a-disservice.html">infoworld.com</a>)'}
knitr::include_graphics("./images/iraq-bloody-toll.png")
```
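As a small illustration of how much a single design choice can matter, the sketch below draws the same series twice and changes only the y-axis. It is not a reproduction of the figure above: it assumes the `gapminder` data from `dslabs` and picks life expectancy in Japan purely as an example, and all object names are hypothetical.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

# One country, one variable: the data are identical in both charts
japan <- gapminder %>% filter(country == "Japan")

base_plot <- ggplot(japan, aes(x = year, y = life_expectancy)) +
  geom_line()

# Default axis: the limits hug the data, so the increase looks steep
p_zoomed <- base_plot +
  labs(title = "Life expectancy in Japan",
       subtitle = "y-axis fitted to the data")

# Same data, but the y-axis is stretched to include zero,
# which makes the same increase look much flatter
p_from_zero <- base_plot +
  expand_limits(y = 0) +
  labs(title = "Life expectancy in Japan",
       subtitle = "y-axis starting at zero")
```

Neither version is wrong as such; which one is appropriate depends on the comparison you want readers to make.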
## Further resources {#further-resources-visualisation}
* The [R Graphics Cookbook](https://r-graphics.org/){target="_blank"} by Winston Chang is available online with 150 "recipes" that cover everything from basic exploratory charts to colour-coded maps.
* The BBC graphics team has published their own [R Cookbook](https://bbc.github.io/rcookbook/){target="_blank"} with many tips for making charts that convey a clear message, as well as some custom functions for making clean publication-ready charts.
* Irizarry's *Introduction to Data Science* has a good chapter on [data visualisation principles](https://rafalab.github.io/dsbook/data-visualization-principles.html){target="_blank"}.
* I am a big fan of the gapminder bubble chart. In the video (from about 5:00), I show how to create a static version, but R can also create the dynamic version that shows global development over time. For that, you can `r xfun::embed_file("./files/gapminder_animated.R", text = "check out this code")`.