-
Notifications
You must be signed in to change notification settings - Fork 20
/
Copy pathREADME.Rmd
executable file
·243 lines (190 loc) · 11.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
---
title: "tricolore. A flexible color scale for ternary compositions"
author: "Jonas Schöley & Ilya Kashnitsky"
output: github_document
---
```{r echo=FALSE}
knitr::opts_chunk$set(warning=FALSE,
message=FALSE,
fig.width = 12,
fig.height = 12)
```
[![CRAN_Version](https://www.r-pkg.org/badges/version/tricolore)](https://cran.r-project.org/package=tricolore)
[![GitHub Actions R-CMD-check](https://github.com/jschoeley/tricolore/actions/workflows/R-CMD-check.yaml/badge.svg)
[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
What is *tricolore*?
--------------------
`tricolore` is an R library providing a flexible color scale for the visualization of three-part (ternary) compositions. Its main functionality is to color-code any ternary composition as a mixture of three primary colors and to draw a suitable color-key. `tricolore` flexibly adapts to different visualization challenges via
- *discrete* and *continuous* color support,
- support for unbalanced compositional data via *centering*,
- support for data with very narrow range via *scaling*,
- *hue*, *chroma* and *lightness* options.
![](README_files/teaser.png)
Getting Started
---------------
```{r eval=FALSE}
install.packages('tricolore')
library(tricolore); DemoTricolore()
```
The `Tricolore()` function expects a dataframe of three-part compositions, color-codes the compositions and returns a list with elements `rgb` and `key`. The first list element is a vector of rgb codes for the color-coded compositions, the latter element gives a plot of the color key.
Here's a minimal example using simulated data.
```{r message=FALSE, fig.cap='A ternary color key with the color-coded compositional data visible as points.'}
library(tricolore)
# simulate 243 ternary compositions
P <- as.data.frame(prop.table(matrix(runif(3^6), ncol = 3), 1))
# color-code each composition and return a corresponding color key
colors_and_legend <- Tricolore(P, 'V1', 'V2', 'V3')
# the color-coded compositions
head(colors_and_legend$rgb)
colors_and_legend$key
```
You can familiarize yourself with the various options of `tricolore` by running `DemoTricolore()`.
Ternary choropleth maps
-----------------------
Here I demonstrate how to create a choropleth map of the regional distribution of education attainment in Europe 2016 using `ggplot2`.
The data set `euro_example` contains the administrative boundaries for the European NUTS-2 regions in the column `geometry`. This data can be used to plot a choropleth map of Europe using the `sf` package. Each region is represented by a single row. The name of a region is given by the variable `name` while the respective [NUTS-2](https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics) geocode is given by the variable `id`. For each region some compositional statistics are available: Variables starting with `ed` refer to the relative share of population ages 25 to 64 by educational attainment in 2016 and variables starting with `lf` refer to the relative share of workers by labor-force sector in the European NUTS-2 regions 2016.
**1. Using the `Tricolore()` function, color-code each educational composition in the `euro_example` data set and add the resulting vector of hex-srgb colors as a new variable to the dataframe. Store the color key separately.**
```{r}
# color-code the data set and generate a color-key
tric_educ <- Tricolore(euro_example,
p1 = 'ed_0to2', p2 = 'ed_3to4', p3 = 'ed_5to8')
```
`tric` contains both a vector of color-coded compositions (`tric$rgb`) and the corresponding color key (`tric$key`). We add the vector of colors to the map-data.
```{r}
# add the vector of colors to the `euro_example` data
euro_example$educ_rgb <- tric_educ$rgb
```
**2. Using `ggplot2` and the joined color-coded education data and geodata, plot a ternary choropleth map of education attainment in the European regions. Add the color key to the map.**
The secret ingredient is `scale_fill_identity()` to make sure that each region is colored according to the value in the `educ_rgb` variable of `euro_example`.
```{r}
library(ggplot2)
plot_educ <-
# using data sf data `euro_example`...
ggplot(euro_example) +
# ...draw a choropleth map
geom_sf(aes(fill = educ_rgb, geometry = geometry), size = 0.1) +
# ...and color each region according to the color-code
# in the variable `educ_rgb`
scale_fill_identity()
plot_educ
```
Using `annotation_custom()` and `ggplotGrob` we can add the color key produced by `Tricolore()` to the map. Internally, the color key is produced with the [`ggtern`](http://www.ggtern.com/) package. In order for it to render correctly we need to load `ggtern` *after* loading `ggplot2`. Don't worry, the `ggplot2` functions still work.
```{r}
library(ggtern)
plot_educ +
annotation_custom(
ggplotGrob(tric_educ$key),
xmin = 55e5, xmax = 75e5, ymin = 8e5, ymax = 80e5
)
```
Because the color key behaves just like a `ggplot2` plot we can change it to our liking.
```{r}
plot_educ <-
plot_educ +
annotation_custom(
ggplotGrob(
tric_educ$key +
labs(L = '0-2', T = '3-4', R = '5-8')),
xmin = 55e5, xmax = 75e5, ymin = 8e5, ymax = 80e5
)
plot_educ
```
Some final touches...
```{r}
plot_educ +
theme_void() +
coord_sf(datum = NA) +
labs(title = 'European inequalities in educational attainment',
subtitle = 'Regional distribution of ISCED education levels for people aged 25-64 in 2016.')
```
Continuous vs. discrete colors
------------------------------
By default `tricolore` uses a discrete colors scale with 16 colors. This can be changed via the `breaks` parameter. A value of `Inf` gives a continuous color scale...
```{r}
# color-code the data set and generate a color-key
tric_educ_disc <- Tricolore(euro_example,
p1 = 'ed_0to2', p2 = 'ed_3to4', p3 = 'ed_5to8',
breaks = Inf)
euro_example$educ_rgb_disc <- tric_educ_disc$rgb
ggplot(euro_example) +
geom_sf(aes(fill = educ_rgb_disc, geometry = geometry), size = 0.1) +
scale_fill_identity() +
annotation_custom(
ggplotGrob(
tric_educ_disc$key +
labs(L = '0-2', T = '3-4', R = '5-8')),
xmin = 55e5, xmax = 75e5, ymin = 8e5, ymax = 80e5
) +
theme_void() +
coord_sf(datum = NA) +
labs(title = 'European inequalities in educational attainment',
subtitle = 'Regional distribution of ISCED education levels for people aged 25-64 in 2016.')
```
...and a `breaks = 2` gives a discrete color scale with $2^2=4$ colors, highlighting the regions with an absolute majority of any part of the composition.
```{r}
# color-code the data set and generate a color-key
tric_educ_disc <- Tricolore(euro_example,
p1 = 'ed_0to2', p2 = 'ed_3to4', p3 = 'ed_5to8',
breaks = 2)
euro_example$educ_rgb_disc <- tric_educ_disc$rgb
ggplot(euro_example) +
geom_sf(aes(fill = educ_rgb_disc, geometry = geometry), size = 0.1) +
scale_fill_identity() +
annotation_custom(
ggplotGrob(
tric_educ_disc$key +
labs(L = '0-2', T = '3-4', R = '5-8')),
xmin = 55e5, xmax = 75e5, ymin = 8e5, ymax = 80e5
) +
theme_void() +
coord_sf(datum = NA) +
labs(title = 'European inequalities in educational attainment',
subtitle = 'Regional distribution of ISCED education levels for people aged 25-64 in 2016.')
```
Ternary centering
-----------------
While the ternary balance scheme allows for dense yet clear visualizations of *well spread out* ternary compositions the technique is less informative when used with highly *unbalanced data*. The map below shows the regional labor force composition in Europe as of 2016 in nearly monochromatic colors, the different shades of blue signifying a working population which is concentrated in the tertiary (services) sector. Regions in Turkey and Eastern Europe show a somewhat higher concentration of workers in the primary (production) sector but overall the data shows little variation with regards to the *visual reference point*, i.e. the greypoint marking perfectly balanced proportions.
```{r}
tric_lf_non_centered <- Tricolore(euro_example, breaks = Inf,
'lf_pri', 'lf_sec', 'lf_ter')
euro_example$rgb_lf_non_centered <- tric_lf_non_centered$rgb
ggplot(euro_example) +
geom_sf(aes(fill = rgb_lf_non_centered, geometry = geometry), size = 0.1) +
scale_fill_identity() +
annotation_custom(
ggplotGrob(tric_lf_non_centered$key +
labs(L = '% Primary', T = '% Secondary', R = '% Tertiary')),
xmin = 55e5, xmax = 75e5, ymin = 8e5, ymax = 80e5
) +
theme_void() +
coord_sf(datum = NA) +
labs(title = 'European inequalities in labor force composition',
subtitle = 'Regional distribution of labor force across the three sectors in 2016.')
```
A remedy for analyzing data which shows little variation in relation to some reference point is to *change the point of reference*. The map below yet again shows the European regional labor force composition in 2016 but the color scale has been altered so that its greypoint -- the visual point of reference -- is positioned at the European annual average. Consequently the colors now show direction and magnitude of the deviation from the European average labor force composition. Pink, Green and Blue hues show a higher than average share of workers in the primary, secondary and tertiary sector respectively. The saturation of the colors show the magnitude of that deviation with perfect grey marking a region that has a labor force composition equal to the European average, i.e. the reference point.
Centering the color scale over the labor-force composition of the average European NUTS-2 region shows various patterns of deviations from the average. Metropolitan regions (Hamburg, Stockholm, Paris, Madrid) have a higher than average share of tertiary workers. Large parts of France are quite grey, indicating a labor-force composition close to the average, while Eastern Europe, the south of Spain and Italy have a higher than average share of workers active in the primary sector.
```{r}
tric_lf_centered <-
Tricolore(euro_example,
'lf_pri', 'lf_sec', 'lf_ter',
center = NA, crop = TRUE)
euro_example$rgb_lf_centered <- tric_lf_centered$rgb
ggplot(euro_example) +
geom_sf(aes(fill = rgb_lf_centered, geometry = geometry), size = 0.1) +
scale_fill_identity() +
annotation_custom(
ggplotGrob(
tric_lf_centered$key +
labs(L = '% Primary', T = '% Secondary', R = '% Tertiary')),
xmin = 55e5, xmax = 75e5, ymin = 8e5, ymax = 80e5
) +
theme_void() +
coord_sf(datum = NA) +
labs(title = 'European inequalities in labor force composition',
subtitle = 'Regional distribution of labor force across the three sectors in 2016.')
```
Contributing
------------
This software is an academic project. We welcome any issues and pull requests.
Please report any bugs you find by submitting an issue on github.com/jschoeley/tricolore/issues.
If you wish to contribute, please submit a pull request following the guidelines stated in [CONTRIBUTING.md](https://github.com/jschoeley/tricolore/blob/devel/CONTRIBUTING.md).