---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
warning = FALSE,
comment = "##",
fig.path = "man/figures/README-",
fig.height = 5,
fig.width = 5
# out.width = "100%"
)
library(HistData)
```
<!-- badges: start -->
[![Project Status: Active -- The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/HistData)](https://cran.r-project.org/package=HistData)
[![](http://cranlogs.r-pkg.org/badges/grand-total/HistData)](https://cran.r-project.org/package=HistData)
[![DOI](https://zenodo.org/badge/106572219.svg)](https://zenodo.org/badge/latestdoi/106572219)
[![HistData status badge](https://friendly.r-universe.dev/badges/HistData)](https://friendly.r-universe.dev/HistData)
[![](https://img.shields.io/badge/documentation-blue)](https://friendly.github.io/HistData)
<!-- badges: end -->
# HistData <img src="man/figures/logo.png" align="right" height="200px" />
**Data Sets from the History of Statistics and Data Visualization**
Dev. Version: 0.9-2
The `HistData` package provides a collection of small data sets
that are interesting and important in the history of statistics and data
visualization. The goal of the package is to make these available, both for
instructional use (as examples, problem sets or projects) and for historical research
(extending or criticizing a previous analysis).
Some of these present interesting challenges, or opportunities to "show off",
with graphics or analysis in R.
Many of the data sets have examples which reproduce an historical graph or analysis.
These are meant mainly as starters for more extensive re-analysis or graphical
elaboration. If you are interested in any of these problems or data sets, I've purposely left
lots of room to do better!
They are part of a program of research called *statistical historiography*
(Friendly, 2007; Friendly & Denis, 2001; Friendly et al., 2016),
meaning the use of statistical methods to study problems and questions in the
history of statistics and graphics. A main aspect of this is the increased
understanding of historical problems in science and data analysis
that comes through the process of trying to reproduce a graph or analysis using
modern methods. I call this "Re-visioning", meaning _to see again, hopefully in a new light_.
They are also used in our book,
[_A History of Data Visualization and Graphic Communication_](https://www.hup.harvard.edu/catalog.php?isbn=9780674975231)
(Friendly & Wainer, 2021). See also the [companion website for this book](https://friendly.github.io/HistDataVis/).
If you are looking more widely for datasets to use for examples, teaching or research, check out Vincent Arel-Bundock's
[Rdatasets](https://vincentarelbundock.github.io/Rdatasets/) collection, which gathers over 2200 datasets from various
R packages; see the list of [Available datasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html).
### Data science
One more R-related aspect should be noted here:
a great deal of "data sciency" work was involved in constructing this package,
alas (for teaching) not captured in the resulting CRAN-friendly package.
* **Extraction**: in some cases, data had to be extracted from historical documents, using a variety of techniques (web scraping, OCR of PDF files followed by conversion to a data set), each problem with its own toolbox, in R or outside. In many cases, transcription errors had to be corrected with code or manually.
* **Digitization** of data from an image.
* **Conversion** of text-based data sets to a CSV file and then to an `.RData` file with proper column names. Ever seen a Unix `.shar` (shell archive) file? Well, I have.
* **Cleaning** variable names, e.g., using `janitor::clean_names()`, or, in some cases, manually editing an Excel file.
* **Type conversion**, e.g., `chr` to `factor` or `ordered`; constructing appropriate contrasts for factors to facilitate re-analysis.
* **Tidying** data frames: long <--> wide, abbreviations of character string labels, ...
* **Documentation**: The thankless task? No -- considerable effort was made to give detailed descriptions, notes on methods, executable examples, references to original sources and analyses, ...
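
The cleaning, type-conversion and tidying steps above can be sketched in a few lines of base R. This is only an illustration, not the code used to build the package: the data frame `raw` and its values are invented for the example, and the `gsub()` call is a rough stand-in for what `janitor::clean_names()` does more thoroughly.

```r
# Hypothetical raw table, as it might look after transcription from a printed source
raw <- data.frame(
  "Region Name" = c("North", "South", "East"),
  "Size"        = c("small", "large", "medium"),
  "Y1850"       = c(12, 30, 21),
  "Y1860"       = c(15, 28, 25),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Cleaning: lower-case, syntactic variable names
# (a base-R approximation of janitor::clean_names())
names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))

# Type conversion: chr -> factor, and chr -> ordered with an explicit level order
raw$region_name <- factor(raw$region_name)
raw$size <- ordered(raw$size, levels = c("small", "medium", "large"))

# Contrasts: treatment contrasts comparing each region with the first level
contrasts(raw$region_name) <- contr.treatment(nlevels(raw$region_name))

# Tidying: wide -> long, one row per region and year
long <- reshape(
  raw,
  direction = "long",
  varying   = c("y1850", "y1860"),
  v.names   = "count",
  timevar   = "year",
  times     = c(1850, 1860),
  idvar     = "region_name"
)
rownames(long) <- NULL
long
```

Base `reshape()` is used here only to keep the sketch free of dependencies; `tidyr::pivot_longer()` is a common alternative for the same wide-to-long step.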
## Installation
Get the released version from CRAN or [R-universe](https://friendly.r-universe.dev/):

    install.packages("HistData")
    install.packages('HistData', repos = 'https://friendly.r-universe.dev')

The development version can be installed to your R library directly from GitHub via:

    remotes::install_github("friendly/HistData")

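
Once the package is installed, a data set can be loaded with `data()` and explored with standard tools. The following is a minimal sketch, assuming the `Galton` data set (with variables `parent` and `child`) is the one of interest; the plot settings are arbitrary choices for illustration, not a reproduction of the documented example.

```r
# Runs only if HistData is installed; otherwise the block is skipped
if (requireNamespace("HistData", quietly = TRUE)) {
  data(Galton, package = "HistData")

  # Regression of child height on mid-parent height
  fit <- lm(child ~ parent, data = Galton)

  # Jitter the points, since heights were recorded in discrete classes
  plot(jitter(child) ~ jitter(parent), data = Galton,
       xlab = "Mid-parent height (in.)",
       ylab = "Child height (in.)",
       pch = 16, col = "gray40")
  abline(fit, lwd = 2)
}
```

The help page for each data set (e.g., `?Galton`) contains fuller examples that can serve as a starting point for re-analysis.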
## Data sets
Here are the data sets in the package, with links to their documentation. Some topics are represented by two or more
data sets.
```{r datasets, results='asis'}
# link dataset to pkgdown doc
refurl <- "http://friendly.github.io/HistData/reference/"
dsets <- vcdExtra::datasets("HistData") |>
dplyr::select(Item, Title) |>
dplyr::mutate(Item = glue::glue("[{Item}]({refurl}{Item}.html)"))
#knitr::kable(dsets)
library(tinytable)
# tt(dsets) |>
# format_tt(j = 1, markdown = TRUE) |>
# style_tt(j = 1, bootstrap_css = "width: 30%;") |>
# style_tt(j = 2, bootstrap_css = "width: 70%;")
tt(dsets, width = c(.2, .8)) |>
format_tt(j = 1, markdown = TRUE)
# save_tt("html") |>
# knitr::asis_output()
```
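
If `vcdExtra` is not available, a similar listing of item names and titles can be obtained with base R, since `data(package = )` returns this information in its `results` component. The helper name `list_datasets()` below is hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: list the data sets shipped with an installed package
list_datasets <- function(pkg) {
  res <- utils::data(package = pkg)$results
  data.frame(
    Item  = res[, "Item"],
    Title = res[, "Title"],
    stringsAsFactors = FALSE
  )
}

# Demonstrated with the base 'datasets' package, which is always installed;
# use list_datasets("HistData") once HistData is installed
head(list_datasets("datasets"))
```

Unlike `vcdExtra::datasets()`, this sketch reports only names and titles, not the class or dimensions of each data set.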
## Contributors
Please note that the `HistData` project is released with a [Contributor Code of Conduct](https://friendly.github.io/HistData/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.
Over the years, many people have contributed new data sets, offered corrections,
suggestions, or documentation examples. They are gratefully acknowledged below:
David Bellhouse,
Brian Clair,
Stephane Dray,
Luiz Droubi,
Antoine de Falguerolles,
Monique Graf,
James Hanley,
Peter Li,
Dennis Murphy,
Jim Oeppen,
James Riley,
Neville Verlander,
Hadley Wickham.
## References
Friendly, M. (2007). A Brief History of Data Visualization.
In Chen, C., Härdle, W. & Unwin, A. (eds.),
*Handbook of Computational Statistics: Data Visualization*, Springer-Verlag, III, Ch. 1, 1-34.
[Preprint](http://datavis.ca/papers/hbook.pdf)

Friendly, M. & Denis, D. (2001).
Milestones in the history of thematic cartography, statistical graphics, and data visualization.
Web site: [http://datavis.ca/milestones/](http://datavis.ca/milestones/)

Friendly, M., Sigal, M. & Harnanansingh, D. (2016).
The Milestones Project: A Database for the History of Data Visualization.
In Kostelnick, C. & Kimball, M. (eds.), *Visible Numbers: The History of Data Visualization*, Ashgate Press, Chapter 10. [Preprint](https://www.datavis.ca/papers/MilestonesProject.pdf)

Friendly, M. & Wainer, H. (2021). [*A History of Data Visualization and Graphic Communication*](https://www.hup.harvard.edu/catalog.php?isbn=9780674975231).
Harvard University Press. Companion [web site](https://friendly.github.io/HistDataVis/)