---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  comment = "##",
  fig.path = "man/figures/README-",
  fig.height = 5,
  fig.width = 5
#  out.width = "100%"
)

library(HistData)
```

<!-- badges: start -->
[![Project Status: Active -- The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) 
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/HistData)](https://cran.r-project.org/package=HistData)
[![](http://cranlogs.r-pkg.org/badges/grand-total/HistData)](https://cran.r-project.org/package=HistData)
[![DOI](https://zenodo.org/badge/106572219.svg)](https://zenodo.org/badge/latestdoi/106572219)
[![HistData status badge](https://friendly.r-universe.dev/badges/HistData)](https://friendly.r-universe.dev/HistData)
[![](https://img.shields.io/badge/documentation-blue)](https://friendly.github.io/HistData)

<!-- badges: end -->


# HistData  <img src="man/figures/logo.png" align="right" height="200px" />
**Data Sets from the History of Statistics and Data Visualization**

Dev. Version: 0.9-2

The `HistData` package provides a collection of small data sets
that are interesting and important in the history of statistics and data
visualization. The goal of the package is to make these available, both for
instructional use (as examples, problem sets or projects) and for historical research
(extending or criticizing a previous analysis).
Some of these present interesting challenges, or opportunities to "show off",
with graphics or analysis in R. 

Many of the data sets have examples which reproduce an historical graph or analysis.
These are meant mainly as starters for more extensive re-analysis or graphical
elaboration. If you are interested in any of these problems or data sets, I've purposely left
lots of room to do better!

These data sets are part of a program of research called *statistical historiography*
(Friendly, 2007; Friendly & Denis, 2001; Friendly et al., 2016),
meaning the use of statistical methods to study problems and questions in the
history of statistics and graphics. A main aspect of this is the increased
understanding of historical problems in science and data analysis
through the process of trying to reproduce a graph or analysis using
modern methods. I call this "Re-visioning", meaning _to see again, hopefully in a new light_.

They are also used in our book,
[_A History of Data Visualization and Graphic Communication_](https://www.hup.harvard.edu/catalog.php?isbn=9780674975231)
(Friendly & Wainer, 2021). See also the [companion website for this book](https://friendly.github.io/HistDataVis/).

If you are looking more widely for datasets to use for examples, teaching or research, check out Vincent Arel-Bundock's
[Rdatasets](https://vincentarelbundock.github.io/Rdatasets/) package, which collects over 2200 datasets from various
R packages; see its list of [available datasets](https://vincentarelbundock.github.io/Rdatasets/articles/data.html).

### Data science

There is another R aspect that should be noted here: 
A great deal of "data sciency" work was involved in constructing this package,
alas (for teaching) not captured in the resulting CRAN-friendly package.

* In some cases, data had to be **extracted** from historical documents, using a variety of techniques (web scraping, OCR of PDF files followed by conversion to a data set), each problem with its own toolbox, in R or outside. In many cases, transcription errors had to be corrected
with code or manually;
* **digitization** of data from an image;
* **conversion** of text-based data sets to a CSV file and then to an `.RData` file with proper column names. Ever seen a Unix `.shar` (shell archive) file? Well, I have.
* **cleaning** variable names, e.g., using `janitor::clean_names()`, or, in some cases, manually editing an Excel file.
* Applying **type-conversion**, e.g., `chr` to `factor` or `ordered`; constructing appropriate contrasts for factors to facilitate re-analysis.
* **tidying** data.frames: long <--> wide, abbreviations of character string labels, ...
* **documentation**: The thankless task? No -- considerable effort was made to give detailed descriptions, notes on methods, executable examples, references to original sources and
analyses, ...
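
As a rough illustration of the cleaning, type-conversion and tidying steps listed above, here is a minimal sketch in base R. The toy table `raw` and its column names are hypothetical, invented only for this example; they are not taken from any data set in the package, and the actual preparation scripts may well have used different tools (e.g., `janitor::clean_names()` for the name cleaning).

```r
# Hypothetical raw table, as it might look after transcription from a source
raw <- data.frame(
  "Region Name" = c("North", "South"),
  "Year 1850"   = c(12, 30),
  "Year 1860"   = c(15, 28),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# cleaning: turn runs of non-alphanumeric characters into "_" and lower-case
# (a simple stand-in for what janitor::clean_names() does more thoroughly)
names(raw) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(raw)))

# type conversion: chr -> factor
raw$region_name <- factor(raw$region_name)

# tidying: wide -> long, one row per region and year
long <- reshape(
  raw,
  direction = "long",
  varying   = c("year_1850", "year_1860"),
  v.names   = "value",
  timevar   = "year",
  times     = c(1850, 1860),
  idvar     = "region_name"
)
rownames(long) <- NULL

long
```

The same reshaping could be written with `tidyr::pivot_longer()`; base R is used here only to keep the sketch free of extra dependencies.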


## Installation

Get the released version from CRAN or [R-universe](https://friendly.r-universe.dev/):

    install.packages("HistData")
    install.packages('HistData', repos = 'https://friendly.r-universe.dev')

The development version can be installed to your R library directly from GitHub via:

    remotes::install_github("friendly/HistData")

## Data sets
Here are the data sets in the package, with links to their documentation. Some topics are represented by two or more
data sets.

```{r datasets, results='asis'}
# link dataset to pkgdown doc
refurl <- "http://friendly.github.io/HistData/reference/"

dsets <- vcdExtra::datasets("HistData") |> 
  dplyr::select(Item, Title) |> 
  dplyr::mutate(Item = glue::glue("[{Item}]({refurl}{Item}.html)")) 

#knitr::kable(dsets)

library(tinytable)
# tt(dsets) |>
#   format_tt(j = 1, markdown = TRUE) |>
#   style_tt(j = 1, bootstrap_css = "width: 30%;") |>
#   style_tt(j = 2, bootstrap_css = "width: 70%;")
tt(dsets, width = c(.2, .8)) |> 
    format_tt(j = 1, markdown = TRUE) 
#    save_tt("html") |>
#    knitr::asis_output()
```
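
Once the package is installed, any of these data sets can be loaded by name. The sketch below uses `Galton` purely as an illustration; the plot is a plain first look at the data, not a reproduction of any historical graph, and the axis labels are my own wording.

```r
library(HistData)

# List all data sets shipped with the package
data(package = "HistData")

# Load one data set by name and inspect its structure
data(Galton, package = "HistData")
str(Galton)

# A quick first look at the two variables
plot(child ~ parent, data = Galton,
     xlab = "Mid-parent height (in)", ylab = "Child height (in)")
```

The help page for each data set (e.g., `?Galton`) gives the description, sources and executable examples.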

## Contributors
Please note that the `HistData` project is released with a [Contributor Code of Conduct](https://friendly.github.io/HistData/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.

Over the years, many people have contributed new data sets, offered corrections,
suggestions, or documentation examples. They are gratefully acknowledged below:

David Bellhouse,
Brian Clair,
Stephane Dray, 
Luiz Droubi,
Antoine de Falguerolles,
Monique Graf,
James Hanley, 
Peter Li, 
Dennis Murphy, 
Jim Oeppen,
James Riley,
Neville Verlander, 
Hadley Wickham. 

## References

Friendly, M. (2007). A Brief History of Data Visualization.
In Chen, C., Hardle, W. & Unwin, A. (eds.)  
*Handbook of Computational Statistics: Data Visualization*, Springer-Verlag, III, Ch. 1, 1-34.
[Preprint](http://datavis.ca/papers/hbook.pdf)

Friendly, M. & Denis, D. (2001).
Milestones in the history of thematic cartography, statistical graphics, and data visualization.
Web site: [http://datavis.ca/milestones/](http://datavis.ca/milestones/)

Friendly, M., Sigal, M. & Harnanansingh, D. (2016).
"The Milestones Project: A Database for the History of Data Visualization,"
In Kostelnick, C. & Kimball, M. (eds.), *Visible Numbers: The History of Data Visualization*, Ashgate Press, Chapter 10. [Preprint](https://www.datavis.ca/papers/MilestonesProject.pdf)

Friendly, M. & Wainer, H. (2021). [*A History of Data Visualization and Graphic Communication*](https://www.hup.harvard.edu/catalog.php?isbn=9780674975231).
Harvard University Press. Companion [web site](https://friendly.github.io/HistDataVis/)