Fixed plots in vignettes

CSAFE-ISU · Dec 9, 2024 · 5640860 · 5640860
1 parent d70ff85
commit 5640860
Show file tree

Hide file tree

Showing 4 changed files with 256 additions and 30 deletions.
diff --git a/R/plots.R b/R/plots.R
@@ -115,16 +115,8 @@ plot_scores <- function(scores, obs_score = NULL, n_bins = 50) {
         angle = 90,
         vjust = -1,
         hjust = 0.5
-      ) + # add text
-      ggplot2::labs(title = "The observed similarity score compared to reference similarity scores", x = "Score", y = "Rate")
-  } else {
-    p <- p +
-      ggplot2::labs(title = "Reference similarity scores", x = "Score", y = "Rate")
+      )
   }
-  p <- p +
-    ggplot2::theme(legend.position = "bottom",
-                   legend.text = ggplot2::element_text(size = 6),
-                   legend.title = ggplot2::element_text(size = 8))
 
   return(p)
 }

diff --git a/...tes/slrs-from-random-forests-tutorial.Rmd → vignettes/introduction-to-slr-model.Rmd b/...tes/slrs-from-random-forests-tutorial.Rmd → vignettes/introduction-to-slr-model.Rmd
@@ -1,26 +1,31 @@
 ---
-title: "SLRs from Random Forests Tutorial"
-output: rmarkdown::html_vignette
+title: "An Introduction to the SLR Model"
+output: 
+  rmarkdown::html_vignette:
+    toc: true         # Enable Table of Contents
+    toc_depth: 2      # Set the depth of the TOC (adjust as needed)
+    number_sections: true
+    css: "styles.css"
 vignette: >
-  %\VignetteIndexEntry{slrs-from-random-forests-tutorial}
+  %\VignetteIndexEntry{an-introduction-to-the-slr-model}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
-```{r, include = FALSE}
+```{r knitr, include = FALSE}
 knitr::opts_chunk$set(
   collapse = TRUE,
-  comment = "#>"
+  comment = "#>",
+  fig.width = 7, 
+  fig.height = 5
 )
 ```
 
-```{r setup}
-library(handwriterRF)
-```
+## Introduction
 
 The `handwriterRF` package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) (<doi:10.1002/sam.11566>). This tutorial summarizes the method introduced in the paper and explains how to use `handwriterRF` to compare handwriting samples. The method employs a random forest to produce a score-based likelihood ratio (SLR), quantifying the strength of evidence that two handwritten documents were written by the same writer or different writers.
 
-## The Data
+## Handwriting Data
 
 We use handwriting samples from the [CSAFE Handwriting Database](https://data.csafe.iastate.edu/HandwritingDatabase/) and the [CVL Handwriting Database](https://cvl.tuwien.ac.at/research/cvl-databases/an-off-line-database-for-writer-retrieval-writer-identification-and-word-spotting/). These databases contain paragraph-length handwriting samples. We randomly selected two Wizard of Oz prompts and two London Letter prompts from CSAFE writers, and four prompts from CVL writers. These samples were randomly split into three sets: training, validation, and testing.
 
@@ -34,16 +39,21 @@ We estimated the writer profiles for all handwriting samples using the `get_writ
 4. Calculate the proportion of graphs assigned to each cluster with `get_cluster_fill_rates()`. The cluster fill rates serve as an estimate of the writer profile for the sample.
 
 The `train` data frame contains the estimated writer profiles for train set. Let's visualize the writer profiles for two writers from `train`: 
-```{r}
+
+```{r profiles-facet}
+library(handwriterRF)
+
 wps <- train %>% dplyr::filter(writer == "w0004" | writer == "w0015")
 
 plot_writer_profiles(wps, color_by = "writer", facets = "writer")
 ```
+
 Each writer has four documents in `train`. We see that for each writer, the profiles are not exactly the same, but many of the spikes and valleys occur in the same clusters. We can plot all the writer profiles on the same axes to better compare the two writers.
 
-```{r}
+```{r profiles-combined}
 plot_writer_profiles(wps, color_by = "writer")
 ```
+
 In this plot the spikes and valleys are not all aligned. In cluster 37, writer w0004 has a small spike while w0015 has a valley. In cluster 27, writer w0015 has a taller spike that writer w0004. Intuitively, we see similarities and differences between the writer profiles in the plot. But we employ a statistical method to formally evaluate the similarities between writer profiles.
 
 ## Constucting Reference Similarity Scores with a Random Forest
@@ -60,15 +70,15 @@ Next, we calculate the distances between each pair of writer profiles in `valida
 
 To visualize the reference similarity scores, we plot the rates of scores assigned to different bins (rather than frequencies), as the data often contains far more "different writer" pairs than "same writer" pairs. This gives us a clearer view of the distribution of reference scores.
 
-```{r, dpi=300}
+```{r ref-scores}
 plot_scores(ref_scores)
 ```
 
 ## Compare Two Test Handwriting Samples
 
 The `test` data frame contains writer profiles from writers not in `train` or `validation`. Let’s compare two writer profiles in the `test` set using the trained random forest and reference similarity scores. We’ll use the first two samples from writer `w0005` as an example. First, we plot the writer profiles:
 
-```{r}
+```{r example-profiles}
 test_samples <- test[test$writer == "w0005",][1:2,]
 
 plot_writer_profiles(test_samples)
@@ -78,14 +88,14 @@ plot_writer_profiles(test_samples)
 
 We can compute the similarity score between these two test samples using the `compare_writer_profiles()` function. This score is derived using the same procedure as for the validation set: we calculate the distance between the two profiles, then compute the proportion of random forest decision trees that predict "same writer."
 
-```{r}
+```{r example-score}
 score <- compare_writer_profiles(test_samples)
 score
 ```
 
 Let's visually see how the similarity score `r score$score` compares to our same writer and different writers similarity scores.
 
-```{r}
+```{r plot-score}
 plot_scores(ref_scores, obs_score = score$score)
 ```
 
@@ -99,7 +109,7 @@ $P_2: \text{the handwriting samples were written by different writers}$
 
 The SLR is the ratio of the likelihood of observing the similarity score under $P_1$ to the likelihood under $P_2$. To calculate the SLR, we use `compare_writer_profiles()` with the `score_only = FALSE` argument. This function applies kernel density estimation to fit probability density functions (PDFs) to the reference scores for same writer and different writer pairs. The SLR is the ratio of the height of the same writer PDF at the observed similarity score to the height of the different writer PDF at the same score. An SLR greater than 1 suggests the samples were likely written by the same writer, while an SLR less than 1 suggests the samples were likely written by different writers.
 
-```{r}
+```{r slr}
 slr <- compare_writer_profiles(test_samples, score_only = FALSE)
 slr
 ```
diff --git a/vignettes/styles.css b/vignettes/styles.css
@@ -0,0 +1,217 @@
+@import url('https://fonts.googleapis.com/css2?family=Montserrat:ital,wght@0,100..900;1,100..900&display=swap');
+
+:root {
+  --blue: #003A70;
+  --lt-blue: #6BA4B8;
+  --purple: #981E97;
+  --lt-purple: #B24DB2;
+  --dk-gray: #343a40;
+  --md-gray: #545859;
+  --lt-gray: #ddd;
+  --orange: #F68D2E;
+}
+
+/* Increase the navbar logo size */
+.navbar-brand img {
+    max-height: 50px; /* Increase the max width as needed */
+    width: auto;     /* Maintain aspect ratio */
+    vertical-align: middle;
+}
+
+.navbar {
+  background-color: white !important;
+  padding-left: 10px;
+  padding-bottom: 0px;
+  margin-bottom: 0px;
+  height: 70px;
+  display: flex;
+  align-items: flex-end;
+}
+
+.navbar > .container,
+.navbar > .container-fluid,
+.navbar > .container-sm,
+.navbar > .container-md,
+.navbar > .container-lg,
+.navbar > .container-xl,
+.navbar > .container-xxl {
+  display: flex;
+  flex-wrap: inherit;
+  justify-content: space-between;
+  align-items: flex-end;
+}
+
+/* Inactive tab color */
+.navbar-nav .nav-link {
+    color: var(--md-gray); /* Color for inactive links */
+    padding: 0;
+}
+
+/* Active tab color */
+.navbar-nav .nav-link.active, .navbar-nav .nav-link.show {
+  color: var(--dk-gray); /* Color for active links */
+  font-weight: bold;
+}
+
+/* Hover tab color */
+.navbar-nav .nav-link:hover {
+    color: var(--dk-gray); /* Color on hover */
+    font-weight: bold;
+}
+
+hr {
+  margin: 1rem 0;
+  color: var(--md-gray);
+  border: 0;
+  border-top: 1px solid;
+  opacity: 0.25;
+}
+
+h6, .h6, h5, .h5, h4, .h4, h3, .h3, h2, .h2, h1, .h1 {
+  margin-top: 0;
+  margin-bottom: 0.5rem;
+  font-weight: 400;
+  line-height: 1.2;
+  color: var(--purple);
+}
+
+h1, .h1 {
+  font-size: calc(1.75rem + 1.5vw);
+  font-weight: 100;
+}
+@media (min-width: 1200px) {
+  h1, .h1 {
+    font-size: 3rem;
+  }
+}
+
+h2, .h2 {
+  font-size: calc(1rem + 0.6vw);
+  font-weight: 700;
+}
+@media (min-width: 1200px) {
+  h2, .h2 {
+    font-size: 2rem;
+  }
+}
+
+h3, .h3 {
+  font-size: calc(1rem + 0.6vw);
+  font-weight: 500;
+}
+@media (min-width: 1200px) {
+  h3, .h3 {
+    font-size: 1.75rem;
+  }
+}
+
+h4, .h4 {
+  font-size: calc(1.275rem + 0.3vw);
+}
+@media (min-width: 1200px) {
+  h4, .h4 {
+    font-size: 1.5rem;
+  }
+}
+
+h5, .h5 {
+  font-size: 1.25rem;
+}
+
+h6, .h6 {
+  font-size: 1rem;
+}
+
+/* Table of contents in sidebar */
+.sidebar nav[role="doc-toc"] ul > li > a:hover,
+.sidebar nav[role="doc-toc"] ul > li > ul > li > a:hover {
+  color: var(--purple) !important;
+  font-weight: bold;
+}
+.sidebar nav[role="doc-toc"] ul > li > a.focus,
+.sidebar nav[role="doc-toc"] ul > li > ul > li > a.focus,
+.sidebar nav[role="doc-toc"] ul > li > a.active, 
+.sidebar nav[role="doc-toc"] ul > li > ul > li > a.active {
+    color: var(--purple) !important;
+    font-weight: bold;
+    border-left-width: 1px;
+    border-left-style: solid;
+    border-left-color: var(--purple);
+}
+
+/* Unvisited link */
+a:link {
+    color: var(--purple); /* Change to your desired color */
+}
+
+/* Visited link */
+a:visited {
+    color: var(--lt-purple); /* Change to your desired color */
+    font-weight: normal;
+}
+
+/* Mouse over link */
+a:hover {
+    color: var(--lt-purple); /* Change to your desired color */
+    font-weight: normal;
+}
+
+/* Selected link */
+a:active {
+    color: var(--lt-purple); /* Change to your desired color */
+    font-weight: normal;
+}
+
+/* Body */
+body {
+  font-family: 'Montserrat', sans-serif;
+  font-weight: 500;
+  font-size: 16px;
+  color: var(--md-gray);
+}
+
+
+*, ::after, ::before {
+    box-sizing: border-box;
+}
+
+/* Scenario Cards */
+.twocards {
+  display: flex;
+  flex-direction: row;
+}
+
+.card {
+  border: 1px solid #dadada;
+  margin: 15px;
+  box-shadow: 4px 4px 8px 0 rgba(0, 0, 0, 0.2);
+  transition: 0.2s;
+  width: 50%;
+  display: flex;
+  flex-direction: column;
+}
+
+.card h3 {
+  margin-left: 14px;
+  margin-right: 14px;
+  margin-bottom: 4px;
+  margin-top: 24px;
+}
+
+.card:hover {
+  box-shadow: 8px 8px 16px 0 rgba(0, 0, 0, 0.2);
+}
+
+.card .container {
+  padding: 2px 14px;
+  flex: 1;
+  display: flex;
+  flex-direction: column;
+}
+
+.card p {
+  margin-left: 14px;
+  margin-right: 14px;
+  margin-bottom: 4px;
+  margin-top: 0;
+}
diff --git a/...new-random-forests-and-refence-scores.Rmd → vignettes/training-slr-model.Rmd b/...new-random-forests-and-refence-scores.Rmd → vignettes/training-slr-model.Rmd
@@ -1,17 +1,24 @@
 ---
-title: "Make New Random Forests and Reference Scores"
-output: rmarkdown::html_vignette
+title: "Training an SLR Model"
+output: 
+  rmarkdown::html_vignette:
+    toc: true         # Enable Table of Contents
+    toc_depth: 2      # Set the depth of the TOC (adjust as needed)
+    number_sections: true
+    css: "styles.css"
 vignette: >
   %\VignetteIndexEntry{train-model}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
 ```{r, include = FALSE}
-  knitr::opts_chunk$set(
+knitr::opts_chunk$set(
   collapse = TRUE,
-  comment = "#>"
-  )
+  comment = "#>",
+  fig.width = 7, 
+  fig.height = 5
+)
 ```
 
 HandwriterRF has a pre-trained random forest and set of reference similarity scores that you may use with the functions `compare_documents()` and `compare_writer_profiles()`. This tutorial shows you how to train your own random forest and create your own set of reference scores.