Commit

Merge pull request #15 from CSAFE-ISU/12-release-handwriterrf-102
12 release handwriterrf 102
stephaniereinders authored Nov 1, 2024
2 parents 24b388c + aa71164 commit fd2a339
Showing 14 changed files with 31 additions and 33 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -13,3 +13,4 @@
^altdoc$
^_quarto$
^cran-comments\.md$
+^CRAN-SUBMISSION$
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,11 +1,11 @@
Package: handwriterRF
Type: Package
Title: Handwriting Analysis with Random Forests
-Version: 1.0.1
+Version: 1.0.2
Authors@R: c(person("Iowa State University of Science and Technology on behalf of its Center for Statistics and Applications in Forensic Evidence", role = c("aut", "cph", "fnd")),
person("Stephanie", "Reinders", role = c("aut", "cre"), email = "[email protected]"))
Maintainer: Stephanie Reinders <[email protected]>
-Description: Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the 'same writer' or 'different writers.'
+Description: Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the same writer or different writers.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,3 +1,9 @@
+# handwriterRF 1.0.2
+
+* Removed quotes around "same writer" and "different writer" in documentation.
+
+* Removed dontrun{} from the examples for random_forest. Changed example for get_distances() to something that runs in less than 5 seconds and removed dontrun{} from this example. The examples for calculate_slr() take longer than 5 seconds to run so dontrun{} was changed to donttest{} for these examples.
+
# handwriterRF 1.0.1

# handwriterRF 1.0.0
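
For readers unfamiliar with the two Rd example wrappers named in the 1.0.2 notes, here is a minimal roxygen sketch. The function `slow_mean()` is hypothetical and is not part of handwriterRF. As a rough summary, code inside `\dontrun{}` is not executed by `example()` unless explicitly requested, while code inside `\donttest{}` is executed by `example()` but may be skipped by `R CMD check` depending on the check options.

```r
#' Mean of a numeric vector (hypothetical function, for illustration only)
#'
#' @param x A numeric vector.
#' @return The mean of `x`.
#'
#' @examples
#' # Fast example: left unwrapped so it always runs.
#' slow_mean(1:10)
#'
#' \donttest{
#' # Slower example: wrapped so routine checks may skip it,
#' # while users can still run it with example(slow_mean).
#' slow_mean(rnorm(1e6))
#' }
slow_mean <- function(x) {
  mean(x)
}
```

Keeping the fast example unwrapped means at least one example per function runs on every check, which matches the approach this release notes describe.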
2 changes: 0 additions & 2 deletions R/data.R
@@ -216,13 +216,11 @@
#' # view the distances data frame
#' random_forest$dists
#'
-#' \dontrun{
#' # plot the same writer density
#' plot(random_forest$densities$same_writer)
#'
#' # plot the different writer density
#' plot(random_forest$densities$diff_writer)
-#' }
#'
#' @md
"random_forest"
6 changes: 2 additions & 4 deletions R/distances.R
@@ -62,10 +62,8 @@
#' # calculate maximum and Euclidean distances between the first 3 documents in cfr.
#' distances <- get_distances(df = cfr[1:3, ], distance_measures = c('max', 'euc'))
#'
-#' \dontrun{
#' # calculate absolute and Euclidean distances between all documents in cfr.
-#' distances <- get_distances(df = cfr, distance_measures = c('abs', 'euc'))
-#' }
+#' distances <- get_distances(df = cfr, distance_measures = c('man'))
#'
get_distances <- function(df, distance_measures) {
dists <- list()

2 changes: 1 addition & 1 deletion R/scores.R
@@ -35,7 +35,7 @@
get_score <- function(d, rforest) {
get_prop_same_votes <- function(preds) {
# Get the proportion of decision trees in the trained random forest that
-# predict, or vote, 'same writer'.
+# predict (vote) same writer.
preds <- as.data.frame(preds)
ntrees <- ncol(preds)
prop <- rowSums(preds == 2) / ntrees
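
The vote-proportion calculation in this hunk can be tried on a toy prediction matrix. This is only an illustrative sketch: the matrix values are made up, and the assumption that class `2` encodes a same writer vote and `1` a different writer vote is inferred from the `preds == 2` comparison above, not confirmed elsewhere in this commit.

```r
# Toy predictions: one row per document pair, one column per decision tree.
# Assumed coding: 2 = tree votes same writer, 1 = tree votes different writer.
preds <- matrix(
  c(2, 2, 1, 2,
    1, 1, 2, 1),
  nrow = 2, byrow = TRUE
)

preds <- as.data.frame(preds)
ntrees <- ncol(preds)

# Proportion of trees voting same writer for each document pair.
prop <- rowSums(preds == 2) / ntrees
print(prop)
```

With four trees, the first pair receives three same writer votes and the second receives one, so the proportions are 0.75 and 0.25.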
8 changes: 4 additions & 4 deletions R/slrs.R
@@ -28,9 +28,9 @@
#' \item \code{\link[handwriter]{get_cluster_fill_counts}} counts the number of graphs assigned to each cluster.
#' \item \code{\link{get_cluster_fill_rates}} calculates the proportion of graphs assigned to each cluster. The cluster fill rates serve as a writer profile.
#' \item A similarity score is calculated between the cluster fill rates of the two documents using a random forest trained with \pkg{ranger}.
-#' \item The similarity score is compared to reference distributions of 'same writer' and 'different
-#' writer' similarity scores. The result is a score-based likelihood ratio that conveys the strength
-#' of the evidence in favor of 'same writer' or 'different writer'. For more details, see Madeline
+#' \item The similarity score is compared to reference distributions of same writer and different
+#' writer similarity scores. The result is a score-based likelihood ratio that conveys the strength
+#' of the evidence in favor of same writer or different writer. For more details, see Madeline
#' Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>.
#' }
#'
@@ -49,7 +49,7 @@
#' @export
#'
#' @examples
-#' \dontrun{
+#' \donttest{
#' # Compare two samples from the same writer
#' sample1 <- system.file(file.path("extdata", "w0030_s01_pWOZ_r01.png"), package = "handwriterRF")
#' sample2 <- system.file(file.path("extdata", "w0030_s01_pWOZ_r02.png"), package = "handwriterRF")
4 changes: 2 additions & 2 deletions R/train.R
@@ -31,7 +31,7 @@
#' saved.
#' @param run_number An integer used for both the set.seed function and to
#' distinguish between different runs on the same input data frame.
-#' @param downsample Whether to downsample the number of 'different writer'
+#' @param downsample Whether to downsample the number of different writer
#' distances before training the random forest. If TRUE, the different writer
#' distances will be randomly sampled, resulting in the same number of
#' different writer and same writer pairs.
@@ -137,7 +137,7 @@ get_csafe_train_set <- function(df, train_prompt_codes) {

#' Make Densities from a Trained Random Forest
#'
-#' Create densities of 'same writer' and 'different writer' scores produced by a
+#' Create densities of same writer and different writer scores produced by a
#' trained random forest.
#'
#' @param rforest A \pkg{ranger} random forest created with \code{\link{train_rf}}.
6 changes: 3 additions & 3 deletions README.qmd
@@ -58,9 +58,9 @@ The result is a data frame:
- *docname1* is the file name of the first sample.
- *docname2* is the file name of the second sample.
- *score* is the similarity score between the two samples.
-- *numerator* is the numerator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference 'same writer' similarity scores.
-- *denominator* is the denominator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference 'different writers' similarity scores.
-- *slr* is a score-based likelihood ratio that quantifies the strength of evidence in favor of 'same writer' or 'different writer.'
+- *numerator* is the numerator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference same writer similarity scores.
+- *denominator* is the denominator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference different writers similarity scores.
+- *slr* is a score-based likelihood ratio that quantifies the strength of evidence in favor of same writer or different writer.
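
The relationship between *numerator*, *denominator*, and *slr* described in this list can be sketched with base R alone. This is not the package's implementation: the reference scores below are simulated, the helper `eval_density()` is hypothetical, and using kernel density estimates evaluated by linear interpolation is an assumption made only for illustration.

```r
set.seed(1)

# Simulated reference similarity scores in [0, 1] (made-up data).
same_scores <- rbeta(500, 5, 2)  # tends toward high scores
diff_scores <- rbeta(500, 2, 5)  # tends toward low scores

# Kernel density estimates of the two reference distributions.
same_density <- density(same_scores, from = 0, to = 1)
diff_density <- density(diff_scores, from = 0, to = 1)

# Hypothetical helper: evaluate an estimated density at a given score
# by linear interpolation over the density grid.
eval_density <- function(dens, score) {
  approx(dens$x, dens$y, xout = score)$y
}

score <- 0.8
numerator <- eval_density(same_density, score)    # height of same writer density
denominator <- eval_density(diff_density, score)  # height of different writer density
slr <- numerator / denominator

print(c(numerator = numerator, denominator = denominator, slr = slr))
```

In this sketch a ratio above 1 means the score looks more like the same writer reference scores, and a ratio below 1 means it looks more like the different writers reference scores.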

Display the slr data frame. We hide the file path columns here so that the data frame fits on this page.

7 changes: 3 additions & 4 deletions cran-comments.md
@@ -1,10 +1,9 @@
## Resubmission
This is a resubmission. In this version I have:

-* Fixed error in test "Train random forest works with ranger package" that
-occurred on Debian. Despite setting the random number generator seed,
-the random forest created on Debian has reasonable values but is not equal to the random forest created on a Mac and used in the test as the expected output. Now the
-test instead checks that the function runs without error.
+* Removed quotes around "same writer" and "different writer" in documentation.
+
+* Removed dontrun{} from the examples for random_forest. Changed example for get_distances() to something that runs in less than 5 seconds and removed dontrun{} from this example. The examples for calculate_slr() take longer than 5 seconds to run so dontrun{} was changed to donttest{} for these examples.


## R CMD check results
8 changes: 4 additions & 4 deletions man/calculate_slr.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 2 additions & 4 deletions man/get_distances.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 0 additions & 2 deletions man/random_forest.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/train_rf.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit fd2a339

Please sign in to comment.