Commit

Addressed CRAN comments
Please remove the single quotes around "same writer" and "different
writers"; single quotes should only be used around package/software/API
names.

\dontrun{} should only be used if the example really cannot be executed
(e.g. because of missing additional software, missing API keys, ...) by
the user. That's why wrapping examples in \dontrun{} adds the comment
("# Not run:") as a warning for the user. Does not seem necessary.
Please replace \dontrun with \donttest.

Please unwrap the examples if they are executable in < 5 sec, or replace
\dontrun{} with \donttest{}.
stephaniereinders committed Oct 29, 2024
1 parent 24b388c commit 219cb66
Showing 13 changed files with 23 additions and 33 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -13,3 +13,4 @@
^altdoc$
^_quarto$
^cran-comments\.md$
+^CRAN-SUBMISSION$
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -5,7 +5,7 @@ Version: 1.0.1
Authors@R: c(person("Iowa State University of Science and Technology on behalf of its Center for Statistics and Applications in Forensic Evidence", role = c("aut", "cph", "fnd")),
person("Stephanie", "Reinders", role = c("aut", "cre"), email = "[email protected]"))
Maintainer: Stephanie Reinders <[email protected]>
-Description: Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the 'same writer' or 'different writers.'
+Description: Perform forensic handwriting analysis of two scanned handwritten documents. This package implements the statistical method described by Madeline Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>. Similarity measures and a random forest produce a score-based likelihood ratio that quantifies the strength of the evidence in favor of the documents being written by the same writer or different writers.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
2 changes: 0 additions & 2 deletions R/data.R
@@ -216,13 +216,11 @@
#' # view the distances data frame
#' random_forest$dists
#'
-#' \dontrun{
#' # plot the same writer density
#' plot(random_forest$densities$same_writer)
#'
#' # plot the different writer density
#' plot(random_forest$densities$diff_writer)
-#' }
#'
#' @md
"random_forest"
6 changes: 2 additions & 4 deletions R/distances.R
@@ -62,10 +62,8 @@
#' # calculate maximum and Euclidean distances between the first 3 documents in cfr.
#' distances <- get_distances(df = cfr[1:3, ], distance_measures = c('max', 'euc'))
#'
-#' \dontrun{
-#' # calculate absolute and Euclidean distances between all documents in cfr.
-#' distances <- get_distances(df = cfr, distance_measures = c('abs', 'euc'))
-#' }
+#' distances <- get_distances(df = cfr, distance_measures = c('man'))
#'
get_distances <- function(df, distance_measures) {
dists <- list()

2 changes: 1 addition & 1 deletion R/scores.R
@@ -35,7 +35,7 @@
get_score <- function(d, rforest) {
get_prop_same_votes <- function(preds) {
# Get the proportion of decision trees in the trained random forest that
-    # predict, or vote, 'same writer'.
+    # predict (vote) same writer.
preds <- as.data.frame(preds)
ntrees <- ncol(preds)
prop <- rowSums(preds == 2) / ntrees
8 changes: 4 additions & 4 deletions R/slrs.R
@@ -28,9 +28,9 @@
#' \item \code{\link[handwriter]{get_cluster_fill_counts}} counts the number of graphs assigned to each cluster.
#' \item \code{\link{get_cluster_fill_rates}} calculates the proportion of graphs assigned to each cluster. The cluster fill rates serve as a writer profile.
#' \item A similarity score is calculated between the cluster fill rates of the two documents using a random forest trained with \pkg{ranger}.
-#'   \item The similarity score is compared to reference distributions of 'same writer' and 'different
-#'   writer' similarity scores. The result is a score-based likelihood ratio that conveys the strength
-#'   of the evidence in favor of 'same writer' or 'different writer'. For more details, see Madeline
+#'   \item The similarity score is compared to reference distributions of same writer and different
+#'   writer similarity scores. The result is a score-based likelihood ratio that conveys the strength
+#'   of the evidence in favor of same writer or different writer. For more details, see Madeline
#' Johnson and Danica Ommen (2021) <doi:10.1002/sam.11566>.
#' }
#'
@@ -49,7 +49,7 @@
#' @export
#'
#' @examples
-#' \dontrun{
+#' \donttest{
#' # Compare two samples from the same writer
#' sample1 <- system.file(file.path("extdata", "w0030_s01_pWOZ_r01.png"), package = "handwriterRF")
#' sample2 <- system.file(file.path("extdata", "w0030_s01_pWOZ_r02.png"), package = "handwriterRF")
4 changes: 2 additions & 2 deletions R/train.R
@@ -31,7 +31,7 @@
#' saved.
#' @param run_number An integer used for both the set.seed function and to
#' distinguish between different runs on the same input data frame.
-#' @param downsample Whether to downsample the number of 'different writer'
+#' @param downsample Whether to downsample the number of different writer
#' distances before training the random forest. If TRUE, the different writer
#' distances will be randomly sampled, resulting in the same number of
#' different writer and same writer pairs.
@@ -137,7 +137,7 @@ get_csafe_train_set <- function(df, train_prompt_codes) {

#' Make Densities from a Trained Random Forest
#'
-#' Create densities of 'same writer' and 'different writer' scores produced by a
+#' Create densities of same writer and different writer scores produced by a
#' trained random forest.
#'
#' @param rforest A \pkg{ranger} random forest created with \code{\link{train_rf}}.
6 changes: 3 additions & 3 deletions README.qmd
@@ -58,9 +58,9 @@ The result is a data frame:
- *docname1* is the file name of the first sample.
- *docname2* is the file name of the second sample.
- *score* is the similarity score between the two samples.
-- *numerator* is the numerator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference 'same writer' similarity scores.
-- *denominator* is the denominator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference 'different writers' similarity scores.
-- *slr* is a score-based likelihood ratio that quantifies the strength of evidence in favor of 'same writer' or 'different writer.'
+- *numerator* is the numerator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference same writer similarity scores.
+- *denominator* is the denominator value of the score-based likelihood ratio. Intuitively, the larger the value the more the similarity score looks like the reference different writers similarity scores.
+- *slr* is a score-based likelihood ratio that quantifies the strength of evidence in favor of same writer or different writer.

Display the slr data frame. We hide the file path columns here so that the data frame fits on this page.

7 changes: 3 additions & 4 deletions cran-comments.md
@@ -1,10 +1,9 @@
## Resubmission
This is a resubmission. In this version I have:

-* Fixed error in test "Train random forest works with ranger package" that
-occurred on Debian. Despite setting the random number generator seed,
-the random forest created on Debian has reasonable values but is not equal to the random forest created on a Mac and used in the test as the expected output. Now the
-test instead checks that the function runs without error.
+* Removed quotes around "same writer" and "different writer" in documentation.
+
+* Removed dontrun{} from the examples for random_forest. Changed example for get_distances() to something that runs in less than 5 seconds and removed dontrun{} from this example. The examples for calculate_slr() take longer than 5 seconds to run so dontrun{} was changed to donttest{} for these examples.


## R CMD check results
8 changes: 3 additions & 5 deletions man/calculate_slr.Rd

6 changes: 2 additions & 4 deletions man/get_distances.Rd

2 changes: 0 additions & 2 deletions man/random_forest.Rd

2 changes: 1 addition & 1 deletion man/train_rf.Rd
