update mail address and replace "word count" by "document-term" #15

Merged
merged 1 commit into from
Nov 7, 2024
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -2,13 +2,13 @@ Package: MetaNLP
Type: Package
Title: Natural Language Processing for Meta Analysis
Version: 0.1.2.9000
-Authors@R: c(person("Nico", "Bruder", role = c("aut"), email = "[email protected]"),
+Authors@R: c(person("Nico", "Bruder", role = c("aut"), email = "[email protected]", comment = c(ORCID = "0009-0004-9522-2075")),
person("Samuel", "Zimmermann", role = c("aut"), email = "[email protected]", comment = c(ORCID = "0009-0000-4828-9294")),
person("Johannes", "Vey", role = c("aut"), email = "[email protected]", comment = c(ORCID = "0000-0002-2610-9667")),
person("Maximilian", "Pilz", role = c("aut", "cre"), email = "[email protected]", comment = c(ORCID = "0000-0002-9685-1613")),
person(given = "Institute of Medical Biometry - University of Heidelberg", role = c("cph")))
Description: Given a CSV file with titles and abstracts, the package creates a
-word count matrix that is lemmatized and stemmed and can directly be used to
+document-term matrix that is lemmatized and stemmed and can directly be used to
train machine learning methods for automatic title-abstract screening in the
preparation of a meta analysis.
License: MIT + file LICENSE
@@ -34,7 +34,7 @@ Collate:
useful_functions.R
Encoding: UTF-8
LazyData: true
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
BugReports: https://github.com/imbi-heidelberg/MetaNLP/issues
URL: https://github.com/imbi-heidelberg/MetaNLP
Config/testthat/edition: 3
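For readers unfamiliar with the term this PR switches to, here is a minimal base-R sketch of a document-term matrix: one row per paper, one column per term, each cell a term count. The toy `docs` vector and all variable names are assumptions made for illustration, and the sketch skips the lemmatization and stemming that the package description mentions; it is not how MetaNLP builds its matrix.

```r
# Toy corpus standing in for the title/abstract text of three papers (hypothetical data).
docs <- c(
  p1 = "random effects meta analysis",
  p2 = "meta analysis of random trials",
  p3 = "screening of abstracts"
)

# Lowercase and split on whitespace; no lemmatization or stemming in this sketch.
tokens <- strsplit(tolower(docs), "\\s+")

# The vocabulary becomes the column names of the matrix.
vocab <- sort(unique(unlist(tokens)))

# One row per document, one column per term, each cell a term count.
dtm <- t(vapply(
  tokens,
  function(tk) as.integer(table(factor(tk, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab

# Stored as a data frame, mirroring the data_frame slot described in the docs.
dtm_df <- as.data.frame(dtm)
```

Calling `dtm_df["p1", "meta"]` looks up how often the term "meta" occurs in the first toy document.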
8 changes: 4 additions & 4 deletions R/MetaNLP.R
@@ -10,12 +10,12 @@
"_PACKAGE"


-#' Create a data frame with word counts
+#' Create a data frame with document-term matrix
#'
#' A \code{MetaNLP} object is the base class of the package \pkg{MetaNLP}.
#' It is initialized by passing the path to a CSV file and constructs
#' a data frame whose column names are the words that occur in the titles
-#' and abstracts and whose cells contain the word counts for each
+#' and abstracts and whose cells contain the word frequencies for each
#' paper.
#'
#' @rdname MetaNLP
@@ -42,7 +42,7 @@ setClass("MetaNLP", representation(data_frame = "data.frame"))
#'
#' @details
#' An object of class \code{MetaNLP} contains a slot data_frame where
-#' the word count data frame is stored.
+#' the document-term matrix is stored as a data frame.
#' The CSV file must have a column \code{ID} to identify each paper, a column
#' \code{title} with the belonging titles of the papers and a column
#' \code{abstract} which contains the abstracts. If the CSV stores training data,
@@ -196,7 +196,7 @@ setMethod("plot", signature("MetaNLP", y = "missing"),
# check whether decision column exists and filter data
if(dec != "total") {
if(is.null(x@data_frame$decision_)) {
-warning("Column decision_ does not exist. Word cloud is created by using the whole word count matrix.")
+warning("Column decision_ does not exist. Word cloud is created by using the whole document-term matrix.")
data <- x@data_frame
}
else {
8 changes: 4 additions & 4 deletions R/delete_functions.R
@@ -2,7 +2,7 @@
#'
#' There can be words that do not offer additional information
#' in the classification whether a paper should be included or excluded
-#' from a meta-analysis. Thus, such words should not be part of the word count
+#' from a meta-analysis. Thus, such words should not be part of the document-term
#' matrix. This function allows the user to remove these columns of the word
#' count matrix by specifying a vector of words to delete.
#'
@@ -13,7 +13,7 @@
#' @details
#' The words in \code{delete_list} can be given like they appear in the
#' text. They are lemmatized and stemmed by \code{delete_words} to match the
-#' columns of the word count matrix.
+#' columns of the document-term matrix.
#'
#' @export
setGeneric("delete_words", function(object, delete_list) {
@@ -53,7 +53,7 @@ setMethod("delete_words", signature("MetaNLP", "character"),
#'
#' Usually, stop words do not offer useful information in the classification
#' whether a paper should be included or excluded
-#' from a meta-analysis. Thus, such words should not be part of the word count
+#' from a meta-analysis. Thus, such words should not be part of the document-term
#' matrix. This function allows the user to automatically delete stop words.
#'
#' @param object A MetaNLP object, whose data frame is to be modified.
@@ -94,7 +94,7 @@ setMethod("delete_stop_words", signature("MetaNLP"),

#' Replace special characters in column names
#'
-#' When using non-english languages, the column names of the word count matrix
+#' When using non-english languages, the column names of the document-term matrix
#' can contain special characters. These might lead to encoding problems, when
#' this matrix is used to train a machine learning model. This functions
#' automatically replaces all special characters by the nearest equivalent
2 changes: 1 addition & 1 deletion R/feature_selection.R
@@ -1,6 +1,6 @@
#' Select features via elasticnet regularization
#'
-#' As the word count matrix quickly grows with an increasing number of abstracts,
+#' As the document-term matrix quickly grows with an increasing number of abstracts,
#' it can easily reach several thousand columns. Thus, it can be important to
#' extract the columns that carry most of the information in the decision making
#' process. This function uses a generalized linear model combined with
16 changes: 8 additions & 8 deletions R/useful_functions.R
@@ -82,14 +82,14 @@ setGeneric("write_csv", function(object, ...) {
})


-#' Save the word count matrix
+#' Save the document-term matrix
#'
-#' This function can be used to save the word count matrix of a MetaNLP object
+#' This function can be used to save the document-term matrix of a MetaNLP object
#' as a csv-file.
#'
#' @param object An object of class MetaNLP.
#' @param path Path where to save the csv.
-#' @param type Specifies if the word count matrix should be saved as
+#' @param type Specifies if the document-term matrix should be saved as
#' "train_wcm.csv" or "test_wcm.csv". If the user wants to use another file name,
#' the whole path including the file name should be given as the \code{path}
#' argument
@@ -141,12 +141,12 @@ setMethod("write_csv", signature("MetaNLP"),
#' Read and adapt test data
#'
#' This function takes a MetaNLP object (the training data) and the
-#' test data. The function creates the word count matrix from the test data
+#' test data. The function creates the document-term matrix from the test data
#' and matches the columns of the given training MetaNLP object with the columns
-#' of the test word count matrix. This means that columns, which do appear
-#' in the test word count matrix but not in the training word count matrix are
-#' removed; columns that appear in the training word count matrix but not in the
-#' test word count matrix are added as a column consisting of zeros.
+#' of the test document-term matrix. This means that columns, which do appear
+#' in the test document-term matrix but not in the training document-term matrix are
+#' removed; columns that appear in the training document-term matrix but not in the
+#' test document-term matrix are added as a column consisting of zeros.
#'
#' @param object The MetaNLP object created from the training data.
#' @param file Either the path to the test data csv, the data frame containing
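The column matching described in the `read_test_data` documentation above can be sketched in base R as follows. This is only an illustration under stated assumptions: `align_columns` and the toy data frames are hypothetical names that do not appear in the package, and the real function may treat columns such as `ID` or `decision_` differently.

```r
# Hypothetical helper: align the columns of a test document-term data frame
# to those of the training data frame.
align_columns <- function(train_df, test_df) {
  train_cols <- colnames(train_df)

  # Terms seen in training but absent from the test data become all-zero columns.
  missing_cols <- setdiff(train_cols, colnames(test_df))
  for (col in missing_cols) {
    test_df[[col]] <- rep(0L, nrow(test_df))
  }

  # Selecting by the training column names drops test-only terms
  # and puts the remaining columns in the training order.
  test_df[, train_cols, drop = FALSE]
}

# Toy data (assumed for illustration): "cohort" occurs only in the test set,
# "meta" and "trial" only in the training set.
train <- data.frame(meta = c(1L, 0L), random = c(2L, 1L), trial = c(0L, 3L))
test <- data.frame(random = c(1L, 0L), cohort = c(1L, 1L))

aligned <- align_columns(train, test)
```

After the call, `aligned` has exactly the training columns, so a model fitted on the training matrix can be applied to it without column mismatches.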
6 changes: 3 additions & 3 deletions man/MetaNLP.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/delete_stop_words.Rd

4 changes: 2 additions & 2 deletions man/delete_words.Rd

10 changes: 5 additions & 5 deletions man/read_test_data.Rd

2 changes: 1 addition & 1 deletion man/replace_special_characters.Rd

2 changes: 1 addition & 1 deletion man/select_features.Rd

6 changes: 3 additions & 3 deletions man/write_csv.Rd
