diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml index 390b31e..21463b6 100644 --- a/.github/workflows/pkgdown.yaml +++ b/.github/workflows/pkgdown.yaml @@ -12,7 +12,7 @@ on: name: pkgdown jobs: - distribution-check: + pkgdown: runs-on: ubuntu-latest container: image: fedora:latest @@ -48,7 +48,7 @@ jobs: run: | devtools::document() devtools::load_all() - pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) + pkgdown::build_site_github_pages(new_process = FALSE, install = TRUE) pkgdown::build_reference(topics = 'random_walk') shell: Rscript {0} diff --git a/DESCRIPTION b/DESCRIPTION index 6c248ec..4c85b13 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -4,10 +4,9 @@ Title: Labyrinth: A Simulation of Knowledge Networks Version: 0.3.0 Date: 2023-12-31 Authors@R: c( - person("Yinchun","Su", role = c("aut", "cre"), - email = "46381867+randef1ned@users.noreply.github.com"), - person("Junwei", "Han", role = "ctb", - email = "hanjunwei1981@163.com")) + person("Junwei", "Han", role = c("aut", "cre", "ctb"), + email = "hanjunwei1981@163.com"), + person("Yinchun", "Su", role = "aut")) Maintainer: Junwei Han Description: One paragraph description of what the package does as one or more full sentences. 
diff --git a/NEWS.md b/NEWS.md index c19e393..778b4a5 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,7 @@ +## labyrinth v0.3.0 + +* Updated docs + ## labyrinth v0.2.3 * Added `predict()` function @@ -51,4 +55,3 @@ ## labyrinth v0.0.1 * Initial submission - diff --git a/R/data.R b/R/data.R index fc0fb66..ba4fa2b 100644 --- a/R/data.R +++ b/R/data.R @@ -358,4 +358,3 @@ #' drug_id <- drug_annot$drug_id #' hist(table(drug_id)) "drug_annot" - diff --git a/R/utils.R b/R/utils.R index d8215d1..33f9de5 100644 --- a/R/utils.R +++ b/R/utils.R @@ -165,7 +165,7 @@ load_data <- function(required_file) { user_dir <- R_user_dir("labyrinth", which = "data") model_path <- file.path(user_dir, required_file) data_path <- system.file("extdata", required_file, package = "labyrinth") - + e <- new.env() if (file.exists(data_path)) { load(data_path, envir = e) @@ -189,7 +189,7 @@ load_data <- function(required_file) { } } } - + variable_name <- ls(envir = e) return(e[[variable_name[1]]]) } diff --git a/README.md b/README.md index 9765a6d..ce2f1bf 100644 --- a/README.md +++ b/README.md @@ -31,12 +31,12 @@ I developed and tested `labyrinth` on Fedora Linux versions 38 and 39. While this We recommend installing these dependencies: -- **R (≥ 4.3.0)**: We developed this R package using R version 4.3.3. +- **R (≥ 4.3.0)**: We developed this R package using R version 4.3.x. - **Python**: Python is required for drawing plots in demos. It is recommended to have Python and `seaborn` installed, as the `reticulate` package will use the system's Python installation. - **OpenMP**: This package uses OpenMP for parallelization and multithreading if OpenMP is available. Having OpenMP installed can significantly improve performance. - **Intel oneAPI Math Kernel Library (oneMKL)**: This library can further enhance mathematical performance, especially on Intel processors. oneMKL is not required but highly recommended. 
-It would takes less than ten minutes to install this package. If you encounter any issues while running this package on other operating system, please open an [issue](https://github.com/randef1ned/labyrinth/issues). +It should take less than ten minutes to install this package. If you encounter any issues while running this package on other operating systems, please open an issue. ## Before installation @@ -98,7 +98,7 @@ Or you can download the pre-built binary packages from [Releases](https://github ## Usage -Load the package using `library(labyrinth)`. We provide a vignette for the package that can be called using: `vignette("labyrinth")`. Alternatively, you can view the online version on [website](https://labyrinth.yinchun.su/articles/labyrinth) or [GitHub](doc/labyrinth_knit.md). The examples I provided would take several minutes to run on a normal desktop computer. Basically that is all you have to know. +Load the package using `library(labyrinth)`. We provide a vignette for the package that can be called using `vignette("labyrinth")`. Alternatively, you can view the online version on [GitHub](doc/labyrinth_knit.md) or in the [`pkgdown` documentation](https://labyrinth.yinchun.su/articles/labyrinth). The examples provided would take several minutes to run on a normal desktop computer. That is all you need to know to get started. [This documentation](doc/training_knit.md) contains the necessary information for training the model used in this project. The `tools/` folder contains all the code and scripts required for constructing your own model, so that you can understand the technical details. Besides, you can refer to [this documentation](doc/preface_knit.md) for the background and inspirations behind the overall workflow of `labyrinth`. 
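A quick note on the Python dependency mentioned in the README hunk above: `reticulate` binds to a system Python, so it can be worth verifying that interpreter can import `seaborn` before running the demos. A minimal sketch (the install hints are suggestions only; package names vary by distro, and `reticulate`'s interpreter choice can be overridden via `RETICULATE_PYTHON`):

```shell
# Check whether the default python3 can import seaborn, and print a hint if not.
if python3 -c 'import seaborn' 2>/dev/null; then
  echo "seaborn: OK"
else
  echo "seaborn: missing (try 'sudo dnf install python3-seaborn' or 'pip3 install seaborn')"
fi
```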
diff --git a/doc/labyrinth_knit.md b/doc/labyrinth_knit.md index 89f1fdf..68aa131 100644 --- a/doc/labyrinth_knit.md +++ b/doc/labyrinth_knit.md @@ -15,11 +15,11 @@ WSL](https://github.com/WhitewaterFoundry/Fedora-Remix-for-WSL). To get started, you will need to prepare the following dependencies. -1. **Fedora** or **Red Hat Enterprise Linux** with or without WSL. +1. **Fedora** or **Red Hat Enterprise Linux** with or without WSL. -2. **R 4.3**. +2. **R 4.3**. -3. **Required libraries**. +3. **Required libraries**. ``` r # pROC ggthemes @@ -225,11 +225,7 @@ p1 + p2 + plot_layout(guides = 'collect') + theme(legend.position = 'bottom') ``` -
- - -
-![](img/unnamed-chunk-5-1.png) +![Figure 3](../vignettes/img/unnamed-chunk-5-1.png) ### Reproducibility statement diff --git a/doc/preface_knit.md b/doc/preface_knit.md index e067afd..f100931 100644 --- a/doc/preface_knit.md +++ b/doc/preface_knit.md @@ -530,7 +530,7 @@ fields, which is defined as the number of articles retrieved using
-\label{fig:banana}**Bibliometric analysis for drug repurposing. **Drug repurposing gains significant attention since 2010. We adopted banana scale to depict this trend. +\label{fig:banana}**Bibliometric analysis for drug repurposing.** Drug repurposing has gained significant attention since 2010. We adopted the banana scale to depict this trend.

**Bibliometric analysis for drug repurposing.** Drug repurposing has gained significant attention since 2010. We adopted the banana scale to depict this diff --git a/doc/training_knit.md b/doc/training_knit.md index b852e1a..4e0ae25 100644 --- a/doc/training_knit.md +++ b/doc/training_knit.md @@ -11,10 +11,8 @@ validate](#construct-the-model-of-labyrinth-and-validate) - [Data preparation](#data-preparation) -

-Preprocessing diagram - -
+![Preprocessing diagram](../vignettes/img/preprocess.jpg) +Preprocessing diagram ## Setting up environment @@ -40,58 +38,58 @@ repository, it is sufficient to download and install the appropriate `epel-release` RPM. Then R can be installed as described in the **Fedora** section. -1. CentOS Stream 9 +1. CentOS Stream 9 - ``` bash - sudo dnf config-manager --set-enabled crb - sudo dnf install epel-release epel-next-release -y - ``` + ``` bash + sudo dnf config-manager --set-enabled crb + sudo dnf install epel-release epel-next-release -y + ``` -2. Rocky Linux 9, AlmaLinux 9, and RHEL 9 +2. Rocky Linux 9, AlmaLinux 9, and RHEL 9 - ``` bash - sudo dnf config-manager --set-enabled crb - sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm -y - ``` + ``` bash + sudo dnf config-manager --set-enabled crb + sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm -y + ``` -3. CentOS Stream 8 +3. CentOS Stream 8 - ``` bash - sudo dnf config-manager --set-enabled powertools - sudo dnf install epel-release epel-next-release -y - ``` + ``` bash + sudo dnf config-manager --set-enabled powertools + sudo dnf install epel-release epel-next-release -y + ``` -4. Rocky Linux 8, AlmaLinux 8, and RHEL 8 +4. Rocky Linux 8, AlmaLinux 8, and RHEL 8 - ``` bash - sudo dnf config-manager --set-enabled powertools - sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm -y - ``` + ``` bash + sudo dnf config-manager --set-enabled powertools + sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm -y + ``` -5. CentOS 7 and RHEL 7 +5. 
CentOS 7 and RHEL 7 - ``` bash - sudo yum-config-manager --enable extras - sudo yum install dnf -y - sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y + ``` bash + sudo yum-config-manager --enable extras + sudo yum install dnf -y + sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y - # Use puias computational repo - curl -o /tmp/RPM-GPG-KEY-springdale http://springdale.math.ias.edu/data/puias/7/x86_64/os/RPM-GPG-KEY-springdale - sudo mv /tmp/RPM-GPG-KEY-springdale /etc/pki/rpm-gpg/RPM-GPG-KEY-springdale + # Use puias computational repo + curl -o /tmp/RPM-GPG-KEY-springdale http://springdale.math.ias.edu/data/puias/7/x86_64/os/RPM-GPG-KEY-springdale + sudo mv /tmp/RPM-GPG-KEY-springdale /etc/pki/rpm-gpg/RPM-GPG-KEY-springdale - tee > /tmp/puias_computational.repo << EOF - [puias_computational] - name=PUIAS computational Base $releasever - $basearch - mirrorlist=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch/mirrorlist - #baseurl=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch - enabled=1 - gpgcheck=1 - gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-puias - EOF - sudo mv /tmp/puias_computational.repo /etc/yum.repos.d + tee > /tmp/puias_computational.repo << EOF + [puias_computational] + name=PUIAS computational Base $releasever - $basearch + mirrorlist=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch/mirrorlist + # baseurl=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch + enabled=1 + gpgcheck=1 + gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-puias + EOF + sudo mv /tmp/puias_computational.repo /etc/yum.repos.d - sudo dnf update -y - ``` + sudo dnf update -y + ``` ### Fedora @@ -110,30 +108,30 @@ sudo apt install r-base-dev build-essential make automake curl openssl libxml2-d This R package relies on several Python packages. Users can install these packages using one of the following methods: -1. 
System package manager on Linux +1. System package manager on Linux - ``` bash - # On Fedora, you can run: - sudo dnf install python3-numpy python3-sklearn python3-scipy python3-seaborn python3-tqdm python3-beautifulsoup4 python3-selenium python3-lxml + ``` bash + # On Fedora, you can run: + sudo dnf install python3-numpy python3-sklearn python3-scipy python3-seaborn python3-tqdm python3-beautifulsoup4 python3-selenium python3-lxml - # On Debian, you can run: - sudo apt install python3-numpy python3-sklearn-pandas python3-scipy python3-seaborn python3-tqdm python3-bs4 python3-selenium python3-lxml - ``` + # On Debian, you can run: + sudo apt install python3-numpy python3-sklearn-pandas python3-scipy python3-seaborn python3-tqdm python3-bs4 python3-selenium python3-lxml + ``` -2. Python pip +2. Python pip - ``` bash - pip3 install numpy scikit-learn scipy seaborn tqdm beautifulsoup4 selenium lxml - ``` + ``` bash + pip3 install numpy scikit-learn scipy seaborn tqdm beautifulsoup4 selenium lxml + ``` -3. Anaconda / Miniconda +3. 
Anaconda / Miniconda - ``` bash - conda create -n labyrinth numpy scikit-learn scipy seaborn tqdm beautifulsoup4 selenium lxml r-tidyverse r-devtools r-rcppeigen r-rcppprogress r-fastmatch r-rpca r-future.apply r-pbapply r-dbi r-rsqlite r-bursts r-patchwork r-furrr r-datapreparation r-tokenizers r-reticulate r-knitr r-progressr r-future.callr r-hrbrthemes r-proc r-ggthemes r-meta r-ggally r-matrixtests r-corrplot r-statix bioconductor-tcgabiolinks bioconductor-clusterprofiler bioconductor-fgsea bioconductor-deseq2 bioconductor-m3c -c bioconda -c conda-forge - conda activate labyrinth - Rscript -e "devtools::install_version('dbparser', version = '1.2.0')" - Rscript -e "remotes::install_github('randef1ned/word2vec')" - ``` + ``` bash + conda create -n labyrinth numpy scikit-learn scipy seaborn tqdm beautifulsoup4 selenium lxml r-tidyverse r-devtools r-rcppeigen r-rcppprogress r-fastmatch r-rpca r-future.apply r-pbapply r-dbi r-rsqlite r-bursts r-patchwork r-furrr r-datapreparation r-tokenizers r-reticulate r-knitr r-progressr r-future.callr r-hrbrthemes r-proc r-ggthemes r-meta r-ggally r-matrixtests r-corrplot r-statix bioconductor-tcgabiolinks bioconductor-clusterprofiler bioconductor-fgsea bioconductor-deseq2 bioconductor-m3c -c bioconda -c conda-forge + conda activate labyrinth + Rscript -e "devtools::install_version('dbparser', version = '1.2.0')" + Rscript -e "remotes::install_github('randef1ned/word2vec')" + ``` ### R @@ -144,6 +142,12 @@ remotes::install_github('randef1ned/word2vec') remotes::install_github('randef1ned/diffusr') ``` +This project uses a [custom fork](https://github.com/randef1ned/diffusr) +of the diffusr package, which is maintained by @randef1ned. This fork +optimizes the computational execution of the package, providing improved +performance compared to the original version. [View the documentation +online](https://diffusr.yinchun.su/). 
+ ## Folder structure The `tools/` folder in this GitHub repository contains the code and @@ -152,64 +156,64 @@ scripts required for the pre-processing and processing steps of ### Preprocessing steps -1. Parse the drug names: +1. Parse the drug names: - - `extract_names.Rmd`: This R Markdown file is used for parsing and - extracting names from the data. + - `extract_names.Rmd`: This R Markdown file is used for parsing and + extracting names from the data. -2. Download and read the data: +2. Download and read the data: - - `parse_wos.py`: This Python script reads and parses the Web of - Science (WoS) data. - - `export_edge_tsv.py`: This Python script exports the edge data as - a TSV (tab-separated values) file. - - `download_mesh_synonyms.py`: This Python script downloads and - processes the MeSH (Medical Subject Headings) synonyms. - - `clinical_trials_parser.py`: This Python script parses the - clinical trials data. - - `clean_wos.py`: This Python script cleans and preprocesses the WoS - data. + - `parse_wos.py`: This Python script reads and parses the Web of + Science (WoS) data. + - `export_edge_tsv.py`: This Python script exports the edge data as + a TSV (tab-separated values) file. + - `download_mesh_synonyms.py`: This Python script downloads and + processes the MeSH (Medical Subject Headings) synonyms. + - `clinical_trials_parser.py`: This Python script parses the + clinical trials data. + - `clean_wos.py`: This Python script cleans and preprocesses the WoS + data. -3. Identify stopwords and run word2vec: +3. Identify stopwords and run word2vec: - - `identify_stopwords.Rmd`: This R Markdown file is used to identify - and handle stopwords in the data. - - `run_word2vec.R`: This R script runs the word2vec algorithm on the - tokenized data. - - `compute_distance.R`: This R script computes the distances between - the word2vec embeddings for each drug. + - `identify_stopwords.Rmd`: This R Markdown file is used to identify + and handle stopwords in the data. 
+ - `run_word2vec.R`: This R script runs the word2vec algorithm on the + tokenized data. + - `compute_distance.R`: This R script computes the distances between + the word2vec embeddings for each drug. -4. Construct large knowledge network: +4. Construct large knowledge network: - - `build_network.R`: This R script builds the large network by - linking the clinical trials IDs and paper IDs. + - `build_network.R`: This R script builds the large network by + linking the clinical trials IDs and paper IDs. -5. Extract the stages of the drugs: +5. Extract the stages of the drugs: - - `drug_status.R`: This R script extracts the stages of the drugs - from the data. + - `drug_status.R`: This R script extracts the stages of the drugs + from the data. ### Construct the model of `labyrinth` and validate -1. Construct the main knowledge network: +1. Construct the main knowledge network: - - `main_network.R`: This R script constructs the main network for - the project. + - `main_network.R`: This R script constructs the main network for + the project. -2. Compute the details in the citation network: +2. Compute the details in the citation network: - - `citation_metric.R`: This R script computes various metrics and - details in the citation network. + - `citation_metric.R`: This R script computes various metrics and + details in the citation network. -3. Download MeSH structures for validation: +3. Download MeSH structures for validation: - - `download_mesh_structure.py`: This Python script downloads the - MeSH structure data for validation purposes. + - `download_mesh_structure.py`: This Python script downloads the + MeSH structure data for validation purposes. -4. Download TCGA and process the data for validation: +4. Download TCGA and process the data for validation: - - `download_tcga.R`: This R script downloads and processes the TCGA - (The Cancer Genome Atlas) data for validation. 
+ - `download_tcga.R`: This R script downloads and processes the TCGA + (The Cancer Genome Atlas) data for validation. ## Data preparation diff --git a/vignettes/training.Rmd b/vignettes/training.Rmd index 62b89f0..88dfb9f 100644 --- a/vignettes/training.Rmd +++ b/vignettes/training.Rmd @@ -2,13 +2,13 @@ title: "Training the model" author: "Yinchun Su" output: - html_document: - toc: true - standalone: true md_document: variant: gfm toc: true standalone: true + html_document: + toc: true + standalone: true vignette: > %\VignetteIndexEntry{Training the model} %\VignetteEncoding{UTF-8} @@ -73,7 +73,7 @@ To use the EPEL repository, it is sufficient to download and install the appropr [puias_computational] name=PUIAS computational Base $releasever - $basearch mirrorlist=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch/mirrorlist - #baseurl=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch + # baseurl=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch enabled=1 gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-puias @@ -132,6 +132,7 @@ devtools::install_version('dbparser', version = '1.2.0') remotes::install_github('randef1ned/word2vec') remotes::install_github('randef1ned/diffusr') ``` +This project uses a [custom fork](https://github.com/randef1ned/diffusr) of the diffusr package, which is maintained by @randef1ned. This fork optimizes the package's computations, providing improved performance over the original version. [View the documentation online](https://diffusr.yinchun.su/). ## Folder structure
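One detail worth flagging in the repo-file heredocs above (in both `doc/training_knit.md` and `vignettes/training.Rmd`): with an unquoted `EOF` delimiter, the shell expands `$releasever` and `$basearch` before `tee` ever sees them — usually to empty strings — so the placeholders never reach dnf. Quoting the delimiter preserves them literally (and `tee FILE` is a more direct spelling than `tee > FILE`). A sketch, using a temp file in place of the real `/etc/yum.repos.d` target:

```shell
# Write a dnf-style repo fragment; the quoted 'EOF' stops the shell from
# expanding $releasever/$basearch, leaving them for dnf to substitute later.
repo_file=$(mktemp)
tee "$repo_file" > /dev/null << 'EOF'
[puias_computational]
name=PUIAS computational Base $releasever - $basearch
mirrorlist=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch/mirrorlist
enabled=1
EOF
grep -F '$releasever' "$repo_file"   # the placeholder survives verbatim
rm -f "$repo_file"
```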