I developed and tested labyrinth on Fedora Linux 38 and 39. While this package does not contain any operating-system-specific code, it has not been tested on other operating systems. In theory, labyrinth should work on other Unix-like operating systems as long as the required dependencies are installed.
If you need to rebuild this model, we strongly recommend using Fedora Linux or one of its downstream distributions (e.g., Red Hat Enterprise Linux, CentOS Stream, Rocky Linux, AlmaLinux) and following the instructions above. Using a different environment may lead to unexpected issues or differences in results.
The Fedora RPMs for R have been ported to CentOS/RHEL by the Extra Packages for Enterprise Linux (EPEL) project. These RPMs are also compatible with distributions derived from CentOS/RHEL. To use the EPEL repository, it is sufficient to download and install the appropriate epel-release RPM. R can then be installed as described in the Fedora section.
- CentOS Stream 9

  ```shell
  sudo dnf config-manager --set-enabled crb
  sudo dnf install epel-release epel-next-release -y
  ```

- Rocky Linux 9, AlmaLinux 9, and RHEL 9

  ```shell
  sudo dnf config-manager --set-enabled crb
  sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm -y
  ```

- CentOS Stream 8

  ```shell
  sudo dnf config-manager --set-enabled powertools
  sudo dnf install epel-release epel-next-release -y
  ```

- Rocky Linux 8, AlmaLinux 8, and RHEL 8

  ```shell
  sudo dnf config-manager --set-enabled powertools
  sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm -y
  ```
- CentOS 7 and RHEL 7

  ```shell
  sudo yum-config-manager --enable extras
  sudo yum install dnf -y
  sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y

  # Use the PUIAS computational repo
  curl -o /tmp/RPM-GPG-KEY-springdale http://springdale.math.ias.edu/data/puias/7/x86_64/os/RPM-GPG-KEY-springdale
  sudo mv /tmp/RPM-GPG-KEY-springdale /etc/pki/rpm-gpg/RPM-GPG-KEY-springdale
  tee /tmp/puias_computational.repo > /dev/null << 'EOF'
  [puias_computational]
  name=PUIAS computational Base $releasever - $basearch
  mirrorlist=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch/mirrorlist
  # baseurl=http://puias.math.ias.edu/data/puias/computational/$releasever/$basearch
  enabled=1
  gpgcheck=1
  gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-springdale
  EOF
  sudo mv /tmp/puias_computational.repo /etc/yum.repos.d
  sudo dnf update -y
  ```

  The heredoc delimiter is quoted (`'EOF'`) so that `$releasever` and `$basearch` stay literal for dnf to expand, and `gpgkey` points to the springdale key installed above.
On Fedora, install R and the build dependencies:

```shell
sudo dnf install R @development-tools automake libxml2-devel libcurl-devel \
    fontconfig-devel openssl-devel harfbuzz-devel fribidi-devel libgit2-devel \
    freetype-devel libpng-devel libtiff-devel libjpeg-devel python3-pip \
    ghostscript pandoc R-flexiblas flexiblas-*
```
On Debian and Ubuntu, install:

```shell
sudo apt install r-base-dev build-essential make automake curl openssl libxml2-dev \
    libcurl4-openssl-dev libfontconfig1-dev libssl-dev libharfbuzz-dev libfribidi-dev \
    libgit2-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev python3-pip \
    ghostscript pandoc libopenblas-dev libopenblas-openmp-dev
```
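After installing the system packages, a quick sanity check can confirm the build toolchain is on `PATH`. This check is my addition, not part of labyrinth's instructions; the tool list is taken from the install commands above.

```python
# Sanity check (not part of labyrinth): confirm the core build tools are on PATH.
import shutil

# Tools pulled in by the dnf/apt commands above; "gs" is ghostscript's binary.
TOOLS = ["R", "make", "automake", "pandoc", "gs", "python3"]

for tool in TOOLS:
    status = "found" if shutil.which(tool) else "MISSING"
    print(f"{tool}: {status}")
```

Any `MISSING` line indicates a package that did not install cleanly or a shell whose `PATH` needs refreshing.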
This R package relies on several Python packages. Users can install these packages using one of the following methods:
- System package manager on Linux

  ```shell
  # On Fedora, you can run:
  sudo dnf install python3-numpy python3-sklearn python3-scipy python3-seaborn \
      python3-tqdm python3-beautifulsoup4 python3-selenium python3-lxml

  # On Debian, you can run:
  sudo apt install python3-numpy python3-sklearn python3-scipy python3-seaborn \
      python3-tqdm python3-bs4 python3-selenium python3-lxml
  ```
- Python pip

  ```shell
  pip3 install numpy scikit-learn scipy seaborn tqdm beautifulsoup4 selenium lxml
  ```
- Anaconda / Miniconda

  ```shell
  conda create -n labyrinth numpy scikit-learn scipy seaborn tqdm beautifulsoup4 \
      selenium lxml r-tidyverse r-devtools r-rcppeigen r-rcppprogress r-fastmatch \
      r-rpca r-future.apply r-pbapply r-dbi r-rsqlite r-bursts r-patchwork r-furrr \
      r-datapreparation r-tokenizers r-reticulate r-knitr r-progressr r-future.callr \
      r-hrbrthemes r-proc r-ggthemes r-meta r-ggally r-matrixtests r-corrplot \
      r-rstatix bioconductor-tcgabiolinks bioconductor-clusterprofiler \
      bioconductor-fgsea bioconductor-deseq2 bioconductor-m3c \
      -c bioconda -c conda-forge
  conda activate labyrinth
  Rscript -e "devtools::install_version('dbparser', version = '1.2.0')"
  Rscript -e "remotes::install_github('randef1ned/word2vec')"
  ```
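Whichever installation route you use, a short snippet can verify that the Python dependencies are importable. The helper below is my own sketch, not part of the package; note that two distribution names differ from their import names (beautifulsoup4 imports as `bs4`, scikit-learn as `sklearn`).

```python
# Verify the Python dependencies are importable (hypothetical helper, not part
# of labyrinth). Distribution names differ from import names for two packages:
# beautifulsoup4 -> bs4, scikit-learn -> sklearn.
import importlib.util

REQUIRED = ["numpy", "sklearn", "scipy", "seaborn", "tqdm", "bs4", "selenium", "lxml"]

def missing_modules(names):
    """Return the subset of `names` that the import system cannot locate."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only locates the module; it does not import it, so the check is fast and side-effect free.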
Alternatively, install the R dependencies from within R (M3C is a Bioconductor package, so it is installed via BiocManager rather than `install.packages`):

```r
install.packages(c('tidyverse', 'devtools', 'igraph', 'future.apply', 'pbapply',
                   'DBI', 'RSQLite', 'bursts', 'patchwork', 'furrr',
                   'dataPreparation', 'tokenizers', 'reticulate', 'magrittr',
                   'knitr', 'progressr', 'future.callr', 'hrbrthemes', 'pROC',
                   'ggthemes', 'meta', 'GGally', 'matrixTests', 'rstatix',
                   'corrplot', 'BiocManager'))
BiocManager::install('M3C')
devtools::install_version('dbparser', version = '1.2.0')
remotes::install_github('randef1ned/word2vec')
remotes::install_github('randef1ned/diffusr')
```
This project uses a custom fork of the diffusr package, maintained by @randef1ned. The fork optimizes the package's computational execution, providing improved performance compared to the original version. View the documentation online.
The `tools/` folder in this GitHub repository contains the code and scripts required for the pre-processing and processing steps of labyrinth.
- Parse the drug names:
  - `extract_names.Rmd`: This R Markdown file parses and extracts drug names from the data.
- Download and read the data:
  - `parse_wos.py`: This Python script reads and parses the Web of Science (WoS) data.
  - `export_edge_tsv.py`: This Python script exports the edge data as a TSV (tab-separated values) file.
  - `download_mesh_synonyms.py`: This Python script downloads and processes the MeSH (Medical Subject Headings) synonyms.
  - `clinical_trials_parser.py`: This Python script parses the clinical trials data.
  - `clean_wos.py`: This Python script cleans and preprocesses the WoS data.
- Identify stopwords and run word2vec:
  - `identify_stopwords.Rmd`: This R Markdown file identifies and handles stopwords in the data.
  - `run_word2vec.R`: This R script runs the word2vec algorithm on the tokenized data.
  - `compute_distance.R`: This R script computes the distances between the word2vec embeddings for each drug.
- Construct the large knowledge network:
  - `build_network.R`: This R script builds the large network by linking the clinical trial IDs and paper IDs.
- Extract the stages of the drugs:
  - `drug_status.R`: This R script extracts the stages of the drugs from the data.
- Construct the main knowledge network:
  - `main_network.R`: This R script constructs the main network for the project.
- Compute the details in the citation network:
  - `citation_metric.R`: This R script computes various metrics and details in the citation network.
- Download MeSH structures for validation:
  - `download_mesh_structure.py`: This Python script downloads the MeSH structure data for validation purposes.
- Download TCGA and process the data for validation:
  - `download_tcga.R`: This R script downloads and processes the TCGA (The Cancer Genome Atlas) data for validation.
| Data | Link |
|---|---|
| ChEMBL | http://doi.org/10.6019/CHEMBL.database.32 |
| UniChem | https://ftp.ebi.ac.uk/pub/databases/chembl/UniChem/data/table_dumps/ |
| DrugBank | https://go.drugbank.com/releases/5-1-10 |
| Comparative Toxicogenomics Database | https://ctdbase.org/downloads/ |
| Cochrane Library | https://www.cochranelibrary.com/central |
| GSE65185 | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE65185 |
| DisGeNET | https://www.disgenet.org/download/sqlite/current/2020 |