Merge pull request #1 from karolisr/main
Edits to the install script and documentation.
karolisr authored Jun 23, 2024
2 parents 475aadc + b87eff0 commit cc8fc9b
Showing 3 changed files with 105 additions and 73 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -31,3 +31,4 @@ bin/*
 semblans_fa/*
 plant_cdna/*
 project/*
+*.code-workspace
35 changes: 28 additions & 7 deletions README.md
@@ -12,32 +12,53 @@ Through the collation of several external packages and the leveraging of C++ dat

 All documentation for Semblans can be found in the [wiki](https://github.com/gladshire/Semblans/wiki)

+# Dependencies
+
+Semblans will install most of the dependencies it requires, but make sure you have working installations of:
+- [**bowtie2**](https://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
+- [**jellyfish**](https://genome.umd.edu/jellyfish.html)
+- [**salmon**](https://salmon.readthedocs.io/en/latest/salmon.html)
+- [**samtools**](http://www.htslib.org)
+- [**numpy**](https://numpy.org)
+
+On Ubuntu this can be done by running:
+```bash
+sudo apt update
+sudo apt install bowtie2 jellyfish salmon samtools python3-numpy
+```
+
 # Installation

 If the user has downloaded a pre-compiled Semblans release, this step is not necessary. To install Semblans, navigate to its root directory and then call:
-```
+```bash
 ./install.sh
 ```
 Please allow several minutes for Semblans to set up the necessary packages.

 By default, Semblans will not retrieve the PantherDB functional protein database for sequence annotation. **If the user intends to utilize Semblans' annotation functionality, they should instead call the following installation command:
-```
+```bash
 ./install.sh --with-panther
 ```
 **Be aware that the PantherDB database is large (~17GB compressed; ~80GB uncompressed), and can take some time to download.**

 # Quick Start / Test data

 Included with Semblans is a directory called 'examples'. This directory contains a very small short read dataset ("ChloroSubSet") for testing/verifying functionality of the Semblans pipeline. To test, uncompress the data from **ChloroSubSet.tar.gz**. The user should then **ensure they have a reference proteome**, as one is necessary for several of the pipeline's postprocessing stages. Links to broad, kingdom-level reference proteomes are hosted at the bottom of this document. In this example, I use the kingdom-level plant proteome. Once prepared, the user may call:
-```
-semblans --left ChloroSubSet_1.fq --right ChloroSubSet_2.fq --prefix ChloroSubSet --ref-proteome ensembl_plant.pep.all.fa --threads 8 --ram 10
+```bash
+semblans \
+--left ChloroSubSet_1.fq \
+--right ChloroSubSet_2.fq \
+--prefix ChloroSubSet \
+--ref-proteome ensembl_plant.pep.all.fa \
+--threads 4 \
+--ram 10
 ```
 Some users may experience issues, particularly during the transcript assembly phase during Trinity. Common errors and solutions are hosted on our GitHub's [wiki page](https://github.com/gladshire/Semblans/wiki/Common-Errors-&-Solutions#issues-at-trinity-stage). As cataloguing these is an ongoing process, we urge users to post an issue on the Semblans repository page detailing their problem if it persists or is otherwise unaddressed by this page.

 Reference peptide sets for **postprocess**

-[**Ensembl animal reference**](https://uofi.app.box.com/s/0rlp6q0u5uvc161mzbdr3c0xoiti63sk)
+[**Ensembl animal reference**](https://www.dropbox.com/scl/fi/n49jm9i1yscrfrsj1dnq8/ensembl_animals.pep.all.fa.gz?rlkey=yemush6bm36wr4fu7dpe8h5e0&st=98kgb83l&dl=1)

-[**Ensembl plant reference**](https://uofi.app.box.com/s/lvg7x2qrxvg8hfcmue9xv4y9t1rgfb48)
+[**Ensembl plant reference**](https://www.dropbox.com/scl/fi/hbvtnd9wsiwt8k7gakcmq/ensembl_plant.pep.all.fa.gz?rlkey=8cp9sn5wrt9uu4vc2pmg8xvin&st=1gesuqq0&dl=1)

-[**Ensembl fungi reference**](https://uofi.box.com/s/qc4nmun4apb0pik5fqxukn4qvn3wm943)
+[**Ensembl fungi reference**](https://www.dropbox.com/scl/fi/8as6tci331utcrqrl7txz/ensembl_fungi.pep.all.fa.gz?rlkey=eyhsv35lelnao5xgd7s9dy51e&st=oc9dz9s6&dl=1)
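
The Dependencies section added to the README lists tools that must already be installed before running `install.sh`. A minimal pre-flight sketch of that check follows. It assumes the executables are named `bowtie2`, `jellyfish`, `salmon`, and `samtools`, and that numpy should be importable from `python3`. The helper names `missing_tools` and `has_python_module` are hypothetical and not part of Semblans.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the Semblans repository.

# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Succeed (status 0) if python3 can import the named module.
has_python_module() {
    python3 -c "import $1" >/dev/null 2>&1
}

# Executable names are assumed to match the README's dependency list.
missing=$(missing_tools bowtie2 jellyfish salmon samtools)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose: one output line per tool name.
    printf 'Missing tool: %s\n' $missing
fi
if ! has_python_module numpy; then
    echo "Missing Python module: numpy"
fi
```

Running this before `./install.sh` only reports what is absent; it does not install anything.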
142 changes: 76 additions & 66 deletions install.sh
@@ -1,4 +1,14 @@
-#!/bin/bash
+#!/usr/bin/env bash
+####################################################
+# Suppress printing of error messages
+# exec 2>/dev/null
+# Stop on first error
+set -o errexit
+# Set trap on ERR to be inherited by shell functions
+set -o errtrace
+# Trap errors
+trap 'echo Error at line: $LINENO' ERR
+####################################################

 # Determine whether to install PANTHER for annotation
 install_panther=false
@@ -12,7 +22,6 @@ for flag in "$@"; do
 fi
 done

-
 echo "Initiating install of Semblans"

 if $install_panther
@@ -24,12 +33,15 @@ else
 echo "If user wishes to perform annotations, they should include the --with-panther / -p flag."
 fi

-# Prepare Semblans file structure
-mkdir include &>/dev/null
-mkdir lib &>/dev/null
-mkdir external &>/dev/null
+# Prepare Semblans directory structure
+rm -rf ./include ./lib ./external ./data
+
+mkdir -p ./include
+mkdir -p ./lib
+mkdir -p ./external
+
 if $install_panther; then
-mkdir data &>/dev/null
+mkdir -p ./data
 fi

 #========================================================================
@@ -39,18 +51,16 @@ fi
 echo "Now installing required libraries"

 # Install boost libraries
-if ( [ ! -f ./lib/libboost_filesystem.a ] ||
-[ ! -f ./lib/libboost_regex.a ] ||
-[ ! -f ./lib/libboost_system.a ] ||
-[ ! -f ./lib/libboost_locale.a ] ); then
+if [ ! -f ./lib/libboost_filesystem.a ] ||
+[ ! -f ./lib/libboost_regex.a ] ||
+[ ! -f ./lib/libboost_system.a ] ||
+[ ! -f ./lib/libboost_locale.a ]; then
 echo " Installing Boost libraries ..."
 wget -q https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/boost_1_81_0.tar.gz
 tar -xf boost_1_81_0.tar.gz
-cd boost_1_81_0
+cd boost_1_81_0 || return 1
 ./bootstrap.sh --prefix=../
 ./b2 install cxxflags="-std=c++11" link=static
-make
-make install
 mv LICENSE_1_0.txt ../include/boost/
 cd ..
 rm -rf boost_1_81_0*
@@ -73,13 +83,13 @@ fi
 echo " Installing libconfini library ..."
 wget -q https://github.com/madmurphy/libconfini/releases/download/1.16.4/libconfini-1.16.4-x86_64-bin.tar.xz
 tar -xf libconfini-1.16.4-x86_64-bin.tar.xz
-mkdir ./include/libconfini
+mkdir -p ./include/libconfini
 mv ./usr/include/* ./include/libconfini/
 mv ./usr/lib/* ./lib/
 mv ./usr/share/doc/libconfini/AUTHORS ./include/libconfini/
 mv ./usr/share/doc/libconfini/COPYING ./include/libconfini/
 rm libconfini-1.16.4-x86_64-bin.tar.xz
-rm -rf ./usr/
+rm -rf ./usr

 # Install libcurl
 echo " Installing libcurl library ..."
@@ -102,7 +112,7 @@ if [ ! -e "./external/hmmer/bin/hmmscan" ]; then
 wget -q http://eddylab.org/software/hmmer/hmmer.tar.gz
 tar -xf hmmer.tar.gz -C ./external/
 mv ./external/hmmer* ./external/hmmer
-cd ./external/hmmer
+cd ./external/hmmer || return 1
 ./configure --prefix "$PWD"
 make
 make install
@@ -132,13 +142,13 @@ fi


 # Install NCBI sra-tools
-if ( [ ! -e "./external/sratoolkit/bin/prefetch" ] &&
-[ ! -e "./external/sratoolkit/bin/fasterq-dump" ] ); then
+if [ ! -e "./external/sratoolkit/bin/prefetch" ] &&
+[ ! -e "./external/sratoolkit/bin/fasterq-dump" ]; then
 echo "Installing SRA-Tools ..."
 wget -q --output-document sratoolkit.tar.gz https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
 tar -xf sratoolkit.tar.gz -C ./external/
 mv ./external/sratoolkit* ./external/sratoolkit
-cd ./external/sratoolkit
+cd ./external/sratoolkit || return 1
 wget -q https://raw.githubusercontent.com/ncbi/sra-tools/master/LICENSE
 cd ../../
 rm sratoolkit.tar.gz
@@ -163,9 +173,9 @@ if [ ! -e "./external/Rcorrector/rcorrector" ]; then
 wget -q --output-document rcorrector.tar.gz https://github.com/mourisl/Rcorrector/archive/refs/tags/v1.0.7.tar.gz
 tar -xf rcorrector.tar.gz -C ./external/
 mv ./external/Rcorrector* ./external/Rcorrector
-cd ./external/Rcorrector
+cd ./external/Rcorrector || return 1
 make
-cd ../../
+cd ../..
 rm rcorrector.tar.gz
 else
 echo "Rcorrector already installed. Skipping ..."
@@ -177,9 +187,9 @@ if [ ! -e "./external/Trimmomatic/trimmomatic-0.39.jar" ]; then
 wget -q --output-document trimmomatic.zip http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip
 unzip trimmomatic.zip -d ./external/
 mv ./external/Trimmomatic* ./external/Trimmomatic
-cd ./external/Trimmomatic/adapters/
+cd ./external/Trimmomatic/adapters || return 1
 echo -e "$(cat NexteraPE-PE.fa)\n$(cat TruSeq2-PE.fa)\n$(cat TruSeq2-SE.fa)\n$(cat TruSeq3-PE-2.fa)\n$(cat TruSeq3-PE.fa)\n$(cat TruSeq3-SE.fa)" > TruSeq_all.fa
-cd ../../../
+cd ../../..
 rm trimmomatic.zip
 else
 echo "Trimmomatic already installed. Skipping ..."
@@ -191,9 +201,9 @@ if [ ! -e "./external/kraken2/kraken2" ]; then
 wget -q --output-document kraken2.tar.gz https://github.com/DerrickWood/kraken2/archive/refs/tags/v2.1.2.tar.gz
 tar -xf kraken2.tar.gz -C ./external/
 mv ./external/kraken2-2.1.2 ./external/kraken2
-cd ./external/kraken2
+cd ./external/kraken2 || return 1
 ./install_kraken2.sh .
-cd ../../
+cd ../..
 rm kraken2.tar.gz
 else
 echo "Kraken2 already installed. Skipping ..."
@@ -205,19 +215,19 @@ if [ ! -e "./external/trinityrnaseq/Trinity" ]; then
 wget -q --output-document trinity.tar.gz https://github.com/trinityrnaseq/trinityrnaseq/releases/download/Trinity-v2.15.1/trinityrnaseq-v2.15.1.FULL.tar.gz
 tar -xf trinity.tar.gz -C ./external/
 mv ./external/trinityrnaseq-v2.15.1 ./external/trinityrnaseq
-cd ./external/trinityrnaseq/trinity-plugins/bamsifter
+cd ./external/trinityrnaseq/trinity-plugins/bamsifter || return 1
 sed -i '1s/^/#include <string> \n/' sift_bam_max_cov.cpp
-cd ../../
+cd ../..
 make
-cd ../../
+cd ../..
 rm trinity.tar.gz
 else
 echo "Trinity already installed. Skipping ..."
 fi

 # Install NCBI BLAST
-if ( [ ! -e "./external/ncbi-blast+/bin/blastx" ] ||
-[ ! -e "./external/ncbi-blast+/bin/blastp" ] ); then
+if [ ! -e "./external/ncbi-blast+/bin/blastx" ] ||
+[ ! -e "./external/ncbi-blast+/bin/blastp" ]; then
 echo "Installing NCBI-BLAST+ ..."
 wget -q --output-document ncbi-blast.tar.gz https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.15.0/ncbi-blast-2.15.0+-x64-linux.tar.gz
 tar -xf ncbi-blast.tar.gz -C ./external/
@@ -228,67 +238,67 @@ else
 fi

 # Install Diamond
-if ( [ ! -e "./external/diamond/diamond" ] ); then
+if [ ! -e "./external/diamond/diamond" ]; then
 echo "Installing Diamond ..."
-mkdir ./external/diamond-2.1.7
+mkdir -p ./external/diamond-2.1.7
 wget -q --output-document diamond.tar.gz https://github.com/bbuchfink/diamond/releases/download/v2.1.7/diamond-linux64.tar.gz
 tar -xf diamond.tar.gz -C ./external/diamond-2.1.7/
 mv ./external/diamond-2.1.7 ./external/diamond
-cd ./external/diamond
+cd ./external/diamond || return 1
 wget -q https://raw.githubusercontent.com/bbuchfink/diamond/master/LICENSE
-cd ../../
+cd ../..
 rm diamond.tar.gz
 else
 echo "Diamond already installed. Skipping ..."
 fi


 # Install Corset
-if ( [ ! -e "./external/corset/corset" ] ); then
+if [ ! -e "./external/corset/corset" ]; then
 echo "Installing Corset ..."
 wget -q --output-document corset.tar.gz https://github.com/Oshlack/Corset/releases/download/version-1.09/corset-1.09-linux64.tar.gz
 tar -xf corset.tar.gz -C ./external/
 mv ./external/corset-1.09-linux64 ./external/corset
-cd ./external/corset/
+cd ./external/corset || return 1
 chmod -x LICENSE
 chmod -x COPYING
-cd ../../
+cd ../..
 rm corset.tar.gz
 else
 echo "Corset already installed. Skipping ..."
 fi

 # Install STAR
-#if ( [ ! -e "./external/STAR/bin" ] ); then
-# echo "Installing STAR ..."
-# wget -q --output-document star.tar.gz https://github.com/alexdobin/STAR/archive/refs/tags/2.7.11a.tar.gz
-# tar -xf star.tar.gz -C ./external/
-# mv ./external/STAR* ./external/STAR
-# cd ./external/STAR/source/
-# make STAR
-# cd ../../../
-# rm star.tar.gz
-#else
-# echo "STAR alread installed. Skipping ..."
-#fi
+# if [ ! -e "./external/STAR/bin" ]; then
+# echo "Installing STAR ..."
+# wget -q --output-document star.tar.gz https://github.com/alexdobin/STAR/archive/refs/tags/2.7.11a.tar.gz
+# tar -xf star.tar.gz -C ./external/
+# mv ./external/STAR* ./external/STAR
+# cd ./external/STAR/source || return 1
+# make STAR
+# cd ../../..
+# rm star.tar.gz
+# else
+# echo "STAR alread installed. Skipping ..."
+# fi

 # Install Salmon
-if ( [ ! -e "./external/salmon/bin/salmon" ] ); then
+if [ ! -e "./external/salmon/bin/salmon" ]; then
 echo "Installing Salmon ..."
 wget -q --output-document salmon.tar.gz https://github.com/COMBINE-lab/salmon/releases/download/v1.10.0/salmon-1.10.0_linux_x86_64.tar.gz
 tar -xf salmon.tar.gz -C ./external/
 mv ./external/salmon* ./external/salmon
-cd ./external/salmon
+cd ./external/salmon || return 1
 wget -q https://raw.githubusercontent.com/COMBINE-lab/salmon/master/LICENSE
-cd ../../
+cd ../..
 rm salmon.tar.gz
 else
 echo "Salmon already installed. Skipping ..."
 fi

 # Install TransDecoder
-if ( [ ! -e "./external/TransDecoder/TransDecoder.LongOrfs" ] ||
-[ ! -e "./external/TransDecoder/TransDecoder.Predict" ] ); then
+if [ ! -e "./external/TransDecoder/TransDecoder.LongOrfs" ] ||
+[ ! -e "./external/TransDecoder/TransDecoder.Predict" ]; then
 echo "Installing TransDecoder ..."
 wget -q --output-document transdecoder.tar.gz https://github.com/TransDecoder/TransDecoder/archive/refs/tags/TransDecoder-v5.7.0.tar.gz
 tar -xf transdecoder.tar.gz -C ./external/
@@ -317,8 +327,8 @@ if $install_panther ; then
 fi

 # Check SRA Toolkit installation
-if ( [ ! -e "./external/sratoolkit/bin/prefetch" ] &&
-[ -e "./external/sratoolkit/bin/fasterq-dump" ] ); then
+if [ ! -e "./external/sratoolkit/bin/prefetch" ] &&
+[ -e "./external/sratoolkit/bin/fasterq-dump" ]; then
 packagesNotInstalled+=("SRA-Tools")
 fi

@@ -348,8 +358,8 @@ if [ ! -e "./external/trinityrnaseq/Trinity" ]; then
 fi

 # Check BLAST installation
-if ( [ ! -e "./external/ncbi-blast+/bin/blastx" ] ||
-[ ! -e "./external/ncbi-blast+/bin/blastp" ] ); then
+if [ ! -e "./external/ncbi-blast+/bin/blastx" ] ||
+[ ! -e "./external/ncbi-blast+/bin/blastp" ]; then
 packagesNotInstalled+=("BLAST+")
 fi

@@ -369,8 +379,8 @@ if [ ! -e "./external/salmon/bin/salmon" ]; then
 fi

 # Check TransDecoder installation
-if ( [ ! -e "./external/TransDecoder/TransDecoder.LongOrfs" ] ||
-[ ! -e "./external/TransDecoder/TransDecoder.Predict" ] ); then
+if [ ! -e "./external/TransDecoder/TransDecoder.LongOrfs" ] ||
+[ ! -e "./external/TransDecoder/TransDecoder.Predict" ]; then
 packagesNotInstalled+=("TransDecoder")
 fi

@@ -391,14 +401,14 @@ fi
 #========================================================================

 echo "Now building Semblans ..."
-mkdir ./bin/
+mkdir -p ./bin
 make

 # Determine if build was successful
-if ( [ ! -e "./bin/preprocess" ] ||
-[ ! -e "./bin/assemble" ] ||
-[ ! -e "./bin/postprocess" ] ||
-[ ! -e "./bin/semblans" ] ); then
+if [ ! -e "./bin/preprocess" ] ||
+[ ! -e "./bin/assemble" ] ||
+[ ! -e "./bin/postprocess" ] ||
+[ ! -e "./bin/semblans" ]; then
 echo "ERROR: Semblans failed to build from source"
 exit 1
 else
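
The install.sh changes repeat two patterns: testing whether any of several files is missing, and stepping into a directory to run a build before stepping back out. The sketch below shows one possible way to express both as helper functions. It is not the project's code, and `any_missing` and `run_in_dir` are hypothetical names. It uses a subshell with `cd ... || exit 1`, since in bash `return` is only valid inside a function or a sourced script, and the subshell leaves the caller's working directory untouched so no `cd ../..` bookkeeping is needed.

```bash
#!/usr/bin/env bash
# Illustrative helpers only; these functions do not exist in install.sh.

# Succeed (status 0) if at least one of the given paths does not exist.
any_missing() {
    local path
    for path in "$@"; do
        [ -e "$path" ] || return 0
    done
    return 1
}

# Run a command inside a directory without changing the caller's directory.
# The parentheses start a subshell, so a failed `cd` exits only the subshell
# and the function returns a non-zero status.
run_in_dir() {
    local dir=$1
    shift
    (
        cd "$dir" || exit 1
        "$@"
    )
}

# Example usage with paths from the install script (commented out so the
# sketch has no side effects):
# if any_missing ./external/ncbi-blast+/bin/blastx ./external/ncbi-blast+/bin/blastp; then
#     echo "Installing NCBI-BLAST+ ..."
# fi
# run_in_dir ./external/Rcorrector make
```

With `set -o errexit` active, as in the updated script, a non-zero status from `run_in_dir` would stop the install at that step.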