mACPpred 2.0: Stacked deep learning for anticancer peptide prediction with integrated spatial and probabilistic feature representations

sangarajukumar/mACPpred2

 
 


mACPpred2

Standalone program for the paper "mACPpred 2.0: Integrating NLP-derived and conventional ML-based probabilistic features for accurate anticancer peptide identification using stacked deep learning"


Introduction | Installation | Getting Started | Citation | References

Introduction

This repository provides the standalone program that was added to the mACPpred 2.0 web server at https://balalab-skku.org/mACPpred2/. The baseline and final models are available via Zenodo at DOI

Installation

Creating conda environment

conda create -n mACPpred2 python=3.9.12
conda activate mACPpred2

Installing TensorFlow with CUDA support

conda install -c conda-forge cudatoolkit=11.7.0
python -m pip install nvidia-cudnn-cu11==8.6.0.163 --no-cache-dir
python -m pip install tensorflow==2.11.* --no-cache-dir
python -m pip install chardet --no-cache-dir
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh

Installing bio-embeddings[1] and re-installing PyTorch with CUDA support

python -m pip install bio-embeddings[seqvec] --no-cache-dir
python -m pip install scipy==1.10.1 --no-cache-dir
python -m pip install protobuf==3.20.* --no-cache-dir
python -m pip install bio-embeddings[all] --no-cache-dir
python -m pip uninstall torch
python -m pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117 --no-cache-dir

Installing required specific packages

python -m pip install peptidy==0.0.1 --no-cache-dir
python -m pip install protlearn==0.0.3 --no-cache-dir
python -m pip install catboost==1.2 lightgbm==3.3.5 scikit-learn==0.24.2 xgboost==0.82 --no-cache-dir
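After installation, it can be useful to confirm that the pinned versions above are what actually ended up in the environment, since later `pip install` steps may silently upgrade or downgrade earlier ones. The following is a minimal sketch using only the Python standard library; the script and its function names (`matches`, `check_environment`) are illustrative and are not part of this repository. The version pins are copied from the commands above.

```python
from importlib import metadata

# Pins copied from the installation commands above.
# A trailing ".*" means "any release with this prefix".
PINNED = {
    "tensorflow": "2.11.*",
    "scipy": "1.10.1",
    "protobuf": "3.20.*",
    "torch": "1.13.1+cu117",
    "peptidy": "0.0.1",
    "protlearn": "0.0.3",
    "catboost": "1.2",
    "lightgbm": "3.3.5",
    "scikit-learn": "0.24.2",
    "xgboost": "0.82",
}


def matches(installed: str, pin: str) -> bool:
    """Return True if the installed version satisfies the pin."""
    if pin.endswith(".*"):
        prefix = pin[:-2]
        return installed == prefix or installed.startswith(prefix + ".")
    return installed == pin


def check_environment(pins=PINNED):
    """Map each package name to (installed_version_or_None, is_ok)."""
    report = {}
    for name, pin in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and matches(installed, pin)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:15s} pinned={PINNED[name]:15s} installed={installed} [{status}]")
```

Any line marked `MISMATCH` points at a package worth re-installing with the corresponding command above.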

Getting started

Cloning this repository

git clone https://github.com/nhattruongpham/mACPpred2.git
cd mACPpred2

Downloading baseline and final models

  • Please download the baseline and final models via Zenodo at DOI
  • For the baseline models, please extract and put all *.pkl files into the models/baseline_models folder.
  • For the final models, please extract and put all *.h5 files into the models/final_models folder.
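To check that the extracted files landed in the expected folders, a small script like the one below can be run from the repository root. This is only a sketch: the helper name `count_model_files` is hypothetical, and because the README does not state how many model files the archive contains, the script reports counts rather than asserting a specific number.

```python
from pathlib import Path


def count_model_files(repo_root="."):
    """Count *.pkl baseline models and *.h5 final models under repo_root."""
    root = Path(repo_root)
    baseline_dir = root / "models" / "baseline_models"
    final_dir = root / "models" / "final_models"
    # A missing folder is treated as containing zero files.
    n_pkl = len(list(baseline_dir.glob("*.pkl"))) if baseline_dir.is_dir() else 0
    n_h5 = len(list(final_dir.glob("*.h5"))) if final_dir.is_dir() else 0
    return n_pkl, n_h5


if __name__ == "__main__":
    n_pkl, n_h5 = count_model_files()
    print(f"baseline models (*.pkl): {n_pkl}")
    print(f"final models (*.h5):     {n_h5}")
    if n_pkl == 0 or n_h5 == 0:
        print("At least one model folder is empty; re-check the extraction step.")
```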

Running prediction

Usage

CUDA_VISIBLE_DEVICES=<GPU_NUMBER> python predictor.py --input_file <PATH_TO_INPUT_FILE> --output_file <PATH_TO_OUTPUT_FILE>

Example

CUDA_VISIBLE_DEVICES=0 python predictor.py --input_file examples/test.fasta --output_file result.csv
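Before running the predictor on your own data, you may want to sanity-check the input FASTA file. The sketch below parses FASTA records and flags sequences that are empty or contain characters outside the 20 standard amino acids. Note that this README does not specify which residues `predictor.py` accepts, so the standard-alphabet check is an assumption, and the helper names (`read_fasta`, `invalid_records`) are illustrative rather than part of this repository.

```python
# Assumption: only the 20 standard amino acids are expected in the input.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines):
    """Parse an iterable of FASTA lines into a list of (header, sequence)."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may wrap; join them and normalize to upper case.
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_records(records):
    """Return records that are empty or contain non-standard residues."""
    return [(h, s) for h, s in records if not s or set(s) - STANDARD_AA]


if __name__ == "__main__":
    with open("examples/test.fasta") as handle:
        records = read_fasta(handle)
    bad = invalid_records(records)
    print(f"{len(records)} sequences read, {len(bad)} flagged")
    for header, seq in bad:
        print(f"  flagged: {header} ({seq})")
```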

Citation

If you use this code or part of it, please cite the following paper:

@article{Sangaraju2024article,
  title={mACPpred 2.0: Integrating NLP-derived and conventional ML-based probabilistic features for accurate anticancer peptide identification using stacked deep learning},
  author={Sangaraju, Vinoth Kumar and Pham, Nhat Truong and Wei, Leyi and Yu, Xue and Manavalan, Balachandran},
  journal={},
  volume={},
  number={},
  pages={},
  year={},
  publisher={}
}

References

[1] Dallago, C., Schütze, K., Heinzinger, M., Olenyi, T., Littmann, M., Lu, A. X., Yang, K. K., Min, S., Yoon, S., Morton, J. T., & Rost, B. (2021). Learned embeddings from deep learning to visualize and predict protein sets. Current Protocols, 1, e113. DOI
[2] Özçelik, R., van Weesep, L., de Ruiter, S., & Grisoni, F. (2024). peptidy: A light-weight Python library for peptide representation in machine learning. DOI
[3] Dorfer, T. (2021). protlearn: A Python package for extracting protein sequence features. (v0.0.3 on Mar 24, 2021) URL: https://github.com/tadorfer/protlearn.
