Standalone program for the paper "mACPpred 2.0: Integrating NLP-derived and conventional ML-based probabilistic features for accurate anticancer peptide identification using stacked deep learning"
Introduction • Installation • Getting Started • Citation • References
Introduction
This repository provides the standalone program that accompanies the mACPpred 2.0 web server at https://balalab-skku.org/mACPpred2/. The baseline and final models are available via Zenodo at
Installation
conda create -n mACPpred2 python=3.9.12
conda activate mACPpred2
conda install -c conda-forge cudatoolkit=11.7.0
python -m pip install nvidia-cudnn-cu11==8.6.0.163 --no-cache-dir
python -m pip install "tensorflow==2.11.*" --no-cache-dir
python -m pip install chardet --no-cache-dir
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'CUDNN_PATH=$(dirname $(python -c "import nvidia.cudnn; print(nvidia.cudnn.__file__)"))' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib:$CUDNN_PATH/lib' >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
source $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
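The hook written above appends the conda library directory and the cuDNN wheel's lib directory to LD_LIBRARY_PATH each time the environment is activated. As a quick sanity check, the helper below tests whether a given directory is an entry of a colon-separated path list. The function name path_contains is hypothetical and not part of this repository; it is a minimal sketch that compares entries literally (ignoring one trailing slash) and does not resolve symlinks.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mACPpred2): succeed if directory $2 is one of
# the colon-separated entries in $1. Entries are compared literally, ignoring a
# single trailing slash; symlinks are not resolved.
path_contains() {
  local dir=${2%/} entry
  local -a entries
  # Split the list on ':' into an array; empty entries are kept but never
  # match a non-empty directory.
  IFS=: read -r -a entries <<< "$1"
  for entry in "${entries[@]}"; do
    if [[ ${entry%/} == "$dir" ]]; then
      return 0
    fi
  done
  return 1
}

# Assumed usage in the activated environment, after env_vars.sh was sourced:
#   path_contains "$LD_LIBRARY_PATH" "$CUDNN_PATH/lib" && echo "cuDNN lib found"
```

With the variables in place, TensorFlow's documented GPU check, python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))", should list at least one GPU device.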
python -m pip install "bio-embeddings[seqvec]" --no-cache-dir
python -m pip install scipy==1.10.1 --no-cache-dir
python -m pip install "protobuf==3.20.*" --no-cache-dir
python -m pip install "bio-embeddings[all]" --no-cache-dir
python -m pip uninstall -y torch
python -m pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117 --no-cache-dir
python -m pip install peptidy==0.0.1 --no-cache-dir
python -m pip install protlearn==0.0.3 --no-cache-dir
python -m pip install catboost==1.2 lightgbm==3.3.5 scikit-learn==0.24.2 xgboost==0.82 --no-cache-dir
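Because bio-embeddings[all] pulls in its own dependencies and torch is reinstalled afterwards, it can be worth confirming that the pinned versions actually ended up in the environment. The sketch below reads pip freeze style name==version lines on standard input and reports any pinned package that is missing or has a different version. The function name check_pins is hypothetical and requires bash 4 or later. The pin table is copied from the commands above as a subset (the conda-installed cudatoolkit and the torchvision/torchaudio pins are left out), and a pin ending in a dot, such as 2.11., stands for the 2.11.* wildcard. Names are lower-cased and underscores mapped to hyphens, a simplification of pip's own name normalization.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare `pip freeze` output (stdin) with the pins used in
# the installation commands. Prints one line per problem; returns 1 if any.
check_pins() {
  # Expected versions; a trailing "." means "any version with this prefix"
  # (used for the 2.11.* and 3.20.* wildcards).
  declare -A want=(
    [nvidia-cudnn-cu11]=8.6.0.163
    [tensorflow]=2.11.
    [scipy]=1.10.1
    [protobuf]=3.20.
    [torch]=1.13.1+cu117
    [peptidy]=0.0.1
    [protlearn]=0.0.3
    [catboost]=1.2
    [lightgbm]=3.3.5
    [scikit-learn]=0.24.2
    [xgboost]=0.82
  )
  declare -A have=()
  local line name pkg ver status=0

  # Collect installed "name==version" pairs; other lines (e.g. "pkg @ url") are skipped.
  while IFS= read -r line; do
    [[ $line == ?*==* ]] || continue
    name=${line%%==*}
    name=${name,,}        # lower-case
    name=${name//_/-}     # underscores -> hyphens
    have[$name]=${line#*==}
  done

  for pkg in "${!want[@]}"; do
    ver=${have[$pkg]:-}
    if [[ -z $ver ]]; then
      echo "missing: $pkg"
      status=1
    elif [[ ${want[$pkg]} == *. ]]; then
      if [[ $ver != "${want[$pkg]}"* ]]; then
        echo "mismatch: $pkg $ver (want ${want[$pkg]}*)"
        status=1
      fi
    elif [[ $ver != "${want[$pkg]}" ]]; then
      echo "mismatch: $pkg $ver (want ${want[$pkg]})"
      status=1
    fi
  done
  return "$status"
}
```

In the activated environment, python -m pip freeze | check_pins && echo "all pins satisfied" prints nothing but the final message when every listed pin matches.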
Getting Started
git clone https://github.com/nhattruongpham/mACPpred2.git
cd mACPpred2
- Please download the baseline and final models via Zenodo at
- For the baseline models, please extract and put all *.pkl files into the models/baseline_models folder.
- For the final models, please extract and put all *.h5 files into the models/final_models folder.
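After extracting the Zenodo archives, a quick check that the files landed in the expected folders can save a failed run. The sketch below counts *.pkl files in models/baseline_models and *.h5 files in models/final_models under a given repository root. The function name check_models is hypothetical, and because the expected number of model files is not stated here, it only requires each folder to contain at least one file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report how many model files are in the folders named
# above, and fail if either folder has none. The expected counts are not
# documented here, so only "at least one" is checked.
check_models() {
  local root=${1:-.}
  local restore
  local -a pkl h5
  restore=$(shopt -p nullglob) || true  # remember the caller's nullglob setting
  shopt -s nullglob                     # unmatched globs expand to nothing
  pkl=("$root"/models/baseline_models/*.pkl)
  h5=("$root"/models/final_models/*.h5)
  eval "$restore"

  echo "models/baseline_models: ${#pkl[@]} .pkl file(s)"
  echo "models/final_models: ${#h5[@]} .h5 file(s)"
  (( ${#pkl[@]} > 0 && ${#h5[@]} > 0 ))
}
```

Run it from the repository root as check_models . before calling predictor.py.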
CUDA_VISIBLE_DEVICES=<GPU_NUMBER> python predictor.py --input_file <PATH_TO_INPUT_FILE> --output_file <PATH_TO_OUTPUT_FILE>
CUDA_VISIBLE_DEVICES=0 python predictor.py --input_file examples/test.fasta --output_file result.csv
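The predictor takes a FASTA file such as examples/test.fasta. If your peptides are kept as a plain list instead, a small converter can produce the input file. This is a sketch under stated assumptions: the function name to_fasta is hypothetical, the input is assumed to be one whitespace-separated ID SEQUENCE pair per line, and the restriction to the 20 standard amino-acid letters is a precaution chosen here, not a documented requirement of predictor.py.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "ID SEQUENCE" pairs (one per line) on stdin and
# write FASTA records to stdout. Blank lines and lines starting with '#' are
# ignored. Sequences are upper-cased; a sequence containing any character
# outside the 20 standard amino-acid letters is reported on stderr and skipped
# (an assumed precaution, not a documented predictor.py rule).
to_fasta() {
  local id seq rest
  while read -r id seq rest; do
    [[ -z $id || $id == \#* ]] && continue
    seq=${seq^^}
    if [[ -z $seq || $seq == *[!ACDEFGHIKLMNPQRSTVWY]* ]]; then
      echo "skipping $id: invalid or empty sequence" >&2
      continue
    fi
    printf '>%s\n%s\n' "$id" "$seq"
  done
}

# Assumed usage (the sequence is an arbitrary illustration):
#   printf 'pep1 ACDKLMNPQ\n' | to_fasta > my_peptides.fasta
#   CUDA_VISIBLE_DEVICES=0 python predictor.py --input_file my_peptides.fasta --output_file my_result.csv
```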
Citation
If you use this code or part of it, please cite the following paper:
@article{Sangaraju2024article,
title={mACPpred 2.0: Integrating NLP-derived and conventional ML-based probabilistic features for accurate anticancer peptide identification using stacked deep learning},
author={Sangaraju, Vinoth Kumar and Pham, Nhat Truong and Wei, Leyi and Yu, Xue and Manavalan, Balachandran},
journal={},
volume={},
number={},
pages={},
year={},
publisher={}
}
References
[1] Dallago, C., Schütze, K., Heinzinger, M., Olenyi, T., Littmann, M., Lu, A. X., Yang, K. K., Min, S., Yoon, S., Morton, J. T., & Rost, B. (2021). Learned embeddings from deep learning to visualize and predict protein sets. Current Protocols, 1, e113.
[2] Özçelik, R., van Weesep, L., de Ruiter, S., & Grisoni, F. (2024). peptidy: A light-weight Python library for peptide representation in machine learning.
[3] Dorfer, T. (2021). protlearn: A Python package for extracting protein sequence features (version 0.0.3, March 24, 2021). https://github.com/tadorfer/protlearn