This repository has been archived by the owner on Mar 25, 2023. It is now read-only.
forked from nanxstats/protr
protr: Generating Various Numerical Representation Schemes of Protein Sequences
koefoed/protr
# protr

A comprehensive toolkit for generating various numerical representation schemes of protein sequences. The descriptors included in the protr package are widely used in bioinformatics and chemogenomics research.

## Package Description

### Commonly used descriptors

* Amino acid composition
  * Amino acid composition
  * Dipeptide composition
  * Tripeptide composition
* Autocorrelation
  * Normalized Moreau-Broto autocorrelation
  * Moran autocorrelation
  * Geary autocorrelation
* CTD
  * Composition
  * Transition
  * Distribution
* Conjoint Triad
* Quasi-sequence-order descriptors
  * Sequence-order-coupling number
  * Quasi-sequence-order descriptors
* Pseudo amino acid composition
  * Pseudo amino acid composition
  * Amphiphilic pseudo amino acid composition
* Profile-based descriptors
  * Profile-based descriptors derived by PSSM (Position-Specific Scoring Matrix)

### Proteochemometric (PCM) modeling descriptors

* Scales-based descriptors derived by principal components analysis
  * Scales-based descriptors derived by amino acid properties (AAindex)
  * Scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.)
* Scales-based descriptors derived by factor analysis
* Scales-based descriptors derived by multidimensional scaling
* BLOSUM and PAM matrix-derived descriptors

### Similarity Computation

Local and global pairwise sequence alignment for protein sequences:

* Between two protein sequences
* Parallelized pairwise similarity calculation with a list of protein sequences

GO semantic similarity measures:

* Between two groups of GO terms / two Entrez Gene IDs
* Parallelized pairwise similarity calculation with a list of GO terms / Entrez Gene IDs

### Miscellaneous tools and datasets

* Retrieve protein sequences from UniProt
* Read protein sequences in FASTA format
* Read protein sequences in PDB format
* Sanity check of the amino acid types appearing in the protein sequences
* Protein sequence segmentation
* Auto cross covariance (ACC) for generating scales-based descriptors of the same length
* 20+ pre-computed 2D and 3D descriptor sets for the 20 amino acids to use with the scales-based descriptors
* BLOSUM and PAM matrices for the 20 amino acids
* Meta information of the 20 amino acids

## Web Server

ProtrWeb, the web server built on protr, is located at: [http://cbdd.csu.edu.cn:8080/protrweb/](http://cbdd.csu.edu.cn:8080/protrweb/)

ProtrWeb does not require any knowledge of R programming. It is a user-friendly, one-click online platform for computing the descriptors provided by the protr package.

## Links

* CRAN page: http://cran.r-project.org/web/packages/protr/
* Track development: https://github.com/road2stat/protr/
* Bug report: https://github.com/road2stat/protr/issues/

## Authors

* Nan Xiao <[email protected]>
* Qing-Song Xu <[email protected]>
* Dong-Sheng Cao <[email protected]>

## Publication

* (to appear)
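
To make the amino acid composition and dipeptide composition descriptors listed under "Commonly used descriptors" more concrete, here is a minimal sketch of how such descriptors are conventionally defined: the fraction of each of the 20 standard amino acids, and the fraction of each of the 400 ordered residue pairs. It is written in plain Python so that it is self-contained. It is not protr's R API, and the function names `amino_acid_composition` and `dipeptide_composition` are hypothetical, chosen for illustration only.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _validate(seq, min_length):
    """Upper-case the sequence and reject anything outside the 20 standard types."""
    seq = seq.upper()
    if len(seq) < min_length:
        raise ValueError(f"sequence must have at least {min_length} residue(s)")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard amino acid types: {sorted(unknown)}")
    return seq


def amino_acid_composition(seq):
    """Return a dict of 20 values: the fraction of each amino acid in seq."""
    seq = _validate(seq, 1)
    n = len(seq)
    return {aa: seq.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return a dict of 400 values: the fraction of each adjacent residue pair."""
    seq = _validate(seq, 2)
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return {pair: c / total for pair, c in counts.items()}


if __name__ == "__main__":
    aac = amino_acid_composition("ACCA")
    dc = dipeptide_composition("ACCA")
    print(aac["A"], aac["C"])            # fractions of A and C
    print(dc["AC"], dc["CC"], dc["CA"])  # fractions of the three observed pairs
```

Both functions return fixed-length vectors (20 and 400 entries) regardless of sequence length, which is what lets sequences of different lengths be compared or fed to a statistical model. The validation step plays the role of the amino acid type sanity check mentioned under "Miscellaneous tools and datasets".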