# protr

A comprehensive toolkit for generating various numerical representation schemes of protein sequences. The descriptors included in the protr package are widely used in bioinformatics and chemogenomics research.

## Package Description

### Commonly used descriptors

  * Amino acid composition
  
    * Amino acid composition
    * Dipeptide composition
    * Tripeptide composition

  * Autocorrelation
  
    * Normalized Moreau-Broto autocorrelation
    * Moran autocorrelation
    * Geary autocorrelation

  * CTD
  
    * Composition
    * Transition
    * Distribution

  * Conjoint Triad

  * Quasi-sequence-order descriptors
  
    * Sequence-order-coupling number
    * Quasi-sequence-order descriptors
  
  * Pseudo amino acid composition
  
    * Pseudo amino acid composition
    * Amphiphilic pseudo amino acid composition

  * Profile-based descriptors

    * Profile-based descriptors derived from PSSM (Position-Specific Scoring Matrix)
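
To make the first two composition descriptors in the list above concrete, here is a minimal base-R sketch of what they measure: the relative frequency of each of the 20 standard amino acids, and of each of the 400 possible adjacent residue pairs. The helper names `aa_composition` and `dipeptide_composition` and the toy sequence are assumptions made for illustration. They are not protr functions, and the package's own implementation may differ in details such as naming and ordering.

```r
# Base-R sketch of amino acid and dipeptide composition.
# `aa_composition` and `dipeptide_composition` are hypothetical helpers
# written for illustration; they are not part of protr.

# The 20 standard amino acids, as one-letter codes
AA <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]

# Fraction of the sequence made up by each amino acid (20 values)
aa_composition <- function(seq) {
  residues <- strsplit(seq, "")[[1]]
  counts <- table(factor(residues, levels = AA))
  setNames(as.numeric(counts) / length(residues), AA)
}

# Fraction of adjacent residue pairs matching each dipeptide (400 values).
# Assumes the sequence has at least two residues.
dipeptide_composition <- function(seq) {
  n <- nchar(seq)
  # All overlapping pairs of neighbouring residues
  pairs <- substring(seq, 1:(n - 1), 2:n)
  dipeptides <- as.vector(outer(AA, AA, paste0))
  counts <- table(factor(pairs, levels = dipeptides))
  setNames(as.numeric(counts) / (n - 1), dipeptides)
}

toy <- "MKVLAAGIVALK"  # made-up sequence, not real data
aac <- aa_composition(toy)
dc <- dipeptide_composition(toy)
```

Both vectors have a fixed length regardless of the input sequence length, which is what makes descriptors of this kind usable as feature vectors for statistical models.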

### Proteochemometric (PCM) modeling descriptors

  * Scales-based descriptors derived by principal components analysis

    * Scales-based descriptors derived from amino acid properties (AAindex)
    * Scales-based descriptors derived from 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.)

  * Scales-based descriptors derived by factor analysis

  * Scales-based descriptors derived by multidimensional scaling
  
  * BLOSUM and PAM matrix-derived descriptors

### Similarity computation

Local and global pairwise sequence alignment for protein sequences:

  * Between two protein sequences
  * Parallelized pairwise similarity calculation over a list of protein sequences

GO semantic similarity measures:

  * Between two groups of GO terms / two Entrez Gene IDs
  * Parallelized pairwise similarity calculation over a list of GO terms / Entrez Gene IDs

### Miscellaneous tools and datasets

  * Retrieve protein sequences from UniProt
  
  * Read protein sequences in FASTA format

  * Read protein sequences in PDB format
  
  * Sanity check of the amino acid types that appear in the protein sequences
  
  * Protein sequence segmentation

  * Auto cross covariance (ACC) for generating scales-based descriptors of the same length

  * 20+ pre-computed 2D and 3D descriptor sets for the 20 amino acids to use with the scales-based descriptors

  * BLOSUM and PAM matrices for the 20 amino acids

  * Meta information of the 20 amino acids
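
The sanity check listed above matters because most descriptors are only defined for the 20 standard amino acids. The following base-R sketch shows the idea under that assumption; `has_only_standard_aa` is a hypothetical helper written for illustration, not a protr function, and the sequences are made up.

```r
# Base-R sketch of an amino acid type sanity check.
# `has_only_standard_aa` is a hypothetical helper, not part of protr.

# The 20 standard amino acids, as one-letter codes
AA <- strsplit("ARNDCQEGHILKMFPSTWYV", "")[[1]]

# TRUE if every residue in the sequence is a standard amino acid
has_only_standard_aa <- function(seq) {
  all(strsplit(seq, "")[[1]] %in% AA)
}

# Made-up sequences; the second contains the non-standard codes X and B
seqs <- c("MKVLAAGIVALK", "MKVXAAB", "ACDEFGHIK")

# Keep only the sequences that pass the check
clean <- seqs[vapply(seqs, has_only_standard_aa, logical(1))]
```

Filtering like this before computing descriptors avoids errors or silently skewed values caused by ambiguous or non-standard residue codes.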

## Web Server

ProtrWeb, the web server built on protr, is located at:

[http://cbdd.csu.edu.cn:8080/protrweb/](http://cbdd.csu.edu.cn:8080/protrweb/)

ProtrWeb requires no knowledge of R programming. It is a user-friendly, one-click online platform for computing the descriptors available in the protr package.

## Links

  * CRAN page: http://cran.r-project.org/web/packages/protr/

  * Track development: https://github.com/road2stat/protr/

  * Bug report: https://github.com/road2stat/protr/issues/

## Authors

  * Nan Xiao

  * Qing-Song Xu

  * Dong-Sheng Cao

## Publication

  * (to appear)
