<!DOCTYPE html>
<html>
<head>
<title>generate-peptides</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="../styles.css">
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script type="text/javascript">
MathJax.Hub.Config({jax: ['input/TeX','output/HTML-CSS'], displayAlign: 'left'});
</script>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-26136956-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<script type="text/javascript">
// Main Menu
$( document ).ready(function() {
var pull = $('.btn');
var menu = $('nav ul');
var menuHeight = menu.height();
$(pull).on('click', function(e) {
e.preventDefault();
menu.slideToggle();
});
$(window).resize(function(){
var w = $(window).width();
if(w > 320 && menu.is(':hidden')) {
menu.removeAttr('style');
}
});
});
</script>
</head>
<body>
<div class="page-wrap">
<nav>
<div class="btn">
</div>
<img src="../images/crux-logo.png" id="logo">
<ul id="navitems">
<li><a href="../index.html">Home</a></li>
<li><a href="../download.html">Download</a></li>
<li><a href="../fileformats.html">File Formats</a></li>
<li><a href="http://groups.google.com/group/crux-users">Contact</a></li> <!--Link to google support board-->
</ul>
</nav>
<div id="content" class="autogenerated">
<!-- START CONTENT -->
<h1>generate-peptides</h1>
<h2>Usage:</h2>
<p><code>crux generate-peptides [options] &lt;protein fasta file&gt;</code></p>
<h2>Description:</h2>
<p>This command takes as input a protein FASTA file and outputs the corresponding list of peptides, as well as a matched list of decoy peptides and decoy proteins. Decoys are generated by either reversing or shuffling the non-terminal amino acids of each peptide. The program reshuffles a peptide several times in an attempt to avoid any overlap between the target and decoy peptides. For homopolymers this is not possible, so these target/decoy overlaps are recorded in the log file.</p><p>The program considers only the standard set of 20 amino acids. Peptides containing alphanumeric characters that do not denote one of these amino acids (BJOUXZ) are skipped. Non-alphanumeric characters are ignored completely.</p>
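<p>The shuffling behaviour described above can be sketched as follows. This is a minimal illustration, not the Crux implementation: the function name <code>shuffle_decoy</code> and its return value are hypothetical, the retry limit of 5 is borrowed from the <code>--decoy-format</code> description, and only overlap with the target set is checked.</p>

```python
import random

# Retry limit borrowed from the --decoy-format description; using it here is an assumption.
MAX_SHUFFLE_ATTEMPTS = 5


def shuffle_decoy(peptide, targets, rng, max_attempts=MAX_SHUFFLE_ATTEMPTS):
    """Shuffle the non-terminal residues of `peptide`, retrying on overlap.

    Returns (decoy, overlaps). `overlaps` is True when every attempt still
    produced a sequence present in `targets`, as always happens for a
    homopolymer.
    """
    interior = list(peptide[1:-1])
    if not interior:
        # Nothing to shuffle: the decoy is identical to the target.
        return peptide, peptide in targets
    decoy = peptide
    for _ in range(max_attempts):
        rng.shuffle(interior)
        decoy = peptide[0] + "".join(interior) + peptide[-1]
        if decoy not in targets:
            return decoy, False
    return decoy, True


rng = random.Random(1)  # fixed seed, mirroring the default --seed value of 1
targets = {"EAMPK", "AAAAAA"}
print(shuffle_decoy("EAMPK", targets, rng))
print(shuffle_decoy("AAAAAA", targets, rng))  # homopolymer: overlap is unavoidable
```

<p>The terminal residues stay in place in every attempt, so the decoy always has the same composition and termini as its target.</p>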
<h2>Input:</h2>
<ul>
<li><code>protein fasta file</code> – The name of the file in FASTA format from which to retrieve proteins.</li>
</ul>
<h2>Output:</h2>
<p>The program writes files to the folder <code>crux-output</code> by default. The output folder can be changed with the <code>--output-dir</code> option. The following files will be created:</p>
<ul>
<li><code>generate-peptides.target.txt</code> – A text file containing the target peptides, one per line. Each line has three tab-delimited columns, containing the peptide sequence, the m+h mass of the unmodified peptide, and a comma-delimited list of protein IDs in which the peptide occurs.</li>
<li><code>generate-peptides.decoy.txt</code> – A text file containing the decoy peptides, one per line. Each line has three tab-delimited columns, containing the peptide sequence, the m+h mass of the unmodified peptide, and a comma-delimited list of protein IDs in which the peptide occurs. There is a one-to-one correspondence between targets and decoys.</li>
<li><code>generate-peptides.proteins.decoy.txt</code> – A FASTA format file containing decoy proteins, in which all of the peptides have been replaced with their shuffled or reversed counterparts. Note that this file will only be created if the enzyme specificity is "full-digest" and no missed cleavages are allowed.</li>
<li><code>generate-peptides.params.txt</code> – A file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the <code>--parameter-file</code> option for other Crux programs.</li>
<li><code>generate-peptides.log.txt</code> – A log file containing a copy of all messages that were printed to the screen during execution.</li>
</ul>
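<p>As a rough sketch of how the three-column peptide files might be consumed, the Python below parses one line into a record. The names <code>PeptideRecord</code>, <code>parse_peptide_line</code> and <code>read_peptides</code> are hypothetical, the mass in the example is a placeholder rather than a real Crux value, and the code assumes the file has no header line.</p>

```python
from typing import List, NamedTuple


class PeptideRecord(NamedTuple):
    sequence: str
    mass: float  # m+h mass of the unmodified peptide
    protein_ids: List[str]


def parse_peptide_line(line):
    """Split one tab-delimited line into sequence, mass and protein IDs."""
    sequence, mass, proteins = line.rstrip("\n").split("\t")
    return PeptideRecord(sequence, float(mass), proteins.split(","))


def read_peptides(path):
    """Read every non-blank line of a target or decoy peptide file."""
    with open(path) as handle:
        return [parse_peptide_line(line) for line in handle if line.strip()]


# Placeholder values for illustration only.
record = parse_peptide_line("EAMPK\t1000.5\tprotA,protB\n")
print(record.sequence, record.mass, record.protein_ids)
```

<p>Because targets and decoys correspond one-to-one, reading both files with <code>read_peptides</code> and pairing the records by position should line each target up with its decoy, assuming the two files list peptides in the same order.</p>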
<h2>Options:</h2>
<ul style="list-style-type: none;">
<li class="nobullet">
<h3>Peptide properties</h3>
<ul>
<li class="nobullet"><code>--min-mass &lt;float&gt;</code> – The minimum mass (in Da) of peptides to consider. Default = <code>200</code>.</li>
<li class="nobullet"><code>--max-mass &lt;float&gt;</code> – The maximum mass (in Da) of peptides to consider. Default = <code>7200</code>.</li>
<li class="nobullet"><code>--min-length &lt;integer&gt;</code> – The minimum length of peptides to consider. Default = <code>6</code>.</li>
<li class="nobullet"><code>--max-length &lt;integer&gt;</code> – The maximum length of peptides to consider. Default = <code>50</code>.</li>
<li class="nobullet"><code>--isotopic-mass average|mono</code> – Specify the type of isotopic masses to use when calculating the peptide mass. Default = <code>mono</code>.</li>
<li class="nobullet"><code>--clip-nterm-methionine T|F</code> – When set to T, for each protein that begins with methionine, the output will contain two copies of the leading peptide, with and without the N-terminal methionine. Default = <code>false</code>.</li>
</ul>
</li>
<li class="nobullet">
<h3>Decoy database generation</h3>
<ul>
<li class="nobullet"><code>--seed &lt;string&gt;</code> – When given an unsigned integer value, seeds the random number generator with that value. When given the string "time", seeds the random number generator with the system time. Default = <code>1</code>.</li>
<li class="nobullet"><code>--decoy-format none|shuffle|peptide-reverse|protein-reverse</code> – Include a decoy version of every peptide by shuffling or reversing the target sequence or protein. In shuffle or peptide-reverse mode, each peptide is shuffled or reversed, respectively, leaving the N-terminal and C-terminal amino acids in place. Note that peptides that appear multiple times in the target database are only shuffled once. In peptide-reverse mode, palindromic peptides are shuffled. Also, if a shuffled peptide produces an overlap with the target or decoy database, then the peptide is re-shuffled up to 5 times. Note that, despite this repeated shuffling, homopolymers will appear in both the target and decoy database. The protein-reverse mode reverses the entire protein sequence, irrespective of its constituent peptides. Default = <code>shuffle</code>.</li>
<li class="nobullet"><code>--keep-terminal-aminos N|C|NC|none</code> – When creating decoy peptides using decoy-format=shuffle or decoy-format=peptide-reverse, this option specifies whether the N-terminal and C-terminal amino acids are kept in place or allowed to be shuffled or reversed. For a target peptide "EAMPK" with decoy-format=peptide-reverse, setting keep-terminal-aminos to "NC" will yield "EPMAK"; setting it to "C" will yield "PMAEK"; setting it to "N" will yield "EKPMA"; and setting it to "none" will yield "KPMAE". Default = <code>NC</code>.</li>
</ul>
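<p>The "EAMPK" examples given for <code>--keep-terminal-aminos</code> can be reproduced with a short sketch. The function <code>reverse_decoy</code> is hypothetical and covers only plain reversal; it does not implement the shuffling of palindromic peptides mentioned under <code>--decoy-format</code>.</p>

```python
def reverse_decoy(peptide, keep_terminal_aminos="NC"):
    """Reverse a peptide, optionally holding its terminal residues in place."""
    if keep_terminal_aminos not in ("N", "C", "NC", "none"):
        raise ValueError("keep_terminal_aminos must be N, C, NC or none")
    keep_n = keep_terminal_aminos in ("N", "NC")
    keep_c = keep_terminal_aminos in ("C", "NC")
    start = 1 if keep_n else 0
    end = len(peptide) - 1 if keep_c else len(peptide)
    end = max(end, start)  # guards against very short peptides
    # Only the slice between the kept termini is reversed.
    return peptide[:start] + peptide[start:end][::-1] + peptide[end:]


for mode in ("NC", "C", "N", "none"):
    print(mode, reverse_decoy("EAMPK", mode))
```

<p>Running the loop prints EPMAK, PMAEK, EKPMA and KPMAE for the four settings, matching the values listed for the option.</p>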
</li>
<li class="nobullet">
<h3>Enzymatic digestion</h3>
<ul>
<li class="nobullet"><code>--enzyme no-enzyme|trypsin|trypsin/p|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|asp-n|lys-c|lys-n|arg-c|glu-c|pepsin-a|elastase-trypsin-chymotrypsin|custom-enzyme</code> – Specify the enzyme used to digest the proteins in silico. Available enzymes (with the corresponding digestion rules indicated in parentheses) include no-enzyme ([X]|[X]), trypsin ([RK]|{P}), trypsin/p ([RK]|[]), chymotrypsin ([FWYL]|{P}), elastase ([ALIV]|{P}), clostripain ([R]|[]), cyanogen-bromide ([M]|[]), iodosobenzoate ([W]|[]), proline-endopeptidase ([P]|[]), staph-protease ([E]|[]), asp-n ([]|[D]), lys-c ([K]|{P}), lys-n ([]|[K]), arg-c ([R]|{P}), glu-c ([DE]|{P}), pepsin-a ([FL]|{P}), elastase-trypsin-chymotrypsin ([ALIVKRWFY]|{P}). Specifying --enzyme no-enzyme yields a non-enzymatic digest. <strong>Warning:</strong> the resulting index may be quite large. Default = <code>trypsin</code>.</li>
<li class="nobullet"><code>--custom-enzyme &lt;string&gt;</code> – Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, and separated by a |. The first list contains residues required/prohibited before the cleavage site, and the second list contains residues required/prohibited after the cleavage site. If the residues are required for digestion, they are enclosed in square brackets, '[' and ']'. If the residues prevent digestion, they are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D]. Default = <code>&lt;empty&gt;</code>.</li>
<li class="nobullet"><code>--digestion full-digest|partial-digest|non-specific-digest</code> – Specify whether every peptide in the database must have two enzymatic termini (full-digest) or if peptides with only one enzymatic terminus are also included (partial-digest). Default = <code>full-digest</code>.</li>
<li class="nobullet"><code>--missed-cleavages &lt;integer&gt;</code> – Maximum number of missed cleavages per peptide to allow in enzymatic digestion. Default = <code>0</code>.</li>
</ul>
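<p>To make the cleavage-rule notation concrete, here is a small sketch of a full digest driven by a rule such as [RK]|{P}. The helper names are hypothetical, an empty list such as [] is assumed to place no constraint on that side, and the mass limits and partial or non-specific digestion are not modelled.</p>

```python
def parse_side(spec):
    """Turn one side of a rule, e.g. [RK] or {P}, into a residue predicate."""
    if spec.startswith("[") and spec.endswith("]"):
        required = True
    elif spec.startswith("{") and spec.endswith("}"):
        required = False
    else:
        raise ValueError("expected [..] or {..}, got " + repr(spec))
    residues = spec[1:-1]
    if required:
        # Assumption: an empty list or X matches every residue.
        if residues == "" or "X" in residues:
            return lambda aa: True
        return lambda aa: aa in residues
    if "X" in residues:
        return lambda aa: False
    return lambda aa: aa not in residues


def cleavage_sites(protein, rule):
    """Return the positions i where the enzyme cuts between i-1 and i."""
    before, after = rule.split("|")
    ok_before, ok_after = parse_side(before), parse_side(after)
    return [i for i in range(1, len(protein))
            if ok_before(protein[i - 1]) and ok_after(protein[i])]


def digest(protein, rule="[RK]|{P}", missed_cleavages=0,
           min_length=6, max_length=50):
    """Full digest: both peptide ends are cleavage sites or protein termini."""
    bounds = [0] + cleavage_sites(protein, rule) + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 1 + missed_cleavages, len(bounds) - 1)
        for j in range(i + 1, last + 1):
            peptide = protein[bounds[i]:bounds[j]]
            if len(peptide) in range(min_length, max_length + 1):
                peptides.append(peptide)
    return peptides


protein = "MKAAAAAARPCCCCCKDDDDDD"  # made-up sequence for illustration
print(cleavage_sites(protein, "[RK]|{P}"))  # trypsin: no cut between R and P
print(digest(protein))
print(digest(protein, missed_cleavages=1))
```

<p>With the trypsin rule the R followed by P is not cut, whereas the trypsin/p rule [RK]|[] would add a site there. The length defaults mirror <code>--min-length</code> and <code>--max-length</code>.</p>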
</li>
<li class="nobullet">
<h3>Input and output</h3>
<ul>
<li class="nobullet"><code>--decoy-prefix &lt;string&gt;</code> – Specifies the prefix of the protein names that indicates a decoy. Default = <code>decoy_</code>.</li>
<li class="nobullet"><code>--overwrite T|F</code> – Replace existing files if true or fail when trying to overwrite a file if false. Default = <code>false</code>.</li>
<li class="nobullet"><code>--fileroot &lt;string&gt;</code> – The fileroot string will be added as a prefix to all output file names. Default = <code>&lt;empty&gt;</code>.</li>
<li class="nobullet"><code>--output-dir &lt;string&gt;</code> – The name of the directory where output files will be created. Default = <code>crux-output</code>.</li>
<li class="nobullet"><code>--parameter-file &lt;string&gt;</code> – A file containing parameters. See the <a href="../file-formats/parameter-file.html">parameter documentation</a> page for details. Default = <code>&lt;empty&gt;</code>.</li>
<li class="nobullet"><code>--verbosity &lt;integer&gt;</code> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = <code>30</code>.</li>
</ul>
</li>
</ul>
<!-- END CONTENT -->
</div>
</div>
<footer class="site-footer">
<div id="centerfooter">
<div class="footerimportantlinks">
<img src="../images/linkicon.png" style="width:16px; height:16px"><h3>Important links</h3>
<ul>
<li><a href="../faq.html">Crux <strong>FAQ</strong></a></li>
<li><a href="../glossary.html">Glossary of terminology</a></li>
<li><a href="http://scholar.google.com/citations?hl=en&amp;user=Rw9S1HIAAAAJ">Google Scholar profile</a></li>
<li><a href="https://sourceforge.net/projects/cruxtoolkit/">SourceForge issues list</a></li>
<li><a href="../release-notes.html">Release Notes</a></li>
<li><a href="https://mailman1.u.washington.edu/mailman/listinfo/crux-users" title="Receive announcements of new versions">Join the mailing list</a></li>
<li><a href="http://www.apache.org/licenses/LICENSE-2.0">Apache license</a></li>
<li><a href="http://groups.google.com/group/crux-users">Support Board</a></li>
</ul>
</div>
<div class="footerimportantlinks tutoriallinks">
<img src="../images/tutorialicon.png" style="height:16px"><h3>Tutorials</h3>
<ul>
<li><a href="../tutorials/install.html">Installation</a></li>
<li><a href="../tutorials/gettingstarted.html">Getting started with Crux</a></li>
<li><a href="../tutorials/search.html">Running a simple search using Tide and Percolator</a></li>
<li><a href="../tutorials/customizedsearch.html">Customization and search options</a></li>
<li><a href="../tutorials/spectralcounts.html">Using spectral-counts</a></li>
</ul>
</div>
<div id="footertext">
<p>
The original version of Crux was written by Chris Park and Aaron Klammer
under the supervision
of <a href="http://www.gs.washington.edu/faculty/maccoss.htm">Prof. Michael
MacCoss</a>
and <a href="http://noble.gs.washington.edu/~noble">Prof. William
Stafford Noble</a> in the Department of Genome Sciences at the
University of Washington, Seattle. Website by <a href="http://www.yuvalboss.com/">Yuval Boss</a>
<br />The complete list of contributors
can be found <a href="../contributors.html">here</a>.
<br />
<br />
Maintenance and development of Crux is funded by the <a href="https://www.nih.gov/">National Institutes of Health</a> awards R01 GM096306 and P41 GM103533.
</p>
</div>
</div>
</footer>
</body>
</html>