<!DOCTYPE html>
<html>
<head>
<title>spectral-counts</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="../styles.css">
<script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script type="text/javascript">
MathJax.Hub.Config({jax: ['input/TeX','output/HTML-CSS'], displayAlign: 'left'});
</script>
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-26136956-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<script type="text/javascript">
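// Main Menu
$( document ).ready(function() {
// Declare all three with var so they do not leak into the global scope.
var pull = $('.btn');
var menu = $('nav ul');
var menuHeight = menu.height();
// Toggle the navigation list when the menu button is clicked.
pull.on('click', function(e) {
e.preventDefault();
menu.slideToggle();
});
// On wider windows, drop the inline style left by slideToggle so the menu shows again.
$(window).resize(function(){
var w = $(window).width();
if(w > 320 && menu.is(':hidden')) {
menu.removeAttr('style');
}
});
});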
</script>
</head>
<body>
<div class="page-wrap">
<nav>
<div class="btn">
</div>
<img src="../images/crux-logo.png" id="logo">
<ul id="navitems">
<li><a href="../index.html">Home</a></li>
<li><a href="../download.html">Download</a></li>
<li><a href="../fileformats.html">File Formats</a></li>
<li><a href="http://groups.google.com/group/crux-users">Contact</a></li> <!--Link to google support board-->
</ul>
</nav>
<div id="content" class="autogenerated">
<!-- START CONTENT -->
<h1>spectral-counts</h1>
<h2>Usage:</h2>
<p><code>crux spectral-counts [options] &lt;input PSMs&gt;</code></p>
<h2>Description:</h2>
<p>Given a collection of scored PSMs, produce a list of proteins or peptides ranked by a quantification score. Spectral-counts supports four types of quantification: Normalized Spectral Abundance Factor (NSAF), Distributed Normalized Spectral Abundance Factor (dNSAF), Normalized Spectral Index (SI<sub>N</sub>) and Exponentially Modified Protein Abundance Index (emPAI). The NSAF method is from <a href="http://www.ncbi.nlm.nih.gov/pubmed/17138671">Paoletti et al. (2006)</a>. The SI<sub>N</sub> method is from <a href="http://www.nature.com/nbt/journal/v28/n1/abs/nbt.1592.html">Griffin et al. (2010)</a>. The emPAI method was first described in <a href="http://www.mcponline.org/content/4/9/1265">Ishihama et al. (2005)</a>. The quantification methods are defined below and in the following paper:</p>
<blockquote>S McIlwain, M Mathews, M Bereman, EW Rubel, MJ MacCoss, and WS Noble. <a href="http://www.biomedcentral.com/1471-2105/13/308/abstract">"Estimating relative abundances of proteins from shotgun proteomics data."</a> <em>BMC Bioinformatics</em>. 13:308, 2012.</blockquote>
<h3>Protein Quantification</h3>
<ol>
<li>For each protein in a given database, the NSAF score is:<br>$$NSAF_N=\frac{S_N/L_N}{\sum_{i=1}^{n}S_i/L_i}$$<br>where:<ul><li>N is the protein index</li><li>S<sub>N</sub> is the number of peptide spectra matched to protein N</li><li>L<sub>N</sub> is the length of protein N</li><li>n is the total number of proteins in the input database</li></ul></li>
<li>For each protein in a given database, the dNSAF score is:<br>$$dNSAF_N=\frac{\frac{uSpc_N+(d)sSpc_N}{uL_N+sL_N}}{\sum_{i=1}^{n}\frac{uSpc_i+(d)sSpc_i}{uL_i+sL_i}}$$<br>where:<ul><li>N is the protein index</li><li>uSpc<sub>N</sub> is the number of spectra matched uniquely to protein N</li><li>sSpc<sub>N</sub> is the number of shared peptide spectra matched to protein N</li><li>L<sub>N</sub> is the length of protein N, which appears in the formula as uL<sub>N</sub>+sL<sub>N</sub></li><li>n is the total number of proteins in the input database</li><li>d is the distribution factor of peptide K to protein N, given by<br>$$d=\frac{uSpc_N}{\sum_{i=1}^{n}uSpc_i}$$</li></ul></li>
<li>For each protein in a given database, the SI<sub>N</sub> score is:<br>$$SI_N=\frac{\sum_{j=1}^{p_N}(\sum_{k=1}^{s_j}i_k)}{L_N(\sum_{j=1}^{n}SI_j)}$$<br>where:<ul><li>N is the protein index</li><li>p<sub>N</sub> is the number of unique peptides in protein N</li><li>s<sub>j</sub> is the number of spectra assigned to peptide j</li><li>i<sub>k</sub> is the total fragment ion intensity of spectrum k</li><li>L<sub>N</sub> is the length of protein N</li><li>n is the total number of proteins in the input database</li><li>SI<sub>j</sub>, in the denominator, is the unnormalized spectral index of protein j, that is, the numerator above computed for protein j</li></ul></li>
<li>For each protein in a given database, the emPAI score is:<br>$$emPAI=10^{\frac{N_{observed}}{N_{observable}}}-1$$<br>where:<ul><li>N<sub>observed</sub> is the number of experimentally observed peptides with scores above a specified threshold.</li><li>N<sub>observable</sub> is the calculated number of observable peptides for the protein given the search constraints.</li></ul></li>
</ol>
<h3>Peptide Quantification</h3>
<ol>
<li>For each peptide in a given database, the NSAF score is:<br>$$NSAF_N=\frac{S_N/L_N}{\sum_{i=1}^{n}S_i/L_i}$$<br>where:<ul><li>N is the peptide index</li><li>S<sub>N</sub> is the number of spectra matched to peptide N</li><li>L<sub>N</sub> is the length of peptide N</li><li>n is the total number of peptides in the input database</li></ul></li>
<li>For each peptide in a given database, the SI<sub>N</sub> score is:<br>$$SI_N=\frac{(\sum_{k=1}^{S_N}i_k)}{L_N(\sum_{j=1}^{n}SI_j)}$$<br>where:<ul><li>N is the peptide index</li><li>S<sub>N</sub> is the number of spectra assigned to peptide N</li><li>i<sub>k</sub> is the total fragment ion intensity of spectrum k</li><li>L<sub>N</sub> is the length of peptide N</li><li>n is the total number of peptides in the input database</li><li>SI<sub>j</sub>, in the denominator, is the unnormalized spectral index of peptide j, that is, the numerator above computed for peptide j</li></ul></li>
</ol>
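<p>As a rough illustration of the protein-level formulas above, the following Python sketch evaluates NSAF, SI<sub>N</sub> and emPAI on a tiny made-up data set. It is not the Crux implementation; the function names, protein names and all numbers are assumptions chosen only to make the arithmetic easy to follow.</p>

```python
# Toy illustration of the protein-level NSAF, SI_N and emPAI formulas.
# Protein names and all numbers are made-up example data, not Crux output.


def nsaf(counts, lengths):
    """NSAF_N = (S_N / L_N) / sum_i (S_i / L_i)."""
    saf = {name: counts[name] / lengths[name] for name in counts}
    total = sum(saf.values())
    return {name: value / total for name, value in saf.items()}


def normalized_spectral_index(intensities, lengths):
    """SI_N = (sum over peptides and spectra of i_k) / (L_N * sum_j SI_j).

    intensities[name] holds one list per peptide; each inner list holds the
    total fragment ion intensity of every spectrum assigned to that peptide.
    """
    si = {
        name: sum(sum(spectra) for spectra in peptides)
        for name, peptides in intensities.items()
    }
    total = sum(si.values())
    return {name: si[name] / (lengths[name] * total) for name in si}


def empai(n_observed, n_observable):
    """emPAI = 10 ** (N_observed / N_observable) - 1."""
    return 10 ** (n_observed / n_observable) - 1


# Two hypothetical proteins of equal length.
lengths = {"protA": 100, "protB": 100}
counts = {"protA": 10, "protB": 5}
intensities = {
    "protA": [[100.0, 200.0], [300.0]],  # two peptides, three spectra
    "protB": [[400.0]],                  # one peptide, one spectrum
}

nsaf_scores = nsaf(counts, lengths)
sin_scores = normalized_spectral_index(intensities, lengths)
empai_score = empai(5, 10)
```

<p>Because NSAF divides by the sum over all proteins, the NSAF scores in this sketch add up to 1, whereas the SI<sub>N</sub> scores additionally carry the 1/L<sub>N</sub> factor and emPAI depends only on the protein's own peptide counts.</p>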
<h2>Input:</h2>
<ul>
<li><code>input PSMs</code> – A PSM file in either tab-delimited text format (as produced by percolator, q-ranker, or barista) or pepXML format.</li>
</ul>
<h2>Output:</h2>
<p>The program writes files to the folder <code>crux-output</code> by default. The name of the output folder can be set by the user using the <code>--output-dir</code> option. The following files will be created:
<ul>
<li><code>spectral-counts.target.txt</code> – a tab-delimited text file containing the protein IDs and their corresponding scores, in sorted order.</li>
<li><code>spectral-counts.params.txt</code> – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other Crux programs.</li>
<li><code>spectral-counts.log.txt</code> – All messages written to standard error.</li>
</ul>
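<p>The tab-delimited output can be loaded with a few lines of Python. The sketch below is only an illustration: it assumes the file starts with a header line, and the column names and values in the stand-in text are hypothetical rather than the documented layout of <code>spectral-counts.target.txt</code>.</p>

```python
import csv
import io


def read_tab_delimited(handle):
    """Return (header, rows) from a tab-delimited text stream.

    Assumes the first non-empty line is a header; blank lines are skipped.
    """
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Stand-in for the contents of crux-output/spectral-counts.target.txt.
# The column names and scores here are hypothetical example data.
example = io.StringIO("protein id\tNSAF\nprotA\t0.6\nprotB\t0.4\n")
header, rows = read_tab_delimited(example)

# Map the first column (identifier) to the second column (score).
scores = {row[0]: float(row[1]) for row in rows}
```

<p>To read a real file, pass an open file handle instead of the <code>io.StringIO</code> stand-in, for example <code>open("crux-output/spectral-counts.target.txt", newline="")</code>.</p>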
<h2>Options:</h2>
<ul style="list-style-type: none;">
<li class="nobullet">
<h3>spectral-counts options</h3>
<ul>
<li class="nobullet"><code>--parsimony none|simple|greedy</code> – Perform a parsimony analysis on the proteins, and report a "parsimony rank" column in the output file. This column contains integers indicating the protein's rank in a list sorted by spectral counts. If the parsimony analysis results in two proteins being merged, then their parsimony rank is the same. In such a case, the rank is assigned based on the largest spectral count of any protein in the merged meta-protein. The "simple" parsimony algorithm only merges two proteins A and B if the peptides identified in protein A are the same as or a subset of the peptides identified in protein B. The "greedy" parsimony algorithm does additional merging, using the peptide q-values to greedily assign each peptide to a single protein. Default = <code>none</code>.</li>
<li class="nobullet"><code>--threshold &lt;float&gt;</code> – Only consider PSMs that pass this threshold value. By default, the threshold is applied to q-values. This behavior can be changed using the --threshold-type, --custom-threshold-name and --custom-threshold-min parameters. Default = <code>0.01</code>.</li>
<li class="nobullet"><code>--threshold-type none|qvalue|custom</code> – Determines what type of threshold to use when filtering matches. none: read all matches; qvalue: use the q-value calculated by percolator or q-ranker; custom: use the --custom-threshold-name and --custom-threshold-min parameters. Default = <code>qvalue</code>.</li>
<li class="nobullet"><code>--input-ms2 &lt;string&gt;</code> – MS2 file corresponding to the PSM file. Required to compute the SIN measure. Ignored for NSAF, dNSAF and EMPAI. Default = <code>&lt;empty&gt;</code>.</li>
<li class="nobullet"><code>--unique-mapping T|F</code> – Ignore peptides that map to multiple proteins. Default = <code>false</code>.</li>
<li class="nobullet"><code>--quant-level protein|peptide</code> – Quantification at protein or peptide level. Default = <code>protein</code>.</li>
<li class="nobullet"><code>--measure RAW|NSAF|dNSAF|SIN|EMPAI</code> – Type of analysis to perform on the match results. With the exception of the RAW metric, the database of sequences needs to be provided using --protein-database. Default = <code>NSAF</code>.</li>
<li class="nobullet"><code>--custom-threshold-name &lt;string&gt;</code> – Specify which field to apply the threshold to. The direction of the threshold (&lt;= or &gt;=) is governed by --custom-threshold-min. By default, the threshold applies to the q-value, specified by "percolator q-value", "q-ranker q-value", "decoy q-value (xcorr)", or "barista q-value". Default = <code>&lt;empty&gt;</code>.</li>
<li class="nobullet"><code>--custom-threshold-min T|F</code> – When selecting matches with a custom threshold, custom-threshold-min determines whether to select matches whose custom-threshold-name values are greater than or equal to (F) or less than or equal to (T) the threshold. Default = <code>true</code>.</li>
<li class="nobullet"><code>--mzid-use-pass-threshold T|F</code> – Use mzid's passThreshold attribute to filter matches. Default = <code>false</code>.</li>
<li class="nobullet"><code>--protein-database &lt;string&gt;</code> – The name of the protein database file, in FASTA format. Default = <code>&lt;empty&gt;</code>.</li>
</ul>
</li>
<li class="nobullet">
<h3>Input and output</h3>
<ul>
<li class="nobullet"><code>--verbosity &lt;integer&gt;</code> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = <code>30</code>.</li>
<li class="nobullet"><code>--parameter-file &lt;string&gt;</code> – A file containing parameters. See the <a href="../file-formats/parameter-file.html">parameter documentation</a> page for details. Default = <code>&lt;empty&gt;</code>.</li>
<li class="nobullet"><code>--spectrum-parser pwiz|mstoolkit</code> – Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser can read the MS/MS file formats listed <a href="http://proteowizard.sourceforge.net/formats.shtml">here</a>. The alternative is the <a href="../mstoolkit.html">MSToolkit parser</a>. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = <code>pwiz</code>.</li>
<li class="nobullet"><code>--fileroot &lt;string&gt;</code> – The fileroot string will be added as a prefix to all output file names. Default = <code>&lt;empty&gt;</code>.</li>
<li class="nobullet"><code>--output-dir &lt;string&gt;</code> – The name of the directory where output files will be created. Default = <code>crux-output</code>.</li>
<li class="nobullet"><code>--overwrite T|F</code> – Replace existing files if true or fail when trying to overwrite a file if false. Default = <code>false</code>.</li>
</ul>
</li>
</ul>
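<p>The interaction of <code>--threshold</code>, <code>--threshold-type</code>, <code>--custom-threshold-name</code> and <code>--custom-threshold-min</code> can be summarized with a small Python sketch. This is an interpretation of the option descriptions above, not code from Crux: the function name, the dictionary representation of a match and the "xcorr score" field are hypothetical, and it assumes that q-value thresholding keeps matches whose q-value is at most the threshold.</p>

```python
def passes_threshold(match, threshold=0.01, threshold_type="qvalue",
                     custom_name=None, custom_min=True,
                     qvalue_field="percolator q-value"):
    """Decide whether one match (a dict of field name to value) is kept.

    Hypothetical helper mirroring the documented options:
      none   -> every match is read
      qvalue -> keep matches with q-value <= threshold (assumed direction)
      custom -> compare the custom_name field; custom_min=True keeps values
                <= threshold, custom_min=False keeps values >= threshold
    """
    if threshold_type == "none":
        return True
    if threshold_type == "qvalue":
        return match[qvalue_field] <= threshold
    if threshold_type == "custom":
        value = match[custom_name]
        return value <= threshold if custom_min else value >= threshold
    raise ValueError("unknown threshold type: " + threshold_type)


# Made-up example match; the field names are illustrative only.
match = {"percolator q-value": 0.005, "xcorr score": 2.5}

kept_by_default = passes_threshold(match)
kept_by_strict_q = passes_threshold(match, threshold=0.001)
kept_by_custom = passes_threshold(match, threshold=2.0,
                                  threshold_type="custom",
                                  custom_name="xcorr score",
                                  custom_min=False)
```

<p>In this sketch the example match passes the default q-value threshold of 0.01, fails a stricter threshold of 0.001, and passes a custom threshold that keeps "xcorr score" values of at least 2.0.</p>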
<!-- END CONTENT -->
</div>
</div>
<footer class="site-footer">
<div id="centerfooter">
<div class="footerimportantlinks">
<img src="../images/linkicon.png" style="width:16px; height:16px"><h3>Important links</h3>
<ul>
<li><a href="../faq.html">Crux <strong>FAQ</strong></a></li>
<li><a href="../glossary.html">Glossary of terminology</a></li>
<li><a href="http://scholar.google.com/citations?hl=en&user=Rw9S1HIAAAAJ">Google Scholar profile</a></li>
<li><a href="https://sourceforge.net/projects/cruxtoolkit/">SourceForge issues list</a></li>
<li><a href="../release-notes.html">Release Notes</a></li>
<li><a href="https://mailman1.u.washington.edu/mailman/listinfo/crux-users" title="Receive announcements of new versions">Join the mailing list</a></li>
<li><a href="http://www.apache.org/licenses/LICENSE-2.0">Apache license</a></li>
<li><a href="http://groups.google.com/group/crux-users">Support Board</a></li>
</ul>
</div>
<div class="footerimportantlinks tutoriallinks">
<img src="../images/tutorialicon.png" style="height:16px"><h3>Tutorials</h3>
<ul>
<li><a href="../tutorials/install.html">Installation</a></li>
<li><a href="../tutorials/gettingstarted.html">Getting started with Crux</a></li>
<li><a href="../tutorials/search.html">Running a simple search using Tide and Percolator</a></li>
<li><a href="../tutorials/customizedsearch.html">Customization and search options</a></li>
<li><a href="../tutorials/spectralcounts.html">Using spectral-counts</a></li>
</ul>
</div>
<div id="footertext">
<p>
The original version of Crux was written by Chris Park and Aaron Klammer
under the supervision
of <a href="http://www.gs.washington.edu/faculty/maccoss.htm">Prof. Michael
MacCoss</a>
and <a href="http://noble.gs.washington.edu/~noble">Prof. William
Stafford Noble</a> in the Department of Genome Sciences at the
University of Washington, Seattle. Website by <a href="http://www.yuvalboss.com/">Yuval Boss</a>
<br />The complete list of contributors
can be found <a href="../contributors.html">here</a>.
<br />
<br />
Maintenance and development of Crux is funded by the <a href="https://www.nih.gov/">National Institutes of Health</a> awards R01 GM096306 and P41 GM103533.
</p>
</div>
</div>
</footer>
</body>
</html>