
<!DOCTYPE html>
<html>
<head>
<title>spectral-counts</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="../styles.css">
<script type="text/javascript"
  src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>
<script type="text/javascript">
  MathJax.Hub.Config({jax: ['input/TeX','output/HTML-CSS'], displayAlign: 'left'});
</script>
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-26136956-1']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<script type="text/javascript">
  // Main Menu
  $( document ).ready(function() {
      var pull 		= $('.btn'),
        menu 		= $('nav ul'),
        menuHeight	= menu.height();
      $(pull).on('click', function(e) {
        e.preventDefault();
        menu.slideToggle();
      });
      $(window).resize(function(){
          var w = $(window).width();
          if(w > 320 && menu.is(':hidden')) {
            menu.removeAttr('style');
          } 
      });
  });
</script>
</head>
<body>
  <div class="page-wrap">
    <nav>
      <div class="btn">
        </div>
        <img src="../images/crux-logo.png" id="logo" alt="Crux logo">
      <ul id="navitems">
          <li><a href="../index.html">Home</a></li>
          <li><a href="../download.html">Download</a></li>
          <li><a href="../fileformats.html">File Formats</a></li>
            <li><a href="http://groups.google.com/group/crux-users">Contact</a></li> <!--Link to google support board-->
        </ul>
    </nav>
    <div id="content" class="autogenerated">
    <!-- START CONTENT -->
<h1>spectral-counts</h1>
<h2>Usage:</h2>
<p><code>crux spectral-counts [options] &lt;input PSMs&gt;</code></p>
<h2>Description:</h2>
<p>Given a collection of scored PSMs, produce a list of proteins or peptides ranked by a quantification score. Spectral-counts supports four types of quantification: Normalized Spectral Abundance Factor (NSAF), Distributed Normalized Spectral Abundance Factor (dNSAF), Normalized Spectral Index (SI<sub>N</sub>) and Exponentially Modified Protein Abundance Index (emPAI). The NSAF method is from <a href="http://www.ncbi.nlm.nih.gov/pubmed/17138671">Paoletti et al. (2006)</a>. The SI<sub>N</sub> method is from <a href="http://www.nature.com/nbt/journal/v28/n1/abs/nbt.1592.html">Griffin et al. (2010)</a>. The emPAI method was first described in <a href="http://www.mcponline.org/content/4/9/1265">Ishihama et al. (2005)</a>. The quantification methods are defined below and in the following paper:<blockquote>S McIlwain, M Mathews, M Bereman, EW Rubel, MJ MacCoss, and WS Noble.  <a href="http://www.biomedcentral.com/1471-2105/13/308/abstract">"Estimating relative abundances of proteins from shotgun proteomics data."</a>  <em>BMC Bioinformatics</em>. 
13:308, 2012.</blockquote></p><h3>Protein Quantification</h3><ol>
<li>For each protein in a given database, the NSAF score is:<br>$$NSAF_N=\frac{S_N/L_N}{\sum_{i=1}^nS_i/L_i}$$<br>where:<ul><li>N is the protein index</li><li>S<sub>N</sub> is the number of peptide spectra matched to protein N</li><li>L<sub>N</sub> is the length of protein N</li><li>n is the total number of proteins in the input database</li></ul></li>
<li>For each protein in a given database, the dNSAF score is:<br>$$dNSAF_N=\frac{\frac{uSpc_N+d\,sSpc_N}{uL_N+sL_N}}{\sum_{i=1}^n\frac{uSpc_i+d\,sSpc_i}{uL_i+sL_i}}$$<br>where:<ul><li>N is the protein index</li><li>uSpc<sub>N</sub> is the number of spectra matched to peptides unique to protein N</li><li>sSpc<sub>N</sub> is the number of spectra matched to peptides that protein N shares with other proteins</li><li>uL<sub>N</sub> + sL<sub>N</sub> is the length of protein N</li><li>n is the total number of proteins in the input database</li><li>d is the distribution factor that apportions the spectra of a shared peptide to protein N, given by<br>$$d=\frac{uSpc_N}{\sum_{i}uSpc_i}$$<br>where the sum runs over the proteins i that contain the shared peptide</li></ul></li>
<li>For each protein in a given database, the SI<sub>N</sub> score is:<br>$$SI_N=\frac{\sum_{j=1}^{p_N}(\sum_{k=1}^{s_j}i_k)}{L_N(\sum_{j=1}^nSI_j)}$$<br>where:<ul><li>N is the protein index</li><li>p<sub>N</sub> is the number of unique peptides identified in protein N</li><li>s<sub>j</sub> is the number of spectra assigned to peptide j</li><li>i<sub>k</sub> is the total fragment ion intensity of spectrum k</li><li>L<sub>N</sub> is the length of protein N</li><li>n is the total number of proteins in the input database</li><li>SI<sub>j</sub> in the denominator is the summed fragment ion intensity of protein j, that is, the numerator of the formula evaluated for protein j</li></ul></li>
<li>For each protein in a given database, the emPAI score is:<br>$$emPAI=10^{\frac{N_{observed}}{N_{observable}}}-1$$<br>where:<ul><li>N<sub>observed</sub> is the number of experimentally observed peptides that pass the specified threshold.</li><li>N<sub>observable</sub> is the calculated number of observable peptides for the protein given the search constraints.</li></ul></li></ol><h3>Peptide Quantification</h3><ol>
<li>For each peptide in a given database, the NSAF score is:<br>$$NSAF_N=\frac{S_N/L_N}{\sum_{i=1}^nS_i/L_i}$$<br>where:<ul><li>N is the peptide index</li><li>S<sub>N</sub> is the number of spectra matched to peptide N</li><li>L<sub>N</sub> is the length of peptide N</li><li>n is the total number of peptides in the input database</li></ul></li>
<li>For each peptide in a given database, the SI<sub>N</sub> score is:<br>$$SI_N=\frac{\sum_{k=1}^{S_N}i_k}{L_N(\sum_{j=1}^nSI_j)}$$<br>where:<ul><li>N is the peptide index</li><li>S<sub>N</sub> is the number of spectra assigned to peptide N</li><li>i<sub>k</sub> is the total fragment ion intensity of spectrum k</li><li>L<sub>N</sub> is the length of peptide N</li><li>n is the total number of peptides in the input database</li><li>SI<sub>j</sub> in the denominator is the summed fragment ion intensity of peptide j</li></ul></li></ol>
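<p>The formulas above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Crux's implementation: the function names (<code>nsaf</code>, <code>dnsaf</code>, <code>sin_scores</code>, <code>empai</code>) and the dictionary-based inputs are assumptions made for this example, and the dNSAF sketch assumes the distribution factor is computed separately for each shared peptide over the proteins that contain it. The same functions apply at the peptide level when given peptide counts and peptide lengths.</p>

```python
# Hypothetical sketch of the spectral-count measures defined above.
# Function names and input dictionaries are illustrative, not part of Crux.


def nsaf(counts, lengths):
    """NSAF_N = (S_N / L_N) / sum_i (S_i / L_i)."""
    saf = {name: counts[name] / lengths[name] for name in counts}
    total = sum(saf.values())
    return {name: value / total for name, value in saf.items()}


def distribution_factors(unique_counts, proteins):
    """d for one shared peptide: uSpc_N / sum of uSpc_i over proteins sharing it."""
    total = sum(unique_counts[p] for p in proteins)
    if total == 0:
        # Assumption: split evenly if no sharing protein has unique spectra.
        return {p: 1 / len(proteins) for p in proteins}
    return {p: unique_counts[p] / total for p in proteins}


def dnsaf(unique_counts, shared_peptides, lengths):
    """dNSAF: add d * sSpc for every shared peptide, then normalize as NSAF.

    unique_counts maps each protein to uSpc (every protein must be present, even if 0);
    shared_peptides maps each shared peptide to (spectrum count, proteins containing it);
    lengths maps each protein to uL + sL, i.e. the protein length.
    """
    adjusted = dict(unique_counts)
    for spectra, proteins in shared_peptides.values():
        for protein, d in distribution_factors(unique_counts, proteins).items():
            adjusted[protein] += d * spectra
    return nsaf(adjusted, lengths)


def sin_scores(intensities, lengths):
    """SI_N = SI_N_raw / (L_N * sum_j SI_j_raw).

    intensities maps each protein to its per-spectrum total fragment ion
    intensities, pooled over all of that protein's peptides.
    """
    raw = {name: sum(values) for name, values in intensities.items()}
    total = sum(raw.values())
    return {name: raw[name] / (lengths[name] * total) for name in raw}


def empai(n_observed, n_observable):
    """emPAI = 10 ** (N_observed / N_observable) - 1."""
    return 10 ** (n_observed / n_observable) - 1


if __name__ == "__main__":
    lengths = {"A": 100, "B": 100}
    print(nsaf({"A": 6, "B": 2}, lengths))
    print(dnsaf({"A": 3, "B": 1}, {"SHAREDK": (4, ["A", "B"])}, lengths))
    print(sin_scores({"A": [1.0, 3.0], "B": [4.0]}, {"A": 2, "B": 4}))
    print(empai(2, 4))
```

<p>In the dNSAF example, protein A receives three of the four shared spectra because it holds three of the four unique spectra (d = 0.75), so its adjusted count is 6 and its dNSAF matches the NSAF of the first example.</p>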
<h2>Input:</h2>
<ul>
  <li><code>input PSMs</code> &ndash; A PSM file in either tab-delimited text format (as produced by percolator, q-ranker, or barista) or pepXML format.</li>
</ul>
<h2>Output:</h2>
<p>The program writes files to the folder <code>crux-output</code> by default. The name of the output folder can be changed with the <code>--output-dir</code> option. The following files will be created:</p>
<ul>
  <li><code>spectral-counts.target.txt</code> &ndash; a tab-delimited text file containing the protein IDs and their corresponding scores, in sorted order.</li>
  <li><code>spectral-counts.params.txt</code> &ndash; a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the <code>--parameter-file</code> option for other Crux programs.</li>
  <li><code>spectral-counts.log.txt</code> &ndash; a file containing all messages written to standard error.</li>
</ul>
<h2>Options:</h2>
<ul style="list-style-type: none;">
<li class="nobullet">
<h3>spectral-counts options</h3>
<ul>
  <li class="nobullet"><code>--parsimony none|simple|greedy</code> &ndash; Perform a parsimony analysis on the proteins, and report a "parsimony rank" column in the output file. This column contains integers indicating the protein's rank in a list sorted by spectral counts. If the parsimony analysis results in two proteins being merged, then their parsimony rank is the same. In such a case, the rank is assigned based on the largest spectral count of any protein in the merged meta-protein. The "simple" parsimony algorithm only merges two proteins A and B if the peptides identified in protein A are the same as or a subset of the peptides identified in protein B. The "greedy" parsimony algorithm does additional merging, using the peptide q-values to greedily assign each peptide to a single protein. Default = <code>none</code>.</li>
  <li class="nobullet"><code>--threshold &lt;float&gt;</code> &ndash; Only consider PSMs that pass this threshold. By default, the threshold is applied to q-values, and only PSMs with a q-value less than or equal to the threshold are kept. This behavior can be changed using the --threshold-type, --custom-threshold-name and --custom-threshold-min parameters. Default = <code>0.01</code>.</li>
  <li class="nobullet"><code>--threshold-type none|qvalue|custom</code> &ndash; Determines what type of threshold to use when filtering matches. none: read all matches; qvalue: use the q-value calculated by percolator or q-ranker; custom: use the --custom-threshold-name and --custom-threshold-min parameters. Default = <code>qvalue</code>.</li>
  <li class="nobullet"><code>--input-ms2 &lt;string&gt;</code> &ndash; MS2 file corresponding to the PSM file. Required for the SIN measure; ignored for NSAF, dNSAF and EMPAI. Default = <code>&lt;empty&gt;</code>.</li>
  <li class="nobullet"><code>--unique-mapping T|F</code> &ndash; Ignore peptides that map to multiple proteins. Default = <code>false</code>.</li>
  <li class="nobullet"><code>--quant-level protein|peptide</code> &ndash; Quantification at protein or peptide level. Default = <code>protein</code>.</li>
  <li class="nobullet"><code>--measure RAW|NSAF|dNSAF|SIN|EMPAI</code> &ndash; Type of analysis to perform on the match results. With the exception of RAW, the protein database must be provided using --protein-database. Default = <code>NSAF</code>.</li>
  <li class="nobullet"><code>--custom-threshold-name &lt;string&gt;</code> &ndash; Specify which field to apply the threshold to. The direction of the threshold (&lt;= or &gt;=) is governed by --custom-threshold-min. By default, the threshold applies to the q-value field, which is named "percolator q-value", "q-ranker q-value", "decoy q-value (xcorr)", or "barista q-value" depending on the input. Default = <code>&lt;empty&gt;</code>.</li>
  <li class="nobullet"><code>--custom-threshold-min T|F</code> &ndash; When selecting matches with a custom threshold, determines whether to keep matches whose custom-threshold-name value is less than or equal to (T) or greater than or equal to (F) the threshold. Default = <code>true</code>.</li>
  <li class="nobullet"><code>--mzid-use-pass-threshold T|F</code> &ndash; Use mzid's passThreshold attribute to filter matches. Default = <code>false</code>.</li>
  <li class="nobullet"><code>--protein-database &lt;string&gt;</code> &ndash; The name of the protein database file, in FASTA format. Default = <code>&lt;empty&gt;</code>.</li>
</ul>
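<p>The thresholding and "simple" parsimony rules described above can be illustrated with a short Python sketch. It is hypothetical: the function names and the plain-dictionary inputs are assumptions, and the sketch merges proteins transitively with a union-find structure, which may not match how Crux handles a protein that is a subset of several unrelated proteins. The "greedy" algorithm is not shown.</p>

```python
# Hypothetical sketch of --threshold/--custom-threshold-min filtering and
# "simple" parsimony ranking; names and inputs are illustrative only.


def passes_threshold(value, threshold=0.01, threshold_min=True):
    """threshold_min=True keeps values at or below the threshold (e.g. q-values)."""
    return value <= threshold if threshold_min else value >= threshold


def filter_psms(psms, threshold=0.01, threshold_min=True):
    """psms is a list of (peptide, value) pairs; keep peptides that pass."""
    return [pep for pep, value in psms if passes_threshold(value, threshold, threshold_min)]


def simple_parsimony(protein_peptides):
    """Merge A with B whenever A's peptides equal or are a subset of B's."""
    parent = {p: p for p in protein_peptides}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for a, peps_a in protein_peptides.items():
        for b, peps_b in protein_peptides.items():
            if a != b and peps_a.issubset(peps_b):
                parent[find(a)] = find(b)

    groups = {}
    for p in protein_peptides:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


def parsimony_ranks(groups, spectral_counts):
    """Rank meta-proteins by their largest member spectral count; members share a rank."""
    ordered = sorted(groups, key=lambda g: max(spectral_counts[p] for p in g), reverse=True)
    ranks = {}
    for rank, group in enumerate(ordered, start=1):
        for protein in group:
            ranks[protein] = rank
    return ranks


if __name__ == "__main__":
    print(filter_psms([("AAK", 0.005), ("BBK", 0.02), ("CCK", 0.01)]))
    groups = simple_parsimony({"P1": {"AAK", "CCK"}, "P2": {"AAK"}, "P3": {"DDK"}})
    print(groups, parsimony_ranks(groups, {"P1": 5, "P2": 2, "P3": 7}))
```

<p>In this example, P2's only peptide also appears in P1, so the two are merged and share rank 2, the rank earned by P1's spectral count of 5; P3, with 7 spectra, ranks first.</p>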
</li>
<li class="nobullet">
<h3>Input and output</h3>
<ul>
  <li class="nobullet"><code>--verbosity &lt;integer&gt;</code> &ndash; Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = <code>30</code>.</li>
  <li class="nobullet"><code>--parameter-file &lt;string&gt;</code> &ndash; A file containing parameters.  See the <a href="../file-formats/parameter-file.html">parameter documentation</a> page for details. Default = <code>&lt;empty&gt;</code>.</li>
  <li class="nobullet"><code>--spectrum-parser pwiz|mstoolkit</code> &ndash; Specify the parser to use for reading in MS/MS spectra. The default, ProteoWizard parser can read the MS/MS file formats listed <a href="http://proteowizard.sourceforge.net/formats.shtml">here</a>. The alternative is <a href="../mstoolkit.html">MSToolkit parser</a>. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = <code>pwiz</code>.</li>
  <li class="nobullet"><code>--fileroot &lt;string&gt;</code> &ndash; The fileroot string will be added as a prefix to all output file names. Default = <code>&lt;empty&gt;</code>.</li>
  <li class="nobullet"><code>--output-dir &lt;string&gt;</code> &ndash; The name of the directory where output files will be created. Default = <code>crux-output</code>.</li>
  <li class="nobullet"><code>--overwrite T|F</code> &ndash; Replace existing files if true or fail when trying to overwrite a file if false. Default = <code>false</code>.</li>
</ul>
</li>

</ul>
    <!-- END CONTENT -->
    </div>
  </div>
<footer class="site-footer">
  <div id="centerfooter">
    <div class="footerimportantlinks">
      <img src="../images/linkicon.png" style="width:16px; height:16px"><h3>Important links</h3>
      <ul>
        <li><a href="../faq.html">Crux <strong>FAQ</strong></a></li>
        <li><a href="../glossary.html">Glossary of terminology</a></li>
        <li><a href="http://scholar.google.com/citations?hl=en&user=Rw9S1HIAAAAJ">Google Scholar profile</a></li>
        <li><a href="https://sourceforge.net/projects/cruxtoolkit/">SourceForge issue list</a></li>
        <li><a href="../release-notes.html">Release Notes</a></li>
        <li><a href="https://mailman1.u.washington.edu/mailman/listinfo/crux-users" title="Receive announcements of new versions">Join the mailing list</a></li>
        <li><a href="http://www.apache.org/licenses/LICENSE-2.0">Apache license</a></li>
        <li><a href="http://groups.google.com/group/crux-users">Support Board</a></li>
      </ul>
    </div>
    <div class="footerimportantlinks tutoriallinks">
      <img src="../images/tutorialicon.png" style="height:16px"><h3>Tutorials</h3>
      <ul>
        <li><a href="../tutorials/install.html">Installation</a></li>
        <li><a href="../tutorials/gettingstarted.html">Getting started with Crux</a></li>
        <li><a href="../tutorials/search.html">Running a simple search using Tide and Percolator</a></li>
        <li><a href="../tutorials/customizedsearch.html">Customization and search options</a></li>
        <li><a href="../tutorials/spectralcounts.html">Using spectral-counts</a></li>
      </ul>
    </div>
    <div id="footertext">
      <p>
        The original version of Crux was written by Chris Park and Aaron Klammer
        under the supervision
        of <a href="http://www.gs.washington.edu/faculty/maccoss.htm">Prof. Michael
        MacCoss</a>
        and <a href="http://noble.gs.washington.edu/~noble">Prof. William
        Stafford Noble</a> in the Department of Genome Sciences at the
        University of Washington, Seattle. Website by <a href="http://www.yuvalboss.com/">Yuval Boss</a>
        <br />The complete list of contributors
        can be found <a href="../contributors.html">here</a>.
        <br />
        <br />
        Maintenance and development of Crux is funded by the <a href="https://www.nih.gov/">National Institutes of Health</a> awards R01 GM096306 and P41 GM103533. 
      </p>
    </div>
  </div>
</footer>
</body>
</html>