-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathusage
146 lines (109 loc) · 6.64 KB
/
usage
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
msmsEval v1.3 (03-Aug-07)
by Jason W. H. Wong, Cartwright Lab, PTCL, University of Oxford. 2006-7(c)
USAGE INSTRUCTIONS:
msmsEval 1.3 (03-Aug-07)
by Jason W. H. Wong, Cartwright Lab, PTCL, University of Oxford, 2006-7
Usage: msmsEval <-s source_file> <-d param_file> [-t target_dir]
[-f found_scans][-o eval_output_file][-m em_output_file]
[-p pF probability][-r remove probability]
[-b first_scan][-e last_scan][-z file format][-c][-q][-x]
----------------------------------------------------------------
(source_file) path of source file (mzXML only)
(param_file) path of discriminant/em parameters file
-t where "target_dir" is path of target directory for DTA export
-f where "found_scans" is a file with scans that are already annotated
-o where "eval_output_file" is path of target file name for the
evaluation summary
-m where "em_output_file" is path of target file name for EM
algorithm summary
-p where "pF probability" is a float specifying the target p(+|D)
cutoff for DTA files generation being a sequest hit (default: 0.01)
-r where "remove probability" is a float specifying the target
fraction of identifiable spectra a user is willing to sacrifice for
DTA files generation being a sequest hit (default: 0.01)
-b specify "first_scan" for exporting a limited range of scans
-e specify "last_scan" for exporting a limited range of scans
-z force "file_format" regardless of file extension.
Either: "mzZML" or "mzData"
-c if specified, msmsEval will try to guess the charge of a spectrum
for DTA output
-x generate extra charge states (+4/+5) when making dta files
-q suppress summary outputs
Usage examples:
msmsEval -s /home/guest/raw_data/example.mzXML -d ./msmsEval.params
msmsEval -s /home/guest/raw_data/example.mzXML -t ./home/guest/filtered_dtas/
-d ./msmsEval.params -p 0.9 -c
BY default, the evaluation file and the results of the em algorithm will be
outputted in the same directory as the source file named '<source file>_eval.csv'and
'<source_file>_em.csv' respectively
1 INPUT FILES
1.1 SOURCE FILE (-s)
Currently msmsEval only accepts files in mzXML format. If the LCn/MS/MS data that you
wish to analyze is not in mzXML format, file converters for various vendor file formats,
such as Finnigan, Micromass, etc can be obtained from:
http://sashimi.sourceforge.net/
Further details regarding mzXML files can also be found from the above.
1.2 PARAMETERS FILE (-d)
This file specifies various parameters required for the discriminant function and the
EM algorithm used by msmsEval. The default values are provided in with the source
package. These default values have been carefully trained and optimized using our
training datasets (refer to our publication). They have been further tested on a
variety of test data from difference sources and shown to be suitable. While the parameters
are optimized for ESI mass spectrometers, they should still work reasonably for MALDI
sources too. However, if you have your own training data, it is possible to obtain
new parameters (using an external standard logistic regression algorithm).
1.3 FILTER SCANS (-f)
This file contains a list of scans within the mzXML input file that have already been
identified. This is used such that if a user wants to output high quality unidentified
spectra, DTA files for ones that have already been identified will not be produced. This
flag is designed to be used with the -p flag where a user specifies a cutoff for
spectra that have a high probability of being identifiable. The input filter scan file
may contain scan numbers either as a list of scan numbers down a page or a list of scan
numbers separated by commas.
2 OUT FILES
2.1 DTA FILES (-t)
DTA files will be outputed to the specified directory.
2.2 EVALUATION SUMMARY (-o)
The evaluation summary file contains all values for features used to calculate the
discriminant value for each level 2 scan (ms/ms spectrum) within a dataset. The file
also includes the actual discriminant value, p(+|D) value and Z probability. The
format of the file is CSV. This file may be used to determine candidates with high
p(+|D) values that are not identified by standard search algorithms. Further, the
feature values maybe used to train new parameters from msmsEval using an external
program.
2.3 EM SUMMARY (-m)
The file contains the summary of the predicted values for the identifiable and
unidentifiable spectra distributions, i.e.
p(+),p(+) avg, p(+) stdev, p(-), p(-) avg, p(-) stdev
3 OTHER PARAMETERS
3.1 P(+|D) PROBABILITY (-p)
This specifies the cutoff P(+|D) probability for a spectrum being identifiable for DTA
file generation. For instance, when a value of 0.9 is set, msmsEval will attempt
only output DTA files for spectra which has greater than 0.9 probability of being
identifiable. This option is most useful with (-f) where a list of already identified
spectra is also specified such that only high P(+|D) spectra that are not identified will
be generated. This parameter CANNOT be specified with (-r). Default: 0.9
3.2 ESTIMATED FRACTION REMOVED (-r)
This specifies the maximum cutoff fraction of identifiable spectra that a user is willing
to potentially remove when using to filter low quality unidentifiable spectra. For
example when 0.01 is set, msmsEval will attempt not to remove more than 1% of identifiable
spectra when removing unidentifiable spectra. Why does identifiable spectra get removed at
all? Inevitably, there will be some low quality identifiable spectra that are
indistinguishable from other low quality unidentifiable spectra. This parameter CANNOT
be specified with (-r). Default: 0.01
3.3 START AND END SCANS (-b/-e)
Specfies the range of scans for which to export DTA files.
3.4
Forces parsing of the specified file format. Options are "mzXML" or "mzData".
3.5 GUESS CHARGE STATE (-c)
msmsEval will attempt to guess whether a spectrum is 2+ or 3+ when this option
is specified. An algorithm applied is one which is similar to 2to3 where the
aim is to look for complementary peaks for determination of charge state. If
no option is specified, msmsEval will output both 2+ and 3+ for all DTA files
that are multiplely charged. Note that this flag overrides (-x).
3.6 EXTRA CHARGE STATES (-x)
When this flag is specified, msmsEval will output DTA files with charges states +4 and +5
in addition to +2 and +3 for all spectra with multiply charge precursor ions. This
flag is ineffective when used with -c.
3.7 QUIET (-q)
Suppress output of summary files.