Spectrum Fundamentals.
+API
+Import spectrum_fundamentals using
+import spectrum_fundamentals as specfun
+
convert
peptide
annot
similarity
utils
In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
-Examples of behavior that contributes to creating a positive environment include:
-Using welcoming and inclusive language
Being respectful of differing viewpoints and experiences
Gracefully accepting constructive criticism
Focusing on what is best for the community
Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
-The use of sexualized language or imagery and unwelcome sexual attention or advances
Trolling, insulting/derogatory comments, and personal or political attacks
Public or private harassment
Publishing others’ private information, such as a physical or electronic address, without explicit permission
Other conduct which could reasonably be considered inappropriate in a professional setting
Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
-Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
-This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers.
-Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
-Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project’s leadership.
-This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
-FragmentsRatio
FragmentsRatio.calc()
FragmentsRatio.count_observation_states()
FragmentsRatio.count_with_ion_mask()
FragmentsRatio.get_mask_observed_valid()
FragmentsRatio.get_observation_state()
FragmentsRatio.make_boolean()
FragmentsRatio.metrics_val
FragmentsRatio.pred_intensities
FragmentsRatio.true_intensities
ObservationState
-Percolator
Percolator.add_common_features()
Percolator.add_percolator_metadata_columns()
Percolator.apply_lda_and_get_indices_below_fdr()
Percolator.calc()
Percolator.calculate_fdrs()
Percolator.calculate_mass_difference()
Percolator.calculate_mass_difference_ppm()
Percolator.count_arginines_and_lysines()
Percolator.count_missed_cleavages()
Percolator.fdr_cutoff
Percolator.fdrs_to_qvals()
Percolator.get_aligned_predicted_retention_times()
Percolator.get_delta_score()
Percolator.get_indices_below_fdr()
Percolator.get_specid()
Percolator.get_target_decoy_label()
Percolator.input_type
Percolator.metadata
Percolator.sample_balanced_over_bins()
Percolator.target_decoy_labels
TargetDecoyLabel
-get_fitting_func()
logistic()
spline()
SimilarityMetrics
SimilarityMetrics.abs_diff()
SimilarityMetrics.calc()
SimilarityMetrics.calculate_quantiles()
SimilarityMetrics.correlation()
SimilarityMetrics.cos()
SimilarityMetrics.l2_norm()
SimilarityMetrics.metrics_val
SimilarityMetrics.modified_cosine()
SimilarityMetrics.pred_intensities
SimilarityMetrics.rowwise_dot_product()
SimilarityMetrics.spectral_angle()
SimilarityMetrics.spectral_entropy_similarity()
SimilarityMetrics.true_intensities
SimilarityMetrics.unit_normalization()
get_metric_func()
Discuss usage, development and issues on GitHub.
Check the usage principles or the API.
Check the Contributor Guide if you want to participate in developing.
Consider citing the main publication, Oktoberfest.
Annotate a set of spectra.
-This function takes a DataFrame of raw peaks and metadata and, for each spectrum, calls the parallel_annotate function to annotate the spectrum and extract the necessary information. If any redundant peaks are found during annotation, the function removes them and logs the information. Finally, it returns a Pandas DataFrame containing the annotated spectra with metadata.
-The returned DataFrame has the following columns:
- INTENSITIES: a NumPy array containing the intensity values of each peak in the annotated spectrum
- MZ: a NumPy array containing the m/z values of each peak in the annotated spectrum
- CALCULATED_MASS: a float representing the calculated mass of the spectrum
- removed_peaks: a NumPy array containing the indices of any peaks that were removed during the annotation process
-un_annot_spectra (DataFrame) – a Pandas DataFrame containing the raw peaks and metadata to be annotated
mass_tolerance (float | None) – mass tolerance to calculate min and max mass
unit_mass_tolerance (str | None) – unit for the mass tolerance (da or ppm)
a Pandas DataFrame containing the annotated spectra with meta data
-DataFrame
-Generate the annotation matrix in the prosit format from matched peaks.
-matched_peaks (DataFrame) – matched peaks needed to be converted
unmod_seq (str) – Unmodified peptide sequence
charge (int) – Precursor charge
numpy array of intensities and numpy array of masses
-Tuple[ndarray, ndarray]
-Generate the annotation matrix in the xl_prosit format from matched peaks.
-matched_peaks (DataFrame) – matched peaks needed to be converted
unmod_seq (str) – unmodified peptide sequence
crosslinker_position (int) – position of crosslinker
numpy array of intensities and numpy array of masses
-Tuple[ndarray, ndarray]
-Resolve cases where multiple peaks have been matched to the same fragment ion.
-This function takes a list of dictionaries representing matched peaks and resolves cases where multiple peaks have been matched to the same fragment ion. The function sorts the peaks based on the provided sort_by parameter and removes duplicate matches based on ion type, ion number, and charge state.
-matched_peaks (List[Dict[str, str | int | float]]) – A list of dictionaries, each representing a matched peak. Each dictionary must contain the following keys: ‘ion_type’, ‘no’, ‘charge’, ‘exp_mass’, ‘theoretical_mass’, and ‘intensity’.
sort_by (str) – A string indicating the criterion to use when sorting matched peaks. Valid options are: ‘mass_diff’ (sort by absolute difference between experimental and theoretical mass, ascending order), ‘intensity’ (sort by intensity, descending order), and ‘exp_mass’ (sort by experimental mass, descending order).
ValueError – If an unsupported value is passed to sort_by.
-A tuple containing a DataFrame of matched peaks (with duplicates removed) and an integer indicating the number of duplicate matches that were removed.
-Tuple[DataFrame, int]
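The duplicate resolution described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function name `resolve_duplicates` is hypothetical, and it returns a list of dictionaries instead of a DataFrame to stay dependency-free.

```python
from typing import Dict, List, Tuple, Union

Peak = Dict[str, Union[str, int, float]]


def resolve_duplicates(peaks: List[Peak], sort_by: str = "mass_diff") -> Tuple[List[Peak], int]:
    """Keep the best peak per (ion_type, no, charge); return kept peaks and number dropped.

    Hypothetical helper for illustration only.
    """
    if sort_by == "mass_diff":
        # smallest absolute mass error first (ascending)
        key = lambda p: abs(p["exp_mass"] - p["theoretical_mass"])
    elif sort_by == "intensity":
        # most intense peak first (descending)
        key = lambda p: -p["intensity"]
    elif sort_by == "exp_mass":
        # largest experimental mass first (descending)
        key = lambda p: -p["exp_mass"]
    else:
        raise ValueError(f"Unsupported sort_by value: {sort_by}")

    kept: Dict[tuple, Peak] = {}
    for peak in sorted(peaks, key=key):
        fragment = (peak["ion_type"], peak["no"], peak["charge"])
        # the first peak in sort order wins; later matches to the same fragment are dropped
        kept.setdefault(fragment, peak)
    return list(kept.values()), len(peaks) - len(kept)
```

With `sort_by="mass_diff"`, two peaks matched to the same b2+ ion are reduced to the one closest to the theoretical mass, and the returned counter is 1.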
-Matching experimental peaks with theoretical fragment ions.
-fragments_meta_data (List[dict]) – Fragment ion metadata, e.g. ion type, number, theo_mass…
peaks_intensity (ndarray) – Experimental peaks intensities
peaks_masses (ndarray) – Experimental peaks masses
tmt_n_term (int) – Flag indicating whether there is a TMT modification on the N-terminus (1: no TMT, 2: TMT)
unmod_sequence (str) – Unmodified peptide sequence
charge (int) – Precursor charge
List of matched/annotated peaks
-List[Dict[str, str | int | float]]
-Perform parallel annotation of a spectrum.
-This function takes a spectrum and its index columns and performs parallel annotation of the spectrum. It starts by initializing the peaks and extracting necessary data from the spectrum. It then matches the peaks to the spectrum and generates an annotation matrix based on the matched peaks. If there are multiple matches found, it removes the redundant matches. Finally, it returns the annotated spectrum with metadata including intensity values, masses, calculated masses, and any peaks that were removed. The function is designed to run in different threads to speed up the annotation pipeline.
-spectrum (ndarray) – a np.ndarray that contains the spectrum to be annotated
index_columns (Dict[str, int]) – a dictionary that contains the index columns of the spectrum
mass_tolerance (float | None) – mass tolerance to calculate min and max mass
unit_mass_tolerance (str | None) – unit for the mass tolerance (da or ppm)
a tuple containing intensity values (np.ndarray), masses (np.ndarray), calculated mass (float), and any removed peaks (List[str])
-Tuple[ndarray, ndarray, float, int] | Tuple[ndarray, ndarray, ndarray, ndarray, float, float, int, int] | None
-Determines the positions of all potential normal and xl fragments within the vector generated by generate_annotation_matrix.
-This function is used only for cleavable crosslinked peptides.
-unmod_seq (str) – Unmodified peptide sequence
crosslinker_position (int) – The position of the crosslinker
ValueError – if peptides exceed a length of 30
-position of different fragments as list
-ndarray
-Initialize annotation.
Convert a single or a list of labels to one-hot encoding.
-labels (int | List[int] | ndarray) – The labels to be one-hot encoded. Must be one-based.
classes (int | None) – The number of classes, i.e. the length of the encoding. If omitted, set to the max label + 1.
TypeError – If the type of labels is not understood
ValueError – If the highest label in labels is larger than or equal to the number of classes.
np.ndarray with the one-hot encoded labels.
-ndarray
-Collects an integer sequence e.g. [1,2,3] with charge 2 and returns array with 174 positions for ion masses.
-Invalid masses are set to -1.
-seq_int (List[int]) – TODO
charge_onehot (List[int]) – is a onehot representation of charge with 6 elems for charges 1 to 6
tmt (str) – TODO
list of masses as floats
-ndarray | None
-Compute the theoretical mass of the peptide sequence.
-sequence (str) – Modified peptide sequence
-Theoretical mass of the sequence
-float
-Helper function to get min and max mass based on mass analyzer.
-If both mass_tolerance and unit_mass_tolerance are provided, the function uses the provided tolerance to calculate the min and max mass. If either mass_tolerance or unit_mass_tolerance is missing (or both are None), the function falls back to the default tolerances based on the mass_analyzer.
-Default mass tolerances for different mass analyzers:
- FTMS: +/- 20 ppm
- TOF: +/- 40 ppm
- ITMS: +/- 0.35 daltons
-mass_tolerance (float | None) – mass tolerance to calculate min and max mass
unit_mass_tolerance (str | None) – unit for the mass tolerance (da or ppm)
mass_analyzer (str) – the type of mass analyzer used to determine the tolerance.
mass (float) – the theoretical fragment mass
ValueError – if mass_analyzer is other than one of FTMS, TOF, ITMS
ValueError – if unit_mass_tolerance is other than one of ppm, da
a tuple (min, max) denoting the mass tolerance range.
-Tuple[float, float]
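The fallback logic and the default tolerances listed above can be sketched as follows. The function name `min_max_mass`, its argument order, and the choice to validate the mass analyzer only when falling back to defaults are assumptions made for illustration; the ppm window is computed as mass × tolerance / 10^6.

```python
from typing import Optional, Tuple

# Default tolerances as listed above: (value, unit) per mass analyzer.
DEFAULT_TOLERANCES = {"FTMS": (20.0, "ppm"), "TOF": (40.0, "ppm"), "ITMS": (0.35, "da")}


def min_max_mass(
    mass: float,
    mass_analyzer: str,
    mass_tolerance: Optional[float] = None,
    unit_mass_tolerance: Optional[str] = None,
) -> Tuple[float, float]:
    """Return the (min, max) mass window around a theoretical fragment mass (illustrative sketch)."""
    if mass_tolerance is None or unit_mass_tolerance is None:
        # fall back to the analyzer-specific default if either argument is missing
        if mass_analyzer not in DEFAULT_TOLERANCES:
            raise ValueError(f"Unsupported mass analyzer: {mass_analyzer}")
        mass_tolerance, unit_mass_tolerance = DEFAULT_TOLERANCES[mass_analyzer]

    if unit_mass_tolerance == "ppm":
        delta = mass * mass_tolerance / 1e6  # relative tolerance scales with the mass
    elif unit_mass_tolerance == "da":
        delta = mass_tolerance  # absolute tolerance in daltons
    else:
        raise ValueError(f"Unsupported unit for mass tolerance: {unit_mass_tolerance}")
    return mass - delta, mass + delta
```

For a fragment at 1000 Da, the FTMS default gives a window of roughly 999.98 to 1000.02 Da, while the ITMS default gives 999.65 to 1000.35 Da regardless of the mass.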
-Generate theoretical peaks for a modified peptide sequence.
-sequence (str) – Modified peptide sequence
mass_analyzer (str) – Type of mass analyzer used eg. FTMS, ITMS
charge (int) – Precursor charge
mass_tolerance (float | None) – mass tolerance to calculate min and max mass
unit_mass_tolerance (str | None) – unit for the mass tolerance (da or ppm)
noncl_xl (bool) – whether the function is called with a non-cleavable xl modification
peptide_beta_mass (float) – the mass of the second peptide to be considered for non-cleavable XL
xl_pos (int) – the position of the crosslinker for non-cleavable XL
List of theoretical peaks, flag to indicate if there is a tmt on n-terminus, unmodified peptide sequence
-Tuple[List[dict], int, str, float]
-Generate theoretical peaks for a modified (potentially cleavable cross-linked) peptide sequence.
-This function takes only one modified peptide (peptide a or b).
-sequence (str) – Modified peptide sequence (peptide a or b)
mass_analyzer (str) – Type of mass analyzer used eg. FTMS, ITMS
crosslinker_position (int) – The position of crosslinker
crosslinker_type (str) – Can be either DSSO, DSBU or BuUrBU
mass_tolerance (float | None) – mass tolerance to calculate min and max mass
unit_mass_tolerance (str | None) – unit for the mass tolerance (da or ppm)
sequence_beta (str | None) – optional second peptide to be considered for non-cleavable XL
ValueError – if crosslinker_type is unknown
AssertionError – if the short and long XL sequence (the one with the short / long crosslinker mod) has a tmt n term while the other one does not
List of theoretical peaks, flag to indicate if there is a tmt on n-terminus, unmodified peptide sequence, theoretical mass of modified peptide (without considering mass of crosslinker)
-Tuple[List[dict], int, str, float]
-Generate different peptide sequences by moving the modification to all possible residues.
-modified_sequence (str) – Peptide sequence
unimod_id (int) – modification unimod id to be used for generating different permutations.
residues (List[str]) – possible amino acids where this mod can exist
list of possible sequence permutations
-Helper function to get mods list.
-mods_variable (str) –
mods_fixed (str) –
Function to exchange the internal mod identifiers with the masses of the specific modification.
-sequences (List[str]) – List[str] of sequences
-List[str] of modified sequences
-List[str]
-Function to translate an internal modstring to MSP format.
-sequences (List[str]) – List[str] of sequences
-List[Tuple[str, str]] of mod summary and mod sequences
-List[Tuple[str, str]]
-Function to translate a modstring from the internal format to the spectronaut format.
-sequences (ndarray | Series | List[str]) – List[str] of sequences
-List[str] of modified sequences
-List[str]
-Function to remove any mod identifiers and return the plain AA sequence.
-sequences (List[str]) – List[str] of sequences
-List[str] of modified sequences
-List[str]
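Stripping mod identifiers can be illustrated with a regular expression. This sketch assumes modifications appear inline as `[UNIMOD:<id>]` or `(UNIMOD:<id>)`, the two notations used in examples elsewhere in this documentation; the function name `strip_mods` is hypothetical, and terminal modifications or other notations are not handled.

```python
import re
from typing import List

# Matches an inline UNIMOD tag in square brackets or parentheses, e.g. [UNIMOD:4] or (UNIMOD:35).
# Assumption: no other modification notation occurs in the input.
UNIMOD_TAG = re.compile(r"[\[(]UNIMOD:\d+[\])]")


def strip_mods(sequences: List[str]) -> List[str]:
    """Remove inline UNIMOD tags and return the plain amino acid sequences."""
    return [UNIMOD_TAG.sub("", sequence) for sequence in sequences]
```

Applied to `"AAC[UNIMOD:4]GHK"`, the sketch returns `"AACGHK"`; sequences without tags pass through unchanged.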
-Function to translate a MaxQuant modstring to the Prosit format.
-sequences (ndarray | Series | List[str]) – List[str] of sequences
fixed_mods (Dict[str, str] | None) – Optional dictionary of modifications with key aa and value mod, e.g. ‘M’: ‘M(UNIMOD:35)’. Fixed modifications must be included in the variable modifications dictionary. By default, i.e. if nothing is supplied to fixed_mods, carbamidomethylation on cysteine will be included in the fixed modifications. If you want to have no fixed modifications at all, supply fixed_mods={}
AssertionError – if illegal modification was provided in the fixed_mods dictionary.
-a list of modified sequences
-List[str]
-Function to translate a MSFragger modstring to the Prosit format.
-sequences (ndarray | Series | List[str]) – List[str] of sequences
fixed_mods (Dict[str, str] | None) – Optional dictionary of modifications with key aa and value mod, e.g. ‘M[147]’: ‘M(UNIMOD:35)’. Fixed modifications must be included in the variable modifications dictionary. By default, i.e. if nothing is supplied to fixed_mods, carbamidomethylation on cysteine will be included in the fixed modifications. If you want to have no fixed modifications at all, supply fixed_mods={}
AssertionError – if illegal modification was provided in the fixed_mods dictionary.
-a list of modified sequences
-List[str]
-Parse modstrings.
-sequences (List[str]) – List of strings
alphabet (Dict[str, int]) – dictionary where the keys correspond to all possible ‘Elements’ that can occur in the string
translate (bool) – boolean to determine if the Elements should be translated to the corresponding values of ALPHABET
filter (bool) – boolean to determine if non-parsable sequences should be filtered out
generator that yields a list of sequence ‘Elements’ or the translated sequence “Elements”
-Function to create a sequence with UNIMOD modifications from a given sequence and its variable and fixed modifications.
-sequence (str) – The sequence to modify
mods_variable (str) – the variable modifications (e.g. “Oxidation@M45”)
mods_fixed (str) – the fixed modifications (e.g. “Carbamidomethyl@C”)
sequence with unimods (e.g. “AAC[UNIMOD:4]GHK”)
-str
-Convert mod string from sage to the internal format.
-This function converts sequences using the mass change of a modification in square brackets as done by Sage to the internal format by replacing the mass shift with the corresponding UNIMOD identifier of known and supported modifications defined in the constants.
-sequences (List[str]) – A list of sequences with values inside square brackets.
-A list of modified sequences with values converted to internal format.
-List[str]
-Function to translate a xisearch modstring to the XL-Prosit format.
-xl (str) – type of crosslinker used. Can be ‘DSSO’ or ‘DSBU’.
seq (str) – unmodified peptide sequence
mod (str) – all modifications of pep
crosslinker_position (int) – crosslinker position of peptide
mod_positions (str) – position of all modifications of peptide
ValueError – if supplied type of crosslinker is unknown
-modified sequence
-Initialize fundamentals.
-Bases: Metric
Main to initialize a FragmentsRatio obj.
-pred_intensities (ndarray | csr_matrix | None) –
true_intensities (ndarray | csr_matrix | None) –
mz (ndarray | csr_matrix | None) –
xl (bool) –
Adds columns with count, fraction and fraction_predicted features to metrics_val dataframe.
-xl (bool) –
-Count the number of observation states.
-observation_state (csr_matrix) – integer observation_state, array of length 174
test_state (int) – integer for the test observation state
ion_mask (ndarray | csr_matrix | None) – mask with 1s for the ions that should be counted and 0s for ions that should be ignored, integer array of length 174
xl (bool) – whether or not the function is executed with xl mode
number of observation states equal to test_state per row
-ndarray
-Count the number of ions.
-boolean_array (csr_matrix) – boolean array with True for observed/predicted peaks and False for missing observed/predicted peaks, array of length 174
ion_mask (ndarray | spmatrix | None) – mask with 1s for the ions that should be counted and 0s for ions that should be ignored, integer array of length 174 for linear and 348 for crosslinked peptides, or a list of integers, or a scipy.sparse.csr_matrix or scipy.sparse._csc.csc_matrix.
xl (bool) – whether to process with crosslinked or linear peptides
number of observed/predicted peaks not masked by ion_mask
-ndarray
-Creates a mask out of an observed m/z array with True for invalid entries and False for valid entries in the observed intensities array.
-observed_mz (csr_matrix) – observed m/z, array of length 174
-boolean array, array of length 174
-csr_matrix
-Computes the observation state between the observed and predicted boolean arrays.
-possible values:
- 4: not seen in either
- 3: predicted but not in observed
- 2: seen in both
- 1: observed but not in predicted
- 0: invalid
-observed_boolean (csr_matrix) – boolean observed intensities, boolean array of length 174
predicted_boolean (csr_matrix) – boolean predicted intensities, boolean array of length 174
mask (csr_matrix) – mask with True for invalid values in the observed intensities array, boolean array of length 174
integer array, array of length 174
csr_matrix
-Transform array of intensities into boolean array with True if > cutoff and False otherwise.
-intensities (csr_matrix) – observed or predicted intensities, array of length 174
mask (csr_matrix) – mask with True for invalid values in the observed intensities array, boolean array of length 174
cutoff (float) – minimum intensity value to be considered a peak, for observed intensities use the default cutoff of 0.0, for predicted intensities, set a cutoff, e.g. 0.05
boolean array, array of length 174
-csr_matrix
-Bases: IntEnum
States.
-4: not seen in either
3: predicted but not in observed
2: seen in both
1: observed but not in predicted
0: invalid
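The mapping from observed/predicted booleans to the five states above can be sketched with a plain IntEnum. The member names and the list-based function `observation_state` are hypothetical; only the integer values and their meanings are taken from the list above, and the library operates on sparse matrices instead of Python lists.

```python
from enum import IntEnum
from typing import List


class State(IntEnum):
    """Integer values follow the list above; member names are illustrative assumptions."""

    INVALID = 0
    OBSERVED_NOT_PREDICTED = 1
    OBSERVED_AND_PREDICTED = 2
    PREDICTED_NOT_OBSERVED = 3
    NOT_SEEN_IN_EITHER = 4


def observation_state(observed: List[bool], predicted: List[bool], invalid: List[bool]) -> List[State]:
    """Classify each fragment position; `invalid` is True where the observed entry is invalid."""
    states = []
    for obs, pred, bad in zip(observed, predicted, invalid):
        if bad:
            states.append(State.INVALID)  # the mask takes precedence over both booleans
        elif obs and pred:
            states.append(State.OBSERVED_AND_PREDICTED)
        elif obs:
            states.append(State.OBSERVED_NOT_PREDICTED)
        elif pred:
            states.append(State.PREDICTED_NOT_OBSERVED)
        else:
            states.append(State.NOT_SEEN_IN_EITHER)
    return states
```

Counting how often a given state occurs per spectrum then reduces to counting equal entries in the returned list.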
Bases: object
Main to init a Metric obj.
-pred_intensities (ndarray | csr_matrix | None) –
true_intensities (ndarray | csr_matrix | None) –
mz (ndarray | csr_matrix | None) –
xl (bool) –
Bases: Metric
Expects the following metadata columns.
-RAW_FILE
SCAN_NUMBER
MODIFIED_SEQUENCE: sequence with modifications
SEQUENCE: sequence without modifications
CHARGE: precursor charge state
MASS: experimental precursor mass
CALCULATED_MASS: calculated mass based on sequence and modifications
SCORE: Andromeda score
REVERSE: does the sequence come from the reversed (=decoy) database
FRAGMENTATION: fragmentation method, e.g. HCD, CID
RETENTION_TIME: observed retention time
PREDICTED_RETENTION_TIME: predicted retention time by Prosit
-metadata (DataFrame) –
input_type (str) –
pred_intensities (ndarray | csr_matrix | None) –
true_intensities (ndarray | csr_matrix | None) –
mz (ndarray | csr_matrix | None) –
all_features_flag (bool) –
regression_method (str) –
fdr_cutoff (float) –
Add features used by both Andromeda and Prosit feature scoring sets.
-Add metadata columns needed by percolator, e.g. to identify a PSM.
-Applies a linear discriminant analysis on the features calculated so far (before retention time alignment) to estimate false discovery rates (FDRs).
-initial_scoring_feature (str) – name of the initial scoring feature
fdr_cutoff (float) – FDR cutoff as float
array with indices below FDR
-Adds percolator metadata and feature columns to metrics_val based on PSM metadata.
-Calculate FDR.
-sorted_labels (Series | ndarray) – array with labels sorted (target, decoy)
-array with calculated FDRs
-ndarray
-Calculate mass difference.
-metadata_subset (Tuple[float, float]) – experimental and calculated mass as tuple
-mass difference
-float
-Calculate mass difference in ppm.
-metadata_subset (Tuple[float, float]) – experimental and calculated mass as tuple
-mass difference in ppm
-float
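As a worked illustration of the two mass difference features, the sketch below assumes the difference is taken as experimental minus calculated mass and that the ppm value is relative to the calculated mass. Both the sign convention and the denominator are assumptions; the documentation above does not state them.

```python
from typing import Tuple


def mass_difference(masses: Tuple[float, float]) -> float:
    """Absolute difference; `masses` is (experimental, calculated). Sign convention is assumed."""
    experimental, calculated = masses
    return experimental - calculated


def mass_difference_ppm(masses: Tuple[float, float]) -> float:
    """Relative difference in parts per million, assumed relative to the calculated mass."""
    experimental, calculated = masses
    return (experimental - calculated) / calculated * 1e6
```

For an experimental mass of 1000.01 and a calculated mass of 1000.0, the absolute difference is about 0.01 and the relative difference about 10 ppm.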
-Count number of arginines and lysines.
-sequence (str) – peptide sequence
-number of arginines and lysines
-int
-Count number of missed cleavages assuming Trypsin/P proteolysis.
-sequence (str) – peptide sequence
-number of missed cleavages
-int
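Under Trypsin/P rules, cleavage occurs after every lysine (K) or arginine (R), including when proline follows. A minimal sketch of both counting features, assuming the input is a plain amino acid sequence without modification tags, treats every K or R that is not the C-terminal residue as a missed cleavage. Whether the library counts exactly this way is an assumption.

```python
def count_arginines_and_lysines(sequence: str) -> int:
    """Count all R and K residues in an unmodified peptide sequence."""
    return sequence.count("R") + sequence.count("K")


def count_missed_cleavages(sequence: str) -> int:
    """Count K/R residues that are not at the C-terminus (Trypsin/P, illustrative sketch)."""
    # The last residue is excluded: a C-terminal K or R is the expected cleavage site.
    return count_arginines_and_lysines(sequence[:-1])
```

For `"AAKPEPTIDER"`, the sketch counts two K/R residues in total and one missed cleavage, namely the internal lysine.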
-Converts FDRs to q-values.
-fdrs (ndarray) – array with FDRs
-array with qvals
-ndarray
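FDR estimation and q-value conversion can be sketched as below. This uses a common target-decoy convention (cumulative decoys divided by cumulative targets, then a running minimum from the bottom of the list), which is an assumption here; the library may apply a different estimator or correction. The input is a list of booleans sorted from best to worst score, with True marking a decoy.

```python
from typing import List


def calculate_fdrs(sorted_is_decoy: List[bool]) -> List[float]:
    """Estimate the FDR at each rank as (#decoys so far) / (#targets so far)."""
    fdrs = []
    targets = decoys = 0
    for is_decoy in sorted_is_decoy:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # avoid division by zero before the first target
    return fdrs


def fdrs_to_qvals(fdrs: List[float]) -> List[float]:
    """Convert FDRs to q-values: the minimum FDR at this rank or any worse rank."""
    qvals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    return qvals[::-1]
```

The running minimum makes the q-values monotonically non-decreasing down the ranked list, so a single cutoff selects a contiguous block of top-ranked PSMs.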
-Apply regression to find a mapping from predicted iRT values to experimental retention times.
-observed_retention_times_fdr_filtered (ndarray | Series) – observed retention times after FDR filter
predicted_retention_times_fdr_filtered (ndarray | Series) – predicted retention times after FDR filter
predicted_retention_times_all (ndarray | Series) – all predicted retention times
curve_fitting_method (str) – method for curve fitting (lowess, spline, or logistic)
aligned predicted retention times
-ndarray
-Calculates delta scores by sorting (from high to low) and grouping PSMs by scan number.
-Inside each group the delta scores are calculated per PSM to the next best of that group. The lowest scoring PSM of each group receives a delta score of 0.
-scores_df (DataFrame) – must contain two columns: scoring_feature (e.g. ‘spectral_angle’) and ‘ScanNr’
scoring_feature (str) – feature name to get the delta scores of
NotImplementedError – If there is only one unique value for ScanNr in the scores_df.
numpy array of delta scores
ndarray
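The grouping logic for delta scores can be sketched without pandas. The function name `delta_scores` and its list-based signature are hypothetical, and the NotImplementedError for a single unique scan number mentioned above is omitted from this sketch.

```python
from collections import defaultdict
from typing import List, Sequence


def delta_scores(scan_numbers: Sequence[int], scores: Sequence[float]) -> List[float]:
    """Per PSM, return the score difference to the next best PSM of the same scan."""
    groups = defaultdict(list)
    for index, scan in enumerate(scan_numbers):
        groups[scan].append(index)

    deltas = [0.0] * len(scores)  # the lowest scoring PSM of each group keeps 0
    for indices in groups.values():
        ranked = sorted(indices, key=lambda i: scores[i], reverse=True)  # high to low
        for better, worse in zip(ranked, ranked[1:]):
            deltas[better] = scores[better] - scores[worse]
    return deltas
```

The result is returned in the original input order, so it can be attached back to the PSM table as a new column.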
-Get indices below FDR.
-feature_name (str) – name of the feature to sort by as string
fdr_cutoff (float) – FDR cutoff as float
array with indices below FDR
-ndarray
-Create a unique identifier used as spectrum id in percolator. It is not parsed by percolator but functions as a key to map percolator results back to our internal representation.
-metadata_subset (Series | Tuple) – tuple of (raw_file, scan_number, modified_sequence, charge and optionally scan_event_number)
-percolator spectrum id
-str
-Get target or decoy label.
-reverse (bool) – if true, return the label for DECOY, otherwise return the label for TARGET
-target/decoy label for percolator
-Sample balanced over bins.
-retention_time_df (DataFrame) – DataFrame with observed and predicted retention times
sample_size (int) – number of samples
RT Index
-Index
-Bases: IntEnum
Target and decoy labels as used by Percolator.
-Retrieve the correct function given a curve fitting method.
-curve_fitting_method (str) – method for curve fitting (lowess, spline, or logistic)
-ValueError – if an invalid curve_fitting_method is supplied
-Callable that accepts x and y, i.e. fit_func(x,y) where x are the data points and y are the corresponding measures for which the fit should be done.
-Bases: Metric
Class to generate several features that can be used by Percolator for rescoring.
-pred_intensities (ndarray | csr_matrix | None) –
true_intensities (ndarray | csr_matrix | None) –
mz (ndarray | csr_matrix | None) –
xl (bool) –
Calculate several similarity metrics.
-observed_intensities (csr_matrix) – observed intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), array of length 174
predicted_intensities (csr_matrix) – predicted intensities, see observed_intensities for details, array of length 174
metric (str) – metric (mean, std, q1, q2, q3, min, max, or mse)
calculated similarity values
-List[float]
-Adds columns with spectral angle feature to metrics_val dataframe.
-all_features (bool) – if True, calculate all metrics
xl (bool) – whether calculating for crosslinked or linear peptides
Helper function to calculate quantiles.
-observed (ndarray) – observed intensities
predicted (ndarray) – predicted intensities
quantile (str) – quantile method
calculated quantile
-float
-Calculate correlation between observed and predicted.
-observed_intensities (csr_matrix) – observed intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), array of length 174
predicted_intensities (csr_matrix) – predicted intensities, see observed_intensities for details, array of length 174
charge (int) – to filter by the peak charges, 0 means everything
method (str) – either pearson or spearman
xl (bool) – whether or not to use xl mode
ValueError – if charge is smaller than 1 or larger than 5
-calculated correlations
-List[float]
-Calculate cosine similarity.
-observed_intensities (csr_matrix) – observed intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), array of length 174
predicted_intensities (csr_matrix) – predicted intensities, see observed_intensities for details, array of length 174
cosine values
-List[float]
-Compute the l2-norm (sqrt(sum(x^2))) for each row of the matrix.
-matrix – matrix with intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), matrix of size (nspectra, 174)
-vector with rowwise norms of the matrix
-ndarray
-Calculate modified cosine similarity as defined in Chris D. McGann et al. (Real-time spectral library matching for sample multiplexed quantitative proteomics).
-observed_intensities (csr_matrix | ndarray) – observed intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), array of length 174
predicted_intensities (csr_matrix | ndarray) – predicted intensities, see observed_intensities for details, array of length 174
observed_mz (csr_matrix | ndarray) – observed mz values
theoretical_mz (csr_matrix | ndarray) – theoretical mz values
calculates cosine values
-List[float]
-Calculate rowwise dot product.
-observed_intensities (csr_matrix | ndarray) – observed intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), array of length 174
predicted_intensities (csr_matrix | ndarray) – predicted intensities, see observed_intensities for details, array of length 174
matrix containing the rowwise dotproduct
-ndarray
-Calculate spectral angle.
-observed_intensities (csr_matrix | ndarray) – observed intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), array of length 174
predicted_intensities (csr_matrix | ndarray) – predicted intensities, see observed_intensities for details, array of length 174
charge (int) – to filter by the peak charges, 0 means everything
xl (bool) – whether operating on cleavable crosslinked or linear peptides
ValueError – if charge is smaller than 1 or larger than 5
-SA values
-ndarray
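For a single spectrum, the normalized spectral angle is commonly defined as 1 − 2·arccos(cos θ)/π, where cos θ is the cosine similarity of the L2-normalized intensity vectors. The sketch below assumes this definition and works on plain Python sequences; the function name `spectral_angle`, the zero return value for empty vectors, and the omission of charge filtering and crosslink handling are simplifications made for illustration.

```python
import math
from typing import Sequence


def spectral_angle(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral angle in [0, 1] for non-negative intensities; 1 means identical direction."""
    norm_observed = math.sqrt(sum(o * o for o in observed))
    norm_predicted = math.sqrt(sum(p * p for p in predicted))
    if norm_observed == 0.0 or norm_predicted == 0.0:
        return 0.0  # assumption: an all-zero vector yields no similarity

    cosine = sum(o * p for o, p in zip(observed, predicted)) / (norm_observed * norm_predicted)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding slightly outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

Positions with zero intensity contribute nothing to the dot product or the norms, which is why invalid peaks encoded as 0 drop out of the calculation in this sketch.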
-Calculate spectral entropy similarity as defined in Li et al. (Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification).
-observed_intensities (csr_matrix | ndarray) – observed intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), array of length 174
predicted_intensities (csr_matrix | ndarray) – predicted intensities, see observed_intensities for details, array of length 174
spectral entropy similarity values
-List[float]
-Normalize each row of the matrix such that the norm equals 1.0.
-matrix (csr_matrix | ndarray) – matrix with intensities, constants.EPSILON intensity indicates zero intensity peaks, 0 intensity indicates invalid peaks (charge state > peptide charge state or position >= peptide length), matrix of size (nspectra, 174)
-normalized matrix
-csr_matrix | ndarray
-Return a callable function for a given metric shortcut.
-metric (str) – a shortcut for the desired metric.
-ValueError – if the provided metric is not known
-callable metric function
-Initialize metrics.
-