Lint with Ruff and add other pre-commit hooks (#98)
* Migrate to linting with Ruff

* Bump changelog

* Fix error from pandas update

* Bump changelog

* Remove debugging print statements

* Fix lurking bug and improve test coverage

* Bump changelog

* Make suggested edits

* Catch prints with Ruff

* Use extend-exclude instead of exclude for Ruff
wfondrie authored Apr 10, 2023
1 parent ccafcf2 commit da2d545
Showing 23 changed files with 146 additions and 92 deletions.
15 changes: 8 additions & 7 deletions .github/workflows/black.yml → .github/workflows/lint.yml
Original file line number Diff line number Diff line change
@@ -11,18 +11,19 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Python 3
- name: Setup Python 3.10
uses: actions/setup-python@v4
with:
python-version: "3.10"

- name: Install Ruff
run: |
python -m pip install --upgrade pip
pip install ruff
- name: Run black
uses: psf/black@stable

- name: Check for debugging print statements
- name: Lint with Ruff
run: |
if grep -rq "print(" mokapot; then
echo "Found the following print statements:"
grep -r "print(" mokapot
exit 1
fi
ruff check . --format=github
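
The workflow above drops the `grep`-based check for debugging print statements in favor of Ruff. The Ruff configuration is not part of this excerpt, so the exact rule selection is an assumption, but Ruff's flake8-print rules (`T20`) are the usual way to do this. The sketch below is not Ruff's implementation. It only illustrates, with the standard-library `ast` module, why a syntax-aware check is more precise than `grep "print("`, which also matches strings and comments. The helper name `find_print_calls` is hypothetical.

```python
import ast


def find_print_calls(source: str) -> list[int]:
    """Return the line numbers of bare ``print(...)`` calls in ``source``."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


# Line 2 only mentions "print(" inside a string; line 3 is a real call.
example = '''
msg = "call print( later"  # a string, not a call
print(msg)
'''

print_lines = find_print_calls(example)
```

Here `print_lines` contains only line 3, whereas a plain text search for `print(` would also flag line 2.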
23 changes: 18 additions & 5 deletions .pre-commit-config.yaml
@@ -1,6 +1,19 @@
repos:
- repo: https://github.com/psf/black
rev: 23.1.0 # Replace by any tag/version: https://github.com/psf/black/tags
hooks:
- id: black
language_version: python3 # Should be a command that runs python3.6+
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-toml
- id: check-yaml
- id: end-of-file-fixer
- id: trailing-whitespace
- id: detect-private-key
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.0.261
hooks:
- id: ruff
args: ['--fix']
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
language_version: python3.10
73 changes: 40 additions & 33 deletions CHANGELOG.md
@@ -1,10 +1,17 @@
# Changelog for mokapot
# Changelog for mokapot

## [Unreleased]
### Breaking changes
- Mokapot now uses `numpy.random.Generator` instead of the deprecated `numpy.random.RandomState` API.
- Mokapot now uses `numpy.random.Generator` instead of the deprecated `numpy.random.RandomState` API.
New `rng` arguments have been added to functions and classes that rely on randomness in lieu of setting a global random seed with `np.random.seed()`. Thanks @sjust-seerbio!

### Changed
- Added linting with Ruff to tests and pre-commit hooks (along with others)!

### Fixed
- The PepXML reader, which broke due to a Pandas update.
- Potential bug if lowercase peptide sequences were used and protein-level confidence estimates were enabled

## [0.9.1] - 2022-12-14
### Changed
- Cross-validation classes are now detected by looking for inheritance from the `sklearn.model_selection._search.BaseSearchCV` class.
@@ -30,8 +37,8 @@

## [0.8.2] - 2022-07-18
### Added
- `mokapot.Model()` objects now recored the CV fold that they were fit on.
This means that they can be provided to `mokapot.brew()` in any order
- `mokapot.Model()` objects now record the CV fold that they were fit on.
This means that they can be provided to `mokapot.brew()` in any order
and still maintain proper cross-validation bins.

### Fixed
@@ -41,7 +48,7 @@
## [0.8.1] - 2022-06-24

### Added
- Support for previously trained models in the `brew()` function and the CLI
- Support for previously trained models in the `brew()` function and the CLI
using the `--load_models` argument. Thanks @sambenfredj!

### Fixed
@@ -51,18 +58,18 @@

## [0.8.0] - 2022-03-11

Thanks to @sambenfredj, @gessulat, @tkschmidt, and @MatthewThe for
Thanks to @sambenfredj, @gessulat, @tkschmidt, and @MatthewThe for
PR #44, which made these things happen!

### Added
- A new command line argument, `--max_workers`. This allows the
cross-validation folds to be computed in parallel.
- The `PercolatorModel` class now has an `n_jobs` parameter, which
- The `PercolatorModel` class now has an `n_jobs` parameter, which
controls parallelization of the grid search.

### Changes
- Improved speed by using multiple jobs for grid search by default.
- Parallelization within `mokapot.brew()` now uses `joblib`
- Parallelization within `mokapot.brew()` now uses `joblib`
instead of `concurrent.futures`.

## [0.7.4] - 2021-09-03
@@ -75,37 +82,37 @@ PR #44, which made these things happen!
- Fixed bug where the `--keep_decoys` did not work with `--aggregate`. Also,
added tests to cover this. Thanks @jspaezp!

## [0.7.2] - 2021-07-16
### Added
## [0.7.2] - 2021-07-16
### Added
- `--keep_decoys` option to the command line interface. Thanks @jspaezp!
- Notes about setting a random seed to the Python API documentation. (Issue #30)
- Added more information about peptides that couldn't be mapped to proteins. (Issue #29)
- Added more information about peptides that couldn't be mapped to proteins. (Issue #29)

### Fixed
### Fixed
- Loading a saved model with `mokapot.load_model()` would fail because of an
update to Pandas that introduced a new exception. We've updated mokapot
update to Pandas that introduced a new exception. We've updated mokapot
accordingly.

### Changed
### Changed
- Updates to unit tests. Warnings are now treated as errors for system tests.

## [0.7.1] - 2021-03-22
### Changed
## [0.7.1] - 2021-03-22
### Changed
- Updated the build to align with
[PEP517](https://www.python.org/dev/peps/pep-0517/)

## [0.7.0] - 2021-03-19
### Added
## [0.7.0] - 2021-03-19
### Added
- Support for downstream peptide and protein quantitation with
[FlashLFQ](https://github.com/smith-chem-wisc/FlashLFQ). This is accomplished
through the `mokapot.to_flashlfq()` function or the `to_flashlfq()` method of
`LinearConfidence` objects. Note that to support the FlashLFQ format, you'll
need to specify additional columns in `read_pin()` or use a PepXML input file
(`read_pepxml()`).
(`read_pepxml()`).
- Added a top-level function for exporting confident PSMs, peptides, and
proteins from one or more `LinearConfidence` objects as a tab-delimited file:
`mokapot.to_txt()`.
- Added a top-level function for reading FASTA files for protein-level
- Added a top-level function for reading FASTA files for protein-level
confidence estimates: `mokapot.read_fasta()`.
- Tests accompanying the support for the features above.
- Added a "mokapot cookbook" to the documentation with helpful code snippets.
@@ -120,39 +127,39 @@ PR #44, which made these things happen!
`importlib.metadata` to the standard library, saving a few hundred
milliseconds.

## [0.6.2] - 2021-03-12
## [0.6.2] - 2021-03-12
### Added
- Now checks to verify there are no debugging print statements in the code
base when linting.

### Fixed
### Fixed
- Removed debugging print statements.

## [0.6.1] - 2021-03-11
### Fixed
- Parsing Percolator tab-delimited files with a "DefaultDirection" line.
- `Label` column is now converted to boolean during PIN file parsing.
- `Label` column is now converted to boolean during PIN file parsing.
Previously, problems occurred if the `Label` column was of dtype `object`.
- Parsing modifications from pepXML files were indexed incorrectly on the
peptide string.

## [0.6.0] - 2021-03-03
### Added
## [0.6.0] - 2021-03-03
### Added
- Support for parsing PSMs from PepXML input files.
- This changelog.

### Fixed
- Parsing a FASTA file previously failed if an entry was not followed by a
sequence. Now, missing sequences are tolerated and a warning is given instead.
### Fixed
- Parsing a FASTA file previously failed if an entry was not followed by a
sequence. Now, missing sequences are tolerated and a warning is given instead.
- When the learned model was worse than the best feature and the lower scores
were better for the best feature, assigning confidence would fail.
were better for the best feature, assigning confidence would fail.
- Easy access to grouped confidence estimates in the Python API were not working
due to a typo.
- Deprecation warnings from Pandas about the `regex` argument.
due to a typo.
- Deprecation warnings from Pandas about the `regex` argument.
- Sometimes peptides were removed as shared incorrectly when part of a protein
group.
group.

### Changed
### Changed
- Refactored and added many new unit and system tests.
- New pull-requests must now improve or maintain test coverage.
- Improved error messages.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -27,7 +27,7 @@
copyright = "2022, William E. Fondrie"
author = "William E. Fondrie"

import mokapot
import mokapot # noqa: E402

version = str(mokapot.__version__)
release = version
4 changes: 1 addition & 3 deletions mokapot/confidence.py
@@ -17,9 +17,7 @@
"""
import copy
import logging
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from triqler import qvality
@@ -651,7 +649,7 @@ def plot_qvalues(qvalues, threshold=0.1, ax=None, **kwargs):

ax.set_xlim(0 - xmargin, threshold + xmargin)
ax.set_xlabel("q-value")
ax.set_ylabel(f"Discoveries")
ax.set_ylabel("Discoveries")

ax.step(qvals["qvalue"].values, qvals.num.values, where="post", **kwargs)

4 changes: 2 additions & 2 deletions mokapot/config.py
@@ -12,8 +12,8 @@ class MokapotHelpFormatter(argparse.HelpFormatter):
"""Format help text to keep newlines and whitespace"""

def _fill_text(self, text, width, indent):
text_list = text.splitlines(keepends=True)
return "\n".join(_process_line(l, width, indent) for l in text_list)
lines = text.splitlines(keepends=True)
return "\n".join(_process_line(txt, width, indent) for txt in lines)


class Config:
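
The `_fill_text` override above keeps newlines in help text by processing the text line by line; the rename from `l` to `txt` only avoids the ambiguous single-letter name that linters flag. The body of `_process_line` is not shown in this diff, so the sketch below substitutes `textwrap.fill` for it, which is an assumption, and uses a hypothetical class name. Like the original, it relies on the private `_fill_text` hook of `argparse.HelpFormatter`.

```python
import argparse
import textwrap


class KeepNewlinesFormatter(argparse.HelpFormatter):
    """Hypothetical formatter: wrap each line separately so newlines survive."""

    def _fill_text(self, text, width, indent):
        # textwrap.fill stands in for mokapot's _process_line (not shown here).
        return "\n".join(
            textwrap.fill(
                line, width, initial_indent=indent, subsequent_indent=indent
            )
            for line in text.splitlines()
        )


parser = argparse.ArgumentParser(
    prog="demo",
    description="first line\nsecond line",
    formatter_class=KeepNewlinesFormatter,
)
help_text = parser.format_help()
```

With the default formatter, the two description lines would be joined into one paragraph; with this subclass, `help_text` keeps them on separate lines.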
1 change: 0 additions & 1 deletion mokapot/model.py
@@ -16,7 +16,6 @@
import copy
import logging
import pickle
import warnings

import numpy as np
import pandas as pd
2 changes: 1 addition & 1 deletion mokapot/mokapot.py
@@ -123,7 +123,7 @@ def main():
model = list(plugin_models.values())[0]

if model is None:
logging.debug(f"Loading Percolator model.")
logging.debug("Loading Percolator model.")
model = PercolatorModel(
train_fdr=config.train_fdr,
max_iter=config.max_iter,
2 changes: 2 additions & 0 deletions mokapot/parsers/pepxml.py
@@ -332,6 +332,8 @@ def _log_features(col, features):
"""
if col.name not in features:
return col
elif col.dtype == "bool":
return col.astype(float)

col = col.astype(str).str.lower()

3 changes: 1 addition & 2 deletions mokapot/parsers/pin.py
@@ -4,7 +4,6 @@
import gzip
import logging

import numpy as np
import pandas as pd

from .. import utils
@@ -234,7 +233,7 @@ def _parse_in_chunks(file_obj, columns, chunk_size=int(1e8)):
if not psms:
break

psms = [l.rstrip().split("\t", len(columns) - 1) for l in psms]
psms = [p.rstrip().split("\t", len(columns) - 1) for p in psms]
psms = pd.DataFrame.from_records(psms, columns=columns)
yield psms.apply(pd.to_numeric, errors="ignore")

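
The list comprehension in `_parse_in_chunks` splits each line on tabs with a `maxsplit` of `len(columns) - 1`, so any extra tabs stay inside the final field. A minimal sketch of that behavior follows; the column names and the sample line are made up for illustration, on the assumption that the last column of a PIN file may itself hold several tab-separated protein identifiers.

```python
# Hypothetical header and PSM line; only the split logic mirrors the diff.
columns = ["SpecId", "Label", "Peptide", "Proteins"]
line = "psm_1\t1\tK.LESLIEK.A\tprotA\tprotB\n"

# Split at most len(columns) - 1 times so the row has exactly one
# field per column; tabs after the last split are left untouched.
fields = line.rstrip().split("\t", len(columns) - 1)
record = dict(zip(columns, fields))
```

Here `fields` has four entries and `record["Proteins"]` is `"protA\tprotB"`, so every row lines up with the header no matter how many proteins a PSM maps to.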
4 changes: 0 additions & 4 deletions mokapot/peptides.py
@@ -3,10 +3,6 @@
"""
from collections import defaultdict

import numpy as np
import pandas as pd
import numba as nb


def match_decoy(decoys, targets, ignore_mods=True):
"""Find a corresponding target for each decoy.
53 changes: 37 additions & 16 deletions mokapot/picked_protein.py
@@ -3,7 +3,6 @@
confidence estimates.
"""
import logging
import numpy as np
import pandas as pd

from .peptides import match_decoy
@@ -42,21 +41,7 @@ def picked_protein(
columns={peptide_column: "best peptide"}
)

# Strip modifications and flanking AA's from peptide sequences.
prots["stripped sequence"] = (
prots["best peptide"]
.str.replace(r"[\[\(].*?[\]\)]", "", regex=True)
.str.replace(r"^.*?\.", "", regex=True)
.str.replace(r"\..*?$", "", regex=True)
)

# Sometimes folks use lowercase letters for the termini or mods:
if all(prots["stripped sequence"].str.islower()):
seqs = prots["stripped sequence"].upper()
else:
seqs = prots["stripped sequence"].str.replace(r"[a-z]", "", regex=True)

prots["stripped sequence"] = seqs
prots["stripped sequence"] = strip_peptides(prots["best peptide"])

# There are two cases we need to deal with:
# 1. The fasta contained both targets and decoys (ideal)
@@ -131,6 +116,42 @@ def picked_protein(
return prots.loc[prot_idx, final_cols]


def strip_peptides(sequences):
"""Strip modifications and flanking AA's from peptide sequences.
Parameters
----------
sequences : pandas.Series
The peptide sequences.
Returns
-------
pandas.Series
The stripped peptide sequences.
Example
-------
>>> pep = pd.Series(["A.LES[+79.]LIEK.A"])
>>> strip_peptides(pep)
0 LESLIEK
dtype: object
"""
# Strip modifications and flanking AA's from peptide sequences.
sequences = (
sequences.str.replace(r"[\[\(].*?[\]\)]", "", regex=True)
.str.replace(r"^.*?\.", "", regex=True)
.str.replace(r"\..*?$", "", regex=True)
)

# Sometimes folks use lowercase letters for the termini or mods:
if all(sequences.str.islower()):
sequences = sequences.str.upper()
else:
sequences = sequences.str.replace(r"[a-z]", "", regex=True)

return sequences


def group_with_decoys(peptides, proteins):
"""Retrieve the protein group in the case where the FASTA has decoys.
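
The new `strip_peptides()` helper moves the stripping logic out of `picked_protein()` and replaces the removed `prots["stripped sequence"].upper()` call with `sequences.str.upper()`. A pandas `Series` has no `upper` method of its own, which appears to be the "lurking bug" for all-lowercase input mentioned in the commit message. The sketch below reproduces the same regular expressions on a plain list of strings with the standard-library `re` module, so it runs without pandas. The function name `strip_peptides_plain` is hypothetical and it is not part of mokapot.

```python
import re


def strip_peptides_plain(sequences):
    """Pure-Python sketch of the stripping logic for a list of strings."""
    stripped = []
    for seq in sequences:
        seq = re.sub(r"[\[\(].*?[\]\)]", "", seq)  # drop [..] and (..) mods
        seq = re.sub(r"^.*?\.", "", seq)  # drop the leading flanking residue
        seq = re.sub(r"\..*?$", "", seq)  # drop the trailing flanking residue
        stripped.append(seq)

    # If every sequence is lowercase, treat lowercase as the residues.
    if all(seq.islower() for seq in stripped):
        return [seq.upper() for seq in stripped]

    # Otherwise lowercase letters mark termini or mods and are removed.
    return [re.sub(r"[a-z]", "", seq) for seq in stripped]


flanked = strip_peptides_plain(["A.LES[+79.]LIEK.A"])
lowercase = strip_peptides_plain(["k.lesliek.a"])
mixed = strip_peptides_plain(["n[+42]LESLIEK"])
```

All three calls return `["LESLIEK"]`: the first strips a bracketed modification and flanking residues, the second exercises the all-lowercase branch, and the third removes a lowercase terminus marker.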
1 change: 0 additions & 1 deletion mokapot/qvalues.py
@@ -2,7 +2,6 @@
This module estimates q-values.
"""
import numpy as np
import pandas as pd
import numba as nb

