Feature: DLomix integration #250

Merged 172 commits on Sep 17, 2024
Changes from 171 commits
Commits
afac1e9
Store working state to squash later
JSchlensok Jun 20, 2024
9958289
feat: local intensity prediction using DLomix
JSchlensok Jul 8, 2024
f5f044b
Merge remote-tracking branch 'origin/development' into feature/dlomix…
JSchlensok Jul 8, 2024
0f9c8fe
chore: fix typo in docstrings
JSchlensok Jul 8, 2024
ceb2ed9
chore: Clean up a bit
JSchlensok Jul 8, 2024
2f5ce34
fix: add missing annotation array in DLomix.predict
JSchlensok Jul 8, 2024
0b08966
feat: parametrize DLomix inference batch size
JSchlensok Jul 8, 2024
cf4aae8
chore: formatting
JSchlensok Jul 8, 2024
c40d7f6
feat: parametrize DLomix inference batch size
JSchlensok Jul 8, 2024
67d4200
feat: Implement local intensity prediction via DLomix
JSchlensok Jul 26, 2024
689268a
Merge remote-tracking branch 'origin/development' into feature/dlomix…
JSchlensok Jul 26, 2024
e487c3c
chore(pre-commit): keep runtime typing
JSchlensok Aug 5, 2024
68ab9ca
chore: update dependencies
JSchlensok Aug 5, 2024
5a8f3fd
feat(data): additional preprocessing of spectra
JSchlensok Aug 5, 2024
9d51ff5
chore: more consistent spelling & formatting
JSchlensok Aug 5, 2024
deec9cd
feat: refinement learning of intensity predictor
JSchlensok Aug 5, 2024
4e90290
chore: expose DLomix & Koina interfaces through public prediction API
JSchlensok Aug 5, 2024
613238e
chore: Some housekeeping for mypy & flake8
JSchlensok Aug 5, 2024
49f2d0d
chore: encrypt Koina connection by default
JSchlensok Aug 5, 2024
264fe3d
refactor(config): tidy up config validation
JSchlensok Aug 5, 2024
f0fda1a
feat(config): validate config for local prediction/refinement learning
JSchlensok Aug 5, 2024
0f35f6f
chore: add autosectionlabels to sphinx
JSchlensok Aug 6, 2024
ba720e2
docs: Include local prediction/refinement learning
JSchlensok Aug 6, 2024
db5b149
refactor(tests): Enforce test suite naming convention
JSchlensok Aug 6, 2024
e085829
test(config): Add tests for verification of optional dependencies
JSchlensok Aug 6, 2024
be4ba6d
test(predict): Add stub test cases for local prediction & refinement …
JSchlensok Aug 6, 2024
5d22df4
Merge remote-tracking branch 'origin/development' into feature/dlomix…
JSchlensok Aug 6, 2024
dcec9de
chore: Resolve spectrum_io-side TODO
JSchlensok Aug 6, 2024
00df47a
chore: add ProcessStep for refinement learning
JSchlensok Aug 6, 2024
10ef46f
fix(ML dataset processing): column name handling
JSchlensok Aug 6, 2024
3b53a62
fix(DLomix interface): correctly handle model path
JSchlensok Aug 6, 2024
bddb31e
perf(DLomix): reduce batch size to not blow up GPU
JSchlensok Aug 6, 2024
12f2eee
fix(DLomix): Pin DLomix dependency
JSchlensok Aug 6, 2024
66f4177
fix(DLomix data preprocessing): ensure column name consistency
JSchlensok Aug 6, 2024
4258ab9
fix(config): actually check config
JSchlensok Aug 6, 2024
b1250b7
chore: remove outdated TODO
JSchlensok Aug 6, 2024
b697be8
refactor(config): handle baseline model download more gracefully
JSchlensok Aug 6, 2024
46efe67
build: Specify DLomix as extra instead of optional group
JSchlensok Aug 7, 2024
c6e5f3e
fix: Manually install DLomix in Nox session
JSchlensok Aug 7, 2024
8f03d64
refactor(test): use proper tempfile for garbage config
JSchlensok Aug 7, 2024
8697ca2
Merge remote-tracking branch 'origin/development' into feature/dlomix…
JSchlensok Aug 7, 2024
a5f112a
fix: double dependency from merge mess-up
JSchlensok Aug 7, 2024
1592421
feat(dlomix): separate data & logging directories
JSchlensok Aug 7, 2024
cc85cfd
fix: ETD fragmentation encoding not yet in spectrum_fundamentals
JSchlensok Aug 7, 2024
bf74a34
style: typo
JSchlensok Aug 7, 2024
d1a8e9f
style(data): return type annotations for inplace methods
JSchlensok Aug 7, 2024
57e53e6
fix(data): replace non-abbreviated fragmentation method names
JSchlensok Aug 7, 2024
6dc7323
feat: completely mute TensorFlow output on import
JSchlensok Aug 7, 2024
2a3c086
fix(tests): rename broken test
JSchlensok Aug 7, 2024
09d7298
fix(tests): add missing import
JSchlensok Aug 7, 2024
b8380ee
refactor(tests): switch to class-based fixtures
JSchlensok Aug 7, 2024
20afd9a
fix(tests): manually remove DLomix for optional dependency tests
JSchlensok Aug 7, 2024
58a071a
test: remove obsolete WandB dependency test
JSchlensok Aug 7, 2024
2220d7f
refactor(config): remove obsolete check for WandB installation
JSchlensok Aug 7, 2024
875c649
refactor(dlomix): enforce consistent path of downloaded baseline model
JSchlensok Aug 7, 2024
6b017b1
feat(dlomix): infer model type from name
JSchlensok Aug 7, 2024
b11afd8
style: pre-commit
JSchlensok Aug 7, 2024
eaf3bb3
refactor(runner): remove degenerate kwargs dict
JSchlensok Aug 7, 2024
c27a2f9
refactor(dlomix): streamline local model checking
JSchlensok Aug 7, 2024
27220c6
refactor(predictor): correct type annotations
JSchlensok Aug 7, 2024
007e0cd
refactor(predictor): remove unused prediction method
JSchlensok Aug 7, 2024
433fe84
docs: add missing return type
JSchlensok Aug 7, 2024
98b716f
refactor(data): remove unused return value
JSchlensok Aug 7, 2024
f1f29b6
fix: typos
JSchlensok Aug 7, 2024
ccaea78
docs: typo
JSchlensok Aug 7, 2024
b414f14
fix: more typos
JSchlensok Aug 7, 2024
e82901f
fix: typos galore (need coffee)
JSchlensok Aug 7, 2024
82f71a7
chore: remove dangling TODO
JSchlensok Aug 7, 2024
1f6ccd0
fix: properly pass kwargs to dlomix/koina
JSchlensok Aug 7, 2024
5c0860a
fix: typo
JSchlensok Aug 7, 2024
f7ee4b8
fix: typo
JSchlensok Aug 7, 2024
a2a4d27
chore: Update to spectrum_fundamentals 0.6.1
JSchlensok Aug 8, 2024
4a77a53
chore: more robust type checking
JSchlensok Aug 8, 2024
bb74487
refactor(data): straighten out _gen_vars_df
JSchlensok Aug 8, 2024
82a0d1d
chore (dlomix): Depend on spectrum_fundamentals.constants instead of …
JSchlensok Aug 8, 2024
76c3b90
chore: exclude TYPE_CHECKING blocks from coverage statistics
JSchlensok Aug 8, 2024
b75faa5
feat(dlomix): Improve fragment ion annotation handling
JSchlensok Aug 8, 2024
118cdfa
chore: manually install unreleased spectrum-fundamentals for testing …
JSchlensok Aug 8, 2024
490861d
chore: update packages, fix requirements for pip-based install
JSchlensok Aug 9, 2024
c574422
fix(data): sanitize fragmentation method keys in preprocessing instea…
JSchlensok Aug 9, 2024
e82fc95
fix(data): infer dtype correctly when generating var_df
JSchlensok Aug 9, 2024
07c927d
fix(data): properly handle intensity dataframe to nested array conver…
JSchlensok Aug 9, 2024
80a9d40
tests(spectra): fix broken spectra tests
JSchlensok Aug 9, 2024
6c4be71
fix(preprocessing): typo
JSchlensok Aug 9, 2024
74a7585
fixed custom mods tokens
Aug 9, 2024
dc611a5
chore: update requirements to support z● ions
JSchlensok Aug 9, 2024
761e456
Merge remote-tracking branch 'origin/feature/dlomix-integration' into…
JSchlensok Aug 9, 2024
c4d71ef
style: formatting
JSchlensok Aug 9, 2024
0936121
feat(data): support z● ions
JSchlensok Aug 9, 2024
f6f1be6
chore: remove outdated parameters
JSchlensok Aug 9, 2024
3b5d28f
fix(preprocessing): standardize fragmentation method names in all run…
JSchlensok Aug 9, 2024
59c1fed
fix(dlomix): properly tile annotations of prediction
JSchlensok Aug 9, 2024
1c41922
Include new DLomix changes
JSchlensok Aug 9, 2024
0b6ac74
fix(dlomix): include z● ions in ion type ordering
JSchlensok Aug 9, 2024
9c949e9
chore: switch to revised spectrum_fundamentals fragment ion annotations
JSchlensok Aug 10, 2024
fa46da3
feat(dlomix): parametrize improve_further
JSchlensok Aug 12, 2024
12f5fc7
style: spelling
JSchlensok Aug 12, 2024
b51d0c5
refactor(dlomix): move standard out muting to utils
JSchlensok Aug 12, 2024
e0c6ce7
style(dlomix): spelling
JSchlensok Aug 12, 2024
6092c38
refactor(dlomix): cleanup
JSchlensok Aug 12, 2024
7872ea5
chore: switch to spectrum_fundamentals dev branch
JSchlensok Aug 12, 2024
79c632a
fix(dlomix): typo
JSchlensok Aug 12, 2024
1a4e1b9
style(dlomix): clearer variable naming
JSchlensok Aug 12, 2024
506591d
style(alignment): fix misleading docstring
JSchlensok Aug 12, 2024
5d0ed8b
fix(alignment): handle spectral libraries with <1000 matching spectra…
JSchlensok Aug 12, 2024
237f84c
fix(preprocessing): remove redundant ion type annotation
JSchlensok Aug 12, 2024
ea04152
fix(spectra): outdated constant reference
JSchlensok Aug 12, 2024
feb6a5b
style: pre-commit
JSchlensok Aug 12, 2024
ce635bd
fix(noxfile): Install correct spectrum_fundamentals branch for testing
JSchlensok Aug 12, 2024
a011be2
fix: typos
JSchlensok Aug 13, 2024
6b2cf63
refactor(alignment): adopt alignment df with <1000 spectra from https…
JSchlensok Aug 13, 2024
0e7158d
fix: iron out inconsistencies
JSchlensok Aug 13, 2024
399b2eb
Merge branch 'development' into feature/dlomix-integration
picciama Aug 13, 2024
16f8c03
updated fundamentals dep
picciama Aug 13, 2024
e9826db
fix(dlomix): Column name case in refinement training dataset
JSchlensok Aug 13, 2024
13eef9f
docs(dlomix): update config parameters
JSchlensok Aug 13, 2024
3ac452a
docs: fix config table indentation
JSchlensok Aug 13, 2024
26bf5e8
chore: remove dangling TODO
JSchlensok Aug 13, 2024
ffd0696
tests: install git dependencies for typeguard session
JSchlensok Aug 13, 2024
a41cae6
style: formatting
JSchlensok Aug 13, 2024
b17898b
tests(data): add tests for additional ion types
JSchlensok Aug 13, 2024
1fa5e6b
tests: add unfinished tests for alignment & prediction
JSchlensok Aug 13, 2024
c569834
Merge remote-tracking branch 'origin/feature/dlomix-integration' into…
JSchlensok Aug 13, 2024
bd7bdcd
style: formatting
JSchlensok Aug 13, 2024
8333b0b
chore: remove outdated spectrum_fundamentals git dependency
JSchlensok Aug 13, 2024
24f1aac
added koinapy and extend superclass
picciama Aug 16, 2024
1d84260
Merge remote-tracking branch 'origin/development' into feature/dlomix…
JSchlensok Aug 20, 2024
a75979e
feat(dlomix): include original & modified sequence in refinement dataset
JSchlensok Aug 23, 2024
d46901a
feat(dlomix): skip CE calibration when refinement learning
JSchlensok Aug 23, 2024
440e261
chore: update dependencies
JSchlensok Aug 23, 2024
9d796b0
Revert "feat(dlomix): include original & modified sequence in refinem…
JSchlensok Aug 23, 2024
525fdbf
feat(dlomix): pass raw modified sequence to DLomix for downstream ana…
JSchlensok Aug 23, 2024
09e2462
fix(dlomix): Keep decoys in inference data
JSchlensok Aug 24, 2024
4c93b94
fix(predict): don't predict iRT for citrullination
JSchlensok Aug 27, 2024
2b13564
chore: set spectrum-io dependency to hotfix
JSchlensok Aug 27, 2024
b6f149b
fix: plot_pred_rt_vs_irt failed when having a perfect prediction (art…
juli-p Aug 29, 2024
26d2d2b
style: generalize search engine score threshold variable naming
JSchlensok Sep 11, 2024
aeebef2
chore: upgrade from 3.8 to 3.9 in overlooked spots
JSchlensok Sep 11, 2024
3aef3e6
style: remove dangling commented-out code
JSchlensok Sep 11, 2024
659b488
style: correct grammar
JSchlensok Sep 11, 2024
b9e2ffc
docs: clear up phrasing
JSchlensok Sep 11, 2024
7621db2
docs: explicitly add default parameters to rescoring config example
JSchlensok Sep 11, 2024
8251b43
chore: formatting
JSchlensok Sep 11, 2024
0364c84
refactor: separate batch size between speclib generation & DLomix inf…
JSchlensok Sep 11, 2024
8fb2b0e
fix: typo
JSchlensok Sep 11, 2024
c837c12
tests: remove non-existing model path
JSchlensok Sep 11, 2024
c2aeff3
feat(spectra): implement duplicate filtering
JSchlensok Sep 11, 2024
563b1fc
refactor: remove unnecessary stardardization of fragmentation method …
JSchlensok Sep 11, 2024
814a5fc
fix(predict): make predictor implementations take arbitrary kwargs
JSchlensok Sep 11, 2024
b8ecf3f
fix(predict): only import dlomix module if dlomix is installed
JSchlensok Sep 11, 2024
ebdb12f
chore: fix pre-commit complaints
JSchlensok Sep 11, 2024
76ff7f9
chore: pyupgrade 3.8->3.9
JSchlensok Sep 11, 2024
1d87b0f
Merge branch 'development' into feature/dlomix-integration
picciama Sep 12, 2024
8790992
Merge branch 'feature/dlomix-integration' into chore/switch_to_koinapy
picciama Sep 12, 2024
746eeb4
fixed shape issue when transforming to dict
picciama Sep 12, 2024
e0b4a87
fix: remove obsolete kwarg for Koina
JSchlensok Sep 13, 2024
20af248
fix(dlomix): clean up arbitrary kwarg passing to predictor interface …
JSchlensok Sep 13, 2024
b655dd4
Merge pull request #257 from wilhelm-lab/chore/switch_to_koinapy
JSchlensok Sep 13, 2024
8585705
refactor(dlomix): generate zero iRT predictions through predictor int…
JSchlensok Sep 13, 2024
eb15144
style: ignore complexity score of methods in runner
JSchlensok Sep 13, 2024
09e498a
style: formatting
JSchlensok Sep 13, 2024
0320f10
style: pre-commit
JSchlensok Sep 13, 2024
9222673
tests: comment out unfinished tests
JSchlensok Sep 13, 2024
9484cbe
tests: fix data type
JSchlensok Sep 13, 2024
151f657
tests: fix method call for alphapept
JSchlensok Sep 13, 2024
1af6a94
style: formatting
JSchlensok Sep 13, 2024
4656588
dix speclib: don't pickle global predictor object
picciama Sep 13, 2024
21f4b6a
don't create explicit cast copy
picciama Sep 13, 2024
8d3d6d1
fix xdoctest
picciama Sep 13, 2024
9c24d1c
fix typeguard: use df instead of Spectra object
picciama Sep 13, 2024
3490afa
fix xdoctest
picciama Sep 13, 2024
9be535c
readded Spectra instead of df + added dlomix check
picciama Sep 13, 2024
3 changes: 2 additions & 1 deletion .flake8
@@ -8,5 +8,6 @@ per-file-ignores =
tests/*:S101
**/__init__.py:F401,F403
docs/conf.py:S404,S607,S603
oktoberfest/runner.py:S301,S403
oktoberfest/runner.py:C901,S301,S403
oktoberfest/predict/dlomix.py:E402
docstring_style = sphinx
3 changes: 3 additions & 0 deletions .gitignore
@@ -151,5 +151,8 @@ tutorials/
# example data
data/

# Machine learning artifacts
wandb/

# doctest IO files
tests/doctests/
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -40,7 +40,7 @@ repos:
entry: pyupgrade
language: system
types: [python]
args: [--py38-plus]
args: [--py39-plus, --keep-runtime-typing]
- id: trailing-whitespace
name: Trim Trailing Whitespace
entry: trailing-whitespace-fixer
35 changes: 24 additions & 11 deletions docs/API.rst
@@ -20,6 +20,7 @@ Preprocessing: :code:`pp`
.. currentmodule:: oktoberfest

Generating libraries
~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/pp
@@ -31,6 +32,7 @@ Generating libraries
pp.annotate_spectral_library

Spectra preprocessing
~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/pp
@@ -42,6 +44,7 @@ Spectra preprocessing


Peptide preprocessing
~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/pp
@@ -57,33 +60,43 @@ Peptide preprocessing

Predicting: :code:`pr`
----------------------
.. TODO
add full class documentation through autosummary

.. module:: oktoberfest.pr

.. currentmodule:: oktoberfest

Access to functions that communicate with a Koina server to retrieve predictions from various prediction models.
Access to functions that interface either a Koina server to retrieve predictions from various prediction models, or DLomix to serve & refinement-learn pre-trained models locally.

High level features
~~~~~~~~~~~~~~~~~~~
High-level prediction runner
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/pr
:recursive:
:toctree: api/pr

pr.predict_intensities
pr.predict_rt
pr.ce_calibration
pr.Predictor

Koina interface
~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api/pr
:recursive:
:toctree: api/pr

pr.predict
pr.predict_at_once
pr.predict_in_chunks
pr.Koina

DLomix interface
~~~~~~~~~~~~~~~~

.. autosummary::
:recursive:
:toctree: api/pr

pr.DLomix
pr.create_dlomix_dataset
pr.refine_intensity_predictor

Rescoring: :code:`re`
---------------------
19 changes: 19 additions & 0 deletions docs/_static/custom_cookietemple.css
@@ -75,6 +75,25 @@ table.align-default {
padding-left: 50px;
}

.lib-refinement-learning-config-table
tbody
tr:nth-child(n + 2):nth-child(-n + 5)
td:nth-child(1),
.lib-refinement-learning-config-table tbody tr:nth-child(8) td:nth-child(1) {
padding-left: 50px;
}

.lib-refinement-learning-config-table
tbody
tr:nth-child(n + 6):nth-child(-n + 7)
td:nth-child(1),
.lib-refinement-learning-config-table
tbody
tr:nth-child(n + 9):nth-child(-n + 10)
td:nth-child(1) {
padding-left: 100px;
}

.rescore-config-table tbody tr:last-child td:first-child {
padding-left: 50px;
}
4 changes: 4 additions & 0 deletions docs/conf.py
@@ -33,6 +33,7 @@
"sphinx_autodoc_typehints",
"sphinx.ext.intersphinx",
"sphinx_click",
"sphinx.ext.autosectionlabel",
]

# Add any paths that contain templates here, relative to this directory.
@@ -250,3 +251,6 @@ def modurl(qualname):
# and there’s no way to insert filters into those templates
# so we have to modify the default filters
DEFAULT_FILTERS["modurl"] = modurl

# -- Options for autosectionlabel mappings -----------------------------
autosectionlabel_prefix_document = True
48 changes: 45 additions & 3 deletions docs/config.rst
@@ -1,7 +1,7 @@
Configuration
=============

The following provides an overview of all available flags in the configuration file to use the high level API and run jobs. Parameters may be applicable to more than one job type and are collected within indivdual tables.
The following provides an overview of all available flags in the configuration file to use the high-level API and run jobs. Parameters may be applicable to more than one job type and are collected within indivdual tables.

Always applicable
-----------------
@@ -18,7 +18,7 @@ Always applicable
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| models | Contains information about the used models for peptide property prediction (see following 2 nested parameters) |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| intensity | Name of the model used for fragment intensity prediction |
| intensity | Name or path of the model used for fragment intensity prediction |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| irt | Name of the model used for indexed retention time prediction |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -145,4 +145,46 @@ Applicable to in-silico digestion
| specialAas | Special amino acids for decoy generation; default = "KR" |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| db | Defines whether the digestion should contain only targets, only decoys or both (concatenated); can be "target", "decoy" or "concat"; default = "concat" |
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Applicable to local intensity prediction
----------------------------------------

.. table::
:class: fixed-table

+--------------------------+---------------------------------------------------+
| Parameter | Description |
+==========================+===================================================+
| dlomixInferenceBatchSize | Batch size to use for local inference with DLomix |
+--------------------------+---------------------------------------------------+

Applicable to transfer/refinement learning
------------------------------------------

.. table::
:class: fixed-table lib-refinement-learning-config-table

+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parameter | Description |
+====================================+====================================================================================================================================================================+
| refinementLearningOptions | Contains specific settings for local refinement learning of intensity predictor on provided spectra. If not present, no refinement learning will be performed. |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batchSize | Defines batch size to use for training; default = 1024 |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| includeOriginalSequences | Defines whether unmodified peptide sequences should be kept in processed DLomix dataset for downstream analysis; default = False |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| improveFurther | Defines whether to perform an additional third training phase during refinement learning to further improve the predictor; default = False. |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| wandbOptions | Contains specific settings for using WandB when doing refinement learning. If not present, WandB will not be used. |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| project | Project to save WandB run to; default = "DLomix_auto_RL_TL" |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| targets | Tags to use for WandB run; default = None |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| datasetFilteringOptions | Contains specific settings for filtering the refinement/transfer learning dataset. If not provided, will only remove decoys. |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| searchEngineScoreThreshold | Threshold for included peptides, everything below will be discarded. |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| numDuplicates | Number of (peptide, charge, collision energy) duplicates to include. |
+------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+
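
Reviewer sketch (not part of the diff): the nesting implied by the two tables above, together with the indentation rules added to ``custom_cookietemple.css``, could be written out as follows. Only the key names and the documented defaults (``batchSize`` = 1024, ``includeOriginalSequences`` = False, ``improveFurther`` = False, ``project`` = "DLomix_auto_RL_TL", ``targets`` = None) come from the tables. The model path, the iRT model name, the inference batch size, the two filtering values, the helper name ``build_refinement_config`` and the top-level placement of ``dlomixInferenceBatchSize`` are assumptions made for illustration.

```python
import json


def build_refinement_config(intensity_model: str) -> dict:
    """Assemble the refinement-learning part of a config as a nested dict.

    Key names follow the documentation tables; all values flagged as
    hypothetical below are illustrative placeholders only.
    """
    return {
        "models": {
            # "Name or path of the model used for fragment intensity prediction"
            "intensity": intensity_model,
            "irt": "<irt model name>",  # hypothetical placeholder
        },
        # Assumed to be a top-level key; the value is a hypothetical choice.
        "dlomixInferenceBatchSize": 1024,
        "refinementLearningOptions": {
            "batchSize": 1024,                  # documented default
            "includeOriginalSequences": False,  # documented default
            "improveFurther": False,            # documented default
            "wandbOptions": {
                "project": "DLomix_auto_RL_TL",  # documented default
                "targets": None,                 # documented default
            },
            "datasetFilteringOptions": {
                "searchEngineScoreThreshold": 50,  # hypothetical value
                "numDuplicates": 10,               # hypothetical value
            },
        },
    }


config = build_refinement_config("path/to/intensity_model")  # hypothetical path
# Serialize to JSON text, e.g. for writing into a config file.
config_text = json.dumps(config, indent=4)
print(config_text)
```

According to the table, leaving out ``refinementLearningOptions`` means no refinement learning is performed, leaving out ``wandbOptions`` means WandB is not used, and leaving out ``datasetFilteringOptions`` means only decoys are removed.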
40 changes: 33 additions & 7 deletions docs/installation.rst
@@ -17,7 +17,7 @@ The installer script automatically installs dependencies and creates a new conda
wget https://raw.githubusercontent.com/wilhelm-lab/oktoberfest/main/installer.sh -O install_oktoberfest.sh
bash install_oktoberfest.sh

The installer searches for existing anaconda / miniconda installation. If none was found, it will download and install miniconda.
The installer searches for an existing anaconda / miniconda installation. If none is found, it will download and install miniconda.

Docker Image
------------
@@ -44,7 +44,7 @@ This is a step-by-step guide for the manual installation of all mandatory and op
Install Python
~~~~~~~~~~~~~~

Oktoberfest requires python >=3.8 and <=3.11. Best practise is to use a clean conda environment (`Miniconda <https://docs.conda.io/en/latest/miniconda.html>`_).
Oktoberfest requires python >=3.9 and <=3.11. Best practice is to use a clean conda environment (`Miniconda <https://docs.conda.io/en/latest/miniconda.html>`_).
Follow the installation guide for your operating system, then create a new environment using

.. code-block:: bash
@@ -55,25 +55,47 @@ Follow the installation guide for your operating system, then create a new envir
Optional dependencies
~~~~~~~~~~~~~~~~~~~~~

There are three dependencies that are required for specific tasks:
There are multiple optional dependencies depending on job types. Detailed notes and installation instructions can be found below.

.. table::

+------------------------------------------------------------------+---------------------+---------------------------------------------------------+
| Job type | Dependency | Notes |
+==================================================================+=====================+=========================================================+
| Pre-processing | ThermoRawFileParser | not required if only mzML files are provided |
+ +---------------------+---------------------------------------------------------+
| | `mono` | required for ThermoRawFileParser to work on Linux/macOS |
+------------------------------------------------------------------+---------------------+---------------------------------------------------------+
| :ref:`Rescoring <jobs:a) without refinement>` | Percolator | |
+------------------------------------------------------------------+---------------------+---------------------------------------------------------+
| :ref:`Rescoring + refinement learning <jobs:b) with refinement>` | DLomix | |
+------------------------------------------------------------------+---------------------+---------------------------------------------------------+

**ThermoRawFileParser**
`ThermoRawFileParser v1.4.3 <https://github.com/compomics/ThermoRawFileParser/releases/tag/v1.4.3>`_:
For conversion of RAW to mzML format. Download and unpack the zip or tar.gz file. The default locations Oktoberfest expects the executable to be at "/opt/compomics/" (Linux/MacOS) or the folder from which you want to execute Oktoberfest (Windows).
You do not need this package if you only ever provide mzML file. However, it is recommended let Oktoberfest convert RAW files for you, to ensure the mzML files are formatted in the way Oktoberfest expects it.
For ThermoRawFileParser to work on Linux, you also need to ensure mono is installed using
For conversion of RAW to mzML format. Download and unpack the zip or tar.gz file. The default locations Oktoberfest expects the executable to be at `/opt/compomics/` (Linux/MacOS) or the folder from which you want to execute Oktoberfest (Windows).
You do not need this package if you only ever provide mzML files. However, it is recommended to let Oktoberfest convert RAW files for you, to ensure the mzML files are formatted in the way Oktoberfest expects it.

**`mono`**
For ThermoRawFileParser to work on Linux, you also need to ensure `mono` is installed using

.. code-block:: bash

sudo apt -y update && sudo apt -y install mono-devel # Debian / Ubuntu

For MacOS, follow the instructions provided by `Mono <https://www.mono-project.com/docs/getting-started/install/mac/>`_.


**Percolator**
`Percolator v3.06.1 <https://github.com/percolator/percolator/releases/tag/rel-3-06-01>`_:
This is the tool Mokapot is based on. As it has more options and is generally more stable wrt. to FDR cutoffs and deduplication, it is recommended to use this tool instead of Mokapot.
Installable packages are provided for Linux/MacOS/Windows.

**DLomix**
`DLomix <https://github.com/wilhelm-lab/dlomix>`_ is a Python framework for deep learning in proteomics. Oktoberfest uses DLomix to refinement-learn intensity predictors on input spectra. It is listed as an optional dependency and can be installed using

.. code-block:: bash

poetry install -E dlomix

Installing Oktoberfest
~~~~~~~~~~~~~~~~~~~~~~
@@ -85,5 +107,9 @@ Oktoberfest is listed on the Python Package Index (PyPI) and can be installed wi
conda activate oktoberfest
pip install oktoberfest jupyterlab

For local prediction & refinement learning, you have to install Oktoberfest with the `dlomix` extra:

.. code-block:: bash

conda activate oktoberfest
pip install oktoberfest[dlomix]
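
Reviewer sketch (not part of the diff): since DLomix is an optional extra, code that needs it can check for the package before importing it, in the spirit of the commit "fix(predict): only import dlomix module if dlomix is installed". The snippet below is a minimal illustration using only the standard library. The helper names ``is_installed`` and ``require_dlomix`` are hypothetical and are not claimed to match what Oktoberfest implements.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


def require_dlomix() -> None:
    """Raise a helpful error if the optional DLomix dependency is missing."""
    if not is_installed("dlomix"):
        raise ImportError(
            "Local prediction and refinement learning need DLomix; "
            "install it with: pip install oktoberfest[dlomix]"
        )


# Report availability without failing, so the check is safe to run anywhere.
print("DLomix available:", is_installed("dlomix"))
```

A caller would invoke ``require_dlomix()`` right before any code path that depends on DLomix, so that users who only run Koina-based jobs never hit the import.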