File exists error mzml input #124

Closed
patrick-willems opened this issue Sep 13, 2023 · 4 comments · Fixed by #108

@patrick-willems

Hey,

First of all, thanks for the great tool. I was trying to run it on timsTOF data with MSFragger search results. Since no RAW input is available for this kind of data, I went with a generated mzML file. However, I keep getting a "File exists" error; please see the log:

2023-09-14 00:21:48,961 - INFO - oktoberfest::main Issued command: run_oktoberfest.py --config_path test.json
2023-09-14 00:21:48,961 - INFO - oktoberfest.utils.config::read Reading configuration from test.json
2023-09-14 00:21:48,967 - INFO - oktoberfest.runner::run_rescoring Starting rescoring run...
2023-09-14 00:21:48,968 - INFO - oktoberfest.utils.config::read Reading configuration from test.json
2023-09-14 00:21:48,993 - INFO - oktoberfest.ce_calibration::_load_search search_type is msfragger
2023-09-14 00:21:48,993 - INFO - oktoberfest.ce_calibration::_gen_internal_search_result_from_msms Converting msms data at T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409.pepXML to internal search result.
100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:38<00:00, 38.85s/it]
2023-09-14 00:22:28,005 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences before filtering for valid prosit sequences: 47010
2023-09-14 00:22:28,040 - INFO - spectrum_io.search_result.search_results::filter_valid_prosit_sequences #sequences after filtering for valid prosit sequences: 41835
2023-09-14 00:22:28,692 - INFO - oktoberfest.re_score::split_msms Read 41835 PSMs from out/msms/msms.prosit
2023-09-14 00:22:28,714 - INFO - oktoberfest.re_score::split_msms Creating split search results file out/msms/T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409.rescore
Traceback (most recent call last):
  File "/home/pawil/.local/bin/oktoberfest", line 8, in <module>
    sys.exit(main())
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/run_oktoberfest.py", line 30, in main
    runner.run_job(args.config_path)
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/runner.py", line 211, in run_job
    run_rescoring(msms_path, search_dir, config_path, output_path)
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/runner.py", line 171, in run_rescoring
    re_score.calculate_features()
  File "/home/pawil/.local/lib/python3.10/site-packages/oktoberfest/re_score.py", line 168, in calculate_features
    mzml_path.mkdir(exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
FileExistsError: [Errno 17] File exists: 'T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409_uncalibrated.mzML'

I tried this on other data as well, with the same result. I'm unsure whether it's really a bug or a mistake on my side.

Thanks in advance!

patrick-willems added the bug label Sep 13, 2023

picciama commented Sep 15, 2023

Hello @patrick-willems, timsTOF support has not been added yet, but we are actively working on getting it integrated (#115). Concerning your issue:
I suspect that the "spectra" option points to the mzML file itself rather than to the directory containing it. That would explain the "File exists" error, which is raised when Oktoberfest tries to create the directory for the mzML files. Apparently, this is not checked properly.
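To illustrate the suspected cause: `Path.mkdir(exist_ok=True)` only tolerates a path that already exists as a directory. If the path exists as a regular file, Python still raises `FileExistsError`, which matches the last frames of the traceback. This is a minimal sketch using a temporary directory and a made-up file name, not Oktoberfest code:

```python
import tempfile
from pathlib import Path


def mkdir_outcome(path: Path) -> str:
    """Try to create `path` as a directory and report what happened."""
    try:
        path.mkdir(exist_ok=True)
    except FileExistsError:
        return "FileExistsError"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name, standing in for the real mzML file.
    spectra_file = Path(tmp) / "example_uncalibrated.mzML"
    spectra_file.touch()

    # The path points at a regular file, so exist_ok=True does not help.
    file_result = mkdir_outcome(spectra_file)
    # The parent is an existing directory, which exist_ok=True tolerates.
    dir_result = mkdir_outcome(spectra_file.parent)

print(file_result, dir_result)
```

The first call fails and the second succeeds, which is why pointing "spectra" at the containing folder avoids the error.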
For now, you could try providing the folder without the file name and see if that works. You just need to make sure that "T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409_uncalibrated.mzML" is the only mzML file there, since Oktoberfest scans for all mzML files in the provided spectra directory.
Should this also fail, you could copy your file directly into "output"/mzML/. Oktoberfest should then detect that an mzML file is already in the output folder and skip the conversion.
As a last resort, you could tell Oktoberfest that "spectra_type" is "raw" and provide a dummy file named "T03062_EvoAurEl3_20SPDDDAPASEF-IMP-CMB-1356_1_S1-A1_1_3409_uncalibrated.raw" in the directory specified with "spectra". As long as the mzML file is already in "out"/mzML/, Oktoberfest will only see that an mzML file with the same name as the dummy raw file exists, assume that it has already been converted, skip the conversion, and carry on.
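The last two workarounds could be scripted roughly as follows. This is only a sketch under assumptions: `stage_mzml` is a hypothetical helper, the "mzML" subfolder name is taken from the description above, and the demo uses a temporary directory with a made-up file name instead of real data.

```python
import shutil
import tempfile
from pathlib import Path


def stage_mzml(mzml_file: Path, spectra_dir: Path, output_dir: Path) -> Path:
    """Copy an existing mzML file into <output_dir>/mzML and create an
    empty .raw placeholder with the same stem in <spectra_dir>."""
    mzml_dir = output_dir / "mzML"
    mzml_dir.mkdir(parents=True, exist_ok=True)
    staged = mzml_dir / mzml_file.name
    if not staged.exists():
        shutil.copy2(mzml_file, staged)

    spectra_dir.mkdir(parents=True, exist_ok=True)
    # Empty dummy file; it only has to share the name of the mzML file.
    (spectra_dir / (mzml_file.stem + ".raw")).touch()
    return staged


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "example_uncalibrated.mzML"  # hypothetical file name
    source.write_text("<mzML/>")  # stand-in content, not a valid mzML file

    staged = stage_mzml(source, root / "spectra", root / "out")
    staged_ok = staged.is_file()
    placeholder_ok = (root / "spectra" / "example_uncalibrated.raw").is_file()

print(staged_ok, placeholder_ok)
```

With real data, `spectra_dir` and `output_dir` would be the paths given as "spectra" and "output" in the config, and "spectra_type" would be set to "raw" for the dummy-file variant.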

In the new API, which I hope to release next week, this is handled better.

Please keep in mind though, that timsTOF is not yet tested properly and that your mzML might actually not be supported.

@picciama picciama self-assigned this Sep 15, 2023

patrick-willems commented Sep 18, 2023

Thanks for the reply,

Indeed, specifying the directory resolved the first hurdle, though rescoring now fails with:

"AssertionError: The mass analyzer with accession MS:1000031 is not supported."

Looking forward to the timsTOF implementation, eager to rescore some results and test the performance.

Best
Patrick

@tobiasko

Just wanted to mention that I submitted a related issue, #129. I was trying to CE-calibrate FragPipe results based on raw data files in mzML format and got:

 File "/home/tobiasko/oktoberfest-env/lib/python3.9/site-packages/spectrum_io/raw/msraw.py", line 34, in check_analyzer
    raise AssertionError(f"The mass analyzer with accession {accession} is not supported.")
AssertionError: The mass analyzer with accession MS:1000081 is not supported.

@picciama picciama linked a pull request Oct 3, 2023 that will close this issue
@picciama picciama removed a link to a pull request Oct 3, 2023

picciama commented Oct 3, 2023

"AssertionError: The mass analyzer with accession MS:1000031 is not supported."

I published a hotfix release of spectrum-io (v0.3.3), because the analyzer check was only there to determine whether we have default values for the mass tolerance and its unit. As long as you supply these yourself, it should be fine. If you install the newest release of Oktoberfest (v0.5.0), this error should be gone. The release will be published tonight and the issue will be closed accordingly. Please reopen it should you still encounter the problem.
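The idea behind the hotfix can be sketched as follows: the analyzer accession only matters when no tolerance was supplied by the user. This is not the spectrum-io implementation. `resolve_tolerance`, the mapping, and the default values are assumptions made up for illustration; only the accession names (MS:1000484 for orbitrap, MS:1000264 for ion trap) come from the PSI-MS controlled vocabulary.

```python
from typing import Optional, Tuple

# Illustrative defaults only; the values are assumptions for this sketch.
DEFAULT_TOLERANCE = {
    "MS:1000484": (20.0, "ppm"),  # orbitrap
    "MS:1000264": (0.35, "da"),   # ion trap
}


def resolve_tolerance(
    accession: str,
    mass_tolerance: Optional[float] = None,
    unit: Optional[str] = None,
) -> Tuple[float, str]:
    """Prefer user-supplied values; fall back to per-analyzer defaults."""
    if mass_tolerance is not None and unit is not None:
        # User-supplied values win, so an unknown analyzer is not an error.
        return mass_tolerance, unit
    if accession in DEFAULT_TOLERANCE:
        return DEFAULT_TOLERANCE[accession]
    raise ValueError(
        f"No default tolerance for analyzer {accession}; "
        "please supply the mass tolerance and its unit."
    )


# An analyzer without defaults is fine when the values are supplied explicitly.
explicit = resolve_tolerance("MS:1000031", mass_tolerance=40.0, unit="ppm")
fallback = resolve_tolerance("MS:1000484")
print(explicit, fallback)
```

Under this scheme, the accessions reported in this thread would only lead to an error when neither the tolerance nor its unit is given.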

@picciama picciama linked a pull request Oct 3, 2023 that will close this issue