ENH: Add busco-results input to dereplicate-mags #213

Merged
merged 17 commits into from
Dec 13, 2024

Conversation

Contributor

@VinzentRisch VinzentRisch commented Oct 23, 2024

solves #161

  • Adds metadata and metadata_column parameters to dereplicate-mags.
  • If both metadata and metadata_column are provided, the bin with the highest value in that column is chosen. If there is a tie, the longest bin is chosen.
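
The selection rule described above can be sketched in a few lines. This is only an illustration, not the code in derep.py: the function name `pick_representative` is hypothetical, and plain dicts stand in for the metadata column and the bin lengths.

```python
def pick_representative(bins, values, lengths):
    """Pick the bin with the highest metadata value; ties go to the longest bin."""
    # Comparison key: metadata value first, bin length second.
    return max(bins, key=lambda b: (values[b], lengths[b]))


# Hypothetical cluster of three bins; bin2 and bin3 tie on the metadata value.
values = {"bin1": 90.0, "bin2": 95.5, "bin3": 95.5}
lengths = {"bin1": 3_000_000, "bin2": 2_500_000, "bin3": 2_800_000}

chosen = pick_representative(["bin1", "bin2", "bin3"], values, lengths)
print(chosen)  # bin3: tied with bin2 on value, but longer
```

Using a tuple as the key keeps the tie-break in one place: the length is only consulted when two bins share the same metadata value.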

@VinzentRisch VinzentRisch marked this pull request as ready for review October 24, 2024 08:58
@VinzentRisch VinzentRisch requested a review from misialq October 24, 2024 08:58
codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.64%. Comparing base (b6d068a) to head (a18c138).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #213      +/-   ##
==========================================
+ Coverage   95.60%   95.64%   +0.04%     
==========================================
  Files          34       34              
  Lines        1956     1975      +19     
  Branches      226      229       +3     
==========================================
+ Hits         1870     1889      +19     
  Misses         48       48              
  Partials       38       38              


Contributor

@misialq misialq left a comment

Hey @VinzentRisch, nice work, thanks! See some suggestions below :)

q2_moshpit/dereplication/derep.py (outdated, resolved)
q2_moshpit/dereplication/derep.py (outdated, resolved)
q2_moshpit/plugin_setup.py (outdated, resolved)
@VinzentRisch VinzentRisch requested a review from misialq November 4, 2024 15:45
Contributor

@misialq misialq left a comment

Hey @VinzentRisch, seems like there was a coverage failure and this time I think it's correct - could you please investigate and address accordingly? 🙏

@VinzentRisch
Contributor Author

Hi Michal,
The coverage should be sorted out now.
I have a question: the user can now choose any metadata column, and as currently set up, the bin with the highest value in that column is used for dereplication.
Can you see any case where a user would want the lowest value in a column instead?
I could add another parameter that specifies whether the highest or the lowest value in the column should be used. Do you see an application for that?

@VinzentRisch VinzentRisch requested a review from misialq November 6, 2024 14:32
Contributor

misialq commented Nov 7, 2024

Hey @VinzentRisch, excellent question - thanks for bringing that up! I think there may be some cases like this, although not with BUSCO. For example, CheckM reports a contamination score: if a user were to provide a table with those results, they may want to pick the MAGs with the lowest contamination. So your suggestion makes sense - it would be nice if you introduced one more parameter, maybe set to 'max' by default. 🚀
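
A rough sketch of what such a switch could look like, assuming a hypothetical `mode` argument that defaults to 'max' as suggested here. The function name, the argument name, and the dict inputs are illustrative only and are not taken from the merged code.

```python
def pick_by_column(bins, values, lengths, mode="max"):
    """Pick a bin by the highest ('max') or lowest ('min') column value."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    # Negating the value turns "lowest wins" into "highest wins";
    # the longest-bin tie-break stays the same in both modes.
    sign = 1 if mode == "max" else -1
    return max(bins, key=lambda b: (sign * values[b], lengths[b]))


# Hypothetical contamination scores, where lower is better.
contamination = {"bin1": 1.2, "bin2": 0.4, "bin3": 0.4}
lengths = {"bin1": 3_000_000, "bin2": 2_500_000, "bin3": 2_800_000}
bins = ["bin1", "bin2", "bin3"]

lowest = pick_by_column(bins, contamination, lengths, mode="min")
highest = pick_by_column(bins, contamination, lengths)
print(lowest, highest)  # bin3 bin1
```

Defaulting to 'max' keeps the behaviour for completeness-style scores unchanged, so only users with "lower is better" columns need to set the extra parameter.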

@misialq misialq requested a review from Copilot December 12, 2024 14:54
Contributor

@Copilot Copilot AI left a comment

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 suggestion.

Files not reviewed (1)
  • q2_moshpit/dereplication/tests/data/busco_results.tsv: Language not supported
Comments skipped due to low confidence (3)

q2_moshpit/dereplication/tests/test_dereplication.py:150

  • Add a test case to ensure that the dereplicate_mags function works correctly when metadata and metadata_column are provided.
obs_mags, obs_pa = dereplicate_mags(mags, self.dist_matrix, threshold=0.99)

q2_moshpit/dereplication/derep.py:300

  • The walrus operator ':=' is used, which is not available in Python versions prior to 3.8. Ensure compatibility with the required Python version or avoid using the walrus operator.
values := metadata_column[bins]
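
For reference, an assignment expression can always be split into a plain assignment followed by the test. The snippet below is a generic illustration with made-up names and data, not the code at derep.py:300.

```python
# Hypothetical stand-ins for the metadata column and the bins of one cluster.
metadata_values = {"bin1": 90.0, "bin2": 95.5, "bin3": 88.1}
bins = ["bin1", "bin2"]

# Python 3.8+: bind the looked-up values and test them in one expression.
if values := [metadata_values[b] for b in bins]:
    best_with_walrus = max(values)

# Equivalent form that also runs on Python versions before 3.8.
values_plain = [metadata_values[b] for b in bins]
if values_plain:
    best_without_walrus = max(values_plain)
```

Both branches compute the same result; the second only costs one extra line.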

q2_moshpit/dereplication/derep.py:293

  • [nitpick] The error message could be more specific. Consider changing it to 'The specified metadata column must contain numerical values.'
raise ValueError('The specified metadata column has to be numerical.')
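
A minimal sketch of such a check with the more specific message, assuming the column is available as a plain mapping of bin IDs to values. How the plugin actually inspects the metadata column is not shown in this conversation, so `require_numeric` is a hypothetical helper.

```python
from numbers import Real


def require_numeric(values):
    """Raise ValueError unless every value in the column is a real number."""
    for bin_id, value in values.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(
                "The specified metadata column must contain numerical "
                f"values (found {value!r} for {bin_id!r})."
            )


require_numeric({"bin1": 90.0, "bin2": 95})  # passes silently

try:
    require_numeric({"bin1": 90.0, "bin2": "high"})
except ValueError as err:
    message = str(err)
```

Naming the offending value in the message makes it easier to find the bad row in a large metadata table.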

q2_moshpit/plugin_setup.py (outdated, resolved)
@misialq misialq changed the title from "ENH: Added busco-results input to dereplicate-mags" to "ENH: Add busco-results input to dereplicate-mags" Dec 13, 2024
Contributor

@misialq misialq left a comment

Looks good, thanks @VinzentRisch!

@misialq misialq merged commit f09a381 into bokulich-lab:main Dec 13, 2024
10 checks passed
2 participants