Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: allow SampleData[MAGs] as input to predict-genes-prodigal #154

Merged
merged 10 commits into from
May 6, 2024

Conversation

Sann5
Copy link
Contributor

@Sann5 Sann5 commented Apr 12, 2024

What's new

Blocked by...

Preview Give feedback

Run it locally

  1. Check out the PR branch.

Assuming 1) you already have a local copy of q2-moshpit that is installed in editable mode in your activated virtual environment and 2) the working directory is q2-moshpit; run the following.

PR=154
git fetch origin pull/${PR}/head:pr-${PR}
git checkout pr-${PR}
  1. Let's get you some data to play with:
cd my_dir
wget https://polybox.ethz.ch/index.php/s/slGyW1uQbyxmyb9/download -O sample_data_mags.qza
  1. Test it out!
qiime moshpit predict-genes-prodigal --i-mags sample_data_mags.qza --o-genes genes.qza --o-loci loci.qza --o-proteins proteins.qza --verbose

@Sann5 Sann5 requested review from misialq and VinzentRisch April 12, 2024 15:50
@Sann5 Sann5 self-assigned this Apr 12, 2024
Copy link

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.74%. Comparing base (897d7e0) to head (12d4778).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #154      +/-   ##
==========================================
+ Coverage   97.72%   97.74%   +0.01%     
==========================================
  Files          64       64              
  Lines        3390     3415      +25     
==========================================
+ Hits         3313     3338      +25     
  Misses         77       77              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@misialq misialq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Sann5, see some refactor suggestions below.

q2_moshpit/prodigal/prodigal.py Outdated Show resolved Hide resolved
q2_moshpit/prodigal/prodigal.py Outdated Show resolved Hide resolved
q2_moshpit/prodigal/prodigal.py Outdated Show resolved Hide resolved
q2_moshpit/prodigal/prodigal.py Outdated Show resolved Hide resolved
q2_moshpit/prodigal/prodigal.py Outdated Show resolved Hide resolved
@Sann5 Sann5 requested a review from misialq April 26, 2024 12:17
@Sann5
Copy link
Contributor Author

Sann5 commented Apr 26, 2024

Hey @misialq 🙌🏼 . I did quite a few changes to the code with your suggestions in mind. Please take a look and tell me what you think. While I have you here I want to clarify something we previously discussed.

This action outputs 3 artifacts whose semantic types are GenomeData[Loci], GenomeData[Genes], and GenomeData[Proteins]. The formats associated with these semantic types have the following structure:

class <name>DirectoryFormat(model.DirectoryFormat):
    <name> = model.FileCollection(r'(.*\_)?<name>[0-9]*\.<extension>$',
                                format=<format>)
  • where <name> is either loci, genes or proteins
  • <format> is the format of the files like FASTA of gff and <extension> is the associated file extension

We discussed changing the regex to something like

(.*\/)?(.*)\.<extension>$

bascially matching any random file name with an optional parent directory.

Now... for the question... Right now the files that are created in the output artifacts all have this <name> thing in the middle. Should I already remove it (this <name> string)? That would mean that I have to open a PR on q2-types making these changes to the formats and then this PR would depend on that one.

@misialq
Copy link
Contributor

misialq commented Apr 26, 2024

Hey @Sann5, yeah, that would be great! As discussed, there is really no need to have that in the file name - the regex should pretty much be based on the extension itself as the specific formats are anyways linked to their respective semantic types so there is no need to indicate that in file names (it only makes it too restrictive).

@Sann5
Copy link
Contributor Author

Sann5 commented Apr 30, 2024

Hey @misialq. I did as we discussed :). I also made the PR as dependent.

@github-actions github-actions bot removed the dependent label May 3, 2024
Copy link

github-actions bot commented May 3, 2024

Copy link
Contributor

@misialq misialq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Sann5, looks good to me! I tried to run through both, sample data and feature data annotation and all worked as expected. Feel free to merge after updating the branch and resolving the conflicts ✅

@misialq misialq merged commit 2b3b03d into bokulich-lab:main May 6, 2024
10 checks passed
@Sann5 Sann5 deleted the feature_data_prodigal_iss_101 branch May 6, 2024 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: allow gene prediction from SampleData[MAGs]
2 participants