Replies: 8 comments
-
Hey @misialq, I think this makes a lot of sense (I like the diagram), and I can see how kraken really wouldn't treat the reads any differently from the contigs here, so of course we would end up with the contig ids rather than MAG ids 🤦♂️ I'm definitely in support of a contig-map kind of artifact as well, and think people are likely to ask for it anyhow as binning is probably going to feel mysterious to many. Regarding the unique IDs, does the bin de-replicator solve this at all?
-
Good call. We discussed using q2-sourmash for dereplication, in which case we are already making a sketch of the full sequence. The mash sketches are too long to use as an ID (and anyway I think sourmash sketches are not a fixed length). Maybe for the unique derep IDs we could take an md5 hash of the mash sketch? Is that a totally stupid idea? For un-dereplicated MAGs I like the sound of
-
This all makes perfect sense in hindsight! And I like your plan. Just some minor questions/comments:
For the un-dereplicated MAGs, my vote would be for regular V4 UUIDs, since our users are already used to looking at those and they're pretty easily recognized. For dereplicated MAGs, my vote would be to use IDs generated by the dereplicator, if it does that and the approach for generating the IDs is reasonable, and V4 UUIDs if not. I notice in the
Agreed. Thanks for all the work on this everyone!
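To make the two ID options from this thread concrete, here is a minimal sketch: an md5 digest of a sketch for dereplicated MAGs, and a V4 UUID otherwise. The function names and the placeholder bytes are hypothetical. How a sourmash sketch would actually be serialized to bytes is not settled in this discussion, so the example just takes raw bytes as input.

```python
import hashlib
import uuid


def derep_mag_id(sketch_bytes: bytes) -> str:
    """Derive a deterministic ID from a serialized sketch (hypothetical helper).

    An md5 hex digest is always 32 characters, whatever the input length,
    which sidesteps the concern about sketches not having a fixed length.
    """
    return hashlib.md5(sketch_bytes).hexdigest()


def new_mag_id() -> str:
    """Generate a random V4 UUID for an un-dereplicated (or modified) MAG."""
    return str(uuid.uuid4())


# Placeholder bytes standing in for a serialized sketch (an assumption).
example_sketch = b"placeholder-serialized-sketch"

derep_id = derep_mag_id(example_sketch)
fresh_id = new_mag_id()
```

The main trade-off is that the digest is reproducible (the same sketch always gives the same ID), while a V4 UUID is random and carries no information about the MAG's content.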
-
Thanks for the input, everyone! On
On
On assignment consistency (point 4):
On MAG IDs - the way I imagined this was the following:
Does that make sense?
-
I think the former, but I could probably be convinced otherwise. I think it comes down to whether there are things we'd do with one subtype that would be fundamentally flawed operations with a different subtype. That seems like it is probably the case for
I also think this makes sense. It's actually the
Makes sense - the skeptic in me feels like we're going to have a reasonable amount of inconsistency, but we'll see how it goes and that could ultimately be very helpful information for assessing confidence in MAG taxonomy assignment.
As long as we're just selecting a representative MAG, that sounds perfect. If the MAG bins are modified in any way in this process, I think assigning a new V4 UUID (and outputting
Thanks @misialq! This all feels very intuitive to me!
-
Agreed on all counts - I'll create some issues next week and start working on those then :)
-
Hey @bokulich-lab/moshpit-team, little update: I've got the first two PRs ready:
Does anyone want to have a look?
-
Hey @misialq, sorry for the slow reply! I should be able to take a look at both tomorrow.
-
Flagstaff, we have a problem. @bokulich-lab/moshpit-team
When working on the tests for the "final" Kraken2 PR (#38) I had an awful realization. We discussed how the `kraken2-to-mag-features` would generate a `FeatureData[Taxonomy]` artifact mapping MAG ids to taxa - unfortunately, that's not what is happening. As a single MAG is a FASTA file containing a collection of contigs, Kraken2 classifies each contig and not the entire MAG. In effect, the hits table (aka Kraken2 output) contains contig IDs and not MAG IDs. In a short discussion with @nbokulich today we were trying to come up with a solution to this issue and what we would like to propose is the following:

- `ContigMap` which maps contigs to respective MAGs (or vice-versa)
- `classify-kraken2` action to accept `FeatureData[MAG]` rather than `SampleData[MAGs]` so that we classify the dereplicated MAGs (I think we discussed doing this anyways)
- `classify-kraken2` action then outputs `FeatureData[KrakenReport]` and `FeatureData[KrakenOutput]` for those
- `kraken2-to-mag-features`, together with the `ContigMap` from the binning step - this does the selection as it was written in the original PR but we add a small step at the end to perform an LCA selection to assign a single taxon to a single MAG - we can then output the same artifacts as originally designed and desired

Some thoughts:

- `kraken2-to-mag-features` action until everything else is ready

What do y'all think, does this make sense? I'm including a quick sketch of what this could look like after those updates:
If everyone is ok with that, I would create some issues for all those things and work on those. Thanks!
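For illustration, the LCA step proposed above could look roughly like the sketch below. This is not the implementation from the PR. It assumes a hypothetical contig map (MAG ID to contig IDs) and per-contig lineages already written out as semicolon-separated strings, whereas the real Kraken2 output reports taxon IDs per sequence. All IDs, lineages, and function names here are made up for the example.

```python
from typing import Dict, List

# Hypothetical contig map: MAG ID -> IDs of the contigs binned into that MAG.
contig_map: Dict[str, List[str]] = {
    "mag-1": ["contig_1", "contig_2", "contig_3"],
    "mag-2": ["contig_4", "contig_5"],
}

# Hypothetical per-contig lineages (illustrative data, not real results).
contig_taxonomy: Dict[str, str] = {
    "contig_1": "d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales",
    "contig_2": "d__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales",
    "contig_3": "d__Bacteria;p__Firmicutes;c__Bacilli",
    "contig_4": "d__Bacteria;p__Proteobacteria",
    "contig_5": "d__Archaea;p__Euryarchaeota",
}


def lowest_common_ancestor(lineages: List[str]) -> str:
    """Return the longest rank prefix shared by all lineages."""
    if not lineages:
        return "Unclassified"
    split = [lineage.split(";") for lineage in lineages]
    common = []
    # zip stops at the shortest lineage, so the result is never deeper than that.
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return ";".join(common) if common else "Unclassified"


def assign_mag_taxonomy(
    contig_map: Dict[str, List[str]], contig_taxonomy: Dict[str, str]
) -> Dict[str, str]:
    """Assign one taxon per MAG; contigs without a lineage are skipped (an assumption)."""
    result = {}
    for mag_id, contigs in contig_map.items():
        lineages = [contig_taxonomy[c] for c in contigs if c in contig_taxonomy]
        result[mag_id] = lowest_common_ancestor(lineages)
    return result


mag_taxonomy = assign_mag_taxonomy(contig_map, contig_taxonomy)
```

A strict LCA like this collapses to a very shallow taxon as soon as one contig disagrees, so how often that happens could itself be a useful signal about assignment consistency within a MAG.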