add support for FeatureTable and FeatureData generation from classify-kraken results
#36
Replies: 14 comments 3 replies
-
Hey @gregcaporaso,
-
@misialq, I wasn't certain about that but that's what makes sense to me too. Do you have thoughts on the pathway from Kraken results on MAGs to a feature table? We do have
-
Hey this is a complex issue, with a few other issues linked. I opened the following issue after looking into this a bit, and discussing with @misialq that Regarding feature tables:
Regarding
-
Thanks @nbokulich!
That seems like the way to go. Just FYI, @colinvwood is doing some exploration of this on our end as we need to use this for some experimental analyses that we're running.
We were thinking that it may be possible to generate a
It may still be useful for practical purposes to generate
-
Hey @gregcaporaso ,
If the MAGs are not yet dereplicated, then this would make a massively sparse table with MAGs unique to samples. Without abundance information, I am not sure what value this would have. But I propose that we discuss MAGs->table on the other issue to keep it separate from taxonomic classification.
For read-based profiling, the table we will get (via Bracken) will already be collapsed by taxonomy. The only other way would be a massive sparse presence-absence table with each feature observed once, which seems like an inefficient solution. For the purposes of taxa barplots, I think that we should instead alter the action to make
-
Regardless of the decision we make here wrt where/how MAG abundances are tabulated, I put in this PR to make it possible for collapsed feature tables (e.g., from Bracken) to be passed to
-
👍
Yep, I agree that that wouldn't make sense or be useful. I was thinking that the feature ids in the table would be taxa, but that we might want to have a way to create a corresponding Here's an example of what the
-
Yeah, this is how my PR in q2-taxa handles taxonomies/labels pulled from tables without accompanying
Can you think of some use cases? Aside from
-
@nbokulich, I'll follow up with you on this within about the next week. We're working through some analyses, so I may either have specific examples or ideas of when it's needed, or not (in which case that will suggest we don't need this after all). Thanks for the q2-taxa PR!
-
@colinvwood and I are going to write a prototype of a function that reads Bracken reports and generates a
-
Hey @gregcaporaso, thanks, that's great! I haven't gotten to that part quite yet, so it would be wonderful if you could contribute that to the plugin. I'm wrapping up rewriting the database building action to add the Bracken DB (PR coming very soon) and will then move on to extending the read-based classification with the Bracken step. I will then ping you guys on the respective PR. Thanks!
-
That all sounds great @misialq. We are currently running Bracken outside of QIIME 2 for our analysis, so we should have some good context for reviewing PRs. Turns out generating feature tables from the Bracken output is very straightforward. We have that working now (again, outside of QIIME 2 for now), and we can contribute that whenever it makes sense.
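For readers following along, here is a minimal sketch of what turning Bracken output into a feature table can look like. It is not the code referred to above. It assumes the standard tab-separated Bracken output with `name` and `new_est_reads` columns, uses only the Python standard library, and the function names (`read_bracken_counts`, `build_feature_table`) and the tiny inputs are made up for illustration.

```python
import csv
import io


def read_bracken_counts(handle):
    """Return {taxon name: estimated read count} from one Bracken output table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["name"]: int(row["new_est_reads"]) for row in reader}


def build_feature_table(per_sample_counts):
    """Combine per-sample counts into (sample_ids, taxa, rows); absent taxa get 0."""
    sample_ids = sorted(per_sample_counts)
    taxa = sorted({t for counts in per_sample_counts.values() for t in counts})
    rows = [[per_sample_counts[s].get(t, 0) for t in taxa] for s in sample_ids]
    return sample_ids, taxa, rows


# Made-up inputs, only to show the shape of the data (not real results).
HEADER = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
)
sample_a = (
    HEADER
    + "Escherichia coli\t562\tS\t90\t10\t100\t0.5\n"
    + "Bacillus subtilis\t1423\tS\t95\t5\t100\t0.5\n"
)
sample_b = HEADER + "Escherichia coli\t562\tS\t40\t10\t50\t1.0\n"

counts = {
    "sample-a": read_bracken_counts(io.StringIO(sample_a)),
    "sample-b": read_bracken_counts(io.StringIO(sample_b)),
}
sample_ids, taxa, rows = build_feature_table(counts)
```

Each row of `rows` is one sample and each column one taxon, so the result is already collapsed by taxonomy, as discussed earlier in the thread. Wrapping it in an actual feature table artifact is left out here.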
-
Great! I'll get back to you on that most likely some time next week.
-
Hey @gregcaporaso, after some more discussions with @nbokulich, we came up with this little diagram showing the proposed way forward for taxonomic classification in the shotgun workflows (see the diagram below). It includes the additional steps of mapping reads to dereplicated MAGs (if MAGs were used as classification target) or reference sequences (if reads were used as input for a classifier different than Kraken2; I'm currently working on Kaiju), followed by RPKM estimation to get an abundance table normalized by genome length of every taxon. This means we will require some small modifications of the current state + some new actions:
How does that sound? For visibility, I'll be creating more issues to track those steps. For now, I'll start by adjusting those Kraken2 action(s) for which we have PRs in the works.
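To make the RPKM step concrete, here is a small sketch of the usual formula: reads mapped to a genome, divided by the genome length in kilobases and by the total mapped reads in millions. This is an illustration under assumptions, not the planned action: the function name `rpkm`, the use of total mapped reads as the denominator, and the example numbers are all hypothetical.

```python
def rpkm(read_counts, lengths_bp):
    """Reads per kilobase of genome per million mapped reads, for one sample.

    read_counts: {genome id: reads mapped to that genome}
    lengths_bp:  {genome id: genome length in base pairs}
    """
    total = sum(read_counts.values())
    if total == 0:
        # No mapped reads: report zeros instead of dividing by zero.
        return {genome: 0.0 for genome in read_counts}
    # reads / (length_bp / 1e3) / (total / 1e6) == reads * 1e9 / (length_bp * total)
    return {
        genome: count * 1e9 / (lengths_bp[genome] * total)
        for genome, count in read_counts.items()
    }


# Made-up example: equal read counts, but one genome is twice as long.
example_counts = {"mag-1": 500, "mag-2": 500}
example_lengths = {"mag-1": 2_000_000, "mag-2": 4_000_000}
example_rpkm = rpkm(example_counts, example_lengths)
```

With equal read counts, the genome that is twice as long ends up with half the RPKM, which is the length normalization described above.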
-
Ultimately we'll want FeatureTable and FeatureData[Taxonomy] results to use these data in downstream applications. One option would be to use Bracken to go from SampleData[Kraken2Output] and/or SampleData[Kraken2Report] to a FeatureTable and FeatureData[Taxonomy]. Should we start thinking about that, or is there another approach that is planned?
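As a rough illustration of the taxonomy half of this, the sketch below derives lineage strings from a Kraken2-style report by using the indentation of the name column (two spaces per level). It assumes the standard six-column, tab-separated report layout. The function name `lineages_from_report` and the simplified example report are hypothetical, and a real report would list the intermediate ranks as well.

```python
def lineages_from_report(lines):
    """Map taxid -> semicolon-joined lineage, using name indentation as depth."""
    stack = []  # names on the current path from the root
    lineages = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        taxid, raw_name = fields[4].strip(), fields[5]
        # Two leading spaces per taxonomic level in the name column.
        depth = (len(raw_name) - len(raw_name.lstrip(" "))) // 2
        del stack[depth:]  # drop anything at this depth or deeper
        stack.append(raw_name.strip())
        lineages[taxid] = ";".join(stack)
    return lineages


# Made-up, simplified report (intermediate ranks omitted for brevity).
report = [
    "100.00\t150\t0\tR\t1\troot\n",
    "100.00\t150\t0\tD\t2\t  Bacteria\n",
    "66.67\t100\t100\tS\t562\t    Escherichia coli\n",
    "33.33\t50\t50\tS\t1423\t    Bacillus subtilis\n",
]
taxonomy = lineages_from_report(report)
```

The resulting mapping of feature id to lineage string is roughly the information a FeatureData[Taxonomy] result carries. Keying on taxid rather than name is just one possible choice here.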