Replies: 17 comments
-
Hi SilasK, some observations I made. I have 16 metagenomic samples from the water column. Overall, these had a "percent assembled reads" between 26% and 42%. So I thought the low percentage of assembled reads could be due to overly strict pre-filtering. With that said, I did try decreasing "minimum_percent_covered_bases" to 5. It did not change anything. At this point, I am stuck :) Cheers
-
Thank you very much for your comments. It seems that, at least for some genomes, you have 'too many reads', which then creates problems for the assembly (example). In this case, normalization might improve the assembly. @ChristianFurbo Can I suggest that you run bbnorm on all the reads (no splitting) with the parameters What you could also try is to use megahit with the preset
-
Interesting! Thank you very much for the discussion! I have 58 soil samples, and so far I have only tried a coassembly of everything (using megahit with only the Definitely want to try normalization before assembly, thanks for the suggestion! At the moment I'm aiming to run atlas and use metaspades to assemble each sample separately. But I also don't want to lose the less dominant members of my microbial communities... so I would like to add bins generated from the coassembly or the ones generated when using the
-
Hi, I tried your suggestion, SilasK, and it helped. Using metaSPAdes with the default settings, I got ~40% assembled reads. Without normalization, I got roughly 21%. Nevertheless, it looks like normalization worked :)
-
Thank you for the update. Ideally, one should normalize the reads after error correction but then use the unnormalized QC reads for the mapping + downstream. I think I should finalize #289. I meant to use normalization + spades (which is arguably still the better assembler) or megahit with min_count 6.
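A minimal sketch of the two options named above, assuming the tools are called by hand rather than through atlas. The file names are hypothetical, and `target=100` / `min=5` are only illustrative values, not the parameters suggested earlier in this thread. The point is which read set goes where: error-corrected reads go into normalization and assembly, while the unnormalized QC reads are kept for mapping.

```python
import shlex
from typing import List


def bbnorm_command(r1: str, r2: str, out1: str, out2: str,
                   target: int = 100, min_depth: int = 5) -> List[str]:
    """Build a bbnorm.sh call for one paired-end read set."""
    return [
        "bbnorm.sh",
        f"in={r1}", f"in2={r2}",
        f"out={out1}", f"out2={out2}",
        f"target={target}",  # target depth after normalization (illustrative value)
        f"min={min_depth}",  # reads with apparent depth below this are discarded
    ]


def megahit_command(r1: str, r2: str, out_dir: str, min_count: int = 6) -> List[str]:
    """Build a megahit call that uses a raised k-mer count cutoff instead of normalization."""
    return ["megahit", "-1", r1, "-2", r2,
            "--min-count", str(min_count), "-o", out_dir]


if __name__ == "__main__":
    # Hypothetical file names: error-corrected reads are normalized for the assembly ...
    norm = bbnorm_command("sample_ec_R1.fastq.gz", "sample_ec_R2.fastq.gz",
                          "sample_norm_R1.fastq.gz", "sample_norm_R2.fastq.gz")
    print(shlex.join(norm))
    # ... or the error-corrected reads are assembled directly with a higher min-count.
    print(shlex.join(megahit_command("sample_ec_R1.fastq.gz",
                                     "sample_ec_R2.fastq.gz", "sample_megahit")))
    # Either way, the unnormalized QC reads (not the normalized ones) are what
    # would later be mapped back to the contigs.
```

The functions only assemble argument lists, so they can be inspected or passed to `subprocess.run` once the paths are adapted.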
-
This might also be relevant here: bbcms.sh vs bbnorm
and:
They seem to use it at JGI: Loxahatchee wildlife refuge study & giant virus study. But bbcms appears to also include error correction:
Will test bbcms without the error correction by using
-
Hi, I ran the assembly again using metaspades. I attach the contig stats below.
What I observed was: Question:
-
The percentage of aligned reads is misleading if it is based on different numbers of input reads, isn't it? It seems that the unnormalized assembly produces the largest contigs, followed by the normalized one (Assembled_Reads, N50, number of bp, and genes). Normalization doesn't seem to improve the assembly, and bbcms seems to be even worse. If I understand it correctly, bbcms is an alternative to tadpole for the error correction. I don't see a reason to use bbcms instead of tadpole unless your dataset is too big. However, maybe I should adapt the filtering parameters to achieve the same. @ChristianFurbo Now the question is: do you want to use normalization for your assembly? I don't know, could normalization to target=10 be worth trying? My idea in the atlas workflow is that we start with QC reads, which are then error corrected and merged before the assembly. But then we map the QC reads to the assembly.
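To illustrate why the percentages are not comparable when the input read counts differ, here is a small worked example. All read counts are made up for illustration and are not taken from the attached stats.

```python
def percent_assembled(mapped_reads: int, total_reads: int) -> float:
    """Percentage of the input reads that map back to the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Hypothetical numbers: normalization removed a third of the reads.
qc_reads = 1_500_000          # unnormalized QC reads
normalized_reads = 1_000_000  # reads left after normalization

# The same assembly, evaluated against two different read sets:
mapped_normalized = 400_000   # normalized reads that map to the contigs
mapped_qc = 450_000           # QC reads that map to the contigs

print(percent_assembled(mapped_normalized, normalized_reads))  # denominator: normalized reads
print(percent_assembled(mapped_qc, qc_reads))                  # denominator: QC reads
```

With these invented counts the same assembly scores 40% against the normalized reads but only 30% against the QC reads, so assemblies should be compared using one common read set, such as the QC reads.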
-
Hi, yes, you are right. They would be misleading. Looking at my "read_stats_length", I can see that I have roughly 50% more reads in my non-normalized samples. I would assume loosening the filtering parameters will help a little. However, on a previous run, I did try setting "minimum_percent_covered_bases" to 5. It did not change a lot. I ran metaspades with target=10. It did not change much (photo attached). The binning output is from atlas with the default settings. Is there anything specific you want to see? Otherwise, I attached a photo with the number of bins, their completeness and contamination. Based on the last figure I attached, it seems, just by observing, that the non-normalized assembly gives "higher" quality bins, since 7 out of 8 bins are below 5% contamination, with the remaining one at 8%. I can also see a difference in the taxonomy, e.g. 3 bins in the non-normalized run could not be resolved, while only 1 bin in the bbnorm run was unresolved. Lastly, your question.
-
Interesting! @ChristianFurbo did you run bbcms with or without error correction (i.e.
-
Hi slambrechts, I ran bbcms with the defaults, so it must have been ecc=t. I ran it with both R1 and R2 using in=R1... in2=R2.. out=R1... out2=R2... However, I think I made a mistake. I ran bbcms on my error_corrected_reads, as I did with bbnorm, which I realize may be wrong? :)
-
Indeed, bbcms does error correction by default, so if you want to test the effect of only the bbcms depth filter you need to set I'm also not sure whether error correction should be done before or after filtering. The bbcms description states:
So I was thinking to use bbcms with
-
I am running it again now, this time using bbcms on the QC_reads. I am also doing the error correction with atlas, as you suggest. A question: is there a limit to "how much" you can error correct? E.g. in this case, we are error-correcting with bbcms, followed by error-correcting with atlas, and lastly after the filtering? So three error-correction steps?
-
If you run bbcms with In case of running bbcms in default mode on QC reads, before
-
Sorry, yes, you are right :) ecc=f would leave one error-correction step in the assembly step of atlas.
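A minimal sketch of the bbcms call discussed above, with error correction switched on or off. It only uses the options that appear in this thread (`in=`, `in2=`, `out=`, `out2=`, `ecc=`); the file names are hypothetical, and any depth-filter settings are left at whatever bbcms uses by default.

```python
import shlex
from typing import List


def bbcms_command(r1: str, r2: str, out1: str, out2: str,
                  error_correct: bool) -> List[str]:
    """Build a bbcms.sh call for paired-end reads with ecc toggled explicitly."""
    ecc = "t" if error_correct else "f"
    return [
        "bbcms.sh",
        f"in={r1}", f"in2={r2}",
        f"out={out1}", f"out2={out2}",
        f"ecc={ecc}",  # ecc=f: only the depth filter, no bbcms error correction
    ]


if __name__ == "__main__":
    # Hypothetical file names: QC reads in, depth-filtered reads out.
    cmd = bbcms_command("sample_QC_R1.fastq.gz", "sample_QC_R2.fastq.gz",
                        "sample_bbcms_R1.fastq.gz", "sample_bbcms_R2.fastq.gz",
                        error_correct=False)
    print(shlex.join(cmd))
```

Running bbcms on the QC reads with `ecc=f` leaves the error correction to the assembly step of atlas, so the reads are corrected only once.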
-
Hi
-
Hello, @makrez You also have a difficult metagenome. For everybody here: Here is how to run atlas on the dev branch:
-
I group here a discussion on how to assemble complex metagenomes, e.g. soil.
Spades is probably still the best assembler even for complex metagenomes (https://twitter.com/ryneches/status/1352732023262089216)
If the assembly really doesn't give good results, maybe using bbnorm on the QC reads is a solution.
@botellaflotante @slambrechts did you find a better solution to assemble complex metagenomes?