
Ideas for Assembling an Extremely Large Dataset #1373

Open
howla1ke opened this issue Sep 13, 2024 · 1 comment

howla1ke commented Sep 13, 2024

Hello, I have NovaSeq 150 bp PE data that was sequenced across two separate runs to obtain the quantity of data we needed. I want to co-assemble both runs, but my dilemma is that I can only allocate 996 GB of RAM. My job was killed because it ran out of memory, and the SPAdes log noted that approximately 1118 GB of RAM would be needed for the assembly. Would it be advisable to perform the error-correction-only step separately on each run and then co-assemble the corrected output of both in assembler-only mode? Is that possible? Do you have any ideas beyond normalizing the data? Thank you for your time.
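
A rough sketch of the workflow described above (assuming metaSPAdes via `--meta`, as the reply below implies; file names, thread counts, and the corrected-read paths are placeholders, so adjust them to the actual data):

```bash
# Sketch only: error-correct each NovaSeq run separately, then co-assemble
# the corrected reads with --only-assembler. -m caps RAM in GB, -t is threads.

# Step 1: run error correction on each run on its own (lower peak RAM than the combined input)
spades.py --meta --only-error-correction \
    -1 run1_R1.fastq.gz -2 run1_R2.fastq.gz \
    -t 32 -m 996 -o ec_run1

spades.py --meta --only-error-correction \
    -1 run2_R1.fastq.gz -2 run2_R2.fastq.gz \
    -t 32 -m 996 -o ec_run2

# Step 2: combine the corrected reads into one paired-end library
# (corrected FASTQs are written under <outdir>/corrected/; exact names vary,
# so check that directory before concatenating)
cat ec_run1/corrected/run1_R1*.cor.fastq.gz ec_run2/corrected/run2_R1*.cor.fastq.gz > all_R1.cor.fastq.gz
cat ec_run1/corrected/run1_R2*.cor.fastq.gz ec_run2/corrected/run2_R2*.cor.fastq.gz > all_R2.cor.fastq.gz

# Step 3: co-assemble the combined corrected reads, skipping error correction
spades.py --meta --only-assembler \
    -1 all_R1.cor.fastq.gz -2 all_R2.cor.fastq.gz \
    -t 32 -m 996 -o coassembly
```

Concatenating the corrected files into a single library is a common workaround here, since metaSPAdes expects a single paired-end library.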


yqy6611 commented Sep 24, 2024

One approach is using a longer k-mer set. I have found that the default k-mer set for metagenomics is not long enough; longer k-mers reduce RAM consumption during tandem-repeat resolution. If more SSD space is available, I personally use 21,33,55,77 or 21,33,55,77,99,127.
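
A minimal sketch of passing such a k-mer list explicitly (paths and resource values are placeholders; k-mer sizes must be odd and at most 127, which is fine for 150 bp reads):

```bash
# Sketch only: override the default metaSPAdes k-mer set with a longer list,
# capping RAM at the available 996 GB.
spades.py --meta --only-assembler \
    -k 21,33,55,77,99,127 \
    -1 all_R1.cor.fastq.gz -2 all_R2.cor.fastq.gz \
    -t 32 -m 996 -o coassembly_k127
```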
