-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce computational time of AWS megatest #158
Conversation
Whenever we have merged the chunk_model update I will launch the updated fulltest on my cluster. |
d105ead
to
b5e7d91
Compare
VCF_SPLIT_BCFTOOLS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always impressed by the amount of work you put on these contributions - much appreciated! :)
I noticed several distinct topics being addressed here. To make reviews faster and keep the changelog and file history cleaner, it would help if each of these topics were split into separate PRs. This also makes it easier to track changes and ensure each update is well-documented. Here are the main topics I see in this PR and that I tried to provide feedback on:
- Reducing computational cost in fulltest configuration on AWS: This is the primary focus, and it's a great enhancement. You said previously that you have evidence that these new changes make the pipeline full test work, so I believe this is ready.
VCF_SPLIT_BCFTOOLS
edge case fix: Since this is a specific edge case fix, a separate PR would be ideal. In this particular case I see a great benefit of having this particular subworkflow in nf-core/modules, as it allows the community to review and provide feedback on these changes, speeding up the review process and then only making this a simple update in the pipeline :)- Snapshot file updates and output file prefix changes: I understand that there have been changes in how files are named and therefore the snapshots, but it would be nice if this could have been something separate to provide context on why these prefixes were adjusted.
- "Lenient mode" addition: This appears to be a new feature but I couldn't find any reference in your PR description talking about this, could you explain its purpose and the scenarios where it would be useful?
- Minor corrections in the changelog and punctuation fixes: These small adjustments (like punctuation or formatting) would be best suited for a separate "cleanup" PR but they are OK.
extra_fn_clean_exts: | ||
- type: regex | ||
pattern: "\\.batch[0-9]+" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we now remove the batch number from the statistics, aren't all batches going to have the same name? How we will differentiate between batches? I haven't been able to process a test with >1 batch to see how the report will look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is batch file processed in the statistic. The aim to remove this pattern was to clean up a bit the name of the sample in the multiqc output html.
What could be done would be to allow the grouping of the samples by batch in the multiqc output (enable by the last multiqc version). But I think this should be done outside this PR.
I'm sorry that this PR have too much files changed. |
Co-authored-by: Anabella Trigila <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready for changes
VCF_SPLIT_BCFTOOLS
Hi @atrigila this PR is now ready ! |
cpus: 2, | ||
memory: '10.GB', | ||
cpus: 4, | ||
memory: '4.GB', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can handle up to 15GB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This limit was also set for my WSL virtual machine that only go up to 7Gb.
VCF_PHASE_SHAPEIT5
is updated as it was not the case. (my bad in the previous PR).PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).