Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce computational time of AWS megatest #158

Merged
merged 42 commits into from
Nov 24, 2024

Conversation

LouisLeNezet
Copy link
Collaborator

@LouisLeNezet LouisLeNezet commented Nov 10, 2024

  • This PR change the features tested in the fulltest config on AWS to reduce computational cost.
  • The snapshot of VCF_PHASE_SHAPEIT5 is updated as it was not the case. (my bad in the previous PR).

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/phaseimpute branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@LouisLeNezet LouisLeNezet self-assigned this Nov 10, 2024
@LouisLeNezet LouisLeNezet added bug Something isn't working enhancement New feature or request labels Nov 10, 2024
@LouisLeNezet LouisLeNezet added this to the v0.99.0 milestone Nov 10, 2024
@LouisLeNezet
Copy link
Collaborator Author

Whenever we have merged the chunk_model update I will launch the updated fulltest on my cluster.

@LouisLeNezet LouisLeNezet marked this pull request as draft November 11, 2024 20:41
@LouisLeNezet LouisLeNezet changed the title Reduce computational time of AWS megatest Reduce computational time of AWS megatest and Fix VCF_SPLIT_BCFTOOLS Nov 12, 2024
@LouisLeNezet LouisLeNezet marked this pull request as ready for review November 13, 2024 16:24
Copy link
Collaborator

@atrigila atrigila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always impressed by the amount of work you put on these contributions - much appreciated! :)

I noticed several distinct topics being addressed here. To make reviews faster and keep the changelog and file history cleaner, it would help if each of these topics were split into separate PRs. This also makes it easier to track changes and ensure each update is well-documented. Here are the main topics I see in this PR and that I tried to provide feedback on:

  1. Reducing computational cost in fulltest configuration on AWS: This is the primary focus, and it's a great enhancement. You said previously that you have evidence that these new changes make the pipeline full test work, so I believe this is ready.
  2. VCF_SPLIT_BCFTOOLS edge case fix: Since this is a specific edge case fix, a separate PR would be ideal. In this particular case I see a great benefit of having this particular subworkflow in nf-core/modules, as it allows the community to review and provide feedback on these changes, speeding up the review process and then only making this a simple update in the pipeline :)
  3. Snapshot file updates and output file prefix changes: I understand that there have been changes in how files are named and therefore the snapshots, but it would be nice if this could have been something separate to provide context on why these prefixes were adjusted.
  4. "Lenient mode" addition: This appears to be a new feature but I couldn't find any reference in your PR description talking about this, could you explain its purpose and the scenarios where it would be useful?
  5. Minor corrections in the changelog and punctuation fixes: These small adjustments (like punctuation or formatting) would be best suited for a separate "cleanup" PR but they are OK.

Comment on lines +24 to +26
extra_fn_clean_exts:
- type: regex
pattern: "\\.batch[0-9]+"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we now remove the batch number from the statistics, aren't all batches going to have the same name? How we will differentiate between batches? I haven't been able to process a test with >1 batch to see how the report will look like.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is batch file processed in the statistic. The aim to remove this pattern was to clean up a bit the name of the sample in the multiqc output html.
What could be done would be to allow the grouping of the samples by batch in the multiqc output (enable by the last multiqc version). But I think this should be done outside this PR.

conf/modules.config Show resolved Hide resolved
conf/steps/imputation_stitch.config Show resolved Hide resolved
conf/steps/panel_prep.config Show resolved Hide resolved
conf/steps/validation.config Outdated Show resolved Hide resolved
conf/test.config Outdated Show resolved Hide resolved
docs/output.md Outdated Show resolved Hide resolved
nextflow.config Outdated Show resolved Hide resolved
workflows/phaseimpute/tests/main.nf.test.snap Outdated Show resolved Hide resolved
workflows/phaseimpute/tests/main.nf.test.snap Outdated Show resolved Hide resolved
@LouisLeNezet
Copy link
Collaborator Author

I'm sorry that this PR have too much files changed.
It's just that when I do some debugging there is some small changed that needs to be adressed and they pile up quite quickly.
I'll try to split it.

Copy link
Collaborator Author

@LouisLeNezet LouisLeNezet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for changes

@LouisLeNezet LouisLeNezet changed the title Reduce computational time of AWS megatest and Fix VCF_SPLIT_BCFTOOLS Reduce computational time of AWS megatest Nov 17, 2024
@LouisLeNezet
Copy link
Collaborator Author

Hi @atrigila this PR is now ready !

@LouisLeNezet LouisLeNezet linked an issue Nov 24, 2024 that may be closed by this pull request
cpus: 2,
memory: '10.GB',
cpus: 4,
memory: '4.GB',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can handle up to 15GB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limit was also set for my WSL virtual machine that only go up to 7Gb.

@LouisLeNezet LouisLeNezet merged commit 69badc3 into nf-core:dev Nov 24, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix -resume with fai usage
2 participants