Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/private fire support #23

Merged
merged 11 commits into from
Nov 21, 2024
Merged

Feature/private fire support #23

merged 11 commits into from
Nov 21, 2024

Conversation

mberacochea
Copy link
Member

@mberacochea mberacochea commented Nov 21, 2024

This PR adds support for downloading private short reads from FIRE using a custom script. I didn't use the fetch tool because I'm hoping the Nextflow team will upgrade/fix the nf-aws plugin (the one that adds support for S3-compatible storage).

When using the private_study flag, the code retrieves the FASTQ FTP paths and downloads the files using a boto3-based script. It then feeds those files to the other processes; the only part that was functionally modified is the beginning of the short reads assembler.

I added one relatively simple unit test, as we can't currently run EBI network-dependent tests (such as those that require FIRE). Additionally, nf-test doesn't support Nextflow secrets, which are used to obtain the FIRE credentials.

There are many changes related to formatting and linting issues that I've fixed (apologies for that, as it makes the PR messier... I just can't control myself! :).

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests! (sort of)
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • README.md is updated (including new tool citations and authors/contributors).

Copy link
Contributor

@KateSakharova KateSakharova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool!

modules/local/download_from_fire.nf Show resolved Hide resolved
tests/main.nf.test Outdated Show resolved Hide resolved
@SandyRogers
Copy link
Member

@mberacochea one general thought on integration tests for S3/FIRE dependent pipelines... could we use a minio service on GitHub actions / local docker compose setups? E.g. following https://rohanverma.net/blog/2021/02/09/minio-github-actions/

@mberacochea
Copy link
Member Author

@mberacochea one general thought on integration tests for S3/FIRE dependent pipelines... could we use a minio service on GitHub actions / local docker compose setups? E.g. following https://rohanverma.net/blog/2021/02/09/minio-github-actions/

I really like that idea.

tests/main.nf.test Outdated Show resolved Hide resolved
Copy link
Member

@Ge94 Ge94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks Martin :)

Copy link
Member

@SandyRogers SandyRogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant, thank you @mberacochea . A few comments/probable typos mentioned inline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff @mberacochea . In future I think we could put this in pipelines-toolkit repo, or even a little pypi package of its own? As we have something very similar (the FTP URL to FIRE URL converter) in https://github.com/EBI-Metagenomics/emgapi-v2/blob/main/workflows/ena_utils/ena_file_fetching.py and I imagine in some other pipelines too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is taken from the fetch tool.. but I agree, even the fetch tool should source all the ENA dancing from one lib

bin/s3fire_downloader.py Outdated Show resolved Hide resolved
bin/s3fire_downloader.py Outdated Show resolved Hide resolved
bin/s3fire_downloader.py Outdated Show resolved Hide resolved
modules/local/download_from_fire.nf Outdated Show resolved Hide resolved
modules/local/download_from_fire.nf Show resolved Hide resolved
modules/nf-core/fastqc/meta.yml Show resolved Hide resolved
modules/nf-core/multiqc/meta.yml Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
nf-core linting + fixes all over the place to follow sensible nf-core linting rules
Updates multiqc and blastn
Enabled a nf-core linting github action
Tweaked the tests slightly (I think they are still failling .. testing and fixing ATM)
Upgraded to nf-schema 2.0.2 - also pinned this dependency
Remove check_max and moved to resources limits (require nextflow 24.0.0 as min - which I've set on the config)
Created some missing .diff for some modules
@mberacochea mberacochea merged commit 5e53151 into dev Nov 21, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants