-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/private fire support #23
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool!
@mberacochea one general thought on integration tests for S3/FIRE dependent pipelines... could we use a minio service on GitHub actions / local docker compose setups? E.g. following https://rohanverma.net/blog/2021/02/09/minio-github-actions/ |
I really like that idea. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks Martin :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant, thank you @mberacochea . A few comments/probable typos mentioned inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff @mberacochea . In future I think we could put this in pipelines-toolkit repo, or even a little pypi package of its own? As we have something very similar (the FTP URL to FIRE URL converter) in https://github.com/EBI-Metagenomics/emgapi-v2/blob/main/workflows/ena_utils/ena_file_fetching.py and I imagine in some other pipelines too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is taken from the fetch tool.. but I agree, even the fetch tool should source all the ENA dancing from one lib
nf-core linting + fixes all over the place to follow sensible nf-core linting rules Updates multiqc and blastn Enabled a nf-core linting github action Tweaked the tests slightly (I think they are still failling .. testing and fixing ATM) Upgraded to nf-schema 2.0.2 - also pinned this dependency Remove check_max and moved to resources limits (require nextflow 24.0.0 as min - which I've set on the config) Created some missing .diff for some modules
This PR adds support for downloading private short reads from FIRE using a custom script. I didn't use the fetch tool because I'm hoping the Nextflow team will upgrade/fix the nf-aws plugin (the one that adds support for S3-compatible storage).
When using the private_study flag, the code retrieves the FASTQ FTP paths and downloads the files using a boto3-based script. It then feeds those files to the other processes; the only part that was functionally modified is the beginning of the short reads assembler.
I added one relatively simple unit test, as we can't currently run EBI network-dependent tests (such as those that require FIRE). Additionally, nf-test doesn't support Nextflow secrets, which are used to obtain the FIRE credentials.
There are many changes related to formatting and linting issues that I've fixed (apologies for that, as it makes the PR messier... I just can't control myself! :).
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).README.md
is updated (including new tool citations and authors/contributors).