Use CloudOS and NextFlow to integrate TCGA and GTEx data #14

Open
ianfore opened this issue Jan 18, 2021 · 3 comments
Labels
FASPHackathon2 January 2021 FASP Hackathon

Comments

@ianfore
Collaborator

ianfore commented Jan 18, 2021

See GTEX_TCGA_Federated_Analysis notebook for an iPython workflow.

The overall flow of the script/notebook is probably more illustrative than directly usable in NextFlow. A good question is how to do the equivalent work from CloudOS and NextFlow where that is the platform of preference, e.g. as it is for the JAX team. If you are interested in exploring that, we could work together to get started.

I have no experience of writing NextFlow, so I’ll leave that to you. Nevertheless, the nf script that Sangram had shared previously gave me a sense of how you structure things.
https://github.com/lifebit-ai/sra-dbgap-datafetch/blob/main/main.nf

One of the questions is which capabilities it makes sense to implement within NextFlow, and which should live in a library that NextFlow calls. Either way, I think we could do some useful experimentation. Some of the fasp package may be useful; if not, I could modify it to help.

See #15 for a possible additional step.

@ianfore ianfore added the FASPHackathon2 January 2021 FASP Hackathon label Jan 18, 2021
@sk-sahu

sk-sahu commented Jan 19, 2021

Hi @ianfore

For the JAX research team, to get the GTEx data from AnVIL-GEN3 we made a workaround in Nextflow: the pipeline first gets the signed URLs from a modified get_drs_url.py (from fasp-scripts).

An extrapolated example Nextflow pipeline can be found here - https://github.com/lifebit-ai/drs-nf

Although it works, in terms of code design the transition from fasp-scripts to Nextflow is not perfect. We can take this as an action point.
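
For readers new to the "signed URL" step mentioned above, here is a minimal Python sketch of the two-call pattern defined by the GA4GH DRS v1 API: fetch the object record, then exchange one of its `access_id` values for a URL. This is not the code in get_drs_url.py or drs-nf. The function names, the preferred access-type order, and the bearer-token handling are assumptions made for illustration, and real services (AnVIL Gen3 included) may require a different authentication flow.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def drs_object_url(base_url, object_id):
    """Build the DRS v1 'get object' URL for a given server and object ID."""
    # Assumption: the server accepts a percent-encoded ID; some may expect it raw.
    return f"{base_url.rstrip('/')}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def drs_access_url(base_url, object_id, access_id):
    """Build the DRS v1 'get access URL' endpoint for one access method."""
    return f"{drs_object_url(base_url, object_id)}/access/{quote(access_id, safe='')}"


def pick_access_id(drs_object, preferred_types=("s3", "gs", "https")):
    """Return the access_id of the first access method matching the preferred types."""
    for wanted in preferred_types:
        for method in drs_object.get("access_methods", []):
            if method.get("type") == wanted and "access_id" in method:
                return method["access_id"]
    return None


def fetch_json(url, token=None):
    """GET a URL and decode the JSON body, optionally sending a bearer token."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    with urlopen(Request(url, headers=headers)) as response:
        return json.load(response)


def resolve_signed_url(base_url, object_id, token=None):
    """Resolve a DRS object ID to a (typically signed) download URL."""
    drs_object = fetch_json(drs_object_url(base_url, object_id), token)
    access_id = pick_access_id(drs_object)
    if access_id is None:
        raise LookupError(f"No usable access method for {object_id}")
    return fetch_json(drs_access_url(base_url, object_id, access_id), token)["url"]
```

Because the server's base URL is just a parameter, the same helper could in principle be pointed at any DRS-compliant service, so a Nextflow process would only need to call something like `resolve_signed_url(base_url, object_id, token)` and pass the result to a download step.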

@ianfore
Collaborator Author

ianfore commented Jan 19, 2021

Thanks @sk-sahu for the links. I'd highlight what seem to me some key issues.

  • Can the same DRS methods be used to access both the TCGA and GTEx files?
  • What parallel/alternative role would the SRA DRS service play in that?
  • What role is it desirable/possible that sratools (prefetch, fasterq-dump) play?
    -- The existing community use of that toolset is a significant factor to consider
    -- There's an existing code-base that goes with that
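
As a starting point for the sratools question, a small Python sketch of how a wrapper might assemble the `prefetch` and `fasterq-dump` command lines for one run accession. The helper name, the default output directory, and the thread count are assumptions made for illustration, and the accession in the usage example is only a placeholder. The `--ngc` option is included because controlled-access (dbGaP) data needs a repository key file.

```python
import shlex


def sra_commands(accession, outdir="fastq", threads=4, ngc_file=None):
    """Return the argv lists for prefetch followed by fasterq-dump."""
    prefetch = ["prefetch", accession]
    dump = ["fasterq-dump", accession, "--outdir", outdir, "--threads", str(threads)]
    if ngc_file:
        # Controlled-access data: pass the dbGaP repository key to both tools.
        prefetch += ["--ngc", ngc_file]
        dump += ["--ngc", ngc_file]
    return [prefetch, dump]


if __name__ == "__main__":
    # Print the commands rather than running them; to execute, pass each
    # list to subprocess.run(cmd, check=True) on a machine with sratools installed.
    for cmd in sra_commands("SRR000001"):
        print(shlex.join(cmd))
```

Keeping the commands as plain argument lists means the same logic could sit in a library that NextFlow calls, or be written directly into a process script block.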

To my mind, those would fit with your wish for better code design. You may have others.

What I would advocate more strongly is that, rather than in theory, we all explore these issues in working code examples, hackathon style!

@ianfore
Collaborator Author

ianfore commented Jan 22, 2021

Possible to-dos:

  • Review Jupyter notebook to understand flow
  • Understand approaches to doing the same thing in NextFlow
  • Add query source for GTEx (AnVIL) data?


2 participants