Add script that runs parts of a single workflow with different files on different WES implementations #11

Open
3 of 5 tasks
mbarkley opened this issue Jan 15, 2021 · 7 comments
Assignees
Labels
FASPHackathon2 January 2021 FASP Hackathon

Comments

@mbarkley
Collaborator

mbarkley commented Jan 15, 2021

Motivation

A long-standing goal of the FASP GA4GH group has been to have a federated workflow demo that anyone can run. In this context, a federated workflow means:

  1. Multiple compute environments are used for a single computational analysis
  2. Different data is analysed in each compute environment
  3. A single script drives the analysis using GA4GH standards, where possible

Goals

  1. Automate as much of this task as possible in scripts in this repo
  2. Use WES API to control and monitor workflows
  3. Use public or synthetic data to make this more accessible for other people to try
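
A minimal sketch of what goal 2 could look like, using only the Python standard library. The `/runs/{run_id}/status` path and the state names follow the GA4GH WES 1.0 specification; the base URL, the bearer token, and the function names `fetch_status` and `wait_for_run` are assumptions for illustration and are not taken from the scripts in this repo.

```python
import json
import time
import urllib.request

# Terminal states from the GA4GH WES 1.0 State enum.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def fetch_status(base_url, run_id, token=None):
    """GET {base_url}/runs/{run_id}/status and return its 'state' field.

    base_url is assumed to already include the WES prefix,
    for example https://wes.example.org/ga4gh/wes/v1 (hypothetical host).
    """
    req = urllib.request.Request(f"{base_url}/runs/{run_id}/status")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["state"]


def wait_for_run(get_state, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_state() until it returns a terminal state or polls run out.

    Returns the last state seen, which may be non-terminal on timeout.
    """
    state = "UNKNOWN"
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    return state
```

A caller would combine the two, for example `wait_for_run(lambda: fetch_status(base_url, run_id, token))`. Passing the status lookup in as a callable keeps the polling loop independent of any one WES server, so the same loop could serve each implementation.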

Todo

  • Identify some data (preferably with mirrors in different regions)
  • Fix script using DNAstack WES
  • Write script using ELIXIR WES
  • Write script using Seven Bridges WES
  • Write a single script calling the DNAstack, ELIXIR, and Seven Bridges scripts on different subsets of files
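
The last item could be sketched roughly as follows. This is only an outline under stated assumptions: `run_federated`, the site names, and the idea that each WES client is wrapped as a function taking a file identifier and returning a run ID are all hypothetical, and the real client classes in this repo may have a different interface.

```python
from typing import Callable, Dict, List

# Hypothetical type: a submit function takes a file URL or ID
# and returns the run ID assigned by that WES server.
SubmitFn = Callable[[str], str]


def run_federated(assignments: Dict[str, List[str]],
                  clients: Dict[str, SubmitFn]) -> Dict[str, List[str]]:
    """Submit each site's files through that site's client.

    Returns a mapping of site name to the run IDs it produced.
    """
    missing = set(assignments) - set(clients)
    if missing:
        raise KeyError(f"no WES client configured for: {sorted(missing)}")
    return {site: [clients[site](f) for f in files]
            for site, files in assignments.items()}
```

For example, `run_federated({"dnastack": [file_a], "sevenbridges": [file_b]}, clients)` would send each subset of files to a different compute environment while keeping a single driver script.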
@mbarkley mbarkley self-assigned this Jan 15, 2021
@mbarkley
Collaborator Author

mbarkley commented Jan 15, 2021

There is a Seven Bridges WES script now thanks to #9

@ianfore
Collaborator

ianfore commented Jan 15, 2021

Great proposal. I think there's even value in running something against more than one Seven Bridges implementation. I'd like to revive this model of looking at a FASP script to highlight which aspects of federation it hits. In this case it would address those in bold. It also has the option that we can test federated authentication and authorization early.

  • Technology (stack)
  • Funder
  • Data provider
  • National boundaries ?
  • Scientific discipline

Your list above would check all those.

@ianfore
Collaborator

ianfore commented Jan 18, 2021

Have updated FASPScript17 which runs a compute on TCGA and GTEx data in a single script. These come from different repositories and driver projects.

  • Create notebook version of FASPScript17
  • Use DNAstack WES to run samtools workflow on GCP
  • Use Amazon copy of the GTEx data and run the workflow using the Seven Bridges CGC WES Client
    See FASPScript18 (script) and GTEXExample (notebook) which illustrate what would have to be added in 17 and the new notebook equivalent.

See GTEX_TCGA_Federated_Analysis notebook

@ianfore ianfore added the FASPHackathon2 January 2021 FASP Hackathon label Jan 18, 2021
@mbarkley
Collaborator Author

There's now a PR sent to fix the DNAstack WES client. Tomorrow I'll start tinkering with an ELIXIR script.

@ianfore
Collaborator

ianfore commented Jan 20, 2021

Merged the PR. Now thinking of tinkering with running samtools via the DNAstack WES. The problem we hit with that last summer seemed to be the "requester pays" buckets.

Added DNAStackWESTour notebook to explore some more.

It looks like we may still have the requester pays problem, but otherwise the server looks in good shape.

@mbarkley
Collaborator Author

I've sent PR #19 with an ELIXIR WES client implementation. I don't think I'll get to write a script using all the WES clients this week for a single federated workflow, but I think we're a lot closer now.

ianfore added a commit that referenced this issue Jan 22, 2021
Add ELIXIR WES client [#11]
Additional notebook using DNAStack WES
Adding the above identified need for return value from FASPRunner.RunQuery - added
@ianfore
Collaborator

ianfore commented Jul 17, 2022

Raising the question whether the Federated VUS notebook addresses the intent of this issue. The link shown is the notebook that aggregates the results. Notebooks in the same folder show running the same workflow on three different instances of the Seven Bridges platform.

This...

  1. Demonstrates the concept and checks the boxes in the 15 Jan 2021 comment above.
  2. Is less convincing than running it on three different technical platforms.

Barriers to the latter are

  • Being able to access or deploy the application/container on ELIXIR or DNAstack
  • Being able to get results back from those platforms when the workflow is run

Proposing that we close this issue and address those barriers. Perhaps via issues in this repo which are specific to those barriers, or by other means.

Thoughts @mbarkley ?
