Getting started guide #29

Open
jb-adams opened this issue Mar 9, 2021 · 13 comments

@jb-adams
Member

jb-adams commented Mar 9, 2021

@ianfore From our Connect call, I'm thinking it might be helpful to create a "getting started guide" as an entry point for running these scripts. As we want more of the community to use and contribute to these scripts, we'll want to provide an easy path for them to start working with this repo.

This guide could take the form of a one-pager within the repo that explains how to register for the various services/platforms, how to configure keys locally, how to test scripts against expected output, etc. It would take a user/researcher with no pre-existing identity on any of the FASP platforms to the point of being able to run most or all of the scripts. Since I fall into this category (I only have an ID with CGC and Cavatica), I'm happy to take notes on the process and collate them into a getting started guide.

Does this sound useful?

@briandoconnor @mbarkley

@ianfore
Collaborator

ianfore commented Mar 9, 2021

Yes this is definitely needed. Some of the existing issues are intended to help with that. #5, #7, #8, #26 and #28 for example. Some have been dealt with but the repo is not yet at a point where it's easy to pick up and get started.

I have to take some responsibility for this, so am self-assigning, but any help would be welcome.

@ianfore ianfore self-assigned this Mar 9, 2021
@ianfore
Collaborator

ianfore commented Mar 9, 2021

Created starter branch.

@jb-adams jb-adams self-assigned this Mar 10, 2021
@jb-adams
Member Author

I'm happy to work on this as well. I can bring the perspective of someone who doesn't have existing accounts with any of these services: am I able to use the guide and run the scripts successfully?

If it works for you, I can create a branch on my fork off of your fork.

@ianfore
Collaborator

ianfore commented Mar 10, 2021

Yes, please fork as needed.

I was going to go back to square one and create a new python environment using conda or venv and check the dependencies bottom up.

I'm still reluctant to list everything used in the scripts as a dependency. Not all users would want to run all scripts - so I wanted to avoid them having to install packages they wouldn't use.

A reasonable strategy would be to list as dependencies only those modules used by the fasp package, and leave out anything that is used only within specific scripts, but we'll have to review that as we go.

Besides that, we also need to consider this as a 'tutorial' to get someone started and able to do something useful with minimal preliminaries.

@jb-adams
Member Author

For simplicity I've listed out all the dependencies in setup.py so that python setup.py install takes care of all of them. If we want something more sophisticated, we can consider a CLI option when running install to handle different "dependency groups", but for now I think it's good to have all dependencies transparently listed and installable via one command.
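If we do go the "dependency groups" route later, setuptools extras (`extras_require`) would be one way to express them. A minimal sketch, assuming hypothetical package and group names (none of these are taken from the actual setup.py in this repo):

```python
# Sketch of "dependency groups" using setuptools extras.
# All package and group names below are illustrative assumptions,
# not the actual dependency list of fasp-scripts.

CORE_REQUIRES = ["requests"]  # assumed: needed by the fasp package itself

EXTRAS = {
    "bigquery": ["google-cloud-bigquery"],  # assumed: only for BigQuery-based scripts
    "notebooks": ["jupyter", "pandas"],     # assumed: only for the notebooks
}


def build_setup_kwargs():
    """Return keyword arguments for setuptools.setup()."""
    extras = dict(EXTRAS)
    # "all" installs every optional group in one go.
    extras["all"] = sorted({pkg for group in EXTRAS.values() for pkg in group})
    return {
        "name": "fasp",
        "packages": ["fasp"],
        "install_requires": CORE_REQUIRES,
        "extras_require": extras,
    }


# In setup.py this would be used as:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

With a layout like this, `pip install .` would pull in only the core packages, while `pip install ".[all]"` would install everything in one command.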

I'm trying to work up to running FASPScript2.py successfully, but I don't have the FASP_SETTINGS environment variable set. Can you explain what this is so I can translate it into the guide?

@jb-adams
Member Author

Never mind, found the FASP_SETTINGS example.
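For the guide, a sketch of how a script might fail early with a helpful message when the variable is missing. This assumes FASP_SETTINGS holds the path to a JSON file; both that format and the `load_settings` helper are assumptions for illustration, not the repo's actual code:

```python
import json
import os


def load_settings(env_var="FASP_SETTINGS"):
    """Load settings from the JSON file named by an environment variable.

    Assumption: the variable holds a path to a JSON file. The helper name
    and the file format are hypothetical, not taken from the fasp package.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set; point it at your settings file")
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to {path}, which does not exist")
    with open(path) as handle:
        return json.load(handle)
```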

@jb-adams
Member Author

Hi @ianfore, I'm trying to run FASPScript2.py to completion while translating all the workup steps into the getting started guide. Currently, I'm getting a 403 error when trying to run the bdcquery (line 47). Who is hosting this service, and who should I reach out to for access?

@ianfore
Collaborator

ianfore commented Mar 11, 2021

That query is against a BigQuery table which contains controlled access data. I created the table from a dbGaP file to which I have been granted access, but I can't grant that access to anyone else. We could follow through on that but it would distract from what we're trying to get done. (Created issue #31 which we can pursue in parallel to deal with the access issue).

More directly we should find a script that fits the criterion to "get someone started and able to do something useful with minimal preliminaries". FASPScript2 is nice, in that it's federating two sources, but it doesn't fit the bill for "minimal preliminaries".

FASPNotebook06 would be better. Neither the Search nor the DRS step requires any authenticated access. However, the WES step will require you to get a login for a WES server. I can't see a way around that for any WES server because I don't know of anyone prepared to give open access to compute. However, @mbarkley might grant you access to the DNAStack WES for what you want to do.

For steps beyond that, we could look at notebooks that use the various Seven Bridges WES servers. For the CRDC-sponsored http://cgc.sbgenomics.com you should be able to create yourself an account with enough credits to do basic compute. The other SB instances which offer WES services (Cavatica and BioDataCatalyst) also offer some "starter" access.

General point about the scripts - I've shifted focus to the notebooks rather than the scripts. The most current work is on the notebooks because it seemed they had more relevance to the community. If that's not the case we can shift focus back to the scripts. (The notebook vs scripts question got some consideration in issue #7 if you want more background on the thinking, though it does get into the weeds of some WES issues).

@jb-adams
Member Author

Thanks for the clarification Ian, I'll move over to trying to replicate notebook 6 for now. I think we should provide some indication within the scripts about which ones are not possible to run based on closed access.

I think it should also be sufficient, for certain scripts, to say just what you mention here, i.e. "this script requires you to have an account with XYZ company and access to their WES service." Perhaps we can list out the "platform reps" for each institute, e.g.:

  • if you want to run this on Seven Bridges, contact Michele or SB support desk
  • if you want to run this on DNAstack, contact Max or support desk

That way, we give people trying to run the fasp scripts a chance to jump off into setting up accounts with the related platforms, and a point of contact for further clarification if necessary.

@ianfore
Collaborator

ianfore commented Mar 11, 2021

Yes, the access required would fall under the heading of the metadata that I felt we needed about scripts. Issue #8 touched on script metadata and which datasets scripts access, but we should revisit.

The access_keys.md page has the beginnings of some of how and where to get access to various systems.

Adding a to-do list in here. Creating additional issues for these would work too.

  • Add Seven Bridges nodes to access_keys.md
  • Consider adding contact names to access_keys.md. Links to the relevant helpdesk in each case are likely more useful than the names of individuals, since the latter are subject to change. First consider whether the types of link already on access_keys.md are sufficient.
  • Add datasets/access required to ScriptSummary
  • Consider a format other than Excel for ScriptSummary
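On the last two items, one option would be a plain CSV file kept in the repo, which diffs cleanly and can be read with the standard library. A sketch, assuming hypothetical column names; the rows reflect only what has been stated in this thread about FASPScript2 and FASPNotebook06:

```python
import csv
import io

# Hypothetical plain-text ScriptSummary. Column names are assumptions;
# the rows reflect only what is stated in this thread.
SCRIPT_SUMMARY_CSV = """\
script,step,access
FASPScript2.py,search,controlled
FASPNotebook06,search,open
FASPNotebook06,drs,open
FASPNotebook06,wes,account
"""


def access_by_script(csv_text):
    """Map each script to the set of access levels its steps require."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        summary.setdefault(row["script"], set()).add(row["access"])
    return summary


def runnable_without_controlled_access(csv_text):
    """List scripts with no step that needs controlled-access data."""
    return sorted(
        script
        for script, levels in access_by_script(csv_text).items()
        if "controlled" not in levels
    )
```

A newcomer could then filter the summary for scripts that need nothing beyond a platform account before deciding where to start.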

@ianfore
Collaborator

ianfore commented Apr 6, 2021

Added details for Seven Bridges keys. In the process it required code changes so the same approach was used for the SB WES and DRS services.

Felt that, rather than providing help desk contacts, the links to home pages for each of the systems would be sufficient. In each case that leads into the standard process for getting an account on the relevant system.

@ianfore
Collaborator

ianfore commented Jul 17, 2022

Reviewing this. Much of what we set out to do was accomplished for the tutorial at ISMB, and with the addition of Starter Kit. SK adds examples of how, as a provider, to get DRS, WES, Data Connect and Passport running.

One rationalization, for fasp-scripts, was to separate the clients from the scripts¹. That was done as the fasp-client branch.
Suggest that we create a separate fasp-client repository for that.

This simplifies the complex access scenario required for all the scripts, explored above. In total, the various scripts need keys for 10-12 systems². That's too complex to manage, and likely unnecessary for all users. Most scripts probably only need to authenticate against three or four systems, sometimes fewer. Best to deal with that script by script.
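Dealing with keys script by script could look something like the sketch below: each script declares the systems it authenticates against, and a check reports which keys are missing before anything runs. The system names and the `missing_keys` helper are hypothetical, not part of the fasp clients.

```python
def missing_keys(required_systems, available_keys):
    """Return the systems a script needs that have no key configured.

    required_systems: iterable of system names the script authenticates against.
    available_keys: dict mapping system name to a key (empty or None = not set).
    """
    return [
        system
        for system in required_systems
        if not available_keys.get(system)
    ]


# Hypothetical example: a script that needs three systems,
# run by a user who has set up keys for only two of them.
required = ["search", "drs", "wes"]
available = {"search": "key-abc", "drs": "key-def", "wes": ""}
print(missing_keys(required, available))  # prints ['wes']
```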

The tutorial handled the "if you want to run this on Seven Bridges, contact Michele or SB support desk" question above.

¹ For "script" read script/notebook.
² Passport might change that - but that vision is not yet fulfilled, and we should not be dependent on that to explore other functionality in parallel.

Suggest we have the following to-dos to close this issue:

  • Create new repository ga4gh/fasp-client (or fasp-clients) from fasp-client branch of this repository
  • Check that https://github.com/ga4gh/ismb-2022-ga4gh-tutorial works independently of the event itself
  • Consider a renamed repository for the tutorial which is not tied to the event

@ianfore
Collaborator

ianfore commented Dec 1, 2024

Regarding the to-dos above...

The fasp-clients branch became https://github.com/ga4gh/fasp-clients

The ISMB tutorial was renamed as https://github.com/ga4gh/Get-Started-with-GA4GH-APIs. It seems to be runnable standalone outside the event.

Removal of the clients from this repository still needs to be done. Notebooks here need to be checked to confirm that they run against the clients in fasp-clients. Some pruning of the notebooks should also be done. No need to check notebooks that are being retired.

2 participants