Getting started guide #29
Comments
Yes this is definitely needed. Some of the existing issues are intended to help with that. #5, #7, #8, #26 and #28 for example. Some have been dealt with but the repo is not yet at a point where it's easy to pick up and get started. I have to take some responsibility for this, so am self-assigning, but any help would be welcome. |
Created starter branch. |
I'm happy to work on this as well. I can bring the perspective of someone who doesn't have existing accounts with any of these services, and check whether I'm able to use the guide and run the scripts successfully. If it works for you, I can create a branch on my fork off of your fork. |
Yes, please fork as needed. I was going to go back to square one, create a new Python environment using conda or venv, and check the dependencies bottom up. I'm still reluctant to list everything used in the scripts as a dependency. Not all users would want to run all scripts, so I wanted to avoid them having to install packages they wouldn't use. A reasonable strategy would be to list as dependencies only those modules used by the fasp package, and leave out anything that is used only within specific scripts, but we'll have to review that as we go. Besides that, we also need to consider this as a 'tutorial' to get someone started and able to do something useful with minimal preliminaries. |
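One way to explore the dependency strategy described above is to compare what the package imports with what the scripts import. The following is only a rough sketch, not something in the repo: the helper names are made up for illustration, it uses the standard-library `ast` module to read `import` statements without running any code, and it relies on `sys.stdlib_module_names` (Python 3.10+) to filter out the standard library.

```python
from __future__ import annotations

import ast
import sys
from pathlib import Path


def top_level_imports(source: str) -> set[str]:
    """Return the top-level package names imported anywhere in the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module.split(".")[0])
    return modules


def imports_in_tree(root: Path) -> set[str]:
    """Collect imported package names from every .py file under root."""
    found = set()
    for path in root.rglob("*.py"):
        found |= top_level_imports(path.read_text(encoding="utf-8"))
    return found


def third_party(modules: set[str], local=frozenset()) -> set[str]:
    """Drop standard-library modules and the project's own packages."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {m for m in modules if m not in stdlib and m not in local}
```

Assuming, hypothetically, that the package lives in a `fasp` directory and the scripts in a separate directory, `third_party(imports_in_tree(Path("fasp")), local={"fasp"})` would give a candidate dependency list, and subtracting it from the same call on the scripts directory would show the packages only specific scripts need. Notebooks (`.ipynb`) are not covered by this sketch.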
For simplicity i've listed out all the dependencies in I'm trying to work up to running |
never mind, found the |
hi @ianfore , I'm trying to run FASPScript2.py to completion, while translating all the workup steps into the getting started guide. Currently, I'm getting a |
That query is against a BigQuery table which contains controlled access data. I created the table from a dbGaP file to which I have been granted access, but I can't grant that access to anyone else. We could follow through on that, but it would distract from what we're trying to get done. (Created issue #31, which we can pursue in parallel to deal with the access issue.)
More directly, we should find a script that fits the criterion to "get someone started and able to do something useful with minimal preliminaries". FASPScript2 is nice in that it's federating two sources, but it doesn't fit the bill for "minimal preliminaries". FASPNotebook06 would be better. Neither the Search nor the DRS steps will require any authenticated access. However, the WES step will require you to get a login on a WES server. I can't see a way around that for any WES server because I don't know of anyone prepared to give open access to compute. However, @mbarkley might grant you access to the DNAStack WES for what you want to do. For steps beyond that, we could look at notebooks that use the various Seven Bridges WES servers. For the CRDC-sponsored http://cgc.sbgenomics.com you should be able to create yourself an account with enough credits to do basic compute. The other SB instances which offer WES services (Cavatica and BioDataCatalyst) also offer some "starter" access.
General point about the scripts: I've shifted focus to the notebooks rather than the scripts. The most current work is on the notebooks because they seemed to have more relevance to the community. If that's not the case we can shift focus back to the scripts. (The notebook vs scripts question got some consideration in issue #7 if you want more background on the thinking, though it does get into the weeds of some WES issues.) |
Thanks for the clarification, Ian. I'll move over to trying to replicate notebook 6 for now. I think we should provide some indication within the scripts about which ones cannot be run because they depend on closed access. For certain scripts it should also be sufficient to say just what you mention here, i.e. "this script requires you to have an account with XYZ company and access to their WES service." Perhaps we can list out the "Platform reps" for each institute, e.g.:
That way, we give people trying to run the fasp scripts a chance to jump off into setting up accounts with the related platforms, and a point of contact for further clarification if necessary. |
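To make the idea concrete, here is a minimal sketch of what per-script access metadata might look like. Everything in it is hypothetical (the class names, the `CATALOG` list, and the wording of the access notes); the two entries only paraphrase what has been said in this thread about FASPNotebook06 and FASPScript2.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequirement:
    service: str                 # the system a script talks to
    open_access: bool            # True if no account or approval is needed
    how_to_get_access: str = ""  # short pointer for the user


@dataclass
class ScriptInfo:
    name: str
    requirements: list = field(default_factory=list)

    def blockers(self):
        """Requirements a new user cannot satisfy without an account or approval."""
        return [r for r in self.requirements if not r.open_access]


def describe(info):
    """One-line summary suitable for a README table or a script docstring."""
    blockers = info.blockers()
    if not blockers:
        return f"{info.name}: no account needed"
    needs = "; ".join(f"{r.service} ({r.how_to_get_access})" for r in blockers)
    return f"{info.name}: needs access to {needs}"


# Illustrative entries only, based on the discussion in this issue.
CATALOG = [
    ScriptInfo("FASPNotebook06", [
        AccessRequirement("Search", True),
        AccessRequirement("DRS", True),
        AccessRequirement("WES", False, "requires a login on a WES server"),
    ]),
    ScriptInfo("FASPScript2", [
        AccessRequirement("BigQuery table", False,
                          "controlled access data, see issue #31"),
    ]),
]

if __name__ == "__main__":
    for info in CATALOG:
        print(describe(info))
```

A contact or home-page link per platform could be added as another field once we agree on what to point people to.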
Yes, the access required would fall under the heading of the metadata that I felt we needed about scripts. Issue #8 touched on script metadata and which datasets scripts access, but we should revisit it. The access_keys.md page has the beginnings of how and where to get access to various systems. Adding a to-do in here. Creating additional issues for these would work too.
|
Added details for Seven Bridges keys. In the process it required code changes so the same approach was used for the SB WES and DRS services. Felt that, rather than providing help desk contacts, the links to home pages for each of the systems would be sufficient. In each case that leads into the standard process for getting an account on the relevant system. |
Reviewing this. Much of what we set out to do was accomplished for the tutorial at ISMB, and with the addition of Starter Kit. SK adds examples of how, as a provider, to get DRS, WES, Data Connect and Passport running. One rationalization for fasp-scripts was to separate the clients from the scripts [1]. That was done as the fasp-client branch. This simplifies the complex access scenario required for all the scripts, as explored above. In total, the various scripts need keys for 10-12 systems [2]. That's too complex to manage, and likely unnecessary for all users. Most scripts probably only need to authenticate against three or four systems, sometimes fewer. Best to deal with that script by script. The tutorial handled the "if you want to run this on Seven Bridges, contact Michele or SB support desk" question above.
[1] For "script" read script/notebook.
Suggest we have the following to-dos to close this issue:
|
Regarding the to-dos above... The fasp-clients branch became https://github.com/ga4gh/fasp-clients. The ISMB tutorial was renamed as https://github.com/ga4gh/Get-Started-with-GA4GH-APIs. It seems to be runnable standalone outside the event. Removal of the clients from this repository still needs to be done. The notebooks here need to be checked to confirm that they run against the clients in fasp-clients. Some pruning of the notebooks should also be done. No need to check notebooks that are being retired. |
@ianfore From our Connect call, I'm thinking it might be helpful to create a "getting started guide" as an entrypoint to being able to run these scripts. As we want more of the community to use and contribute to these scripts, we'll want to provide an easy path for them to working with this repo.
This guide could take the form of a one-pager within the repo that explains how to go about getting registered for the various services/platforms, how to configure keys locally, test scripts with expected output, etc. It would take a user/researcher with no pre-existing identity with any of the FASP platforms to being able to run most, or all scripts. Since I fall into this category (I only have an ID with CGC and Cavatica), I'm happy to take notes on the process and collate into a getting started guide.
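As an illustration of the "configure keys locally" step, the guide could include a small check that tells a new user which credentials are still missing before they try a script. This is only a sketch under stated assumptions: it supposes a single JSON file mapping service names to keys, which may not match how the repo actually stores credentials, and the function names and service names are hypothetical.

```python
import json
from pathlib import Path


def load_keys(path):
    """Read a JSON file of {service_name: key}; return {} if it does not exist."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def missing_keys(required, keys):
    """List the required services that have no key, or an empty key, configured."""
    return [name for name in required if not keys.get(name)]
```

A script could call `missing_keys` with its own short list of required services at startup and print where to register for each missing one, instead of failing partway through with an authentication error.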
Does this sound useful?
@briandoconnor @mbarkley