C&P/GKS/Discovery components #67
Comments
To clarify, I am not proposing Discovery develops a new metadata model, rather that it aligns with the SchemaBlocks at https://github.com/ga4gh-metadata/schemas and informs their further development.
@mcourtot +1 - We want the requirements from SearchAPI, Beacon & data exchange products (think phenopackets PXF) to inform the development of standards in SchemaBlocks, which then should serve as the reference for "product" developments. (Maybe this isn't the final place for the code; could be starting point for a work stream subgroup...)
I think there's a lot of overlap here in terms of what Discovery has been asked to do regarding its component architecture and component data model definitions, and the MetaData Schema Blocks. I need to discuss this further with @mfiume. I'll be putting out a Discovery Search API work retrospective soon regarding the standard not getting approved at the SC meeting in Basel. Part of the feedback I heard was we need to be clear about where components are defined, by whom, and how the process works. I need to write up some documents detailing those aspects. My personal expectation is these processes will change, and obviously there are others who want to define schemas or blocks of data for interoperability. Search (and more broadly, Discovery) does have specific requirements, but from my perspective these are less about the models, and more about the format of the models and how they can be used. For example, using JSON Schema to make the components unambiguous and automatically validatable. It looks like Schema Blocks may have intended to make machine readable schemas, but has missed that mark. I'm hoping the documents I draw up regarding process and format of schemas can help with this moving forward, for wherever these schemas sit.
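To make "unambiguous and automatically validatable" concrete, here is a minimal sketch in Python. It assumes a hypothetical `biosample` component whose field names are invented for illustration and are not taken from SchemaBlocks. The `validate` helper is a toy that understands only the `type`, `required`, and `properties` keywords; a real setup would more likely rely on an existing JSON Schema validator library than on hand-rolled checks.

```python
import json

# Hypothetical schema for a "biosample" component. The property names are
# illustrative assumptions, not taken from SchemaBlocks or any GA4GH spec.
BIOSAMPLE_SCHEMA = {
    "type": "object",
    "required": ["id", "individual_id"],
    "properties": {
        "id": {"type": "string"},
        "individual_id": {"type": "string"},
        "age_at_collection": {"type": "integer"},
    },
}

# Maps a small subset of JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema):
    """Return a list of error messages; an empty list means valid.

    Toy validator: supports only `type`, `required`, and `properties`.
    """
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(instance, bool) and expected in ("integer", "number")
        if not isinstance(instance, _TYPES[expected]) or is_bool_as_number:
            return [f"expected {expected}, got {type(instance).__name__}"]

    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(f"{key}: {e}" for e in validate(instance[key], subschema))
    return errors


if __name__ == "__main__":
    payload = json.loads('{"id": "bs-1", "individual_id": "ind-1"}')
    print(validate(payload, BIOSAMPLE_SCHEMA))
    print(validate({"id": 1}, BIOSAMPLE_SCHEMA))
```

The point of the sketch is that once a component is written down in a machine-readable form, any implementation can check a payload against it without a human interpreting prose.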
No, actually not (intended to be machine readable). It is just "human readable but decent in consistency", waiting to be picked up for formalisation ... Format so far "informed by" OpenAPI, but more in a pseudo-code way.
@mbaudis OK, that's actually good news from my perspective! Payload definitions in OpenAPI are actually a sub/superset of JSON Schema, which is better suited for individual model definitions for data represented in JSON. There are lots of aspects at play here regarding model definitions and formalisation. Happy to discuss in more detail if you think that's useful right now.
The intent was indeed to have a human readable spec, but leave the implementation free. For example, Phenopackets uses protobuf, but for my purposes I want to use JSON as we are also working on a common validator (with Elixir/HCA). With respect to Discovery search I would like to have a consensus on attributes to be used, i.e., the Minimum core metadata attributes for search mentioned in 1. above. Anything that can be reused should be, anything that is missing could be added. It'd be good to discover the same things consistently :)
I think one of the aspects here is you're defining models which can also be for storage, and not just models to be used for transmission of data, right? That's a tangent anyway. Could you expand on what you mean by "metadata attributes"? In terms of the previously mentioned spreadsheet sent round by Tony, it's something we still need to consider, but it doesn't reflect anything in terms of group requirements. There is work to be done.
In his very first email in June 2018 @mfiume said "The GA4GH Discovery Work Stream is incubating a new standard for data discovery. Many of you have or are currently developing data exploration portals for faceted search of genomics and clinical data, and we think there is value in developing a set of standards to create a common API for data exploration and/or defining and harmonizing metadata (e.g. sex vs. gender) so that searches across the universe of genomics datasets can be made to be more consistent." This would allow data providers to have a GA4GH compliant API over their own data sources. This GA4GH API would provide the federated search capabilities as well as common export formats such as phenopackets. Did you mean something different? If you want we could try and have a quick f2f and see where there is intersection then report to the group?
@Relequestual Metadata attributes: "Everything but the sequence" (well, sequence and associated positioning, quantitative elements). So it could mean phenotypic attributes, disease codes, time attributes, geolocation data ... See https://ga4gh-metadata.github.io for examples. For Beacon (but anywhere else "data search to transmit" for that matter), it is not sufficient (in the long run) to have some attribute being queried; it has to be scoped, too. If you query a disease code, it has to be clear if this is directed at a biosample level, or at the individual:
So in the current Beacon demonstrator (click "CNV example"), the "metadata" query (cancer ontology codes) is directed against the "biosamples" annotation level. Therefore, one doesn't only need the proper terminology for an attribute (e.g. There has been the use case assessment previously by the C/P group - need a pointer again. But anyway, a lot of pieces are preformed & just have to be put, gradually, in place (using your preferred schema language ...).
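As a rough illustration of the scoping point above, the sketch below shows how the same disease code query can return different results depending on whether it is directed at biosamples or at individuals. Everything here is hypothetical: the `ScopedFilter` class, the `run_query` function, the record layout, and the `EX:` codes are made up for the example and do not reflect the Beacon API or any agreed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedFilter:
    """A query term plus the annotation level it is directed at (hypothetical)."""
    scope: str       # e.g. "biosamples" or "individuals"
    attribute: str   # e.g. "disease_code"
    value: str       # e.g. an ontology code


# Made-up records: a biosample and the individual it came from can carry
# different disease codes, which is why the scope of a query matters.
RECORDS = {
    "biosamples": [
        {"id": "bs-1", "individual_id": "ind-1", "disease_code": "EX:0001"},
        {"id": "bs-2", "individual_id": "ind-2", "disease_code": "EX:0002"},
    ],
    "individuals": [
        {"id": "ind-1", "disease_code": "EX:0002"},
        {"id": "ind-2", "disease_code": "EX:0002"},
    ],
}


def run_query(query, records=RECORDS):
    """Return the ids of records at the requested scope that match the filter."""
    if query.scope not in records:
        raise ValueError(f"unknown scope: {query.scope!r}")
    return [
        record["id"]
        for record in records[query.scope]
        if record.get(query.attribute) == query.value
    ]


if __name__ == "__main__":
    # Same attribute and value, different scope, different answer.
    print(run_query(ScopedFilter("biosamples", "disease_code", "EX:0002")))
    print(run_query(ScopedFilter("individuals", "disease_code", "EX:0002")))
```

Carrying the scope as an explicit field is only one possible design; it is shown here because it makes the ambiguity of an unscoped query easy to see.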
@mcourtot Thanks for the clarification, and thanks @mbaudis too, I think this is now clearer to me. I think scoping objects as you described above is interesting, and a discussion we need to have, but one I'm not keen to have right now, at least not in the long form required to do the problem justice. My hope is that scoping can be done in a different way from your example, but I'm not sure that's possible, and I know Tony Brookes has strong opinions on this also. @mcourtot I'm not sure we now need a face to face, but appreciate the offer. I expect we will at some point in the next month or so! @mbaudis I'm time limited as to how much effort I can put in, so I need to focus on core work for now. I want to table the scoping discussion, but I fully acknowledge it's one that needs to be had!