Consider/implement alternate schema/model representations #8

Open
ianfore opened this issue Mar 2, 2022 · 0 comments

As we widen Data Connect interactions with other GA4GH work streams, it may be worth creating an experimental branch in which to implement DataModelSuppliers that provide the schema or model in forms other than the current JSON Schema. The purpose would be experimental: to compare the representation that user needs require against what different schema types provide. How models are represented in GA4GH is currently an open question being considered by TASC. The Data Connect implementation could serve as a workbench for testing different possibilities, which may be helpful to that effort.

Some formats worth looking at include, but are most definitely not limited to:

  • Simple extended dbGaP dictionary format
  • ISO/IEC 11179: at least a couple of relevant metadata repositories use this standard.
  • R approach to documenting data structures
  • LinkML
  • SchemaBlocks
  • Protobuf
  • XML Metadata Interchange (XMI)
  • RDA Data Type Registries

Some of the representation in specific formats could be handled on the client end. For example, an R client could translate the Data Connect/GA4GH schema format into the format used to define data structures in R. This is likely the best solution architecturally. The base question, though, is what Data Connect needs to provide in order to meet user needs.
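To make the client-side option concrete, here is a minimal Python sketch of a client flattening a JSON Schema style data model into simple data dictionary rows. It assumes the model exposes a `properties` object whose entries carry `type` and `description`; the function name `to_data_dictionary`, the row keys, and the example model are hypothetical and not part of Data Connect.

```python
from typing import Any


def to_data_dictionary(data_model: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten a JSON Schema style model into one row per property.

    Assumes (hypothetically) that each property may carry "type" and
    "description"; missing values become empty strings.
    """
    rows = []
    for name, prop in data_model.get("properties", {}).items():
        rows.append(
            {
                "variable": name,
                "type": str(prop.get("type", "")),
                "description": str(prop.get("description", "")),
            }
        )
    return rows


# Illustrative model only; not taken from a real Data Connect service.
example_model = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "properties": {
        "subject_id": {"type": "string", "description": "Study subject identifier"},
        "age_at_diagnosis": {"type": "integer", "description": "Age at diagnosis"},
    },
}

dictionary_rows = to_data_dictionary(example_model)
for row in dictionary_rows:
    print(row["variable"], row["type"], row["description"], sep="\t")
```

A translation like this only works if the server-side model already carries the information the target format needs, which is the base question raised above.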

A high-level summary of the specific user needs referred to above:

  • Understand the data:
    • from an unfamiliar domain
    • from standard, but niche, specialities, e.g. AJCC cancer stage for glioblastoma multiforme
    • the data structure of a particular, perhaps unique, experimental design
  • For the data described by the schema/model, be provided with sufficient information to:
    • Transform the data as needed for the user's purpose
    • Aggregate the data with data from other sources

It is clear that at least the following are core to the needs:

  • References to semantic descriptions (standard or not)
  • Use of scientific units

These would be relevant to data scientists who would be direct users of Data Connect or who would use tools that make use of Data Connect services.
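As one way to experiment with these two core needs, the sketch below checks which properties of a model lack a semantic reference or, for numeric fields, a unit. The annotation keys `x-semantic-ref` and `x-unit` are assumptions made purely for illustration; neither Data Connect nor JSON Schema defines them, and the `example.org` URLs are placeholders.

```python
from typing import Any

# Hypothetical annotation keys; not defined by Data Connect or JSON Schema.
SEMANTIC_KEY = "x-semantic-ref"
UNIT_KEY = "x-unit"


def missing_annotations(data_model: dict[str, Any]) -> dict[str, list[str]]:
    """Report, per property, which of the assumed annotations are absent.

    Every property is expected to have a semantic reference; only numeric
    properties are expected to declare a unit.
    """
    gaps: dict[str, list[str]] = {}
    for name, prop in data_model.get("properties", {}).items():
        missing = []
        if SEMANTIC_KEY not in prop:
            missing.append(SEMANTIC_KEY)
        if prop.get("type") in ("number", "integer") and UNIT_KEY not in prop:
            missing.append(UNIT_KEY)
        if missing:
            gaps[name] = missing
    return gaps


# Illustrative model only; the term URLs are placeholders.
annotated_model = {
    "properties": {
        "body_weight": {
            "type": "number",
            SEMANTIC_KEY: "https://example.org/terms/body-weight",
            UNIT_KEY: "kg",
        },
        "tumor_stage": {"type": "string"},
        "age": {"type": "integer", SEMANTIC_KEY: "https://example.org/terms/age"},
    }
}

gaps = missing_annotations(annotated_model)
print(gaps)
```

Running a check like this against each candidate representation would show how readily it can carry semantic references and units.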
