As we widen Data Connect interactions with other GA4GH work streams, it may be worth creating an experimental branch in which to implement DataModelSuppliers that provide the schema or model in forms other than the current JSON Schema. The purpose would be experimental: to match the representation that user needs require against what different schema types provide. How models are represented in GA4GH is currently an open question being considered by TASC. A Data Connect implementation that serves as a workbench for testing different possibilities may be helpful to that effort.
Some formats worth looking at include, but are most definitely not limited to:
Simple extended dbGaP dictionary format
ISO/IEC 11179 (at least a couple of relevant metadata repositories use this standard)
R approach to documenting data structures
LinkML
SchemaBlocks
Protobuf
XML Metadata Interchange (XMI)
RDA Data Type Registries
Some of the representation in specific formats could be handled on the client end. For example, an R client could translate the Data Connect/GA4GH schema format into the format used to define data structures in R. This is likely the best solution architecturally. The base question, though, is what Data Connect needs to provide in order to meet user need.
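To make the client-side option concrete, here is a minimal sketch in Python rather than R. It assumes the data model arrives as a JSON Schema object with a `properties` map, and flattens it into simple data-dictionary rows. The example model, the function name `to_dictionary_rows`, and the row layout are all hypothetical and chosen only for illustration; they are not part of Data Connect or of any dictionary standard.

```python
# Hypothetical data model, shaped like a JSON Schema object description.
# The property names and descriptions are invented for this example.
DATA_MODEL = {
    "description": "Example subject table",
    "properties": {
        "subject_id": {"type": "string", "description": "Subject identifier"},
        "age_at_diagnosis": {"type": "integer", "description": "Age at diagnosis"},
        "tumor_stage": {"type": "string", "enum": ["I", "II", "III", "IV"]},
    },
}


def to_dictionary_rows(data_model):
    """Flatten a JSON Schema object into one dictionary row per property."""
    rows = []
    for name, spec in data_model.get("properties", {}).items():
        rows.append({
            "variable": name,
            "type": spec.get("type", "unknown"),
            "description": spec.get("description", ""),
            # Enumerated values, if any, are joined into a single cell.
            "values": "; ".join(str(v) for v in spec.get("enum", [])),
        })
    return rows


if __name__ == "__main__":
    for row in to_dictionary_rows(DATA_MODEL):
        print(row)
```

A translation like this only carries over what the source schema already contains, which is why the question of what Data Connect itself must provide comes first.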
A high-level summary of the specific user needs referred to is:
Understand the data:
from an unfamiliar domain
from standard, but niche, specialities, e.g. AJCC cancer stage for glioblastoma multiforme
the data structure of a particular, perhaps unique, experimental design
For the data described by the schema/model, be provided with sufficient information to:
Transform the data as needed for the user's purpose
Aggregate the data with data from other sources
It is clear that at least the following are core to the needs:
References to semantic descriptions (standard or not)
Use of scientific units
These would be relevant to data scientists who would be direct users of Data Connect or who would use tools that make use of Data Connect services.
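As a rough illustration of what those two core needs could look like in a JSON Schema data model, the sketch below annotates properties with a semantic reference and a unit, then reports which properties lack them. The keys `x-semantic-ref` and `x-unit`, the `example.org` URLs, and the function `missing_annotations` are assumptions made for this example only; no GA4GH specification is being quoted here.

```python
# Hypothetical annotation keys; not defined by Data Connect or JSON Schema.
SEMANTIC_KEY = "x-semantic-ref"
UNIT_KEY = "x-unit"

# Hypothetical data model with placeholder term URLs.
DATA_MODEL = {
    "properties": {
        "body_weight": {
            "type": "number",
            SEMANTIC_KEY: "https://example.org/terms/body-weight",
            UNIT_KEY: "kg",
        },
        "tumor_stage": {
            "type": "string",
            SEMANTIC_KEY: "https://example.org/terms/tumor-stage",
        },
        "notes": {"type": "string"},
    }
}


def missing_annotations(data_model):
    """Return {property: [missing keys]} for under-described properties."""
    report = {}
    for name, spec in data_model.get("properties", {}).items():
        missing = []
        if SEMANTIC_KEY not in spec:
            missing.append(SEMANTIC_KEY)
        # Only numeric properties are expected to carry a unit here.
        if spec.get("type") in ("number", "integer") and UNIT_KEY not in spec:
            missing.append(UNIT_KEY)
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    print(missing_annotations(DATA_MODEL))
```

A check of this kind could be run against each candidate format on the workbench to see how much of the needed information it can actually carry.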