refactor: use clearer class names following SPHN conventions #3
Thank you for letting me review the schema @cmdoret! Major
Minor
|
Thanks @supermaxiste, these are excellent points! :) |
Supporting experiments |
Tracking data changes |
Biosamples |
Data location I agree that the naming is problematic. Other options I considered were EDIT: temporarily went with |
ReferenceSequence version |
Using NCBI taxonomy Based on your advice, we also added source_material, sex and cell_type properties for sample. For now they take strings as values, but we will want to use controlled sets as well. |
Epigenomics |
In practice, yes it will likely be used for {prote,metabol}omics data, but not only. |
Study completion_date In the process, we removed properties associated with studies, incl. start/completion dates, and went with "Creation date" instead. EDIT: added |
To wrap things up from my previous comments: Major
Minor
|
Thanks @supermaxiste ! For now I've already replaced |
Thank you for all of your replies @cmdoret!
I'll provide a couple more in-depth comments here; you can decide whether to turn them into issues or address them in this PR. To some extent they address clarity, but I'll let you be the judge.
- suggestion (ontology): For some of the genomic terms such as "Reference Genome" and "Reference Sequence" we could also point to the GENO Ontology since they provide some nice definitions too. By "point to" I mean adding a `see_also` property pointing to GENO entries:
  - Reference sequence: http://purl.obolibrary.org/obo/GENO_0000017
  - Reference genome: http://purl.obolibrary.org/obo/GENO_0000914
note: I checked for alignment set and variant set but couldn't find anything there
- question (cardinalities): in the `project/` folder I saw a bunch of files with different formats and I noticed that some include SHACL shapes. This led me to check cardinalities and I don't think it's very clear what needs to be there and what not. In the overall diagram you shared it looks like the existence of one class doesn't require any other class to exist, and I'm wondering if that should be changed. If you add a MODO, shouldn't it include at least 1 Assay with at least 1 DataEntity? Or are we thinking in a "placeholder" way where people can create empty MODO objects and fill them up later?
  On top of this, it's not clear right now 2.1) which properties are mandatory and 2.2) where the information is coming from, because the schema includes some `required` entries, but the SHACL shapes seem to go further than that.
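To make the two options in the cardinality question concrete, here is a minimal Python sketch of what an "at least 1 Assay with at least 1 DataEntity" rule could look like, next to the "placeholder" alternative. The field names `has_assay` and `has_data` and the `allow_placeholder` flag are assumptions for illustration only; they are not taken from the schema or from the generated SHACL shapes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataEntity:
    id: str


@dataclass
class Assay:
    id: str
    # Hypothetical property name, not taken from the schema.
    has_data: List[DataEntity] = field(default_factory=list)


@dataclass
class MODO:
    id: str
    # Hypothetical property name, not taken from the schema.
    has_assay: List[Assay] = field(default_factory=list)


def cardinality_errors(modo: MODO, allow_placeholder: bool = False) -> List[str]:
    """Return human-readable violations of the 'at least 1' rules.

    With allow_placeholder=True, empty objects are accepted so they can
    be filled in later (the "placeholder" reading of the schema).
    """
    if allow_placeholder:
        return []
    errors = []
    if not modo.has_assay:
        errors.append(f"{modo.id}: needs at least 1 Assay")
    for assay in modo.has_assay:
        if not assay.has_data:
            errors.append(f"{assay.id}: needs at least 1 DataEntity")
    return errors
```

Under the strict reading an empty MODO is rejected, while under the placeholder reading the same object passes. Whichever option is chosen, it would help if it were stated in one place, so that the `required` entries and the SHACL shapes agree.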
To end on a sweet note: |
Thanks @supermaxiste !
|
Thank you for all the work @cmdoret and your clarification about the All looking good to me now 🌞 |
Expectations: I would like you to raise any issue with the clarity or structure of the schema. This will drive the development of a companion library (sdsc-ordes/smoc-api).
Notes:
`src/smoc_schema/schema/smoc_schema.yaml`
All other files are auto-generated from this one.

This PR makes some class names clearer by taking inspiration from SPHN conventions.
Note that we cannot be fully compliant with SPHN 2024 for the `DataFile`, as we aim to support Zarr arrays. This means that an assay's outputs may be arrays nested inside a Zarr file instead of individual files.

Here is the schema without showing subclasses:
See here for a diagram of the full schema with subclasses and cardinalities.