Metadata keys incongruency #939
Comments
This issue also covers the frontend. If accepted, this issue will generate two PRs, one for FE and one for BE. |
Sounds like a solid proposal. |
What backend change are you proposing here? Currently there is no validation or other type of processing on the scientific metadata dictionary. Are you proposing that we start? I would be very worried about that, since we have almost 100k datasets that don't comply with that spec (merely because when we set up the ingestor we didn't know that the frontend did special handling like this). |
I think we would need:
It would also be a chance to implement nested entries better (by using a new type key for the parent entries). |
@dylanmcreynolds I'm proposing to adopt this individual schema for each scientific metadata entry. The intent is to make it easier to manage, interpret and visualize the information that we store as metadata. Also notice that it is not a schema on the overall metadata, but just on the single metadata entry. @bpedersen2 just beat me to it, and he also read my mind. |
I support the concept. I definitely think it needs to be optional for a long time. Migration scripts will be a little tricky, given that it might involve adding information that does not exist in existing |
A couple of other random thoughts about this. You are using the This leads to propose looking at |
@dylanmcreynolds I have no objections to adopt json-ld.
Then I assume that we need to add the @context key and define the structure somewhere. Am I correct? |
Well, my suggestion was kind of in two parts. The first, yes, that looks better to me as I have seen Let's forget the second part of my suggestion for now. :) |
Do you mean the migration script? |
No, the reference to json-ld. I think there's something there (describing metadata in standard ways) but the json-ld standard is more about describing links and not metadata fields themselves. |
Do you really want to include I guess the bigger issue is that the frontend does expect a fixed metadata format for some sites. We should decide on and document standard formats supported by the frontend. We should also add options for validating scientificMetadata (needs a separate issue). JSON-LD seems like a reasonable way to annotate scientificMetadata with type information and could be used by the frontend to determine whether it can be displayed as a table, a tree, etc. But from what I see this would require much more than spelling some fields with an
If we don't want to actually enforce json-ld for metadata then it might actually be better not to use {
"key_name": {
"unit": "metadata_entry_unit",
"type": "type of this field",
"human_readable_name": "Key Name",
"value": {
"sub_key_name": {
"value": ...
}
}
}
} |
Currently you can have a metadata key with a value and sub-fields. If that's the case, your example above will not be able to capture them. I have seen examples where the fields that are required to interpret the entry are clearly marked with a suffix, while the fields defined by the user are not. I'm looking for suggestions for a solution that allows both humans and machines to recognize whether a field is user defined or is required by the system to understand how to interpret the information stored in it. All of it should be self-contained. Going back to the # prefix, I can see that being really useful (IMHO) in examples like the following:
This scientific metadata encodes the following structure:
which we are able to present without any issue in the frontend as follows:
|
An additional example is the following: With the # prefixed notation, we could create the following metadata entry:
|
Please comment! I would like to be able to do something like that, but I'm not 100% convinced and I'm unsure of the scope. |
Do you really want to store all possible renamings/mappings of the data in scicat's scientific metadata? Or, in your example, do you want to leave the key as "Temperature" and let downstream systems that read nexus interpret the nexus file as it needs to? It seems to me like SciCat is the front end search and display tool, and has stayed clear of analysis. |
@dylanmcreynolds I'm just brainstorming and trying to collect all the examples that I come across. I would like to keep SciCat simple but allow maximum flexibility and solve the issue of metadata representation in the frontend. |
I guess if this is currently supported then we can argue for including it for backwards compatibility. However I think this is a bad design because the type of |
I feel we are touching on a core concept of SciCat, which is the flexibility in the metadata structure. And I am a little afraid of that, as it would require circulating the information to existing adopters, convincing them, and making sure they keep using the data catalogue, which is already a challenge as they often don't see its value. That said, in general, I like the idea of having some sort of high-level structure for the scientific metadata, but at the same time I think it should be customisable at least by every facility, and maybe also, at a lower level, by every instrument or experiment. So, as a first step: why don't we allow defining a scientific metadata structure when deploying the backend, as part of its configuration, and build the frontend functionality dynamically? The FE would need to fetch the defined structure schema first and then know what to do. From there one could expand the concept and add a feature to the FE which enables some members (e.g. the principal investigator of an experiment) to define a schema that all the members of that experiment should comply with. And from there, we could expose all these "user defined" or "developer defined" schemas and make them machine readable (simply by having another set of endpoints that exposes the schemas). To wrap it up, I think the idea of having a schema is ok (which is still up for discussion from what I see), but I would strongly prefer to be able to opt out of its enforcement and leave freedom for customisation. |
I think we really need to be clear that this 'schema' only applies to the leaves of the metadata tree, but does not enforce anything on the overall tree. Currently we already have two different schemas on the leaves:
Adding a richer version here seems helpful in a number of cases. To keep maximum backwards compatibility, we should keep the opaque type as it is, and provide migrations for types where we can infer richer information automatically. |
But who will manage the changes on the leaves in the future? Is it something that we will be able to do automatically, or does it require users' intervention? This latter part is what I fear a little, namely needing to convince all users to change their scripts; on the other side, I struggle to see how we could automate this fully. E.g., how can the BE understand whether the user is storing a temperature or something else if the user stores it under a non-trivial name? |
Search is a good example for how structured scientificMetadata could be important for other BE functionality, not just the FE visualization. By default SciCat should not enforce any particular structure. @minottic I think searching by temperature is already supported for leaves with a unit, with conversions provided via mathjs (eg #926)? Or is this still in development? Finally, I created an issue for the validation feature (#966). Let's move the general conversation there and focus this more narrowly on @nitrosx's issue. |
thanks. That's better I think, and with that distinction in mind, I would suggest that we leave the "aliases" part as something that could be tackled by custom schemas (as well as the JSON-LD part, and my previous comment). If I understand right, we are asking ourselves:
IMHO:
Last, brainstorming I still see some overlap between this issue and #966 as one could expand the concept of |
Dear All
"my_key" : {
"value": my_value,
"unit": "my_unit",
"type" : <type enumeration>,
"human_readable_name": "my_readable_key",
}
"my_key": "my_value" This syntax is backward compatible with older data already present in different instances. "my_key" : {
"value": "my_value",
"unit" : "",
"type": "string",
"human_readable_name": "My Key"
}
"my_key": my_value This syntax is backward compatible with older data already present in different instances. "my_key" : {
"value": my_value,
"unit" : "",
"type": "number",
"human_readable_name": "My Key"
} This solution should address all the concerns for backward compatibility, no-schema metadata (except for metadata field schema and its system fields). |
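The backward-compatible interpretation described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not SciCat code: the function names `humanize` and `normalize_entry` are hypothetical, and the handling of anything other than plain strings and numbers (returned unchanged here) is an assumption, since the comment only covers those two cases.

```python
def humanize(key: str) -> str:
    """Derive a default human readable name from a key (hypothetical helper)."""
    words = key.replace("_", " ").split()
    return " ".join(word[:1].upper() + word[1:] for word in words)


def normalize_entry(key: str, entry):
    """Expand a legacy scalar value into the full entry form (sketch)."""
    if isinstance(entry, bool):
        # Assumption: booleans are not covered by the comment, so leave them alone.
        return entry
    if isinstance(entry, str):
        entry_type = "string"
    elif isinstance(entry, (int, float)):
        entry_type = "number"
    else:
        # Assumption: dictionaries and other values are passed through unchanged.
        return entry
    return {
        "value": entry,
        "unit": "",
        "type": entry_type,
        "human_readable_name": humanize(key),
    }


print(normalize_entry("my_key", "my_value"))
print(normalize_entry("my_key", 42))
```

With this reading, older datasets would not need to be rewritten in storage; the expansion could happen when an entry is read.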
The type enumeration is currently defined as:
In issue #984, I'm proposing an expanded list to cover additional cases that we have seen here at ESS and in collaborators' metadata |
I would not be opposed to allowing the following metadata entry schema alternatives: "my_key" : {
"v[alue]" : "my_value",
"u[nit]" : "my_unit",
"t[ype]" : "my_type",
"hrn[ame]|human_readable_name" : "My Key"
} |
Users are allowed to add any additional fields to the metadata entry schema as they see fit for their purposes, as long as they do not collide with the system fields |
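One way to read this rule is that every key in an entry that is not a system field is a user field. The sketch below separates the two groups. It is an illustration only: `split_entry` and `SYSTEM_FIELDS` are hypothetical names, and the set of system fields is taken from the entry schema discussed in this thread (`value`, `unit`, `type`, `human_readable_name`).

```python
# Assumption: these four names are the system fields of a metadata entry.
SYSTEM_FIELDS = {"value", "unit", "type", "human_readable_name"}


def split_entry(entry: dict) -> tuple[dict, dict]:
    """Separate system fields from user-defined fields in one entry (sketch)."""
    system = {k: v for k, v in entry.items() if k in SYSTEM_FIELDS}
    user = {k: v for k, v in entry.items() if k not in SYSTEM_FIELDS}
    return system, user


entry = {
    "value": 293.15,
    "unit": "K",
    "type": "number",
    "human_readable_name": "Sample Temperature",
    "sensor_id": "T-01",  # user-defined field, made up for this example
}
system, user = split_entry(entry)
print(system)
print(user)
```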
The proposed metadata entry schema will allow us to address the proposed metadata types highlighted in #984 |
minor comment: I would use |
I also added the object type in the list of allowed types proposed in #924 |
Metadata keys Incongruency
Summary
Currently the backend accepts any string as the key of a metadata entry. This is not an issue per se, but it becomes one when users access the metadata both through the frontend and the backend.
The frontend performs some changes to the metadata keys when it renders them. Here is an example:
Data Type -> FE: Data Type
data type -> FE: Data Type
data_type -> FE: Data Type
This behavior is by design to render the metadata keys more human readable, but it can be confusing to data users.
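As a rough illustration of the rendering shown above, the following Python sketch reproduces the three examples. The function name `render_key` is hypothetical and this is not the actual frontend code; it only assumes the rule stated in this issue, namely that underscores are replaced with spaces and every word is capitalized.

```python
def render_key(key: str) -> str:
    """Return a display label for a metadata key (illustrative only)."""
    # Treat underscores as word separators, then capitalize each word.
    words = key.replace("_", " ").split()
    return " ".join(word[:1].upper() + word[1:] for word in words)


for raw in ("Data Type", "data type", "data_type"):
    print(f"{raw!r} -> {render_key(raw)!r}")
```

All three keys map to the same label, so a user who sees Data Type in the frontend cannot tell which key is actually stored in the backend.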
Proposed Solution
We propose that the backend store the metadata entries as follows:
key_name is expected to always be lower case with no spaces; only underscores are allowed. If #human_readable_name is not specified, the frontend will default back to the current behavior, which is to use the key, substitute the underscores with spaces, and capitalize every word.
We also suggest adding the type (such as string, number, quantity, datetime, etc.) to reduce ambiguity of interpretation in the frontend and across different languages when the data is used. If type is not specified, the system will default back to the current behavior.
Also, the system fields value, unit, type, and human_readable_name are suffixed with #, so it is clear that they are system fields and we reduce the probability of collisions with user fields when nested metadata are used.