Null value terminology #828
Replies: 23 comments 3 replies
-
Do we want the same value set for each field? Do we always allow missing values? E.g. if a field is required, do we allow missing value enums? |
-
Hello @only1chunts - the null standards were devised with the GSC in the CIG working group, and we agreed to adopt and promote their use with the INSDC. If the HL7 values are in wide use or useful to our community, I would suggest we look at how they map to the INSDC terms and add them as synonyms where possible. Cheers, |
-
@lschriml, I'm good with just using the set of 4 simple terms that we already encourage; I'm just raising the possibility that there are other options out there that are more comprehensive in their coverage. |
-
What are the consequences of making a term mandatory, specifying a non-string Value syntax like {float}, and then allowing these stringy INSDC null terms? |
-
INSDC are now moving to a more extensive set of missing value terms, to give more detail on the reason a value is missing: https://www.insdc.org/submitting-standards/missing-value-reporting/ |
-
This is a bit off. You can't validly add these values to fields that have types like number or boolean. It's more normal to leave missing values missing or add NA, then include a separate property where explanations of the missingness can be included. Otherwise we're creating the need for users to write tools to understand a bespoke vocabulary (under unclear governance), and that's not very good practice for a standards organisation. |
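To make the typing concern concrete, here is a minimal Python sketch. It assumes a slot typed as a float; the sample values and the helper names `parse_strict` and `parse_with_reason` are hypothetical and not part of MIxS or INSDC tooling.

```python
from typing import Optional, Tuple

# Hypothetical raw values for a slot whose syntax is {float}.
mixed = ["12.5", "not collected", "3.0"]


def parse_strict(raw: str) -> float:
    """Parse the way a strictly typed numeric field would."""
    return float(raw)  # raises ValueError for text such as "not collected"


def parse_with_reason(raw: str) -> Tuple[Optional[float], Optional[str]]:
    """Keep the value numeric; move any non-numeric text into a separate reason."""
    try:
        return float(raw), None
    except ValueError:
        return None, raw


parsed = [parse_with_reason(v) for v in mixed]
```

With the second approach the value column stays numeric (or empty), and the explanation travels in its own property.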
-
The latest list of missing values is below: not collected|not provided|restricted access|missing: control sample|missing: sample group|missing: synthetic construct|missing: lab stock|missing: third party data|missing: data agreement established pre-2023|missing: endangered species|missing: human-identifiable |
-
I have updated the gensc.org page to include the INSDC missing value terms. |
-
Agree Chris, that "missing: data agreement established pre-2023" is very INSDC specific. |
-
Terms to add need to be reviewed, as some don't exactly match INSDC: https://www.insdc.org/submitting-standards/missing-value-reporting/
Adjustments: remove capitalization and change ":" to something else, as ":" is not friendly to all software. |
-
The INSDC web page URL is the correct one: https://www.insdc.org/submitting-standards/missing-value-reporting/ These are the exact expected values if missing values are used in INSDC where mandatory values cannot be provided: In ENA this is the relevant regex we are using where a mandatory field allows missing values. There is no capitalisation for any of these missing values. |
-
Yes, it is a bit inconsistent with "missing:" being prefixed sometimes. There is this sentence of guidance on the web page: "When reporting a missing mandatory field, the eight granular ‘reporting level’ terms need to be preceded with the term ‘missing: ’ to declare both the absence of a true value as well as the reason." |
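A small Python sketch of the prefix rule, using only the terms from the list quoted earlier in this thread. This is not the ENA regex, which is not reproduced here; the names `TOP_LEVEL`, `GRANULAR`, `ALLOWED` and `is_missing_value_term` are hypothetical.

```python
# Terms copied from the list quoted earlier in this thread.
TOP_LEVEL = ("not collected", "not provided", "restricted access")
GRANULAR = (
    "control sample",
    "sample group",
    "synthetic construct",
    "lab stock",
    "third party data",
    "data agreement established pre-2023",
    "endangered species",
    "human-identifiable",
)
PREFIX = "missing: "

# The eight granular terms are only accepted with the "missing: " prefix.
ALLOWED = set(TOP_LEVEL) | {PREFIX + term for term in GRANULAR}


def is_missing_value_term(value: str) -> bool:
    """Exact, case-sensitive match against the listed terms."""
    return value in ALLOWED
```

Under this sketch "missing: lab stock" is accepted, while a bare "lab stock" or a capitalised "Not collected" is rejected, which mirrors the inconsistency being discussed.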
-
goals:
eg
|
-
@pbuttigieg, @mslarae13 and I are curious about having clear GSC guidance on missing data annotations. If INSDC can't provide crystal clear guidance, we can still provide mappings to their namespace |
-
Also, including textual missing data indicators in numeric fields leads to poor computability. We should have a pattern for indicating missing data. |
-
I can report the relative abundance of the verbatim INSDC missing value indicators, but it would be harder to anticipate all of the ways submitters have modified the INSDC codes |
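As a rough illustration of what such a report could look like, here is a minimal Python sketch that counts only exact (verbatim) matches. The function name `verbatim_abundance` is hypothetical, and the caller supplies both the observed values and the set of verbatim terms.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def verbatim_abundance(values: Iterable[str], verbatim: Set[str]) -> Dict[str, float]:
    """Fraction of all observed values that exactly match each verbatim term."""
    observed = list(values)
    if not observed:
        return {}
    counts = Counter(v for v in observed if v in verbatim)
    return {term: n / len(observed) for term, n in counts.items()}
```

Modified spellings such as "Not Collected" or "n/a" are not counted by this sketch, which is exactly the part that is harder to anticipate.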
-
From what Peter provided, would the MissingValueEnum really be
Making "missing" valid means numeric fields are not text, and that's inconsistent. |
-
I propose we start a new issue on how GSC standards are going to handle the reporting of MAR, MNAR, MCAR, and other forms of missing data generically. We absolutely should not allow strings like "missing" into numeric fields. We could map to or allow INSDC values (among others) in a broader specification on how to add explanations to empty values, without messing up data types |
-
Here is the rst file (I had to give it a .txt extension). I also rendered it into PDF via HTML (from the rst) so that one can see it more clearly. I had to learn the wonders of pandoc, so a useful side effect. Reporting Missing Values.pdf
Colman (the ENA product owner) agreed that it was inconsistent that not all terms need the "missing: " prefix, and will propose that as a change at the next INSDC meeting (May 2025!). To answer Montana's question: following the logic of the documentation, all the highest level terms which include "missing" ought to be accepted as a standalone term. Currently, the regex at ENA does not support this. I am now checking this with the NCBI and DDBJ; if yes, I will tweak the regexes, at least at ENA, to accept such. If not, I will push strongly for an early inclusion. |
-
Converting to discussion until we resolve GSC methods |
-
Assuming @Woolly-at-EBI 's table is approved, the missing values that are allowed by INSDC are
@pbuttigieg has made the point that when a slot requires a value like
Decisions to make BEFORE making issues
|
-
A while ago, we worked with the INSDC to define the values. If we want to change the options, we should reconvene that working group.
Cheers,
Lynn
On Jul 17, 2024, at 8:25 PM, Pier Luigi Buttigieg wrote:
The cleanest solution is to add two sub-properties to each MIxS property (term). This is easy in JSON or YAML type data, less so in tabular formats.
The first property should express the more globally accepted and useful type of missingness (MAR, MCAR, MNAR, Structured missingness)
The second property could allow levels like those below.
not applicable
not collected
not provided
restricted access
Unfortunately, the other levels (below) are poorly constructed and should not be base values. They can be added to a description sub-property, as some could be used to explain the levels above (e.g. endangered species explains "not provided" or perhaps "restricted"). That suggests a third, more verbose property.
missing: control sample
missing: sample group
missing: synthetic construct
missing: lab stock
missing: third party data
missing: endangered species
missing: human-identifiable
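The three sub-properties described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the class and attribute names (`AnnotatedValue`, `missingness_type`, `missingness_level`, `missingness_description`) are hypothetical, the float value type is just an example, and the pairing of values in `example` is arbitrary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MissingnessType(Enum):
    """First sub-property: the general type of missingness."""
    MAR = "MAR"
    MCAR = "MCAR"
    MNAR = "MNAR"
    STRUCTURED = "structured missingness"


class MissingnessLevel(Enum):
    """Second sub-property: the four base levels listed above."""
    NOT_APPLICABLE = "not applicable"
    NOT_COLLECTED = "not collected"
    NOT_PROVIDED = "not provided"
    RESTRICTED_ACCESS = "restricted access"


@dataclass
class AnnotatedValue:
    value: Optional[float] = None  # stays numeric or empty, never a string
    missingness_type: Optional[MissingnessType] = None
    missingness_level: Optional[MissingnessLevel] = None
    missingness_description: Optional[str] = None  # third, more verbose sub-property

    def __post_init__(self) -> None:
        annotated = (
            self.missingness_type is not None
            or self.missingness_level is not None
            or self.missingness_description is not None
        )
        if self.value is not None and annotated:
            raise ValueError("a present value should not carry missingness annotations")


# Arbitrary illustration: an empty value explained by level plus free-text description.
example = AnnotatedValue(
    missingness_level=MissingnessLevel.NOT_PROVIDED,
    missingness_description="endangered species",
)
```

In JSON or YAML the same shape maps onto nested keys under each term; in tabular formats it would need extra columns per term.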
Confirm missing value information should only be permissible in REQUIRED fields. Optional fields do NOT need missing value qualifiers
Missing values apply everywhere - it doesn't make sense to limit them to required fields. That's poor data management.
|
-
@lschriml I don't think that is needed; it's more that we parse the outputs of that group and place them in more logically consistent slots. |
-
Previously we have been encouraging the use of the INSDC set of null terms when required:
These have worked OK when used, so there may not be any need to change things, but I wanted to raise this option for discussion. Could/should we be using the HL7 set of "Null Flavors" instead? https://terminology.hl7.org/3.0.0/ValueSet-v3-NullFlavor.html
These are more comprehensive than the INSDC set.