How unique do id-like identifiers need to be? #75
Comments
Yes, the container (the data block) creates a scope. Identifiers have to be sufficiently unique that a unique row in the category is identified by the values of the key data names of the category, bearing in mind that child data names of single-valued items e.g. Where a data set is created that is not going to be expanded in the future or have data blocks mixed and matched to create new data sets, it is likely that very simple IDs are sufficient. That, I think, describes the vast bulk of data sets produced. Where data blocks containing calibration information are included, they would need to have more or less unique IDs to make sure that they don't clash with IDs chosen by the data set they are being joined to. Again, such clashes can be detected and corrected at data block amalgamation time if desired.
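To make the "unique row identified by the key data names" requirement concrete, here is a minimal sketch of a check one might run within a single scope. The row layout (a list of dicts) and the function name `duplicate_keys` are assumptions for illustration only, not part of any CIF library.

```python
from collections import Counter


def duplicate_keys(rows, key_names):
    """Return the key tuples that identify more than one row of a category."""
    counts = Counter(tuple(row[name] for name in key_names) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows of a category keyed on a single id-like data name.
phases = [
    {"id": "1", "name": "corundum"},
    {"id": "2", "name": "fluorite"},
    {"id": "2", "name": "zincite"},  # same key value as the previous row
]

print(duplicate_keys(phases, ["id"]))  # [('2',)]
```

An empty result means the chosen key data names are sufficient within that scope; passing more names in `key_names` models a composite key.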
I mean "container" as in all of the data blocks in the CIF file(s) that make up a single "experiment" (there are two blocks in the one container above). I think you use the term "data collection" in §1.4.2.1.1 of the draft you sent me.
As an example, I think what you said means that if I have a Does "child data names" mean (i) a linked data name, or (ii) the data name of a category that is a subcategory of another?
So having data values that are keys have the same value in different data blocks in the one container/collection is OK?
Calibration datasets are an outlier; I believe they should be specified quite uniquely, as they will be referenced in a lot of places.
OK, well, all containers create a scope. In the case of e.g. a directory of multi-block CIF data files, there are no datanames that are in the container scope, because they are all inside data blocks, so the scope of any data name is the data block that they are in. In a hierarchical data file like HDF5, each level in the hierarchy creates a scope. When describing how HDF5 contents are described by a CIF dictionary (which they very much can be) scope is one of the things that should be pinned down - and in any case has to be done by anyone wanting to interpret an HDF5 file with or without the aid of a CIF dictionary. Getting sidetracked.
Yes, generally if the data name in
It must be an explicitly linked data name.
If it is a collection that is meant to form a single data set, keys may share a value across data blocks only if rows with the same key data values do not carry contradictory values for the other data names in that row.
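The consistency condition for a collection can be sketched as a merge that fails on contradiction. This is only an illustration under assumed names: `merge_category`, the list-of-dicts row layout, and the example data names are hypothetical and do not come from the draft or from any existing tool.

```python
def merge_category(blocks, key_names):
    """Merge the rows of one category taken from several data blocks.

    Rows sharing the same key values are combined into one row; a
    ValueError is raised if they disagree on any other data name.
    """
    merged = {}
    for rows in blocks:
        for row in rows:
            key = tuple(row[name] for name in key_names)
            existing = merged.setdefault(key, {})
            for name, value in row.items():
                if name in existing and existing[name] != value:
                    raise ValueError(
                        f"key {key}: {name!r} is {existing[name]!r} in one "
                        f"block and {value!r} in another"
                    )
                existing[name] = value
    return list(merged.values())


# Hypothetical rows from three data blocks, all using the key value "A".
block_a = [{"id": "A", "name": "corundum"}]
block_b = [{"id": "A", "name": "corundum", "mass_percent": "42.0"}]
block_c = [{"id": "A", "name": "fluorite"}]

# Compatible: the rows agree wherever they overlap, so they merge into one.
print(merge_category([block_a, block_b], ["id"]))

# Contradictory: the same key value carries two different names.
try:
    merge_category([block_a, block_c], ["id"])
except ValueError as err:
    print("clash:", err)
```

The same check could be run at amalgamation time to decide whether an ID needs to be renamed before blocks are joined.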
Yes.
Ug. My brain is starting to hurt and I'm getting confused between all the conversations/threads/emails. But I do think you're starting to get through.
Long and short, IDs need to be unique in the whole world. See https://github.com/COMCIFS/comcifs.github.io/blob/master/draft/block_collections.md for a proposed method to provide a namespace for identifiers to live in. See also #56 (comment).
How unique do id-like identifiers need to be?
Yes, `_pd_diffractogram.id`, `_pd_phase.id`, and `_pd_block.id` need to/should be unique in the whole world, but what about things like `_pd_pref_orient_March_Dollase.id` or `_pd_data.point_id`?
Or do we need to ensure that there are sufficient category keys such that their combination is unique in the container?
For example, in this single container:
Are `_pd_phase_list.id`, `_pd_pref_orient_March_Dollase.id`, and `_pd_data.point_id` sufficiently unique?
Are the values of `_pd_phase_mass.phase_list_id` sufficient to properly identify the correct phase to which the mass percent applies?
Do all values of every data item have the scope of the entire container?