proposition for alternative property for InProgressDataDate column in MARCO-BOLO_Metadata_Dataset_Record_WP5 #28

Open
laurianvm opened this issue Jul 18, 2024 · 12 comments
@laurianvm
Contributor

MARCO-BOLO_Metadata_Dataset_Record_WP5

Duplicate keys are not allowed in json(-ld)
--> meaning that the schema.org equivalent property of 'additionalProperty' for the columns "InProgressDataDate" and "EVDescription" needs to change for one of these columns;
--> propose to change 'additionalProperty' into 'datePublished' for "InProgressDataDate"

Is this okay?
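
To illustrate the duplicate-key problem, here is a minimal sketch assuming Python's standard `json` module. The record and its values are hypothetical; only the property name comes from the spreadsheet mapping. By default the parser silently keeps the last value, so a check has to be added explicitly:

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Hypothetical record: two columns both mapped to additionalProperty.
record = """
{
  "@type": "Dataset",
  "additionalProperty": "2024-11",
  "additionalProperty": "some EV description"
}
"""

# Default parsing does not complain; the first value is silently lost.
silent = json.loads(record)

# With the hook, the duplicate is reported instead of being dropped.
try:
    json.loads(record, object_pairs_hook=reject_duplicate_keys)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

Mapping the two columns to different properties avoids the collision altogether, which is what the proposal above amounts to.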

@kmexter
Contributor

kmexter commented Jul 18, 2024

fine by me - would need to change for all the WP spreadsheets, mind

@kmexter
Contributor

kmexter commented Jul 18, 2024

except ... these googlesheets do not need to be transformed, laurian. only the URIs they refer to. At some point we may (again) ask for metadata to be added here that will need to be transformed, but for now, not.

@laurianvm
Contributor Author

only the value in cell D8 of the 'Column descriptions' tab would have to be changed (no other values need changing) - and this would only be to align it with what is actually used in the triples (it is not required for things to work/not break)

@kmexter
Copy link
Contributor

kmexter commented Aug 22, 2024

Are we OK here? In that, the "in progress date" is not something you need to read into your templates. Can I close this?

@kmexter kmexter assigned laurianvm and unassigned kmexter Aug 22, 2024
@laurianvm
Contributor Author

laurianvm commented Jan 12, 2025

with #36, the duplicate keys issue disappeared on its own

however, https://schema.org/additionalProperty is not expected to be used on Dataset,
hence proposition to use:
https://github.com/marco-bolo/dataset-catalogue/blob/9e95ff3052e4a4b34d3116dbcfacca6ee5f2960f/scripts/tests/templates/gsheet.jsonld.ldt.j2#L404-L406

@kmexter , @pieterprovoost or @pbuttigieg could you indicate whether schema.org-model needs to be rigidly followed? and which predicate is preferred in this case?

@kmexter
Contributor

kmexter commented Jan 13, 2025

yeh, this is for @pieterprovoost or @pbuttigieg - not my expertise. I suggest that you two P's go thru the list of properties - col D of https://docs.google.com/spreadsheets/d/1jH8Gp50y9w_SsoFELYTKV6ohPlkO3nqm/edit?gid=115560455#gid=115560455 for example - and double check them

@pbuttigieg
Contributor

pbuttigieg commented Jan 13, 2025

> however, https://schema.org/additionalProperty is not expected to be used on Dataset,

It does validate and ODIS does use it on other types (we see no reason for limiting its range).

That being said, there may be a better way.

> hence proposition to use:
> https://github.com/marco-bolo/dataset-catalogue/blob/9e95ff3052e4a4b34d3116dbcfacca6ee5f2960f/scripts/tests/templates/gsheet.jsonld.ldt.j2#L404-L406
>
> @kmexter , @pieterprovoost or @pbuttigieg could you indicate whether schema.org-model needs to be rigidly followed? and which predicate is preferred in this case?

"For data that are still in progress; please fill in an approximate date (YYYY-MM) you expect them to be published"

This says to me that schema:creativeWorkStatus should be set to something like "Incomplete".

If the value of creativeWorkStatus uses a DefinedTerm type, one can add the expected date of release in a description (or other) property of DefinedTerm.

One could just use text and say something like "Incomplete. Expected release on ..." but this isn't great.
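
A minimal sketch of the DefinedTerm option, assuming the expected date comes from the InProgressDataDate column as a YYYY-MM string. The helper name `incomplete_status`, the dataset name, and the wording of the description are hypothetical, not something agreed in this thread:

```python
import json


def incomplete_status(expected_release):
    """Build a creativeWorkStatus value for a dataset still in progress.

    expected_release is the YYYY-MM string from the InProgressDataDate column.
    """
    return {
        "@type": "DefinedTerm",
        "name": "Incomplete",
        # The expected release date is carried in the term's description.
        "description": f"Expected release: {expected_release}",
    }


dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",  # hypothetical
    "creativeWorkStatus": incomplete_status("2025-06"),
}

print(json.dumps(dataset, indent=2))
```

This keeps the status machine-readable through the term's name while the date stays in a human-readable field.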

@pbuttigieg
Contributor

pbuttigieg commented Jan 14, 2025

@laurianvm something like the following is good form for all creative works (including data sets)

"creativeWorkStatus": [
  {
    "@type": "DefinedTerm",
    "name": "Incomplete",
    "identifier": "http://purl.obolibrary.org/obo/NCIT_C49160",
    "inDefinedTermSet": "http://purl.obolibrary.org/obo/ncit.owl",
    "termCode": "NCIT:C49160"
  },
  "Incomplete. The expected completion and release date of this asset is 2024-11-25"
]

The maintainer property must be filled in these cases, for follow up.

One could put a future date in the usual date properties, but I don't think this is accurate or wise.
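
The maintainer requirement above could be checked automatically. The following is only a sketch: the function names are hypothetical, and matching on the literal "Incomplete" is an assumption based on the example in this comment, not an agreed rule:

```python
def is_incomplete(status):
    """Return True if a creativeWorkStatus value marks the work as incomplete."""
    # The value may be a single item or a list mixing DefinedTerms and text.
    items = status if isinstance(status, list) else [status]
    for item in items:
        if isinstance(item, dict) and item.get("name") == "Incomplete":
            return True
        if isinstance(item, str) and item.startswith("Incomplete"):
            return True
    return False


def missing_maintainer(record):
    """Return True if an incomplete record has no maintainer filled in."""
    return is_incomplete(record.get("creativeWorkStatus")) and not record.get("maintainer")


# Hypothetical records for illustration.
with_maintainer = {
    "@type": "Dataset",
    "creativeWorkStatus": "Incomplete. Expected release on 2024-11-25",
    "maintainer": {"@type": "Person", "name": "Jane Doe"},
}
without_maintainer = {
    "@type": "Dataset",
    "creativeWorkStatus": [{"@type": "DefinedTerm", "name": "Incomplete"}],
}
no_status = {"@type": "Dataset"}
```

A check like this could run over the generated JSON-LD files to list the records that still need follow up.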

@kmexter
Contributor

kmexter commented Jan 15, 2025

these would then only be for the jsonld files filled with info in the googlesheets, clearly not for something from a DOI. The maintainer would be the agent mentioned in the DatasetDescriber column
Tho...no-one has filled this column in, so you won't actually get any values in these stanzas

@pbuttigieg
Contributor

pbuttigieg commented Jan 15, 2025

> these would then only be for the jsonld files filled with info in the googlesheets, clearly not for something from a DOI. The maintainer would be the agent mentioned in the DatasetDescriber column

Any Incomplete data can be tagged this way. Any incomplete data generated by MBO participants must have this filled in of course.

For third-party data used by MBO participants, it's less likely it will be incomplete data. If it is, that should of course be communicated in the project asset catalogue.

For WPs that have downloaded hundreds or thousands of files and have not allocated appropriate resources for data management, a JSON-LD record describing the aggregate should be created.

It is insufficient to say "I didn't create this data, so I'm not responsible for its metadata" if one is using the data to create downstream products. There must have been some QA/QC done (otherwise the science is suspect) and the processes and results of such activity should be recorded.

> Tho...no-one has filled this column in, so you won't actually get any values in these stanzas

Then we must motivate them to verify that this is the true state of things and, if not, to rectify it.

@kmexter
Contributor

kmexter commented Jan 16, 2025

all very true
one comment makes me think: we have not created a template for how to describe an aggregate dataset - which indeed, we will be creating in MBO. Does this merit a new issue?

@pbuttigieg
Contributor

> all very true
> one comment makes me think: we have not created a template for how to describe an aggregate dataset - which indeed, we will be creating in MBO. Does this merit a new issue?

Yes, I'll create one. A Dataset is of arbitrary size and mereology, so that type can be used unless a DataCatalog is in play.
