json ld templates to accommodate new changes to the googlesheets #36

Open
kmexter opened this issue Jan 9, 2025 · 9 comments

@kmexter
Contributor

kmexter commented Jan 9, 2025

Three new columns have been added and, I think, two existing ones have new titles:
new EOVDescription
new EOVDescription-BioGeochemistry
new EOVDescription-Physics
rename? EBV
rename? Indicator

See https://docs.google.com/spreadsheets/d/1jH8Gp50y9w_SsoFELYTKV6ohPlkO3nqm/edit?gid=1593813399#gid=1593813399 for an example - cols H-L

@laurianvm: you need to update the templates and the JSON-LD files that come out (we need to add this info to them directly from the googlesheets, as it will almost certainly not be in the datasets). Clearly these need to go in the fromgs files (see #9) and also in the fromuri ones, BUT @pieterprovoost, @pbuttigieg or @ptagliolato, can you advise on how this should be done? Are they variableMeasured?
For now, just add these as strings. I am asking Joana Beja for the associated URIs; I think those from col H should have URIs, but I am not sure about the others.

FYI, for the WP3 googlesheet we are still sorting some things out, but I added these columns there too and have retitled the ones still being sorted out (columns M-P). Since, as you say, you harvest on column name, your template should simply ignore those anyway.
Note that we have very little actual information in these columns; only a few rows have selections there, so in most cases your template will not have any information to work with here.
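
To illustrate the harvest-by-column-name behaviour described above, here is a minimal sketch, assuming each sheet row has already been read into a dictionary keyed by column title. The names `WANTED_COLUMNS` and `harvest_row` are hypothetical and are not taken from the actual templating script; the column titles are the ones listed at the top of this issue and may need adjusting to match the sheet.

```python
# Hypothetical sketch: keep only the columns the template knows about,
# and drop cells that are empty (most rows have no selection here).
WANTED_COLUMNS = {
    "EOVDescription",
    "EOVDescription-BioGeochemistry",
    "EOVDescription-Physics",
}


def harvest_row(row):
    """Return {column: value} for known, non-empty columns only."""
    harvested = {}
    for column, value in row.items():
        if column not in WANTED_COLUMNS:
            continue  # retitled or unknown columns are ignored
        text = (value or "").strip()
        if text:
            harvested[column] = text
    return harvested


example_row = {
    "EOVDescription": "Fish abundance and distribution",
    "EOVDescription-Physics": "",
    "Some retitled column": "still being sorted out",
}
print(harvest_row(example_row))
```

Matching on the title rather than the column letter means that inserting or retitling columns in the sheet does not shift what the template reads.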

@kmexter
Contributor Author

kmexter commented Jan 13, 2025

I updated the schema properties in column D of the Column Descriptions tab; please check them and update your templates. I noticed some mistakes in one of them and corrected those:

  • changed the property for citation
  • corrected the mistakes (I can't remember which WP it was, though)
  • added some identifier attributes that were missing before

@kmexter
Contributor Author

kmexter commented Jan 13, 2025

I have the URIs for the columns

Col H EOVDescription- Biology and Ecosystems
Format: string in the googlesheet column - URL
Phytoplankton biomass and diversity - https://goosocean.org/document/17507
Zooplankton biomass and diversity - https://goosocean.org/document/17509
Fish abundance and distribution - https://goosocean.org/document/17510
Marine turtles, birds, mammals abundance and distribution - https://goosocean.org/document/17511
Hard coral cover and composition - https://goosocean.org/document/17512
Seagrass cover and composition - https://goosocean.org/document/17513
Macroalgal canopy cover and composition - https://goosocean.org/document/17515
Mangrove cover and composition - https://goosocean.org/document/17514

Col I EOVDescription-Biogeochemistry
Oxygen - https://goosocean.org/document/17473
Nutrients - https://goosocean.org/document/17474
Inorganic carbon - https://goosocean.org/document/17475
Transient tracers - https://goosocean.org/document/17476
Particulate matter - https://goosocean.org/document/17477
Nitrous oxide - https://goosocean.org/document/17478
Stable carbon isotopes - https://goosocean.org/document/17479
Dissolved organic carbon - https://goosocean.org/document/17480

Col J EOVDescription-Physics
Sea state - https://goosocean.org/document/17462
Ocean surface stress - https://goosocean.org/document/17463
Sea ice - https://goosocean.org/document/17464
Sea surface height - https://goosocean.org/document/17465
Sea surface temperature - https://goosocean.org/document/17466
Subsurface temperature - https://goosocean.org/document/17467
Surface currents - https://goosocean.org/document/17468
Subsurface currents - https://goosocean.org/document/17469
Sea surface salinity - https://goosocean.org/document/17470
Subsurface salinity - https://goosocean.org/document/17471
Ocean surface heat flux - https://goosocean.org/document/17472
Ocean bottom pressure - https://goosocean.org/document/32488
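
As a rough illustration of how a template could turn the sheet strings above into links, here is a minimal sketch, assuming the value in the sheet matches the label exactly. `GOOS_URLS` and `goos_url_for` are hypothetical names, and only three of the pairs listed above are included to keep it short.

```python
# Sketch only: a small excerpt of the label -> GOOS URL pairs listed above.
GOOS_URLS = {
    "Phytoplankton biomass and diversity": "https://goosocean.org/document/17507",
    "Oxygen": "https://goosocean.org/document/17473",
    "Sea state": "https://goosocean.org/document/17462",
}


def goos_url_for(label):
    """Return the GOOS URL for a sheet value, or None if it is not listed."""
    return GOOS_URLS.get(label.strip())


print(goos_url_for("Oxygen"))
```

Returning `None` for unlisted values lets the template fall back to a plain string when no URL is known.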

@pieterprovoost
Contributor

@pbuttigieg You created IRIs for the EOVs in ENVO, is there a preference to use these over the URIs listed above?

@pbuttigieg
Contributor

pbuttigieg commented Jan 13, 2025

> @pbuttigieg You created IRIs for the EOVs in ENVO, is there a preference to use these over the URIs listed above?

Yes. The links above are not PIDs and the ENVO IRIs will be the official markup for EOVs in the IOC Data Architecture.

Note that the granularity of the BioEco EOVs is finer in ENVO (e.g. rather than "X biomass and composition" it would be "X biomass" and "X composition").

This means that, rather than dropdowns, we'd need something more array-like in the spreadsheets. I'm aware this is more of a burden, so we can compromise with:

  • ENVO making aggregate classes (for data set level markup) that match the EOV names, which can go into DefinedTerms in the keyword array.
  • MBO spreadsheets offering an additional column where those who want to do better get an array-based field to enter more granular markup that goes into variableMeasured. Reporting on who actually does this will be very interesting.

We could make aggregate classes on ENVO that match the literal names of the EOVs, but that's not really good practice for variable markup. It may be okay for data sets that feature multiple variables.

@pbuttigieg
Contributor

> @laurianvm: you need to update the templates and the JSON-LD files that come out (we need to add this info to them directly from the googlesheets, as it will almost certainly not be in the datasets). Clearly these need to go in the fromgs files (see #9) and also in the fromuri ones, BUT @pieterprovoost, @pbuttigieg or @ptagliolato, can you advise on how this should be done? Are they variableMeasured?

@kmexter the granular forms of these in ENVO (e.g. "microbial biomass") could go in variableMeasured. The aggregate forms (e.g. "microbial biomass and diversity") are suspect there: there is rarely, if ever, a variable that combines both.

One could add these as DefinedTerms in the keyword array (example syntax) for dataset-level rather than variable-level markup. That seems more appropriate.
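
For readers without access to the linked example, here is a minimal sketch of what dataset-level markup of this kind could look like, assuming schema.org's `keywords` property with `DefinedTerm` values. This is not the linked example syntax. The helper name `keyword_term` and the dataset name are placeholders, and the GOOS URL is used only because the ENVO IRIs are not given in this thread.

```python
import json


def keyword_term(name, url):
    """Build one schema.org DefinedTerm for the dataset-level keywords array."""
    return {"@type": "DefinedTerm", "name": name, "url": url}


dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset (placeholder)",
    "keywords": [
        keyword_term(
            "Phytoplankton biomass and diversity",
            "https://goosocean.org/document/17507",
        )
    ],
}
print(json.dumps(dataset, indent=2))
```

Keeping the aggregate term under `keywords` leaves `variableMeasured` free for the granular variables, if and when the sheet collects them.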

> Note that we have very little actual information in these columns; only a few rows have selections there, so in most cases your template will not have any information to work with here.

We should try to motivate more action here at the GA, stressing that this is how the data will be efficiently discovered by IOC and others. The right semantic markup is one key link in the science-to-society value chain, not in an abstract sense, but very practically.

@kmexter
Contributor Author

kmexter commented Jan 13, 2025

OK, so the situation right now is that we have neither the ENVO terms nor quite the final format in the googlesheets to hold these terms. TBC. For now, @laurianvm can adjust her templates to read this information, link to the URIs I have added above, and put this into the JSON-LD files that hold only the googlesheet info, not the info harvested from the DOI of the dataset (see #9).

I suggest that this is the final change we make to the googlesheets and templating script for now, so that we have something we can present at the GA. We can then discuss at the GA, while we are all there, what to do next.

@pbuttigieg
Contributor

pbuttigieg commented Jan 13, 2025

> OK, so the situation right now is that we have neither the ENVO terms nor quite the final format in the googlesheets to hold these terms. TBC.

We have most of the ENVO IRIs, modelled in the correct manner for good data management.

Right now, if an aggregate concept (X diversity and biomass) is chosen in the spreadsheet, the script can and should add "X diversity variable" and "X biomass variable" from ENVO as two separate DefinedTerm entries in the keyword array.
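
A minimal sketch of that splitting step, assuming aggregate labels always follow the pattern "subject, first facet, and, second facet" as in the BioEco list above. The function names are hypothetical, and the output labels are illustrative only: the exact ENVO class labels and IRIs are not given in this thread, so no identifier is attached.

```python
import re

# Matches "<subject> <facet> and <facet>", e.g. "Hard coral cover and composition".
AGGREGATE = re.compile(r"^(?P<subject>.+) (?P<first>\w+) and (?P<second>\w+)$")


def split_aggregate(label):
    """Split an aggregate EOV label into granular labels; pass others through."""
    label = label.strip()
    match = AGGREGATE.match(label)
    if match is None:
        return [label]
    subject = match.group("subject")
    return [
        f"{subject} {match.group('first')}",
        f"{subject} {match.group('second')}",
    ]


def keyword_entries(label):
    """One DefinedTerm per granular label; IRIs are left out until known."""
    return [{"@type": "DefinedTerm", "name": name} for name in split_aggregate(label)]


print(keyword_entries("Phytoplankton biomass and diversity"))
```

Labels without an "and" (for example "Oxygen") pass through unchanged as a single entry. In practice a hand-maintained lookup from each aggregate label to its ENVO classes would likely be safer than pattern matching.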

> For now, @laurianvm can adjust her templates to read this information, link to the URIs I have added above, and put this into the JSON-LD files that hold only the googlesheet info, not the info harvested from the DOI of the dataset (see #9).

The links can and should be the ENVO classes where available. Where they are not, they should be a temporary link to the GOOS URLs.
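
The preference order described here can be sketched as follows, assuming two label-keyed lookup tables are available to the script. `resolve_link`, `envo_iris` and `goos_urls` are hypothetical names, and the ENVO table is left empty because no ENVO IRIs are listed in this thread.

```python
def resolve_link(label, envo_iris, goos_urls):
    """Prefer the ENVO IRI for a label; fall back to the temporary GOOS URL."""
    iri = envo_iris.get(label)
    if iri:
        return iri, "ENVO"
    url = goos_urls.get(label)
    if url:
        return url, "GOOS (temporary)"
    return None, None


# ENVO IRIs are not given in this thread, so the mapping starts empty here.
envo_iris = {}
goos_urls = {"Sea ice": "https://goosocean.org/document/17464"}
print(resolve_link("Sea ice", envo_iris, goos_urls))
```

Returning the source alongside the link makes it easy to report which terms still rely on temporary GOOS URLs.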

All the BioEco variables have their high-level classes available (the others are variable-level, like "microbial alpha diversity", which can be postponed until we have the new columns). I didn't realise MBO would mark up others. I'll add them.

> I suggest that this is the final change we make to the googlesheets and templating script for now, so that we have something we can present at the GA. We can then discuss at the GA, while we are all there, what to do next.

Sure.

@laurianvm
Contributor

laurianvm commented Jan 19, 2025

The template has been updated to include the new information and link to the temporary GOOS URLs (to be updated once ENVO URLs are available).
This has prompted a few recommended updates in the Google Sheet; please refer to the comments there for details, @kmexter.
