
Add specification for how to extend the schema #27

Open
briri opened this issue Mar 10, 2020 · 18 comments
Labels
decision Decision to be taken that aligns the approach


@briri

briri commented Mar 10, 2020

We are currently converting our API over to use this Common Standard metadata schema. We have a few scenarios where we also need to convey information that is required by our system but outside the scope of this schema.

It would be good if the schema provided guidance on how best to include this type of information, so that systems adopting the Common Standard schema follow similar patterns.

For example, the DMPTool API requires that a DMP template identifier be specified along with some other information specific to the caller's system (called 'abc' below) when creating a new DMP.

We will be using the following structure to accomplish this:

{
  "dmp": {
    "title": "My new DMP",
    ...
    // the rest of the common standard attributes
    ...
    "extended_attributes": [
      "dmptool": { "template_id": "123" },
      "abc": { "reserve_id": { "type": "doi", "identifier": "https://dx.doi.org/10.9999/999xyz" } }
    ]
  }
} 

Apologies if this has already been discussed and I just missed it in the documentation somewhere.

@hmpf

hmpf commented Jun 2, 2020

Would it be relevant to have extensions elsewhere than at the top level as well? For instance, extra information for host/distribution.

@briri
Author

briri commented Jun 2, 2020

I can see value in allowing for extensions at the dataset, distribution and host levels (perhaps project as well). For us (so far) the use case for using extensions has been the import (creation) of a DMP via our API, but it could be useful in other places as well.

We ended up using the following during the hackathon:

{
  "dmp": {
    "extension": [
      {
        "dmptool": {
          "template": {
            "id": 946,
            "title": "Environmental Resilience Institute Data Management Plan"
          }
        }
      }
    ]
  }
}

Related issue: RDA-DMP-Common/hackathon-2020#3

@TomMiksa
Contributor

Do you have more examples of extensions needed? This could help us find the best strategy for including them.

What about doing it in a slightly different way: using a field within the dmp section to declare which extensions are used? This would indicate at the beginning which specific extensions are in play and hence what additional fields are to be expected. Each extension would be identified by a URL to a JSON schema. For example:

{
  "dmp": {
     ...
    "extensions": [
      "http://json-schema.org/dmptool",
      "http://json-schema.org/funderX"
    ],
    ...
    "dataset": [
      {
        "title": "My Dataset",
        "dmptool-specific-field": "generated by DMPTool"
       ....
      }
    ]
  }
}
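
A minimal sketch (purely illustrative; the $id simply reuses the example URL above) of what such a referenced extension schema might contain:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://json-schema.org/dmptool",
  "title": "DMPTool extension",
  "type": "object",
  "properties": {
    "dmptool-specific-field": { "type": "string" }
  }
}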

@TomMiksa TomMiksa added the decision Decision to be taken that aligns the approach label Aug 28, 2020
@briri
Author

briri commented Aug 28, 2020

I think that could be a useful approach.

We are currently working through an integration that uses the common standard as the method of communication. We are still in the early stages of the project, though, and have not finished defining what additional information we would like to pass along. Much of the information is at the project/dmp level, for example (a rough sketch of how these might be carried as extensions follows the list):

  • the DOI of the research field station where the research will be performed
  • a yes|no|unknown value along with some descriptive text (like the ethical issues section) indicating whether the research involves endangered species
  • a yes|no|unknown value along with some descriptive text indicating whether culturally sensitive information for native populations is a factor
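
A rough sketch (the field names below are hypothetical placeholders, not agreed extension terms), reusing the extension block from the earlier comments:

{
  "dmp": {
    "extension": [
      {
        "dmptool": {
          "field_station": { "type": "doi", "identifier": "https://doi.org/10.9999/example-station" },
          "endangered_species": { "exist": "unknown", "description": "..." },
          "culturally_sensitive_data": { "exist": "no", "description": "..." }
        }
      }
    ]
  }
}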

@cpina

cpina commented Jan 19, 2021

I'm new here - sorry if I misinterpreted something in this issue. I was on the call earlier and thought I'd add some of my thoughts here.

{
  "dmp": {
     ...
    "extensions": [
      "http://json-schema.org/dmptool",
      "http://json-schema.org/funderX"
    ],
    ...
    "dataset": [
      {
        "title": "My Dataset",
        "dmptool-specific-field": "generated by DMPTool"
       ....
      }
    ]
  }
}

I like it. In the Frictionless Data community we had a similar discussion: frictionlessdata/datapackage#663

In that case we were looking at adding specific fields. E.g. at the Swiss Polar Institute we are prefixing them with x_spi_:
https://github.com/Swiss-Polar-Institute/frictionless-data-packages/blob/master/10.5281_zenodo.2616605/datapackage.json#L146
It makes it clear that these fields are extensions from SPI (the approach in this issue also makes it clear).
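
For illustration only (these field names are made up, not taken from the linked datapackage), a prefixed field inside a dataset entry looks something like:

{
  "dataset": [
    {
      "title": "My Dataset",
      "x_spi_expedition": "Expedition X",
      "x_spi_internal_id": "12345"
    }
  ]
}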

One possible (so far hypothetical) problem with the current suggestion: could two institutions come up with extensions that have the same name, with some fields that are the same? I can think of two possible solutions:

  • Prefix the extensions with an institution name (similar to the Frictionless Data approach, but importing extensions instead of adding fields)
  • In the "import" step, give each extension an alias (like Python imports), such as:
{
  "dmp": {
     ...
    "extensions": [
      {"uri": "http://university/citations", name: "university-citations"},
      {"uri": "http://school/citations", name:"school-citations"},
    ],
    ...
    "dataset": [
      {
        "title": "My Dataset",
        "school-citations-specific-field": "generated by DMPTool",
        "university-citations-specific-field": "something else",
       ....
      }
    ]
  }
}

@froggypaule

Hello ... also following this morning's call. Thanks to @cpina: this is the reservation I was trying to convey at the call:

  1. If fields coming from two different extensions share the same name and the same meaning, then all is well: they are simply mapped one onto the other.
  2. If fields coming from two different extensions share the same name BUT not the same meaning, then the solution proposed by @cpina would work.
  3. If fields coming from two different extensions do NOT share the same name but do share the same meaning: again, a mapping would do the trick.
  4. If fields coming from two different extensions do NOT share the same name NOR the same meaning, then all is well also.

Sorry if I misunderstand the question.

@cpina

cpina commented Jan 20, 2021

> Hello ... also following this morning's call. Thanks to @cpina: this is the reservation I was trying to convey at the call:
>
> 1. If fields coming from two different extensions share the same name and the same meaning, then all is well: they are simply mapped one onto the other.
>
> 2. If fields coming from two different extensions share the same name BUT not the same meaning, then the solution proposed by @cpina would work.
>
> 3. If fields coming from two different extensions do NOT share the same name but do share the same meaning: again, a mapping would do the trick.
>
> 4. If fields coming from two different extensions do NOT share the same name NOR the same meaning, then all is well also.

This is a perfect summary, thanks!

> Sorry if I misunderstand the question.

My thoughts are: should we make case 2 work (two different extensions sharing a field name but not its meaning)? If this is a concern and should work, what's the best way to go: a "name" (alias) or a "prefix"?

@briri
Author

briri commented Jan 29, 2021

We are going to begin work on the schema extensions for DMPRoadmap in late March or early April.

We plan to follow the pattern described by @cpina and @froggypaule above by using a tool/codebase-specific prefix like: dmproadmap-[x].

Any early suggestions or feedback (once we start work on it) would be welcome. :)

@froggypaule

Hello! A quick one: why the name 'dmproadmap'? I am asking because DMPRoadmap is the common code base for DMPTool and DMPonline. Is the name intentional?

@briri
Author

briri commented Feb 1, 2021

Yes. Any changes we'd be making would benefit the entire codebase (DMPTool, DMPonline, DMPOPIDoR, DMPAssistant, etc.).

For example, the DMPRoadmap system is driven in part by specific templates (e.g. Horizon 2020, NSF, USGS, etc.). We have an API endpoint that allows users to create a DMP by passing in this metadata standard. To help facilitate the use of specific templates, we would add a dmproadmap_template_id or something similar to convey that information to the system.
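
As a rough sketch (the exact field name is still to be decided, as noted above), the creation payload would simply carry the prefixed field alongside the standard attributes:

{
  "dmp": {
    "title": "My new DMP",
    "dmproadmap_template_id": "123"
  }
}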

@froggypaule

ok thanks.... I was just commenting :)

@paulwalk
Contributor

Hi - I've been reading this thread, and I'm concerned that the consensus seems to be to invent a mechanism for handling namespaces in JSON.

I would strongly recommend not doing this.

At the start of this work, we decided to limit our focus and ambition with the standard, so that it was developed and managed as an information exchange format. More formally, it could be described as a metadata application profile. However, the interest in this work has grown and, as such, we are now faced with a decision. Do we accept that there is demand for a more expansive standard - essentially an ontology within which new concepts can be added? Or do we continue to limit our scope, while recognising that there is demand to include extra information in, or alongside, the information exchange?

As I understand it, there are two viable options available to us:

Option 1: Widen our scope, and become an ontology

It could be argued that this is inevitable. In any case, there is already work underway to formally describe the standard as an OWL ontology, so there does appear to be demand for this. If this is the direction of travel for the DMP Common Standard, then I would recommend that we act sooner rather than later, and move from supporting plain JSON to supporting JSON-LD.

Pros:

  • JSON-LD allows us to extend by adding contexts (namespaces) which are easily and robustly implemented
  • JSON-LD allows us to describe DMPs in a manner which is not just machine-readable, but which is more machine-understandable
  • JSON-LD is increasingly well supported in software libraries

Cons:

  • this may be disruptive to the current implementations

Option 2: Continue as before, with a new section for arbitrary extensions

We had certainly been considering how to handle extensions from the beginning of this work, and this was our original idea. With this approach, the scope of the DMP Common Standard is unchanged, but a place is added for third-parties to add arbitrary data. With this approach, the DMP Common Standard has nothing to say about how these extensions are encoded. If implementers add extensions which cause name collisions, then they will need to sort this out (typically by agreeing conventions).

Pros:

  • potentially less disruptive to current implementations (although this needs some verification)

Cons:

  • risk to the DMP Common Standard that it becomes gradually marginalised as demand increases for the extensions to be more broadly interoperable.

My recommendation:

  1. Absolutely do not invent a new mechanism for name-spacing JSON properties as part of the DMP Common Standard
  2. Consider the implications of moving to JSON-LD. In many cases, it may simply involve adding a context to the JSON and changing to a JSON-LD software library for parsing (a minimal sketch follows this list). However, there may be other issues for the software that has implemented the standard. It would be good to find out: how disruptive would this actually be?
  3. If not moving to JSON-LD, then define the place for extensions (as already suggested above) and then say no more. Make it clear that all further definition is out of scope for this standard. However, we could consider providing a place for implementers to document "community conventions" for using these.
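
As a minimal sketch of point 2 (the context URIs below are illustrative placeholders, not agreed namespaces), the document could keep its current structure while a context maps bare terms and extension prefixes to URIs:

{
  "@context": {
    "@vocab": "https://example.org/dcs/terms/",
    "dmptool": "https://example.org/dmptool/terms/"
  },
  "dmp": {
    "title": "My new DMP",
    "dmptool:template_id": "123"
  }
}

A plain JSON parser can simply ignore the @context key.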

Of these two options, I think that the JSON-LD option is the more future-proof at this point.

@froggypaule

Thanks @paulwalk for clarifying this: having come to the CS quite late, this helps a lot.
And yes, I agree with you on JSON-LD and option 1 (not that I am particularly well versed in these matters...)

@cpina

cpina commented Mar 18, 2021

Thanks @paulwalk. Sadly I'm not very familiar with JSON-LD and I need to do some refreshing on it. I 100% agree we should avoid reinventing the wheel. If any of the ideas in my suggestions already exist in a standard, I would say go with the standard unless there is a very good reason specific to this use case.

@fekaputra
Collaborator

fekaputra commented Mar 18, 2021

Hi @paulwalk, in case it is decided that the community will go with the first option, we (mainly me, @JoaoMFCardoso, @ljgarcia and Marie-Christine) have been working on the ontology version of the DMP Common Standard (DMP Common Standard Ontology - DCSO), which is already committed as a part of this repository (https://github.com/RDA-DMP-Common/RDA-DMP-Common-Standard/tree/master/ontologies). This was a result of the DCS hackathon last year.

The goal of the ontology is to have a 1-to-1 mapping to the current DCS, to ensure compatibility between the DCSO and the original DCS standard.

We will be very happy to discuss the ontology development (which you can later serialise as JSON-LD) to include the latest changes since the hackathon if you wish.

As a note, we are currently working on an (invited) journal paper to showcase the DCSO and its features. So in case the community decides to go with JSON-LD, we can report this development in the paper as well.

@MarekSuchanek
Collaborator

MarekSuchanek commented Mar 20, 2021

Hi, I would vote for the JSON-LD way.

  • There is already DCSO; (re)using it would be great... It would be one unified definition of the DMP Common Standard.
  • Working with JSON-LD will definitely be more convenient and flexible.
  • The documentation (of the specification, i.e., the ontology) could be generated... no duplication of information, fewer inconsistency issues.
  • Easier to refer to specific parts of the standard.
  • Could also link directly to concepts/ontologies that the standard (re)uses, e.g., DCAT.

@paulwalk It should be possible to remain backwards compatible (when someone ignores @context, @type, etc., the structure can stay the same as it is now), right? The question is whether that is a good idea or whether it would be better to work directly on some redesign (again, sooner rather than later)...

One might also ask why JSON-LD and not directly RDF.

@paulwalk
Contributor

> @paulwalk It should be possible to remain backwards compatible (when someone ignores @context, @type, etc., the structure can stay the same as it is now), right? The question is whether that is a good idea or whether it would be better to work directly on some redesign (again, sooner rather than later)...

I think it would remain backwards-compatible for people parsing the document as JSON rather than JSON-LD. As far as I can see, the main thing that would be lost would be the namespace URI mapping - but the namespace prefixes would still be in the JSON.

> One might also ask why JSON-LD and not directly RDF.

This is really just about tooling. The DMP system APIs are already handling JSON. Developers mostly prefer it to RDF because they get native programming language support etc. JSON-LD seems to hit the "sweet-spot" for many.

@nicolasfranck

nicolasfranck commented Aug 2, 2022

I think the use of JSON-LD would only break existing usage if you decided to use a different way of expressing your attributes.
JSON-LD allows for short attribute names or expanded names (name vs http://schema.org/name), compacted results or not, and allows values to be expressed as regular strings, arrays of strings, arrays of objects, etc.
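
A minimal sketch of the two forms (using schema.org purely as an example vocabulary):

// compacted form, as an author would typically write it
{ "@context": "http://schema.org/", "name": "My Dataset" }

// expanded form, as produced by a JSON-LD processor
[ { "http://schema.org/name": [ { "@value": "My Dataset" } ] } ]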

A small side note: IIIF v2 uses JSON-LD, but implementers rapidly started to realise that attribute values could be anything (a reference URL? a regular string? an array of reference URLs?). IIIF v3 therefore decided to be far more strict.

And that is probably what one should do to make other developers' lives easier. Let's not forget that most JSON parsers are just JSON parsers, and are not like XML parsers that can handle namespaces.
