
Catalogue and dataset versioning #10

Open
skjeves opened this issue Oct 12, 2023 · 14 comments


@skjeves

skjeves commented Oct 12, 2023

Hi guys,
I am preparing a paper on dataset vs. catalogue referencing for the upcoming S-100 meeting. I am not sure to what degree it is related to your discussion here, but I think there may be commonalities. If you could have a look, it would be much appreciated. Do you agree that this relationship is not yet covered in S-100 5.1.0, or have we missed something?
Catalogue and dataset versioning .docx

@skjeves

skjeves commented Oct 12, 2023

From Tom Richardson:
@skjeves thanks for progressing this; we didn't get time for my versioning paper at PT11, which also touches on this. I agree that a dataset must reference a single FC. But for Portrayal Catalogues I think this constraint is counterproductive and limits the machine-readability concept. For example, the introduction of a new symbol will take many years to reach all cells (many ENCs have not had new editions for 10 years), and reissuing data creates nugatory effort. Allowing Portrayal Catalogues to work with multiple versions of a feature catalogue offers greater flexibility and ensures that data presents consistently. Some rules would need to be in place for this, but it works for AML, for example, as that Presentation Library supports multiple editions and Feature Catalogues.
Whatever route is chosen, S-98 needs to be clearer on this. I will try to review the paper in detail.

@skjeves

skjeves commented Oct 12, 2023

From Jonathan Pritchard:
I think a general rule for backwards compatibility isn't well described anywhere - it would need documenting in S-98 (and I believe there is a starting point there already). NIWC have raised this on a number of occasions and there's an issue for it in S-164 already as well. I think the general rule is (although I'm not sure this is properly written down anywhere (yet)):

An FC with version X.Y.Z+1 should be 100% backward compatible with X.Y.Z, but it becomes murkier when X.Y+1.* is defined.
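As a rough illustration, that rule of thumb could be expressed as a classification function. This is a minimal sketch, assuming versions are plain MAJOR.MINOR.PATCH strings; the function name and the three outcome labels are hypothetical, and the "undefined" outcome simply flags the X.Y+1 situation that still needs an agreed rule.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def fc_compatibility(candidate_fc: str, data_fc: str) -> str:
    """Classify whether candidate_fc can be used for data produced with data_fc.

    Assumed rule of thumb (not yet written into S-100/S-98):
      * same MAJOR.MINOR, equal or newer PATCH -> "compatible"
      * same MAJOR, newer MINOR               -> "undefined" (needs an agreed rule)
      * anything else                         -> "incompatible"
    """
    new, old = Version.parse(candidate_fc), Version.parse(data_fc)
    if (new.major, new.minor) == (old.major, old.minor) and new.patch >= old.patch:
        return "compatible"
    if new.major == old.major and new.minor > old.minor:
        return "undefined"
    return "incompatible"
```

For example, `fc_compatibility("1.2.1", "1.2.0")` gives `"compatible"`, while `fc_compatibility("1.3.0", "1.2.5")` gives `"undefined"`.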

I would definitely welcome this being better defined and will review the paper above. S100WG would be a good opportunity to get it sorted...

Ah, I pressed send at the same time as @TomRichardson6 :-). Agreed, it needs a better definition (and testing in S-164; we have tests for multiple FC/dataset versions, but they could use some work). v1.2 gives us something concrete to work with.

@skjeves

skjeves commented Oct 12, 2023

From David Grant:
I'm all for better requirements, but a PC should not target multiple FCs. The PC can be updated at any time to change symbology, colors, etc.; it doesn't require an updated dataset or FC. However, when the FC is updated, the PC and data must also be updated. Depending on the scope of the changes, older datasets can be brought into alignment by releasing an update to the older PC in parallel with pushing the latest revision.
DG

@skjeves, it isn't necessary to add featureCatalogueVersion and portrayalCatalogueVersion to S100_ProductSpecification. S100_CatalogueDiscoveryMetadata already provides productSpecification and we had intended the S100_ProductSpecification productIdentifier field to uniquely identify a version of a product spec. The attribute could be used to tie the catalogs and datasets together, but the registry has yet to be updated to support this, and there is also no supporting test data; we had to proceed with an alternate implementation so that we could move on with testing.
We currently link datasets to FCs via the DSID/PRSP subfield in the 8211 encoding. In a GML encoding this would be the productIdentifier field (Table 10b-4), and in HDF-5 you could look at productSpecification (Table 10c-6). My understanding is that the FC version is supposed to align with the Product Spec version, although I don't think there is currently a written requirement we can reference. We establish a match when the first two version elements match. So, INT.IHO.S-101.1.1 matches FC 1.1.x, INT.IHO.S-101.1.2 matches FC 1.2.x, etc.
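The matching rule described here (first two version elements of the product identifier against the FC version) can be sketched as follows. This assumes the product identifier always ends in exactly two numeric version elements, as in the INT.IHO.S-101.1.1 example; the function names are hypothetical, and choosing the highest matching patch when several FCs are installed is an added assumption rather than part of the described rule.

```python
def ps_major_minor(product_identifier: str) -> tuple[int, int]:
    """'INT.IHO.S-101.1.2' -> (1, 2).

    Assumes the identifier ends in exactly two numeric version elements.
    """
    *_prefix, major, minor = product_identifier.split(".")
    return int(major), int(minor)


def parse_version(version: str) -> tuple[int, ...]:
    # Numeric comparison, so that "1.2.10" sorts after "1.2.9".
    return tuple(int(part) for part in version.split("."))


def fc_matches(product_identifier: str, fc_version: str) -> bool:
    # A dataset matches an FC when the first two version elements agree.
    return ps_major_minor(product_identifier) == parse_version(fc_version)[:2]


def select_fc(product_identifier: str, installed_fc_versions: list[str]):
    """Return the newest installed FC version matching the dataset, or None.

    Picking the highest patch is an assumption, not part of the described rule.
    """
    matches = [v for v in installed_fc_versions if fc_matches(product_identifier, v)]
    return max(matches, key=parse_version, default=None)
```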
The ECDIS doesn't really care what the version of the product specification is. It just wants to associate each dataset with a FC, and each PC with a FC. The association of a dataset to a PC is indirect (through the FC). Our preference would be:

• Assign an MRN to each FC by adding an element to the FC: <S100FC:mrn>urn:mrn:iho:fc:...</S100FC:mrn>
• Provide a required field in each encoding which stores the MRN of the FC used to produce the data.
• The exchange set creator could copy the MRN from the encoded dataset to a field in the XS discovery metadata.
  ◦ Minimizes the risk of human error
• Each PC would reference its associated FC via MRN:
Unfortunately, I think it's too late to make this change in S-100 v5. Since v5 will be used for production, there will be no point in making the change in v6.
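A minimal sketch of the MRN-based linking proposed above, showing how an ECDIS could resolve the indirect dataset → FC → PC association. The class and field names are hypothetical, the MRN strings used with it are placeholders (the real IHO MRN structure is elided in the proposal), and "use the newest PC when several target the same FC" is an assumption.

```python
from dataclasses import dataclass


def parse_version(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class FeatureCatalogue:
    mrn: str      # would come from the proposed <S100FC:mrn> element
    version: str


@dataclass(frozen=True)
class PortrayalCatalogue:
    version: str
    fc_mrn: str   # MRN of the FC this PC was written against


@dataclass(frozen=True)
class Dataset:
    name: str
    fc_mrn: str   # proposed required field: MRN of the FC used to produce the data


def discovery_entry(dataset: Dataset) -> dict:
    # The exchange set creator copies the MRN from the dataset instead of typing it.
    return {"fileName": dataset.name, "featureCatalogueMrn": dataset.fc_mrn}


def resolve(dataset: Dataset, fcs_by_mrn: dict, pcs: list):
    """Return the (FC, PC) pair used to portray a dataset."""
    fc = fcs_by_mrn.get(dataset.fc_mrn)
    if fc is None:
        raise LookupError(f"no FC installed with MRN {dataset.fc_mrn}")
    candidates = [pc for pc in pcs if pc.fc_mrn == fc.mrn]
    if not candidates:
        raise LookupError(f"no PC references FC {fc.mrn}")
    # Assumption: if several PCs target the same FC, use the newest one.
    pc = max(candidates, key=lambda p: parse_version(p.version))
    return fc, pc
```

Because the dataset never names a PC, a PC-only update needs no change to the data; the ECDIS picks it up through the FC's MRN.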

@skjeves

skjeves commented Oct 12, 2023

From Svein Skjaeveland:
@DavidGrant-NIWC I think your table demonstrates well how you envisage updating older PCs to match new ones; it should be proposed. If this becomes an agreed-upon mechanism, it must be described. Is S-98 the correct place for that?
This is the table describing different versioning use cases:
[Table: versioning use cases]

If I understand correctly, the association between dataset and PC is not relevant; it is an indirect relationship defined in the FC.
What happens, then, when you have an uptick in the PC version without an uptick in the FC version? (Use cases 4 and 8 in the table.)

"it isn't necessary to add featureCatalogueVersion and portrayalCatalogueVersion to S100_ProductSpecification. S100_CatalogueDiscoveryMetadata already provides productSpecification and we had intended the S100_ProductSpecification productIdentifier field to uniquely identify a version of a product spec. The attribute could be used to tie the catalogs and datasets together, but the registry has yet to be updated to support this, and there is also no supporting test data; we had to proceed with an alternate implementation so that we could move on with testing".
For use cases where catalogues can be updated independently of the product specification, this may work with the caveat (to be defined in S-98?) that the newest catalogue must be used. The productIdentifier field of S100_ProductSpecification in S100_CatalogueDiscoveryMetadata would then be identical across different versions of the catalogue (use cases 2, 3, 4, 7 and 8 in the table).
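Under that caveat, choosing a catalogue reduces to "among catalogues of the required kind that carry the dataset's productIdentifier, take the highest version". A minimal sketch, assuming each catalogue's own version is a MAJOR.MINOR.PATCH string; `CatalogueEntry` and `newest_catalogue` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    kind: str                # "FC" or "PC"
    product_identifier: str  # S100_ProductSpecification.productIdentifier
    version: str             # the catalogue's own version, e.g. "1.2.3"


def newest_catalogue(entries, kind, product_identifier):
    """Return the newest catalogue of `kind` for the product identifier, or None."""
    candidates = [
        e for e in entries
        if e.kind == kind and e.product_identifier == product_identifier
    ]
    # Compare versions numerically; as strings, "1.2.10" would sort before "1.2.9".
    return max(
        candidates,
        key=lambda e: tuple(int(p) for p in e.version.split(".")),
        default=None,
    )
```

The numeric comparison matters: a plain string comparison would rank 1.2.9 above 1.2.10 and silently select an older catalogue.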

"We currently link datasets to FC's via the DSID/PRSP subfield in the 8211 encoding. In a GML encoding this would be the productIdentifier field (Table 10b-4), and in HDF-5 you could look at productSpecification (Table 10c-6). My understanding is that the FC version is supposed to align with the Product Spec version, although I don't think there is currently a written requirement we can reference. We establish a match when the first two version elements match. So, INT.IHO.S-101.1.1 matches FC 1.1.x, INT.IHO.S-101.1.2 matches FC 1.2.x, etc"
I think your understanding of the alignment between FC version and product specification version is not correct (use cases 2, 3 and 7 in the table).

If your preferred solution is assigning MRNs to FCs and dataset encodings, I think you should propose it nevertheless. We have made other suggestions that have been deemed unsuitable before S-100 v6, and although v5 will be used for production, I expect continuous development of the S-100 framework will take place eventually (especially given all the topics still to be uncovered when people start implementing support for v5).

@RohdeBSH

Hi @skjeves,

Please correct me if I am wrong, but this does not apply to the HDF5 product specifications.
In the S-102 for example, the dataset is uniquely linked to the product specification version via an attribute field in the header and therefore also to the FC and the PC.

[Image: S-102 Ed. 2.2.0, Table 8, product specification version]
[Image: S-102 HDF5 root group, product specification version]

It would be important to me that the different data formats are taken into account in these considerations. I'm not interested in having the HDF5 product specifications changed simply because S-101 doesn't work properly.

@skjeves

skjeves commented Oct 12, 2023

@RohdeBSH I think it also applies to the HDF5 product specifications. The reference you mention does not take into account the scenarios/use cases described earlier in this discussion:
[Table: versioning use cases]

I believe that for HDF5 specs, too, you could have similar changes to the PC/FC that do not require an uptick in the DPS version number, only in the FC/PC version (at least theoretically).

@RohdeBSH

I got that already, but I can't agree with your table.
In my opinion, the first two numbers of the version should always be synchronous.
Therefore, I think ref. 4 & 5 of the table are not quite correct.

Ref. 4:
When a symbol is changed, it is a correction and not a new feature. From my point of view it should behave the same as in Ref. 8. Thus Ref. 4 would be superfluous.

Ref. 5:
A change can of course be more than a correction. The correction of a wrong validation check must not lead to a change of the middle version number. From my point of view, this case is not dealt with in the table at all.
The introduction of a new validation check may lead to a change of the middle version number. However, the question is whether this should be allowed if the PS itself has not changed. Example: if a validation check was forgotten, adding it is rather a correction (third position of the version number).
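The position that the first two version numbers stay synchronous can be checked mechanically across the artefacts. A minimal sketch, assuming each component's version is a MAJOR.MINOR.PATCH string and using the DPS as the reference; the component labels are those used in this discussion and the function name is hypothetical.

```python
def out_of_sync(versions: dict[str, str], reference: str = "DPS") -> dict[str, str]:
    """Return the components whose MAJOR.MINOR differs from the reference component."""

    def major_minor(version: str) -> tuple[int, int]:
        major, minor, _patch = (int(p) for p in version.split("."))
        return major, minor

    expected = major_minor(versions[reference])
    return {name: v for name, v in versions.items() if major_minor(v) != expected}
```

Under this rule, patch numbers may still drift independently (for example PC 1.2.7 alongside FC 1.2.3), which is what would let a correction to a single artefact ship without touching the others.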

@DavidGrant-NIWC

DavidGrant-NIWC commented Oct 13, 2023

I also think the table has some errors. I agree with @RohdeBSH that:

  • the first two numbers of the version should always be synchronous
  • Ref. 4 and 8 are equivalent.
    • In addition, I believe Ref. 2 and 3 are equivalent.

I added a ref to account for editorial changes to the PS.

| Ref. | Type of change | Example | DPS | DCEG | FC | PC | Validation |
|---|---|---|---|---|---|---|---|
| 1 | Major change, includes an S-100 version change | New concept in S-100 used by S-101 | ❌.0.0 | ❌.0.0 | ❌.0.0 | ❌.0.0 | ❌.0.0 |
| 2, 3 | New content | Attribute value / feature added or removed | A.❌.0 | A.❌.0 | A.❌.0 | A.❌.0 | A.❌.0 |
| 4, 8 | PC change or correction | Add a symbol, correct a rule, change a color | A.B.- | A.B.- | A.B.- | A.B.❌ | A.B.- |
| 5 | Validation change (possibly resulting from ref. 6) | Additional non-critical check | A.B.- | A.B.- | A.B.- | A.B.- | A.B.❌ |
| 6 | Encoding change | Change of guidance to cover a new real-world concept consistently, editorial change | A.B.- | A.B.❌ | A.B.- | A.B.- | A.B.- |
| 7 | FC correction, no content or portrayal impact | Correct an attribute that is not evaluated by the PC | A.B.- | A.B.- | A.B.❌ | A.B.- | A.B.- |
| 9 | DPS correction/clarifications | Editorial change | A.B.❌ | A.B.- | A.B.- | A.B.- | A.B.- |

Notes:

  1. It's probably worth pointing out that Ref. 1 likely also requires application updates to both production and end-user systems.
  2. The DPS is also a version identifier for the datasets because the dataset encodes the version of the PS. It would probably be better to encode the version of the DCEG or the FC, but none of the encodings currently support that.
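The table can also be read as a set of version-bump rules. The sketch below encodes it that way, under an interpretation that is an assumption: ❌ marks the element that is incremented, "0" an element reset to zero, and "-" an element left unchanged. Names such as `apply_change` are hypothetical.

```python
COMPONENTS = ("DPS", "DCEG", "FC", "PC", "Validation")

# (refs, element to increment, components affected), one entry per table row.
TABLE = [
    ((1,), "major", COMPONENTS),
    ((2, 3), "minor", COMPONENTS),
    ((4, 8), "patch", ("PC",)),
    ((5,), "patch", ("Validation",)),
    ((6,), "patch", ("DCEG",)),
    ((7,), "patch", ("FC",)),
    ((9,), "patch", ("DPS",)),
]
RULES = {ref: (element, affected) for refs, element, affected in TABLE for ref in refs}


def bump(version: str, element: str) -> str:
    """Increment one element of MAJOR.MINOR.PATCH and zero the lower elements."""
    major, minor, patch = (int(p) for p in version.split("."))
    if element == "major":
        return f"{major + 1}.0.0"
    if element == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


def apply_change(versions: dict[str, str], ref: int) -> dict[str, str]:
    """Return the component versions after a change of the given table Ref."""
    element, affected = RULES[ref]
    return {
        name: bump(version, element) if name in affected else version
        for name, version in versions.items()
    }
```

One property falls out directly: every rule either bumps the major or minor element of all components together or touches a single patch number, so MAJOR.MINOR stays synchronized across the artefacts.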

@skjeves

skjeves commented Oct 13, 2023

@TomRichardson6 You provided the table initially; do you have any views on Daniel's and David's input?

@TomRichardson6

@skjeves sorry for the slow response here. The table was never fully discussed or finalized. I propose that it is tabled again at S-101PT12, but ultimately this needs to be agreed for S-100/S-97 for consistency across all products.

I do not agree with Daniel that the first two version elements must always be synchronous. This means that, despite the feature catalogue being machine readable, a full major version of the product specification would be required to enable such a change, so the concept of making changes to the specifications using plug-and-play catalogues is not realized.

Maybe I'm misunderstanding, but the updated version provided by David seems reasonable to me.

Obviously I'm hopeful that the S-100WG8 meeting resolved this and I missed that!

@DavidGrant-NIWC

"I do not agree with Daniel that the first two version elements must always be synchronous. This means that, despite the feature catalogue being machine readable, a full major version of the product specification would be required to enable such a change, so the concept of making changes to the specifications using plug-and-play catalogues is not realized."

My table incorporated Daniel's comments. I agree with him that the first two version numbers should be synchronized.

See note 2 at the bottom of the table, which explains why the version of the DPS must remain in sync:

  2. The DPS is also a version identifier for the datasets because the dataset encodes the version of the PS. It would probably be better to encode the version of the DCEG or the FC, but none of the encodings currently support that.

"Obviously I'm hopeful that the S-100WG8 meeting resolved this and I missed that!"

Yes and no. My understanding is that they decided that all numbers should be synchronized. This is unrealistic because it requires updating everything any time a correction needs to be made to any artifact. Alternatively, it promotes publication of corrections without updating the version number, which is a horrible idea.

@TomRichardson6

Thanks, Dave. Well, that's an awful approach; it puts us back where we were with S-57 before the UOC was unfrozen and, for example, would constrain changes to validation checks as we do now with S-58. I will look out for that in the actions and seek to challenge it. I'm concerned that some parties are still seeking to increase complexity here, and this is reaching the point where S-57 ECDIS will end up looking much more attractive! I hope that's not the case.

My hope is that, through S-164, scenarios will be developed and tested which can then be used to validate the rules in the table. At the moment I expect to use S-101 1.1.0 and 1.2.0 data in the ShoreECDIS presented with the same PC. I see no logical reason why that should not be possible. I will seek to discuss this with JP.

@DavidGrant-NIWC

It's possible I misunderstood, but the versioning issue is something that certainly needs to be resolved ASAP.

@DavidGrant-NIWC

DavidGrant-NIWC commented Feb 21, 2024

"At the moment I expect to use S-101 1.1.0 and 1.2.0 data in the ShoreECDIS presented with the same PC. I see no logical reason why that should not be possible."

A PC is written with explicit knowledge of how the data model described by the FC is set up. Between the 1.1 and 1.2 FC there were wholesale changes to feature names and the relationships between associated feature/information types.

The PC assumes that the data being fed into it was produced using a compatible FC. The PC has no way to discover whether data was produced with a different FC. I suppose a context parameter could be added, but this would tremendously complicate PC development and maintenance.
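A toy sketch of the problem: a PC's rules are keyed to the feature names of the FC it was written for, so data produced with a different FC can silently fall through to "no rule". Everything here is hypothetical (the feature codes, the `fc_version` context value and the rule table are not taken from S-101); it only illustrates the shape of the guard that a context parameter would allow.

```python
# Hypothetical PC written against one FC version; all names are illustrative only.
PC_TARGET_FC = "1.1"

RULES = {
    "FeatureA": lambda feature: f"symbol-A:{feature.get('category', 'default')}",
    "FeatureB": lambda feature: "symbol-B",
}


def portray(features, fc_version=None):
    """Apply the PC rules to a list of feature dicts that carry a 'code' key.

    fc_version stands in for a hypothetical context parameter; when supplied,
    data from a different FC is rejected instead of being portrayed wrongly.
    """
    if fc_version is not None and fc_version != PC_TARGET_FC:
        raise ValueError(f"PC targets FC {PC_TARGET_FC}, data uses FC {fc_version}")
    results = []
    for feature in features:
        rule = RULES.get(feature["code"])
        # Without the guard, a renamed feature simply finds no rule.
        results.append(rule(feature) if rule else f"no rule for {feature['code']}")
    return results
```

Even with the guard, supporting two FCs would mean carrying two rule sets inside one PC, which is the maintenance cost described above.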

4 participants