After validating normalised data, how should we produce aggregate stats for the status site? #14
We'll go for four categories: 'Conformant', 'Core', 'Accessibility', and 'Social Prescribing'. The profiles for each of these consist essentially of a list of attributes; testing for these will involve establishing whether each attribute is populated, which we can check using JSON-LD. The end output will be a percentage value for the number of records that satisfy these conditions. In other words, we'll want to display four columns on the status page. If there are 100 records, and:
... then we should end up with a series of columns next to the dataset link with '100', '80', '60', '3'. I don't think there's any value in weighting particular attributes or counting partial satisfaction of the profiles. That way lies madness.
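The per-profile columns described above could be computed with something like the following sketch. The profile attribute lists and the `isPopulated` check are illustrative assumptions, not the actual profile definitions:

```javascript
// Sketch: percentage of records satisfying each profile, one column per
// profile. Profiles here are just lists of attribute names that must be
// populated (hypothetical examples, not the real profile definitions).
const profiles = {
  Conformant: ['@type', 'name'],
  Core: ['@type', 'name', 'location'],
};

function isPopulated(value) {
  return value !== undefined && value !== null && value !== '';
}

function satisfiesProfile(record, attributes) {
  return attributes.every((attr) => isPopulated(record[attr]));
}

function profileStats(records) {
  const stats = {};
  for (const [name, attributes] of Object.entries(profiles)) {
    const passing = records.filter((r) => satisfiesProfile(r, attributes)).length;
    stats[name] = Math.round((passing / records.length) * 100);
  }
  return stats;
}
```

Each value in the returned object maps directly onto one column on the status page.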
Sorry, I have just realised after discussion with @nickevansuk that this really only deals with items after normalisation. Stats should ideally also be kept for items failing validation prior to normalisation - again, expressed as a percentage, and with warnings excluded. In an ideal world, a list of the individual items failing validation would also be kept and linked from the validation page as an aid to data users.
To add some further detail to this, you'll want to filter the validation results. See the validator integration from the test suite for more info: https://github.com/openactive/openactive-test-suite/blob/master/packages/openactive-integration-tests/test/shared-behaviours/validation.js#L94
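The filtering step might look like the sketch below. The result shape (objects with a `severity` field) and the severity values are assumptions based on the discussion in this thread, not the validator's documented output:

```javascript
// Sketch: drop non-failure results before counting a record as
// non-conformant. The severity values here ('failure', 'warning',
// 'suggestion') are assumed, not confirmed against the validator's API.
function filterFailures(results) {
  return results.filter((r) => r.severity === 'failure');
}

// A record "passes" if no failures remain after filtering.
function passes(results) {
  return filterFailures(results).length === 0;
}
```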
To provide "a list of the individual items failing validation would also be kept and linked from the validation page as an aid to data users", as @thill-odi mentions above, one option is to construct a specific link to the validator which includes the item in the feed.
Note that the validator only validates the first 10 non-deleted items in any RPDE page that's provided. When you're using the validator programmatically, RPDE items should therefore be validated individually.
Thanks for the many replies - this touches on a lot and is interesting. I'm going to move a lot of things out to other issues, though, and be strict about keeping this on track with the original question. Hope that's ok.
So 'Conformant' is the result from the validation library, and the other three are data profiles? Because these come from different mechanisms, I'd like to deal with them differently - I'll deal with data profiles in another ticket soon.
Moved to openactive-archive/conformance-status-page#4
To be clear: we should be running the validation library against the raw data we download, the un-normalised data, and calculating stats for that? So, we take the results and filter ...
Can you be clear which one it is? Then on the status page, show a % of how many records pass - i.e. have no validation library results against them after filtering. When calculating the % and counting the total records, should the total be all records, or just records that aren't deletes? Probably the latter, I assume.
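Assuming the answer is "count only non-deleted records", the headline percentage could be computed along these lines. The record shape (a `state` field and a pre-filtered `validationResults` array) is an assumption for illustration:

```javascript
// Sketch: % of non-deleted records with no remaining validation results
// after filtering. Record shape is assumed: { state, validationResults }.
function conformancePercentage(records) {
  const live = records.filter((r) => r.state !== 'deleted');
  if (live.length === 0) return 100; // nothing to fail against
  const passing = live.filter((r) => r.validationResults.length === 0).length;
  return Math.round((passing / live.length) * 100);
}
```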
Filter
Also on the other points:
Yes, so that publishers can fix issues
Suggest ignoring deleted records
Q: In the case where a publisher has multiple feeds (e.g. https://onlinebooking.1610.org.uk/OpenActive/ Slot, FacilityUse, .....), should we calculate the stat per publisher, or per feed for that publisher?
Not sure how it's presented; I guess it depends on the UI. "% of data published that is conformant" would work per-publisher, but as in openactive-archive/conformance-status-page#4 they need to get to the detail of which feeds have errors (so a % conformance per-feed could be useful?), plus example pages/items within the feeds that exhibit the errors. Ideally we want the headline to be every publisher at 100% conformance (though this is unlikely to be the case on day 1 of this tool going live).
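Both levels could be served from the same data: compute per-feed percentages and roll them up into a publisher headline. This is a sketch under the assumption that each record already carries a boolean `conformant` flag:

```javascript
// Sketch: per-feed conformance % plus a publisher-level headline computed
// over all records across that publisher's feeds. Assumes each record has
// already been marked with a `conformant` boolean.
function publisherStats(feeds) {
  let total = 0;
  let passing = 0;
  const perFeed = {};
  for (const [feedName, records] of Object.entries(feeds)) {
    const feedPassing = records.filter((r) => r.conformant).length;
    perFeed[feedName] = Math.round((feedPassing / records.length) * 100);
    total += records.length;
    passing += feedPassing;
  }
  return { perFeed, headline: Math.round((passing / total) * 100) };
}
```

The `headline` figure gives the per-publisher summary, while `perFeed` supports drilling down to the feed that has errors.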
We put normalised data through https://github.com/openactive/data-model-validator and store the results in the database.
Then, how should we produce aggregate stats for the status site for each publisher?
In openactive/data-model-validator#349 I noted there can be different values of "severity", for instance - should we filter some of those out?
Ultimately, what does the user want to see on the status page when considering validation stats?
Thanks