a list of the individual items failing validation - linked from the validation page as an aid to data users #4

Open
ghost opened this issue Jun 23, 2020 · 7 comments

Comments

@ghost

ghost commented Jun 23, 2020

From openactive-archive/conformance-services#14:

In an ideal world, a list of the individual items failing validation would also be kept and linked from the validation page as an aid to data users.

It would be good to be slightly clearer about the user need here; please discuss

Note we'll keep this issue on results from the validation library at the moment; there may be an issue about results from the data profiles later.

@ghost
Author

ghost commented Jun 23, 2020

Having asked for more discussion on user stories, I need to make a technical note for a minute before returning to user stories.

From openactive-archive/conformance-services#14 (comment)

one option is that a specific link to the validator which includes the item in the feed can be constructed as follows:
https://validator.openactive.io/?url={url}&rpdeId={rpdeId}

I see a problem here. I did some experiments, and the URL passed must be exactly the URL of the page that contains the data. So if a publisher has many records and id XYZ is on page 3 of the feed, passing id XYZ together with the base URL won't find it. For example, https://validator.openactive.io/?url=https%3A%2F%2Fbookwhen.com%2Fapi%2Fopenactive%2Fsessionseries&rpdeId=vd8vxf4ejdip fails, because vd8vxf4ejdip is several pages in. I can't see a good answer here that doesn't get complex or have edge cases.
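To make the link format concrete, here is a minimal Python sketch of how such a link could be built. The function name `build_validator_link` is hypothetical; only the `url` and `rpdeId` query parameters come from the link format quoted above. Called with the feed's base URL it reproduces the failing link from the experiment, which is the point: the caller would have to know the exact page that currently holds the item.

```python
from urllib.parse import urlencode

VALIDATOR_BASE = "https://validator.openactive.io/"


def build_validator_link(page_url, rpde_id):
    """Build a validator link for a single feed item.

    page_url has to be the exact feed page that currently contains the item,
    not the feed's base URL (per the experiment described above).
    """
    # urlencode percent-encodes the nested URL, including ':' and '/'.
    query = urlencode({"url": page_url, "rpdeId": rpde_id})
    return VALIDATOR_BASE + "?" + query


# Reproduces the link from the experiment (base URL, so the item is not found).
link = build_validator_link(
    "https://bookwhen.com/api/openactive/sessionseries", "vd8vxf4ejdip"
)
print(link)
```

Any page URL that carries its own query string would be encoded in the same way, so it survives as a single `url` parameter.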

Instead, maybe we store the results from the validation library ourselves and use those? This means there might be a lag between data being published and results being calculated, but that problem applies to the whole of this project and we will look at it.

In that case, there are several options:

  • Just pick 20 records and show the validation errors from those, on the assumption that people just want a quick summary?
  • Don't show anything on the web, but make an API endpoint available that directly lists all the results, on the assumption that people want to go over every single error.
  • Reorder by type of error seen: calculate and say "5 types of error were seen in validation results; here are the IDs that fail the 1st one (20%); here are the IDs that fail the 2nd one (50%)", on the assumption that the people looking will just want a list of the types of error they need to be looking out for.
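As a rough illustration of the third option, the sketch below turns stored results into a per-type summary with the share of items affected. Everything about the storage shape is an assumption: `results` is taken to be a mapping from item ID to the list of errors stored for that item, each error being a dict with a `type` key, and the error type names in the sample data are made up.

```python
from collections import defaultdict


def summarise_by_type(results):
    """Return one row per error type: the affected IDs and their share of all items.

    results: dict mapping item ID -> list of error dicts (assumed storage shape).
    """
    ids_by_type = defaultdict(set)
    for item_id, errors in results.items():
        for error in errors:
            ids_by_type[error["type"]].add(item_id)

    total = len(results)
    rows = [
        {
            "type": error_type,
            "ids": sorted(ids),
            "percent": round(100 * len(ids) / total, 1),
        }
        for error_type, ids in ids_by_type.items()
    ]
    # Most widespread error type first.
    return sorted(rows, key=lambda row: row["percent"], reverse=True)


# Hypothetical stored results for a five-item feed.
results = {
    "a": [{"type": "missing_required_field"}],
    "b": [],
    "c": [{"type": "missing_required_field"}, {"type": "invalid_format"}],
    "d": [],
    "e": [],
}
summary = summarise_by_type(results)
for row in summary:
    print(row["type"], row["percent"], row["ids"])
```

Percentages are counted per item rather than per error, so an item that repeats the same error several times is only counted once for that type.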

It really depends on the user story (I said I'd come back to that 😄) so please comment on that.

@nickevansuk

nickevansuk commented Jun 23, 2020

If helpful here's a very common scenario:

  • Often the feed will pass for say 80-90% of items, and some conditional logic in the publisher causes errors in the remaining 10-20%. Historically a major limitation with the validator has been that just validating the first 10 "updated" items of the first page of the feed does not give a good indication of whether the whole feed is conformant.
  • Ideally it will now be possible for both the publisher of the data, and any consumers using the data, to easily see where in the feed validation errors are occurring.
  • A summary that simply says "80% conformant" does not give developers any easy next step to solve the problem, so the requirement here is to make it easy for developers to jump to where in the feed the conformance issues have been identified.
  • Often a conversation arises between a data consumer and a publisher about an issue of non-conformance that is preventing them from using the data, and so allowing everyone to easily jump to the data in question is important.

The idea of linking to the validator was to make it really easy for a developer to jump into an environment where all validation errors are enumerated for a specific item (as often multiple validation errors tend to arise together), and where the feed JSON can be edited to test any potential fix... however that is certainly just one potential solution.

The limitation of linking directly to the feed is exactly as you've outlined - it's a real-time feed so any link could possibly get out-of-date if items move (the error for "item not found" in the validator reflects this). Although some feeds are updated very frequently (e.g. ScheduledSessions feeds), many feeds - which also generally contain more properties and hence more potential for error (e.g. FacilityUses and SessionSeries) - are updated much less frequently. Hence the link becoming out-of-date is not so much of an issue in many cases.

"Reorder by type of errors seen" would fit my understanding of what most developers are looking for here: "I've got some conditional logic in my feed that's triggering for certain items and causing issues, I want to know what main issues there are in my feed, and example items I can jump to in order to understand what's going on."

Note that if there's an error based on conditional logic it could affect a very large number of items in the feed, so a list of IDs could be very large. Perhaps listing a representative sample of IDs that contained a particular error across the feed would be most useful? Items near the beginning of the feed are less likely to be updated, so they provide a stable link; however, items near the end of the feed are likely to be using the latest data (in the case where e.g. a data migration has occurred and all items before a certain date used a different schema).
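One possible reading of "representative sample" is to take a few IDs from the start of the feed (stable links) and a few from the end (latest data). The sketch below assumes the affected IDs are already available in feed order; the function name and the `head`/`tail` parameters are hypothetical.

```python
def representative_sample(ids_in_feed_order, head=3, tail=3):
    """Pick IDs from both ends of the feed for one type of error.

    ids_in_feed_order: affected item IDs, ordered as they appear in the feed.
    """
    ids = list(ids_in_feed_order)
    if len(ids) <= head + tail:
        # Short list: nothing to trim.
        return ids
    # Slice from an explicit index so that tail=0 yields no trailing items.
    return ids[:head] + ids[len(ids) - tail:]


affected = ["id%d" % i for i in range(10)]
sample = representative_sample(affected, head=2, tail=2)
print(sample)
```

Other sampling strategies (for example, evenly spaced items) would fit the same signature if the two ends turn out not to be representative.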

@ghost
Author

ghost commented Jun 23, 2020

Thanks Nick;

One question; on openactive-archive/conformance-services#14 we discuss filtering to severity === "failure". Would the same apply here? Would we not even bother storing ones which aren't "failure"?

@ghost
Author

ghost commented Jun 23, 2020

"Reorder by type of errors seen"

What would the logic be that detects whether an error was one we had or had not seen before? If type, category and severity were the same? Or include message in that?

@nickevansuk

Yes "passing" above refers to no errors returned with severity === "failure".

The challenge is that all recommended property validation errors return a warning, which means that almost every feed will return a large volume of warning errors as there are very few feeds that implement all recommended properties.

So we'd greatly reduce storage requirements by only storing severity === "failure".
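A minimal sketch of that filter, assuming each validation result is available as a dict with a `severity` key (the helper names are hypothetical):

```python
def failures_only(errors):
    """Keep only the results worth storing: those with severity "failure"."""
    return [error for error in errors if error.get("severity") == "failure"]


def is_passing(errors):
    """An item passes when no result has severity "failure"."""
    return not failures_only(errors)


# Hypothetical results for one item: one warning, one failure.
item_errors = [
    {"severity": "warning", "type": "missing_recommended_field"},
    {"severity": "failure", "type": "missing_required_field"},
]
stored = failures_only(item_errors)
print(len(stored), is_passing(item_errors))
```

With this definition an item that only has warnings counts as passing and nothing is stored for it.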

@nickevansuk

Possible grouping options for this:

  • severity - always failure
  • type - generic type of error
  • rule - rule used to generate the error
  • message - specific variation of the error, which will be the same even if the same error appears in two places
  • path - where in the JSON the error occurs

Suggest grouping by both message and path might yield the most intuitive results, as that would show something like this (though only a few results as there are fewer failures than other types of error as above): https://openactive.io/openactive-test-suite/controlled/multiple-sellers_conflicting-seller_ScheduledSession.html#validations
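To compare the grouping options, here is a small sketch that counts errors under a configurable key. It assumes each stored error is a dict carrying the fields listed above; the sample messages and paths are made up for illustration.

```python
from collections import Counter


def group_errors(errors, keys=("message", "path")):
    """Count errors per group, where a group is the tuple of the chosen fields."""
    return Counter(tuple(error.get(key) for key in keys) for error in errors)


# Hypothetical stored failures across several items.
errors = [
    {"type": "missing_required_field", "message": "Required property is missing.", "path": "$.name"},
    {"type": "missing_required_field", "message": "Required property is missing.", "path": "$.name"},
    {"type": "missing_required_field", "message": "Required property is missing.", "path": "$.offers"},
]

by_message_and_path = group_errors(errors)
by_type = group_errors(errors, keys=("type",))
print(len(by_message_and_path), len(by_type))
```

Grouping by `type` alone collapses all three errors into one group, whereas `message` plus `path` separates the two places where the problem occurs, which is closer to what a developer would act on.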

@robredpath
Collaborator
