This issue relates to #476, so please read the description for that issue first. #476 should be completed before this one: it tells the user what problem they need to resolve, whereas this issue gives the user information that may help them resolve it.
So, with #476 complete, the Test Suite user now knows that a particular timeout has happened because either:
An Opportunity was not correctly updated in the Opportunity RPDE feed with an expected capacity change (e.g. capacity would be expected to increase after a successful cancellation).
An Order or OrderProposal was not correctly updated in either of their respective RPDE feeds after some action is invoked.
(these points correspond with the scenarios described in #476)
From working with many different integrations, I can confirm that a very common issue that comes up is a broken implementation of RPDE. Here are some very common examples of RPDE implementation issues which can cause this kind of timeout error in tests:
Opportunity/Order/OrderProposals are correctly inserted into the feed, but, when they are updated, they are not pushed to the end of the feed.
The paging itself is broken, such that Broker Microservice, by following the next URL of each RPDE page (as the RPDE spec says it should), will miss some items.
With scenario 2, the issue could very simply be that the capacity was not updated to the correct number.
In any of these cases, the user would be far better able to diagnose what happened if they could see the progress that Broker Microservice made through their RPDE feed.
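As an illustration of the first kind of issue, here is a minimal sketch of a check that could be run over a sequence of fetched pages. It relies on RPDE's modified-then-ID ordering and assumes that `modified` is numeric; ties on `modified` are not checked. `findOrderingViolations` is a hypothetical helper, not something that exists in Test Suite or Broker Microservice. It would only catch an item that was updated in place if its `modified` value was also bumped.

```javascript
// Hypothetical helper, for illustration only.
// `pages` is an array of RPDE pages in the order they were fetched.
// Reports every item whose `modified` is lower than that of the item before it,
// which suggests an updated item was not moved to the end of the feed.
// Assumes `modified` is numeric; ties on `modified` are not checked.
function findOrderingViolations(pages) {
  const violations = [];
  let previous = null;
  for (const page of pages) {
    for (const item of page.items) {
      const current = { id: item.id, modified: item.modified };
      if (previous !== null && current.modified < previous.modified) {
        violations.push({ previous, current });
      }
      previous = current;
    }
  }
  return violations;
}

// Example: item "c" appears after item "b" but has an older `modified`.
const examplePages = [
  { items: [{ id: 'a', modified: 1 }, { id: 'b', modified: 5 }] },
  { items: [{ id: 'c', modified: 3 }] },
];
const exampleViolations = findOrderingViolations(examplePages);
```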
High Level Solution Design
When an RPDE issue occurs (of the kind that #476 should identify to the user as an RPDE issue), the user should be shown information which tells them:
What RPDE pages Broker Microservice visited in the appropriate RPDE feed (e.g. the Opportunities RPDE feed when there is an issue with an Assert Opportunity Capacity stage) since it started listening for the RPDE feed update.
Some of the contents of those RPDE pages that it visited. Obviously more content is more useful, but there may be resource-based constraints here. Some key contents that should be included for each page: the next URL; the ID and modified of each item in the page; the fully expanded contents of any items that have the same ID as the one that is being listened for (this is essential for the Assert Opportunity Capacity scenario, and would clearly demonstrate that the Opportunity did update, but to the wrong capacity value).
From Broker Microservice's cache, its most up-to-date version of the Opportunity/Order/OrderProposal in question. If the item has never been seen by Broker, then this will be made clear.
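The per-page information described above could be sketched roughly as follows. This assumes the standard RPDE page shape (`next` plus an `items` array whose entries have `id` and `modified`). `summarisePage` and `describeCachedItem` are hypothetical names, and the `Map` used as a cache is a stand-in rather than Broker Microservice's actual cache.

```javascript
// Hypothetical helpers, for illustration only.

// Reduce a fetched RPDE page to its next URL and the id/modified of each item,
// keeping the fully expanded item only where it matches the ID being listened for.
function summarisePage(page, listenedForId) {
  return {
    next: page.next,
    items: page.items.map((item) => {
      const summary = { id: item.id, modified: item.modified };
      if (item.id === listenedForId) {
        summary.full = item;
      }
      return summary;
    }),
  };
}

// Report the latest cached version of an item, making it explicit
// when the item has never been seen. `cache` is assumed to be a Map keyed by ID.
function describeCachedItem(cache, id) {
  if (!cache.has(id)) {
    return { seen: false };
  }
  return { seen: true, item: cache.get(id) };
}
```

In the Assert Opportunity Capacity scenario, the `full` entry is what would show that the Opportunity did update, but to the wrong capacity value.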
Starter-for-10 Solution Proposal
Glossary:
Broker-RPDE-Listen-Op (BRLO): Any occasion on which Broker Microservice is instructed to listen for changes to an Order/OrderProposal/Opportunity in its respective RPDE feed. There are a few different API endpoints that Broker exposes for this. To find all of the different ways in which BRLOs happen, look at the Broker API endpoints that get called by each of the following FlowStages: packages/openactive-integration-tests/test/helpers/flow-stages/fetch-opportunities.js, packages/openactive-integration-tests/test/helpers/flow-stages/order-feed-update.js and packages/openactive-integration-tests/test/helpers/flow-stages/assert-opportunity-capacity.js.
The Proposal:
More information is always better, so what if Broker Microservice just stored a copy of every RPDE page that it fetched while performing a BRLO?
Specifically, Broker Microservice would simply save every fetched RPDE page to a new file. It could then associate each BRLO with both a first page (which could be either the 1st page fetched after a BRLO is initialized or the last page fetched before the BRLO is initialized) and a last page, which would be set once the BRLO has completed (i.e. the item has been found or a timeout has occurred).
As long as these files are stored sequentially, a first page and last page should be sufficient. A user can start at the first page and step through successive pages until the last page (for the BRLO in question) is reached.
The test output should include, for every stage which requires setting up a BRLO, a link to the first and last pages (in the local filesystem) fetched from the respective RPDE feed. So, this may look like: (first page) ../../openactive-broker-microservice/output/rpde-pages/orders/primary/18.json; (last page) ../../openactive-broker-microservice/output/rpde-pages/orders/primary/39.json. It would also need some way of showing the previous state of the item that was being searched for, in a way that doesn't clutter the results (e.g. maybe it's a hidden section that can be toggled to visible by clicking something).
A stretch goal or perhaps a goal for a subsequent issue might be to create a web page which simplifies the process, for a user, of looking through a given set of pages. This could be a route in Broker, which could be accessed like http://localhost:3000/rpde-feed-viewer?feedType=orders&auth=primary&firstPage=18&lastPage=39, which just renders a page at a time and provides "next" and "back" buttons for flicking through.
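To make the first page / last page bookkeeping concrete, here is a minimal in-memory sketch. It takes the "1st page fetched after a BRLO is initialized" option for the first page. The class name `RpdePageTrace`, its methods and the BRLO ID are all hypothetical, and a real implementation would also write each page to disk.

```javascript
// Hypothetical model of the proposal, for illustration only.
class RpdePageTrace {
  constructor(baseDir) {
    this.baseDir = baseDir; // directory holding 1.json, 2.json, ...
    this.pageCount = 0; // number of pages recorded so far
    this.openBrlos = new Map(); // BRLO ID -> first page number
  }

  // Call once per fetched page; returns that page's sequence number.
  recordPage() {
    this.pageCount += 1;
    return this.pageCount;
  }

  // The first page is the next page to be fetched after the BRLO starts.
  startBrlo(brloId) {
    this.openBrlos.set(brloId, this.pageCount + 1);
  }

  // Call when the item is found or a timeout occurs.
  // If no page was fetched during the BRLO, pagePaths is empty.
  completeBrlo(brloId) {
    const firstPage = this.openBrlos.get(brloId);
    if (firstPage === undefined) {
      throw new Error(`Unknown BRLO: ${brloId}`);
    }
    this.openBrlos.delete(brloId);
    const lastPage = this.pageCount;
    const pagePaths = [];
    for (let n = firstPage; n <= lastPage; n += 1) {
      pagePaths.push(`${this.baseDir}/${n}.json`);
    }
    return { firstPage, lastPage, pagePaths };
  }
}

// Example: 17 pages fetched before the BRLO starts, 22 fetched during it.
const trace = new RpdePageTrace('output/rpde-pages/orders/primary');
for (let i = 0; i < 17; i += 1) trace.recordPage();
trace.startBrlo('example-brlo');
for (let i = 0; i < 22; i += 1) trace.recordPage();
const result = trace.completeBrlo('example-brlo');
// result.firstPage is 18 and result.lastPage is 39
```

The `pagePaths` list is the set of files a user (or a feed viewer route) would step through for that BRLO.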
Things to look out for:
This proposal will create a LOT of files, as RPDE feeds can be large and Broker polls quite aggressively. Creating a huge number of files may take up too much space or hit the inode limit on a Linux system, and would in any case add a performance overhead to every RPDE page fetch.
The vast majority of these will just be repeated polls of the (current) last page of the feed, so an obvious optimisation would be only saving a new page if it differs from the last one.
To aid tracing, each cached RPDE page should contain any useful additional info, such as the timestamp when the page was fetched and the page's URL.
Implementing this issue may also help identify nuanced logic errors within the test suite itself (#545), though this is hard to confirm without implementing it.
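The "only save if different" optimisation and the extra tracing metadata could be combined along these lines. This is a sketch under assumptions: `makePageRecorder`, the `save` callback and the `fetchedAt`/`url`/`page` field names are all hypothetical, and pages are compared by their URL plus serialised JSON (comparing the raw response text would work equally well).

```javascript
// Hypothetical helper, for illustration only.
// `save` is whatever persists a record (e.g. writes it to a numbered file).
// `now` is injectable so that the timestamp can be controlled in tests.
function makePageRecorder(save, now = () => new Date().toISOString()) {
  let lastKey = null;
  // Returns true if the page was saved, false if it was skipped as a repeat.
  return function recordPage(url, page) {
    const key = `${url}\n${JSON.stringify(page)}`;
    if (key === lastKey) {
      return false; // same URL and contents as the previous fetch
    }
    lastKey = key;
    save({ fetchedAt: now(), url, page });
    return true;
  };
}

// Example: two identical polls of the last page, then a poll that finds a new item.
const savedRecords = [];
const recordPage = makePageRecorder((record) => savedRecords.push(record));
const lastPageUrl = 'https://example.com/feed?afterTimestamp=5&afterId=b';
recordPage(lastPageUrl, { next: lastPageUrl, items: [] });
recordPage(lastPageUrl, { next: lastPageUrl, items: [] });
recordPage(lastPageUrl, {
  next: 'https://example.com/feed?afterTimestamp=6&afterId=c',
  items: [{ id: 'c', modified: 6 }],
});
// Only the first and third fetches end up in savedRecords.
```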
lukehesluke changed the title from "[DRAFT] Add RPDE tracing to test suite to help debug most common error in RPDE implementation" to "Add RPDE tracing to help debug common errors in RPDE implementation" on Mar 22, 2024
This issue was spawned from #607