Background
On November 12th the ESA OpenSearch Catalog API introduced a change to how the search pagination works that impacts our "link fetcher" scheduled search:
https://documentation.dataspace.copernicus.eu/APIs/Others/UpcomingChanges.html#catalogue-api-change-parameters-limits
Specifically, for the OpenSearch endpoint we use:
OpenSearch interface: maximum value for ‘(page - 1) * maxRecords + index - 1’ will be set to 10 000, where by default maxRecords = 20, page = 1 and index = 1; maximum value for ‘index’ will be set to 10001
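A quick worked example of what that cap means (just the arithmetic, not from the ESA docs): with page = 1, the rule reduces to index - 1 <= 10 000, so a request's start index can never exceed 10,001, and paging deeper only shrinks that ceiling:

```python
# ESA's new rule: (page - 1) * maxRecords + index - 1 <= 10_000,
# and index itself is capped at 10_001.
def max_start_index(page: int = 1, max_records: int = 20) -> int:
    """Largest `index` a request may use under the new pagination limits."""
    return min(10_000 - (page - 1) * max_records + 1, 10_001)

print(max_start_index())                           # defaults -> 10001
print(max_start_index(page=1, max_records=100))    # our 100-record pages -> still 10001
print(max_start_index(page=2, max_records=2_000))  # deeper pages shrink the cap -> 8001
```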
For example, this search query reproduces the issue,
Impact
The impact on our system is that our "link fetcher" application encounters an error for the last few batches of link fetching. For example, recent executions have found ~10,500 total results. Our link fetcher pulls a maximum of 100 results at a time, so we end up requesting index=10001, resulting in a 400 "bad request" error: Input should be less than or equal to 10001.
The impact of this is that we do not fetch the links for the last few hundred result items, potentially causing missing granules.
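For context, here is a minimal sketch of the style of offset-based paging described above. The maxRecords and index parameters are the ones named in the ESA docs; the endpoint URL and function structure are assumptions for illustration, not our actual link-fetcher code:

```python
import requests

# Assumed Copernicus Data Space OpenSearch endpoint; our real configuration may differ.
SEARCH_URL = "https://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json"
PAGE_SIZE = 100  # we currently pull at most 100 results per request


def fetch_all_features(query: dict) -> list[dict]:
    """Page through search results 100 at a time by advancing `index`."""
    index = 1
    features: list[dict] = []
    while True:
        resp = requests.get(
            SEARCH_URL,
            params={**query, "maxRecords": PAGE_SIZE, "index": index},
            timeout=60,
        )
        # With ~10,500 matching results, the requests for the tail of the
        # result set now exceed the pagination cap and come back as 400s,
        # so the last few hundred links are never fetched.
        resp.raise_for_status()
        page = resp.json().get("features", [])
        if not page:
            return features
        features.extend(page)
        index += PAGE_SIZE
```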
Resolution
There are at least a few approaches we could take to mitigate this issue. In the longer term we should be able to avoid it entirely by switching to the "granule created" Subscriptions API (see PR for implementation).
In the shorter term, there are at least two styles of approach we could take:
Option 1: Refine our search to prevent >10,000 search results
A relatively straightforward way to do this would be to include platform=[S2A | S2B | S2C] in our query.
Pro:
Splitting the query by platform is relatively trivial and would "just work" once Sentinel-2C begins regular processing operations.
Con:
This solution requires relatively more work to orchestrate the additional link-fetching queries.
Option 2: When necessary, grow our maxRecords=[int] to encompass all remaining search results
e.g., if we have 10,500 total results, our link fetcher would expand maxRecords once we hit the 10,000 cap to cover (totalResults - currentIndex), so that the final search request we perform reads all remaining records (see the sketch just below this list).
The limit of the maxRecords parameter appears to be 2,000, based on the 400 "BadRequest" we get when trying to grow this parameter (Input should be less than or equal to 2000.)
Pro:
This would require the least amount of change to our current setup.
Con:
This is pretty fragile, as it relies on the assumption that the total number of search results will be less than 12,000. For example, this would probably break if Sentinel-2A, -2B, and -2C are all producing granules at the same time.
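For illustration, a minimal sketch of what option 2's "grow maxRecords at the end" logic could look like. The constants and function are hypothetical, and this is a sketch of the idea above rather than the change we actually made:

```python
PAGINATION_CAP = 10_000       # (page - 1) * maxRecords + index - 1 <= 10_000
MAX_RECORDS_CEILING = 2_000   # 400 "Input should be less than or equal to 2000."
PAGE_SIZE = 100


def next_request(index: int, total_results: int) -> tuple[int, int] | None:
    """Return (index, maxRecords) for the next search request, or None when finished."""
    if index > total_results:
        return None
    remaining = total_results - index + 1
    if index + PAGE_SIZE > PAGINATION_CAP:
        # We're about to run past the cap, so make this the final, oversized
        # request. Note the fragility called out above: if more than 2,000
        # records remain at this point, they won't fit in a single request.
        return index, min(remaining, MAX_RECORDS_CEILING)
    return index, min(PAGE_SIZE, remaining)
```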
@sharkinsspatial and @chuckwondo might have other suggestions for ways to fix this!
Acceptance Criteria

I suggest option 2, but even simpler: just bump maxRecords from 100 to 2000 and be done with it -- no need to add logic to "grow" the value at the end.
Given that this adjustment is a stop-gap measure until we flip the switch to the new subscription-based solution, this should hopefully be the only thing we need to do until then.
If for some reason we bump up against the 12K limit before we make the switch, we can revisit this at that time. For the moment, I don't think the extra effort for a more complicated solution is necessary.
Thanks @chuckwondo! I wasn't sure if there was a reason to keep the current request limit so low (100), so I was inclined to keep it, but I don't see a technical reason why we couldn't use the maximum allowed limit (2000). The query takes a bit longer (~10 sec vs ~2 sec), but I don't think we'd ever hit our Lambda function timeout, because ~10 seconds is well within the 60 second "bail early" threshold. With the higher limit we'd also send fewer requests to ESA (rough numbers in the sketch below), which should be less stressful for their system: limit/offset pagination requires the database to read and discard results, so that operation would happen fewer times.
I'll have a PR up shortly; I might need to update the integration tests, but I've updated all the unit tests already.
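Rough request-count math behind the "fewer requests" point (back-of-the-envelope, assuming ~10,500 results per execution):

```python
import math

total_results = 10_500           # roughly what recent executions have seen
for page_size in (100, 2_000):   # current limit vs. the API's maximum
    print(page_size, math.ceil(total_results / page_size))
# 100  -> 105 requests per execution
# 2000 -> 6 requests per execution
```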