-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
inventory fetch should query database in batches #4629
Comments
BackgroundThe inventory process generates a collection that describes everything in the system. That starts with baseboards and includes cabooses and other data about fixed software components (RoT slots, SP slots, etc.). When some code wants to use inventory data, it reads a complete collection. This involves reading a bunch of rows from a bunch of different inventory tables. In general, when querying database tables, we use a limit because lots of bad things can happen if the query returns a lot more rows than we expected. Limits, paginationWhat limit should we use? There are two options: use a limit large enough to read everything in one query (per table) or pick any old limit and implement client-side pagination (by "client-side" here, I mean "database client-side", which is Nexus). Pagination sounds like the obvious answer. I have generally avoided doing client-side pagination within Nexus but not for great reasons. I don't think we have a facility for doing this generally. I'm not sure how easy it would be to build. And it's not great to copy the (non-trivial) boilerplate to every place that needs it. Here, my assumption was that if these collections actually get large, we'll probably want to re-think whether it even makes sense to read the entire collection in one go. Plus, for the foreseeable future of just one rack, it's hard to imagine any of these tables containing so much data that we actually need to paginate it. Even if we implemented pagination, if we chose a typical limit of say 1000 records, it seems likely that the code would only ever make one query. I figured a pragmatic approach would be: assume that will be the case (by not implementing pagination but checking if we hit the limit) and fail gracefully if that turns out to be wrong. That's how we got here. The crux of the problem right now is: this limit is currently chosen by app-level code paths, and that if the limit is too small, we won't find out until runtime. Solution: commonize and test the limitA good bang-for-the-buck improvement would be:
This would be a big improvement because now it won't be the case that some app-specific code path could fail unexpectedly at runtime due to a limit that's too small. We'll essentially be testing that the limit we use in all code paths is big enough. To be clear, it's still not great because this code doesn't really know how big the limit should be either. But if it's wrong, we should find out in the test suite. Solution: implement client-side paginationI concluded earlier that this not worth it because the data's just not that large and I think it's going to be tricky to make a general-purpose pagination facility. But this sort of thing has come up in enough places that it's probably worth writing that general-purpose facility. Plus, even if it doesn't solve all the scaling issues as collections get large, it would avoid us having to worry about sizing this limit. Solution: drop the limitIn principle we could just stop using a limit. I don't like this. It implicitly assumes that the results won't be that large. If they are, it could faily very badly. If we're going to bet the limit isn't that large, let's put our money where our mouth is, pick a specific number, and use that as the limit. |
I took a swing at pagination in #4632. Assuming that lands, we could use that when fetching inventory. |
When querying the database for inventory data, we read a large number of rows from a lot of different tables. Therefore, we want to ensure that there is always an upper bound on the amount of data read per query. Currently, an upper bound is ensured using _limits_: we add a limit to the query, and if more rows would be returned, we fail the query. This means that we can ensure an upper bound. However, the issue with this is that the behavior when the upper bound is reached is to *fail* --- this means that it's possible to select an upper bound that's too small, and upon hitting that upper bound, we just fail...and there's no way to know whether the selected limit is big enough until we hit it at runtime. Instead of using hard limits, it's preferable to use _pagination_. By paginating these queries, we still set an upper bound on the number of rows returned by a single query, but rather than failing if that bound is reached, we instead perform multiple queries. This branch changes inventory fetch queries to use pagination with a batch size limit, rather than a hard limit on query size, using the pagination utilities added in #4632. I've replaced the `inventory_collection_read_best_effort` and `inventory_collection_read_all_or_nothing` methods, which limit the size of the query, with a new `inventory_collection_read_batched` method, which uses the limit parameter as a maximum size per batch. I've then changed the existing `inventory_get_latest_collection` method to use `inventory_collection_read_batched` rather than `inventory_collection_read_all_or_nothing`. Closes #4629
For context, see #4621. The list-uninitialized-sleds code was hardcoding a database limit of "50" to read the latest inventory collection, but that isn't large enough for even dogfood, let alone a full rack. It's easy to bump the limit (and we will for now -- see #4626), but the required limit could change in the future as we build out more inventory.
At the very least, we should write a test that will fail if the limit that we use isn't enough to read a full rack's standard inventory. But with #4632 we have a relatively easy way to query in batches and we should use that instead.
The text was updated successfully, but these errors were encountered: