
inventory fetch should query database in batches #4629

Closed
davepacheco opened this issue Dec 6, 2023 · 2 comments · Fixed by #4932
davepacheco commented Dec 6, 2023

For context, see #4621. The list-uninitialized-sleds code was hardcoding a database limit of "50" to read the latest inventory collection, but that isn't large enough for even dogfood, let alone a full rack. It's easy to bump the limit (and we will for now -- see #4626), but the required limit could change in the future as we build out more inventory.

At the very least, we should write a test that will fail if the limit that we use isn't enough to read a full rack's standard inventory. But with #4632 we have a relatively easy way to query in batches and we should use that instead.

davepacheco commented

Background

The inventory process generates a collection that describes everything in the system. That starts with baseboards and includes cabooses and other data about fixed software components (RoT slots, SP slots, etc.).

When some code wants to use inventory data, it reads a complete collection. This involves reading a bunch of rows from a bunch of different inventory tables. In general, when querying database tables, we use a limit because lots of bad things can happen if the query returns a lot more rows than we expected.

Limits, pagination

What limit should we use? There are two options: use a limit large enough to read everything in one query (per table) or pick any old limit and implement client-side pagination (by "client-side" here, I mean "database client-side", which is Nexus).

Pagination sounds like the obvious answer. I have generally avoided doing client-side pagination within Nexus but not for great reasons. I don't think we have a facility for doing this generally. I'm not sure how easy it would be to build. And it's not great to copy the (non-trivial) boilerplate to every place that needs it. Here, my assumption was that if these collections actually get large, we'll probably want to re-think whether it even makes sense to read the entire collection in one go.

Plus, for the foreseeable future of just one rack, it's hard to imagine any of these tables containing so much data that we actually need to paginate it. Even if we implemented pagination with a typical limit of, say, 1,000 records, the code would likely only ever make one query. I figured a pragmatic approach would be to assume that's the case (skip pagination, but check whether we hit the limit) and fail gracefully if that assumption turns out to be wrong. That's how we got here.
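
To make that concrete, here's a minimal sketch of the "pick a limit and fail gracefully" pattern. `fetch_rows`, `Row`, and `ReadError` are hypothetical stand-ins, not omicron code; the interesting part is the check after the fetch:

```rust
// A unit struct standing in for a row from some inventory table.
struct Row;

#[derive(Debug)]
enum ReadError {
    // We got back as many rows as the limit allows, so the result may
    // be truncated. Fail loudly rather than return partial data.
    LimitReached { limit: usize },
}

fn read_table_with_limit(
    fetch_rows: impl Fn(usize) -> Vec<Row>,
    limit: usize,
) -> Result<Vec<Row>, ReadError> {
    // `fetch_rows` models a query with `LIMIT limit` applied.
    let rows = fetch_rows(limit);
    // Exactly `limit` rows is ambiguous: there may be more we missed.
    if rows.len() >= limit {
        return Err(ReadError::LimitReached { limit });
    }
    Ok(rows)
}
```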

The crux of the problem right now is that this limit is currently chosen by app-level code paths, and if the limit is too small, we won't find out until runtime.

Solution: commonize and test the limit

A good bang-for-the-buck improvement would be:

  • Move the limit selection to a common code path. (Example: make the limit argument to the inventory_read_collection family of functions optional, and have the functions themselves choose a limit we think is big enough for a full rack; see the sketch after this list.)
  • Add a test that creates a full-rack-sized inventory and reads it with the standard function and default limit.
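
A sketch of what that common code path might look like. The signature and the default value are assumptions for illustration, not the actual omicron API:

```rust
use std::num::NonZeroU32;

// A default chosen (and verified by a test) to be large enough for a
// full rack's standard inventory. The specific number is a placeholder.
const DEFAULT_INVENTORY_LIMIT: u32 = 10_000;

fn inventory_read_collection(limit: Option<NonZeroU32>) {
    // Callers may override the limit, but by default the function picks
    // one we believe is big enough for a full rack -- and a test
    // enforces that belief.
    let limit: u32 = limit.map_or(DEFAULT_INVENTORY_LIMIT, NonZeroU32::get);
    // ... issue the per-table reads, each bounded by `limit` ...
    let _ = limit;
}
```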

This would be a big improvement because it would no longer be possible for some app-specific code path to fail unexpectedly at runtime due to a limit that's too small. We'd essentially be testing that the limit we use in all code paths is big enough.

To be clear, it's still not great because this code doesn't really know how big the limit should be either. But if it's wrong, we should find out in the test suite.

Solution: implement client-side pagination

I concluded earlier that this isn't worth it because the data's just not that large and I think it's going to be tricky to make a general-purpose pagination facility. But this sort of thing has come up in enough places that it's probably worth writing that general-purpose facility. Plus, even if it doesn't solve all the scaling issues as collections get large, it would avoid us having to worry about sizing this limit.
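
For illustration, such a facility might look roughly like this marker-based batching loop: fetch fixed-size batches keyed by the last primary key seen, until a short batch signals the end. This is a sketch of the idea, not the actual utility from #4632:

```rust
fn read_all_batched<T, K>(
    mut fetch_batch: impl FnMut(Option<&K>, usize) -> Vec<T>,
    marker_of: impl Fn(&T) -> K,
    batch_size: usize,
) -> Vec<T> {
    let mut results = Vec::new();
    let mut marker: Option<K> = None;
    loop {
        // Fetch up to `batch_size` rows strictly after `marker`.
        let batch = fetch_batch(marker.as_ref(), batch_size);
        // A batch smaller than `batch_size` means we've read everything.
        let done = batch.len() < batch_size;
        if let Some(last) = batch.last() {
            marker = Some(marker_of(last));
        }
        results.extend(batch);
        if done {
            break;
        }
    }
    results
}
```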

Solution: drop the limit

In principle we could just stop using a limit. I don't like this. It implicitly assumes that the results won't be that large, and if they are, it could fail very badly. If we're going to bet that the results won't be that large, let's put our money where our mouth is: pick a specific number and use that as the limit.

davepacheco commented

I took a swing at pagination in #4632. Assuming that lands, we could use that when fetching inventory.

@davepacheco davepacheco changed the title limits used for inventory collection could use work inventory fetch should query database in batches Jan 4, 2024
@hawkw hawkw self-assigned this Jan 30, 2024
hawkw added a commit that referenced this issue Jan 31, 2024
When querying the database for inventory data, we read a large number of
rows from a lot of different tables. Therefore, we want to ensure that
there is always an upper bound on the amount of data read per query.
Currently, that upper bound is enforced using _limits_: we add a limit to
the query, and if more rows would be returned, the query fails. This does
guarantee an upper bound, but the failure mode is the problem: if the
selected limit turns out to be too small, we simply *fail* upon hitting
it, and there's no way to know whether a limit is big enough until that
happens at runtime.
Instead of using hard limits, it's preferable to use _pagination_. By
paginating these queries, we still set an upper bound on the number of
rows returned by a single query, but rather than failing if that bound
is reached, we instead perform multiple queries.

This branch changes inventory fetch queries to use pagination with a
batch size limit, rather than a hard limit on query size, using the
pagination utilities added in #4632. I've replaced the
`inventory_collection_read_best_effort` and
`inventory_collection_read_all_or_nothing` methods, which limit the size
of the query, with a new `inventory_collection_read_batched` method,
which uses the limit parameter as a maximum size per batch. I've then
changed the existing `inventory_get_latest_collection` method to use
`inventory_collection_read_batched` rather than
`inventory_collection_read_all_or_nothing`.

Closes #4629
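
To illustrate the difference in semantics the commit describes, here's a hedged, self-contained sketch: with a hard limit, exceeding the bound is an error, while with batching the same number just bounds each round trip. The names are illustrative, not the actual datastore methods:

```rust
fn read_all_or_nothing(rows: &[u64], limit: usize) -> Result<Vec<u64>, String> {
    // Hard limit: refuse to return anything once the data outgrows it.
    if rows.len() > limit {
        return Err(format!("collection exceeds limit of {limit}"));
    }
    Ok(rows.to_vec())
}

fn read_batched(rows: &[u64], batch_size: usize) -> Vec<u64> {
    let mut out = Vec::new();
    // Each chunk models one database query returning at most
    // `batch_size` rows.
    for batch in rows.chunks(batch_size) {
        out.extend_from_slice(batch);
    }
    out
}

fn main() {
    let rows: Vec<u64> = (0..250).collect();
    // The hard limit fails outright once the data outgrows it...
    assert!(read_all_or_nothing(&rows, 100).is_err());
    // ...while batching reads the same data in three queries.
    assert_eq!(read_batched(&rows, 100).len(), 250);
}
```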