
inventory fetch should query database in batches #4629

Closed
davepacheco opened this issue Dec 6, 2023 · 2 comments · Fixed by #4932
davepacheco commented Dec 6, 2023

For context, see #4621. The list-uninitialized-sleds code was hardcoding a database limit of "50" to read the latest inventory collection, but that isn't large enough for even dogfood, let alone a full rack. It's easy to bump the limit (and we will for now -- see #4626), but the required limit could change in the future as we build out more inventory.

At the very least, we should write a test that will fail if the limit that we use isn't enough to read a full rack's standard inventory. But with #4632 we have a relatively easy way to query in batches and we should use that instead.

davepacheco commented

Background

The inventory process generates a collection that describes everything in the system. That starts with baseboards and includes cabooses and other data about fixed software components (RoT slots, SP slots, etc.).

When some code wants to use inventory data, it reads a complete collection. This involves reading a bunch of rows from a bunch of different inventory tables. In general, when querying database tables, we use a limit because lots of bad things can happen if the query returns a lot more rows than we expected.

Limits, pagination

What limit should we use? There are two options: use a limit large enough to read everything in one query (per table) or pick any old limit and implement client-side pagination (by "client-side" here, I mean "database client-side", which is Nexus).

Pagination sounds like the obvious answer. I have generally avoided doing client-side pagination within Nexus but not for great reasons. I don't think we have a facility for doing this generally. I'm not sure how easy it would be to build. And it's not great to copy the (non-trivial) boilerplate to every place that needs it. Here, my assumption was that if these collections actually get large, we'll probably want to re-think whether it even makes sense to read the entire collection in one go.

Plus, for the foreseeable future of just one rack, it's hard to imagine any of these tables containing so much data that we actually need to paginate it. Even if we implemented pagination with a typical limit of, say, 1,000 records, the code would likely only ever make one query. I figured a pragmatic approach would be to assume that's the case (skip pagination, but check whether we hit the limit) and fail gracefully if that assumption turns out to be wrong. That's how we got here.
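
To make that concrete, here's a minimal sketch of the "pick a limit and fail gracefully" pattern. `fetch_rows`, `Row`, and `ReadError` are hypothetical stand-ins, not omicron code; the interesting part is the check after the fetch:

```rust
// A unit struct standing in for a row from some inventory table.
struct Row;

#[derive(Debug)]
enum ReadError {
    // We got back as many rows as the limit allows, so the result may
    // be truncated. Fail loudly rather than return partial data.
    LimitReached { limit: usize },
}

fn read_table_with_limit(
    fetch_rows: impl Fn(usize) -> Vec<Row>,
    limit: usize,
) -> Result<Vec<Row>, ReadError> {
    // `fetch_rows` models a query with `LIMIT limit` applied.
    let rows = fetch_rows(limit);
    // Exactly `limit` rows is ambiguous: there may be more we missed.
    if rows.len() >= limit {
        return Err(ReadError::LimitReached { limit });
    }
    Ok(rows)
}
```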

The crux of the problem right now is that this limit is currently chosen by app-level code paths, and if the limit is too small, we won't find out until runtime.

Solution: commonize and test the limit

A good bang-for-the-buck improvement would be:

  • Move the limit selection to a common code path. (Example: make the limit argument to the inventory_read_collection family of functions optional, and have the functions themselves choose a limit we think is big enough for a full rack; see the sketch after this list.)
  • Add a test that creates a full-rack-sized inventory and reads it with the standard function and default limit.
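
A sketch of what that common code path might look like. The signature and the default value are assumptions for illustration, not the actual omicron API:

```rust
use std::num::NonZeroU32;

// A default chosen (and verified by a test) to be large enough for a
// full rack's standard inventory. The specific number is a placeholder.
const DEFAULT_INVENTORY_LIMIT: u32 = 10_000;

fn inventory_read_collection(limit: Option<NonZeroU32>) {
    // Callers may override the limit, but by default the function picks
    // one we believe is big enough for a full rack -- and a test
    // enforces that belief.
    let limit: u32 = limit.map_or(DEFAULT_INVENTORY_LIMIT, NonZeroU32::get);
    // ... issue the per-table reads, each bounded by `limit` ...
    let _ = limit;
}
```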

This would be a big improvement because it would no longer be possible for some app-specific code path to fail unexpectedly at runtime due to a limit that's too small. We'd essentially be testing that the limit we use in all code paths is big enough.

To be clear, it's still not great because this code doesn't really know how big the limit should be either. But if it's wrong, we should find out in the test suite.

Solution: implement client-side pagination

I concluded earlier that this isn't worth it because the data's just not that large and I think it's going to be tricky to make a general-purpose pagination facility. But this sort of thing has come up in enough places that it's probably worth writing that general-purpose facility. Plus, even if it doesn't solve all the scaling issues as collections get large, it would avoid us having to worry about sizing this limit.
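
For illustration, such a facility might look roughly like this marker-based batching loop: fetch fixed-size batches keyed by the last primary key seen, until a short batch signals the end. This is a sketch of the idea, not the actual utility from #4632:

```rust
fn read_all_batched<T, K>(
    mut fetch_batch: impl FnMut(Option<&K>, usize) -> Vec<T>,
    marker_of: impl Fn(&T) -> K,
    batch_size: usize,
) -> Vec<T> {
    let mut results = Vec::new();
    let mut marker: Option<K> = None;
    loop {
        // Fetch up to `batch_size` rows strictly after `marker`.
        let batch = fetch_batch(marker.as_ref(), batch_size);
        // A batch smaller than `batch_size` means we've read everything.
        let done = batch.len() < batch_size;
        if let Some(last) = batch.last() {
            marker = Some(marker_of(last));
        }
        results.extend(batch);
        if done {
            break;
        }
    }
    results
}
```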

Solution: drop the limit

In principle we could just stop using a limit. I don't like this. It implicitly assumes that the results won't be that large, and if they are, it could fail very badly. If we're going to bet that the results won't be that large, let's put our money where our mouth is: pick a specific number and use that as the limit.

davepacheco commented

I took a swing at pagination in #4632. Assuming that lands, we could use that when fetching inventory.

@davepacheco davepacheco changed the title limits used for inventory collection could use work inventory fetch should query database in batches Jan 4, 2024
@hawkw hawkw self-assigned this Jan 30, 2024
hawkw added a commit that referenced this issue Jan 31, 2024
When querying the database for inventory data, we read a large number of
rows from a lot of different tables. Therefore, we want to ensure that
there is always an upper bound on the amount of data read per query.
Currently, that upper bound is enforced using _limits_: we add a limit to
the query, and if more rows would be returned, the query fails. This does
guarantee an upper bound, but the failure mode is the problem: if the
selected limit turns out to be too small, we simply *fail* upon hitting
it, and there's no way to know whether a limit is big enough until that
happens at runtime.
Instead of using hard limits, it's preferable to use _pagination_. By
paginating these queries, we still set an upper bound on the number of
rows returned by a single query, but rather than failing if that bound
is reached, we instead perform multiple queries.

This branch changes inventory fetch queries to use pagination with a
batch size limit, rather than a hard limit on query size, using the
pagination utilities added in #4632. I've replaced the
`inventory_collection_read_best_effort` and
`inventory_collection_read_all_or_nothing` methods, which limit the size
of the query, with a new `inventory_collection_read_batched` method,
which uses the limit parameter as a maximum size per batch. I've then
changed the existing `inventory_get_latest_collection` method to use
`inventory_collection_read_batched` rather than
`inventory_collection_read_all_or_nothing`.

Closes #4629
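
To illustrate the difference in semantics the commit describes, here's a hedged, self-contained sketch: with a hard limit, exceeding the bound is an error, while with batching the same number just bounds each round trip. The names are illustrative, not the actual datastore methods:

```rust
fn read_all_or_nothing(rows: &[u64], limit: usize) -> Result<Vec<u64>, String> {
    // Hard limit: refuse to return anything once the data outgrows it.
    if rows.len() > limit {
        return Err(format!("collection exceeds limit of {limit}"));
    }
    Ok(rows.to_vec())
}

fn read_batched(rows: &[u64], batch_size: usize) -> Vec<u64> {
    let mut out = Vec::new();
    // Each chunk models one database query returning at most
    // `batch_size` rows.
    for batch in rows.chunks(batch_size) {
        out.extend_from_slice(batch);
    }
    out
}

fn main() {
    let rows: Vec<u64> = (0..250).collect();
    // The hard limit fails outright once the data outgrows it...
    assert!(read_all_or_nothing(&rows, 100).is_err());
    // ...while batching reads the same data in three queries.
    assert_eq!(read_batched(&rows, 100).len(), 250);
}
```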