
Perform inventory fetch queries in batches #4932

Merged (10 commits) Jan 31, 2024
Conversation

@hawkw (Member) commented Jan 30, 2024

When querying the database for inventory data, we read a large number of
rows from many different tables, so we want to ensure that there is
always an upper bound on the amount of data read per query. Currently,
that upper bound is enforced with hard limits: we add a limit to the
query, and if more rows would be returned, the query fails. This does
guarantee an upper bound, but the failure mode is poor: it's possible to
select a limit that's too small, and there's no way to know whether the
selected limit is big enough until we hit it at runtime. Instead of
using hard limits, it's preferable to use pagination. By paginating
these queries, we still set an upper bound on the number of rows
returned by a single query, but rather than failing if that bound is
reached, we perform multiple queries.
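To make the distinction concrete, here is a minimal, self-contained sketch contrasting the two behaviors over a sorted in-memory "table". This is not the actual omicron code; `fetch_page`, `read_all_or_nothing`, and `read_batched` are hypothetical stand-ins for a keyset-paginated SELECT and the two strategies built on it.

```rust
/// Simulates one bounded "query": returns up to `batch_size` rows whose
/// id is greater than `marker`, in ascending order (keyset pagination).
fn fetch_page(table: &[u32], marker: Option<u32>, batch_size: usize) -> Vec<u32> {
    table
        .iter()
        .copied()
        .filter(|&id| marker.map_or(true, |m| id > m))
        .take(batch_size)
        .collect()
}

/// Hard-limit strategy: fail outright if the table exceeds the limit.
fn read_all_or_nothing(table: &[u32], limit: usize) -> Result<Vec<u32>, &'static str> {
    if table.len() > limit {
        return Err("limit exceeded");
    }
    Ok(table.to_vec())
}

/// Paginated strategy: issue as many bounded queries as needed,
/// resuming each one from the last id seen in the previous page.
fn read_batched(table: &[u32], batch_size: usize) -> Vec<u32> {
    let mut all = Vec::new();
    let mut marker = None;
    loop {
        let page = fetch_page(table, marker, batch_size);
        // A short page means we've drained the table.
        let done = page.len() < batch_size;
        marker = page.last().copied();
        all.extend(page);
        if done {
            return all;
        }
    }
}
```

With a batch size of 3 over five rows, `read_all_or_nothing` fails, while `read_batched` issues two bounded queries and returns all five rows.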

This branch changes inventory fetch queries to use pagination with a
batch size limit, rather than a hard limit on query size, using the
pagination utilities added in #4632. I've replaced the
inventory_collection_read_best_effort and
inventory_collection_read_all_or_nothing methods, which limit the size
of the query, with a new inventory_collection_read_batched method,
which uses the limit parameter as a maximum size per batch. I've then
changed the existing inventory_get_latest_collection method to use
inventory_collection_read_batched rather than
inventory_collection_read_all_or_nothing.

Closes #4629

@hawkw hawkw requested a review from davepacheco January 30, 2024 19:52
nexus/db-queries/src/db/datastore/inventory.rs: several review threads (outdated, resolved)
@@ -1939,18 +2047,6 @@ mod test {
assert_eq!(collection2.cabooses.len(), coll_counts.cabooses);
assert_eq!(collection2.rot_pages.len(), coll_counts.rot_pages);

// Check that we get an error on the limit being reached for
Collaborator commented:
How about adding a test that uses a batch size of 1 and makes sure we get the same result for some reasonably representative collection? (That way we know the pagination stuff is working.)

@hawkw (Member, Author) replied:
Hmm, if we follow your advice from above about not making the batch size
configurable, we'd have to factor out a private function that takes a batch size
parameter...do you think it's worth doing that just for testing this?

Collaborator replied:
Yeah, I think that's easy enough to do and worthwhile. I've been burned by a surprising number of pagination bugs in the past!
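The suggested test can be sketched generically: drive a paginated read with a batch size of 1, which forces the machinery across every page boundary, and assert that the result matches a read whose batch size covers the whole collection. Everything below is a hypothetical, self-contained stand-in, not the omicron datastore API:

```rust
/// Hypothetical keyset-paginated fetch: up to `batch_size` rows with
/// id greater than `marker`, in ascending order.
fn fetch_page(table: &[u32], marker: Option<u32>, batch_size: usize) -> Vec<u32> {
    table
        .iter()
        .copied()
        .filter(|&id| marker.map_or(true, |m| id > m))
        .take(batch_size)
        .collect()
}

/// Reads the whole table via repeated bounded fetches.
fn read_batched(table: &[u32], batch_size: usize) -> Vec<u32> {
    let mut all = Vec::new();
    let mut marker = None;
    loop {
        let page = fetch_page(table, marker, batch_size);
        let done = page.len() < batch_size;
        marker = page.last().copied();
        all.extend(page);
        if done {
            return all;
        }
    }
}

/// The check suggested in review: a batch size of 1 must yield the same
/// result as a batch size larger than the entire collection.
fn pagination_agrees(table: &[u32]) -> bool {
    read_batched(table, 1) == read_batched(table, table.len() + 1)
}
```

This is the kind of bug such a test catches: off-by-one marker handling, a dropped final short page, or a page boundary that repeats or skips a row.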

dev-tools/omdb/src/bin/omdb/db.rs: review thread (outdated, resolved)
@davepacheco (Collaborator) left a review comment:

The changes look good! Were you also going to add the new test?

nexus/src/app/deployment.rs: review thread (resolved)
@hawkw (Member, Author) commented Jan 31, 2024

> The changes look good! Were you also going to add the new test?

Yup, I just didn't get to it yesterday. Planning to add that today!

@davepacheco (Collaborator) commented:

Oops -- didn't mean to jump the gun!

hawkw added 2 commits January 31, 2024 11:23
It just set the default batch size, which we now do with the public API.
@hawkw hawkw force-pushed the eliza/paginate-inventory branch from 02f39d2 to abac2a0 Compare January 31, 2024 19:25
@hawkw (Member, Author) commented Jan 31, 2024

@davepacheco added a test as of abac2a0 --- let me know what you think?

@hawkw hawkw enabled auto-merge (squash) January 31, 2024 20:15
@hawkw hawkw merged commit 98180bf into main Jan 31, 2024
20 checks passed
@hawkw hawkw deleted the eliza/paginate-inventory branch January 31, 2024 23:31
Linked issue: inventory fetch should query database in batches