Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus] Make most transactions automatically retry #4487

Merged
merged 29 commits into from
Dec 6, 2023
Merged

[nexus] Make most transactions automatically retry #4487

merged 29 commits into from
Dec 6, 2023

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Nov 11, 2023

Integrates automatic transaction retry into Nexus for most transactions.

Additionally, this PR provides a "RetryHelper" object to help standardize how transaction retry is performed.
Currently, after a short randomized wait (up to an upper bound), we retry unconditionally, emitting each
attempt to Oximeter for further analysis.

Part of https://github.com/oxidecomputer/customer-support/issues/46
Part of #3814

@smklein smklein changed the title WIP: First attempt at automated txn retry [nexus] Make most transactions automatically retry Nov 16, 2023
Cargo.toml Outdated Show resolved Hide resolved
nexus/db-model/src/sled.rs Show resolved Hide resolved
nexus/db-queries/Cargo.toml Show resolved Hide resolved
nexus/db-queries/src/db/collection_attach.rs Show resolved Hide resolved
nexus/db-queries/src/db/datastore/mod.rs Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Outdated Show resolved Hide resolved
@smklein smklein marked this pull request as ready for review November 16, 2023 18:06
Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I focused on the Oximeter integration portions in transaction_retry.rs. I've a few suggestions and comments, but overall this looks pretty solid. Thanks for wrangling this massive diff!

nexus/db-queries/src/transaction_retry.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/transaction_retry.rs Show resolved Hide resolved
Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look with a focus on some of the parts I've been in more (rack & network_interface) and looks good overall. A couple of comments but not blockers


Ok((ips, dns_zones))
})
.transaction_async_with_retry(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the retry aside for a moment, the only caller for nexus_external_addresses is Nexus::silo_create which later calls Datastore::silo_create that does its own transaction. Feels like these queries should just be part of that transaction?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that these could become a single transaction, but I'm not 100% sure they should be -- there's logic to construct DNS updates via DnsVersionUpdateBuilder in between these two transactions, and by combining them, we're forcing that logic to happen in a transactional context.

nexus/db-queries/src/db/datastore/rack.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/db/datastore/address_lot.rs Outdated Show resolved Hide resolved
@smklein smklein merged commit b3d641a into main Dec 6, 2023
22 checks passed
@smklein smklein deleted the txn-retry branch December 6, 2023 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants