-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Temporarily disable region snapshot replacement #6728
Temporarily disable region snapshot replacement #6728
Conversation
Until read-only region reference counting is implemented, region snapshot replacement should be disabled - it is currently the only thing that creates read-only regions.
RegionSnapshotReplacementGarbageCollect { | ||
datastore, | ||
sagas, | ||
disabled: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be
disabled: false, | |
disabled: true, |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, the answer's yes: 93fe946
nexus/src/app/background/init.rs
Outdated
@@ -778,7 +778,8 @@ impl BackgroundTasksInitializer { | |||
"detect if region snapshots need replacement and begin the \ | |||
process", | |||
period: config.region_snapshot_replacement_start.period_secs, | |||
task_impl: Box::new(RegionSnapshotReplacementDetector::new( | |||
// XXX temporarily disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an issue this could reference on top of XXX
? I don't feel strongly about this if it's going to be reenabled "soon" though, up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crucible volumes are created by layering read-write regions over a hierarchy of read-only resources. Originally only a region snapshot could be used as a read-only resource for a volume. With the introduction of read-only regions (created during the region snapshot replacement process) this is no longer true! Read-only resources can be used by many volumes, and because of this they need to have a reference count so they can be deleted when they're not referenced anymore. The region_snapshot table uses a `volume_references` column, which counts how many uses there are. The region table does not have this column, and more over a simple integer works for reference counting but does not tell you _what_ volume that use is from. This can be determined (see omdb's validate volume references command) but it's information that is tossed out, as Nexus knows what volumes use what resources! Instead, record what read-only resources a volume uses in a new table. As part of the schema change to add the new `volume_resource_usage` table, a migration is included that will create the appropriate records for all region snapshots. In testing, a few bugs were found: the worst being that read-only regions did not have their read_only column set to true. This would be a problem if read-only regions are created, but they're currently only created during region snapshot replacement. To detect if any of these regions were created, find all regions that were allocated for a snapshot volume: SELECT id FROM region WHERE volume_id IN (SELECT volume_id FROM snapshot); A similar bug was found in the simulated Crucible agent. This commit also reverts oxidecomputer#6728, enabling region snapshot replacement again - it was disabled due to a lack of read-only region reference counting, so it can be enabled once again.
Crucible volumes are created by layering read-write regions over a hierarchy of read-only resources. Originally only a region snapshot could be used as a read-only resource for a volume. With the introduction of read-only regions (created during the region snapshot replacement process) this is no longer true! Read-only resources can be used by many volumes, and because of this they need to have a reference count so they can be deleted when they're not referenced anymore. The region_snapshot table uses a `volume_references` column, which counts how many uses there are. The region table does not have this column, and more over a simple integer works for reference counting but does not tell you _what_ volume that use is from. This can be determined (see omdb's validate volume references command) but it's information that is tossed out, as Nexus knows what volumes use what resources! Instead, record what read-only resources a volume uses in a new table. As part of the schema change to add the new `volume_resource_usage` table, a migration is included that will create the appropriate records for all region snapshots. In testing, a few bugs were found: the worst being that read-only regions did not have their read_only column set to true. This would be a problem if read-only regions are created, but they're currently only created during region snapshot replacement. To detect if any of these regions were created, find all regions that were allocated for a snapshot volume: SELECT id FROM region WHERE volume_id IN (SELECT volume_id FROM snapshot); A similar bug was found in the simulated Crucible agent. This commit also reverts #6728, enabling region snapshot replacement again - it was disabled due to a lack of read-only region reference counting, so it can be enabled once again.
Until read-only region reference counting is implemented, region snapshot replacement should be disabled - it is currently the only thing that creates read-only regions.