Temporarily disable region snapshot replacement #6728

Merged

Conversation

@jmpesp (Contributor) commented Sep 30, 2024

Until read-only region reference counting is implemented, region snapshot replacement should be disabled - it is currently the only thing that creates read-only regions.

    RegionSnapshotReplacementGarbageCollect {
        datastore,
        sagas,
        disabled: false,
Contributor:

Should this be

Suggested change:

    - disabled: false,
    + disabled: true,

?

jmpesp (Contributor Author):

Good catch, the answer's yes: 93fe946
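
For context, a minimal sketch of how a flag like this gates a background task's activation; only the struct name and the `disabled` field come from the diff above, and the rest is assumed rather than the real Nexus task machinery:

    // Hypothetical sketch, not the actual Nexus background-task trait.
    struct RegionSnapshotReplacementGarbageCollect {
        disabled: bool,
        // datastore, sagas, etc. elided
    }

    impl RegionSnapshotReplacementGarbageCollect {
        fn activate(&self) {
            if self.disabled {
                // While the flag is set, every activation is a no-op.
                return;
            }
            // normal garbage-collection work would run here
        }
    }

    fn main() {
        let task = RegionSnapshotReplacementGarbageCollect { disabled: true };
        task.activate(); // does nothing while disabled
    }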

    @@ -778,7 +778,8 @@ impl BackgroundTasksInitializer {
            "detect if region snapshots need replacement and begin the \
            process",
        period: config.region_snapshot_replacement_start.period_secs,
        task_impl: Box::new(RegionSnapshotReplacementDetector::new(
        // XXX temporarily disabled
Contributor:

Is there an issue this could reference on top of XXX? I don't feel strongly about this if it's going to be reenabled "soon" though, up to you.

jmpesp (Contributor Author):

#6353 references it, so I've added that in e0efa1b
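
To make the shape concrete, here is a self-contained approximation of the initializer pattern in the hunk above; only `period_secs`, `task_impl`, and the detector type name appear in the PR, so every other name and the constructor shape are assumptions:

    use std::time::Duration;

    // Hypothetical stand-ins for the Nexus types; not the real code.
    struct RegionSnapshotReplacementDetector {
        disabled: bool,
    }

    struct TaskDefinition {
        description: &'static str,
        period: Duration,
        task_impl: Box<RegionSnapshotReplacementDetector>,
    }

    fn main() {
        // stand-in for config.region_snapshot_replacement_start.period_secs
        let period_secs: u64 = 30;
        let task = TaskDefinition {
            description: "detect if region snapshots need replacement \
                and begin the process",
            period: Duration::from_secs(period_secs),
            // XXX temporarily disabled until read-only region reference
            // counting is implemented; see issue #6353.
            task_impl: Box::new(RegionSnapshotReplacementDetector {
                disabled: true,
            }),
        };
        assert!(task.task_impl.disabled);
        let _ = (task.description, task.period);
    }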

@jmpesp enabled auto-merge (squash) September 30, 2024 20:19
@jmpesp merged commit 469522d into oxidecomputer:main Sep 30, 2024
16 checks passed
@jmpesp deleted the disable_region_snapshot_replacement_for_now branch October 1, 2024 13:44
jmpesp added a commit to jmpesp/omicron that referenced this pull request Oct 9, 2024
Crucible volumes are created by layering read-write regions over a
hierarchy of read-only resources. Originally only a region snapshot
could be used as a read-only resource for a volume. With the
introduction of read-only regions (created during the region snapshot
replacement process) this is no longer true!

Read-only resources can be used by many volumes, and because of this
they need to have a reference count so they can be deleted when they're
not referenced anymore. The region_snapshot table uses a
`volume_references` column, which counts how many uses there are. The
region table does not have this column, and moreover a simple integer
works for reference counting but does not tell you _what_ volume that
use is from. This can be determined (see omdb's `validate volume
references` command), but that is information that gets tossed out, as
Nexus knows which volumes use which resources! Instead, record what
read-only resources a volume uses in a new table.

As part of the schema change to add the new `volume_resource_usage`
table, a migration is included that will create the appropriate records
for all region snapshots.

In testing, a few bugs were found, the worst being that read-only
regions did not have their `read_only` column set to true. This would
have been a problem if any read-only regions had been created, but
they're currently only created during region snapshot replacement. To
detect if any of these regions were created, find all regions that were
allocated for a snapshot volume:

    SELECT id FROM region
    WHERE volume_id IN (SELECT volume_id FROM snapshot);

A similar bug was found in the simulated Crucible agent.

This commit also reverts oxidecomputer#6728: region snapshot replacement
was only disabled due to the lack of read-only region reference
counting, so with that in place it can be enabled once again.
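
The usage-record idea in the commit message above can be sketched in Rust; the enum and field names are illustrative guesses at the shape of a `volume_resource_usage` row, not the actual omicron schema types (assumes the `uuid` crate with the v4 feature):

    use uuid::Uuid;

    // One row per (volume, read-only resource) pair. Unlike a bare
    // `volume_references` counter, each record says *which* volume holds
    // the reference, so a resource is safe to delete exactly when no rows
    // for it remain.
    enum VolumeResourceUsage {
        // the original kind of read-only resource
        RegionSnapshot { dataset_id: Uuid, region_id: Uuid, snapshot_id: Uuid },
        // the new kind, created by region snapshot replacement
        ReadOnlyRegion { region_id: Uuid },
    }

    struct VolumeResourceUsageRecord {
        volume_id: Uuid,
        usage: VolumeResourceUsage,
    }

    fn main() {
        // A volume referencing one read-only region:
        let record = VolumeResourceUsageRecord {
            volume_id: Uuid::new_v4(),
            usage: VolumeResourceUsage::ReadOnlyRegion {
                region_id: Uuid::new_v4(),
            },
        };
        let _ = record;
    }
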
jmpesp added a commit that referenced this pull request Nov 12, 2024