Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add set_quick_repair() #893

Merged
merged 6 commits into from
Nov 17, 2024
Merged

Add set_quick_repair() #893

merged 6 commits into from
Nov 17, 2024

Conversation

mconst
Copy link
Contributor

@mconst mconst commented Nov 10, 2024

Here's the PR for #878!

This adds a new durability level above Paranoid that doesn't require repair. For now, it's just called "ParanoidPlus" -- not the most descriptive name, but it at least makes it very clear where it falls on the scale of commit speed vs. recoverability. If you have a better name, let me know 😄

To avoid repair, Durability::ParanoidPlus commits need to save the allocator state somewhere. We can't use the region headers, because we'd be overwriting them in place; we might crash partway through the overwrite, and then we'd need repair. So we instead save the allocator state to a new table in the system tree. Writing to the table is slightly tricky, because it needs to be done without allocating (see below), but other than that it's a perfectly ordinary transactional write with all the usual guarantees.

The other requirement to avoid repair is knowing whether the last transaction used 2-phase commit. For this, we add a new two_phase_commit bit to the god byte, which is always updated atomically along with swapping the primary bit. Old redb versions will ignore the new flag when reading and clear it when writing, which is exactly what we want.

This turns out to also fix a longstanding bug where Durability::Paranoid hasn't been providing any security benefit at all. The checksum forgery attack described in the Durability::Immediate documentation actually works equally well against Durability::Paranoid! The problem is that even though 2-phase commit guarantees the primary is valid, redb ignores the primary flag when repairing. It always picks whichever commit slot is newer, as long as the checksum is valid. So if you crash partway through a commit, it'll try to recover using the partially-written secondary rather than the fully-written primary, regardless of the durability mode.

The fix for this is exactly the two_phase_commit bit described above. After a crash, we check whether the last transaction used 2-phase commit; if so, we only look at the primary (which is guaranteed to be valid) and ignore the secondary. Durability::ParanoidPlus needs this check anyway for safety, so we get the Durability::Paranoid bug fix for free.

To write to the allocator state table without allocating, I've introduced a new insert_inplace() function. It's similar to insert_reserve(), but more general and maybe simpler. To use it, you have to first do an ordinary insert() with your desired key and a value of the appropriate length; then later in the same transaction you can call insert_inplace() to replace the value with a new one. Unlike insert_reserve(), this works with values that don't implement MutInPlaceValue, and it lets you hold multiple reservations simultaneously.

insert_inplace() could be safely exposed to users, but I don't think there's any reason to. Since it doesn't give you a mutable reference, there's no benefit over insert() unless you're storing data that cares about its own position in the database. So for now it's private, and I haven't bothered making a new error type for it; it just panics if you don't satisfy the preconditions.

The fuzzer is perfect for testing Durability::ParanoidPlus, because it can simulate a crash, reopen the database (skipping repair if possible), and then verify that the resulting allocator state exactly matches what would happen if it ran a full repair. I've updated the fuzzer to generate Durability::ParanoidPlus commits along with the existing Durability::None and Durability::Immediate.

Now that commit() has its own in-memory copy of the database header,
there's no need for the special-case swap_primary flag
/// Security considerations: Many hard disk drives and SSDs do not actually guarantee that data
/// has been persisted to disk after calling `fsync`. Even with this commit level, an attacker
/// with a high degree of control over the database's workload, including the ability to cause
/// the database process to crash, can cause the database to crash with the god byte primary bit
/// pointing to an invalid commit slot, leaving the database in an invalid, potentially attacker-controlled state.
Paranoid,
/// Like [`Durability::Paranoid`], but also saves the allocator state as part of each commit.
/// This means repair is no longer required when opening the database after a crash.
ParanoidPlus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest Enclosed here to convey that this writes more data inline to make the resulting persistent structure self-contained.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this!

Yeah, Enclosed would work, although I think I might still prefer ParanoidPlus. My feeling is neither of those names really gives the user any hope of guessing the actual semantics, but ParanoidPlus at least makes it clear that it's even more durable than Paranoid, whereas Enclosed could be anywhere on the durability scale.

@cberner
Copy link
Owner

cberner commented Nov 10, 2024 via email

@mconst
Copy link
Contributor Author

mconst commented Nov 11, 2024

Sounds good! There's no hurry; enjoy your trip!

@cberner
Copy link
Owner

cberner commented Nov 12, 2024

Ok, read through most of it now. One question. Do you think this quick repair approach can be used for 1-phase commit transactions too?

My thinking is that you need to save the allocator state and use 2-phase commit to get really fast repairs, because you need 2PC to avoid validating all the checksums. However, if you have the allocator state it seems like we can still accelerate a repair under 1-phase commit. It would still validate all the checksums, but then if it found the allocator state it could skip rebuilding the allocator state which requires a second scan of the file.

Does that seem like it works to you?

If so, then I think we should add a new method like WriteTransaction::set_quick_repair(bool) instead of adding this new Durability level, and it would work with both Immediate and Paranoid. That solves the naming issue too :)

@mconst
Copy link
Contributor Author

mconst commented Nov 13, 2024

Yeah, I like the idea of having this be a setting separate from the durability level.

I'm not sure we want to support using it with 1-phase commit, though. It seems like a footgun -- if you call set_quick_repair(true) but forget to also do set_durability(Durability::Paranoid), your code will look like it's working, but when you actually try to recover from a crash, it'll be slow.

If we want to speed up repair with 1-phase commit, I feel like the better solution would be to verify the checksums and reconstruct the allocator state in a single pass. That would give the same speed benefit -- and it would work for all 1-phase commit users, without requiring them to opt into a new mode or pay the cost of writing out the allocator state with each commit.

@cberner
Copy link
Owner

cberner commented Nov 14, 2024

If we want to speed up repair with 1-phase commit, I feel like the better solution would be to verify the checksums and reconstruct the allocator state in a single pass. That would give the same speed benefit -- and it would work for all 1-phase commit users, without requiring them to opt into a new mode or pay the cost of writing out the allocator state with each commit.

Ah yes, good point, let's not support it for 1-phase commits.

Ok, so the naming options that seem appealing to me are:

  1. Durability::ParanoidPlus
  2. Durability::ParanoidWithQuickRepair or something like that
  3. set_quick_repair(bool) where passing true also sets the durability level to Paranoid, and false leaves the durability level alone but disables quick repair.

(2) or (3) seem more descriptive than (1) to me. What do you think is most clear?

@mconst
Copy link
Contributor Author

mconst commented Nov 14, 2024

I think my preference at the moment is (3), but (2) also sounds fine!

If we go with (3), do you think it might be worth splitting 2-phase commit out into its own flag too? This would let people use Durability::Eventual with 2-phase commit and with quick-repair, both of which seem like reasonable combinations. We'd still support Durability::Paranoid for compatibility, but it would become an alias for Durability::Immediate with set_two_phase_commit(true). I think this makes the durability levels a bit more intuitive -- we'd be left with just None, Eventual, and Immediate, which are all very clear names that describe their semantics well (whereas the semantics of Paranoid have never really been clear from the name).

@cberner
Copy link
Owner

cberner commented Nov 15, 2024

That sounds good to me!

Instead of actually making a commit, do_repair() should just return the
new roots that would be committed. This makes check_integrity() much
faster, since it no longer needs to verify the checksums twice, and
it means commit() can always trim the database without worrying about
whether it'll change the allocator hash.

It also allows check_integrity() to avoid making an unnecessary commit
if the database is clean, which is nice with quick-repair. It means a
successful check_integrity() won't invalidate the allocator state table,
so you can still do a quick-repair afterwards
Refactoring in preparation for quick-repair, which needs to be able to
call these separately
This adds a new quick-repair mode, which gives instant recovery after a
crash at the cost of slower commits.

To make this work, each commit with quick-repair enabled needs to save
the allocator state somewhere. We can't use the region headers, because
we'd be overwriting them in place; we might crash partway through the
overwrite, and then we'd need a full repair. So we instead save the
allocator state to a new table in the system tree. Writing to the table
is slightly tricky, because it needs to be done without allocating (see
below), but other than that it's a perfectly ordinary transactional write
with all the usual guarantees.

The other requirement to avoid full repair is knowing whether the last
transaction used 2-phase commit. For this, we add a new two_phase_commit
bit to the god byte, which is always updated atomically along with
swapping the primary bit. Old redb versions will ignore the new flag
when reading and clear it when writing, which is exactly what we want.

This turns out to also fix a longstanding bug where 2-phase commit hasn't
been providing any security benefit at all. The checksum forgery attack
described in the documentation for 1-phase commit actually works equally
well against 2-phase commit! The problem is that even though 2-phase
commit guarantees the primary is valid, redb ignores the primary flag
when repairing. It always picks whichever commit slot is newer, as long
as the checksum is valid. So if you crash partway through a commit,
it'll try to recover using the partially-written secondary rather than
the fully-written primary, regardless of the commit strategy.

The fix for this is exactly the two_phase_commit bit described above.
After a crash, we check whether the last transaction used 2-phase commit;
if so, we only look at the primary (which is guaranteed to be valid) and
ignore the secondary. Quick-repair needs this check anyway for safety,
so we get the bug fix for free.

To write to the allocator state table without allocating, I've introduced
a new insert_inplace() function. It's similar to insert_reserve(), but
more general and maybe simpler. To use it, you have to first do an
ordinary insert() with your desired key and a value of the appropriate
length; then later in the same transaction you can call insert_inplace()
to replace the value with a new one. Unlike insert_reserve(), this works
with values that don't implement MutInPlaceValue, and it lets you hold
multiple reservations simultaneously.

insert_inplace() could be safely exposed to users, but I don't think
there's any reason to. Since it doesn't give you a mutable reference,
there's no benefit over insert() unless you're storing data that cares
about its own position in the database. So for now it's private, and I
haven't bothered making a new error type for it; it just panics if you
don't satisfy the preconditions.

The fuzzer is perfect for testing quick-repair, because it can simulate
a crash, reopen the database (using quick-repair if possible), and then
verify that the resulting allocator state exactly matches what would
happen if it ran a full repair. I've modified the fuzzer to generate
quick-repair commits in addition to ordinary commits.
@mconst mconst changed the title Add new durability level that doesn't require repair Add set_quick_repair() Nov 16, 2024
@mconst
Copy link
Contributor Author

mconst commented Nov 16, 2024

Okay, I've updated the code to use set_two_phase_commit() and set_quick_repair(). The old Durability::Paranoid is now deprecated in favor of the former, but still supported for compatibility. Let me know if this looks like what you had in mind!

Copy link
Owner

@cberner cberner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I had one comment

if self.quick_repair {
self.two_phase_commit = true;
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have missed it and this is handled somewhere else, but I think we should also add

if self.two_phase_commit && self.durability == InternalDurability::None {
  self.durability = InternalDurability::Immediate;
}

Otherwise a user could try to set both Durability::None and 2PC, but the security benefit wouldn't be achieved unless the next durable commit is also 2PC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unsafe even with Durability::Immediate.

Suppose your application makes some commits that contain attacker-controlled data, and others that don't. You might think you can use 2-phase commit just for the former, but that's not good enough! The attacker's data is present in the database, so they can cause a crash at the moment when it's about to be overwritten by a new (non-attacker-controlled) page, leaving the new page pointer referring to the attacker-controlled data instead. Since we're already assuming they can predict the database layout and the content of the data they don't control, they can forge the checksum to make it appear valid.

If you want the security of 2-phase commit, you need to use it for every commit during which the attacker could cause a crash, which is probably all of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I agree that 2-phase commit with Durability::None isn't really a meaningful combination, and I'd be okay with disallowing it. (I'd rather do that than silently upgrade to Durability::Immediate, because the user presumably cares about commit speed if they're requesting Durability::None; we should tell them how to get that speed rather than silently doing the wrong thing.)

But my preference is to allow the combination, even though it doesn't do anything, for exactly the reason mentioned above: the right way to use 2-phase commit is to enable it on every single commit, so we may as well make that easy. The same thing applies to quick-repair -- if you're using quick-repair, you probably want it on every commit, so it seems convenient to allow it even on the Durability::None commits where it has no effect.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ya, you're right! Security is hard, haha

@cberner cberner merged commit c66447a into cberner:master Nov 17, 2024
3 checks passed
@cberner
Copy link
Owner

cberner commented Nov 17, 2024

Merged. Thanks! This is awesome

@mconst
Copy link
Contributor Author

mconst commented Nov 17, 2024

Yay, thanks for taking the time to review all of this!

There's no hurry, but if you decide you're interested in doing the transition we were talking about in #894 to eventually get rid of the old allocator state, let me know. I've got code to do the extra quick-repair commit on shutdown, which I can submit as another PR if you want it.

@mconst mconst deleted the paranoidplus branch November 17, 2024 04:19
@cberner
Copy link
Owner

cberner commented Nov 17, 2024

Oh ya, it'd be great to get rid of that old code path. Yes, please send that PR

@raphjaph raphjaph mentioned this pull request Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants