Add set_quick_repair() #893

mconst · 2024-11-10T01:24:47Z

Here's the PR for #878!

This adds a new durability level above Paranoid that doesn't require repair. For now, it's just called "ParanoidPlus" -- not the most descriptive name, but it at least makes it very clear where it falls on the scale of commit speed vs. recoverability. If you have a better name, let me know 😄

To avoid repair, Durability::ParanoidPlus commits need to save the allocator state somewhere. We can't use the region headers, because we'd be overwriting them in place; we might crash partway through the overwrite, and then we'd need repair. So we instead save the allocator state to a new table in the system tree. Writing to the table is slightly tricky, because it needs to be done without allocating (see below), but other than that it's a perfectly ordinary transactional write with all the usual guarantees.

The other requirement to avoid repair is knowing whether the last transaction used 2-phase commit. For this, we add a new two_phase_commit bit to the god byte, which is always updated atomically along with swapping the primary bit. Old redb versions will ignore the new flag when reading and clear it when writing, which is exactly what we want.

This turns out to also fix a longstanding bug where Durability::Paranoid hasn't been providing any security benefit at all. The checksum forgery attack described in the Durability::Immediate documentation actually works equally well against Durability::Paranoid! The problem is that even though 2-phase commit guarantees the primary is valid, redb ignores the primary flag when repairing. It always picks whichever commit slot is newer, as long as the checksum is valid. So if you crash partway through a commit, it'll try to recover using the partially-written secondary rather than the fully-written primary, regardless of the durability mode.

The fix for this is exactly the two_phase_commit bit described above. After a crash, we check whether the last transaction used 2-phase commit; if so, we only look at the primary (which is guaranteed to be valid) and ignore the secondary. Durability::ParanoidPlus needs this check anyway for safety, so we get the Durability::Paranoid bug fix for free.

To write to the allocator state table without allocating, I've introduced a new insert_inplace() function. It's similar to insert_reserve(), but more general and maybe simpler. To use it, you have to first do an ordinary insert() with your desired key and a value of the appropriate length; then later in the same transaction you can call insert_inplace() to replace the value with a new one. Unlike insert_reserve(), this works with values that don't implement MutInPlaceValue, and it lets you hold multiple reservations simultaneously.

insert_inplace() could be safely exposed to users, but I don't think there's any reason to. Since it doesn't give you a mutable reference, there's no benefit over insert() unless you're storing data that cares about its own position in the database. So for now it's private, and I haven't bothered making a new error type for it; it just panics if you don't satisfy the preconditions.

The fuzzer is perfect for testing Durability::ParanoidPlus, because it can simulate a crash, reopen the database (skipping repair if possible), and then verify that the resulting allocator state exactly matches what would happen if it ran a full repair. I've updated the fuzzer to generate Durability::ParanoidPlus commits along with the existing Durability::None and Durability::Immediate.

Now that commit() has its own in-memory copy of the database header, there's no need for the special-case swap_primary flag

adamreichold · 2024-11-10T06:56:52Z

src/transactions.rs

    /// Security considerations: Many hard disk drives and SSDs do not actually guarantee that data
    /// has been persisted to disk after calling `fsync`. Even with this commit level, an attacker
    /// with a high degree of control over the database's workload, including the ability to cause
    /// the database process to crash, can cause the database to crash with the god byte primary bit
    /// pointing to an invalid commit slot, leaving the database in an invalid, potentially attacker-controlled state.
    Paranoid,
+    /// Like [`Durability::Paranoid`], but also saves the allocator state as part of each commit.
+    /// This means repair is no longer required when opening the database after a crash.
+    ParanoidPlus,


I would suggest Enclosed here to convey that this writes more data inline to make the resulting persistent structure self-contained.

Thanks for looking at this!

Yeah, Enclosed would work, although I think I might still prefer ParanoidPlus. My feeling is neither of those names really gives the user any hope of guessing the actual semantics, but ParanoidPlus at least makes it clear that it's even more durable than Paranoid, whereas Enclosed could be anywhere on the durability scale.

cberner · 2024-11-10T21:28:04Z

Oh wow, this ended up being a lot more work than I'd initially thought. Thanks for the PR! I had been thinking it could work by setting the recovery bit to false, but this is much better. I'm traveling without my computer for the next few days, so it'll be a bit before I can review this in detail. Sent from my phone

…

On Sat, Nov 9, 2024, 10:57 PM Adam Reichold ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In src/transactions.rs <#893 (comment)>: > /// Security considerations: Many hard disk drives and SSDs do not actually guarantee that data /// has been persisted to disk after calling `fsync`. Even with this commit level, an attacker /// with a high degree of control over the database's workload, including the ability to cause /// the database process to crash, can cause the database to crash with the god byte primary bit /// pointing to an invalid commit slot, leaving the database in an invalid, potentially attacker-controlled state. Paranoid, + /// Like [`Durability::Paranoid`], but also saves the allocator state as part of each commit. + /// This means repair is no longer required when opening the database after a crash. + ParanoidPlus, I would suggest Enclosed here to convey that this writes more data inline to make the resulting persistent structure self-contained. — Reply to this email directly, view it on GitHub <#893 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGNXQDPRVRTNRZAUU2CCMLZ737UXAVCNFSM6AAAAABRPVKIBSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDIMRVGY3DGNBRGE> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

mconst · 2024-11-11T08:44:06Z

Sounds good! There's no hurry; enjoy your trip!

cberner · 2024-11-12T20:38:10Z

Ok, read through most of it now. One question. Do you think this quick repair approach can be used for 1-phase commit transactions too?

My thinking is that you need to save the allocator state and use 2-phase commit to get really fast repairs, because you need 2PC to avoid validating all the checksums. However, if you have the allocator state it seems like we can still accelerate a repair under 1-phase commit. It would still validate all the checksums, but then if it found the allocator state it could skip rebuilding the allocator state which requires a second scan of the file.

Does that seem like it works to you?

If so, then I think we should add a new method like WriteTransaction::set_quick_repair(bool) instead of adding this new Durability level, and it would work with both Immediate and Paranoid. That solves the naming issue too :)

mconst · 2024-11-13T08:31:49Z

Yeah, I like the idea of having this be a setting separate from the durability level.

I'm not sure we want to support using it with 1-phase commit, though. It seems like a footgun -- if you call set_quick_repair(true) but forget to also do set_durability(Durability::Paranoid), your code will look like it's working, but when you actually try to recover from a crash, it'll be slow.

If we want to speed up repair with 1-phase commit, I feel like the better solution would be to verify the checksums and reconstruct the allocator state in a single pass. That would give the same speed benefit -- and it would work for all 1-phase commit users, without requiring them to opt into a new mode or pay the cost of writing out the allocator state with each commit.

cberner · 2024-11-14T04:29:39Z

If we want to speed up repair with 1-phase commit, I feel like the better solution would be to verify the checksums and reconstruct the allocator state in a single pass. That would give the same speed benefit -- and it would work for all 1-phase commit users, without requiring them to opt into a new mode or pay the cost of writing out the allocator state with each commit.

Ah yes, good point, let's not support it for 1-phase commits.

Ok, so the naming options that seem appealing to me are:

Durability::ParanoidPlus
Durability::ParanoidWithQuickRepair or something like that
set_quick_repair(bool) where passing true also sets the durability level to Paranoid, and false leaves the durability level alone but disables quick repair.

(2) or (3) seem more descriptive than (1) to me. What do you think is most clear?

mconst · 2024-11-14T05:23:50Z

I think my preference at the moment is (3), but (2) also sounds fine!

If we go with (3), do you think it might be worth splitting 2-phase commit out into its own flag too? This would let people use Durability::Eventual with 2-phase commit and with quick-repair, both of which seem like reasonable combinations. We'd still support Durability::Paranoid for compatibility, but it would become an alias for Durability::Immediate with set_two_phase_commit(true). I think this makes the durability levels a bit more intuitive -- we'd be left with just None, Eventual, and Immediate, which are all very clear names that describe their semantics well (whereas the semantics of Paranoid have never really been clear from the name).

cberner · 2024-11-15T04:06:12Z

That sounds good to me!

Instead of actually making a commit, do_repair() should just return the new roots that would be committed. This makes check_integrity() much faster, since it no longer needs to verify the checksums twice, and it means commit() can always trim the database without worrying about whether it'll change the allocator hash. It also allows check_integrity() to avoid making an unnecessary commit if the database is clean, which is nice with quick-repair. It means a successful check_integrity() won't invalidate the allocator state table, so you can still do a quick-repair afterwards

Refactoring in preparation for quick-repair, which needs to be able to call these separately

This adds a new quick-repair mode, which gives instant recovery after a crash at the cost of slower commits. To make this work, each commit with quick-repair enabled needs to save the allocator state somewhere. We can't use the region headers, because we'd be overwriting them in place; we might crash partway through the overwrite, and then we'd need a full repair. So we instead save the allocator state to a new table in the system tree. Writing to the table is slightly tricky, because it needs to be done without allocating (see below), but other than that it's a perfectly ordinary transactional write with all the usual guarantees. The other requirement to avoid full repair is knowing whether the last transaction used 2-phase commit. For this, we add a new two_phase_commit bit to the god byte, which is always updated atomically along with swapping the primary bit. Old redb versions will ignore the new flag when reading and clear it when writing, which is exactly what we want. This turns out to also fix a longstanding bug where 2-phase commit hasn't been providing any security benefit at all. The checksum forgery attack described in the documentation for 1-phase commit actually works equally well against 2-phase commit! The problem is that even though 2-phase commit guarantees the primary is valid, redb ignores the primary flag when repairing. It always picks whichever commit slot is newer, as long as the checksum is valid. So if you crash partway through a commit, it'll try to recover using the partially-written secondary rather than the fully-written primary, regardless of the commit strategy. The fix for this is exactly the two_phase_commit bit described above. After a crash, we check whether the last transaction used 2-phase commit; if so, we only look at the primary (which is guaranteed to be valid) and ignore the secondary. Quick-repair needs this check anyway for safety, so we get the bug fix for free. To write to the allocator state table without allocating, I've introduced a new insert_inplace() function. It's similar to insert_reserve(), but more general and maybe simpler. To use it, you have to first do an ordinary insert() with your desired key and a value of the appropriate length; then later in the same transaction you can call insert_inplace() to replace the value with a new one. Unlike insert_reserve(), this works with values that don't implement MutInPlaceValue, and it lets you hold multiple reservations simultaneously. insert_inplace() could be safely exposed to users, but I don't think there's any reason to. Since it doesn't give you a mutable reference, there's no benefit over insert() unless you're storing data that cares about its own position in the database. So for now it's private, and I haven't bothered making a new error type for it; it just panics if you don't satisfy the preconditions. The fuzzer is perfect for testing quick-repair, because it can simulate a crash, reopen the database (using quick-repair if possible), and then verify that the resulting allocator state exactly matches what would happen if it ran a full repair. I've modified the fuzzer to generate quick-repair commits in addition to ordinary commits.

mconst · 2024-11-16T03:14:47Z

Okay, I've updated the code to use set_two_phase_commit() and set_quick_repair(). The old Durability::Paranoid is now deprecated in favor of the former, but still supported for compatibility. Let me know if this looks like what you had in mind!

cberner

This looks great! I had one comment

cberner · 2024-11-16T16:39:28Z

src/transactions.rs

+        if self.quick_repair {
+            self.two_phase_commit = true;
+        }
+


I may have missed it and this is handled somewhere else, but I think we should also add

if self.two_phase_commit && self.durability == InternalDurability::None { self.durability = InternalDurability::Immediate; }

Otherwise a user could try to set both Durability::None and 2PC, but the security benefit wouldn't be achieved unless the next durable commit is also 2PC.

I think this is unsafe even with Durability::Immediate.

Suppose your application makes some commits that contain attacker-controlled data, and others that don't. You might think you can use 2-phase commit just for the former, but that's not good enough! The attacker's data is present in the database, so they can cause a crash at the moment when it's about to be overwritten by a new (non-attacker-controlled) page, leaving the new page pointer referring to the attacker-controlled data instead. Since we're already assuming they can predict the database layout and the content of the data they don't control, they can forge the checksum to make it appear valid.

If you want the security of 2-phase commit, you need to use it for every commit during which the attacker could cause a crash, which is probably all of them.

That said, I agree that 2-phase commit with Durability::None isn't really a meaningful combination, and I'd be okay with disallowing it. (I'd rather do that than silently upgrade to Durability::Immediate, because the user presumably cares about commit speed if they're requesting Durability::None; we should tell them how to get that speed rather than silently doing the wrong thing.)

But my preference is to allow the combination, even though it doesn't do anything, for exactly the reason mentioned above: the right way to use 2-phase commit is to enable it on every single commit, so we may as well make that easy. The same thing applies to quick-repair -- if you're using quick-repair, you probably want it on every commit, so it seems convenient to allow it even on the Durability::None commits where it has no effect.

Oh ya, you're right! Security is hard, haha

cberner · 2024-11-17T04:00:01Z

Merged. Thanks! This is awesome

mconst · 2024-11-17T04:18:59Z

Yay, thanks for taking the time to review all of this!

There's no hurry, but if you decide you're interested in doing the transition we were talking about in #894 to eventually get rid of the old allocator state, let me know. I've got code to do the extra quick-repair commit on shutdown, which I can submit as another PR if you want it.

cberner · 2024-11-17T04:28:12Z

Oh ya, it'd be great to get rid of that old code path. Yes, please send that PR

Remove swap_primary, which is no longer needed

6a1cd05

Now that commit() has its own in-memory copy of the database header, there's no need for the special-case swap_primary flag

adamreichold reviewed Nov 10, 2024

View reviewed changes

cberner mentioned this pull request Nov 13, 2024

RFC: File format version 3 #894

Open

mconst added 5 commits November 15, 2024 16:52

Fix typos in header layout docs and Durability descriptions

d4373cc

Add set_two_phase_commit(), deprecate Durability::Paranoid

ee310a8

Move finalize_dirty_checksums() out of flush_table_root_updates()

bff1e43

Refactoring in preparation for quick-repair, which needs to be able to call these separately

mconst force-pushed the paranoidplus branch from f2e1eeb to 810ba8c Compare November 16, 2024 03:10

mconst changed the title ~~Add new durability level that doesn't require repair~~ Add set_quick_repair() Nov 16, 2024

cberner reviewed Nov 16, 2024

View reviewed changes

cberner merged commit c66447a into cberner:master Nov 17, 2024
3 checks passed

mconst deleted the paranoidplus branch November 17, 2024 04:19

raphjaph mentioned this pull request Dec 9, 2024

New release #909

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add set_quick_repair() #893

Add set_quick_repair() #893

mconst commented Nov 10, 2024

adamreichold Nov 10, 2024

mconst Nov 11, 2024

cberner commented Nov 10, 2024 via email

mconst commented Nov 11, 2024

cberner commented Nov 12, 2024

mconst commented Nov 13, 2024

cberner commented Nov 14, 2024

mconst commented Nov 14, 2024

cberner commented Nov 15, 2024

mconst commented Nov 16, 2024

cberner left a comment

cberner Nov 16, 2024

mconst Nov 17, 2024

mconst Nov 17, 2024

cberner Nov 17, 2024

cberner commented Nov 17, 2024

mconst commented Nov 17, 2024

cberner commented Nov 17, 2024

Add set_quick_repair() #893

Add set_quick_repair() #893

Conversation

mconst commented Nov 10, 2024

adamreichold Nov 10, 2024

Choose a reason for hiding this comment

mconst Nov 11, 2024

Choose a reason for hiding this comment

cberner commented Nov 10, 2024 via email

mconst commented Nov 11, 2024

cberner commented Nov 12, 2024

mconst commented Nov 13, 2024

cberner commented Nov 14, 2024

mconst commented Nov 14, 2024

cberner commented Nov 15, 2024

mconst commented Nov 16, 2024

cberner left a comment

Choose a reason for hiding this comment

cberner Nov 16, 2024

Choose a reason for hiding this comment

mconst Nov 17, 2024

Choose a reason for hiding this comment

mconst Nov 17, 2024

Choose a reason for hiding this comment

cberner Nov 17, 2024

Choose a reason for hiding this comment

cberner commented Nov 17, 2024

mconst commented Nov 17, 2024

cberner commented Nov 17, 2024