Skip to content

Latest commit

 

History

History
112 lines (82 loc) · 4.18 KB

42 - FSCK and Journaling.md

File metadata and controls

112 lines (82 loc) · 4.18 KB

Crash Consistency Problem

The Issue

  • File systems manage data structures that must persist on disk
  • Updates to these structures involve multiple disk writes
  • A crash or power loss can interrupt these writes mid-operation
  • This can leave the file system in an inconsistent state
  • The crash consistency problem: how to update the disk despite crashes

Example Scenario

  • Appending data to a file requires updating:
    • Inode (update file size, add new block pointer)
    • Data bitmap (mark new block as allocated)
    • New data block (write actual data)
  • Possible crash scenarios:
    • Only data block written: Not a problem, as if write never occurred
    • Only inode updated: Points to uninitialized data, inconsistent with bitmap
    • Only bitmap updated: Space leak, block marked used but not referenced

Solution 1: File System Checker (fsck)

How it Works

  • Runs at boot time after a crash
  • Scans entire file system to find and fix inconsistencies
  • Very slow on large disks (can take hours)

What fsck Checks and Fixes

  • Superblock: Sanity checks, may use backup superblock if corrupt
  • Free blocks: Scans inodes to rebuild correct allocation bitmaps
  • Inode state: Verifies each inode for corruption, clears bad inodes
  • Inode links: Verifies and corrects link counts for all files
  • Duplicates: Checks for and resolves duplicate block pointers
  • Bad blocks: Removes pointers to blocks outside valid range
  • Directory integrity: Verifies directory structure and entries

Drawbacks

  • Very slow on large disks
  • Scans entire disk even for small updates
  • Complex to implement correctly for all cases

Solution 2: Journaling

Basic Idea

  • Borrowed from database systems (write-ahead logging)
  • Write updates to a journal/log before modifying file system
  • If crash occurs, replay journal entries to recover
  • Much faster recovery than fsck

Data Journaling

  • Journal both metadata and data updates
  • Protocol:
    1. Journal write: Write transaction (TxB, metadata, data, TxE) to log
    2. Journal commit: Wait for transaction to be fully on disk
    3. Checkpoint: Write changes to final file system locations
    4. Free: Mark transaction as free in journal superblock

Metadata Journaling

  • Only journal metadata updates
  • Data written directly to file system
  • Reduces write traffic compared to data journaling
  • More commonly used (e.g., ext3, NTFS, XFS)
  • Protocol:
    1. Data write: Write data to final location
    2. Journal metadata write: Write begin block and metadata to log
    3. Journal commit: Write transaction end block to log
    4. Checkpoint metadata: Write metadata to final locations
    5. Free: Mark transaction as free in journal superblock

Challenges and Optimizations

  • Block reuse issues: Solved with revoke records
  • Ordering of writes is critical for consistency
  • Batching updates into larger transactions
  • Checksum-based optimization for faster commits

Other Approaches

Soft Updates

  • Carefully order all writes to maintain consistency
  • Never leave on-disk structures in an inconsistent state
  • Complex to implement, requires deep file system knowledge

Copy-on-Write (COW)

  • Never overwrite data or metadata in place
  • Write updates to new locations on disk
  • Atomically switch to new version by updating root structure
  • Used in ZFS, Btrfs

Backpointer-Based Consistency (BBC)

  • Add backpointers to all blocks (e.g., data block points back to inode)
  • Check consistency on access by verifying forward and backward pointers
  • Allows for lazy consistency checking

Optimistic Crash Consistency

  • Reduce waits for disk writes to complete
  • Use generalized transaction checksums
  • Can significantly improve performance for some workloads
  • Requires changes to disk interface for full benefits

In summary, crash consistency is a fundamental challenge in file system design. While journaling (especially metadata journaling) is widely used in modern file systems due to its balance of performance and fast recovery, research continues to develop new techniques to improve both consistency guarantees and performance. The choice of consistency mechanism can significantly impact file system behavior, recovery time, and overall system performance.