Skip to content

Latest commit

 

History

History
83 lines (55 loc) · 2.62 KB

43 - Log-Structured File System.md

File metadata and controls

83 lines (55 loc) · 2.62 KB

Motivation and Key Ideas

Observations Leading to LFS

  • System memories are growing larger, allowing more data to be cached
  • There is a large gap between random I/O and sequential I/O performance on hard drives
  • Existing file systems perform poorly on many common workloads
  • File systems are not RAID-aware

Key Ideas of LFS

  • Focus on write performance by buffering writes in memory
  • Write all updates (data and metadata) to disk sequentially in large segments
  • Never overwrite existing data in place, always write to free locations
  • Use a log structure to organize writes on disk
  • Periodically clean old versions of data to reclaim free space

LFS Design

Writing to Disk Sequentially

  • Buffer updates in memory segments
  • When segment is full, write entire segment to disk in one large sequential transfer
  • Write data blocks, inodes, and other metadata together in segments

Inode Map

  • Introduces indirection between inode numbers and inode locations on disk
  • Inode map tracks latest location of each inode
  • Allows inodes to be relocated without updating directory entries

Checkpoint Region

  • Fixed location on disk that points to latest pieces of inode map
  • Updated periodically (e.g. every 30 seconds)
  • Allows finding current file system state after crash

Segment Usage

  • Large segment writes improve disk efficiency
  • Segments also enable effective cleaning

Challenges and Solutions

Finding Inodes

  • Use inode map to translate inode number to current location
  • Inode map pieces written next to other data in segments
  • Checkpoint region points to latest inode map chunks

Garbage Collection

  • Periodically clean segments to reclaim free space
  • Read in multiple old segments, write out live data to new segments
  • Use segment summary info to determine which blocks are live

Crash Recovery

  • Use checkpoint region for consistent recovery point
  • Roll forward through log segments to recover recent updates
  • Write checkpoint region carefully to ensure consistency

Pros and Cons

Advantages

  • Efficient writing through large sequential transfers
  • Works well for RAID and SSDs too
  • Avoids small-write problem on parity-based RAID

Disadvantages

  • Generates garbage that must be cleaned
  • Cleaning costs were initially a concern
  • May perform poorly if cleaning not done efficiently

Impact and Legacy

  • Influenced modern file systems like WAFL, ZFS, btrfs
  • Core ideas like copy-on-write still relevant today
  • Showed promise of log-structured approach

That covers the key points from the LFS chapter in an organized format. Let me know if you would like me to expand on any section.