
JBD support has issues when performing journal transaction replay #86

Open
yeerwu opened this issue May 27, 2024 · 0 comments


yeerwu commented May 27, 2024

@gkostka We've encountered a severe issue while integrating lwext4 into our system: in certain scenarios, historical journal transactions are wrongly replayed.

After some investigation and debugging, we found the root cause: the journal log does not record where the last valid transaction ends.

Consider this scenario:

  1. We create and modify some files, which creates JBD transactions in the journal log (e.g., 64 of them).
  2. We reboot the system. The journal superblock now points its start back to 0, so new transactions overwrite the historic ones.
  3. We delete one of the created files and cut the power.
  4. fsck scans and replays the journal to repair the partition. The deleted file is wrongly recovered during the replay (e.g., 64 transactions are replayed).

In this case, the scan iterates over the journal transactions as new + new + old + old. By chance, our new transactions overwrote the old ones exactly up to an old transaction boundary, so the SCAN pass keeps iterating into the old transactions and later replays them.

Our current fix is simple: break the journal transaction chain. In "jbd_trans_write_commit_block", just before the commit block is written, we reset the next block, which would be the descriptor block of the next transaction, like this:

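```c
/*
 * Invalidate the block that follows the current transaction so that a later
 * journal scan cannot continue into a stale (historic) transaction.
 * Called from jbd_trans_write_commit_block() just before the commit block is
 * written; journal->last is expected to point at the block where the next
 * transaction's descriptor block would go.
 */
static void __jbd_journal_clean_next_trans(struct jbd_journal *journal)
{
	struct ext4_block block;

	/* Read the block that follows this transaction. */
	int rc = jbd_block_get(journal->jbd_fs, &block, journal->last);
	if (rc != EOK) {
		return;
	}

	/* Clear the JBD magic so the scan no longer accepts this block. */
	struct jbd_bhdr *header = (struct jbd_bhdr *)block.data;
	jbd_set32(header, magic, 0);

	/* Mark the buffer dirty and write it back; the result is ignored. */
	ext4_bcache_set_dirty(block.buf);
	ext4_bcache_set_flag(block.buf, BC_TMP);
	(void)ext4_block_set(journal->jbd_fs->bdev, &block);
}
```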

This way, replay never touches historic transactions: the block right after the newest transaction is marked as invalid, so the scan stops there. This also holds after a power cut, because the next block is always cleaned up before the commit block is written. We don't know whether this is the best way to solve the problem, and unfortunately we cannot push our fix to your repository because we don't have permission.

If there is a better official solution, that would be perfect! Many thanks in advance!
