Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flushImmWALs is not thread-safe #35

Open
jayaprabhakar opened this issue Nov 22, 2024 · 1 comment
Open

flushImmWALs is not thread-safe #35

jayaprabhakar opened this issue Nov 22, 2024 · 1 comment

Comments

@jayaprabhakar
Copy link

jayaprabhakar commented Nov 22, 2024

I am working on testing SlateDB Go implementation for concurrency violations with the work-in-progress FizzBee tool. I see that flushImmWALs method that is called by FlushWAL is not thread safe.

Seed 1732230833492265000
ClientId 0 Put called [{key k1} {value v2}]
ClientId 1 FlushWal called
ClientId 0 Put return [{key k1} {value v2}] <nil> <nil>
ClientId 0 FlushWal called
Operation: {ClientId:1 Input:{Op:SlateDb#0.FlushWal Choices:[]} Call:1732233015246045000 Output:{ReturnValue:<nil> Error:<nil>} Return:1732233015246103000}
Operation: {ClientId:0 Input:{Op:SlateDb#0.Put Choices:[{Name:key Value:k1} {Name:value Value:v2}]} Call:1732233015246039000 Output:{ReturnValue:<nil> Error:<nil>} Return:1732233015246132000}
ClientId 0 FlushWal return <nil> <nil>
panic: deque: PopBack() called on empty queue

goroutine 8 [running]:
github.com/gammazero/deque.(*Deque[...]).PopBack(0x1056e9d60)
	/Users/jp/go/pkg/mod/github.com/gammazero/[email protected]/deque.go:128 +0x59c
github.com/slatedb/slatedb-go/slatedb.(*DBState).popImmWAL(0xc048888440)
	/Users/jp/src/slatedb-go/slatedb/db_state.go:153 +0x90
github.com/slatedb/slatedb-go/slatedb.(*DB).flushImmWALs(0xc0001e07e0)
	/Users/jp/src/slatedb-go/slatedb/flush.go:63 +0x3c
github.com/slatedb/slatedb-go/slatedb.(*DB).FlushWAL(0xc0001e07e0)
	/Users/jp/src/slatedb-go/slatedb/flush.go:38 +0x44
github.com/slatedb/slatedb-go/fizztest.(*SlateDbRoleAdapter).FlushWal(0xc04bada5e0, {0xc0001aa020?, 0xc04bc03d80?, 0x3?})
	/Users/jp/src/slatedb-go/fizztest/fizztest_adapter.go:124 +0x3c
github.com/slatedb/slatedb-go/fizztest_test.performOperations({0x12fe81358, 0xc04bada5e0}, 0x1, 0xc0001e0930, 0xc0488bbd40, 0x8a82d739490efb24)
	/Users/jp/src/slatedb-go/fizztest/mbt_test.go:643 +0xc00
created by github.com/slatedb/slatedb-go/fizztest_test.TestLinearizability in goroutine 50
	/Users/jp/src/slatedb-go/fizztest/mbt_test.go:456 +0xbdc
FAIL	github.com/slatedb/slatedb-go/fizztest	5.635s
FAIL


The cause of this issue is at,

func (db *DB) flushImmWALs() error {
	for {
		oldestWal := db.state.oldestImmWAL()
		// ... multiple threads could access the head of the queue and write the WAL
		
		db.state.popImmWAL()    // But when they try to pop, one of them fails.

		// ...
	}
	return nil
}

From my understanding, All the WAL flushing logic must be handled in a single thread (and memtable flush handled in its own separate thread). The public/exposed db.FlushWAL method does it in the callers' thread itself.
A current solution should be, make db.FlushWAL() as private (it is called by the actual flush thread), but have a separate Flush method that sends a message to the flush go routine that then should trigger the flush, similar to the db.Close()

@jayaprabhakar
Copy link
Author

A similar error, (although not a duplicate) happens here.

panic: deque: PopBack() called on empty queue

goroutine 136 [running]:
github.com/gammazero/deque.(*Deque[...]).PopBack(0x102742b60)
	/Users/jp/go/pkg/mod/github.com/gammazero/[email protected]/deque.go:128 +0x59c
github.com/slatedb/slatedb-go/slatedb.(*DBState).moveImmMemtableToL0(0xc0307adc40, {0xc00160386e?, 0xc000211800?}, 0xc0284d4940)
	/Users/jp/src/slatedb-go/slatedb/db_state.go:180 +0xa0
github.com/slatedb/slatedb-go/slatedb.(*MemtableFlusher).flushImmMemtablesToL0(0xc001603918)
	/Users/jp/src/slatedb-go/slatedb/flush.go:232 +0x11c
github.com/slatedb/slatedb-go/slatedb.(*DB).FlushMemtableToL0(0xc0001de930)
	/Users/jp/src/slatedb-go/slatedb/db.go:335 +0x114
github.com/slatedb/slatedb-go/fizztest.(*SlateDbRoleAdapter).FlushMemtable(0xc0328d89a0, {0xc000128020?, 0xc001603b38?, 0x3?})
	/Users/jp/src/slatedb-go/fizztest/fizztest_adapter.go:128 +0x3c
github.com/slatedb/slatedb-go/fizztest_test.call({0x14cf94008, 0xc0328d89a0}, 0x1, {0x102579b56, 0x17}, 0xc032a0cdb0, 0xc032a0cde0, 0xc001603e70)
	/Users/jp/src/slatedb-go/fizztest/mbt_test.go:688 +0x798
github.com/slatedb/slatedb-go/fizztest_test.performOperations({0x14cf94008, 0xc0328d89a0}, 0x1, 0xc0001deaf0, 0xc030829030, 0xa52a033b49091e11, 0xc032a0cdb0, 0xc032a0cde0)
	/Users/jp/src/slatedb-go/fizztest/mbt_test.go:639 +0x32c
created by github.com/slatedb/slatedb-go/fizztest_test.TestLinearizability in goroutine 114
	/Users/jp/src/slatedb-go/fizztest/mbt_test.go:469 +0x117c
FAIL	github.com/slatedb/slatedb-go/fizztest	48.876s
FAIL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant