Range Locking support, MyRocks part #1185
When rocksdb_use_range_locking is ON, MyRocks will:
- Initialize RocksDB to use the range-locking lock manager.
- For all DML operations (including SELECT .. FOR UPDATE), lock the scanned range before reading/modifying rows.
- Skip snapshot checking, which cannot be done for ranges. Instead, MyRocks will read and modify the latest committed data, just like InnoDB does (in the code, grep for (start|end)_ignore_snapshot).
- For queries that do not have a finite range to scan, like UPDATE t1 .... ORDER BY t1.key LIMIT n, use a "Locking iterator" which will read rows, lock the range, and re-read the rows. See class LockingIterator.
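The read, lock, re-read cycle of the locking iterator can be sketched as follows. This is a minimal single-threaded toy, not the code of class LockingIterator: `ToyTable`, `ToyRangeLocks`, `locking_seek`, and the `racing_writer` hook are hypothetical names introduced only for illustration, and the "lock manager" here merely records ranges instead of blocking other transactions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ordered key-value table.
using ToyTable = std::map<std::string, std::string>;

// Hypothetical stand-in for a range lock manager: it only records the
// closed ranges [lo, hi] that the current transaction has locked.
struct ToyRangeLocks {
  std::vector<std::pair<std::string, std::string>> ranges;

  void lock(const std::string& lo, const std::string& hi) {
    ranges.emplace_back(lo, hi);
  }
  bool covers(const std::string& key) const {
    for (const auto& r : ranges) {
      if (r.first <= key && key <= r.second) return true;
    }
    return false;
  }
};

// Find the first key >= start and return it only once the range
// [start, key] is locked. racing_writer (an assumption of this sketch)
// simulates another transaction changing the table between our first
// read and the moment the lock is taken.
std::optional<std::string> locking_seek(
    ToyTable& table, ToyRangeLocks& locks, const std::string& start,
    const std::function<void(ToyTable&)>& racing_writer = nullptr) {
  bool raced = false;
  for (;;) {
    auto it = table.lower_bound(start);           // read
    if (it == table.end()) return std::nullopt;
    const std::string candidate = it->first;
    if (locks.covers(candidate)) return candidate;  // already protected
    if (racing_writer && !raced) {
      raced = true;
      racing_writer(table);                       // concurrent change
    }
    locks.lock(start, candidate);                 // lock the range
    // Loop again: re-read, because the row seen before locking may no
    // longer be the first row in the range.
  }
}
```

In this sketch, a row inserted before the lock is taken is picked up by the re-read, and the returned key always lies inside a locked range. A real lock manager would also have to block or fail on conflicting locks, which the toy omits.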
@hermanlee has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Tracking some additional issues found so far:
A use-after-free caused the problem: the transaction clean-up does not correctly remove the transaction from the lock manager's wait list. Report from ASan:
The range lock manager currently uses the address of the PessimisticTransaction object as the TXNID in the locktree. Various callbacks cast this address back to a PessimisticTransaction pointer to get the RocksDB-assigned transaction ID for reporting purposes. The callbacks made while walking a locktree's request queue hold a mutex, which guarantees that the pending lock requests (and consequently their PessimisticTransactions) are still active. Most of the other callbacks, however, are invoked without holding any mutex or reference count that would prevent the PessimisticTransaction from being freed. It seems the following scenario is occurring:
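To illustrate the lifetime hazard described in this comment (not the specific scenario, and not necessarily the fix adopted in the pull request), here is a minimal sketch of one alternative to casting a raw address: resolve the TXNID through a registry that can tell whether the transaction is still alive. `ToyTxn` and `ToyTxnRegistry` are hypothetical names; nothing here is RocksDB API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for PessimisticTransaction.
struct ToyTxn {
  uint64_t id;
  std::string name;
};

// Maps a numeric TXNID to a weak reference instead of treating the TXNID
// as an object address. A reporting callback can then detect that the
// transaction is gone rather than dereferencing freed memory.
class ToyTxnRegistry {
 public:
  void add(const std::shared_ptr<ToyTxn>& txn) {
    std::lock_guard<std::mutex> guard(mu_);
    live_[txn->id] = txn;
  }

  void remove(uint64_t txnid) {
    std::lock_guard<std::mutex> guard(mu_);
    live_.erase(txnid);
  }

  // Returns the transaction's name, or nullopt if it was already freed.
  std::optional<std::string> describe(uint64_t txnid) const {
    std::shared_ptr<ToyTxn> pinned;
    {
      std::lock_guard<std::mutex> guard(mu_);
      auto it = live_.find(txnid);
      if (it != live_.end()) pinned = it->second.lock();
    }
    if (!pinned) return std::nullopt;
    return pinned->name;  // pinned keeps the object alive while we read it
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<uint64_t, std::weak_ptr<ToyTxn>> live_;
};
```

The design choice in this sketch is that the callback pins the transaction for the duration of the read, which plays the role of the reference count that the comment notes is missing.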
Updated version: #1430
(Range Locking pull request, filed against fb-mysql-8.0.23)
This adds a my.cnf parameter, rocksdb_use_range_locking.