Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #7783: Deadlock during index creation when parallel workers are used #8163

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ilya071294
Copy link
Contributor

Deadlock is possible when index deletion and creation are performed in the same transaction and a deleted index is used by a computed field expression. Each worker attachment loads table's metadata, parses the expression and tries to get SR lock on the deleted index. They will never get it because the main thread holds EX lock and waits for worker attachments to complete. The solution is to prevent the optimizer from using deleted indexes by making an attempt to get SR lock with NO_WAIT.

…rkers are used

Deadlock is possible when index deletion and creation are performed in the same transaction and a deleted index is used by a computed field expression. Each worker attachment loads table's metadata, parses the expression and tries to get SR lock on the deleted index. They will never get it because the main thread holds EX lock and waits for worker attachments to complete. The solution is to prevent the optimizer from using deleted indexes by making an attempt to get SR lock with NO_WAIT.
@ilya071294 ilya071294 requested a review from hvlad June 19, 2024 11:44
Copy link
Member

@hvlad hvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more questions:

  • if compileRelation() called more than once for the single query - how many times idl_count will be incremented ?
  • if statement was prepared while index is dropping (as shown in Deadlock during index creation when parallel workers are used #7783) - could it use new index instance after it will be created ? Note, statement could be cached in statements cache or in SP\triggers cache

}
}
// No need to lock the index here because it is already
// locked by Optimizer::compileRelation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is good, but assert will not harm, imho


if (idx_lock)
{
if (idx_lock->idl_count == 1 && idx_lock->idl_lock->lck_logical == LCK_EX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that idl_count > 1 ?
I.e. perhaps it should check for idl_count > 0 here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like currently LCK_EX is possible only with idl_count == 1. Should I correct it to idl_count > 0 to make it more safe (maybe for future changes)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, do it, please. It makes no harm but protect us from "impossible" cases ;)
Also, consider to add assert, if you sure it is really impossible.

@ilya071294
Copy link
Contributor Author

if compileRelation() called more than once for the single query - how many times idl_count will be incremented ?

compileRelation() is called from Optimizer::compile(), and Optimizer::checkIndices() is also called from there a bit later. As a result we won't get idl_count incremented if CMP_post_resource() was not called for the index (idx_used flag is not set in this case, see makeIndexScanNode()). In other words, idl_count will be incremented as many times as CMP_post_resource() is called (like it was before).

if statement was prepared while index is dropping (as shown in Deadlock during index creation when parallel workers are used #7783) - could it use new index instance after it will be created ? Note, statement could be cached in statements cache or in SP\triggers cache

I tested it. If DROP INDEX and CREATE INDEX are executed in different transactions then a cached statement will be recompiled and will use a new instance, but if they are in the same transaction then it won't. It happens because dbb_statement_cache->purgeAllAttachments() is called only once in the beginning of DFW_perform_work(). I think this problem can be reproduced using just CREATE INDEX (if it's executed at the right time).

@hvlad
Copy link
Member

hvlad commented Jun 20, 2024

if compileRelation() called more than once for the single query - how many times idl_count will be incremented ?

compileRelation() is called from Optimizer::compile(), and Optimizer::checkIndices() is also called from there a bit later. As a result we won't get idl_count incremented if CMP_post_resource() was not called for the index (idx_used flag is not set in this case, see makeIndexScanNode()). In other words, idl_count will be incremented as many times as CMP_post_resource() is called (like it was before).

CMP_post_resource() doesn't increment idl_count. It just add record into resource list, if it was not exists already. Thus, currently (with no patch) idl_count is incremented once (for given index). With patch idl_count incremented as many times as given index is used by statement execution tree. Am I wrong ?

@ilya071294
Copy link
Contributor Author

CMP_post_resource() doesn't increment idl_count. It just add record into resource list, if it was not exists already. Thus, currently (with no patch) idl_count is incremented once (for given index). With patch idl_count incremented as many times as given index is used by statement execution tree. Am I wrong ?

You are right. And allowing duplicates in the resource list is not a good idea, I guess.

@ilya071294
Copy link
Contributor Author

CMP_post_resource() doesn't increment idl_count. It just add record into resource list, if it was not exists already.

I see 2 ways to fix this problem:

  1. Allow duplicates in the resource list for indexes only.
  2. Check csb_resources in compileRelation() and don't increment idl_count if the index is already there.

Which one is better? I like the second one more.

@hvlad
Copy link
Member

hvlad commented Jun 24, 2024

Note comment in Statement::Statement():

relation locks MUST be taken before index locks

It become more complex than it looks

@ilya071294
Copy link
Contributor Author

Note comment in Statement::Statement():

relation locks MUST be taken before index locks

It become more complex than it looks

I tried a solution where compileRelation() gets an index lock and releases it immediately. And after that the index is locked again by Statement::Statement(). It works in most cases but it obviously not perfect as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants