Fix lock order when dropping index
If an index is dropped, it is necessary to lock the heap table (of
the index) before the index since all normal operations do it in this
order. When dropping an index, we did not take all the necessary locks
in the right order before calling `performMultipleDeletions`, which can
cause deadlocks when dropping an index on a hypertable at the same time
as running a utility statement that takes heavy locks, e.g., VACUUM or
ANALYZE.
mkindahl committed Jan 17, 2025
1 parent 08050fb commit 0f5b107
Showing 2 changed files with 54 additions and 12 deletions.
1 change: 1 addition & 0 deletions .unreleased/pr_7600
@@ -0,0 +1 @@
Fixes: #7600 Fix lock order when dropping index
65 changes: 53 additions & 12 deletions src/chunk_index.c
@@ -573,11 +573,52 @@ typedef struct ChunkIndexDeleteData
bool drop_index;
} ChunkIndexDeleteData;

/*
* Lock object.
*
* In particular, we need to ensure that we lock the table of an index before
* locking the index, or run the risk of ending up in a deadlock since the
* normal locking order is table first, index second. Since we're not doing
* a concurrent delete, we take a strong lock for this.
*
* It is also necessary that the parent table is locked first, but we have
* already done that at this stage, so it does not need to be done explicitly.
*/
static bool
chunk_lock_object_for_deletion(ObjectAddress *obj)
{
/*
* If we're locking an index, we need to lock the table first. See
* RangeVarCallbackForDropRelation() in tablecmds.c. We can ignore
* partition indexes since we're not using them.
*/
char relkind = get_rel_relkind(obj->objectId);

/*
* If we cannot find the object, it might have been concurrently deleted
* (we do not have locks on objects yet).
*/
if (!relkind)
return false;
if (relkind == RELKIND_INDEX)
{
Oid heapOid = IndexGetRelation(obj->objectId, true);
if (OidIsValid(heapOid))
LockRelationOid(heapOid, AccessExclusiveLock);
}

LockRelationOid(obj->objectId, AccessExclusiveLock);
return true;
}

/* Find all internal dependencies to be able to delete all the objects in one
* go. We do this by scanning the dependency table and keeping all the tables
* in our internal schema. */
* in our internal schema.
*
* We also lock the objects in the correct order here to make sure that we do
* not end up with deadlocks. */
static void
chunk_collect_objects_for_deletion(const ObjectAddress *relobj, ObjectAddresses *objects)
chunk_collect_and_lock_objects_for_deletion(const ObjectAddress *relobj, ObjectAddresses *objects)
{
Relation deprel = table_open(DependRelationId, RowExclusiveLock);
ScanKeyData scankey[2];
@@ -608,16 +649,10 @@ chunk_collect_objects_for_deletion(const ObjectAddress *relobj, ObjectAddresses
{
Form_pg_depend record = (Form_pg_depend) GETSTRUCT(tup);
ObjectAddress refobj = { .classId = record->refclassid, .objectId = record->refobjid };

switch (record->deptype)
{
case DEPENDENCY_INTERNAL:
add_exact_object_address(&refobj, objects);
break;
default:
continue; /* Do nothing */
}
if (record->deptype == DEPENDENCY_INTERNAL && chunk_lock_object_for_deletion(&refobj))
add_exact_object_address(&refobj, objects);
}

systable_endscan(scan);
table_close(deprel, RowExclusiveLock);
}
@@ -651,14 +686,20 @@ chunk_index_tuple_delete(TupleInfo *ti, void *data)
* internal dependencies and use the function
* performMultipleDeletions.
*
* We lock the objects to delete first to make sure that the lock
* order is correct. This is how RemoveRelations() does it, and
* performMultipleDeletions() expects these locks to already be
* taken. If they are not, it will take only very rudimentary
* locks, which can cause deadlocks in some cases.
*
* The function performMultipleDeletions accepts a list of objects,
* and if there are dependencies between any of the objects in the
* list, it will not generate an error but will proceed with the
* deletion. If there are any dependencies (internal or not) on
* objects outside this set, it will still abort the deletion and
* print an error. */
ObjectAddresses *objects = new_object_addresses();
chunk_collect_objects_for_deletion(&idxobj, objects);
chunk_collect_and_lock_objects_for_deletion(&idxobj, objects);
performMultipleDeletions(objects, DROP_RESTRICT, 0);
free_object_addresses(objects);
}
