This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

fix the deadlock problem when using distributed training in VQA finetune #197

Open
wants to merge 1 commit into master

Conversation


@Light-V Light-V commented May 19, 2022

When using distributed training, processes with local_rank != 0 never reach the torch.distributed.barrier() call, so the ranks that do call it block forever and training deadlocks.
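Since torch.distributed.barrier() is a collective call, it only returns once every process in the group has entered it; guarding it so that some ranks skip it leaves the remaining ranks blocked. A minimal sketch of the usual "rank 0 does one-time setup, everyone else waits" pattern is below. It uses threading.Barrier as a stand-in for torch.distributed.barrier() (a real run needs one process per rank), and the worker function and rank count are illustrative, not taken from this repository:

```python
import threading

def worker(local_rank: int, barrier: threading.Barrier, results: list) -> None:
    # Non-zero ranks wait at the barrier FIRST, so rank 0 can do
    # the one-time setup (e.g. downloading/preprocessing data) alone.
    if local_rank != 0:
        barrier.wait()  # blocks until rank 0 also reaches the barrier
    else:
        results.append("setup done by rank 0")
        barrier.wait()  # releases the waiting ranks
    # Every rank calls barrier.wait() exactly once -> no deadlock.
    results.append(f"rank {local_rank} proceeding")

world_size = 4
barrier = threading.Barrier(world_size)
results: list = []
threads = [
    threading.Thread(target=worker, args=(rank, barrier, results))
    for rank in range(world_size)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The bug this PR describes is the opposite shape: only some ranks execute the barrier call, so the party count is never reached and the processes that did call it hang. The fix is to make sure every rank in the group executes the barrier, regardless of local_rank.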
