The current code seems to compute the loss on the full global similarity matrix on every GPU. Computing the loss only between each GPU's local features and the gathered global features, as described in openai/CLIP#132, seems more computationally and memory efficient.
Sorry to bother you if I misunderstood the code.
My idea is just like yours. After debugging, I found that during a training epoch every GPU computes the same global loss from the same sim_matrix, instead of each GPU computing its own local loss and then gathering and averaging the results, so there is clear duplicated computation. I also noticed that in the function "train_epoch" there is a useless loss.mean() that does nothing after model.forward(). We only need to compute the local loss, following openai/CLIP#132, and call loss.backward(); gradient synchronization is then handled automatically by DDP (a sketch follows the code reference below).
CLIP4Clip/modules/modeling.py, line 400 (commit 508ffa3)
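For reference, here is a minimal sketch of the local-loss idea from openai/CLIP#132, not the repository's actual code: the function names (gather_features, local_clip_loss) and the assumption that each rank holds video/text features of shape (B, D) are mine. Each rank gathers features from all ranks but computes cross-entropy only over its own local rows, then relies on DDP to average gradients during backward.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def gather_features(feat):
    """All-gather features from every rank; keep the local chunk differentiable."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(feat) for _ in range(world_size)]
    dist.all_gather(gathered, feat)
    # all_gather does not propagate gradients, so splice the local tensor back in
    gathered[dist.get_rank()] = feat
    return torch.cat(gathered, dim=0)

def local_clip_loss(video_feat, text_feat, logit_scale):
    """Contrastive loss computed only for this rank's local samples."""
    rank = dist.get_rank()
    all_video = gather_features(video_feat)   # (B * world_size, D)
    all_text = gather_features(text_feat)     # (B * world_size, D)

    # local-to-global similarities: (B, B * world_size)
    logits_per_video = logit_scale * video_feat @ all_text.t()
    logits_per_text = logit_scale * text_feat @ all_video.t()

    # indices of this rank's samples inside the gathered batch
    batch_size = video_feat.shape[0]
    labels = torch.arange(batch_size, device=video_feat.device) + rank * batch_size

    loss = (F.cross_entropy(logits_per_video, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2
    return loss  # loss.backward() lets DDP all-reduce the gradients
```

With this, each GPU only materializes a (B, B * world_size) similarity matrix instead of the full (B * world_size, B * world_size) one, and no extra loss gathering or averaging is needed beyond DDP's built-in gradient all-reduce.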