Hi,

Thanks for sharing this very interesting work. I had a question about how the bidirectional attention mask is implemented here.

Based on this implementation, it seems that even the padding tokens in a batch get unmasked, whereas they should remain masked in both unidirectional and bidirectional attention. Is my understanding correct?
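For concreteness, here is a minimal PyTorch sketch (not the repository's actual code) of what I would expect: a bidirectional mask built by combining the all-ones attention pattern with the padding mask, so padded positions stay masked in both directions. The function name, shapes, and the Hugging Face style `attention_mask` convention (1 for real tokens, 0 for padding) are my assumptions.

```python
import torch

def bidirectional_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: bidirectional attention that still respects padding.

    attention_mask: (batch, seq_len) with 1 for real tokens, 0 for padding.
    Returns: (batch, 1, seq_len, seq_len) boolean mask where True = may attend.
    """
    # Outer product marks pairs (i, j) where BOTH tokens are real, so
    # padding rows/columns remain masked even without causal masking.
    pair_mask = attention_mask.bool().unsqueeze(2) & attention_mask.bool().unsqueeze(1)
    # Add a head dimension so the mask broadcasts over attention heads.
    return pair_mask.unsqueeze(1)

# Example: batch of 2 sequences, the second padded to length 4.
mask = torch.tensor([[1, 1, 1, 1],
                     [1, 1, 0, 0]])
print(bidirectional_attention_mask(mask)[1, 0])
# The last two rows/columns stay False (masked) despite bidirectional attention.
```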