
Byebye test_batching_equivalence's flakiness #35729

Draft · wants to merge 20 commits into main

Conversation

@ydshieh (Collaborator) commented Jan 16, 2025

What does this PR do?

Goodbye, flakiness 👋

The main parts of the fix

  • ensure the norm layers get patched by the set_xxx_less_flaky methods
  • don't use cosine similarity in the equivalence check. It is a poor metric here because:
    • with very small values (sometimes we get 1e-20 or 1e-39) and the eps=1e-38 used, the computed cosine similarity can fall outside the range [-1, 1] and is not stable (see the sketch after this list)
    • cosine similarity doesn't take magnitude into account
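
A minimal sketch of the first failure mode (the tensors are made up for illustration, and the exact printed number depends on the PyTorch version): for rows made of tiny values, the squared entries underflow to 0 in float32, the norms get clamped by eps, and cosine similarity reports two identical rows as dissimilar, while a magnitude-aware metric like the max absolute difference stays stable:

```python
import torch
import torch.nn.functional as F

# Two identical rows of tiny activations, as can come out of a model
# whose outputs are near zero.
a = torch.full((1, 8), 1e-39)  # float32 subnormal values
b = a.clone()

# The squared entries (1e-78) underflow to 0 in float32, so the norms
# are clamped to eps and the similarity ends up far from 1.0 even
# though a == b exactly - a false "not equivalent" signal.
sim = F.cosine_similarity(a, b, dim=-1, eps=1e-38)
print(sim)  # far from 1.0 (roughly 0.0 or 0.08 depending on version)

# A magnitude-aware check has no such degenerate regime.
max_abs_diff = (a - b).abs().max()
print(max_abs_diff)  # tensor(0.)
```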

Still some flakiness

  • when a model uses max() or topk to compute indices and then uses them in later computation, the result is not stable: a difference even on the order of 1e-9 can still select different indices and cause later values to differ by a larger amount (see the sketch after this list)
  • the timm backbone has a somewhat different weight initialization and may sometimes produce larger intermediate values (so the differences are larger too)
  • AutoformerAttention: it uses topk, and I also believe the computation of tmp_delay is wrong (it does not preserve batch equivalence - I am sure about that, but not sure whether that was the original design)
  • Esmfold: unknown reason
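
A minimal sketch of the index-selection instability (the scores and values are made up for illustration): a perturbation on the order of 1e-9 flips which index topk returns, and anything gathered through that index then differs by far more than the original perturbation:

```python
import torch

# Scores from two numerically "equivalent" runs, differing only by tiny
# floating-point noise (float64 so that a 1e-9 gap is representable).
scores_a = torch.tensor([0.500000001, 0.500000000, 0.1], dtype=torch.float64)
scores_b = scores_a + torch.tensor([-2e-9, 2e-9, 0.0], dtype=torch.float64)

idx_a = torch.topk(scores_a, k=1).indices  # tensor([0])
idx_b = torch.topk(scores_b, k=1).indices  # tensor([1])

# Downstream values gathered via the flipped index diverge far more
# than the 1e-9 noise that caused the flip.
values = torch.tensor([10.0, 20.0, 30.0], dtype=torch.float64)
print(values[idx_a], values[idx_b])  # tensor([10.]) vs tensor([20.])
```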
