
Byebye test_batching_equivalence's flakiness #35729

Draft · wants to merge 20 commits into main

Conversation

@ydshieh (Collaborator) commented Jan 16, 2025

What does this PR do?

Goodbye, flakiness 👋

The main parts of the fix

  • ensure the norm layers get patched by the set_xxx_less_flaky methods
  • don't use cosine similarity in the equivalence check. It is a poor metric here because:
    • with very small values (sometimes we get 1e-20 or 1e-39) and the eps=1e-38 used, the computed cosine similarity can fall outside the range [-1, 1] and is not stable (see the sketch after this list)
    • cosine similarity doesn't take magnitude into account
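
A minimal sketch of the first failure mode (the tensors are made up for illustration, and the exact printed number depends on the PyTorch version): for rows made of tiny values, the squared entries underflow to 0 in float32, the norms get clamped by eps, and cosine similarity reports two identical rows as dissimilar, while a magnitude-aware metric like the max absolute difference stays stable:

```python
import torch
import torch.nn.functional as F

# Two identical rows of tiny activations, as can come out of a model
# whose outputs are near zero.
a = torch.full((1, 8), 1e-39)  # float32 subnormal values
b = a.clone()

# The squared entries (1e-78) underflow to 0 in float32, so the norms
# are clamped to eps and the similarity ends up far from 1.0 even
# though a == b exactly - a false "not equivalent" signal.
sim = F.cosine_similarity(a, b, dim=-1, eps=1e-38)
print(sim)  # far from 1.0 (roughly 0.0 or 0.08 depending on version)

# A magnitude-aware check has no such degenerate regime.
max_abs_diff = (a - b).abs().max()
print(max_abs_diff)  # tensor(0.)
```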

Still some flakiness

  • when a model uses max() or topk to compute indices and then uses them in later computation, the result is not stable: a difference even on the order of 1e-9 can still select different indices and cause later values to differ by a larger amount (see the sketch after this list)
  • the timm backbone has a somewhat different weight initialization and may sometimes produce larger intermediate values (so the differences are larger too)
  • AutoformerAttention: it uses topk, and I also believe the computation of tmp_delay is wrong (it does not preserve batch equivalence - I am sure about that, but not sure whether that was the original design)
  • Esmfold: unknown reason
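
A minimal sketch of the index-selection instability (the scores and values are made up for illustration): a perturbation on the order of 1e-9 flips which index topk returns, and anything gathered through that index then differs by far more than the original perturbation:

```python
import torch

# Scores from two numerically "equivalent" runs, differing only by tiny
# floating-point noise (float64 so that a 1e-9 gap is representable).
scores_a = torch.tensor([0.500000001, 0.500000000, 0.1], dtype=torch.float64)
scores_b = scores_a + torch.tensor([-2e-9, 2e-9, 0.0], dtype=torch.float64)

idx_a = torch.topk(scores_a, k=1).indices  # tensor([0])
idx_b = torch.topk(scores_b, k=1).indices  # tensor([1])

# Downstream values gathered via the flipped index diverge far more
# than the 1e-9 noise that caused the flip.
values = torch.tensor([10.0, 20.0, 30.0], dtype=torch.float64)
print(values[idx_a], values[idx_b])  # tensor([10.]) vs tensor([20.])
```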
