Speed up Bootstrapping Computation #409
Conversation
Hi @JoelNiklaus, thanks a lot for this PR, I'll take a deeper look hopefully before Monday (it's looking good from a first glance, but I want to take some time to test it deeply).
Hi! You get quite a huge difference in the bootstrap compared to the results we hardcoded in our test suite (like, an order of magnitude) - can you check why? (I was expecting a diff of a few decimal points, not something this huge.)
I am trying to run your tests with
It should take some time (around 30 min if you're on CPU), as it first needs to generate a bunch of predictions using a gpt2 model. It will be way faster if you have a GPU available.
I aborted it after 30 min on an A100 GPU, and I only ran the lite version.
Which metrics did you check? BLEU or CHRF? For those, the original version computes corpus-level metrics; I switched to sample-level metrics to speed up computation. There it would make sense to me that the stderr differs, since the resampled corpora differ across bootstrap draws. For the metrics that are already sample-level, I don't see a reason why they should be different.
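A minimal sketch of the two bootstrapping strategies discussed here (illustrative only, not lighteval's actual code; `corpus_metric` is a hypothetical stand-in for a corpus-level scorer such as BLEU):

```python
import random
import statistics


def bootstrap_stderr_corpus(predictions, references, corpus_metric, n_resamples=1000):
    """Corpus-level approach: recompute the corpus metric (e.g. BLEU) on every
    resampled corpus. Expensive, because the metric runs n_resamples times."""
    n = len(predictions)
    scores = []
    for _ in range(n_resamples):
        idx = [random.randrange(n) for _ in range(n)]
        scores.append(corpus_metric([predictions[i] for i in idx],
                                    [references[i] for i in idx]))
    return statistics.stdev(scores)


def bootstrap_stderr_sample_level(sample_scores, n_resamples=1000):
    """Sample-level approach: resample precomputed per-sample scores and take
    their mean. Much cheaper, but for non-decomposable metrics such as corpus
    BLEU or CHRF the resulting stderr can legitimately differ."""
    n = len(sample_scores)
    means = []
    for _ in range(n_resamples):
        idx = [random.randrange(n) for _ in range(n)]
        means.append(statistics.mean(sample_scores[i] for i in idx))
    return statistics.stdev(means)
```

For metrics that are a plain average of per-sample scores the two estimates agree in distribution; for corpus-level metrics like BLEU they generally do not, which is consistent with the stderr differences mentioned above.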
30 min on an A100 is not normal. I wonder if there's an issue with the command you're running. Let me share the raw logs with you. (Do you have the rights to access them, by clicking on "Details" next to the failing test in the check list?)
Thanks for the logs! Unfortunately, for me the test is just hanging and doesn't show any useful output at all.
I think I may have found the problem. The parallelization was happening one level too high, so it had no effect. For me it runs much faster now and the stderrs are very similar. @clefourrier wdyt?
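As an aside, here is a hedged sketch of what parallelizing at the right level can look like (hypothetical code, not the actual change in this PR): the individual bootstrap resamples are farmed out to worker processes, so the pool has plenty of small tasks to keep busy instead of one oversized item.

```python
from concurrent.futures import ProcessPoolExecutor
import random
import statistics


def _resample_mean(args):
    """One bootstrap draw: resample cached per-sample scores and average them."""
    sample_scores, seed = args
    rng = random.Random(seed)
    n = len(sample_scores)
    return statistics.mean(sample_scores[rng.randrange(n)] for _ in range(n))


def bootstrap_stderr_parallel(sample_scores, n_resamples=1000, workers=4):
    """Fan the resamples out across worker processes and collect their means."""
    tasks = [(sample_scores, seed) for seed in range(n_resamples)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        means = list(pool.map(_resample_mean, tasks,
                              chunksize=max(1, n_resamples // workers)))
    return statistics.stdev(means)


if __name__ == "__main__":
    scores = [random.random() for _ in range(500)]
    print(bootstrap_stderr_parallel(scores))
```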
Launching the tests on our CLI - if we're not too far off, I'm OK with updating the numbers there.
Fixes Issue 408.
Adds a fix for the heavy recomputation of sample-level metrics.
No change for corpus-level metrics, but for sample-level metrics we can simply look up the already-computed values instead of recomputing them for each bootstrap sample, which can be prohibitively expensive for heavier metrics such as XCOMET-XXL.
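A minimal sketch of that caching idea (function names are hypothetical, not lighteval's actual API): the expensive metric runs exactly once per sample, and the bootstrap only resamples the cached values.

```python
import random
import statistics


def bootstrap_stderr_with_cache(samples, metric_fn, n_resamples=1000):
    """Compute the sample-level metric once per sample, then bootstrap over the
    cached scores: each resample is a lookup, not a metric recomputation."""
    # Expensive step happens exactly once per sample
    # (e.g. an XCOMET-XXL forward pass).
    cached_scores = [metric_fn(s) for s in samples]

    n = len(cached_scores)
    means = []
    for _ in range(n_resamples):
        # Cheap: just index into the cached list.
        resample = [cached_scores[random.randrange(n)] for _ in range(n)]
        means.append(statistics.mean(resample))
    return statistics.stdev(means)
```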