Pants test timeout includes venv creation overhead causing timeouts #19681
The timeout (as set globally or on individual `python_tests` targets) applies only to the pytest process itself. So is this extra overhead happening within that process? |
Hmm, but I can reproduce what you're talking about, with your gist. |
OK, so this is the cost of seeding the venv from the pex, and does indeed happen in the pytest process. So there is no easy way to avoid this, short of pre-seeding the venv in a separate process. |
I think the problem is wider than just timeouts though. If each test constructs its own bespoke venv, but they all include pytorch, each test will incur this penalty on first run. And in CI first run is the only run. One solution, when there are many tests, is to run tests in batches: https://www.pantsbuild.org/docs/python-test-goal#batching-and-parallelism This doesn't help the single test cold start time, but it amortizes that cold start over many tests. |
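A sketch of that batching approach, assuming the `batch_compatibility_tag` field described in the linked docs (the tag value here is arbitrary, chosen only for illustration):

```python
# BUILD file sketch (assuming Pants >= 2.15): tests that share the same
# batch_compatibility_tag may be run in one batched pytest process, so the
# torch-heavy venv is materialized once per batch instead of once per test.
python_tests(
    name="tests",
    batch_compatibility_tag="torch-suite",  # arbitrary label; equal values may batch together
)
```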
You're completely right; from the user side we can take some steps to alleviate this. We often apply the
Maybe @jsirois has some idea? I wonder if the venv construction for large library dependencies can be somehow optimized. |
Have you tried setting this to True?: https://www.pantsbuild.org/docs/reference-pex#venv_use_symlinks |
This doesn't seem to make a difference in this case. |
Ok, makes sense from a clean Pex cache. I'm not sure there is any way to speed up venv creation from a cold start save for populating the venv site-packages in parallel. That step is currently serial. I can give that a quick try in the am and report timing differences for your requirement set case in the gist. |
PS to get the gist to work, I had to add
2.16.x still has broad default ICs, which in my case captured a 3.7 interpreter, which is incompatible with the
Ok, thanks. I'm going to cut out Pants in my testing, but the IC will still be relevant. |
Not that this helps in CI, but you should see a difference if your nuking command is So you get a sense of what's going on. |
I mean, stopping real fast here, just looking at the parallel downloads from the lock file:
Now, this is on my home network, but ~54s (1/2 the total time) just to download torch. As a sanity check I find:
So the contended (parallel) Pip download Pex does is slower than a straight curl of that one wheel, but not by much. Pex absolutely cannot speed that up in any way whatever besides caching, and this use case is ruling out a cache. With a cache (re-run the last command), and:
... more analysis to come re the remaining ~60s. |
We can, however, pre-seed the venv, so that at least the test timeout only applies to the pytest run, and not the venv seeding. This would solve the narrow problem the OP started with, and can be done entirely in Pants. |
Sure - yes. Save 0 time, but save a timeout. I'll continue to dig on the raw performance aspect on the Pex side to see what can be juiced. |
Fair enough, but we have a lot of code that depends on torch, and we only need to download torch once for our lockfile per CI run, whereas multiple different pex files will be generated, and ?unzipped? as part of their first pex execution. This can be limited with |
Ok, the lion's share (~90%) of the remaining time is creating the (compressed)
So the lion's share of the overall time is dominated by the long pole of the download of torch in a parallel download of all needed wheels and the zip compression of packed wheel zips. The latter is serial; so that could be parallelized - but stepping back and trying a native zip of just the torch installed wheel shows:
So there is not much to be gained there in this case. |
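To get a feel for that compression cost without Pex in the loop, a rough sketch like the following (the torch install path is an assumption) times a deflate-compressed zip of an installed torch tree:

```python
# Rough timing sketch, not the exact commands used in this thread: measure how
# long a deflate-compressed zip of an installed torch tree takes to create.
import shutil
import time

TORCH_DIR = ".venv/lib/python3.10/site-packages/torch"  # hypothetical install location

start = time.monotonic()
# shutil.make_archive uses ZIP_DEFLATED when zlib is available, which roughly
# mirrors the compression cost of building packed wheel zips.
shutil.make_archive("torch-packed", "zip", TORCH_DIR)
print(f"zipping {TORCH_DIR} took {time.monotonic() - start:.1f}s")
```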
@JoostvDoorn I'll let you continue with @benjyw down the path of not speeding things up in reality, but avoiding the test timeout with accounting tricks. To meaningfully speed venv creation up for this example, you definitely need to allow for caching something. |
You can control the batch size, to trade off concurrency vs this overhead. |
This is where @jriddy's work on "provided dependencies" might be valuable. If your CI image could provide torch preinstalled in site-packages... |
I think that work brings us full circle: pex-tool/pex#2097 |
I think what this really all says is Pants probably needs to stop playing whac-a-mole and instead have someone fully embrace that they are a torch user and look at the shortcomings holistically and actually solve them all. From lock issues to speed issues. |
Yeah I'm aware of that one. I want to get to it, just trying to find when I have the time
I'm not fully sure what you mean by this, but in my experience the overlap between the actual users of torch and people who have any interest in deterministic builds is pretty small. Often ML teams have a handful of enabler engineers with far less background in ML that are trying to build tooling and pipelines and generally hold the whole dev shop side of things together. So it will take that rare someone that is both a genuine user of Torch and someone who is willing to wade through the nitty-gritty of weird native code + python packaging to understand the problem holistically. I think "provided dependencies" + environments is probably a better way to handle this longer term, because torch and tensorflow aren't the only things like this. So giving some way to provide something resembling determinism while acquiescing to the reality of how some packages/ecosystems/organizations work seems to be a reasonable way out for these cases that we can never really solve perfectly through the pip ecosystem. |
I mean just that - Pants has historically been composed of committers that basically have no clue about the things they're integrating. For example, I learned Python packaging on the job over the course of ~10 years. I had no clue going in. We have had, FWICT, no Pants committer dig in and learn / live ML to produce a useful experience. I agree this is hard, I'm just pointing out it's a huge problem and has been for a very long time. Pants has Pants experts and next to no Pants use case problem domain experts. It's very hard being a middleman tool but not having the middleman expertise, but that is the position Pants is in. Much better would be if there were a way to get out of the middle somehow. Then the ecosystem expert could know little about Pants but leverage their ecosystem. Right now it's a bit of the worst of both worlds / you have to know both worlds to do good work. |
Okay that makes sense. I actually have been the glue guy on an ML team, and I’m currently dealing with an RPM shop where we can’t use Pex conventionally. So I think I do have a bit of domain expertise here. I just struggle to carve out enough time to actually make these contributions. 😅 |
100%. This has been one of the biggest challenges of expanding Pants into use-cases the current maintainers don't understand. To make a useful plugin for X you need to know X and Pants, and that intersection is small to empty for most X. |
Specifically for the initial problem of the test timeout being consumed by the venv creation overhead: @JoostvDoorn , do you want to take a crack at that? The idea is: right before running pytest we run a process in the same venv (so same params as the pytest process) that does enough work to create the venv, but doesn't actually run the test. Actually, now that I think of it, this is similar to #19430. Maybe we wait for that one, then unify into one "force venv to materialize" rule. |
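Not the actual Pants implementation, just a rough sketch of the idea from outside Pants: the PEX path is hypothetical, and `PEX_INTERPRETER=1` (a standard Pex runtime variable) is used here to run a trivial command through the same PEX without invoking pytest:

```python
# Conceptual sketch only, not the Pants-internal rule: pay the venv-seeding cost
# in an untimed warm-up step so the test timeout covers only the pytest run.
import os
import subprocess
import time

PYTEST_RUNNER = "./pytest_runner.pex"  # hypothetical path to the venv-mode test runner PEX

warmup_env = dict(os.environ, PEX_INTERPRETER="1")  # bypass the pytest entry point
start = time.monotonic()
# Untimed warm-up: any trivial command forces the PEX to materialize its venv.
subprocess.run([PYTEST_RUNNER, "-c", "pass"], env=warmup_env, check=True)
print(f"venv seeding took {time.monotonic() - start:.1f}s (outside the test timeout)")

# Timed run: pytest now starts against an already-seeded venv.
subprocess.run([PYTEST_RUNNER, "tests/"], check=True, timeout=60)
```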
Seems doable from my side. Will have a look at this sometime after this is merged. |
OK! So #19430 is now merged, and I think we want to generalize it to a "force a venv to exist" rule, that we don't cache. See the conversation in that PR for more context. |
All changes:
- https://github.com/pantsbuild/pex/releases/tag/v2.1.153
- https://github.com/pantsbuild/pex/releases/tag/v2.1.154
- https://github.com/pantsbuild/pex/releases/tag/v2.1.155

Highlights:
- `--no-pre-install-wheels` (and `--max-install-jobs`) that likely helps with:
  - #15062
  - (the root cause of) #20227
  - _maybe_ arguably #18293, #18965, #19681
- improved shebang selection, helping with #19514, but probably not the full solution (#19925)
- performance improvements
https://github.com/pantsbuild/pex/releases/tag/v2.1.156 Continuing from #20347, this brings additional performance optimisations, particularly for large wheels like PyTorch, and so may help with #18293, #18965, #19681
Highlighted changelogs:
* https://github.com/pex-tool/pex/releases/tag/v2.4.0
* https://github.com/pex-tool/pex/releases/tag/v2.4.1
* https://github.com/pex-tool/pex/releases/tag/v2.5.0
* https://github.com/pex-tool/pex/releases/tag/v2.6.0
* https://github.com/pex-tool/pex/releases/tag/v2.7.0
* https://github.com/pex-tool/pex/releases/tag/v2.8.0
* https://github.com/pex-tool/pex/releases/tag/v2.8.1
* https://github.com/pex-tool/pex/releases/tag/v2.9.0
* https://github.com/pex-tool/pex/releases/tag/v2.10.0
* https://github.com/pex-tool/pex/releases/tag/v2.10.1
* https://github.com/pex-tool/pex/releases/tag/v2.11.0 (!)

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]

== Upgraded dependencies ==

certifi        2024.6.2  --> 2024.7.4
exceptiongroup 1.2.1     --> 1.2.2
pex            2.3.3     --> 2.11.0
pydantic       1.10.16   --> 1.10.17
```

NOTE: This updates the scrupulously backwards compatible Pex, but does not use any new features yet. The minimum version is unchanged.

Possibly relevant for: #15704 #21103 #20852 #15454 #21100 #11324 #19256 #19552 #19681
Describe the bug
When a `python_tests` target has heavy dependencies (e.g. pytorch), a test can time out on a "cold" execution, because the first execution performs additional work to create the venv. This can be significant: we observe a 'test time' of >30s for a test that takes 3s with a warm cache.

One solution would be to exclude this additional venv creation time from the test timeout and only measure the actual execution time of the python process. This would make the timeout argument for `python_tests` more useful when large, complex dependencies are used.

Not sure if there is a performance component that can be solved; potentially that should then be an issue in the pex repository.

Pants version
2.16.0 and 2.17.0rc5

OS
MacOS + Linux

Additional info
See gist for example: https://gist.github.com/JoostvDoorn/9c0f63ed5198544a36b477502eeac4fb

To test:
Second execution using `pants test :: --keep-sandboxes=on_failure --test-force` is significantly faster:

The bug here is that the test execution timing should reflect the actual time it takes to execute the test, and not the creation of the virtual environment.

For the gist example the expectation is that it would always work regardless of the state of the cache:
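The gist is not reproduced above, but a minimal setup along these lines (hypothetical contents and timeout value, not the gist's exact code) shows the shape of the problem:

```python
# BUILD -- hypothetical minimal reproduction, not the gist's exact contents.
python_requirement(name="torch", requirements=["torch"])

python_tests(
    name="tests",
    dependencies=[":torch"],
    timeout=30,  # ample for the ~3s warm run, but consumed by cold venv seeding
)
```

```python
# test_torch_import.py -- trivial test; on a cold cache most of the wall time
# goes to seeding the venv that contains torch, not to the test itself.
import torch

def test_tensor_sum():
    assert torch.ones(2).sum().item() == 2.0
```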