-
-
Notifications
You must be signed in to change notification settings - Fork 921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
attempt to also run e2e tests that needs gpus #1070
Conversation
Hopefully this won't block CI pipeline too much |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like something is wrong with this command see the CI logs
|
@winglian I dont have access to the hosted runner, but I think that we need to set nvidia runtime as the default (just trying to interpret the error message). On your self-hosted runner, in {
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
} If this is not the case, make the change and then restart docker by running
|
639de26
to
3f8a9d1
Compare
ensure wandb is dissabled for docker pytests clear wandb env after testing clear wandb env after testing make sure to provide a default val for pop tryin skipping wandb validation tests explicitly disable wandb in the e2e tests explicitly report_to None to see if that fixes the docker e2e tests split gpu from non-gpu unit tests skip bf16 check in test for now build docker w/o cache since it uses branch name ref revert some changes now that caching is fixed skip bf16 check if on gpu w support
a8d3c31
to
bd57059
Compare
* attempt to also run e2e tests that needs gpus * fix stray quote * checkout specific github ref * dockerfile for tests with proper checkout ensure wandb is dissabled for docker pytests clear wandb env after testing clear wandb env after testing make sure to provide a default val for pop tryin skipping wandb validation tests explicitly disable wandb in the e2e tests explicitly report_to None to see if that fixes the docker e2e tests split gpu from non-gpu unit tests skip bf16 check in test for now build docker w/o cache since it uses branch name ref revert some changes now that caching is fixed skip bf16 check if on gpu w support * pytest skip for auto-gptq requirements * skip mamba tests for now, split multipack and non packed lora llama tests * split tests that use monkeypatches * fix relative import for prev commit * move other tests using monkeypatches to the correct run
No description provided.