Update Docker image and add CPU torch installation #5682

Open
wants to merge 8 commits into
base: master

Collaborator

@ManfeiBai ManfeiBai commented Oct 6, 2023

Add this command to install libtpu.so. It fixes the error "INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory" when using the Docker image.

@@ -336,6 +340,7 @@ function main() {
build_and_install_torch
pushd xla
build_and_install_torch_xla
install_libtpu
Collaborator

I don't think we use this script anymore?

Collaborator Author

This script is used when building the Docker image from https://github.com/pytorch/xla/blob/master/docker/Dockerfile: https://github.com/pytorch/xla/blob/bf47009582d513d4068478f0e0e372657e1cabb8/docker/Dockerfile#L58C37-L58C58

Adding the install_libtpu function might fix "INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory". We'll see from the CI test results.
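
For anyone reproducing the error above, here is a minimal sketch of checking whether libtpu.so is present inside a container. The helper name find_libtpu and the directories passed to it are hypothetical; they are not part of this PR or of torch_xla, and the helper only searches the directories it is given.

```shell
# Hypothetical helper (not part of this PR): print the path of the first
# libtpu.so found under any of the given directories, or return non-zero
# if none of them contains one.
find_libtpu() {
  local dir hit
  for dir in "$@"; do
    # Skip arguments that are not existing directories.
    [ -d "$dir" ] || continue
    hit="$(find "$dir" -name 'libtpu.so' 2>/dev/null | head -n 1)"
    if [ -n "$hit" ]; then
      printf '%s\n' "$hit"
      return 0
    fi
  done
  return 1
}

# Example call; the directory list is an assumption, adjust it for your image:
# find_libtpu /usr/lib /usr/local/lib || echo "libtpu.so not found"
```

A non-zero return here would be consistent with the "cannot open shared object file" failure, although the loader may also look in locations this sketch does not cover.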

Collaborator

Right, but I thought we don't use that Dockerfile now, given that our Docker build is based on https://github.com/pytorch/xla/tree/master/infra. @will-cromar, am I missing something?

Collaborator

No, this Dockerfile is not used anymore. I don't think we even use it in the TPU CI.

Collaborator Author

Thanks, I've updated the PR to modify the Dockerfile under https://github.com/pytorch/xla/tree/master/infra instead.

@ManfeiBai ManfeiBai requested a review from qihqi October 6, 2023 18:20
@ManfeiBai ManfeiBai changed the title from "Update build_torch_wheels.sh to install libtpu.so" to "Update Dockerfile to install libtpu.so" Oct 6, 2023
@will-cromar
Collaborator

I'm not sure I understand the context.

Nightly wheels already have libtpu bundled, so they won't need this. Is this a problem on the release images? This image tag will have libtpu already installed: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_libtpu_3.10_tpuvm

@ManfeiBai
Collaborator Author

> I'm not sure I understand the context.
>
> Nightly wheels already have libtpu bundled, so they won't need this. Is this a problem on the release images? This image tag will have libtpu already installed: us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_libtpu_3.10_tpuvm

When I tried the Docker image as a user, following https://github.com/pytorch/xla#docker, I installed us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_tpuvm and found that libtpu is not installed inside it, so I wanted to add a command to install libtpu.

Thanks for pointing out the image us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_libtpu_3.10_tpuvm. Do we want to change the suggested 2.1 Docker image to that one for the TPU VM use case? From my side, users will most likely need libtpu if they use this Docker image.
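
To illustrate the difference between the two tags discussed here, a small sketch that flags whether an image reference carries the _libtpu_ marker in its tag. The helper image_bundles_libtpu is hypothetical, and treating the marker as a sign that libtpu is bundled is an assumption based only on the two tags named in this thread, not a documented naming rule.

```shell
# Hypothetical helper: succeed if the tag part of an image reference
# contains "_libtpu_". ASSUMPTION: this naming convention is inferred from
# the two tags quoted in this thread and may not hold for other releases.
image_bundles_libtpu() {
  case "$1" in
    *:*_libtpu_*) return 0 ;;
    *) return 1 ;;
  esac
}

REGISTRY="us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla"

# Compare the tag that was documented with the tag suggested in review.
for tag in r2.1.0_3.10_tpuvm r2.1.0_libtpu_3.10_tpuvm; do
  if image_bundles_libtpu "${REGISTRY}:${tag}"; then
    echo "${tag}: libtpu bundled"
  else
    echo "${tag}: libtpu not bundled"
  fi
done
```

The check only inspects the tag string; it does not pull or inspect the image itself.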

@will-cromar
Collaborator

Yeah, that's my bad. Thanks for catching it. Let's update the documented image to the r2.1.0_libtpu... one

@ManfeiBai
Collaborator Author

> Yeah, that's my bad. Thanks for catching it. Let's update the documented image to the r2.1.0_libtpu... one

Thanks, updated.

Do we want to build a wheel for py38 with libtpu too? Currently the 2.1 wheel for py38 is https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl

@ManfeiBai ManfeiBai changed the title from "Update Dockerfile to install libtpu.so" to "Update Docker image and add CPU torch installation" Oct 6, 2023
@ManfeiBai ManfeiBai marked this pull request as ready for review October 9, 2023 22:52
3 participants