failed to get cpuinfo on aws lambda arm64 #143

Open
kartheekgottipati opened this issue Apr 11, 2023 · 8 comments
kartheekgottipati commented Apr 11, 2023

AWS Lambda
Arm64
pytorch 2.0.0

When running PyTorch 2.0.0 on AWS Lambda (arm64), I get the following error:

[WARNING] 2023-04-10T23:55:34.026Z RUNNING WITH 1 threads
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at ThreadPool.cc:44] cpuinfo_initialize(). cpuinfo initialization failed
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x50 (0xffff70e7ca90 in /var/task/torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, char const*, void const*) + 0x50 (0xffff70e7cc30 in /var/task/torch/lib/libc10.so)
frame #2: <unknown function> + 0x2c8cc78 (0xffff73b6ac78 in /var/task/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x2c8fb64 (0xffff73b6db64 in /var/task/torch/lib/libtorch_cpu.so)
frame #4: at::set_num_threads(int) + 0x2c (0xffff71bc12bc in /var/task/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x58d698 (0xffff7980f698 in /var/task/torch/lib/libtorch_python.so)

frame #63: __libc_start_main + 0xe8 (0xffff84323e18 in /lib/aarch64-linux-gnu/libc.so.6)
START RequestId: 6b21fcf4-19b2-45cc-83e4-74a2cefe6bad Version: $LATEST
RequestId: 6b21fcf4-19b2-45cc-83e4-74a2cefe6bad Error: Runtime exited with error: signal: aborted
Runtime.ExitError

Neither x86_64 nor arm64 has access to these files on AWS Lambda, but on x86_64 the failure is ignored and execution proceeds, whereas on arm64 it fails with the error above.

Is there a reason this is logged as an error on arm64 but only as a warning elsewhere?
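
To confirm what the runtime actually exposes, the two sysfs files named in the log can be checked directly from inside the function. This is a minimal diagnostic sketch, assuming Python is available in the Lambda image; the helper names `describe_cpu_list_file` and `report` are made up for illustration and are not part of PyTorch or cpuinfo.

```python
import os
from pathlib import Path

# Paths taken from the cpuinfo error messages above.
CPU_LIST_FILES = (
    "/sys/devices/system/cpu/possible",
    "/sys/devices/system/cpu/present",
)


def describe_cpu_list_file(path):
    """Return a short status string for one sysfs CPU list file (hypothetical helper)."""
    try:
        text = Path(path).read_text().strip()
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission denied"
    except OSError as exc:
        # Any other I/O problem, e.g. the path is a directory.
        return f"unreadable ({exc.strerror})"
    return f"content={text!r}" if text else "empty"


def report(paths=CPU_LIST_FILES):
    """Map each path to its status string."""
    return {path: describe_cpu_list_file(path) for path in paths}


if __name__ == "__main__":
    for path, status in report().items():
        print(f"{path}: {status}")
    # What Python itself believes the CPU count to be, for comparison.
    print("os.cpu_count():", os.cpu_count())
```

Running this on both the x86_64 and arm64 runtimes should show whether the files are missing, empty, or unreadable in each, which is the condition the log reports as a parse failure.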

@subhankar-trisetra

I'm having the same issue.

jc-hdez commented Oct 23, 2023

I am having the same issue with torch 2.1.0.

@thecasual

Any update?

@stephenswetonic

I believe the issue is with onnxruntime itself and is still not resolved. I'm going to try x86 for now.

@malfet
Contributor

malfet commented Apr 6, 2024

In a sense this is expected behavior: the Lambda runtime doesn't want to leak hardware details to hosted processes, so cpuinfo fails to initialize. The PyTorch crash should still be fixed, though.
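
For illustration, the kind of graceful fallback being asked for can be sketched in Python. This is not cpuinfo's or PyTorch's actual code; `parse_cpu_list` and `count_cpus` are hypothetical names. It assumes the files use the simple kernel CPU list format (comma-separated indices and ranges such as `0-3,5`) and falls back to `os.cpu_count()` instead of aborting when a file is missing or unparsable.

```python
import os


def parse_cpu_list(text):
    """Parse a simple kernel-style CPU list such as "0-3,5" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
            if end < start:
                raise ValueError(f"invalid CPU range: {part!r}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    if not cpus:
        raise ValueError("empty CPU list")
    return sorted(cpus)


def count_cpus(path="/sys/devices/system/cpu/possible"):
    """Hypothetical helper: count CPUs from sysfs, falling back instead of failing."""
    try:
        with open(path) as f:
            return len(parse_cpu_list(f.read()))
    except (OSError, ValueError):
        # File hidden by the sandbox or unparsable: degrade gracefully.
        return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs:", count_cpus())
```

The design point is only that a hidden or empty sysfs file leads to a conservative default rather than a hard failure.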

@StanislavMakhrov

Is there any SLA for solving this bug? The issue was opened more than a year ago.
@soumith, @apaszke, @suo, could you please help?

@pluiedev

Also problematic in restricted build environments (like Nix) that don't expose /sys/devices/system/cpu/{possible,present} to prevent packages from relying on the specific hardware configuration of the build system.

nywhere commented Sep 2, 2024

Any update?
