
[Bug]: Inference is exceptionally slow on the L20 GPU #10652

Open
1 task done
joey9503 opened this issue Nov 26, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@joey9503
Your current environment

PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Alibaba Group Enterprise Linux Server 7.2 (Paladin) (x86_64)
GCC version: (GCC) 9.2.1 20200522 (Alibaba 9.2.1-3 2.17)
Clang version: Could not collect
CMake version: version 3.20.1
Libc version: glibc-2.30

Python version: 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.9.151-015.ali3000.alios7.x86_64-x86_64-with-glibc2.30
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: L20-2-PCIE-48GB-48GB-L-H-V
Nvidia driver version: 535.161.08
cuDNN version: Probably one of the following:
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.3
/...es (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

LD_LIBRARY_PATH=/home/t4/huse/images/tansformers-gpu.1d12f125-1d2d-464b-9ea7-2c616fd2e256/1/spark-3.2.0-dag-snapshotTest-20241030180001-SNAPSHOT-py39/python/lib/conda/lib/python3.9/site-packages/cv2/../../lib64:/home/t4/huse/images/tansformers-gpu.1d12f125-1d2d-464b-9ea7-2c616fd2e256/1/spark-3.2.0-dag-snapshotTest-20241030180001-SNAPSHOT-py39/python/lib/conda/lib/python3.9/site-packages/nvidia/nvjitlink/lib:/dev/shm/xpdk-lite:/dev/shm/xpdk-lite:/opt/conda/lib/python3.8/site-packages/aistudio_common/reader/libs/:/opt/taobao/java/jre/lib/amd64/server/:/usr/local/cuda/lib64:/usr/local/TensorRT-8.6.1/lib/:/usr/local/lib:/usr/local/lib64:/opt/ai-inference/
NVIDIA_VISIBLE_DEVICES=GPU-a18c8a32-ef6d-8d76-5a61-7a81da133490
NVIDIA_DRIVER_CAPABILITIES=utility,compute
CUDA_MPS_PIPE_DIRECTORY=/dev/shm/nvidia-mps
CUDA_MODULE_LOADING=LAZY

Model Input Dumps

No response

🐛 Describe the bug

Inference is exceptionally slow on the L20 GPU, at 0.08 tokens/s:
[Screenshot 2024-11-25 15 35 43]
and GPU utilization is low:
[Screenshot 2024-11-25 14 24 49]
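For reference, a throughput figure like the 0.08 tokens/s above can be reproduced from raw timings with a small helper (a minimal sketch; `generate_fn` is a stand-in for whatever actually produces the tokens, e.g. a vLLM `LLM.generate` call — the dummy below only simulates it):

```python
import time

def tokens_per_second(generate_fn, prompt):
    """Time a single generation call and report throughput.

    generate_fn is a stand-in for the real inference call;
    it should return the list of generated token ids.
    """
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Dummy generator that "produces" 8 tokens in ~0.1 s,
# standing in for the actual model call:
def dummy_generate(prompt):
    time.sleep(0.1)
    return list(range(8))

rate = tokens_per_second(dummy_generate, "hello")
print(f"{rate:.1f} tokens/s")
```

Measuring this way (wall-clock over generated-token count) matches how per-request throughput is usually reported, and makes it easy to compare against the expected rate for the hardware.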

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
joey9503 added the bug label on Nov 26, 2024