
[Bug]: Inference is exceptionally slow on the L20 GPU #10652

Open
1 task done
joey9503 opened this issue Nov 26, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@joey9503
Your current environment

PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Alibaba Group Enterprise Linux Server 7.2 (Paladin) (x86_64)
GCC version: (GCC) 9.2.1 20200522 (Alibaba 9.2.1-3 2.17)
Clang version: Could not collect
CMake version: version 3.20.1
Libc version: glibc-2.30

Python version: 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.9.151-015.ali3000.alios7.x86_64-x86_64-with-glibc2.30
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: L20-2-PCIE-48GB-48GB-L-H-V
Nvidia driver version: 535.161.08
cuDNN version: Probably one of the following:
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.3
/...es (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

LD_LIBRARY_PATH=/home/t4/huse/images/tansformers-gpu.1d12f125-1d2d-464b-9ea7-2c616fd2e256/1/spark-3.2.0-dag-snapshotTest-20241030180001-SNAPSHOT-py39/python/lib/conda/lib/python3.9/site-packages/cv2/../../lib64:/home/t4/huse/images/tansformers-gpu.1d12f125-1d2d-464b-9ea7-2c616fd2e256/1/spark-3.2.0-dag-snapshotTest-20241030180001-SNAPSHOT-py39/python/lib/conda/lib/python3.9/site-packages/nvidia/nvjitlink/lib:/dev/shm/xpdk-lite:/dev/shm/xpdk-lite:/opt/conda/lib/python3.8/site-packages/aistudio_common/reader/libs/:/opt/taobao/java/jre/lib/amd64/server/:/usr/local/cuda/lib64:/usr/local/TensorRT-8.6.1/lib/:/usr/local/lib:/usr/local/lib64:/opt/ai-inference/
NVIDIA_VISIBLE_DEVICES=GPU-a18c8a32-ef6d-8d76-5a61-7a81da133490
NVIDIA_DRIVER_CAPABILITIES=utility,compute
CUDA_MPS_PIPE_DIRECTORY=/dev/shm/nvidia-mps
CUDA_MODULE_LOADING=LAZY

Model Input Dumps

No response

🐛 Describe the bug

Inference is exceptionally slow on the L20 GPU, at 0.08 tokens/s:
[Screenshot 2024-11-25 15 35 43]
and GPU utilization is low:
[Screenshot 2024-11-25 14 24 49]
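For reference, a throughput figure like the 0.08 tokens/s above can be reproduced from raw timings with a small helper (a minimal sketch; `generate_fn` is a stand-in for whatever actually produces the tokens, e.g. a vLLM `LLM.generate` call — the dummy below only simulates it):

```python
import time

def tokens_per_second(generate_fn, prompt):
    """Time a single generation call and report throughput.

    generate_fn is a stand-in for the real inference call;
    it should return the list of generated token ids.
    """
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Dummy generator that "produces" 8 tokens in ~0.1 s,
# standing in for the actual model call:
def dummy_generate(prompt):
    time.sleep(0.1)
    return list(range(8))

rate = tokens_per_second(dummy_generate, "hello")
print(f"{rate:.1f} tokens/s")
```

Measuring this way (wall-clock over generated-token count) matches how per-request throughput is usually reported, and makes it easy to compare against the expected rate for the hardware.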

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
joey9503 added the bug label on Nov 26, 2024