Your current environment
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Alibaba Group Enterprise Linux Server 7.2 (Paladin) (x86_64)
GCC version: (GCC) 9.2.1 20200522 (Alibaba 9.2.1-3 2.17)
Clang version: Could not collect
CMake version: version 3.20.1
Libc version: glibc-2.30
Python version: 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-4.9.151-015.ali3000.alios7.x86_64-x86_64-with-glibc2.30
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: L20-2-PCIE-48GB-48GB-L-H-V
Nvidia driver version: 535.161.08
cuDNN version: Probably one of the following:
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.3
/usr/local/cuda/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.3
/...es (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
LD_LIBRARY_PATH=/home/t4/huse/images/tansformers-gpu.1d12f125-1d2d-464b-9ea7-2c616fd2e256/1/spark-3.2.0-dag-snapshotTest-20241030180001-SNAPSHOT-py39/python/lib/conda/lib/python3.9/site-packages/cv2/../../lib64:/home/t4/huse/images/tansformers-gpu.1d12f125-1d2d-464b-9ea7-2c616fd2e256/1/spark-3.2.0-dag-snapshotTest-20241030180001-SNAPSHOT-py39/python/lib/conda/lib/python3.9/site-packages/nvidia/nvjitlink/lib:/dev/shm/xpdk-lite:/dev/shm/xpdk-lite:/opt/conda/lib/python3.8/site-packages/aistudio_common/reader/libs/:/opt/taobao/java/jre/lib/amd64/server/:/usr/local/cuda/lib64:/usr/local/TensorRT-8.6.1/lib/:/usr/local/lib:/usr/local/lib64:/opt/ai-inference/
NVIDIA_VISIBLE_DEVICES=GPU-a18c8a32-ef6d-8d76-5a61-7a81da133490
NVIDIA_DRIVER_CAPABILITIES=utility,compute
CUDA_MPS_PIPE_DIRECTORY=/dev/shm/nvidia-mps
CUDA_MODULE_LOADING=LAZY
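The environment above has CUDA_MPS_PIPE_DIRECTORY set, which suggests the GPU is shared through NVIDIA MPS; under MPS, CUDA_MPS_ACTIVE_THREAD_PERCENTAGE can cap the SM share each client process gets, which would be consistent with low GPU utilization. A minimal diagnostic sketch (pure environment inspection, no GPU calls; whether MPS explains this particular slowdown is an assumption to verify, not a confirmed diagnosis):

```python
import os

def mps_settings(environ=os.environ):
    """Return the CUDA MPS knobs that can limit per-process GPU utilization."""
    keys = (
        "CUDA_MPS_PIPE_DIRECTORY",           # set => clients are routed through MPS
        "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE", # caps the SM share per MPS client
    )
    return {k: environ.get(k, "unset") for k in keys}

# Mirrors the reported environment: the pipe directory is set,
# the thread-percentage cap is not shown in the dump.
print(mps_settings({"CUDA_MPS_PIPE_DIRECTORY": "/dev/shm/nvidia-mps"}))
```

If the thread-percentage cap turns out to be set to a small value on this host, that alone could throttle throughput regardless of vLLM configuration.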
Model Input Dumps
No response
🐛 Describe the bug
Inference is exceptionally slow on the L20 GPU (about 0.08 tokens/s), and GPU utilization is low.
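To put the reported rate in perspective, a small helper computing decode throughput from generated tokens and wall-clock time (the numbers are illustrative, chosen to mirror the reported ~0.08 tokens/s, not measured here):

```python
def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / elapsed_s

# At the reported rate, 100 output tokens take about 21 minutes.
print(tokens_per_second(100, 1250.0))  # 0.08
```

At 0.08 tokens/s, even a short 100-token reply takes over 20 minutes, so this is far below any expected L20 throughput rather than a marginal regression.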