
Server stuck at model warming phase for codestral-22b on 4xH100 #2835

Open
2 of 4 tasks
phymbert opened this issue Dec 13, 2024 · 0 comments
phymbert commented Dec 13, 2024

System Info

TGI version 3.0.1, official Docker image: thanks for the amazing recent releases 🤗

Running within a Kubernetes deployment with a 256Gi memory request and an shm volume.

Prefix caching and chunking enabled.

Works fine on 2xH100 but not on 4, i.e. with CUDA_VISIBLE_DEVICES=0,1,2,3.

Loading llama3.1-70b works fine on the same config with 4xH100.

Information

- [x] Docker
- [ ] The CLI directly

Tasks

- [x] An officially supported command
- [ ] My own modifications

Reproduction

Start TGI with codestral-22b on 4 H100s; it gets stuck at the model warming phase.
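
For reference, a minimal launch sketch, assuming the official 3.0.1 image and the mistralai/Codestral-22B-v0.1 checkpoint; the data volume path and port mapping here are illustrative, not the exact Kubernetes deployment described above:

```shell
# Sketch only: the model ID, data volume, and port are illustrative
# assumptions, not the exact deployment from this report.
docker run --rm --gpus all --shm-size 1g \
  -e CUDA_VISIBLE_DEVICES=0,1,2,3 \
  -p 8080:80 \
  -v "$PWD/tgi-data:/data" \
  ghcr.io/huggingface/text-generation-inference:3.0.1 \
  --model-id mistralai/Codestral-22B-v0.1 \
  --num-shard 4

# The same command with CUDA_VISIBLE_DEVICES=0,1 and --num-shard 2
# warms up and serves requests without hanging.
```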

Expected behavior

Auto-configuration completes and the model warms up for codestral-22b on 4 H100s, just as it does on 2.
