Steps to reproduce
type: service
# The name is optional, if not specified, generated randomly
name: llama31
# Using a Docker image with a fix instead of the official one
# More details at https://github.com/huggingface/optimum-tpu/pull/87
image: dstackai/optimum-tpu:llama31
# Required environment variables
env:
  - HF_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
  - MAX_TOTAL_TOKENS=4096
  - MAX_BATCH_PREFILL_TOKENS=4095
commands:
  - text-generation-launcher --port 8000
# Expose the TGI port
port: 8000
model: meta-llama/Meta-Llama-3.1-8B-Instruct
resources:
  # Required resources
  gpu: v5litepod-4
The failure occurs with both spot and on-demand instances.
Actual behaviour
dstack apply -f tpu/tgi.dstack.yml
Project main
User admin
Configuration tpu/tgi.dstack.yml
Type service
Resources 2..xCPU, 8GB.., 1xv5litepod-4, 100GB.. (disk)
Max price -
Max duration -
Spot policy on-demand
Retry policy no
Creation policy reuse-or-create
Termination policy destroy-after-idle
Termination idle time 5m
#  BACKEND  REGION       INSTANCE     RESOURCES                      SPOT  PRICE
1  gcp      us-central1  v5litepod-4  1xv5litepod-4, 100.0GB (disk)  no    $4.8
2  gcp      us-east5     v5litepod-4  1xv5litepod-4, 100.0GB (disk)  no    $4.8
3  gcp      us-south1    v5litepod-4  1xv5litepod-4, 100.0GB (disk)  no    $4.8
   ...
Shown 3 of 8 offers, $6.24 max
Finished run llama31 already exists.
Override the run? [y/n]: y
llama31 provisioning completed (terminating)
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and
server logs for more details.
Expected behaviour
No response
dstack version
master
Server logs
[20:17:29] INFO dstack._internal.server.services.backends:404 Requesting instance offers from backends:
['runpod', 'cudo', 'lambda', 'gcp', 'aws', 'azure']
[20:17:55] INFO dstack._internal.server.background.tasks.process_submitted_jobs:255 job(7602a8)llama31-0-0: now
is provisioning a new instance
INFO dstack._internal.server.background.tasks.process_submitted_jobs:280 The job llama31-0-0 created
the new instance llama31-0
[20:17:57] INFO dstack._internal.server.background.tasks.process_runs:330 run(4b9cae)llama31: run status has
changed SUBMITTED -> PROVISIONING
[20:18:13] WARNING dstack._internal.server.background.tasks.process_instances:718 Error while waiting for instance
llama31-0 to become running: ProvisioningError('Failed to get instance IP address. Instance not found.')
[20:18:18] INFO dstack._internal.server.background.tasks.process_instances:783 Instance llama31-0 terminated
[20:18:21] INFO dstack._internal.server.background.tasks.process_runs:330 run(4b9cae)llama31: run status has
changed PROVISIONING -> TERMINATING
[20:18:27] INFO dstack._internal.server.services.jobs:268 job(7602a8)llama31-0-0: instance 'llama31-0' has been
released, new status is TERMINATED
INFO dstack._internal.server.services.jobs:283 job(7602a8)llama31-0-0: job status is FAILED, reason:
FAILED_TO_START_DUE_TO_NO_CAPACITY
[20:18:29] INFO dstack._internal.server.services.runs:952 run(4b9cae)llama31: run status has changed
TERMINATING -> FAILED, reason: JOB_FAILED
[20:18:40] INFO dstack._internal.server.background.tasks.process_fleets:72 Automatic cleanup of an empty fleet
llama31
INFO dstack._internal.server.background.tasks.process_fleets:78 Fleet llama31 deleted
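For triage, the run's status transitions and the underlying provisioning error can be pulled out of the server log with a short script. This is only a minimal sketch and not part of dstack: `summarize` is a hypothetical helper, and `LOG` is an excerpt copied from the server log above. It shows that the job is reported as `FAILED_TO_START_DUE_TO_NO_CAPACITY`, while the only warning logged is a `ProvisioningError` about a missing instance IP address.

```python
import re

# Excerpt of the server log above, with wrapped lines kept as they appear.
LOG = """\
[20:17:57] INFO dstack._internal.server.background.tasks.process_runs:330 run(4b9cae)llama31: run status has
changed SUBMITTED -> PROVISIONING
[20:18:13] WARNING dstack._internal.server.background.tasks.process_instances:718 Error while waiting for instance
llama31-0 to become running: ProvisioningError('Failed to get instance IP address. Instance not found.')
[20:18:21] INFO dstack._internal.server.background.tasks.process_runs:330 run(4b9cae)llama31: run status has
changed PROVISIONING -> TERMINATING
[20:18:29] INFO dstack._internal.server.services.runs:952 run(4b9cae)llama31: run status has changed
TERMINATING -> FAILED, reason: JOB_FAILED
"""


def summarize(log: str):
    """Return (run status transitions, ProvisioningError messages) found in the log text."""
    # Collapse all whitespace so patterns can match across the wrapped lines.
    flat = " ".join(log.split())
    transitions = re.findall(r"run status has changed (\w+) -> (\w+)", flat)
    errors = re.findall(r"ProvisioningError\('([^']*)'\)", flat)
    return transitions, errors


if __name__ == "__main__":
    transitions, errors = summarize(LOG)
    for old, new in transitions:
        print(f"{old} -> {new}")
    for message in errors:
        print(f"ProvisioningError: {message}")
```

Run against the full log, the same helper should list the transitions in order, which makes it easier to see that the run moved from `PROVISIONING` to `TERMINATING` a few seconds after the warning was logged.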
### Additional information
_No response_