I'm compiling a fine-tuned Llama 3.1 70B model (system info below) on an inf2.48xlarge instance and serving it with neuronX TGI 0.0.25 on AWS SageMaker. I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/compiled/e28e32d0143dad6277a9.neff'
FileNotFoundError: Could not find a matching NEFF for your HLO in this directory. Ensure that the model you are trying to load is the same type and has the same parameters as the one you saved or call "save" on this model to reserialize it.
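For context on the error: the `.neff` files are the compiled Neuron executables, and (as far as I understand the caching scheme) the runtime hashes the HLO graph it builds at load time and looks for a NEFF with a matching filename, so a miss usually means the serving parameters or Neuron SDK versions differ from those used at export time. A quick check of what the container actually contains (a minimal sketch, path taken from the traceback above):

```python
from pathlib import Path

# List the NEFF artifacts shipped with the model; the runtime is looking
# for e28e32d0143dad6277a9.neff, a hash of the HLO graph it just built.
compiled_dir = Path("/opt/ml/model/compiled")
for neff in sorted(compiled_dir.glob("*.neff")):
    print(neff.name)
```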
Platform:
- Platform: Linux-6.8.0-1015-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
Python packages:
- `optimum-neuron` version: 0.0.25
- `neuron-sdk` version: 2.20.0
- `optimum` version: 1.22.0
- `transformers` version: 4.43.2
- `huggingface_hub` version: 0.26.2
- `torch` version: 2.1.2+cu121
- `aws-neuronx-runtime-discovery` version: 2.9
- `libneuronxla` version: 2.0.4115.0
- `neuronx-cc` version: 2.15.128.0+56dc5a86
- `neuronx-distributed` version: 0.8.0
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.1.2.2.3.0
- `torch-xla` version: 2.1.4
- `transformers-neuronx` version: 0.12.313
Neuron Driver:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
aws-neuronx-collectives/now 2.22.26.0-17a033bc8 amd64 [installed,local]
aws-neuronx-dkms/now 2.18.12.0 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.22.14.0-6e27b8d5b amd64 [installed,local]
aws-neuronx-tools/now 2.19.0.0 amd64 [installed,local]
This is my compilation command:
optimum-cli export neuron -m orig-llama/ --batch_size 4 --task text-generation --sequence_length 4096 --num_cores 24 --auto_cast_type bf16 ./neuron-llama-throughput
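For completeness, the same export can be driven from Python; this is a rough equivalent of the CLI call above, assuming the `optimum-neuron` `NeuronModelForCausalLM` API:

```python
from optimum.neuron import NeuronModelForCausalLM

# Export with the same parameters as the optimum-cli call above.
model = NeuronModelForCausalLM.from_pretrained(
    "orig-llama/",
    export=True,
    batch_size=4,
    sequence_length=4096,
    num_cores=24,
    auto_cast_type="bf16",
)
model.save_pretrained("./neuron-llama-throughput")
```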
Here is my TGI env:
sagemaker_model_env = {
    "SM_MODEL_DIR": "/opt/ml/model",
    "HF_MODEL_ID": "/opt/ml/model",
    "HF_NUM_CORES": "24",
    "HF_BATCH_SIZE": "4",
    "HF_SEQUENCE_LENGTH": "4096",
    "HF_AUTO_CAST_TYPE": "bf16",
    "MAX_BATCH_SIZE": "4",
    "MAX_INPUT_TOKENS": "3072",
    "MAX_TOTAL_TOKENS": "4096",
    "MESSAGES_API_ENABLED": "false",
    "MAX_BATCH_PREFILL_TOKENS": "16384",
    "MAX_BATCH_TOTAL_TOKENS": "20000",
    "ROPE_SCALING": "dynamic",
    "ROPE_FACTOR": "8.0",
}
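The endpoint is deployed roughly like this (a sketch with the SageMaker Python SDK; the `model_data` S3 path and role are placeholders for my actual setup):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Pull the neuronx TGI 0.0.25 container and attach the env above.
image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.25")

model = HuggingFaceModel(
    image_uri=image_uri,
    model_data="s3://my-bucket/neuron-llama-throughput.tar.gz",  # placeholder
    role=sagemaker.get_execution_role(),
    env=sagemaker_model_env,
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.48xlarge",
)
```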
Who can help?
@dacorvo @JingyaHuang
Reproduction (minimal, reproducible, runnable)
Compile Llama 3.1 70B with the system info above on an inf2.48xlarge instance and run it on neuronX TGI version 0.0.25.
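One thing worth verifying when reproducing: the export writes its compilation parameters into the model's config.json (under a `neuron` key, if I read the optimum-neuron layout correctly), and these have to line up with the `HF_*` values passed to TGI:

```python
import json

# Compare the parameters baked in at export time with the serving env;
# expect batch_size=4, sequence_length=4096, num_cores=24, auto_cast_type="bf16".
with open("/opt/ml/model/config.json") as f:
    print(json.load(f).get("neuron", {}))
```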
Expected behavior
The neuronX TGI server starts successfully.