
Could not find a matching NEFF for your HLO in this directory #730

Open · SteliosGian opened this issue Oct 30, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

@SteliosGian commented Oct 30, 2024

System Info

I'm compiling a fine-tuned Llama 3.1 70B model with the system info below on an inf2.48xlarge instance, serving it with neuronX TGI 0.0.25 on AWS SageMaker. I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/compiled/e28e32d0143dad6277a9.neff'
FileNotFoundError: Could not find a matching NEFF for your HLO in this directory. Ensure that the model you are trying to load is the same type and has the same parameters as the one you saved or call "save" on this model to reserialize it.
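As a first diagnostic step, it can help to list the NEFF artifacts that were actually exported and compare their hashed filenames against the one the runtime is requesting (e28e32d0143dad6277a9.neff). A minimal Python sketch, assuming the artifacts live under /opt/ml/model/compiled as shown in the traceback:

    # Diagnostic sketch: list the exported NEFF files so their hashes can be
    # compared against the one the runtime asks for. The path is taken from
    # the traceback above.
    import pathlib

    compiled = pathlib.Path("/opt/ml/model/compiled")
    for neff in sorted(compiled.glob("*.neff")):
        print(neff.name, neff.stat().st_size)

If the directory contains NEFFs under different hashes, the export configuration and the serving configuration (or the SDK versions) diverged, which is exactly what the error message suggests.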

Platform:

- Platform: Linux-6.8.0-1015-aws-x86_64-with-glibc2.35
- Python version: 3.10.12


Python packages:

- `optimum-neuron` version: 0.0.25
- `neuron-sdk` version: 2.20.0
- `optimum` version: 1.22.0
- `transformers` version: 4.43.2
- `huggingface_hub` version: 0.26.2
- `torch` version: 2.1.2+cu121
- `aws-neuronx-runtime-discovery` version: 2.9
- `libneuronxla` version: 2.0.4115.0
- `neuronx-cc` version: 2.15.128.0+56dc5a86
- `neuronx-distributed` version: 0.8.0
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.1.2.2.3.0
- `torch-xla` version: 2.1.4
- `transformers-neuronx` version: 0.12.313


Neuron Driver:


WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/now 2.22.26.0-17a033bc8 amd64 [installed,local]
aws-neuronx-dkms/now 2.18.12.0 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.22.14.0-6e27b8d5b amd64 [installed,local]
aws-neuronx-tools/now 2.19.0.0 amd64 [installed,local]

This is my compilation command:

optimum-cli export neuron -m orig-llama/ --batch_size 4 --task text-generation --sequence_length 4096 --num_cores 24 --auto_cast_type bf16 ./neuron-llama-throughput
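For reference, the same export can be expressed through the optimum-neuron Python API. A minimal sketch; the parameters mirror the CLI flags above and the paths are the ones from this report:

    # Sketch of the equivalent export via the optimum-neuron Python API.
    from optimum.neuron import NeuronModelForCausalLM

    model = NeuronModelForCausalLM.from_pretrained(
        "orig-llama/",        # local fine-tuned checkpoint, as in the CLI command
        export=True,          # trigger Neuron compilation
        batch_size=4,
        sequence_length=4096,
        num_cores=24,
        auto_cast_type="bf16",
    )
    model.save_pretrained("./neuron-llama-throughput")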

Here is my TGI env:

sagemaker_model_env = {
    "SM_MODEL_DIR" = "/opt/ml/model"
    "HF_MODEL_ID" = "/opt/ml/model"
    "HF_NUM_CORES" = "24"
    "HF_BATCH_SIZE" = "4"
    "HF_SEQUENCE_LENGTH" = "4096"
    "HF_AUTO_CAST_TYPE" = "bf16"
    "MAX_BATCH_SIZE" = "4"
    "MAX_INPUT_TOKENS" = "3072"
    "MAX_TOTAL_TOKENS" = "4096"
    "MESSAGES_API_ENABLED" = "false"
    "MAX_BATCH_PREFILL_TOKENS" = "16384"
    "MAX_BATCH_TOTAL_TOKENS" = "20000"
    "ROPE_SCALING" = "dynamic"
    "ROPE_FACTOR" = "8.0"
  }
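(The block above appears to be Terraform/HCL. For completeness, a minimal sketch of passing the same environment through the SageMaker Python SDK follows; the image URI, model artifact location, and IAM role are placeholders, not values from this report.)

    # Hypothetical deployment sketch using the SageMaker Python SDK;
    # image_uri, model_data, and role are placeholders.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        image_uri="<neuronx-tgi 0.0.25 image URI>",      # placeholder
        model_data="<s3://.../compiled-model.tar.gz>",   # placeholder
        role="<SageMaker execution role ARN>",           # placeholder
        env={
            "HF_MODEL_ID": "/opt/ml/model",
            "HF_NUM_CORES": "24",
            "HF_BATCH_SIZE": "4",
            "HF_SEQUENCE_LENGTH": "4096",
            "HF_AUTO_CAST_TYPE": "bf16",
            "MAX_BATCH_SIZE": "4",
            "MAX_INPUT_TOKENS": "3072",
            "MAX_TOTAL_TOKENS": "4096",
            "MESSAGES_API_ENABLED": "false",
            "MAX_BATCH_PREFILL_TOKENS": "16384",
            "MAX_BATCH_TOTAL_TOKENS": "20000",
            "ROPE_SCALING": "dynamic",
            "ROPE_FACTOR": "8.0",
        },
    )
    model.deploy(initial_instance_count=1, instance_type="ml.inf2.48xlarge")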

Who can help?

@dacorvo @JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

Compile Llama 3.1 70B with the system info above on an inf2.48xlarge instance and run it on neuronX TGI version 0.0.25.

Expected behavior

The neuronX TGI server should start.

SteliosGian added the bug label on Oct 30, 2024
@jimburtoft (Contributor) commented:

I want to rule out an SDK mismatch between the compilation environment and the hosting environment.

Are you deploying on SageMaker? What image are you using?

If you are not deploying on SageMaker, try compiling using the TGI image itself.

See the example here:
https://github.com/huggingface/optimum-neuron/tree/main/benchmark/text-generation-inference/performance#compiling-the-model
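For illustration, compiling inside the TGI image might look something like the command below. This is a sketch only; the exact image tag, entrypoint, and device flags should be taken from the linked README:

    docker run --entrypoint optimum-cli \
      -v $(pwd):/data \
      --privileged \
      ghcr.io/huggingface/neuronx-tgi:0.0.25 \
      export neuron -m /data/orig-llama --batch_size 4 --task text-generation \
      --sequence_length 4096 --num_cores 24 --auto_cast_type bf16 \
      /data/neuron-llama-throughput

Compiling with the same image that serves the model rules out a Neuron SDK version mismatch between the two environments.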
