# LMI environment variable instruction

LMI allows customers to specify configuration through environment variables. For example, you may want to deploy an LLM with LMI without creating any configuration files. Here are some options for doing that.

## Translating a standalone serving.properties to environment variables

`option.model_id` in LMI can be a HuggingFace model ID or an S3 URL pointing to an uncompressed model folder.

serving.properties

```
engine=MPI
option.model_id=tiiuae/falcon-40b
option.task=text-generation
option.entryPoint=djl_python.transformersneuronx
option.trust_remote_code=true
option.tensor_parallel_degree=4
option.max_rolling_batch_size=32
option.rolling_batch=lmi-dist
option.dtype=fp16
```

The above serving.properties can be expressed entirely as environment variable settings:

```
SERVING_LOAD_MODELS=test::MPI=/opt/ml/model
OPTION_MODEL_ID=tiiuae/falcon-40b
OPTION_TASK=text-generation
OPTION_ENTRYPOINT=djl_python.transformersneuronx
OPTION_TRUST_REMOTE_CODE=true
OPTION_TENSOR_PARALLEL_DEGREE=4
OPTION_MAX_ROLLING_BATCH_SIZE=32
OPTION_ROLLING_BATCH=lmi-dist
OPTION_DTYPE=FP16
```
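
As a minimal sketch of what this enables (not part of the original doc), these settings could be passed through the SageMaker Python SDK without creating any files; `image_uri` and `role` are assumed to be your LMI container image URI and SageMaker execution role:

```
# Sketch only: image_uri and role are assumed to be defined elsewhere
# (your LMI container image URI and SageMaker execution role).
from sagemaker import Model

env = {
    "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
    "OPTION_MODEL_ID": "tiiuae/falcon-40b",
    "OPTION_TASK": "text-generation",
    "OPTION_ENTRYPOINT": "djl_python.transformersneuronx",
    "OPTION_TRUST_REMOTE_CODE": "true",
    "OPTION_TENSOR_PARALLEL_DEGREE": "4",
    "OPTION_MAX_ROLLING_BATCH_SIZE": "32",
    "OPTION_ROLLING_BATCH": "lmi-dist",
    "OPTION_DTYPE": "FP16",
}
model = Model(image_uri=image_uri, env=env, role=role)
```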

The engine setting translates as follows:

```
engine=<engine name> -> SERVING_LOAD_MODELS=test::<engine name>=/opt/ml/model
```

All remaining `option.*` properties translate as:

```
option.<properties> -> OPTION_<PROPERTIES>
```

Properties that do not start with `option.` are typically model server parameters. You can specify them as follows:

```
batch_size=4
max_batch_delay=200
```

These translate into:

```
SERVING_BATCH_SIZE=4
SERVING_MAX_BATCH_DELAY=200
```
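
To summarize the mapping, here is a hypothetical helper (for illustration only, not part of LMI) that applies the translation rules described above:

```
# Illustration only: encode the serving.properties -> environment variable rules.
def to_env_setting(key: str, value: str) -> str:
    if key == "engine":
        # engine=<name> -> SERVING_LOAD_MODELS=test::<name>=/opt/ml/model
        return f"SERVING_LOAD_MODELS=test::{value}=/opt/ml/model"
    if key.startswith("option."):
        # option.<property> -> OPTION_<PROPERTY>
        return "OPTION_" + key[len("option."):].upper() + "=" + value
    # model server parameters -> SERVING_<PARAMETER>
    return "SERVING_" + key.upper() + "=" + value

print(to_env_setting("engine", "MPI"))                       # SERVING_LOAD_MODELS=test::MPI=/opt/ml/model
print(to_env_setting("option.tensor_parallel_degree", "4"))  # OPTION_TENSOR_PARALLEL_DEGREE=4
print(to_env_setting("max_batch_delay", "200"))              # SERVING_MAX_BATCH_DELAY=200
```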

## SageMaker-trained model translation

Let’s assume we used SageMaker to train or fine-tune a model and uploaded it to S3 as a tar.gz file.
The file is located at:

```
s3://my-training-repo/my_fine_tuned_llama.tar.gz
```

Assume this file contains a standard HuggingFace saved model. Here is what you can set to deploy it without altering the file from its original format:

```
from sagemaker import Model

code_artifact = "s3://my-training-repo/my_fine_tuned_llama.tar.gz"
env = {
    "SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
    "OPTION_TASK": "text-generation",
    "OPTION_TENSOR_PARALLEL_DEGREE": "4",
    "OPTION_ROLLING_BATCH": "auto",
    "OPTION_DTYPE": "FP16",
}
# image_uri and role refer to your LMI container image and IAM role
model = Model(image_uri=image_uri, model_data=code_artifact, env=env, role=role)
```

In this case, LMI will build the serving.properties on the fly for you; no other coding is required!
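
From there, deployment follows the standard SageMaker flow. For example (the instance type and endpoint name below are placeholders, not recommendations):

```
# Placeholders: choose an instance type and endpoint name for your use case.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    endpoint_name="my-lmi-endpoint",
)
```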
