[doc] LMI environment variable instruction (#1334)
# LMI environment variable instruction

LMI allows customers to configure models through environment variables. For example, if you want to deploy an LLM with LMI without creating any files, here are some options you can use.

## Standalone serving.properties to environment variables

`option.model_id` in LMI can be a HuggingFace model ID, or an S3 URL that points to an uncompressed model folder.

serving.properties:

```
engine=MPI
option.model_id=tiiuae/falcon-40b
option.task=text-generation
option.entryPoint=djl_python.transformersneuronx
option.trust_remote_code=true
option.tensor_parallel_degree=4
option.max_rolling_batch_size=32
option.rolling_batch=lmi-dist
option.dtype=fp16
```

The above serving.properties can be translated into the equivalent environment variable settings:

```
SERVING_LOAD_MODELS=test::MPI=/opt/ml/model
OPTION_MODEL_ID=tiiuae/falcon-40b
OPTION_TASK=text-generation
OPTION_ENTRYPOINT=djl_python.transformersneuronx
OPTION_TRUST_REMOTE_CODE=true
OPTION_TENSOR_PARALLEL_DEGREE=4
OPTION_MAX_ROLLING_BATCH_SIZE=32
OPTION_ROLLING_BATCH=lmi-dist
OPTION_DTYPE=FP16
```

The engine setting translates as:

```
engine=<engine name> -> SERVING_LOAD_MODELS=test::<engine name>=/opt/ml/model
```

All remaining properties translate as:

```
option.<properties> -> OPTION_<PROPERTIES>
```

Properties that do not start with `option.` are typically model server parameters; you can specify them as follows:

```
batch_size=4
max_batch_delay=200
```

These translate into:

```
SERVING_BATCH_SIZE=4
SERVING_MAX_BATCH_DELAY=200
```

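The translation rules above can be sketched in a few lines of Python. This is a hypothetical helper for illustration only, not part of LMI itself:

```python
def properties_to_env(props):
    """Translate serving.properties entries into LMI environment variables.

    Hypothetical helper illustrating the rules above; not part of LMI.
    """
    env = {}
    for key, value in props.items():
        if key == "engine":
            # engine=<name> -> SERVING_LOAD_MODELS=test::<name>=/opt/ml/model
            env["SERVING_LOAD_MODELS"] = f"test::{value}=/opt/ml/model"
        elif key.startswith("option."):
            # option.<property> -> OPTION_<PROPERTY>
            env["OPTION_" + key[len("option."):].upper()] = value
        else:
            # remaining model server parameters -> SERVING_<PARAMETER>
            env["SERVING_" + key.upper()] = value
    return env

print(properties_to_env({"engine": "MPI",
                         "option.rolling_batch": "lmi-dist",
                         "batch_size": "4"}))
```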
## SageMaker trained model translation

Let's assume we used SageMaker to train or fine-tune a model and uploaded it to S3 as a tar.gz file.
The file is located at:

```
s3://my-training-repo/my_fine_tuned_llama.tar.gz
```

Assume this file is a standard HuggingFace saved model. Here is what you can set to use the file without altering its original format:

```
from sagemaker import Model

code_artifact = "s3://my-training-repo/my_fine_tuned_llama.tar.gz"
# SageMaker environment variable values must be strings
env = {"SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
       "OPTION_TASK": "text-generation",
       "OPTION_TENSOR_PARALLEL_DEGREE": "4",
       "OPTION_ROLLING_BATCH": "auto",
       "OPTION_DTYPE": "FP16"}
model = Model(image_uri=image_uri, model_data=code_artifact, role=role, env=env)
```

In this case, the serving.properties is built on the fly for you and no other coding is required!
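The on-the-fly generation can be pictured as the inverse of the translation rules above. This is a simplified sketch under the naming conventions shown earlier, not the container's actual code:

```python
def env_to_properties(env):
    """Rebuild serving.properties lines from LMI environment variables.

    Simplified sketch of the on-the-fly generation; not LMI's actual code.
    Note: lowercasing loses mixed-case keys such as option.entryPoint.
    """
    lines = []
    for key, value in env.items():
        if key == "SERVING_LOAD_MODELS":
            # test::<engine>=/opt/ml/model -> engine=<engine>
            engine = value.split("::")[1].split("=")[0]
            lines.append(f"engine={engine}")
        elif key.startswith("OPTION_"):
            # OPTION_<PROPERTY> -> option.<property>
            lines.append(f"option.{key[len('OPTION_'):].lower()}={value}")
        elif key.startswith("SERVING_"):
            # SERVING_<PARAMETER> -> <parameter>
            lines.append(f"{key[len('SERVING_'):].lower()}={value}")
    return "\n".join(lines)

print(env_to_properties({"SERVING_LOAD_MODELS": "test::MPI=/opt/ml/model",
                         "OPTION_DTYPE": "FP16"}))
```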