From 15694f9e2ed346724aa1e0ec6dc637f6cbdc8def Mon Sep 17 00:00:00 2001
From: Sindhu Somasundaram <56774226+sindhuvahinis@users.noreply.github.com>
Date: Mon, 20 Nov 2023 18:08:32 -0800
Subject: [PATCH] [doc] LMI environment variable instruction (#1334)
---
 .../lmi_environment_variable_instruction.md | 85 +++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 serving/docs/lmi/lmi_environment_variable_instruction.md

diff --git a/serving/docs/lmi/lmi_environment_variable_instruction.md b/serving/docs/lmi/lmi_environment_variable_instruction.md
new file mode 100644
index 000000000..7013f9267
--- /dev/null
+++ b/serving/docs/lmi/lmi_environment_variable_instruction.md
@@ -0,0 +1,85 @@

# LMI environment variable instruction