From 15694f9e2ed346724aa1e0ec6dc637f6cbdc8def Mon Sep 17 00:00:00 2001
From: Sindhu Somasundaram <56774226+sindhuvahinis@users.noreply.github.com>
Date: Mon, 20 Nov 2023 18:08:32 -0800
Subject: [PATCH] [doc] LMI environment variable instruction (#1334)
---
 .../lmi_environment_variable_instruction.md | 85 +++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 serving/docs/lmi/lmi_environment_variable_instruction.md

diff --git a/serving/docs/lmi/lmi_environment_variable_instruction.md b/serving/docs/lmi/lmi_environment_variable_instruction.md
new file mode 100644
index 000000000..7013f9267
--- /dev/null
+++ b/serving/docs/lmi/lmi_environment_variable_instruction.md
@@ -0,0 +1,85 @@

# LMI environment variable instruction