diff --git a/serving/docs/lmi/configurations_large_model_inference_containers.md b/serving/docs/lmi/configurations_large_model_inference_containers.md index fa2255757..4f625b6dd 100644 --- a/serving/docs/lmi/configurations_large_model_inference_containers.md +++ b/serving/docs/lmi/configurations_large_model_inference_containers.md @@ -3,7 +3,7 @@ There are a number of shared configurations for python models running large language models. They are also available through the [Large Model Inference Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#large-model-inference-containers). -### Common ([doc](https://github.com/deepjavalibrary/djl-serving/blob/521e0edadec35b04ec9e1d51b9e406119efd0235/serving/docs/configurations_large_model_inference_containers.md#common-doc)) +### Common ([doc](https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-configuration.html)) | Item | Required | Description | Example value | |----------------------------------------------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------| diff --git a/serving/docs/lmi/tuning_guides/trtllm_tuning_guide.md b/serving/docs/lmi/tuning_guides/trtllm_tuning_guide.md new file mode 100644 index 000000000..c56749430 --- /dev/null +++ b/serving/docs/lmi/tuning_guides/trtllm_tuning_guide.md @@ -0,0 +1,3 @@ +# TensorRT LLM Tuning guide + +This doc recommends the configurations based on your model and instance type. \ No newline at end of file diff --git a/serving/docs/lmi/tutorials/trtllm_aot_tutorial.md b/serving/docs/lmi/tutorials/trtllm_aot_tutorial.md new file mode 100644 index 000000000..31533f14c --- /dev/null +++ b/serving/docs/lmi/tutorials/trtllm_aot_tutorial.md @@ -0,0 +1,3 @@ +# TensorRT LLM ahead of time compilation of models + +This doc helps you to convert your HuggingFace model to Tensorrt-LLM LMI model format to load and run inference with Tensorrt-LLM. \ No newline at end of file