From 9e00b4f4cfd04ff805299c70e66b90c142cae96d Mon Sep 17 00:00:00 2001
From: Julien Simon <julien@arcee.ai>
Date: Tue, 17 Dec 2024 08:55:24 +0000
Subject: [PATCH] Initial version

---
 .../sample-notebook-coder-on-sagemaker.ipynb  | 589 ++++++++++++++++++
 ...ample-notebook-virtuoso-on-sagemaker.ipynb | 570 +++++++++++++++++
 2 files changed, 1159 insertions(+)
 create mode 100644 model_package_notebooks/sample-notebook-coder-on-sagemaker.ipynb
 create mode 100644 model_package_notebooks/sample-notebook-virtuoso-on-sagemaker.ipynb

diff --git a/model_package_notebooks/sample-notebook-coder-on-sagemaker.ipynb b/model_package_notebooks/sample-notebook-coder-on-sagemaker.ipynb
new file mode 100644
index 0000000..6c0060a
--- /dev/null
+++ b/model_package_notebooks/sample-notebook-coder-on-sagemaker.ipynb
@@ -0,0 +1,589 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Using Arcee.ai Coder Models on SageMaker through Model Packages\n",
+    "*The latest version of this notebook is available on [Github](https://github.com/arcee-ai/aws-samples/tree/main/model_package_notebooks).*\n",
+    "\n",
+    "This notebook shows you how to deploy the [Arcee.ai](https://www.arcee.ai) Coder models listed on [AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile?id=seller-r7b33ivdczgs6). You must have previously subscribed to the appropriate model to deploy it.\n",
+    "\n",
+    "The Coder models are purpose-built for developers, and excel in both benchmark performance and real-world applications. They bring you the same quality as much larger models in a more compact form ideal for organizations looking for both performance and cost efficiency.\n",
+    "\n",
+    "They are available in two sizes, both with a 32K token context size:\n",
+    "* **Coder Large** handles advanced programming and development tasks.\n",
+    "* **Coder Small** is a lightweight option for faster, simpler coding workflows and autocomplete tasks.\n",
+    "\n",
+    "Models are deployed to an Amazon SageMaker endpoint.  If you need general information on real-time inference with SageMaker, please refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).\n",
+    "\n",
+    "**If you already deployed the model package with CloudFormation, the AWS CLI or directly in the AWS console, there is no need to deploy it again with this notebook. For inference, please use the [sample-notebook-all-models-existing-sagemaker-endpoint.ipynb](sample-notebook-all-models-existing-sagemaker-endpoint.ipynb) notebook instead.**\n",
+    "\n",
+    "## Use cases\n",
+    "The Coder models are suitable for a wide range of code-related tasks, demonstrating particular strength in:\n",
+    "* **Writing and Generating Code**: Creating code snippets, scripts, and full programs in various programming languages (e.g., Python, Java, C++, JavaScript).\n",
+    "* **Debugging Code**: Identifying and fixing errors in existing code.\n",
+    "* **Code Optimization**: Improving the performance and efficiency of code.\n",
+    "* **Algorithm Design**: Developing algorithms to solve specific problems.\n",
+    "* **Learning Resources**: Recommending tutorials, documentation, and learning materials for different programming languages and frameworks.\n",
+    "\n",
+    "\n",
+    "They can be applied to various business tasks such as:\n",
+    "* **Software Development**: Writing and generating code for custom software solutions tailored to business needs.\n",
+    "* **API Integration**: Assisting with the integration of third-party APIs to enhance functionality and data flow.\n",
+    "* **Data Analysis**: Using code to perform data analysis and generate insights for business decision-making.\n",
+    "* **DevOps Implementation**: Helping with setting up CI/CD pipelines, containerization, and cloud deployments to streamline development processes.\n",
+    "* **Security Assurance**: Providing code reviews and security best practices to protect business applications from vulnerabilities.\n",
+    "\n",
+    "## Pre-requisites\n",
+    "1. This notebook works for models listed on AWS Marketplace. Please make sure you have previously subscribed to the appropriate model.\n",
+    "1. Ensure that IAM role attached to this notebook has the **AmazonSageMakerFullAccess** IAM policy.\n",
+    "\n",
+    "## Contents\n",
+    "1. [Select model package](#1.-Select-model-package)\n",
+    "\n",
+    "2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)\n",
+    "    1. [Define the endpoint configuration](#A.-Define-the-endpoint-configuration)\n",
+    "    2. [Create the endpoint](#B.-Create-the-endpoint)\n",
+    "    3. [Define a test payload](#C.-Define-a-test-payload)\n",
+    "    4. [Perform real-time inference](#D.-Perform-real-time-inference)\n",
+    "    5. [Visualize output](#E.-Visualize-output)\n",
+    "    6. [Perform streaming inference](#F.-Perform-streaming-inference)\n",
+    "\n",
+    "3. [Clean-up](#3.-Clean-up)\n",
+    "    1. [Delete the model](#A.-Delete-the-model)\n",
+    "    2. [Delete the endpoint](#B.-Delete-the-endpoint)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%sh\n",
+    "pip install -q boto3 sagemaker"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import datetime\n",
+    "import json\n",
+    "import pprint\n",
+    "\n",
+    "import boto3\n",
+    "import sagemaker\n",
+    "from IPython.display import Markdown, display\n",
+    "from sagemaker import ModelPackage, get_execution_role\n",
+    "from sagemaker_streaming import print_event_stream"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "role = get_execution_role()\n",
+    "sagemaker_session = sagemaker.Session()\n",
+    "runtime_sm_client = boto3.client(\"runtime.sagemaker\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Select the model package\n",
+    "\n",
+    "Virtuoso Small, Medium and Large are packaged separately. Please run one of the three cells below to select the size you'd like to deploy, and the instance type you'd like to deploy it on. \n",
+    "\n",
+    "By default, models are deployed on Amazon EC2 [g6e](https://aws.amazon.com/ec2/instance-types/g6e/) instances powered by NVIDIA L40S GPUs. You may use other instance types as long as they're supported by the model package: you will find the list on the AWS Marketplace model page."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell to deploy Coder Small\n",
+    "\n",
+    "model_name = \"coder-small\"\n",
+    "real_time_inference_instance_type = \"ml.g6e.12xlarge\"\n",
+    "\n",
+    "model_package_map = {\n",
+    "    \"ap-northeast-1\": \"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Tokyo\n",
+    "    \"ap-northeast-2\": \"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Seoul\n",
+    "    \"ap-south-1\": \"arn:aws:sagemaker:ap-south-1:077584701553:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Mumbai\n",
+    "    \"ap-southeast-1\": \"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Singapore\n",
+    "    \"ap-southeast-2\": \"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Sydney\n",
+    "    \"ca-central-1\": \"arn:aws:sagemaker:ca-central-1:470592106596:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Canada Central\n",
+    "    \"eu-central-1\": \"arn:aws:sagemaker:eu-central-1:446921602837:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Frankfurt\n",
+    "    \"eu-north-1\": \"arn:aws:sagemaker:eu-north-1:136758871317:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Stockholm\n",
+    "    \"eu-west-1\": \"arn:aws:sagemaker:eu-west-1:985815980388:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Ireland\n",
+    "    \"eu-west-2\": \"arn:aws:sagemaker:eu-west-2:856760150666:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # London\n",
+    "    \"eu-west-3\": \"arn:aws:sagemaker:eu-west-3:843114510376:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Paris\n",
+    "    \"sa-east-1\": \"arn:aws:sagemaker:sa-east-1:270155090741:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # São Paulo\n",
+    "    \"us-east-1\": \"arn:aws:sagemaker:us-east-1:865070037744:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # N. Virginia\n",
+    "    \"us-east-2\": \"arn:aws:sagemaker:us-east-2:057799348421:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Ohio\n",
+    "    \"us-west-1\": \"arn:aws:sagemaker:us-west-1:382657785993:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # N. California\n",
+    "    \"us-west-2\": \"arn:aws:sagemaker:us-west-2:594846645681:model-package/coder-small-vllm-marketplace-v-d0c67172309a363d990442eeb1299404\",  # Oregon\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell to deploy Coder Large\n",
+    "\n",
+    "model_name = \"coder-large\"\n",
+    "real_time_inference_instance_type = \"ml.g6e.48xlarge\"\n",
+    "\n",
+    "model_package_map = {\n",
+    "    \"ap-northeast-1\": \"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Tokyo\n",
+    "    \"ap-northeast-2\": \"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Seoul\n",
+    "    \"ap-south-1\": \"arn:aws:sagemaker:ap-south-1:077584701553:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Mumbai\n",
+    "    \"ap-southeast-1\": \"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Singapore\n",
+    "    \"ap-southeast-2\": \"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Sydney\n",
+    "    \"ca-central-1\": \"arn:aws:sagemaker:ca-central-1:470592106596:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Canada Central\n",
+    "    \"eu-central-1\": \"arn:aws:sagemaker:eu-central-1:446921602837:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Frankfurt\n",
+    "    \"eu-north-1\": \"arn:aws:sagemaker:eu-north-1:136758871317:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Stockholm\n",
+    "    \"eu-west-1\": \"arn:aws:sagemaker:eu-west-1:985815980388:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Ireland\n",
+    "    \"eu-west-2\": \"arn:aws:sagemaker:eu-west-2:856760150666:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # London\n",
+    "    \"eu-west-3\": \"arn:aws:sagemaker:eu-west-3:843114510376:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Paris\n",
+    "    \"sa-east-1\": \"arn:aws:sagemaker:sa-east-1:270155090741:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # São Paulo\n",
+    "    \"us-east-1\": \"arn:aws:sagemaker:us-east-1:865070037744:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # N. Virginia\n",
+    "    \"us-east-2\": \"arn:aws:sagemaker:us-east-2:057799348421:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Ohio\n",
+    "    \"us-west-1\": \"arn:aws:sagemaker:us-west-1:382657785993:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # N. California\n",
+    "    \"us-west-2\": \"arn:aws:sagemaker:us-west-2:594846645681:model-package/coder-large-vllm-marketplace-v-271e62e9152c3b48a99025600eb78bfb\",  # Oregon\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "region = boto3.Session().region_name\n",
+    "if region not in model_package_map.keys():\n",
+    "    raise \"UNSUPPORTED REGION\"\n",
+    "\n",
+    "model_package_arn = model_package_map[region]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Create an endpoint and perform real-time inference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### A. Define the endpoint configuration"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Models have been pre-packaged and stored in AWS. No public download is taking place at deployment time.\n",
+    "\n",
+    "The SageMaker endpoint runs the AWS [Large Model Inference](https://docs.djl.ai/master/docs/serving/serving/docs/lmi/index.html) container (LMI), powered by the vLLM inference server. vLLM enables high-performance text generation for the most popular open-source language models. \n",
+    "\n",
+    "The [OpenAI Messages API](https://huggingface.co/docs/text-generation-inference/messages_api) is available in vLLM."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### B. Create the endpoint"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployable model from the model package.\n",
+    "model = ModelPackage(\n",
+    "    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session\n",
+    ")\n",
+    "\n",
+    "# create a unique endpoint name\n",
+    "timestamp = \"{:%Y-%m-%d-%H-%M-%S}\".format(datetime.datetime.now())\n",
+    "endpoint_name = f\"{model_name}-{timestamp}\"\n",
+    "print(f\"Deploying endpoint {endpoint_name}\")\n",
+    "\n",
+    "# deploy the model\n",
+    "response = model.deploy(\n",
+    "    initial_instance_count=1,\n",
+    "    instance_type=real_time_inference_instance_type,\n",
+    "    endpoint_name=endpoint_name,\n",
+    "    model_data_download_timeout=900,\n",
+    "    container_startup_health_check_timeout=900,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once the endpoint is in service, you will be able to perform real-time inference."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### C. Define a test payload"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are a friendly and helpful AI coding assistant.\",\n",
+    "        },\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": \"\"\"Explain the difference between logits distillation and hidden states distillation.\n",
+    "            In particular, explain how the loss functions differ. Show code snippets with Pytorch.\"\"\",\n",
+    "        },\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### D. Perform real-time inference"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = runtime_sm_client.invoke_endpoint(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    ContentType=\"application/json\",\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    ")\n",
+    "\n",
+    "output = json.loads(response[\"Body\"].read().decode(\"utf8\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### E. Visualize output"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can print the raw JSON output in OpenAI format."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pprint.pprint(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can also print the generated output with Markdown formatting."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(Markdown(output[\"choices\"][0][\"message\"][\"content\"]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### F. Perform streaming inference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here are some more examples. Please feel free to tweak them and add your own!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "code = \"\"\"\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "def process_data(file_path):\n",
+    "    # Load data from CSV\n",
+    "    data = pd.read_csv(file_path)\n",
+    "    # Filter data where 'age' is greater than 30\n",
+    "    filtered_data = data[data['age'] > 30]\n",
+    "    # Calculate the mean of the 'salary' column\n",
+    "    mean_salary = np.mean(filtered_data['salary'])\n",
+    "    # Calculate the standard deviation of the 'salary' column\n",
+    "    std_salary = np.std(filtered_data['salary'])\n",
+    "    # Create a summary dictionary\n",
+    "    summary = {\n",
+    "        'mean_salary': mean_salary,\n",
+    "        'std_salary': std_salary\n",
+    "    }\n",
+    "    return summary\n",
+    "\"\"\"\n",
+    "\n",
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are a friendly and helpful AI coding assistant.\",\n",
+    "        },\n",
+    "        {\"role\": \"user\", \"content\": f\"Identify and fix issues in this code: {code}\"},\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "    \"stream\": True,\n",
+    "}\n",
+    "\n",
+    "response = runtime_sm_client.invoke_endpoint_with_response_stream(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    "    ContentType=\"application/json\",\n",
+    ")\n",
+    "\n",
+    "print_event_stream(response[\"Body\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "code = \"\"\"\n",
+    "import boto3\n",
+    "\n",
+    "my_bucket = \"image-bucket'\n",
+    "s3 = boto3.client('s3')\n",
+    "\n",
+    "# List all JPEG images larger than 8MB\n",
+    "\"\"\"\n",
+    "\n",
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are a friendly and helpful AI coding assistant.\",\n",
+    "        },\n",
+    "        {\"role\": \"user\", \"content\": f\"Complete this code fragment: {code}\"},\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "    \"stream\": True,\n",
+    "}\n",
+    "\n",
+    "response = runtime_sm_client.invoke_endpoint_with_response_stream(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    "    ContentType=\"application/json\",\n",
+    ")\n",
+    "\n",
+    "print_event_stream(response[\"Body\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "code = \"\"\"\n",
+    "import com.amazonaws.auth.AWSStaticCredentialsProvider;\n",
+    "import com.amazonaws.auth.BasicAWSCredentials;\n",
+    "import com.amazonaws.services.s3.AmazonS3;\n",
+    "import com.amazonaws.services.s3.AmazonS3ClientBuilder;\n",
+    "import com.amazonaws.services.s3.model.ObjectListing;\n",
+    "import com.amazonaws.services.s3.model.S3ObjectSummary;\n",
+    "\n",
+    "public class S3ListObjects {\n",
+    "\n",
+    "    public static void main(String[] args) {\n",
+    "        // Hardcoded AWS credentials (bad practice)\n",
+    "        String accessKey = \"YOUR_ACCESS_KEY\";\n",
+    "        String secretKey = \"YOUR_SECRET_KEY\";\n",
+    "        String bucketName = \"image-bucket\";\n",
+    "\n",
+    "        // Create AWS credentials\n",
+    "        BasicAWSCredentials awsCreds = new BasicAWSCredentials(accessKey, secretKey);\n",
+    "\n",
+    "        // Create S3 client\n",
+    "        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()\n",
+    "                .withRegion(\"us-west-2\")\n",
+    "                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))\n",
+    "                .build();\n",
+    "\n",
+    "        // List objects in the bucket\n",
+    "        ObjectListing objectListing = s3Client.listObjects(bucketName);\n",
+    "\n",
+    "        // Iterate over the objects and print their keys\n",
+    "        for (S3ObjectSummary os : objectListing.getObjectSummaries()) {\n",
+    "            System.out.println(\" - \" + os.getKey() + \" (size = \" + os.getSize() + \")\");\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\"\"\"\n",
+    "\n",
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are a friendly and helpful AI coding assistant.\",\n",
+    "        },\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": f\"Fix all issues in the code. Write a short description for a pull request: {code}\",\n",
+    "        },\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "    \"stream\": True,\n",
+    "}\n",
+    "\n",
+    "response = runtime_sm_client.invoke_endpoint_with_response_stream(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    "    ContentType=\"application/json\",\n",
+    ")\n",
+    "\n",
+    "print_event_stream(response[\"Body\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Clean-up\n",
+    "\n",
+    "Please don't forget to run the cells below to delete all resources and avoid unecessary charges."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### A. Delete the endpoint"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.sagemaker_session.delete_endpoint(endpoint_name)\n",
+    "model.sagemaker_session.delete_endpoint_config(endpoint_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### B. Delete the model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.delete_model()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Thank you for trying Coder. We have only scratched the surface of what you can do with this model.\n",
+    "\n",
+    "We'd be happy to hear from you, learn more about your use case, and help you build your next AI-powered product or service. Please reach out to julien@arcee.ai."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "instance_type": "ml.t3.medium",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/model_package_notebooks/sample-notebook-virtuoso-on-sagemaker.ipynb b/model_package_notebooks/sample-notebook-virtuoso-on-sagemaker.ipynb
new file mode 100644
index 0000000..b67aa56
--- /dev/null
+++ b/model_package_notebooks/sample-notebook-virtuoso-on-sagemaker.ipynb
@@ -0,0 +1,570 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Using Arcee.ai Virtuoso Models on SageMaker through Model Packages\n",
+    "*The latest version of this notebook is available on [Github](https://github.com/arcee-ai/aws-samples/tree/main/model_package_notebooks).*\n",
+    "\n",
+    "This notebook shows you how to deploy the [Arcee.ai](https://www.arcee.ai) Virtuoso models listed on [AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile?id=seller-r7b33ivdczgs6). You must have previously subscribed to the appropriate model to deploy it.\n",
+    "\n",
+    "The Virtuoso models are general-purpose, high-performance models that excel in both benchmark performance and real-world applications. They bring you the same quality as much larger models in a more compact form ideal for organizations looking for both performance and cost efficiency.\n",
+    "\n",
+    "They are available in three sizes, all with a 128K token context size:\n",
+    "* **Virtuoso Large:** Best-in-class frontier model.\n",
+    "* **Virtuoso Medium**: Mid-tier general-purpose performance at a lower cost. \n",
+    "* **Virtuoso Small**: Optimized for lightweight tasks and faster inference.\n",
+    "\n",
+    "Models are deployed to an Amazon SageMaker endpoint.  If you need general information on real-time inference with SageMaker, please refer to the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).\n",
+    "\n",
+    "**If you already deployed the model package with CloudFormation, the AWS CLI or directly in the AWS console, there is no need to deploy it again with this notebook. For inference, please use the [sample-notebook-all-models-existing-sagemaker-endpoint.ipynb](sample-notebook-all-models-existing-sagemaker-endpoint.ipynb) notebook instead.**\n",
+    "\n",
+    "## Use cases\n",
+    "The Virtuoso models are suitable for a wide range of language tasks, demonstrating particular strength in:\n",
+    "* **Reasoning**: Solving complex problems and drawing logical conclusions.\n",
+    "* **Creative Writing**: Generating engaging and original content across various genres.\n",
+    "* **General Language Understanding**: Comprehending and generating human-like text in diverse contexts.\n",
+    "\n",
+    "They can be applied to various business tasks such as:\n",
+    "* **Customer Service**: Implement sophisticated chatbots and virtual assistants.\n",
+    "* **Content Creation**: Generate high-quality written content for marketing and documentation.\n",
+    "* **Data Analysis**: Enhance data interpretation and generate insightful reports.\n",
+    "* **Research and Development**: Assist in literature reviews and hypothesis generation.\n",
+    "* **Legal and Compliance**: Automate contract analysis and regulatory compliance checks.\n",
+    "* **Education and Training**: Create adaptive learning systems and intelligent tutoring programs.\n",
+    "\n",
+    "## Pre-requisites\n",
+    "1. This notebook works for models listed on AWS Marketplace. Please make sure you have previously subscribed to the appropriate model.\n",
+    "1. Ensure that IAM role attached to this notebook has the **AmazonSageMakerFullAccess** IAM policy.\n",
+    "\n",
+    "## Contents\n",
+    "1. [Select model package](#1.-Select-model-package)\n",
+    "\n",
+    "2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)\n",
+    "    1. [Define the endpoint configuration](#A.-Define-the-endpoint-configuration)\n",
+    "    2. [Create the endpoint](#B.-Create-the-endpoint)\n",
+    "    3. [Define a test payload](#C.-Define-a-test-payload)\n",
+    "    4. [Perform real-time inference](#D.-Perform-real-time-inference)\n",
+    "    5. [Visualize output](#E.-Visualize-output)\n",
+    "    6. [Perform streaming inference](#F.-Perform-streaming-inference)\n",
+    "\n",
+    "3. [Clean-up](#3.-Clean-up)\n",
+    "    1. [Delete the model](#A.-Delete-the-model)\n",
+    "    2. [Delete the endpoint](#B.-Delete-the-endpoint)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%sh\n",
+    "pip install -q boto3 sagemaker"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import datetime\n",
+    "import json\n",
+    "import pprint\n",
+    "\n",
+    "import boto3\n",
+    "import sagemaker\n",
+    "from IPython.display import Markdown, display\n",
+    "from sagemaker import ModelPackage, get_execution_role\n",
+    "from sagemaker_streaming import print_event_stream"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "role = get_execution_role()\n",
+    "sagemaker_session = sagemaker.Session()\n",
+    "runtime_sm_client = boto3.client(\"runtime.sagemaker\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Select the model package\n",
+    "\n",
+    "Virtuoso Small, Medium and Large are packaged separately. Please run one of the three cells below to select the size you'd like to deploy, and the instance type you'd like to deploy it on. \n",
+    "\n",
+    "By default, models are deployed on Amazon EC2 [g6e](https://aws.amazon.com/ec2/instance-types/g6e/) instances powered by NVIDIA L40S GPUs. You may use other instance types as long as they're supported by the model package: you will find the list on the AWS Marketplace model page."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell to deploy Virtuoso Small\n",
+    "\n",
+    "model_name = \"virtuoso-small\"\n",
+    "real_time_inference_instance_type = \"ml.g6e.12xlarge\"\n",
+    "\n",
+    "model_package_map = {\n",
+    "    \"ap-northeast-1\": \"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Tokyo\n",
+    "    \"ap-northeast-2\": \"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Seoul\n",
+    "    \"ap-south-1\": \"arn:aws:sagemaker:ap-south-1:077584701553:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Mumbai\n",
+    "    \"ap-southeast-1\": \"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Singapore\n",
+    "    \"ap-southeast-2\": \"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Sydney\n",
+    "    \"ca-central-1\": \"arn:aws:sagemaker:ca-central-1:470592106596:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Canada Central\n",
+    "    \"eu-central-1\": \"arn:aws:sagemaker:eu-central-1:446921602837:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Frankfurt\n",
+    "    \"eu-north-1\": \"arn:aws:sagemaker:eu-north-1:136758871317:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Stockholm\n",
+    "    \"eu-west-1\": \"arn:aws:sagemaker:eu-west-1:985815980388:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Ireland\n",
+    "    \"eu-west-2\": \"arn:aws:sagemaker:eu-west-2:856760150666:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # London\n",
+    "    \"eu-west-3\": \"arn:aws:sagemaker:eu-west-3:843114510376:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Paris\n",
+    "    \"sa-east-1\": \"arn:aws:sagemaker:sa-east-1:270155090741:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # São Paulo\n",
+    "    \"us-east-1\": \"arn:aws:sagemaker:us-east-1:865070037744:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # N. Virginia\n",
+    "    \"us-east-2\": \"arn:aws:sagemaker:us-east-2:057799348421:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Ohio\n",
+    "    \"us-west-1\": \"arn:aws:sagemaker:us-west-1:382657785993:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # N. California\n",
+    "    \"us-west-2\": \"arn:aws:sagemaker:us-west-2:594846645681:model-package/virtuoso-small-vllm-marketplac-823845f606f738259097c384364ff2f0\",  # Oregon\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell to deploy Virtuoso Medium\n",
+    "\n",
+    "model_name = \"virtuoso-medium\"\n",
+    "real_time_inference_instance_type = \"ml.g6e.12xlarge\"\n",
+    "\n",
+    "model_package_map = {\n",
+    "    \"ap-northeast-1\": \"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Tokyo\n",
+    "    \"ap-northeast-2\": \"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Seoul\n",
+    "    \"ap-south-1\": \"arn:aws:sagemaker:ap-south-1:077584701553:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Mumbai\n",
+    "    \"ap-southeast-1\": \"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Singapore\n",
+    "    \"ap-southeast-2\": \"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Sydney\n",
+    "    \"ca-central-1\": \"arn:aws:sagemaker:ca-central-1:470592106596:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Canada Central\n",
+    "    \"eu-central-1\": \"arn:aws:sagemaker:eu-central-1:446921602837:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Frankfurt\n",
+    "    \"eu-north-1\": \"arn:aws:sagemaker:eu-north-1:136758871317:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Stockholm\n",
+    "    \"eu-west-1\": \"arn:aws:sagemaker:eu-west-1:985815980388:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Ireland\n",
+    "    \"eu-west-2\": \"arn:aws:sagemaker:eu-west-2:856760150666:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # London\n",
+    "    \"eu-west-3\": \"arn:aws:sagemaker:eu-west-3:843114510376:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Paris\n",
+    "    \"sa-east-1\": \"arn:aws:sagemaker:sa-east-1:270155090741:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # São Paulo\n",
+    "    \"us-east-1\": \"arn:aws:sagemaker:us-east-1:865070037744:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # N. Virginia\n",
+    "    \"us-east-2\": \"arn:aws:sagemaker:us-east-2:057799348421:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Ohio\n",
+    "    \"us-west-1\": \"arn:aws:sagemaker:us-west-1:382657785993:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # N. California\n",
+    "    \"us-west-2\": \"arn:aws:sagemaker:us-west-2:594846645681:model-package/virtuoso-medium-vllm-marketpla-c91c71ed9080314ab3a082d29c9bd3bb\",  # Oregon\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run this cell to deploy Virtuoso Large\n",
+    "\n",
+    "model_name = \"virtuoso-large\"\n",
+    "real_time_inference_instance_type = \"ml.g6e.48xlarge\"\n",
+    "\n",
+    "model_package_map = {\n",
+    "    \"ap-northeast-1\": \"arn:aws:sagemaker:ap-northeast-1:977537786026:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Tokyo\n",
+    "    \"ap-northeast-2\": \"arn:aws:sagemaker:ap-northeast-2:745090734665:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Seoul\n",
+    "    \"ap-south-1\": \"arn:aws:sagemaker:ap-south-1:077584701553:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Mumbai\n",
+    "    \"ap-southeast-1\": \"arn:aws:sagemaker:ap-southeast-1:192199979996:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Singapore\n",
+    "    \"ap-southeast-2\": \"arn:aws:sagemaker:ap-southeast-2:666831318237:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Sydney\n",
+    "    \"ca-central-1\": \"arn:aws:sagemaker:ca-central-1:470592106596:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Canada Central\n",
+    "    \"eu-central-1\": \"arn:aws:sagemaker:eu-central-1:446921602837:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Frankfurt\n",
+    "    \"eu-north-1\": \"arn:aws:sagemaker:eu-north-1:136758871317:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Stockholm\n",
+    "    \"eu-west-1\": \"arn:aws:sagemaker:eu-west-1:985815980388:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Ireland\n",
+    "    \"eu-west-2\": \"arn:aws:sagemaker:eu-west-2:856760150666:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # London\n",
+    "    \"eu-west-3\": \"arn:aws:sagemaker:eu-west-3:843114510376:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Paris\n",
+    "    \"sa-east-1\": \"arn:aws:sagemaker:sa-east-1:270155090741:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # São Paulo\n",
+    "    \"us-east-1\": \"arn:aws:sagemaker:us-east-1:865070037744:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # N. Virginia\n",
+    "    \"us-east-2\": \"arn:aws:sagemaker:us-east-2:057799348421:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Ohio\n",
+    "    \"us-west-1\": \"arn:aws:sagemaker:us-west-1:382657785993:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # N. California\n",
+    "    \"us-west-2\": \"arn:aws:sagemaker:us-west-2:594846645681:model-package/virtuoso-large-vllm-marketplac-4fbe605cb0dd3237aeef5fe94842be93\",  # Oregon\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "region = boto3.Session().region_name\n",
+    "if region not in model_package_map.keys():\n",
+    "    raise \"UNSUPPORTED REGION\"\n",
+    "\n",
+    "model_package_arn = model_package_map[region]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Create an endpoint and perform real-time inference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### A. Define the endpoint configuration"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Models have been pre-packaged and stored in AWS. No public download is taking place at deployment time.\n",
+    "\n",
+    "The SageMaker endpoint runs the AWS [Large Model Inference](https://docs.djl.ai/master/docs/serving/serving/docs/lmi/index.html) container (LMI), powered by the vLLM inference server. vLLM enables high-performance text generation for the most popular open-source language models. \n",
+    "\n",
+    "The [OpenAI Messages API](https://huggingface.co/docs/text-generation-inference/messages_api) is available in vLLM."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### B. Create the endpoint"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a deployable model from the model package.\n",
+    "model = ModelPackage(\n",
+    "    role=role, model_package_arn=model_package_arn, sagemaker_session=sagemaker_session\n",
+    ")\n",
+    "\n",
+    "# create a unique endpoint name\n",
+    "timestamp = \"{:%Y-%m-%d-%H-%M-%S}\".format(datetime.datetime.now())\n",
+    "endpoint_name = f\"{model_name}-{timestamp}\"\n",
+    "print(f\"Deploying endpoint {endpoint_name}\")\n",
+    "\n",
+    "# deploy the model\n",
+    "response = model.deploy(\n",
+    "    initial_instance_count=1,\n",
+    "    instance_type=real_time_inference_instance_type,\n",
+    "    endpoint_name=endpoint_name,\n",
+    "    model_data_download_timeout=900,\n",
+    "    container_startup_health_check_timeout=900,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once the endpoint is in service, you will be able to perform real-time inference."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### C. Define a test payload"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\"role\": \"system\", \"content\": \"You are a friendly and helpful AI assistant.\"},\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": \"Suggest 5 names for a new neighborhood pet food store. Names should be short, fun, easy to remember, and respectful of pets. \\\n",
+    "        Explain why customers would like them.\",\n",
+    "        },\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### D. Perform real-time inference"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = runtime_sm_client.invoke_endpoint(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    ContentType=\"application/json\",\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    ")\n",
+    "\n",
+    "output = json.loads(response[\"Body\"].read().decode(\"utf8\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### E. Visualize output"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can print the raw JSON output in OpenAI format."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pprint.pprint(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can also print the generated output with Markdown formatting."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(Markdown(output[\"choices\"][0][\"message\"][\"content\"]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### F. Perform streaming inference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here are some more examples. Please feel free to tweak them and add your own!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "prompt = \"\"\"Please write a friendly marketing pitch for a new SaaS AI platform called Arcee Cloud.\n",
+    "We will send this pitch by email to business and technical decision-makers, so make it sound exciting yet professional.\n",
+    "The contact email is sales@arcee.ai.\n",
+    "Arcee Cloud makes it simple for enterprise users to tailor open-source small language models to their own domain knowledge,\n",
+    "in order to build high-quality, cost-effective and secure AI solutions.\"\"\"\n",
+    "\n",
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are a friendly and helpful AI marketing assistant.\",\n",
+    "        },\n",
+    "        {\"role\": \"user\", \"content\": prompt},\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "    \"stream\": True,\n",
+    "}\n",
+    "\n",
+    "response = runtime_sm_client.invoke_endpoint_with_response_stream(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    "    ContentType=\"application/json\",\n",
+    ")\n",
+    "\n",
+    "print_event_stream(response[\"Body\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"As a friendly financial assistant, answer the question in detail.\",\n",
+    "        },\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": \"Suggest a pre-earning options hedging strategy for a volative tech stock.\\\n",
+    "        Show me an example with a fictitious company.\",\n",
+    "        },\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "    \"stream\": True,\n",
+    "}\n",
+    "\n",
+    "response = runtime_sm_client.invoke_endpoint_with_response_stream(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    "    ContentType=\"application/json\",\n",
+    ")\n",
+    "\n",
+    "print_event_stream(response[\"Body\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_sample_input = {\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"system\",\n",
+    "            \"content\": \"You are Darlene, a friendly and helpful salesperson \\\n",
+    "        working at Crystal River Classic Bikes, a classic motorcycle dealership in central Florida.\",\n",
+    "        },\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": \"Using English, write a personalized customer email to get \\\n",
+    "        them to sign up for a test ride on the new 2025 motorcycles that are visible at the dealership. \\\n",
+    "        Tone should be warm and personal, make sure to weave in the customer information below. \\\n",
+    "        Wyatt, your chief mechanic and road captain, has just won the 2024 State Award for Best Mechanic. \\\n",
+    "        \\\n",
+    "        Customer information:\\\n",
+    "        - name: Julien \\\n",
+    "        - last visit: 6 months ago for bike service \\\n",
+    "        - Owns 2 bikes, a 2002 sporty bike and a 2007 cruiser \\\n",
+    "        - Wishes he had more time to ride!\",\n",
+    "        },\n",
+    "    ],\n",
+    "    \"max_tokens\": 1024,\n",
+    "    \"stream\": True,\n",
+    "}\n",
+    "\n",
+    "response = runtime_sm_client.invoke_endpoint_with_response_stream(\n",
+    "    EndpointName=endpoint_name,\n",
+    "    Body=json.dumps(model_sample_input),\n",
+    "    ContentType=\"application/json\",\n",
+    ")\n",
+    "\n",
+    "print_event_stream(response[\"Body\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Clean-up\n",
+    "\n",
+    "Please don't forget to run the cells below to delete all resources and avoid unecessary charges."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### A. Delete the endpoint"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.sagemaker_session.delete_endpoint(endpoint_name)\n",
+    "model.sagemaker_session.delete_endpoint_config(endpoint_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### B. Delete the model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.delete_model()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Thank you for trying Virtuoso. We have only scratched the surface of what you can do with this model.\n",
+    "\n",
+    "We'd be happy to hear from you, learn more about your use case, and help you build your next AI-powered product or service. Please reach out to julien@arcee.ai."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "instance_type": "ml.t3.medium",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}