diff --git a/ai-ml/trainium-inferentia/examples/gradio-ui/README-StableDiffusion.md b/ai-ml/trainium-inferentia/examples/gradio-ui/README-StableDiffusion.md deleted file mode 100644 index 1189a337a..000000000 --- a/ai-ml/trainium-inferentia/examples/gradio-ui/README-StableDiffusion.md +++ /dev/null @@ -1,54 +0,0 @@ -# Steps to Deploy Gradio on Your Mac - -## Pre-requisites -Deploy the `trainium-inferentia` blueprint using this [link](https://awslabs.github.io/data-on-eks/docs/blueprints/ai-ml/trainium) - -## Step 1: Execute Port Forward to the StableDiffusion Ray Service -First, execute a port forward to the StableDiffusion Ray Service using kubectl: - -```bash -kubectl -n stablediffusion port-forward svc/stablediffusion-service 8000:8000 -``` - -## Step 2: Deploy Gradio WebUI Locally - -### 2.1. Create a Virtual Environment -Create a virtual environment for the Gradio application: - -```bash -cd ai-ml/trainium-inferentia/examples/gradio-ui -python3 -m venv .venv -source .venv/bin/activate -``` -### 2.2. Install Gradio WebUI app - -Install all the Gradio WebUI app dependencies with pip - -```bash -pip install gradio requests -``` - -### 2.3. Invoke the WebUI -Run the Gradio WebUI using the following command: - -NOTE: `gradio-app-stablediffusion.py` refers to the port forward url. e.g., `service_name = "http://localhost:8000" ` - -```bash -python gradio-app-stablediffusion.py -``` - -You should see output similar to the following: -```text -Running on local URL: http://127.0.0.1:7860 - -To create a public link, set `share=True` in `launch()`. -``` - -### 2.4. Access the WebUI from Your Browser -Open your web browser and access the Gradio WebUI by navigating to the following URL: - -http://127.0.0.1:7860 - -![gradio-sd](gradio-app-stable-diffusion-xl.png) - -You should now be able to interact with the Gradio application from your local machine. 
diff --git a/ai-ml/trainium-inferentia/examples/gradio-ui/gradio-app-stable-diffusion-xl.png b/ai-ml/trainium-inferentia/examples/gradio-ui/gradio-app-stable-diffusion-xl.png deleted file mode 100644 index 9576241fe..000000000 Binary files a/ai-ml/trainium-inferentia/examples/gradio-ui/gradio-app-stable-diffusion-xl.png and /dev/null differ diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/Dockerfile b/ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/Dockerfile similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/Dockerfile rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/Dockerfile diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/README.md b/ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/README.md similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/README.md rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/README.md diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/ray-service-llama2.yaml b/ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/ray-service-llama2.yaml similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/ray-service-llama2.yaml rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/ray-service-llama2.yaml diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/ray_serve_llama2.py b/ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/ray_serve_llama2.py similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2/ray_serve_llama2.py rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2/ray_serve_llama2.py diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/Dockerfile b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/Dockerfile similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/Dockerfile rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/Dockerfile diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/README.md b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/README.md similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/README.md rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/README.md diff --git a/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/Dockerfile b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/Dockerfile new file mode 100644 index 000000000..126fd6bef --- /dev/null +++ b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/Dockerfile @@ -0,0 +1,13 @@ +# Use Python base image +FROM --platform=linux/amd64 python:3.9-slim + +# Set working directory in the container +WORKDIR /app + +# Copy the Python script into the container +COPY gradio-app-stablediffusion.py /app/gradio-app-stablediffusion.py + +RUN pip install --no-cache-dir gradio requests Pillow + +# Command to run the Python script +ENTRYPOINT ["python", "gradio-app-stablediffusion.py"] diff --git a/ai-ml/trainium-inferentia/examples/gradio-ui/gradio-app-stablediffusion.py 
b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/gradio-app-stablediffusion.py similarity index 72% rename from ai-ml/trainium-inferentia/examples/gradio-ui/gradio-app-stablediffusion.py rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/gradio-app-stablediffusion.py index 5fa377158..418b025f4 100644 --- a/ai-ml/trainium-inferentia/examples/gradio-ui/gradio-app-stablediffusion.py +++ b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/gradio-app-stablediffusion.py @@ -1,14 +1,12 @@ import gradio as gr import requests -import json +import os from PIL import Image from io import BytesIO # Constants for model endpoint and service name -model_endpoint = "/imagine" -# service_name = "http:///serve" -service_name = "http://localhost:8000" # Replace with your actual service name - +model_endpoint = os.environ.get("MODEL_ENDPOINT", "/imagine") +service_name = os.environ.get("SERVICE_NAME", "http://localhost:8000") # Function to generate image based on prompt def generate_image(prompt): @@ -25,9 +23,10 @@ def generate_image(prompt): except requests.exceptions.RequestException as e: # Handle any request exceptions (e.g., connection errors) - return f"AI: Error: {str(e)}" + # return f"AI: Error: {str(e)}" + return Image.new('RGB', (100, 100), color='red') # Define the Gradio PromptInterface demo = gr.Interface(fn=generate_image, inputs = [gr.Textbox(label="Enter the Prompt")], - outputs = gr.Image(type='pil')).launch(debug='True') + outputs = gr.Image(type='pil')).launch(server_name="0.0.0.0") diff --git a/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/gradio-deploy.yaml b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/gradio-deploy.yaml new file mode 100644 index 000000000..6edd03096 --- /dev/null +++ b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/gradio-ui/gradio-deploy.yaml @@ -0,0 +1,56 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: gradio +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: gradio-deployment + namespace: gradio + labels: + app: gradio +spec: + replicas: 1 + selector: + matchLabels: + app: gradio + template: + metadata: + labels: + app: gradio + spec: + containers: + - name: gradio + image: public.ecr.aws/data-on-eks/gradio-app:sd-v1.0 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 7860 + resources: + requests: + cpu: "512m" + memory: "2048Mi" + limits: + cpu: "1" + memory: "4096Mi" + env: + - name: MODEL_ENDPOINT + value: "/imagine" + #Please note that the service name is currently hardcoded to match the Stable Diffusion service for this blueprint. If there are any updates or changes to the actual RayServe deployment, you'll need to update the service name in this code accordingly. 
+ - name: SERVICE_NAME + value: "http://stablediffusion-service.stablediffusion.svc.cluster.local:8000" +--- +apiVersion: v1 +kind: Service +metadata: + name: gradio-service + namespace: gradio +spec: + selector: + app: gradio + ports: + - name: http + protocol: TCP + port: 7860 + targetPort: 7860 + type: ClusterIP diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/ray-service-stablediffusion.yaml b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/ray-service-stablediffusion.yaml similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/ray-service-stablediffusion.yaml rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/ray-service-stablediffusion.yaml diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/ray_serve_stablediffusion.py b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/ray_serve_stablediffusion.py similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/ray_serve_stablediffusion.py rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/ray_serve_stablediffusion.py diff --git a/ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/stable-diffusion-xl-prompt_3.png b/ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/stable-diffusion-xl-prompt_3.png similarity index 100% rename from ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2/stable-diffusion-xl-prompt_3.png rename to ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2/stable-diffusion-xl-prompt_3.png diff --git a/analytics/terraform/datahub-on-eks/README.md b/analytics/terraform/datahub-on-eks/README.md index 84b4778b2..2b2ac6086 100644 --- a/analytics/terraform/datahub-on-eks/README.md +++ b/analytics/terraform/datahub-on-eks/README.md @@ -28,8 +28,7 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/ | [eks](#module\_eks) | terraform-aws-modules/eks/aws | ~> 19.15 | | [eks\_blueprints\_addons](#module\_eks\_blueprints\_addons) | aws-ia/eks-blueprints-addons/aws | ~> 1.2 | | [vpc](#module\_vpc) | terraform-aws-modules/vpc/aws | ~> 5.0 | -| [vpc\_endpoints](#module\_vpc\_endpoints) | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | ~> 5.0 | -| [vpc\_endpoints\_sg](#module\_vpc\_endpoints\_sg) | terraform-aws-modules/security-group/aws | ~> 5.0 | +| [vpc\_endpoints](#module\_vpc\_endpoints) | terraform-aws-modules/vpc/aws//modules/vpc-endpoints | ~> 5.1 | ## Resources @@ -43,15 +42,17 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/ | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| -| [create\_iam\_service\_linked\_role\_es](#input\_create\_iam\_service\_linked\_role\_es) | Whether to create `AWSServiceRoleForAmazonOpensearchService` service-linked role. 
Set it to `false` if the role already exists | `bool` | `true` | no |
+| [create\_vpc](#input\_create\_vpc) | Create VPC | `bool` | `true` | no |
| [eks\_cluster\_version](#input\_eks\_cluster\_version) | EKS Cluster version | `string` | `"1.26"` | no |
| [enable\_vpc\_endpoints](#input\_enable\_vpc\_endpoints) | Enable VPC Endpoints | `bool` | `false` | no |
| [name](#input\_name) | Name of the VPC and EKS Cluster | `string` | `"datahub-on-eks"` | no |
+| [private\_subnet\_ids](#input\_private\_subnet\_ids) | Ids for existing private subnets - needed when create\_vpc set to false | `list(string)` | `[]` | no |
| [private\_subnets](#input\_private\_subnets) | Private Subnets CIDRs. 32766 Subnet1 and 16382 Subnet2 IPs per Subnet | `list(string)` | <pre>[<br>  "10.1.0.0/17",<br>  "10.1.128.0/18"<br>]</pre> | no |
| [public\_subnets](#input\_public\_subnets) | Public Subnets CIDRs. 62 IPs per Subnet | `list(string)` | <pre>[<br>  "10.1.255.128/26",<br>  "10.1.255.192/26"<br>]</pre> | no |
| [region](#input\_region) | Region | `string` | `"us-west-2"` | no |
| [tags](#input\_tags) | Default tags | `map(string)` | `{}` | no |
-| [vpc\_cidr](#input\_vpc\_cidr) | VPC CIDR | `string` | `"10.1.0.0/16"` | no |
+| [vpc\_cidr](#input\_vpc\_cidr) | VPC CIDR - must change to match the cidr of the existing VPC if create\_vpc set to false | `string` | `"10.1.0.0/16"` | no |
+| [vpc\_id](#input\_vpc\_id) | VPC Id for the existing vpc - needed when create\_vpc set to false | `string` | `""` | no |

## Outputs

diff --git a/analytics/terraform/datahub-on-eks/providers.tf b/analytics/terraform/datahub-on-eks/providers.tf index 73f40ecc1..2de30958c 100644 --- a/analytics/terraform/datahub-on-eks/providers.tf +++ b/analytics/terraform/datahub-on-eks/providers.tf @@ -15,4 +15,3 @@ provider "helm" { token = data.aws_eks_cluster_auth.this.token } } -

diff --git a/website/docs/gen-ai/inference/Llama2.md b/website/docs/gen-ai/inference/Llama2.md index 195fb8c7f..3f30b5134 100644 --- a/website/docs/gen-ai/inference/Llama2.md +++ b/website/docs/gen-ai/inference/Llama2.md @@ -154,7 +154,7 @@ aws eks --region us-west-2 update-kubeconfig --name trainium-inferentia **Deploy RayServe Cluster**
```bash -cd ai-ml/trainium-inferentia/examples/ray-serve/llama2-inf2 +cd ai-ml/trainium-inferentia/examples/inference/ray-serve/llama2-inf2 kubectl apply -f ray-service-llama2.yaml ```

diff --git a/website/docs/gen-ai/inference/StableDiffusion.md b/website/docs/gen-ai/inference/StableDiffusion.md index c48e08bc7..2980ccf39 100644 --- a/website/docs/gen-ai/inference/StableDiffusion.md +++ b/website/docs/gen-ai/inference/StableDiffusion.md @@ -131,7 +131,7 @@ aws eks --region us-west-2 update-kubeconfig --name trainium-inferentia **Deploy RayServe Cluster**
```bash -cd ai-ml/trainium-inferentia/examples/ray-serve/stable-diffusion-inf2 +cd ai-ml/trainium-inferentia/examples/inference/ray-serve/stable-diffusion-inf2 kubectl apply -f ray-service-stablediffusion.yaml ```
@@ -192,11 +192,12 @@ From this webpage, you will be able to monitor the progress of Model deployment, ![Ray Dashboard](img/ray-dashboard-sdxl.png)
### To Test the Stable Diffusion XL Model
-Once you see the status of the model deployment is in `running` state then you can start using Llama-2-chat. + +Once you've verified in the Ray Dashboard that the Stable Diffusion model deployment is in a `running` state, you're all set to start using the model. This status means the model is fully functional and ready to serve image generation requests based on your text prompts.
You can use the following URL with a query added at the end of the URL.
- http://\/serve/serve/imagine?prompt=an astronaut is dancing on green grass, sunlit + http://\/serve/imagine?prompt=an astronaut is dancing on green grass, sunlit
You will see an output like this in your browser:
@@ -205,66 +206,52 @@ You will see an output like this in your browser: ## Deploying the Gradio WebUI App Discover how to create a user-friendly chat interface using [Gradio](https://www.gradio.app/) that integrates seamlessly with deployed models.
-Let's deploy Gradio app locally on your machine to interact with the Stable Diffusion XL model deployed using RayServe. +Let's deploy the Gradio app as a Kubernetes Deployment, packaged in a Docker container. This setup enables interaction with the Stable Diffusion XL model deployed using RayServe.
:::info -The Gradio app interacts with the locally exposed service created solely for the demonstration. Alternatively, you can deploy the Gradio app on EKS as a Pod with Ingress and Load Balancer for wider accessibility. +The Gradio UI application is containerized and the container image is stored in the [data-on-eks](https://gallery.ecr.aws/data-on-eks/gradio-app) public ECR repository. The Gradio app container internally points to the `stablediffusion-service`, which listens on port 8000. :::
-### Execute Port Forward to the stablediffusion Ray Service -First, execute a port forward to the stablediffusion Ray Service using kubectl: +### Deploy the Gradio Pod as a Deployment
-```bash -kubectl port-forward svc/stablediffusion-service 8000:8000 -n stablediffusion -``` - -### Deploy Gradio WebUI Locally - -#### Create a Virtual Environment -Create a Python virtual environment in your machine for the Gradio application: +First, deploy the Gradio app as a Deployment on EKS using kubectl:
```bash -cd ai-ml/trainium-inferentia/examples/gradio-ui -python3 -m venv .venv -source .venv/bin/activate +cd gradio-ui +kubectl apply -f gradio-deploy.yaml ```
-#### Install Gradio Image Generator app -Install all the Gradio WebUI app dependencies with pip +This should create a Deployment and a Service in namespace `gradio`. Check the status of the resources.
```bash -pip install gradio requests +kubectl -n gradio get all +NAME READY STATUS RESTARTS AGE +pod/gradio-deployment-59cfbffdf5-q745z 1/1 Running 0 143m + +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +service/gradio-service ClusterIP 172.20.245.153 <none> 7860/TCP 3d12h ```
#### Invoke the WebUI
-Run the Gradio WebUI using the following command: -NOTE: `gradio-app-stablediffusion.py` refers to the port forward url. e.g., `service_name = "http://localhost:8000" ` +Execute a port forward to the `gradio-service` Service using kubectl:
```bash -python gradio-app-stablediffusion.py +kubectl -n gradio port-forward service/gradio-service 8080:7860 ```
-You should see output similar to the following: - -```text -Running on local URL: http://127.0.0.1:7860 - -To create a public link, set `share=True` in `launch()`. -``` - -#### 2.4. Access the WebUI from Your Browser Open your web browser and access the Gradio WebUI by navigating to the following URL:
-http://127.0.0.1:7860 +http://localhost:8080
You should now be able to interact with the Gradio application from your local machine. ![Gradio Output](img/stable-diffusion-xl-gradio.png)
## Conclusion + +In conclusion, you will have successfully deployed the **Stable-diffusion-xl-base** model on EKS with Ray Serve and created a prompt based web UI using Gradio. This opens up exciting possibilities for natural language processing and prompt based image generator and image predictor development.
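
As an optional end-to-end check of the backend outside of Gradio, the same request the WebUI issues can be sent directly from Python. The sketch below is illustrative and not part of the blueprint: it assumes a local port forward to the Stable Diffusion service (`kubectl -n stablediffusion port-forward svc/stablediffusion-service 8000:8000`) and that the `/imagine` endpoint returns raw image bytes for a `prompt` query parameter, which is what the Gradio app's use of `requests`, `PIL`, and `BytesIO` suggests. The `imagine` helper and the output filename are hypothetical names chosen for the example.

```python
# Minimal sketch of a direct client for the Stable Diffusion RayServe endpoint.
# Assumes a port forward to the stablediffusion-service is running locally and
# that /imagine returns raw image bytes for a `prompt` query parameter.
import os
from io import BytesIO

import requests
from PIL import Image

# Same environment variables the containerized Gradio app uses in gradio-deploy.yaml.
service_name = os.environ.get("SERVICE_NAME", "http://localhost:8000")
model_endpoint = os.environ.get("MODEL_ENDPOINT", "/imagine")


def imagine(prompt: str, timeout: int = 180) -> Image.Image:
    """Send a prompt to the RayServe endpoint and return the generated image."""
    response = requests.get(
        f"{service_name}{model_endpoint}",
        params={"prompt": prompt},
        timeout=timeout,
    )
    response.raise_for_status()
    return Image.open(BytesIO(response.content))


if __name__ == "__main__":
    image = imagine("an astronaut is dancing on green grass, sunlit")
    image.save("astronaut.png")  # hypothetical output filename
    print("Saved astronaut.png")
```

Pointing `SERVICE_NAME` at `http://stablediffusion-service.stablediffusion.svc.cluster.local:8000` would let the same snippet run from a pod inside the cluster, mirroring the environment variables set for the Gradio Deployment in `gradio-deploy.yaml`.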