
add k8s docs for getting started, K8s Manifest and Helm #179

Open: wants to merge 6 commits into `main`

Conversation

@devpramod (Contributor) commented Sep 25, 2024:

This PR contains the following docs:

  • Getting Started for k8s: installation, a basic introduction to k8s, and sections for Helm and the k8s manifest. As more k8s deployment modes are added, corresponding sections will be added to this doc.

  • Deploy using Helm charts: a doc that follows the xeon.md template as closely as possible to deploy ChatQnA on k8s using Helm.

  • Deploy using K8s Manifest: a doc that follows the xeon.md template as closely as possible to deploy ChatQnA on k8s using a K8s manifest yaml.

@dbkinder (Contributor) left a comment:

Some suggested edits.

Also, when you add new documents, they need to be linked into the table of contents structure. There's an index.rst file in this folder you can edit to add these two documents.

I'd suggest you add an edit to the index.rst doc in this deploy folder, and replace the existing Kubernetes section with this:

```rst
Kubernetes
**********

.. toctree::
   :maxdepth: 1

   k8s_getting_started
   TGI on Xeon with Helm Charts <k8s_helm>

* Xeon & Gaudi with GMC
* Xeon & Gaudi without GMC
```

@dbkinder (Contributor) left a comment:

LGTM, thanks!

@tylertitsworth left a comment:

Some of the stuff I see in the docs is just a tutorial on things that already have docs, like TGI/TEI, Helm, and Kubernetes. It feels a lot like we're overexplaining a concept that can be answered by a link to the source docs of another tool, plus a command showing how it's relevant to use with ChatQnA.

For reference, this is the most handholding I would do in the case of deploying TGI:


**Configure Model Server**

Before we deploy a model, we need to configure the model server with information like what model to use and how many max tokens to use. We will be using the tgi-on-intel helm chart. This chart normally uses XPU to serve the model, but we are going to configure it to use gaudi2 instead.

First, look at the configuration files in the tgi directory and add/remove any configuration options relevant to your workflow:

```bash
cd tgi
# Create a new configmap for your model server to use
kubectl apply -f cm.yaml
```

Tip

Here is the reference to the Huggingface Launcher Environment Variables and the TGI-Gaudi Environment Variables.

**Deploy Model Server**

Now that we have configured the model server, we can deploy it to Kubernetes using the provided config.yaml file in the tgi directory.

Modify any values like resources or replicas in the config.yaml file to suit your needs. Then, deploy the model server:

```bash
# Encode HF Token for secret.encodedToken
echo -n '<token>' | base64
# Install Chart
git clone https://github.com/intel/ai-containers
helm install model-server -f config.yaml ai-containers/workflows/charts/tgi
# Check the pod status
kubectl get pod
kubectl logs -f <pod-name>
```

Please use a tool like markdownlint to ensure consistent styling.

@dbkinder (Contributor) commented:
I've got a script in docs/scripts/checkmd.sh that uses pymarkdown (lint) to scan markdown files, with a bunch of checks disabled. Alas, I'm retiring today; otherwise, adding a markdown linter to the CI checks was on my list. :)
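(For whoever picks that up, a minimal sketch of running the linter by hand, assuming the PyMarkdown package, which installs as `pymarkdownlnt` and provides the `pymarkdown` command:)

```bash
# Hypothetical manual invocation; docs/scripts/checkmd.sh wraps something similar
pip install pymarkdownlnt
pymarkdown scan examples/ChatQnA/deploy/k8s_getting_started.md examples/ChatQnA/deploy/k8s_helm.md
```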

@devpramod devpramod changed the title add k8s docs for getting started and helm add k8s docs for getting started, K8s Manifest and Helm Nov 19, 2024
@arun-gupta (Contributor) left a comment:

The initial setup is confusing: the docs point to setting up k8s in multiple ways, whereas the tested configuration is minikube. We need to document that clearly. Other than that, there are some more clarifications required.

## Introduction
Kubernetes is an orchestration platform for managing containerized applications, ideal for deploying microservices based architectures like ChatQnA. It offers robust mechanisms for automating deployment, scaling, and operations of application containers across clusters of hosts. Kubernetes supports different deployment modes for ChatQnA, which cater to various operational preferences:

- **Using GMC (GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management.
A contributor commented:

Let's remove this one; it is no longer maintained.

Kubernetes is an orchestration platform for managing containerized applications, ideal for deploying microservices based architectures like ChatQnA. It offers robust mechanisms for automating deployment, scaling, and operations of application containers across clusters of hosts. Kubernetes supports different deployment modes for ChatQnA, which cater to various operational preferences:

- **Using GMC (GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management.
- **Using Manifests**: This involves deploying directly using Kubernetes manifest files without the GenAI Microservices Connector (GMC).
A contributor commented:

Keep it simple, just publish only Helm charts.


**Update Dependencies:**

- A script called **./update_dependency.sh** is provided which is used to update chart dependencies, ensuring all nested charts are at their latest versions.
A contributor commented:

It's not clear whether this script is standard for all Helm charts or unique to OPEA.

A contributor replied:

Unique to OPEA. Normally one just calls the `helm dep` command directly.
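(For reference, the standard command would be something like the following; the local chart path is illustrative:)

```bash
# Refresh nested chart dependencies declared in Chart.yaml
helm dependency update ./chatqna
```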

A contributor replied:

Then we should use the standard tools instead of creating our own mechanisms.

A contributor replied:

I'm rather new to Helm, so all of this is very much AFAIK:

  • Script is there because OPEA has many of the dependencies locally instead of in a Helm repo
  • Normally users would update the repo first, and that would basically do what the script is needed for (refresh the deps to latest, not just minimum version)
  • OPEA Helm charts are moving to a repo, so I think in the future that script is needed only when one wants to use a local checkout

@poussa ?


### Kubernetes Cluster and Development Environment

**Setting Up the Kubernetes Cluster:** Before beginning deployment for the ChatQnA application, ensure that a Kubernetes cluster is ready. For guidance on setting up your Kubernetes cluster, please refer to the comprehensive setup instructions available on the [Opea Project deployment guide](https://opea-project.github.io/latest/deploy/index.html).
A contributor commented:

We need to provide guidance on the memory/disk requirements for the k8s cluster, otherwise ChatQnA may not run.
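(A hedged sketch of a minikube setup with enough headroom, since minikube is the configuration the reviewers tested; the resource figures below are illustrative assumptions, not measured ChatQnA requirements:)

```bash
# Assumed sizing; adjust to the models being served
minikube start --cpus 8 --memory 32g --disk-size 100g
kubectl get nodes   # verify the node reports Ready
```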

# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm

This deployment section covers multi-node on-prem deployment of the ChatQnA example with OPEA comps to deploy using the TGI service. There are several slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will be covering one option of doing it for convenience: we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster using Helm. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, follow the instructions here [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md).
A contributor commented:

It was tested on a single-node k8s cluster, so the "multi-node" part is not accurate.


Set a new [namespace](#create-and-set-namespace) and switch to it if needed

To enable UI, uncomment the lines `56-62` in `GenAIInfra/helm-charts/chatqna/values.yaml`:
A contributor commented:

This should be lines 58-62:

```text
56 # If you would like to switch to traditional UI image
57 # Uncomment the following lines
58 # chatqna-ui:
59 #   image:
60 #     repository: "opea/chatqna-ui"
61 #     tag: "1.1"
62 #   containerPort: "5173"
```

Also, just uncommenting may not be sufficient, as that messes up the formatting. There is an additional space that needs to be deleted too.
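(For clarity, the correctly uncommented block would presumably look like this, with the extra leading space removed:)

```yaml
chatqna-ui:
  image:
    repository: "opea/chatqna-ui"
    tag: "1.1"
  containerPort: "5173"
```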

```text
chatqna-retriever-usvc-6695979d67-z5jgx   1/1   Running   0   5m7s
chatqna-tei-769dc796c-gh5vx               1/1   Running   0   5m7s
chatqna-teirerank-54f58c596c-76xqz        1/1   Running   0   5m7s
chatqna-tgi-7b5556d46d-pnzph              1/1   Running   0   5m7s
```
A contributor commented:

Could not get all pods to run, filed opea-project/GenAIExamples#1202


Set the necessary environment variables to set up the use case:
```bash
export MODELDIR="/mnt/opea-models" # export MODELDIR="null" if you don't want to cache the model.
```
A contributor commented:

This caused an error:

```text
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    10m                 default-scheduler  Successfully assigned default/chatqna-tei-645548d7f-7crjj to minikube
  Warning  FailedMount  17s (x13 over 10m)  kubelet            MountVolume.SetUp failed for volume "model-volume" : hostPath type check failed: /mnt/opea-models is not a directory
```

Using `export MODELDIR="null"` solved the issue.
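(The error indicates the hostPath directory simply doesn't exist on the node; a hedged alternative to disabling caching is to pre-create it, e.g. on minikube:)

```bash
# Create the model cache directory inside the minikube node
minikube ssh "sudo mkdir -p /mnt/opea-models && sudo chmod a+w /mnt/opea-models"
```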

A contributor replied:

I would not recommend setting global.modelUseHostPath=${MODELDIR} in helm install in a multi-node environment. This option is meant to share pre-downloaded model data files in a single-node environment, for a quick setup to test. To use it in a multi-node environment, you need to go to every K8S worker node to make sure the ${MODELDIR} directory exists and is writable. Just don't set global.modelUseHostPath and leave it at the default value (empty string), which will download the models without sharing.

Another option to share the model files is to set global.modelUsePVC to use a K8S persistent volume to share the model data files. In order to use that, users need to do some preparation work out-of-band (see the sketch after this list):

  1. Set up the K8S storage class and persistent volumes.
  2. Create a K8S persistent volume claim which can be shared by multiple pods.
  3. Pass the created k8s persistent volume claim to global.modelUsePVC.
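(A hedged sketch of steps 2-3, assuming a ReadWriteMany-capable storage class is already available; the `nfs-csi` class name, claim name, size, and chart path are all illustrative:)

```bash
# Step 2: a PVC that multiple pods can mount read-write
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: opea-models-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-csi
  resources:
    requests:
      storage: 100Gi
EOF

# Step 3: pass the claim name to the chart
helm install chatqna ./chatqna --set global.modelUsePVC=opea-models-pvc
```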

A contributor replied:

I would not recommend setting it empty either, because not sharing the data will much more easily cause a node to run out of disk space, meaning that k8s evicts pods from it, and things in general break.

|Command |Function |
|------------------------------- |-----------------------------|
|`kubectl describe pod <pod-name>` | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. |
|`kubectl delete deployments --all` | Deletes all deployments in the current namespace, which effectively removes all the managed pods and associated resources. |
@eero-t (Contributor) commented Nov 28, 2024:

Not all associated resources, only the ReplicaSet and Pods. E.g. associated Service, ServiceMonitor, ConfigMap, access control, etc. resources still remain.

Things should be deleted using the same manifest or Helm chart that was used to create them!

|------------------------------- |-----------------------------|
|`kubectl describe pod <pod-name>` | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. |
|`kubectl delete deployments --all` | Deletes all deployments in the current namespace, which effectively removes all the managed pods and associated resources. |
|`kubectl get pods -o wide` | Retrieves a detailed list of all pods in the current namespace, including additional information like IP addresses and the nodes they are running on. |
A contributor commented:

Use of "current" in all of these is wrong.

When namespace is not specified, removal is done from "default" namespace, not "current" one, whatever that is.

=> I think all examples should include namespace option,.
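(E.g., hedged versions of the table's examples with an explicit namespace; the `chatqna` namespace name is illustrative:)

```bash
kubectl get pods -o wide -n chatqna
kubectl delete deployments --all -n chatqna
```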

|`kubectl logs <pod-name>` | Fetches the logs generated by a container in a specific pod, useful for debugging and monitoring application behavior. |
|`kubectl get svc` | Lists all services in the current namespace, providing a quick overview of the network services and their status.

#### Create and Set Namespace
A contributor commented:

IMHO it would make sense to have this section before the above one.
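(For reference, that section presumably amounts to something like the following; the namespace name is again illustrative:)

```bash
kubectl create namespace chatqna
kubectl config set-context --current --namespace=chatqna
```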

| Component |Description |
| --- | --- |
| `Chart.yaml` | This file contains metadata about the chart such as name, version, and description. |
| `values.yaml` | Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. |
A contributor commented:

Somewhat misleading. Maybe:

```diff
-| `values.yaml` | Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. |
+| `values.yaml` | Overridable configuration values for the Helm chart deployment, used in the chart k8s object templates. |
```

| --- | --- |
| `Chart.yaml` | This file contains metadata about the chart such as name, version, and description. |
| `values.yaml` | Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. |
| `deployment.yaml` | Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. |
A contributor commented:

Incorrect. Typically the templates directory has one file per k8s object, with the object type used as the file name.

Better to just mention the templates dir, and maybe add a pointer to the Helm docs: https://helm.sh/docs/chart_best_practices/templates/
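(I.e., a typical chart layout looks roughly like this:)

```text
chatqna/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    └── configmap.yaml
```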

For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/).

### Using Kubernetes Manifest to Deploy
Manifest files in YAML format define the Kubernetes resources you want to manage. The main components in a manifest file include:
@eero-t (Contributor) commented Nov 28, 2024:

IMHO this section should be removed. Manifests will be removed because they're generated from Helm charts, they don't catch all the configurations possible with Helm charts, and they are continuously out of sync.
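(For context, the manifests are generated with something like the following, which is why checked-in copies drift; the chart path and output name are illustrative:)

```bash
# Render the chart's k8s objects to a static manifest
helm template chatqna ./chatqna > chatqna-manifest.yaml
```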

## Use Case Setup

The `GenAIInfra` repository utilizes a structured Helm chart approach, comprising a primary `Charts.yaml` and individual sub-charts for components like the LLM Service, Embedding Service, and Reranking Service. Each sub-chart includes its own `values.yaml` file, enabling specific configurations such as Docker image sources and deployment parameters. This modular design facilitates flexible, scalable deployment and easy management of the GenAI application suite within Kubernetes environments. For detailed configurations and common components, visit the [GenAIInfra common components directory](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common).
@eero-t (Contributor) commented Nov 28, 2024:

"Docker image sources" => "container image name/version"?

(k8s supports also OCI standard images in addition to legacy Docker format.)

Comment on lines +216 to +219:

```bash
curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{
  "model": "Intel/neural-chat-7b-v3-3",
  "messages": "What is the revenue of Nike in 2023?"
}'
```
@eero-t (Contributor) commented Nov 28, 2024:

The query specified here seems outdated, based on the answer, on data-prep being given OPEA material, and on the re-run of the query specified below?

A contributor replied:

Getting the following error when running this query:

```text
ubuntu@ip-172-31-7-63:~/.minikube$ curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{
     "model": "Intel/neural-chat-7b-v3-3",
     "messages": "What is the revenue of Nike in 2023?"
     }'
curl: (7) Failed to connect to localhost port 8888 after 0 ms: Couldn't connect to server
```

Comment on lines +222 to +224:

Here is the output for your reference:
```bash
data: b' O', data: b'PE', data: b'A', data: b' stands', data: b' for', data: b' Organization', data: b' of', data: b' Public', data: b' Em', data: b'ploy', data: b'ees', data: b' of', data: b' Alabama', data: b'.', data: b' It', data: b' is', data: b' a', data: b' labor', data: b' union', data: b' representing', data: b' public', data: b' employees', data: b' in', data: b' the', data: b' state', data: b' of', data: b' Alabama', data: b',', data: b' working', data: b' to', data: b' protect', data: b' their', data: b' rights', data: b' and', data: b' interests', data: b'.', data: b'', data: b'', data: [DONE]
```
@eero-t (Contributor) commented Nov 28, 2024:

It does not look like this. Each `data:` output is on its own line, with extra empty lines between.

(Same issue in the output below.)

```
{"status":200,"message":"Data preparation succeeded"}
```
For advanced usage of the dataprep microservice refer [here](#dataprep-microservice-%28advanced%29)
A contributor commented:

Link is wrong (can be tested with the "View file" functionality):

```diff
-For advanced usage of the dataprep microservice refer [here](#dataprep-microservice-%28advanced%29)
+For advanced usage of the dataprep microservice refer [here](#dataprep-microservice-advanced).
```

"model": "Intel/neural-chat-7b-v3-3",
"messages": "What is OPEA?"
}'

A contributor commented:

Please remove these redundant empty lines from all the command examples. They do not look good.

```bash
-H 'Content-Type: application/json'
```

In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. So the output of the curl command is a embedded vector of
A contributor commented:

Please use correct formatting for commands and command lines:

```diff
-In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. So the output of the curl command is a embedded vector of
+In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. So the output of the `curl` command is a embedded vector of
```

`your_embedding` dimension equal to it.

```bash
export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
```
A contributor commented:

Both the export and the embedding var are redundant:

```diff
-export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
+your_embedding=$(python3 -c "import random; print([random.uniform(-1, 1) for _ in range(768)])")
```


The output is retrieved text that is relevant to the input data:
```text
{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... }
```
A contributor commented:

With the uploaded document being about OPEA, the output being about Nike financials seems unlikely.



TGI service generate text for the input prompt. Here is the expected result from TGI:
A contributor commented:

Suggested change:

```diff
-TGI service generate text for the input prompt. Here is the expected result from TGI:
+TGI service generates text for the input prompt. Here is the expected result from TGI:
```

4. Reranking
5. LLM with TGI

> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node.
A contributor commented:

And it has resources (memory) for running all of them...

```bash
git checkout tags/v1.1
```
### Bfloat16 Inference Optimization
We recommend using newer CPUs, such as 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) and later, that support the bfloat16 data type. If your hardware includes such CPUs and your model is compatible with bfloat16, adding the `--dtype bfloat16` argument to the HuggingFace `text-generation-inference` server can significantly reduce memory usage by half and provide a moderate speed boost. This change has already been configured in the `chatqna_bf16.yaml` file. To use it, follow these steps:
A contributor commented:

Suggested change:

```diff
-We recommend using newer CPUs, such as 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) and later, that support the bfloat16 data type. If your hardware includes such CPUs and your model is compatible with bfloat16, adding the `--dtype bfloat16` argument to the HuggingFace `text-generation-inference` server can significantly reduce memory usage by half and provide a moderate speed boost. This change has already been configured in the `chatqna_bf16.yaml` file. To use it, follow these steps:
+We recommend using newer CPUs, such as 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) and later, that support the bfloat16 data type. If your hardware includes such CPUs and your model is compatible with bfloat16, adding the `--dtype bfloat16` argument to the HuggingFace `text-generation-inference` server halves its memory usage and provides a moderate speed boost. This change has already been configured in the `chatqna_bf16.yaml` file. To use it, follow these steps:
```

```bash
kubectl label node <node-name> node-type=node-bfloat16
```

>**Note:** The manifest folder has several configuration pipelines that can be deployed for ChatQnA. In this example, we'll use the `chatqna_bf16.yaml` configuration. You can use `chatqna.yaml` instead if you don't have BFloat16 support in your nodes.
A contributor commented:

With Helm charts, one can e.g. enable the relevant BF16 option in the values file: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/tgi/values.yaml#L23
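(A hedged sketch of what that override might look like at install time; the `extraCmdArgs` key is assumed from the linked values file, so verify the exact name and chart path there:)

```bash
helm install chatqna ./chatqna \
  --set 'tgi.extraCmdArgs={--dtype,bfloat16}'
```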


### HF Token
The example can utilize model weights from HuggingFace and langchain.
A contributor commented:

Does langchain really provide weights too, and not just manipulate them?

> [!NOTE]
> Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on

The ChatQnA deployment starts 9 Kubernetes services. Ensure that all associated pods are running, i.e., all the pods' statuses are 'Running'.
A contributor commented:

Suggested change:

```diff
-The ChatQnA deployment starts 9 Kubernetes services. Ensure that all associated pods are running, i.e., all the pods' statuses are 'Running'.
+The ChatQnA deployment starts 9 Kubernetes services. Ensure that all associated pods have `Running` status and have initialized themselves successfully, i.e. show `1/1` as their `Ready` state.
```
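(E.g., a hedged one-liner for that readiness check; the namespace name is illustrative:)

```bash
kubectl wait --for=condition=Ready pods --all -n chatqna --timeout=10m
```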
