add kubernetes deployment info
Signed-off-by: irisdingbj <[email protected]>
irisdingbj authored and ftian1 committed May 30, 2024
1 parent a0a6e03 commit 028989a
Showing 3 changed files with 126 additions and 37 deletions.
163 changes: 126 additions & 37 deletions community/rfcs/24-05-17-001-OPEA-Deployment-Design.md
**Author**

[ftian1](https://github.com/ftian1), [lvliang-intel](https://github.com/lvliang-intel), [hshen14](https://github.com/hshen14), [irisdingbj](https://github.com/irisdingbj), [KfreeZ](https://github.com/kfreez), [zhlsunshine](https://github.com/zhlsunshine) **Edit Here to add your id**

**Status**

Under Review

**Objective**

Provide a clear design for users to deploy their own GenAI applications in a Docker or Kubernetes environment.


**Motivation**
The proposed OPEA deployment workflow is

For GenAI applications, we provide two interfaces for deployment:

1. Docker deployment by Python

For example, constructing a RAG (Retrieval-Augmented Generation) application in Python code looks like this:

```python
from comps import MicroService, ServiceOrchestrator


class ChatQnAService:
    def __init__(self, port=8080):
        # The orchestrator exposes the assembled mega service at /v1/chatqna.
        self.service_builder = ServiceOrchestrator(port=port, endpoint="/v1/chatqna")

    def add_remote_service(self):
        # Each stage of the RAG pipeline is declared as a remote micro service.
        embedding = MicroService(
            name="embedding", port=6000, expose_endpoint="/v1/embeddings", use_remote_service=True
        )
        retriever = MicroService(
            name="retriever", port=7000, expose_endpoint="/v1/retrieval", use_remote_service=True
        )
        rerank = MicroService(
            name="rerank", port=8000, expose_endpoint="/v1/reranking", use_remote_service=True
        )
        llm = MicroService(
            name="llm", port=9000, expose_endpoint="/v1/chat/completions", use_remote_service=True
        )
        # Register the services and wire them into the embedding -> retriever -> rerank -> llm flow.
        self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
        self.service_builder.flow_to(embedding, retriever)
        self.service_builder.flow_to(retriever, rerank)
        self.service_builder.flow_to(rerank, llm)
```
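
A minimal usage sketch of the class above might look like the following; how the assembled mega service is actually served afterwards is not specified in this RFC, so the entry point below only illustrates wiring the remote micro services together.

```python
# Hypothetical usage of ChatQnAService from the example above.
# Serving/startup of the resulting mega service is an assumption
# left outside this RFC, so only the wiring step is shown.
if __name__ == "__main__":
    chatqna = ChatQnAService(port=8080)
    chatqna.add_remote_service()
```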

2. Kubernetes deployment by YAML

For example, constructing a RAG (Retrieval-Augmented Generation) application in YAML looks like this:

```yaml
opea_micro_services:
  embedding:
    endpoint: /v1/embeddings
    port: 6000
  retrieval:
    endpoint: /v1/retrieval
    port: 7000
  reranking:
    endpoint: /v1/reranking
    port: 8000
  llm:
    endpoint: /v1/chat/completions
    port: 9000

opea_mega_service:
  port: 8080
  mega_flow:
    - embedding >> retrieval >> reranking >> llm

```

When a user wants to deploy the GenAI application to a Kubernetes environment, such a YAML configuration file should be defined and converted to `docker compose` or [GenAI Microservice Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector) Custom Resource files.
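
To illustrate the conversion step (this is not the actual converter), the sketch below parses the `mega_flow` notation from the deployment YAML above into an ordered list of micro service steps that a converter could then map to Docker Compose services or GMC steps; the file name `chatqna.yaml` and the helper function are hypothetical.

```python
# Hypothetical sketch of the conversion input: parse the deployment YAML's
# `mega_flow` notation ("a >> b >> c") into an ordered list of steps.
# This is not the real converter; names and structure here are assumptions.
import yaml  # requires PyYAML


def parse_mega_flow(path: str) -> list[dict]:
    with open(path) as f:
        spec = yaml.safe_load(f)
    micro = spec["opea_micro_services"]
    steps = []
    for flow in spec["opea_mega_service"]["mega_flow"]:
        for name in (part.strip() for part in flow.split(">>")):
            # Attach the endpoint/port config of each micro service to its step.
            steps.append({"name": name, **micro[name]})
    return steps


# e.g. parse_mega_flow("chatqna.yaml") ->
# [{"name": "embedding", "endpoint": "/v1/embeddings", "port": 6000}, ...]
```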
A sample GMC Custom Resource looks like the following:
```yaml
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
  name: chatqna
  namespace: gmcsample
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        internalService:
          serviceName: embedding-service
          config:
            endpoint: /v1/embeddings
      - name: TeiEmbedding
        internalService:
          serviceName: tei-embedding-service
          config:
            gmcTokenSecret: gmc-tokens
            hostPath: /root/GMC/data/tei
            modelId: BAAI/bge-base-en-v1.5
            endpoint: /embed
          isDownstreamService: true
      - name: Retriever
        data: $response
        internalService:
          serviceName: retriever-redis-server
          config:
            RedisUrl: redis-vector-db
            IndexName: rag-redis
            tei_endpoint: tei-embedding-service
            endpoint: /v1/retrieval
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db
          isDownstreamService: true
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-service
          config:
            tei_reranking_endpoint: tei-reranking-service
            gmcTokenSecret: gmc-tokens
            endpoint: /v1/reranking
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-service
          config:
            gmcTokenSecret: gmc-tokens
            hostPath: /root/GMC/data/rerank
            modelId: BAAI/bge-reranker-large
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        data: $response
        internalService:
          serviceName: llm-service
          config:
            tgi_endpoint: tgi-service
            gmcTokenSecret: gmc-tokens
            endpoint: /v1/chat/completions
      - name: Tgi
        internalService:
          serviceName: tgi-service
          config:
            gmcTokenSecret: gmc-tokens
            hostPath: /root/GMC/data/tgi
            modelId: Intel/neural-chat-7b-v3-3
            endpoint: /generate
          isDownstreamService: true
```
After deployment, a `gmconnectors.gmc.opea.io` CR named `chatqna` should be available in the `gmcsample` namespace, as shown below:

```bash
$ kubectl get gmconnectors.gmc.opea.io -n gmcsample
NAME      URL                                                       READY     AGE
chatqna   http://router-service.gmcsample.svc.cluster.local:8080    Success   3m
```

The user can then access the application pipeline via the value of the `URL` field shown above.
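
For example, once the `URL` is reported as ready, the pipeline could be exercised from inside the cluster with a request such as the sketch below; the payload shape is an assumption for illustration, not the documented API contract.

```python
# Hypothetical smoke test against the router URL reported by the GMC CR.
# The request payload is an assumption for illustration only.
import requests

url = "http://router-service.gmcsample.svc.cluster.local:8080"
response = requests.post(url, json={"text": "What is OPEA?"}, timeout=60)
print(response.status_code, response.text)
```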

The whole deployment process is illustrated in the diagram below.

<a target="_blank" href="opea_deploy_process.png">
<img src="opea_deploy_process.png" alt="Deployment Process" width=480 height=310>
<img src="opea_deploy_process_v1.png" alt="Deployment Process" width=480 height=310>
</a>


n/a
- [ ] k8s GMC with istio



File renamed without changes
Binary file added community/rfcs/opea_deploy_process_v1.png