
Consider adding support for routing to LLMServerPool as a valid backendRef #4423

Open
arkodg opened this issue Oct 10, 2024 · 6 comments
Labels
kind/decision A record of a decision made by the community. stale



arkodg commented Oct 10, 2024

Description:

The kubernetes-sigs/llm-instance-gateway project has introduced a new backendRef called LLMServerPool, which represents a collection of model servers inside Kubernetes that can be routed to from an HTTPRoute. The project is looking for Envoy-proxy-based implementations to support routing to this backendRef natively. More in kubernetes-sigs/llm-instance-gateway#19

Creating this issue to decide whether Envoy Gateway should add support for this.

@arkodg arkodg added the kind/decision A record of a decision made by the community. label Oct 10, 2024

arkodg commented Oct 11, 2024

Hoping end users as well as vendors using Envoy Gateway today can chime in and share whether they would be interested in using this feature if it existed natively.

Please also leave a comment if you're not yet an Envoy Gateway user but would adopt it if this feature were added 😄

Current workaround
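
As a rough illustration of the dummy-backend workaround discussed in this thread, the HTTPRoute targets a placeholder Service purely so that translation succeeds; the real destination is selected per request by ext proc and an xDS patch. All resource names below are hypothetical, not from the original issue:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway
  rules:
    - backendRefs:
        # Placeholder Service: traffic never actually reaches it, since the
        # route's cluster is patched to an ORIGINAL_DST cluster (see below).
        - name: dummy-backend
          port: 8000
```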


guydc commented Oct 11, 2024

So far, we have refrained from supporting specific backends (e.g. S3, EC2, ...). This API is not yet a widely adopted resource like Service or ServiceImport.

Create an EnvoyExtensionPolicy to configure the ext proc service

The alternative, as I understand it, is to have a backend resource define portions of the downstream filter chain. In general (not for the LLM use case), that can create some issues around unexpected side effects and conflicts from different backends. Maybe this can be mitigated by scoping the filters to specific routes or even using upstream filters and by detecting/resolving conflicts in IR translation.

  • Would this significantly complicate existing translation in EG?
  • Are there other examples in EG/GW-API space for backends having this "implicit" impact on downstream traffic processing?
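
For context, the first workaround step above might look roughly like the following sketch. The service name and port are assumptions, and field names follow the EnvoyExtensionPolicy API as I understand it in recent EG releases, so treat this as illustrative rather than authoritative:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyExtensionPolicy
metadata:
  name: llm-ext-proc
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: llm-route
  extProc:
    # Hypothetical endpoint-picker service that selects the model server pod
    # and returns its address for routing.
    - backendRefs:
        - name: llm-endpoint-picker
          port: 9002
      processingMode:
        request:
          body: Buffered
```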

Edit xDS using EnvoyPatchPolicy or the Extension Server to add the original_destination_cluster xDS Cluster config

This can be improved (somewhat) by supporting backend reference extensibility, as proposed here: #4373 (comment).

  • Users may still reference the LLMServerPool in their HTTPRoutes, but EG is not responsible for the translation.
  • The extension server required for LLM resource translation may be delivered as part of an extended EG "contrib" chart, to simplify LCM.
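
A minimal sketch of the EnvoyPatchPolicy half of the workaround, assuming the ext proc service writes the selected pod address into a request header (the header name x-gateway-destination-endpoint is an assumption here, not something specified in this thread):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyPatchPolicy
metadata:
  name: original-dst-cluster
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  type: JSONPatch
  jsonPatches:
    - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster"
      name: original_destination_cluster
      operation:
        op: add
        path: ""
        value:
          name: original_destination_cluster
          type: ORIGINAL_DST
          lb_policy: CLUSTER_PROVIDED
          # Route to the address supplied by ext proc via a request header,
          # instead of a statically configured endpoint.
          original_dst_lb_config:
            use_http_header: true
            http_header_name: x-gateway-destination-endpoint
```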


zhaohuabing commented Oct 15, 2024

EG can't directly support LLMServerPool as a Backend type because it lacks the logic to handle LLM-specific configurations, such as how to set up the filter chain and routes properly. This responsibility falls to a standalone component, the "LLM Gateway controller".

The current workaround, using a dummy backend approach, is a bit of a hack. It results in an HTTPRoute that can be confusing to anyone inspecting it, as the destination cluster is just a placeholder. This can be improved by adding support to custom Backend types, as @guydc suggested.

EG will need to invoke an "LLM Gateway extension" to translate the llm-backend to an original_destination_cluster. This extension will also insert an ExtProc filter into the HTTP filter chain to retrieve the IP of the LLM pod; this can be added via an EnvoyExtensionPolicy or through an xDS mutation extension point like the Extension Server.

EG delegates the translation of the llm-gateway.k8s.io/LLMServerPool Backend type to a third-party extension.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
    - name: inference-gateway
      sectionName: llm-gw
  rules:
  - backendRefs:
      - group: llm-gateway.k8s.io
        kind: LLMServerPool
        name: llm-backend

This Backend resource is only used by the LLM Gateway controller; EG doesn't care about it.

apiVersion: llm-gateway.k8s.io
kind: LLMServerPool
metadata:
  name: llm-backend
spec:
  .... omitted, EG doesn't care

This mechanism can also be used to support other vendor-specific or private Backend types as out-of-tree extensions, such as AWS S3, EC2, Lambda, etc.
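
Wiring such an out-of-tree extension in would presumably go through EG's extension manager configuration. The group/version below and the service address are assumptions for illustration; field names follow the Extension Server API as I understand it:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
extensionManager:
  # Custom resource kinds that EG should watch and hand to the extension
  # for translation instead of translating itself.
  resources:
    - group: llm-gateway.k8s.io
      version: v1alpha1
      kind: LLMServerPool
  hooks:
    xdsTranslator:
      post:
        - Translation
  # Hypothetical address of the LLM Gateway extension server.
  service:
    host: llm-extension.envoy-gateway-system.svc.cluster.local
    port: 5005
```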

@robscott

To clarify, we're checking with Envoy-based Gateway API implementations to understand which ones would be open to adding native support for the new LLMServerPool API that wg-serving is working on.

This API is not yet a widely adopted resource like Service, ServiceImport.

Completely agree. This is a bit of a chicken and egg problem though. We want to see Gateway API implementations support this new k8s API as a backend, but that requires one implementation to be first. Ideally that's an OSS implementation that can then be used as a reference implementation for how this integration can work.

EG delegates the translation of llm-gateway.k8s.io\LLMServerPool Backend type to a third-party extension.

The point here is that this is a new Kubernetes API, not a third-party extension. Deciding on whether or not to support this should be more related to whether or not this project should support TLSRoute or ServiceImport - OSS Kubernetes APIs that are still only in alpha.

I've suggested that instead of continuing to work on the rather fragile workaround in #4423 (comment), it would be better for the WG to work to support this resource natively in an OSS + CNCF Gateway API implementation. Envoy Gateway seems like a great option for this, but we'll also be open to any other projects that are interested.


zhaohuabing commented Oct 16, 2024

@robscott Thanks for the clarification! I initially thought this was being proposed as an EG-specific API. If it's going to be a Kubernetes API like TCPRoute, then EG would be happy to support it. EG already supports all the experimental Gateway API resources, so supporting this API would be in line with that.


This issue has been automatically marked as stale because it has not had activity in the last 30 days.
