
🗺️ Vision / multi-modal #495

Closed
53 of 62 tasks
mikeldking opened this issue May 23, 2024 · 2 comments
Labels: enhancement (New feature or request) · language: js (Related to JavaScript or Typescript integration) · language: python (Related to Python integration) · roadmap
mikeldking commented May 23, 2024

GPT-4o introduces a new message content type that can contain images, encoded either as a URL or as base64.

Example:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

https://platform.openai.com/docs/guides/vision
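The same `image_url` field also accepts base64-encoded images via a data URL, which is the case that matters most for tracing payload size. A minimal sketch of the encoding (the helper name and the placeholder PNG bytes are illustrative, not from the OpenAI SDK):

```python
import base64


def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL suitable for the image_url field."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"


# In practice this would be real image bytes read from disk; any bytes
# demonstrate the encoding itself.
url = image_to_data_url(b"\x89PNG\r\n\x1a\n")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": url}},
    ],
}
```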

Milestone 1

  • Vision support in Python instrumentations for llama-index, openai, gemini, and langchain
  • Eliminate performance degradation from base64-encoded payloads by allowing users to opt out
  • Preliminary set of config flags to mask inputs/outputs that could contain sensitive info
  • Create examples
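One way the base64 opt-out and masking flags above could work is a redaction pass over message content before spans are exported. This is a sketch under assumptions, not the shipped config (the function name, flag names, and sentinel strings are all illustrative):

```python
def mask_message_content(
    content: list[dict],
    hide_images: bool = True,
    base64_max_length: int = 32_000,
) -> list[dict]:
    """Redact image parts from a traced message.

    hide_images drops image payloads entirely; otherwise oversized
    base64 data URLs are replaced so traces stay small.
    """
    masked = []
    for part in content:
        if part.get("type") == "image_url":
            url = part["image_url"]["url"]
            if hide_images:
                masked.append({"type": "image_url", "image_url": {"url": "__REDACTED__"}})
                continue
            if url.startswith("data:") and len(url) > base64_max_length:
                masked.append({"type": "image_url", "image_url": {"url": "__TRUNCATED__"}})
                continue
        masked.append(part)
    return masked
```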

Milestone N

  • Image synthesis APIs such as DALL·E

Tracing

Instrumentation

Testing

Image tracing

Context Attributes

Config

Suppress Tracing

UI / Javascript

Testing

Documentation

Evals

@mikeldking mikeldking added enhancement New feature or request triage Issues that require triage labels May 23, 2024
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix May 23, 2024
@mikeldking mikeldking self-assigned this May 23, 2024
@mikeldking mikeldking added language: js Related to JavaScript or Typescript integration language: python Related to Python integration labels May 23, 2024
@mikeldking (Contributor, Author) commented:

An example vLLM client that should also support vision:

import base64

import filetype  # third-party: pip install filetype
import httpx

# VLM_MODEL, VLLM_URL, VLLM_HEALTHCHECK, VLLM_READY_TIMEOUT,
# ALLOWED_IMAGE_TYPES, and wait_for_ready are assumed to be defined
# elsewhere in the module.

class VLMClient:
    def __init__(self, vlm_model: str = VLM_MODEL, vllm_url: str = VLLM_URL):
        self._vlm_model = vlm_model
        self._vllm_client = httpx.AsyncClient(base_url=vllm_url)

        if VLLM_HEALTHCHECK:
            wait_for_ready(
                server_url=vllm_url,
                wait_seconds=VLLM_READY_TIMEOUT,
                health_endpoint="health",
            )

    @property
    def vlm_model(self) -> str:
        return self._vlm_model

    async def __call__(
        self,
        prompt: str,
        image_bytes: bytes | None = None,
        image_filetype: filetype.Type | None = None,
        max_tokens: int = 10,
    ) -> str:
        # Assemble the message content
        message_content: list[dict[str, str | dict]] = [
            {
                "type": "text",
                "text": prompt,
            }
        ]

        if image_bytes is not None:
            if image_filetype is None:
                image_filetype = filetype.guess(image_bytes)

            if image_filetype is None:
                raise ValueError("Could not determine image filetype")

            if image_filetype not in ALLOWED_IMAGE_TYPES:
                raise ValueError(
                    f"Image type {image_filetype} is not supported. Allowed types: {ALLOWED_IMAGE_TYPES}"
                )

            image_b64 = base64.b64encode(image_bytes).decode("utf-8")
            message_content.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{image_filetype.mime};base64,{image_b64}",
                    },
                }
            )

        # Put together the request payload
        payload = {
            "model": self.vlm_model,
            "messages": [{"role": "user", "content": message_content}],
            "max_tokens": max_tokens,
            # "logprobs": True,
            # "top_logprobs": 1,
        }

        response = await self._vllm_client.post("/v1/chat/completions", json=payload)
        response = response.json()
        response_text: str = (
            response.get("choices")[0].get("message", {}).get("content", "").strip()
        )

        return response_text
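The `wait_for_ready` helper referenced in the client above is not shown. A minimal polling sketch (the `probe` parameter is an addition here so the loop can be exercised without a live server; the real helper may differ):

```python
import time
from typing import Callable, Optional


def wait_for_ready(
    server_url: str,
    wait_seconds: float,
    health_endpoint: str = "health",
    probe: Optional[Callable[[str], bool]] = None,
    interval: float = 0.5,
) -> None:
    """Poll the server's health endpoint until it responds, or raise after a deadline."""
    if probe is None:
        import httpx

        def probe(url: str) -> bool:
            try:
                return httpx.get(url, timeout=2.0).status_code == 200
            except httpx.HTTPError:
                return False

    url = f"{server_url.rstrip('/')}/{health_endpoint}"
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if probe(url):
            return
        time.sleep(interval)
    raise TimeoutError(f"{url} not ready after {wait_seconds}s")
```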

@mikeldking mikeldking changed the title [feature request]. Capture OpenAI gpt 4o image messages 🗺️ Vision / multi-modal May 23, 2024
@mikeldking mikeldking removed the triage Issues that require triage label May 31, 2024
@mikeldking mikeldking removed their assignment May 31, 2024
@mikeldking mikeldking added this to the Vision M0 milestone Aug 23, 2024
@mikeldking (Contributor, Author) commented:

Closing as completed since image support is complete. Audio will come as part of OpenAI Realtime instrumentation.

@github-project-automation github-project-automation bot moved this from 📘 Todo to ✅ Done in phoenix Dec 6, 2024