
🗺️ Vision / multi-modal #495

Closed
53 of 62 tasks
mikeldking opened this issue May 23, 2024 · 2 comments
Labels: enhancement (New feature or request) · language: js (Related to JavaScript or Typescript integration) · language: python (Related to Python integration) · roadmap
mikeldking commented May 23, 2024

GPT-4o introduces a new message content type that can contain images, encoded either as a URL or as base64.

Example:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0])

https://platform.openai.com/docs/guides/vision
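The same `image_url` field also accepts base64-encoded images via a data URL, which is the case that matters most for tracing payload size. A minimal sketch of the encoding (the helper name and the placeholder PNG bytes are illustrative, not from the OpenAI SDK):

```python
import base64


def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL suitable for the image_url field."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"


# In practice this would be real image bytes read from disk; any bytes
# demonstrate the encoding itself.
url = image_to_data_url(b"\x89PNG\r\n\x1a\n")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": url}},
    ],
}
```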

Milestone 1

  • Vision support in Python instrumentations for llama-index, openai, gemini, and langchain
  • Eliminate performance degradation from base64-encoded payloads by allowing users to opt out
  • Preliminary set of config flags to mask inputs/outputs that could contain sensitive info
  • Create examples
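One way the base64 opt-out and masking flags above could work is a redaction pass over message content before spans are exported. This is a sketch under assumptions, not the shipped config (the function name, flag names, and sentinel strings are all illustrative):

```python
def mask_message_content(
    content: list[dict],
    hide_images: bool = True,
    base64_max_length: int = 32_000,
) -> list[dict]:
    """Redact image parts from a traced message.

    hide_images drops image payloads entirely; otherwise oversized
    base64 data URLs are replaced so traces stay small.
    """
    masked = []
    for part in content:
        if part.get("type") == "image_url":
            url = part["image_url"]["url"]
            if hide_images:
                masked.append({"type": "image_url", "image_url": {"url": "__REDACTED__"}})
                continue
            if url.startswith("data:") and len(url) > base64_max_length:
                masked.append({"type": "image_url", "image_url": {"url": "__TRUNCATED__"}})
                continue
        masked.append(part)
    return masked
```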

Milestone N

  • Image synthesis APIs such as DALL·E

Tracing

Instrumentation

Testing

Image tracing

Context Attributes

Config

Suppress Tracing

UI / Javascript

Testing

Documentation

Evals

@mikeldking mikeldking added enhancement New feature or request triage Issues that require triage labels May 23, 2024
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix May 23, 2024
@mikeldking mikeldking self-assigned this May 23, 2024
@mikeldking mikeldking added language: js Related to JavaScript or Typescript integration language: python Related to Python integration labels May 23, 2024
@mikeldking (Contributor, Author) commented:

An example vLLM client that should also support vision:

import base64

import filetype  # third-party: pip install filetype
import httpx

# VLM_MODEL, VLLM_URL, VLLM_HEALTHCHECK, VLLM_READY_TIMEOUT,
# ALLOWED_IMAGE_TYPES, and wait_for_ready are assumed to be defined
# elsewhere in the module.

class VLMClient:
    def __init__(self, vlm_model: str = VLM_MODEL, vllm_url: str = VLLM_URL):
        self._vlm_model = vlm_model
        self._vllm_client = httpx.AsyncClient(base_url=vllm_url)

        if VLLM_HEALTHCHECK:
            wait_for_ready(
                server_url=vllm_url,
                wait_seconds=VLLM_READY_TIMEOUT,
                health_endpoint="health",
            )

    @property
    def vlm_model(self) -> str:
        return self._vlm_model

    async def __call__(
        self,
        prompt: str,
        image_bytes: bytes | None = None,
        image_filetype: filetype.Type | None = None,
        max_tokens: int = 10,
    ) -> str:
        # Assemble the message content
        message_content: list[dict[str, str | dict]] = [
            {
                "type": "text",
                "text": prompt,
            }
        ]

        if image_bytes is not None:
            if image_filetype is None:
                image_filetype = filetype.guess(image_bytes)

            if image_filetype is None:
                raise ValueError("Could not determine image filetype")

            if image_filetype not in ALLOWED_IMAGE_TYPES:
                raise ValueError(
                    f"Image type {image_filetype} is not supported. Allowed types: {ALLOWED_IMAGE_TYPES}"
                )

            image_b64 = base64.b64encode(image_bytes).decode("utf-8")
            message_content.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{image_filetype.mime};base64,{image_b64}",
                    },
                }
            )

        # Put together the request payload
        payload = {
            "model": self.vlm_model,
            "messages": [{"role": "user", "content": message_content}],
            "max_tokens": max_tokens,
            # "logprobs": True,
            # "top_logprobs": 1,
        }

        response = await self._vllm_client.post("/v1/chat/completions", json=payload)
        response = response.json()
        response_text: str = (
            response.get("choices")[0].get("message", {}).get("content", "").strip()
        )

        return response_text
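The `wait_for_ready` helper referenced in the client above is not shown. A minimal polling sketch (the `probe` parameter is an addition here so the loop can be exercised without a live server; the real helper may differ):

```python
import time
from typing import Callable, Optional


def wait_for_ready(
    server_url: str,
    wait_seconds: float,
    health_endpoint: str = "health",
    probe: Optional[Callable[[str], bool]] = None,
    interval: float = 0.5,
) -> None:
    """Poll the server's health endpoint until it responds, or raise after a deadline."""
    if probe is None:
        import httpx

        def probe(url: str) -> bool:
            try:
                return httpx.get(url, timeout=2.0).status_code == 200
            except httpx.HTTPError:
                return False

    url = f"{server_url.rstrip('/')}/{health_endpoint}"
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if probe(url):
            return
        time.sleep(interval)
    raise TimeoutError(f"{url} not ready after {wait_seconds}s")
```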

@mikeldking mikeldking changed the title [feature request]. Capture OpenAI gpt 4o image messages 🗺️ Vision / multi-modal May 23, 2024
@mikeldking mikeldking removed the triage Issues that require triage label May 31, 2024
@mikeldking mikeldking removed their assignment May 31, 2024
@mikeldking mikeldking added this to the Vision M0 milestone Aug 23, 2024
@mikeldking (Contributor, Author) commented:

Closing as completed since image support is complete. Audio will come as part of OpenAI Realtime instrumentation.

@github-project-automation github-project-automation bot moved this from 📘 Todo to ✅ Done in phoenix Dec 6, 2024