Multimodal support for Gemini #80

samrat · 2024-11-12T15:10:49Z

Currently, it's only possible to send text messages using the Gemini adapter:

instructor_ex/lib/instructor/adapters/gemini.ex

Line 61 in 1abd847

{system_instructions, [%{role: "user", parts: [%{text: content}]} | history]}

The Gemini API supports image, video, and audio inputs. Unlike the OpenAI API, where you send the file contents base64-encoded, you need to upload the file separately.

Would you be open to a PR that adds support for uploading files, or would you say that is out of scope for this project?

If it's out of scope, I can create a smaller PR that allows media URLs (with the upload happening outside the library):

Instructor.chat_completion(
  mode: :json_schema,
  model: "gemini-1.5-flash",
  response_model: VideoDesc,
  messages: [
    %{
      role: "user",
      content: [
        %{
          type: "video_url",
          video_url: %{
            url: "https://generativelanguage.googleapis.com/v1beta/files/..."
          }
        },
        %{
          type: "text",
          text: "What's going on in this video?"
        }
      ]
    }
  ]
)
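
To make the proposal concrete, here is a minimal sketch of how the adapter might translate that OpenAI-style `content` list into Gemini `parts`. It is an illustration under assumptions, not the code in the linked PR: the module name `GeminiPartsSketch`, the function `to_parts/1`, and the optional `mime_type` key on the URL map are all hypothetical. The `file_data` / `file_uri` / `mime_type` field names follow the Gemini REST API's part format for previously uploaded files.

```elixir
defmodule GeminiPartsSketch do
  # Hypothetical helper for illustration only; not part of instructor_ex.

  # A plain string keeps the current behaviour: a single text part.
  def to_parts(content) when is_binary(content), do: [%{text: content}]

  # A list of OpenAI-style content items becomes one Gemini part per item.
  def to_parts(content) when is_list(content), do: Enum.map(content, &to_part/1)

  defp to_part(%{type: "text", text: text}), do: %{text: text}

  defp to_part(%{type: "image_url", image_url: %{url: url} = media}),
    do: file_part(url, media)

  defp to_part(%{type: "video_url", video_url: %{url: url} = media}),
    do: file_part(url, media)

  # The URL is assumed to point at a file already uploaded outside the library.
  # `mime_type` is an assumed optional key; it is only forwarded when present.
  defp file_part(url, media) do
    case Map.get(media, :mime_type) do
      nil -> %{file_data: %{file_uri: url}}
      mime -> %{file_data: %{file_uri: url, mime_type: mime}}
    end
  end
end
```

With something like this in place, the line quoted above could build its user message as `%{role: "user", parts: GeminiPartsSketch.to_parts(content)}`, so existing text-only callers would be unaffected.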

samrat linked a pull request Nov 12, 2024 that will close this issue

[Gemini adapter] Support message content with image and video URLs #81

Open
