Multimodal support for Gemini #80

Open
samrat opened this issue Nov 12, 2024 · 0 comments · May be fixed by #81

samrat commented Nov 12, 2024

Currently, it's only possible to send text messages using the Gemini adapter:

{system_instructions, [%{role: "user", parts: [%{text: content}]} | history]}

The Gemini API supports image, video, and audio inputs. Unlike the OpenAI API, where you send the file contents base64-encoded, you need to upload the file separately.

Would you be open to a PR that adds support for uploading files, or would you say that is out of scope for this project?

If it's out of scope, I can create a smaller PR that allows media URLs (with the upload happening outside the library):

Instructor.chat_completion(
  mode: :json_schema,
  model: "gemini-1.5-flash",
  response_model: VideoDesc,
  messages: [
    %{
      role: "user", 
      content: [
        %{
          type: "video_url",
          video_url: %{
            url: "https://generativelanguage.googleapis.com/v1beta/files/..."
          }
        },
        %{
          type: "text",
          text: "What's going on in this video?"
        }
      ]
    }
  ]
)
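
For reference, here is a rough sketch of how the adapter could translate content parts like the ones above into Gemini `parts`. The module name `GeminiPartsSketch`, the `to_parts/1` function, and the optional `mime_type` key are all hypothetical and not part of Instructor today. The `file_data` / `file_uri` shape is my understanding of what the Gemini REST API expects for previously uploaded files, so treat it as an assumption to verify against the API docs.

```elixir
defmodule GeminiPartsSketch do
  @moduledoc """
  Hypothetical helper (not part of Instructor): converts OpenAI-style
  message content into a list of Gemini `parts`.
  """

  # A plain string becomes a single text part, matching what the
  # adapter currently produces for text-only messages.
  def to_parts(content) when is_binary(content), do: [%{text: content}]

  # A list of typed content parts is converted element by element.
  def to_parts(content) when is_list(content), do: Enum.map(content, &to_part/1)

  defp to_part(%{type: "text", text: text}), do: %{text: text}

  # Assumption: the URL refers to a file that was already uploaded
  # outside the library. `mime_type` is an optional, hypothetical key.
  defp to_part(%{type: "video_url", video_url: %{url: url} = media}) do
    %{file_data: file_data(url, media[:mime_type])}
  end

  # Only include the MIME type when the caller supplied one.
  defp file_data(url, nil), do: %{file_uri: url}
  defp file_data(url, mime_type), do: %{file_uri: url, mime_type: mime_type}
end
```

With something like this, the existing text-only path stays unchanged, and the upload itself remains the caller's responsibility.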