
Support image input in the chat completion request #55

Open: wants to merge 4 commits into main


@Youho99 commented Jul 10, 2024

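Tested with a single image.

This pull request addresses issue #54.

It adds support for the OpenAI API chat completion request format that includes an image, where a message's `content` is a list of typed parts (`text` and `image_url`) instead of a plain string.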

Example from the OpenAI documentation:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-turbo",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

The PR code has not been prettified yet, so it still needs review.
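For local files, the OpenAI API also accepts a base64 data URL (`data:image/jpeg;base64,...`) in the `image_url.url` field instead of a public link. A minimal sketch of building such a payload in shell follows; the helper name `build_image_payload`, the JPEG MIME type, and the reuse of the `gpt-4-turbo` and `max_tokens` values from the example above are assumptions for illustration and not part of this PR.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): print an OpenAI-style chat completion
# payload that embeds a local image as a base64 data URL.
# Assumptions: the image is a JPEG, and the prompt contains no double quotes
# or backslashes (it is inserted into the JSON without escaping).
build_image_payload() {
  local prompt="$1"
  local image_path="$2"
  local b64
  # Encode the file and strip the line breaks some base64 implementations add.
  b64=$(base64 < "$image_path" | tr -d '\n')
  # Same message structure as the curl example: one text part, one image_url part.
  printf '{"model": "gpt-4-turbo", "messages": [{"role": "user", "content": [{"type": "text", "text": "%s"}, {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,%s"}}]}], "max_tokens": 300}\n' \
    "$prompt" "$b64"
}

# Usage (photo.jpg is a placeholder path):
#   build_image_payload "What is in this image?" photo.jpg |
#     curl https://api.openai.com/v1/chat/completions \
#       -H "Content-Type: application/json" \
#       -H "Authorization: Bearer $OPENAI_API_KEY" \
#       -d @-
```

Piping the payload with `-d @-` makes curl read the request body from standard input, which avoids passing a large base64 string as a command-line argument.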

@lhenault (Owner)

Thanks for your work! I'll happily review this once you think it's ready (and it passes the pre-commit check). If you have a working example of VLM / image processing to share, it would be a nice addition to the existing ones.

@Youho99 marked this pull request as ready for review July 15, 2024 14:35

@Youho99 (Author) commented Jul 15, 2024

Avoid version 1.65.0 of grpcio and grpcio-tools (a problematic release).

I don't know how to change this in the Poetry requirements.

@Youho99 (Author) commented Jul 16, 2024

I just changed the version constraints for grpcio and grpcio-tools in pyproject.toml and regenerated poetry.lock.

Since this is my first time doing this, please review that part carefully.

@Youho99 (Author) commented Jul 16, 2024

I will provide an example of using this feature later (probably in a separate PR).

@Youho99 (Author) commented Jul 16, 2024

@lhenault I think this PR is ready for review (please bump the version accordingly) :)

@lhenault (Owner)

Hey @Youho99!

I tried your changes the other day and ran into a few issues, though probably on my end. Thanks again for your PR, it's very much appreciated, and sorry for the delay. 😌

I'll have another look soon (a working example with image inputs would speed things up).

@Youho99 (Author) commented Aug 28, 2024

@lhenault I'll get back to it in the next few days and provide an example.

Let me know if you run into any problems.


3 participants