
feat: few shots should support image input #155

Open
kdziedzic68 opened this issue Oct 28, 2024 · 3 comments
Labels
feature New feature or request

Comments

@kdziedzic68
Collaborator

Feature description

Action items:

  • The FewShotExample type should be extended with an optional field representing a list of input images
  • The list_few_shots method of the Prompt class should be able to recognize whether a given few-shot entry contains images
  • The usage of list_few_shots should be moved to the LLM method _format_chat_for_llm, since that is where it can be decided whether the given model supports vision (see the sketch below)
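
A rough sketch of what the first two items could look like. This is a minimal, illustrative stand-in for the real FewShotExample type — the class, field, and property names here are assumptions, not the actual ragbits API:

```python
# Minimal sketch only: FewShotExampleSketch is a simplified stand-in for the
# real FewShotExample type; the field/property names are assumptions.
from dataclasses import dataclass, field


@dataclass
class FewShotExampleSketch:
    """One few-shot entry: the example input plus the expected answer."""

    user_message: str
    assistant_response: str
    images: list[bytes] = field(default_factory=list)  # proposed optional field

    @property
    def has_images(self) -> bool:
        """What list_few_shots could check before emitting image content."""
        return bool(self.images)


# Usage: an entry without images behaves exactly as before.
example = FewShotExampleSketch("Classify this cover", "pop", images=[b"\x89PNG..."])
assert example.has_images
```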

Motivation

Users may need to build few-shot learning systems for image processing tasks, e.g. classification.

Additional context

No response

@kdziedzic68 kdziedzic68 added the feature New feature or request label Oct 28, 2024
@ludwiktrammer
Collaborator

ludwiktrammer commented Oct 28, 2024

I believe "FewShotExample type should be extended with an optional field representing a list of input images" is not necessary. The FewShotExample object already contains the input model object, which in turn contains the images (for prompts with images).

So we only need to make sure that the existing API for providing few-shot examples works as expected:

prompt.add_few_shot(SongData(name="Alice", age=30, theme="pop", cover_image=image_data), "It's a really catchy tune.")

Currently cover_image is ignored. It should be used and added to the conversation as an image (provided that it is present in image_input_fields).
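
One possible shape for this, sketched against the OpenAI chat format — the helper name and the dict-based input are assumptions for illustration, not the actual ragbits implementation:

```python
import base64


def few_shot_to_messages(
    example_input: dict, response: str, image_input_fields: list[str]
) -> list[dict]:
    """Sketch: render one few-shot example as OpenAI-style chat messages,
    attaching any image fields declared in image_input_fields."""
    text_fields = {
        k: v for k, v in example_input.items() if k not in image_input_fields
    }
    content: list[dict] = [{"type": "text", "text": str(text_fields)}]
    for name in image_input_fields:
        image_bytes = example_input.get(name)
        if image_bytes:  # only attach images that are actually present
            encoded = base64.b64encode(image_bytes).decode()
            content.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                }
            )
    return [
        {"role": "user", "content": content},
        {"role": "assistant", "content": response},
    ]
```

With something like this in place, the SongData example above would produce a user message containing both the textual fields and cover_image as an image part.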

@ludwiktrammer
Collaborator

> usage of list_few_shots should be moved to LLM method: _format_chat_for_llm because of deciding whether the given model supports vision

This is complicated. I think it would be good to discuss during grooming what kind of data Prompt's chat() method should return:

  1. The full conversation in the OpenAI format, including images and other non-standard elements
  2. A list of messages (objects of specific data classes) that is independent of the OpenAI format and specifies the different elements of each message separately. It would be the LLM's role to convert this into the format needed by the LLM model and to decide which elements to use.
  3. Only the textual part of the conversation. It would be the LLM's role to obtain other elements (like images) by calling the prompt's methods separately and to integrate them with the textual conversation.

At the beginning of the project we discussed options 1 and 2 and decided to go with 1. Adding images exposed some disadvantages of option 1 (the prompt alone cannot know what the particular LLM model can handle).

Currently (with the latest PR adding images to prompts and with how this ticket is written) we seem to be going the route of option 3. I'm not convinced it's the best route - it seems quite wobbly (for example: knowing which image to add to which element of the conversation). I think it would be worth revisiting our previous discussion as a team.
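
For reference, here is a minimal sketch of what option 2 could look like. The ChatMessage class and the to_openai converter are hypothetical names used only to illustrate the idea, not an existing ragbits API:

```python
import base64
from dataclasses import dataclass, field


@dataclass
class ChatMessage:
    """Format-independent message; the LLM adapter decides how to render it."""

    role: str  # "system" | "user" | "assistant"
    text: str
    images: list[bytes] = field(default_factory=list)


def to_openai(messages: list[ChatMessage], supports_vision: bool) -> list[dict]:
    """Sketch of the LLM-side conversion: images are dropped for text-only models."""
    rendered = []
    for msg in messages:
        if supports_vision and msg.images:
            parts: list[dict] = [{"type": "text", "text": msg.text}]
            for img in msg.images:
                encoded = base64.b64encode(img).decode()
                parts.append(
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                    }
                )
            rendered.append({"role": msg.role, "content": parts})
        else:
            rendered.append({"role": msg.role, "content": msg.text})
    return rendered
```

In this arrangement the Prompt would only build ChatMessage objects; whether images are kept, dropped, or converted would then be a decision local to each LLM implementation.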

@akonarski-ds akonarski-ds moved this to Backlog in ragbits Oct 30, 2024
@ludwiktrammer
Collaborator

@mhordynski I believe you wanted to read through the comments here and decide on one of the options
