
feat: few shots should support image input #155

Open
kdziedzic68 opened this issue Oct 28, 2024 · 3 comments
Labels
feature New feature or request

Comments

@kdziedzic68
Collaborator

Feature description

Action items:

  • The FewShotExample type should be extended with an optional field representing a list of input images
  • The list_few_shots method of the Prompt class should be able to recognize whether a given few-shot entry contains images
  • The usage of list_few_shots should be moved to the LLM method _format_chat_for_llm, since that is where it can be decided whether the given model supports vision (see the sketch below)
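
A rough sketch of what the first two items could look like. This is a minimal, illustrative stand-in for the real FewShotExample type — the class, field, and property names here are assumptions, not the actual ragbits API:

```python
# Minimal sketch only: FewShotExampleSketch is a simplified stand-in for the
# real FewShotExample type; the field/property names are assumptions.
from dataclasses import dataclass, field


@dataclass
class FewShotExampleSketch:
    """One few-shot entry: the example input plus the expected answer."""

    user_message: str
    assistant_response: str
    images: list[bytes] = field(default_factory=list)  # proposed optional field

    @property
    def has_images(self) -> bool:
        """What list_few_shots could check before emitting image content."""
        return bool(self.images)


# Usage: an entry without images behaves exactly as before.
example = FewShotExampleSketch("Classify this cover", "pop", images=[b"\x89PNG..."])
assert example.has_images
```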

Motivation

Users may need to build few-shot learning systems for image processing tasks, e.g. classification.

Additional context

No response

@kdziedzic68 kdziedzic68 added the feature New feature or request label Oct 28, 2024
@ludwiktrammer
Collaborator

ludwiktrammer commented Oct 28, 2024

I believe "FewShotExample type should be extended with an optional field representing a list of input images" is not necessary. The FewShotExample object already contains the input model object, which in turn contains the images (for prompts with images).

So we only need to make sure that the existing API for providing few-shot examples works as expected:

prompt.add_few_shot(SongData(name="Alice", age=30, theme="pop", cover_image=image_data), "It's a really catchy tune.")

Currently cover_image is ignored. It should be used and added to the conversation as an image (provided that it is present in image_input_fields).
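
One possible shape for this, sketched against the OpenAI chat format — the helper name and the dict-based input are assumptions for illustration, not the actual ragbits implementation:

```python
import base64


def few_shot_to_messages(
    example_input: dict, response: str, image_input_fields: list[str]
) -> list[dict]:
    """Sketch: render one few-shot example as OpenAI-style chat messages,
    attaching any image fields declared in image_input_fields."""
    text_fields = {
        k: v for k, v in example_input.items() if k not in image_input_fields
    }
    content: list[dict] = [{"type": "text", "text": str(text_fields)}]
    for name in image_input_fields:
        image_bytes = example_input.get(name)
        if image_bytes:  # only attach images that are actually present
            encoded = base64.b64encode(image_bytes).decode()
            content.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                }
            )
    return [
        {"role": "user", "content": content},
        {"role": "assistant", "content": response},
    ]
```

With something like this in place, the SongData example above would produce a user message containing both the textual fields and cover_image as an image part.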

@ludwiktrammer
Collaborator

> usage of list_few_shots should be moved to LLM method: _format_chat_for_llm because of deciding whether the given model supports vision

This is complicated. I think it would be good to discuss during grooming what kind of data Prompt's chat() method should return:

  1. The full conversation in the OpenAI format, including images and other non-standard elements
  2. A list of messages (objects of specific data classes) that is independent of the OpenAI format and specifies the different elements of each message separately. It would be the LLM's role to convert this into the format needed by the LLM model and to decide which elements to use.
  3. Only the textual part of the conversation. It would be the LLM's role to obtain other elements (like images) by calling the prompt's methods separately and to integrate them with the textual conversation.

At the beginning of the project we discussed options 1 and 2 and decided to go with 1. Adding images exposed some disadvantages of option 1 (the prompt alone cannot know what the particular LLM model can handle).

Currently (with the latest PR adding images to prompts and with how this ticket is written) we seem to be going the route of option 3. I'm not convinced it's the best route - it seems quite wobbly (for example: knowing which image to add to which element of the conversation). I think it would be worth revisiting our previous discussion as a team.
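
For reference, here is a minimal sketch of what option 2 could look like. The ChatMessage class and the to_openai converter are hypothetical names used only to illustrate the idea, not an existing ragbits API:

```python
import base64
from dataclasses import dataclass, field


@dataclass
class ChatMessage:
    """Format-independent message; the LLM adapter decides how to render it."""

    role: str  # "system" | "user" | "assistant"
    text: str
    images: list[bytes] = field(default_factory=list)


def to_openai(messages: list[ChatMessage], supports_vision: bool) -> list[dict]:
    """Sketch of the LLM-side conversion: images are dropped for text-only models."""
    rendered = []
    for msg in messages:
        if supports_vision and msg.images:
            parts: list[dict] = [{"type": "text", "text": msg.text}]
            for img in msg.images:
                encoded = base64.b64encode(img).decode()
                parts.append(
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                    }
                )
            rendered.append({"role": msg.role, "content": parts})
        else:
            rendered.append({"role": msg.role, "content": msg.text})
    return rendered
```

In this arrangement the Prompt would only build ChatMessage objects; whether images are kept, dropped, or converted would then be a decision local to each LLM implementation.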

@akonarski-ds akonarski-ds moved this to Backlog in ragbits Oct 30, 2024
@ludwiktrammer
Collaborator

@mhordynski I believe you wanted to read through the comments here and decide on one of the options
