Introduce outlines.models.transformers_multimodal #33

Open · lapp0 wants to merge 1 commit into main from multimodal-models

Conversation

lapp0 (Owner) commented Jun 19, 2024

Docs: https://github.com/lapp0/outlines/blob/multimodal-models/docs/reference/models/multimodal.md

Done:

  • Core implementation and all components necessary for structured generation with image and video input

Todo:

  • More unit tests for MultiModalSequenceGeneratorAdapter
  • Find a tiny vision model for test_generate.py; the current model is too expensive to be part of the test suite
  • Test models and architectures other than llava-hf/llava-v1.6-mistral-7b-hf
  • Fix batch request handling; a prompt can contain multiple images

Improve docs:

  • Reference transformers.md for capabilities
  • Show direct image loading and local-file image loading (see the usage sketch after this list)
  • More detailed introduction section
  • How to add multiple images
  • Caveat on including the <image> token in the prompt
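
Roughly, the intended usage per the linked docs would look like the sketch below. This is a hedged illustration, not confirmed code from this PR: the constructor arguments, the generate.text call, the prompt format, and the URL are assumptions or placeholders; only the transformers_multimodal name, the llava-hf/llava-v1.6-mistral-7b-hf model, and the (prompts, media) call signature come from this thread.

from io import BytesIO

import requests
from PIL import Image

import outlines

# Assumed constructor: this PR introduces outlines.models.transformers_multimodal
# (proposed rename below: transformers_vision).
model = outlines.models.transformers_multimodal(
    "llava-hf/llava-v1.6-mistral-7b-hf",
)

# Direct image loading from a URL (placeholder); a local file would use Image.open(path).
response = requests.get("https://example.com/astronaut.png")
response.raise_for_status()
image = Image.open(BytesIO(response.content))

# Caveat: the prompt must contain the <image> token expected by the processor.
generator = outlines.generate.text(model)
description = generator("Describe this image: <image>", [image])
print(description)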

def __call__(  # type: ignore
    self,
    prompts: Union[str, List[str]],
    media: Union[str, Any],
lapp0 (Owner Author):
change Any to PIL.Image
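
Applied to the hunk above, that change might look like the following minimal sketch; the enclosing class and the trailing **kwargs are hypothetical stand-ins, since the thread only asks to replace Any with PIL.Image.

from typing import Any, List, Union

from PIL import Image


class _Adapter:  # hypothetical stand-in for the class that owns __call__
    def __call__(  # type: ignore
        self,
        prompts: Union[str, List[str]],
        media: Union[str, Image.Image],  # was: Union[str, Any]
        **kwargs: Any,  # remaining parameters elided
    ) -> Any:
        ...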

lapp0 force-pushed the multimodal-models branch 2 times, most recently from 11c0db1 to a6c229e on June 19, 2024 20:02
rlouf commented Jun 20, 2024

Let’s call it transformers_vision, which I think is a more specific name.

lapp0 force-pushed the fix-mamba-integration branch from f17913b to 48b6f8f on July 15, 2024 09:05
rlouf force-pushed the fix-mamba-integration branch from 48b6f8f to bf3694c on July 15, 2024 13:53
lapp0 force-pushed the fix-mamba-integration branch 8 times, most recently from 75dc370 to acb0759 on July 16, 2024 00:08
lapp0 force-pushed the multimodal-models branch from a6c229e to a40ec2f on July 19, 2024 10:11
lapp0 changed the base branch from fix-mamba-integration to main on July 19, 2024 10:13
lapp0 force-pushed the multimodal-models branch 4 times, most recently from bdf4097 to 653fe26 on July 19, 2024 11:56
        return prompts, media

    @classmethod
    def _load_media(cls, media):

I'm not sure this should be part of the library?

lapp0 (Owner Author):
Probably not, it's a convenience, but unnecessary. Removing.
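
With the helper gone, callers load media themselves before passing it to the model; a minimal sketch of doing that with PIL (the filename and URL are placeholders):

from io import BytesIO

import requests
from PIL import Image

# From a local file (placeholder filename):
local_image = Image.open("photo.jpg")

# From a URL (placeholder URL):
response = requests.get("https://example.com/photo.jpg")
response.raise_for_status()
remote_image = Image.open(BytesIO(response.content))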

from outlines.processors import OutlinesLogitsProcessor


class TransformersMultiModal(Transformers):
Suggested change:
- class TransformersMultiModal(Transformers):
+ class TransformersVision(Transformers):

        yield self._decode_generation(output_group_ids)


def transformers_multimodal(
Suggested change:
- def transformers_multimodal(
+ def transformers_vision(

lapp0 force-pushed the multimodal-models branch 9 times, most recently from 9cc0775 to 56b5918 on July 19, 2024 14:22
lapp0 force-pushed the multimodal-models branch 17 times, most recently from 6adb73b to 9ae6e70 on July 19, 2024 15:50
lapp0 force-pushed the multimodal-models branch from 9ae6e70 to 43424c6 on July 19, 2024 16:08