forked from dottxt-ai/outlines
Introduce outlines.models.transformers_multimodal
# Transformers Multimodal

Outlines can run [multimodal models](https://huggingface.co/learn/computer-vision-course/en/unit4/multimodal-models/tasks-models-part1) from Hugging Face `transformers` through `outlines.models.transformers_multimodal`, and constrain their output with the same generators (`text`, `regex`, `json`) used for text-only models.

Supported tasks include:
- image + text -> text
- video + text -> text
- TODO: look into other models that can be used with no code changes

## Example: Using [LLaVA-NeXT](https://huggingface.co/docs/transformers/en/model_doc/llava_next) Vision Models

Install the dependencies:
`pip install torchvision pillow flash-attn`
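
Loading a 7B checkpoint takes a while, so it can be worth checking first that the packages above are importable. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and the mapping from pip names to import names (`pillow` to `PIL`, `flash-attn` to `flash_attn`) is an assumption based on those packages' usual layout.

```python
import importlib.util

# Assumed mapping from pip package name to top-level import name
REQUIRED_MODULES = {
    "torchvision": "torchvision",
    "pillow": "PIL",
    "flash-attn": "flash_attn",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None (without importing) when a top-level module is absent
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All dependencies found")
```
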

Create the model:
```python
import outlines

model = outlines.models.transformers_multimodal(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    device="cuda",
)
```

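Create a convenience function that loads a `PIL.Image` from a URL:
```python
from io import BytesIO
from urllib.request import Request, urlopen

from PIL import Image


def img_from_url(url):
    # Some image hosts may reject urllib's default User-Agent, so send an explicit one
    request = Request(url, headers={"User-Agent": "outlines-multimodal-example"})
    with urlopen(request) as response:
        img_byte_stream = BytesIO(response.read())
    # Convert to RGB so grayscale, RGBA and palette images all have three channels
    return Image.open(img_byte_stream).convert("RGB")
```
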
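`img_from_url` only handles URLs. If some images live on disk, one option is to fetch raw bytes from either source and keep the PIL step separate. This is a minimal sketch with a hypothetical `read_image_bytes` helper; the `User-Agent` value is an arbitrary placeholder.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def read_image_bytes(source):
    """Return raw bytes from an http(s) URL or a local file path."""
    scheme = urlparse(str(source)).scheme
    if scheme in ("http", "https"):
        # Placeholder User-Agent, as in img_from_url above
        request = Request(str(source), headers={"User-Agent": "outlines-multimodal-example"})
        with urlopen(request) as response:
            return response.read()
    # Anything that is not an http(s) URL is treated as a local filesystem path
    return Path(source).read_bytes()
```

The bytes can then go through the same PIL call as above, e.g. `Image.open(BytesIO(read_image_bytes("cat.jpg"))).convert("RGB")`, where `cat.jpg` is a placeholder path.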
### Describing an image

```python
description_generator = outlines.generate.text(model)
description_generator(
    "<image> detailed description:",
    [img_from_url("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg")]
)
```

> This is a color photograph featuring a Siamese cat with striking blue eyes. The cat has a creamy coat and a light eye color, which is typical for the Siamese breed. Its features include elongated ears, a long, thin tail, and a striking coat pattern. The cat is sitting in an indoor setting, possibly on a cat tower or a similar raised platform, which is covered with a beige fabric, providing a comfortable and soft surface for the cat to rest or perch. The surface of the wall behind the cat appears to be a light-colored stucco or plaster.

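In each prompt in this guide, `<image>` marks where an image from the list goes. Assuming the processor expects exactly one `<image>` placeholder per image passed (as LLaVA-style processors generally do), a mismatch can be caught before calling the generator. The helpers below are hypothetical and not part of Outlines; a minimal sketch:

```python
IMAGE_TOKEN = "<image>"


def count_image_placeholders(prompt):
    """Count occurrences of the <image> placeholder in a prompt string."""
    return prompt.count(IMAGE_TOKEN)


def check_image_placeholders(prompt, images):
    """Raise ValueError unless the prompt has one <image> per item in `images`."""
    n_placeholders = count_image_placeholders(prompt)
    if n_placeholders != len(images):
        raise ValueError(
            f"prompt has {n_placeholders} {IMAGE_TOKEN} placeholder(s) "
            f"but {len(images)} image(s) were passed"
        )
```

For example, `check_image_placeholders("<image> detailed description:", images)` passes when `images` holds a single image and raises otherwise.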
### Classifying an image

```python
pattern = "Mercury|Venus|Earth|Mars|Saturn|Jupiter|Neptune|Uranus|Pluto"
planet_generator = outlines.generate.regex(model, pattern)

planet_generator(
    "What planet is this: <image>",
    [img_from_url("https://upload.wikimedia.org/wikipedia/commons/e/e3/Saturn_from_Cassini_Orbiter_%282004-10-06%29.jpg")]
)
```

> Saturn

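`outlines.generate.regex` constrains decoding so the generated string matches `pattern`, which means the answer is always one of the listed names. When the labels come from a Python list, the same pattern can be built and checked in plain Python. This is a sketch; `PLANETS` and `is_valid_label` are illustrative names, not Outlines APIs.

```python
import re

# Labels from the example above, joined into one alternation
PLANETS = ["Mercury", "Venus", "Earth", "Mars", "Saturn", "Jupiter", "Neptune", "Uranus", "Pluto"]
pattern = "|".join(re.escape(name) for name in PLANETS)


def is_valid_label(text):
    """Return True only if the whole string is exactly one allowed label."""
    return re.fullmatch(pattern, text) is not None
```

`re.escape` leaves these particular names unchanged, but it keeps the pattern correct if a label ever contains characters such as `.` or `+`.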
### Extracting structured image data

```python
from typing import List

from pydantic import BaseModel

# Reuses `model` and `img_from_url` defined in the sections above


class ImageData(BaseModel):
    caption: str
    tags_list: List[str]
    object_list: List[str]
    is_photo: bool


image_data_generator = outlines.generate.json(model, ImageData)

image_data_generator(
    "<image> detailed JSON metadata:",
    [img_from_url("https://upload.wikimedia.org/wikipedia/commons/9/98/Aldrin_Apollo_11_original.jpg")]
)
```

> `ImageData(caption='An astronaut on the moon', tags_list=['moon', 'space', 'nasa', 'americanflag'], object_list=['moon', 'moon_surface', 'space_suit', 'americanflag'], is_photo=True)`

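The generator returns an `ImageData` instance because the JSON it produces is constrained to the pydantic model's schema. To make that contract concrete without pydantic, here is a stdlib-only sketch that parses the same four fields and checks their types. `ImageDataRecord` and `parse_image_data` are hypothetical names for illustration; in practice the pydantic model above already does this.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ImageDataRecord:
    """Stdlib stand-in mirroring the fields of the ImageData model above."""
    caption: str
    tags_list: List[str]
    object_list: List[str]
    is_photo: bool


def parse_image_data(raw):
    """Parse a JSON string into ImageDataRecord.

    Raises KeyError if a field is missing and TypeError if a field has the wrong type.
    """
    data = json.loads(raw)
    if not isinstance(data["caption"], str):
        raise TypeError("caption must be a string")
    for key in ("tags_list", "object_list"):
        value = data[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"{key} must be a list of strings")
    if not isinstance(data["is_photo"], bool):
        raise TypeError("is_photo must be a boolean")
    return ImageDataRecord(
        caption=data["caption"],
        tags_list=data["tags_list"],
        object_list=data["object_list"],
        is_photo=data["is_photo"],
    )
```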
## Resources

### Choosing a model
- https://mmbench.opencompass.org.cn/leaderboard
- https://huggingface.co/spaces/WildVision/vision-arena