Multimodal Models

LLaVA and BakLLaVA are multimodal models available through Ollama. Select a multimodal model from the Lumos Options page and prompt away!
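For readers curious what a multimodal request looks like outside the extension, the following is a minimal sketch of a request body for Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` field. The helper name `build_llava_request` is hypothetical, and this does not describe how Lumos itself communicates with Ollama.

```python
import base64
import json


def build_llava_request(prompt: str, image_bytes: bytes, model: str = "llava") -> str:
    """Build a JSON body for Ollama's /api/generate endpoint.

    Hypothetical helper for illustration; not part of Lumos.
    """
    payload = {
        "model": model,  # for example "llava" or "bakllava"
        "prompt": prompt,
        # Ollama expects each image as a base64-encoded string.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON response instead of a stream
    }
    return json.dumps(payload)
```

With a default local Ollama install, a body like this would be sent as a POST request to `http://localhost:11434/api/generate`.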

Note: Some webpages contain many images. At the moment, only 10 images are bound to the model for processing at a time, so it may be preferable to open an individual image in a separate tab to reduce the number of images bound to the model. In the future, optimizations may be made to improve the user experience.
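The 10-image limit can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `select_images` is hypothetical, and both the "first images in page order" rule and the de-duplication step are assumptions, not confirmed Lumos behavior.

```python
MAX_IMAGES = 10  # limit stated in the note above


def select_images(image_urls, limit=MAX_IMAGES):
    """Return at most `limit` unique image URLs, preserving page order.

    Assumption: images are taken in the order they appear and duplicates
    are skipped. Lumos's actual selection rule may differ.
    """
    seen = set()
    selected = []
    for url in image_urls:
        if len(selected) >= limit:
            break  # cap reached, ignore the remaining images
        if url in seen:
            continue  # skip an image that was already selected
        seen.add(url)
        selected.append(url)
    return selected
```

Under this model, a page with 25 images would contribute only 10 of them, which is why opening a single image in its own tab gives the model a much more focused input.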

Prompting Tip

Prefix the prompt with the text "Based on the image". This prefix overrides Lumos's internal prompt classification mechanism, so the prompt is handled as an image prompt.

Examples

"Based on the image, describe the background"
"Based on the image, count how many dogs are in the photo"
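If prompts are assembled programmatically before being pasted into Lumos, a small helper can apply the prefix consistently. This is a minimal sketch: `with_image_prefix` is a hypothetical name, and because it is not documented whether Lumos matches the prefix case-sensitively, the helper normalizes it to the exact casing shown above.

```python
IMAGE_PREFIX = "Based on the image"


def with_image_prefix(prompt: str) -> str:
    """Ensure a prompt starts with the "Based on the image" prefix.

    Hypothetical helper; the casing normalization is a precaution,
    since the matching rule used by Lumos is an unknown here.
    """
    stripped = prompt.strip()
    if not stripped:
        raise ValueError("prompt must not be empty")
    if stripped.lower().startswith(IMAGE_PREFIX.lower()):
        # Prefix already present: rewrite it with the canonical casing.
        return IMAGE_PREFIX + stripped[len(IMAGE_PREFIX):]
    return f"{IMAGE_PREFIX}, {stripped}"
```

For example, `with_image_prefix("describe the background")` yields the first example prompt listed above.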

Examples

What's for dinner?

More food...

Cool picture
