LLaVA and BakLLaVA are multimodal models available through Ollama. Select a multimodal model from the Lumos Options page and prompt away!
Note: Some webpages contain many images. At the moment, only 10 images are bound to the model for processing at a time. It may be preferable to open an individual image in a separate tab to reduce the number of images bound to the model. Optimizations may be made in the future to improve the user experience.
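As a rough illustration of the 10-image limit, the sketch below caps a list of image URLs before they are passed to a model. The names `MAX_IMAGES` and `selectImages` are hypothetical, and keeping the first 10 images in page order is an assumption; Lumos's actual selection logic may differ.

```typescript
// Hypothetical constant: the note above only states that 10 images are bound at a time.
const MAX_IMAGES = 10;

// Keep at most `limit` image URLs, preserving their original order.
// Assumption: the first images on the page are the ones kept.
function selectImages(urls: string[], limit: number = MAX_IMAGES): string[] {
  return urls.slice(0, limit);
}

// Example: a page with 25 images is reduced to 10.
const pageImages: string[] = Array.from(
  { length: 25 },
  (_, i) => `https://example.com/img-${i}.png`,
);
const bound: string[] = selectImages(pageImages);
console.log(bound.length);
```

A page opened on a single image yields a one-element list, which is why opening an image in its own tab keeps unrelated images away from the model.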
Prefix the prompt with the text "Based on the image". This prefix will override Lumos's internal prompt classification mechanism.
Examples
- "Based on the image, describe the background"
- "Based on the image, count how many dogs are in the photo"
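The prefix check can be pictured as a simple string test that runs before any other prompt classification. This is a minimal sketch, not Lumos's implementation: `IMAGE_PREFIX` and `hasImagePrefix` are hypothetical names, and ignoring case and leading whitespace is an assumption.

```typescript
// The documented prefix, lowercased for comparison.
const IMAGE_PREFIX = "based on the image";

// Returns true when the prompt starts with the prefix.
// Assumption: matching ignores case and leading whitespace.
function hasImagePrefix(prompt: string): boolean {
  return prompt.trim().toLowerCase().startsWith(IMAGE_PREFIX);
}

console.log(hasImagePrefix("Based on the image, describe the background"));
console.log(hasImagePrefix("Summarize this page"));
```

Under these assumptions, a prompt that passes this check would be routed to the multimodal model directly, without consulting the classifier.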