Refresh Multi Modal Doc to include more docs (run-llama#9160)
* Refresh Multi Modal Doc to include more docs

* cr

---------

Co-authored-by: haotian zhang <[email protected]>
hatianzhang and haotian zhang authored Nov 27, 2023
1 parent a87b63f commit acd3441
Showing 1 changed file with 55 additions and 0 deletions.
docs/use_cases/multimodal.md: 55 additions & 0 deletions

@@ -39,6 +39,61 @@ maxdepth: 1
/examples/multi_modal/llava_multi_modal_tesla_10q.ipynb
```

### LLaVA-13B, Fuyu-8B and MiniGPT-4 Multi-Modal LLM Models Comparison for Image Reasoning

These notebooks show how to use different Multi-Modal LLMs for image understanding and reasoning. Model inference is served through either Replicate or the OpenAI GPT-4V API. We compare several popular Multi-Modal LLMs:

- GPT-4V (OpenAI API)
- LLaVA-13B (Replicate)
- Fuyu-8B (Replicate)
- MiniGPT-4 (Replicate)

Check out our guides below:

```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/replicate_multi_modal.ipynb
GPT4-V: </examples/multi_modal/openai_multi_modal.ipynb>
```
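For a rough sense of how these notebooks drive the models, here is a minimal sketch using the GPT-4V connector (the `./images` folder, prompt text, and `max_new_tokens` value are illustrative; the Replicate-hosted models are driven the same way via `ReplicateMultiModal`):

```python
from llama_index import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load local images as ImageDocuments (the ./images path is illustrative).
image_docs = SimpleDirectoryReader("./images").load_data()

# GPT-4V via the OpenAI API; swap in ReplicateMultiModal for LLaVA-13B,
# Fuyu-8B, or MiniGPT-4 hosted on Replicate.
mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300)

response = mm_llm.complete(
    prompt="Describe what you see in this image.",
    image_documents=image_docs,
)
print(response)
```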

### Pydantic Program for Generating Structured Output for Multi-Modal LLMs

You can generate `structured` output with the new OpenAI GPT-4V via LlamaIndex. The user just needs to specify a Pydantic object to define the structure of the output.

Check out the guide below:

```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/multi_modal_pydantic.ipynb
```
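As an illustrative sketch only (the `Restaurant` schema, prompt string, and image folder below are made-up assumptions rather than taken from the guide), a multi-modal Pydantic program might be wired up roughly like this:

```python
from pydantic import BaseModel
from llama_index import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.output_parsers import PydanticOutputParser
from llama_index.program import MultiModalLLMCompletionProgram

# Hypothetical schema describing the structured output we want back.
class Restaurant(BaseModel):
    name: str
    city: str
    cuisine: str

image_docs = SimpleDirectoryReader("./restaurant_images").load_data()

program = MultiModalLLMCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Restaurant),
    image_documents=image_docs,
    prompt_template_str="Summarize the restaurant shown in the image.",
    multi_modal_llm=OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300),
)

restaurant = program()  # returns a populated Restaurant instance
print(restaurant)
```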

### Chain of Thought (CoT) Prompting for GPT-4V

GPT-4V has amazed us with its ability to analyze images and even generate website code from visuals.
This tutorial investigates GPT-4V's proficiency in interpreting bar charts, scatter plots, and tables. We assess whether specific questioning and chain-of-thought prompting yield better responses than broader inquiries, and whether such precise questioning and systematic reasoning let GPT-4V exceed its known limitations on these tasks.

```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/gpt4v_experiments_cot.ipynb
```
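To illustrate the idea, here is a hedged sketch of the kind of comparison the notebook runs (the chart folder and prompt wording are made up):

```python
from llama_index import SimpleDirectoryReader
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# A local screenshot of a bar chart (path is illustrative).
chart_docs = SimpleDirectoryReader("./charts").load_data()
gpt4v = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=500)

# Broad inquiry: tends to gloss over exact values.
broad = gpt4v.complete(
    prompt="Describe this chart.",
    image_documents=chart_docs,
)

# Chain-of-thought prompt: walk through the axes step by step before answering.
cot = gpt4v.complete(
    prompt=(
        "First list the categories on the x-axis, then read the value of each bar "
        "from the y-axis, and finally state which category has the highest value."
    ),
    image_documents=chart_docs,
)

print(broad, cot, sep="\n---\n")
```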

### Simple Evaluation of Multi-Modal RAG

In this notebook guide, we demonstrate how to evaluate a Multi-Modal RAG system. As in the text-only case, we evaluate the Retriever and the Generator separately. As we alluded to in our blog post on evaluating Multi-Modal RAG, our approach applies adapted versions of the usual techniques for evaluating both the Retriever and the Generator (as used in the text-only case). These adapted versions are part of the llama-index library (i.e., the evaluation module), and this notebook walks you through how to apply them to your own evaluation use cases.

```{toctree}
---
maxdepth: 1
---
/examples/evaluation/multi_modal/multi_modal_rag_evaluation.ipynb
```
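As a rough, non-authoritative sketch of the generator-side evaluation flow (the evaluator class names and the `image_paths` keyword below are assumptions about the adapted evaluators mentioned above, and the query, answer, and context values are placeholders):

```python
from llama_index.evaluation.multi_modal import (
    MultiModalFaithfulnessEvaluator,
    MultiModalRelevancyEvaluator,
)
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# A multi-modal LLM acts as the judge for the generator's answers.
judge = OpenAIMultiModal(model="gpt-4-vision-preview", max_new_tokens=300)
relevancy = MultiModalRelevancyEvaluator(multi_modal_llm=judge)
faithfulness = MultiModalFaithfulnessEvaluator(multi_modal_llm=judge)

# Placeholder inputs; in practice these come from your Multi-Modal RAG pipeline.
query = "What colour is the car in the photo?"
answer = "The car in the photo is red."
text_contexts = ["The listing describes a red 2019 hatchback."]
image_paths = ["./retrieved/car.jpg"]  # retrieved image context (path is illustrative)

rel = relevancy.evaluate(
    query=query, response=answer, contexts=text_contexts, image_paths=image_paths
)
faith = faithfulness.evaluate(
    query=query, response=answer, contexts=text_contexts, image_paths=image_paths
)
print(rel.passing, faith.passing)
```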

### Using Chroma for Multi-Modal Retrieval with a Single Vector Store

The Chroma vector database supports using a single vector store to index both images and text.
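As a hedged sketch of the underlying idea using Chroma's own multi-modal API (the collection name, file paths, and OpenCLIP embedding choice are illustrative assumptions):

```python
import chromadb
from chromadb.utils.data_loaders import ImageLoader
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

client = chromadb.PersistentClient(path="./chroma")

# One collection with a multi-modal (OpenCLIP) embedding function, so text
# and images share the same vector space.
collection = client.create_collection(
    name="multimodal",
    embedding_function=OpenCLIPEmbeddingFunction(),
    data_loader=ImageLoader(),
)

# Text and images go into the same collection.
collection.add(ids=["doc1"], documents=["A photo of a golden retriever in a park."])
collection.add(ids=["img1"], uris=["./images/dog.jpg"])

# A text query retrieves across both modalities.
results = collection.query(query_texts=["dog playing outside"], n_results=2)
print(results)
```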
