Skip to content

Commit

Permalink
docs: update lntegration docs & fixed links (lancedb#1423)
Browse files Browse the repository at this point in the history
1. Updated langchain docs. 
2. Minor update to llamaindex doc.
3. Added notebook examples and linked them correctly
  • Loading branch information
raghavdixit99 authored Jul 3, 2024
1 parent b8ccea9 commit a5ff623
Show file tree
Hide file tree
Showing 5 changed files with 1,225 additions and 8 deletions.
7 changes: 4 additions & 3 deletions docs/mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -125,10 +125,11 @@ nav:
- DuckDB: python/duckdb.md
- LangChain:
- LangChain 🔗: integrations/langchain.md
- LangChain demo: notebooks/langchain_demo.ipynb
- LangChain JS/TS 🔗: https://js.langchain.com/docs/integrations/vectorstores/lancedb
- LlamaIndex 🦙:
- LlamaIndex docs: integrations/llamaIndex.md
- LlamaIndex demo: https://docs.llamaindex.ai/en/stable/examples/vector_stores/LanceDBIndexDemo/
- LlamaIndex demo: notebooks/llamaIndex_demo.ipynb
- Pydantic: python/pydantic.md
- Voxel51: integrations/voxel51.md
- PromptTools: integrations/prompttools.md
Expand Down Expand Up @@ -204,9 +205,9 @@ nav:
- Pandas and PyArrow: python/pandas_and_pyarrow.md
- Polars: python/polars_arrow.md
- DuckDB: python/duckdb.md
- LangChain 🦜️🔗↗: https://python.langchain.com/docs/integrations/vectorstores/lancedb
- LangChain 🦜️🔗↗: integrations/langchain.md
- LangChain.js 🦜️🔗↗: https://js.langchain.com/docs/integrations/vectorstores/lancedb
- LlamaIndex 🦙↗: https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
- LlamaIndex 🦙↗: integrations/llamaIndex.md
- Pydantic: python/pydantic.md
- Voxel51: integrations/voxel51.md
- PromptTools: integrations/prompttools.md
Expand Down
117 changes: 113 additions & 4 deletions docs/src/integrations/langchain.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
![Illustration](../assets/langchain.png)

## Quick Start
You can load your document data using langchain's loaders, for this example we are using `TextLoader` and `OpenAIEmbeddings` as the embedding model.
You can load your document data using langchain's loaders, for this example we are using `TextLoader` and `OpenAIEmbeddings` as the embedding model. Checkout Complete example here - [LangChain demo](../notebooks/langchain_example.ipynb)
```python
import os
from langchain.document_loaders import TextLoader
Expand Down Expand Up @@ -38,6 +38,8 @@ The exhaustive list of parameters for `LanceDB` vector store are :
- `api_key`: (Optional) API key to use for LanceDB cloud database. Defaults to `None`.
- `region`: (Optional) Region to use for LanceDB cloud database. Only for LanceDB Cloud, defaults to `None`.
- `mode`: (Optional) Mode to use for adding data to the table. Defaults to `'overwrite'`.
- `reranker`: (Optional) The reranker to use for LanceDB.
- `relevance_score_fn`: (Optional[Callable[[float], float]]) Langchain relevance score function to be used. Defaults to `None`.

```python
db_url = "db://lang_test" # url of db you created
Expand All @@ -54,12 +56,14 @@ vector_store = LanceDB(
```

### Methods
To add texts and store respective embeddings automatically:

##### add_texts()
- `texts`: `Iterable` of strings to add to the vectorstore.
- `metadatas`: Optional `list[dict()]` of metadatas associated with the texts.
- `ids`: Optional `list` of ids to associate with the texts.
- `kwargs`: `Any`

This method adds texts and stores respective embeddings automatically.

```python
vector_store.add_texts(texts = ['test_123'], metadatas =[{'source' :'wiki'}])
Expand All @@ -74,19 +78,124 @@ pd_df.to_csv("docsearch.csv", index=False)
# you can also create a new vector store object using an older connection object:
vector_store = LanceDB(connection=tbl, embedding=embeddings)
```
For index creation make sure your table has enough data in it. An ANN index is ususally not needed for datasets ~100K vectors. For large-scale (>1M) or higher dimension vectors, it is beneficial to create an ANN index.
##### create_index()
- `col_name`: `Optional[str] = None`
- `vector_col`: `Optional[str] = None`
- `num_partitions`: `Optional[int] = 256`
- `num_sub_vectors`: `Optional[int] = 96`
- `index_cache_size`: `Optional[int] = None`

This method creates an index for the vector store. For index creation make sure your table has enough data in it. An ANN index is ususally not needed for datasets ~100K vectors. For large-scale (>1M) or higher dimension vectors, it is beneficial to create an ANN index.

```python
# for creating vector index
vector_store.create_index(vector_col='vector', metric = 'cosine')

# for creating scalar index(for non-vector columns)
vector_store.create_index(col_name='text')

```
```

##### similarity_search()
- `query`: `str`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `fts`: `Optional[bool] = False`
- `name`: `Optional[str] = None`
- `kwargs`: `Any`

Return documents most similar to the query without relevance scores

```python
docs = docsearch.similarity_search(query)
print(docs[0].page_content)
```

##### similarity_search_by_vector()
- `embedding`: `List[float]`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `name`: `Optional[str] = None`
- `kwargs`: `Any`

Returns documents most similar to the query vector.

```python
docs = docsearch.similarity_search_by_vector(query)
print(docs[0].page_content)
```

##### similarity_search_with_score()
- `query`: `str`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `kwargs`: `Any`

Returns documents most similar to the query string with relevance scores, gets called by base class's `similarity_search_with_relevance_scores` which selects relevance score based on our `_select_relevance_score_fn`.

```python
docs = docsearch.similarity_search_with_relevance_scores(query)
print("relevance score - ", docs[0][1])
print("text- ", docs[0][0].page_content[:1000])
```

##### similarity_search_by_vector_with_relevance_scores()
- `embedding`: `List[float]`
- `k`: `Optional[int] = None`
- `filter`: `Optional[Dict[str, str]] = None`
- `name`: `Optional[str] = None`
- `kwargs`: `Any`

Return documents most similar to the query vector with relevance scores.
Relevance score

```python
docs = docsearch.similarity_search_by_vector_with_relevance_scores(query_embedding)
print("relevance score - ", docs[0][1])
print("text- ", docs[0][0].page_content[:1000])
```

##### max_marginal_relevance_search()
- `query`: `str`
- `k`: `Optional[int] = None`
- `fetch_k` : Number of Documents to fetch to pass to MMR algorithm, `Optional[int] = None`
- `lambda_mult`: Number between 0 and 1 that determines the degree
of diversity among the results with 0 corresponding
to maximum diversity and 1 to minimum diversity.
Defaults to 0.5. `float = 0.5`
- `filter`: `Optional[Dict[str, str]] = None`
- `kwargs`: `Any`

Returns docs selected using the maximal marginal relevance(MMR).
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Similarly, `max_marginal_relevance_search_by_vector()` function returns docs most similar to the embedding passed to the function using MMR. instead of a string query you need to pass the embedding to be searched for.

```python
result = docsearch.max_marginal_relevance_search(
query="text"
)
result_texts = [doc.page_content for doc in result]
print(result_texts)

## search by vector :
result = docsearch.max_marginal_relevance_search_by_vector(
embeddings.embed_query("text")
)
result_texts = [doc.page_content for doc in result]
print(result_texts)
```

##### add_images()
- `uris` : File path to the image. `List[str]`.
- `metadatas` : Optional list of metadatas. `(Optional[List[dict]], optional)`
- `ids` : Optional list of IDs. `(Optional[List[str]], optional)`

Adds images by automatically creating their embeddings and adds them to the vectorstore.

```python
vec_store.add_images(uris=image_uris)
# here image_uris are local fs paths to the images.
```


5 changes: 4 additions & 1 deletion docs/src/integrations/llamaIndex.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
![Illustration](../assets/llama-index.jpg)

## Quick start
You would need to install the integration via `pip install llama-index-vector-stores-lancedb` in order to use it. You can run the below script to try it out :
You would need to install the integration via `pip install llama-index-vector-stores-lancedb` in order to use it.
You can run the below script to try it out :
```python
import logging
import sys
Expand Down Expand Up @@ -43,6 +44,8 @@ retriever = index.as_retriever(vector_store_kwargs={"where": lance_filter})
response = retriever.retrieve("What did the author do growing up?")
```

Checkout Complete example here - [LlamaIndex demo](../notebooks/LlamaIndex_example.ipynb)

### Filtering
For metadata filtering, you can use a Lance SQL-like string filter as demonstrated in the example above. Additionally, you can also filter using the `MetadataFilters` class from LlamaIndex:
```python
Expand Down
Loading

0 comments on commit a5ff623

Please sign in to comment.