docs: update integration docs & fixed links (lancedb#1423)

1. Updated langchain docs. 2. Minor update to llamaindex doc. 3. Added notebook examples and linked them correctly
Epicism · Jul 3, 2024 · a5ff623
1 parent b8ccea9
commit a5ff623
Showing 5 changed files with 1,225 additions and 8 deletions.
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
@@ -125,10 +125,11 @@ nav:
           - DuckDB: python/duckdb.md
           - LangChain:
             - LangChain 🔗: integrations/langchain.md
+            - LangChain demo: notebooks/langchain_demo.ipynb
             - LangChain JS/TS 🔗: https://js.langchain.com/docs/integrations/vectorstores/lancedb
           - LlamaIndex 🦙:
             - LlamaIndex docs: integrations/llamaIndex.md
-            - LlamaIndex demo:  https://docs.llamaindex.ai/en/stable/examples/vector_stores/LanceDBIndexDemo/
+            - LlamaIndex demo: notebooks/llamaIndex_demo.ipynb
           - Pydantic: python/pydantic.md
           - Voxel51: integrations/voxel51.md
           - PromptTools: integrations/prompttools.md
@@ -204,9 +205,9 @@ nav:
       - Pandas and PyArrow: python/pandas_and_pyarrow.md
       - Polars: python/polars_arrow.md
       - DuckDB: python/duckdb.md
-      - LangChain 🦜️🔗↗: https://python.langchain.com/docs/integrations/vectorstores/lancedb
+      - LangChain 🦜️🔗↗: integrations/langchain.md
       - LangChain.js 🦜️🔗↗: https://js.langchain.com/docs/integrations/vectorstores/lancedb
-      - LlamaIndex 🦙↗: https://gpt-index.readthedocs.io/en/latest/examples/vector_stores/LanceDBIndexDemo.html
+      - LlamaIndex 🦙↗: integrations/llamaIndex.md
       - Pydantic: python/pydantic.md
       - Voxel51: integrations/voxel51.md
       - PromptTools: integrations/prompttools.md

diff --git a/docs/src/integrations/langchain.md b/docs/src/integrations/langchain.md
@@ -2,7 +2,7 @@
 ![Illustration](../assets/langchain.png)
 
 ## Quick Start
-You can load your document data using langchain's loaders, for this example we are using `TextLoader` and `OpenAIEmbeddings` as the embedding model.
+You can load your document data using LangChain's loaders. This example uses `TextLoader` to load the documents and `OpenAIEmbeddings` as the embedding model. Check out the complete example here: [LangChain demo](../notebooks/langchain_example.ipynb)
 ```python
 import os
 from langchain.document_loaders import TextLoader
@@ -38,6 +38,8 @@ The exhaustive list of parameters for `LanceDB` vector store are :
 - `api_key`: (Optional) API key to use for LanceDB cloud database. Defaults to `None`.  
 - `region`: (Optional) Region to use for LanceDB cloud database. Only for LanceDB Cloud, defaults to `None`.  
 - `mode`: (Optional) Mode to use for adding data to the table. Defaults to `'overwrite'`.  
+- `reranker`: (Optional) The reranker to use for LanceDB.
+- `relevance_score_fn`: (Optional[Callable[[float], float]]) Langchain relevance score function to be used. Defaults to `None`. 
 
 ```python
 db_url = "db://lang_test" # url of db you created
@@ -54,12 +56,14 @@ vector_store = LanceDB(
 ```
 
 ### Methods 
-To add texts and store respective embeddings automatically:   
+
 ##### add_texts()
 - `texts`: `Iterable` of strings to add to the vectorstore.
 - `metadatas`: Optional `list[dict()]` of metadatas associated with the texts.
 - `ids`: Optional `list` of ids to associate with the texts. 
+- `kwargs`: `Any`
 
+This method adds texts and automatically stores their respective embeddings.
 
 ```python
 vector_store.add_texts(texts = ['test_123'], metadatas =[{'source' :'wiki'}]) 
@@ -74,19 +78,124 @@ pd_df.to_csv("docsearch.csv", index=False)
 # you can also create a new vector store object using an older connection object:
 vector_store = LanceDB(connection=tbl, embedding=embeddings)
 ```
-For index creation make sure your table has enough data in it. An ANN index is ususally not needed for datasets ~100K vectors. For large-scale (>1M) or higher dimension vectors, it is beneficial to create an ANN index.
 ##### create_index() 
 - `col_name`: `Optional[str] = None`
 - `vector_col`: `Optional[str] = None`
 - `num_partitions`: `Optional[int] = 256`
 - `num_sub_vectors`: `Optional[int] = 96`
 - `index_cache_size`: `Optional[int] = None`
 
+This method creates an index for the vector store. Before creating an index, make sure your table has enough data in it. An ANN index is usually not needed for datasets of around 100K vectors. For large-scale (>1M) or higher-dimensional vectors, it is beneficial to create an ANN index.
+
 ```python
 # for creating vector index
 vector_store.create_index(vector_col='vector', metric = 'cosine')
 
 # for creating scalar index(for non-vector columns)
 vector_store.create_index(col_name='text')
 
-```
+```
+
+##### similarity_search()
+- `query`: `str`
+- `k`: `Optional[int] = None`
+- `filter`: `Optional[Dict[str, str]] = None`
+- `fts`: `Optional[bool] = False`
+- `name`: `Optional[str] = None`
+- `kwargs`: `Any`
+
+Returns the documents most similar to the query, without relevance scores.
+
+```python
+docs = docsearch.similarity_search(query)
+print(docs[0].page_content)
+```
+
+##### similarity_search_by_vector()
+- `embedding`: `List[float]`
+- `k`: `Optional[int] = None`
+- `filter`: `Optional[Dict[str, str]] = None`
+- `name`: `Optional[str] = None`
+- `kwargs`: `Any`
+
+Returns documents most similar to the query vector.
+
+```python
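+# this method expects an embedding (List[float]), so embed the query string first
+docs = docsearch.similarity_search_by_vector(embeddings.embed_query(query))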
+print(docs[0].page_content)
+```
+
+##### similarity_search_with_score()
+- `query`: `str`
+- `k`: `Optional[int] = None`
+- `filter`: `Optional[Dict[str, str]] = None`
+- `kwargs`: `Any`
+
+Returns the documents most similar to the query string, along with their scores. This method is called by the base class's `similarity_search_with_relevance_scores`, which selects the relevance score function through our `_select_relevance_score_fn`.
+
+```python
+docs = docsearch.similarity_search_with_relevance_scores(query)
+print("relevance score - ", docs[0][1])
+print("text- ", docs[0][0].page_content[:1000])
+```
+
+##### similarity_search_by_vector_with_relevance_scores()
+- `embedding`: `List[float]`
+- `k`: `Optional[int] = None`
+- `filter`: `Optional[Dict[str, str]] = None`
+- `name`: `Optional[str] = None`
+- `kwargs`: `Any`
+
+Returns the documents most similar to the query vector, along with their relevance scores.
+
+```python
+docs = docsearch.similarity_search_by_vector_with_relevance_scores(query_embedding)
+print("relevance score - ", docs[0][1])
+print("text- ", docs[0][0].page_content[:1000])
+```
+
+##### max_marginal_relevance_search()
+- `query`: `str`
+- `k`: `Optional[int] = None`
+- `fetch_k`: `Optional[int] = None`. Number of documents to fetch and pass to the MMR algorithm.
+- `lambda_mult`: `float = 0.5`. Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity.
+- `filter`: `Optional[Dict[str, str]] = None`
+- `kwargs`: `Any`
+
+Returns docs selected using maximal marginal relevance (MMR).
+MMR optimizes for similarity to the query and diversity among the selected documents.
+
+Similarly, `max_marginal_relevance_search_by_vector()` returns the docs most similar to the given embedding using MMR. Instead of a string query, you pass the embedding to search for.
+
+```python
+result = docsearch.max_marginal_relevance_search(
+        query="text"
+    )
+result_texts = [doc.page_content for doc in result]
+print(result_texts)
+
+# search by vector:
+result = docsearch.max_marginal_relevance_search_by_vector(
+        embeddings.embed_query("text")
+    )
+result_texts = [doc.page_content for doc in result]
+print(result_texts)
+```
+
+##### add_images()
+- `uris`: `List[str]`. File paths to the images.
+- `metadatas`: `Optional[List[dict]]`. Optional list of metadatas.
+- `ids`: `Optional[List[str]]`. Optional list of IDs.
+
+Adds images to the vectorstore, automatically creating their embeddings.
+
+```python
+vec_store.add_images(uris=image_uris) 
+# image_uris is a list of local filesystem paths to the images
+```
+
+
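
Note on the `max_marginal_relevance_search()` section added above: the description of `lambda_mult` may be easier to follow next to a small worked example. The following is a minimal, dependency-free sketch of the general MMR selection loop. It is not the code LangChain or LanceDB runs. The names `cosine` and `mmr_select` are hypothetical, and the use of cosine similarity is an assumption made for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k documents chosen greedily by MMR.

    Hypothetical helper for illustration only. Each step picks the candidate
    that maximizes:
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, selected)
    """
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = candidates[0], -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


if __name__ == "__main__":
    # Toy data: doc 1 is a near-duplicate of doc 0, doc 2 is unrelated to the query.
    toy_docs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    toy_query = [1.0, 0.0]
    print(mmr_select(toy_query, toy_docs, k=2, lambda_mult=1.0))
    print(mmr_select(toy_query, toy_docs, k=2, lambda_mult=0.3))
```

With `lambda_mult=1.0` the loop ranks purely by similarity to the query, so the near-duplicate is picked second. Lowering `lambda_mult` penalizes candidates that resemble documents already chosen, so the dissimilar document is picked instead.
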
diff --git a/docs/src/integrations/llamaIndex.md b/docs/src/integrations/llamaIndex.md
@@ -2,7 +2,8 @@
 ![Illustration](../assets/llama-index.jpg)
 
 ## Quick start
-You would need to install the integration via `pip install llama-index-vector-stores-lancedb` in order to use it. You can run the below script to try it out :
+You need to install the integration via `pip install llama-index-vector-stores-lancedb` to use it.
+You can run the script below to try it out:
 ```python
 import logging
 import sys
@@ -43,6 +44,8 @@ retriever = index.as_retriever(vector_store_kwargs={"where": lance_filter})
 response = retriever.retrieve("What did the author do growing up?")
 ```
 
+Check out the complete example here: [LlamaIndex demo](../notebooks/LlamaIndex_example.ipynb)
+
 ### Filtering
 For metadata filtering, you can use a Lance SQL-like string filter as demonstrated in the example above. Additionally, you can also filter using the `MetadataFilters` class from LlamaIndex:
 ```python