[MLC-14] server: step toward Retrieval Augmented Generation (RAG) w/ … #8

stockeh · 2024-02-28T15:02:59Z

…indexing (load, split, store) retriever.

Overview of RAG: https://python.langchain.com/docs/use_cases/question_answering/

Implementation Details

Extends langchain with MLX adaptations of RecursiveCharacterTextSplitter, OpenAIEmbeddings, and Chroma.
Custom embeddings from the Gemma model: the token vectors of each chunk are averaged and the result is normalized to unit length (ideal for cosine similarity).
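
To make the pooling step concrete, here is a minimal sketch of averaging token vectors and scaling the result to unit length. It is an illustration only: `mean_pool_unit` is a hypothetical helper, plain Python lists stand in for MLX arrays, and the PR's `Embeddings` class may do this differently.

```python
import math
from typing import List, Sequence


def mean_pool_unit(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors, then scale the mean to unit L2 norm."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    # Column-wise mean over all tokens in the chunk.
    mean = [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Guard against an all-zero mean: return it unchanged instead of dividing by zero.
    return [x / norm for x in mean] if norm > 0 else mean


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs have unit length."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 2-dimensional token vectors for two chunks.
chunk_a = mean_pool_unit([[1.0, 0.0], [3.0, 0.0]])  # mean is [2, 0]
chunk_b = mean_pool_unit([[0.0, 2.0], [0.0, 4.0]])  # mean is [0, 3]
similarity = dot(chunk_a, chunk_b)
```

Because every stored vector has unit length, cosine similarity reduces to a plain dot product, which is why the normalization suits a cosine-based vector store.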

Risks

Similarity score: empirically, max_marginal_relevance_search is favored over similarity_search.
The Gemma embedding layer isn't optimized for text retrieval. Should we use an alternative text embedding model for this?
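
For context on the first risk, the difference between the two search modes can be sketched as follows. This is a generic maximal marginal relevance (MMR) illustration over unit-length vectors, not the code in this PR: the function names, the `lambda_mult` weighting, and the toy vectors are all assumptions made for the example.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_similarity(query: Sequence[float], docs: List[Sequence[float]], k: int) -> List[int]:
    """Plain similarity search: indices of the k documents most similar to the query."""
    scores = [dot(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int,
        lambda_mult: float = 0.5) -> List[int]:
    """Greedy MMR: trade relevance to the query against redundancy with earlier picks."""
    relevance = [dot(query, d) for d in docs]
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is the highest similarity to any already selected document.
            redundancy = max((dot(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy unit vectors: docs[0] and docs[1] are near-duplicates, docs[2] is distinct.
query = [0.8, 0.6]
docs = [[1.0, 0.0], [0.96, 0.28], [0.0, 1.0]]
plain = top_k_similarity(query, docs, k=2)
diverse = mmr(query, docs, k=2)
```

In this toy case plain similarity returns the two near-duplicates, while MMR keeps the best match and swaps the second slot for the distinct document. With `lambda_mult=1.0` the redundancy term vanishes and MMR falls back to plain similarity ranking.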

stockeh · 2024-02-28T15:04:28Z

Example usage of retriever:

from server.utils import load

from server.retriever.loader import directory_loader
from server.retriever.splitter import RecursiveCharacterTextSplitter
from server.retriever.vectorstore import Chroma, Embeddings


def main():
    # Load the quantized Gemma model and its tokenizer.
    model, tokenizer = load('mlx-community/quantized-gemma-7b-it')
    # Load: read every document under the given directory.
    raw_docs = directory_loader(
        '/Users/stock/Library/Mobile Documents/iCloud~md~obsidian/Documents/main')
    print(len(raw_docs), len(raw_docs[0].page_content))
    # Split: break documents into overlapping chunks.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1024, chunk_overlap=32, add_start_index=True
    )
    splits = text_splitter.split_documents(raw_docs)
    print(len(splits), len(splits[0].page_content), splits[0].metadata)
    # Store: embed each chunk with the Gemma model and index it in Chroma.
    db = Chroma.from_documents(
        documents=splits, embedding=Embeddings(model.model, tokenizer))
    print('-------------------')
    # Retrieve: query the vector store and print the matching chunks.
    query = "What is a cascade neural network?"
    # docs = db.similarity_search(query)
    docs = db.max_marginal_relevance_search(query)
    print('>', query)
    for doc in docs:
        print(doc.page_content, doc.metadata, sep='\n')
        print('-------------------')


if __name__ == '__main__':
    main()
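
To illustrate what `chunk_size`, `chunk_overlap`, and `add_start_index` control in the example above, here is a deliberately simplified sketch. It uses a fixed-size sliding window rather than the recursive, separator-aware splitting of `RecursiveCharacterTextSplitter`, and `split_with_overlap` is a hypothetical helper, not part of this PR.

```python
from typing import Dict, List, Union

Chunk = Dict[str, Union[str, int]]


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[Chunk]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[Chunk] = []
    for start in range(0, len(text), step):
        # Record where each chunk begins, similar in spirit to add_start_index=True.
        chunks.append({"page_content": text[start:start + chunk_size],
                       "start_index": start})
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

Each chunk here repeats the last character of the previous one, so text that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.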

.vscode/settings.json

[MLC-14] server: step toward Retrieval Augmented Generation (RAG) w/ …

b076038

…indexing (load, split, store) retriever

stockeh requested a review from ParkerSm1th February 28, 2024 15:02

stockeh self-assigned this Feb 28, 2024

ParkerSm1th reviewed Feb 28, 2024

.vscode/settings.json (outdated, resolved)

[MLC-14] repo: Add custom python setting.json for tabSize & formatter

75a7014

ParkerSm1th merged commit 47a0c2d into main Feb 28, 2024
1 check passed

stockeh deleted the MLC-14 branch March 5, 2024 05:14
