diff --git a/docs/core_docs/docs/integrations/retrievers/azion-edgesql.ipynb b/docs/core_docs/docs/integrations/retrievers/azion-edgesql.ipynb new file mode 100644 index 000000000000..43a50b87b450 --- /dev/null +++ b/docs/core_docs/docs/integrations/retrievers/azion-edgesql.ipynb @@ -0,0 +1,280 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "afaf8039", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: __sidebar_label__\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "e49f1e0d", + "metadata": {}, + "source": [ + "# AzionRetriever\n", + "\n", + "## Overview\n", + "\n", + "This will help you get started with the [AzionRetriever](/docs/concepts/#retrievers), which retrieves documents from an Azion Edge SQL database using either vector similarity search or hybrid search (vector similarity combined with full-text search). For detailed documentation of all AzionRetriever features and configurations, head to the [API reference](https://v03.api.js.langchain.com/classes/_langchain_community.retrievers_azion_edgesql.AzionRetriever.html).\n", + "\n", + "### Integration details\n", + "\n", + "| Retriever | Self-host | Cloud offering | Package | [Py support](__python_doc_url__) |\n", + "| :--- | :--- | :---: | :---: | :---: |\n", + "| [AzionRetriever](https://v03.api.js.langchain.com/classes/_langchain_community.retrievers_azion_edgesql.AzionRetriever.html) | ☒ | ☒ | @langchain/community | ☒ |\n", + "\n", + "\n", + "## Setup\n", + "\n", + "To use the `AzionRetriever`, set the `AZION_TOKEN` environment variable to your Azion personal token:\n", + "\n", + "```typescript\n", + "process.env.AZION_TOKEN = \"your-api-key\";\n", + "```\n", + "\n", + "This guide uses OpenAI both for the embeddings and for the chat model that extracts entities for hybrid search, so you'll need to set your OpenAI key as well:\n", + "\n", + "```typescript\n", + "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n", + "```\n", + "\n", + "If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n", + "\n", + "```typescript\n", + "// process.env.LANGSMITH_API_KEY = \"\";\n", + "// 
process.env.LANGSMITH_TRACING = \"true\";\n", + "```\n", + "\n", + "### Installation\n", + "\n", + "This retriever lives in the `@langchain/community/retrievers/azion_edgesql` entrypoint of the `@langchain/community` package. It also requires the `azion` SDK, which is an optional peer dependency of `@langchain/community`:\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n", + "\n", + "<Npm2Yarn>\n", + " @langchain/community @langchain/core @langchain/openai azion\n", + "</Npm2Yarn>\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "a38cde65-254d-4219-a441-068766c0d4b5", + "metadata": {}, + "source": [ + "## Instantiation\n", + "\n", + "Now we can instantiate our retriever:" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "70cc8e65-2a02-408a-bbc6-8ef649057d82", + "metadata": {}, + "outputs": [], + "source": [ + "// Requires the AZION_TOKEN and OPENAI_API_KEY environment variables (see Setup)\n", + "import { AzionRetriever } from \"@langchain/community/retrievers/azion_edgesql\";\n", + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "import { ChatOpenAI } from \"@langchain/openai\";\n", + "\n", + "const embeddingModel = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-small\",\n", + "});\n", + "\n", + "// Chat model used to extract entities from the query for the full-text search step of hybrid search\n", + "const chatModel = new ChatOpenAI({\n", + " model: \"gpt-4o-mini\",\n", + " apiKey: process.env.OPENAI_API_KEY,\n", + "});\n", + "\n", + "const retriever = new AzionRetriever(embeddingModel, chatModel, {\n", + " dbName: \"langchain\", // Edge SQL database to query\n", + " vectorTable: \"documents\", // table where the vector embeddings are stored\n", + " ftsTable: \"documents_fts\", // table where the FTS index is stored\n", + " searchType: \"hybrid\", // \"hybrid\" (vector + FTS) or \"similarity\" (vector only)\n", + " ftsK: 2, // number of results to return from the FTS index\n", + " similarityK: 2, // number of results to return from the vector index\n", + " metadataItems: [\"language\", \"topic\"], // metadata columns to include in the returned documents\n", + " filters: [{ operator: \"=\", column: \"language\", value: \"en\" }], // only retrieve English documents\n", + "});"
+ ] + }, + { + "cell_type": "markdown", + "id": "5c5f2839-4020-424e-9fc9-07777eede442", + "metadata": {}, + "source": [ + "## Usage" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "id": "51a60dbe-9f2e-4e04-bb62-23968f17164a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " Document {\n", + " pageContent: 'Australia s indigenous people have inhabited the continent for over 65,000 years',\n", + " metadata: { language: 'en', topic: 'history', searchtype: 'similarity' },\n", + " id: '3'\n", + " },\n", + " Document {\n", + " pageContent: 'Australia is a leader in solar energy adoption and renewable technology',\n", + " metadata: { language: 'en', topic: 'technology', searchtype: 'similarity' },\n", + " id: '5'\n", + " },\n", + " Document {\n", + " pageContent: 'Australia s tech sector is rapidly growing with innovation hubs in major cities',\n", + " metadata: { language: 'en', topic: 'technology', searchtype: 'fts' },\n", + " id: '7'\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "const query = \"Australia\";\n", + "\n", + "await retriever.invoke(query);" + ] + }, + { + "cell_type": "markdown", + "id": "dfe8aad4-8626-4330-98a9-7ea1ca5d2e0e", + "metadata": {}, + "source": [ + "## Use within a chain\n", + "\n", + "Like other retrievers, AzionRetriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n", + "\n", + "We will need an LLM or chat model:\n", + "\n", + "```{=mdx}\n", + "import ChatModelTabs from \"@theme/ChatModelTabs\";\n", + "\n", + "<ChatModelTabs customVarName=\"llm\" />\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "25b647a3-f8f2-4541-a289-7a241e43f9df", + "metadata": {}, + "outputs": [], + "source": [ + "// @lc-docs-hide-cell\n", + "\n", + "import { ChatOpenAI } from \"@langchain/openai\";\n", + "\n", + "const llm = new ChatOpenAI({\n", + " model: \"gpt-4o-mini\",\n", + " temperature: 0,\n", + "});" + ] + }, + { + "cell_type": "code", + 
"execution_count": 44, + "id": "23e11cc9-abd6-4855-a7eb-799f45ca01ae", + "metadata": {}, + "outputs": [], + "source": [ + "import { ChatPromptTemplate } from \"@langchain/core/prompts\";\n", + "import { RunnablePassthrough, RunnableSequence } from \"@langchain/core/runnables\";\n", + "import { StringOutputParser } from \"@langchain/core/output_parsers\";\n", + "\n", + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const prompt = ChatPromptTemplate.fromTemplate(`\n", + "Answer the question based only on the context provided.\n", + "\n", + "Context: {context}\n", + "\n", + "Question: {question}`);\n", + "\n", + "const formatDocs = (docs: Document[]) => {\n", + " return docs.map((doc) => doc.pageContent).join(\"\\n\\n\");\n", + "}\n", + "\n", + "// See https://js.langchain.com/docs/tutorials/rag\n", + "const ragChain = RunnableSequence.from([\n", + " {\n", + " context: retriever.pipe(formatDocs),\n", + " question: new RunnablePassthrough(),\n", + " },\n", + " prompt,\n", + " llm,\n", + " new StringOutputParser(),\n", + "]);" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "d47c37dd-5c11-416c-a3b6-bec413cd70e8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The context mentions that the 2024 Olympics are in Paris.\n" + ] + } + ], + "source": [ + "await ragChain.invoke(\"Paris\")" + ] + }, + { + "cell_type": "markdown", + "id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all AzionRetriever features and configurations head to the [API reference](https://v03.api.js.langchain.com/classes/_langchain_community.retrievers_azion_edgesql.AzionRetriever.html).\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + 
"typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/core_docs/docs/integrations/vectorstores/azion-edgesql.ipynb b/docs/core_docs/docs/integrations/vectorstores/azion-edgesql.ipynb new file mode 100644 index 000000000000..f20231db7014 --- /dev/null +++ b/docs/core_docs/docs/integrations/vectorstores/azion-edgesql.ipynb @@ -0,0 +1,357 @@ +{ + "cells": [ + { + "cell_type": "raw", + "id": "1957f5cb", + "metadata": { + "vscode": { + "languageId": "raw" + } + }, + "source": [ + "---\n", + "sidebar_label: __sidebar_label__\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "ef1f0986", + "metadata": {}, + "source": [ + "# AzionVectorStore\n", + "\n", + "The `AzionVectorStore` manages and searches a collection of documents using vector embeddings, directly on Azion's Edge Platform using Edge SQL.\n", + "\n", + "This guide provides a quick overview for getting started with Azion Edge SQL [vector stores](/docs/concepts/#vectorstores). For detailed documentation of all `AzionVectorStore` features and configurations, head to the [API reference](https://v03.api.js.langchain.com/classes/_langchain_community.vectorstores_azion_edgesql.AzionVectorStore.html)."
+ ] + }, + { + "cell_type": "markdown", + "id": "c824838d", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "### Integration details\n", + "\n", + "| Class | Package | [Py support](__python_doc_url__) | Package latest |\n", + "| :--- | :--- | :---: | :---: |\n", + "| [`AzionVectorStore`](https://v03.api.js.langchain.com/classes/_langchain_community.vectorstores_azion_edgesql.AzionVectorStore.html) | [`@langchain/community`](https://www.npmjs.com/package/@langchain/community) | ☒ | ![NPM - Version](https://img.shields.io/npm/v/@langchain/community?style=flat-square&label=%20&) |" + ] + }, + { + "cell_type": "markdown", + "id": "36fdc060", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "To use the `AzionVectorStore` vector store, you will need to install the `@langchain/community` package along with the `azion` SDK, which is an optional peer dependency. You will also need an [Azion account](https://www.azion.com/en/documentation/products/accounts/creating-account/) and a [personal token](https://www.azion.com/en/documentation/products/guides/personal-tokens/) for the Azion API, set as the `AZION_TOKEN` environment variable. Further information can be found in the [Azion documentation](https://www.azion.com/en/documentation/).\n", + "\n", + "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. 
You can also use [other supported embedding models](/docs/integrations/text_embedding) if you wish.\n", + "\n", + "```{=mdx}\n", + "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n", + "import Npm2Yarn from \"@theme/Npm2Yarn\";\n", + "\n", + "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n", + "\n", + "<Npm2Yarn>\n", + " @langchain/community @langchain/openai @langchain/core azion\n", + "</Npm2Yarn>\n", + "```\n", + "\n", + "### Credentials\n", + "\n", + "Once you have installed the packages and created a token, set the `AZION_TOKEN` environment variable:\n", + "\n", + "```typescript\n", + "process.env.AZION_TOKEN = \"your-api-key\";\n", + "```\n", + "\n", + "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n", + "\n", + "```typescript\n", + "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n", + "```\n", + "\n", + "If you want to get automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:\n", + "\n", + "```typescript\n", + "// process.env.LANGSMITH_TRACING = \"true\";\n", + "// process.env.LANGSMITH_API_KEY = \"your-api-key\";\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "93df377e", + "metadata": {}, + "source": [ + "## Instantiation" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import { AzionVectorStore } from \"@langchain/community/vectorstores/azion_edgesql\";\n", + "import { OpenAIEmbeddings } from \"@langchain/openai\";\n", + "\n", + "const embeddings = new OpenAIEmbeddings({\n", + " model: \"text-embedding-3-small\",\n", + "});\n", + "\n", + "// Instantiate with the constructor if the database and table have already been created\n", + "const vectorStore = new AzionVectorStore(embeddings, { dbName: \"langchain\", tableName: \"documents\" });\n", + "\n", + "// If you have not created the database 
and table yet, you can do so with the `setupDatabase` method\n", + "// await vectorStore.setupDatabase({ columns:[\"topic\",\"language\"], mode: \"hybrid\" });\n", + "\n", + "// OR instantiate with the static method if the database and table have not been created yet\n", + "// const vectorStore = await AzionVectorStore.initialize(embeddings, { dbName: \"langchain\", tableName: \"documents\" }, { columns:[], mode: \"hybrid\" });" + ] + }, + { + "cell_type": "markdown", + "id": "ac6071d4", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "### Add items to vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "17f5efc0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Inserting chunks\n", + "Inserting chunk 0\n", + "Chunks inserted!\n" + ] + } + ], + "source": [ + "import type { Document } from \"@langchain/core/documents\";\n", + "\n", + "const document1: Document = {\n", + " pageContent: \"The powerhouse of the cell is the mitochondria\",\n", + " metadata: { language: \"en\", topic: \"biology\" }\n", + "};\n", + "\n", + "const document2: Document = {\n", + " pageContent: \"Buildings are made out of brick\",\n", + " metadata: { language: \"en\", topic: \"history\" }\n", + "};\n", + "\n", + "const document3: Document = {\n", + " pageContent: \"Mitochondria are made out of lipids\",\n", + " metadata: { language: \"en\", topic: \"biology\" }\n", + "};\n", + "\n", + "const document4: Document = {\n", + " pageContent: \"The 2024 Olympics are in Paris\",\n", + " metadata: { language: \"en\", topic: \"history\" }\n", + "};\n", + "\n", + "const documents = [document1, document2, document3, document4];\n", + "\n", + "await vectorStore.addDocuments(documents);" + ] + }, + { + "cell_type": "markdown", + "id": "dcf1b905", + "metadata": {}, + "source": [ + "### Delete items from vector store" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "ef61e188", + 
"metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Deleted 1 items from documents\n" + ] + } + ], + "source": [ + "await vectorStore.delete([\"4\"]);" + ] + }, + { + "cell_type": "markdown", + "id": "c3620501", + "metadata": {}, + "source": [ + "## Query vector store\n", + "\n", + "Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while your chain or agent is running.\n", + "\n", + "### Query directly\n", + "\n", + "Use `azionHybridSearch` to combine full-text search (FTS) with vector similarity search, or `azionSimilaritySearch` for vector similarity search only. Each result is a `[document, score]` pair:" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "id": "aa0a16fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Hybrid Search Results\n", + "[{\"pageContent\":\"The Australian dingo is a unique species that plays a key role in the ecosystem\",\"metadata\":{\"searchtype\":\"fulltextsearch\"},\"id\":\"6\"},-0.25748711028997995]\n", + "[{\"pageContent\":\"The powerhouse of the cell is the mitochondria\",\"metadata\":{\"searchtype\":\"fulltextsearch\"},\"id\":\"16\"},-0.31697985337654005]\n", + "[{\"pageContent\":\"Australia s indigenous people have inhabited the continent for over 65,000 years\",\"metadata\":{\"searchtype\":\"similarity\"},\"id\":\"3\"},0.14822345972061157]\n" + ] + } + ], + "source": [ + "const filter = [{ operator: \"=\", column: \"language\", value: \"en\" }]; // only return documents whose language metadata is \"en\"\n", + "\n", + "// kfts: number of full-text search results; kvector: number of vector similarity results\n", + "const hybridSearchResults = await vectorStore.azionHybridSearch(\"biology\", { kfts: 2, kvector: 1, filter });\n", + "\n", + "console.log(\"Hybrid Search Results\");\n", + "for (const doc of hybridSearchResults) {\n", + " console.log(`${JSON.stringify(doc)}`);\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "5efd2eaa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"Similarity Search Results\n", + "[{\"pageContent\":\"Australia s indigenous people have inhabited the continent for over 65,000 years\",\"metadata\":{\"searchtype\":\"similarity\"},\"id\":\"3\"},0.4486490488052368]\n" + ] + } + ], + "source": [ + "const similaritySearchResults = await vectorStore.azionSimilaritySearch(\"australia\", {kvector:3, filter:[{ operator: \"=\", column: \"topic\", value: \"history\" }]});\n", + "\n", + "console.log(\"Similarity Search Results\")\n", + "for (const doc of similaritySearchResults) {\n", + " console.log(`${JSON.stringify(doc)}`);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "0c235cdc", + "metadata": {}, + "source": [ + "### Query by turning into retriever\n", + "\n", + "You can also transform the vector store into a [retriever](/docs/concepts/#retrievers) for easier usage in your chains. " + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "f3460093", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " Document {\n", + " pageContent: 'Australia s indigenous people have inhabited the continent for over 65,000 years',\n", + " metadata: { searchtype: 'similarity' },\n", + " id: '3'\n", + " },\n", + " Document {\n", + " pageContent: 'Mitochondria are made out of lipids',\n", + " metadata: { searchtype: 'similarity' },\n", + " id: '18'\n", + " }\n", + "]\n" + ] + } + ], + "source": [ + "const retriever = vectorStore.asRetriever({\n", + " // Optional filter\n", + " filter: filter,\n", + " k: 2,\n", + "});\n", + "await retriever.invoke(\"biology\");" + ] + }, + { + "cell_type": "markdown", + "id": "e2e0a211", + "metadata": {}, + "source": [ + "### Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge).\n", + "- [How-to: 
Question and answer with RAG](/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](/docs/concepts/retrieval)" + ] + }, + { + "cell_type": "markdown", + "id": "8a27244f", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all AzionVectorStore features and configurations head to the [API reference](https://v03.api.js.langchain.com/classes/_langchain_community.vectorstores_azion_edgesql.AzionVectorStore.html)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "TypeScript", + "language": "typescript", + "name": "tslab" + }, + "language_info": { + "codemirror_mode": { + "mode": "typescript", + "name": "javascript", + "typescript": true + }, + "file_extension": ".ts", + "mimetype": "text/typescript", + "name": "typescript", + "version": "3.7.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/langchain-community/langchain.config.js b/libs/langchain-community/langchain.config.js index b0207b8612ab..378d15962a90 100644 --- a/libs/langchain-community/langchain.config.js +++ b/libs/langchain-community/langchain.config.js @@ -117,6 +117,7 @@ export const config = { // vectorstores "vectorstores/analyticdb": "vectorstores/analyticdb", "vectorstores/astradb": "vectorstores/astradb", + "vectorstores/azion_edgesql": "vectorstores/azion_edgesql", "vectorstores/azure_aisearch": "vectorstores/azure_aisearch", "vectorstores/azure_cosmosdb": "vectorstores/azure_cosmosdb", "vectorstores/cassandra": "vectorstores/cassandra", @@ -196,6 +197,7 @@ export const config = { // retrievers "retrievers/amazon_kendra": "retrievers/amazon_kendra", "retrievers/amazon_knowledge_base": "retrievers/amazon_knowledge_base", + "retrievers/azion_edgesql": "retrievers/azion_edgesql", "retrievers/bm25": "retrievers/bm25", "retrievers/chaindesk": "retrievers/chaindesk", "retrievers/databerry": "retrievers/databerry", @@ -377,6 +379,7 @@ export const config = { "llms/layerup_security", "vectorstores/analyticdb", 
"vectorstores/astradb", + "vectorstores/azion_edgesql", + "vectorstores/azure_aisearch", + "vectorstores/azure_cosmosdb", + "vectorstores/cassandra", @@ -434,6 +437,7 @@ export const config = { "chat_models/zhipuai", "retrievers/amazon_kendra", "retrievers/amazon_knowledge_base", + "retrievers/azion_edgesql", "retrievers/dria", "retrievers/metal", "retrievers/supabase", diff --git a/libs/langchain-community/package.json b/libs/langchain-community/package.json index a6870ab61aab..ca0e55934101 100644 --- a/libs/langchain-community/package.json +++ b/libs/langchain-community/package.json @@ -138,6 +138,7 @@ "@zilliz/milvus2-sdk-node": ">=2.3.5", "apify-client": "^2.7.1", "assemblyai": "^4.6.0", + "azion": "^1.11.1", "better-sqlite3": "9.5.0", "cassandra-driver": "^4.7.2", "cborg": "^4.1.1", @@ -285,6 +286,7 @@ "@zilliz/milvus2-sdk-node": ">=2.3.5", "apify-client": "^2.7.1", "assemblyai": "^4.6.0", + "azion": "^1.11.1", "better-sqlite3": ">=9.4.0 <12.0.0", "cassandra-driver": "^4.7.2", "cborg": "^4.1.1", @@ -533,6 +535,9 @@ "assemblyai": { "optional": true }, + "azion": { + "optional": true + }, "better-sqlite3": { "optional": true }, diff --git a/libs/langchain-community/src/retrievers/azion_edgesql.ts b/libs/langchain-community/src/retrievers/azion_edgesql.ts new file mode 100644 index 000000000000..de0211214a8d --- /dev/null +++ b/libs/langchain-community/src/retrievers/azion_edgesql.ts @@ -0,0 +1,510 @@ +import { QueryResult, useQuery } from 'azion/sql'; +import type { EmbeddingsInterface } from '@langchain/core/embeddings'; +import { Document } from '@langchain/core/documents'; +import { BaseRetriever, BaseRetrieverInput } from '@langchain/core/retrievers'; +import { ChatOpenAI } from '@langchain/openai'; +import { SystemMessage, HumanMessage } from '@langchain/core/messages'; + +export type AzionMetadata = Record<string, any>; + +/** + * Represents a filter condition for querying the Azion database + * @property operator - The comparison operator to use (e.g. 
=, !=, >, <, etc.) + * @property column - The database column to filter on + * @property value - The value to compare against + */ +export type AzionFilter = {operator: Operator, column: Column, value: string}; + +/** + * Represents a database column name + */ +export type Column = string; + +/** + * Valid SQL operators that can be used in filter conditions + */ +export type Operator = + | '=' | '!=' | '>' | '<>' | '<' // Basic comparison operators + | '>=' | '<=' // Range operators + | 'LIKE' | 'NOT LIKE' // Pattern matching + | 'IN' | 'NOT IN' // Set membership + | 'IS NULL' | 'IS NOT NULL'; // NULL checks + +/** + * Interface for the response returned when searching embeddings. + */ +interface SearchEmbeddingsResponse { + id: number; + content: string; + metadata: { + searchtype: string; + [key: string]: any; + }; +} + +/** + * Interface for the arguments used to initialize an AzionRetriever. + */ +export interface AzionRetrieverArgs extends BaseRetrieverInput { + /** + * Search type to perform. Cosine similarity ('similarity') and hybrid ('hybrid', vector + FTS) are currently supported. Default is 'similarity'. + */ + searchType?: 'hybrid' | 'similarity'; + + /** + * The number of documents retrieved with cosine similarity (vector) search. Minimum and default is 1. + */ + similarityK?: number; + + /** + * The number of documents retrieved with full-text search. Minimum and default is 1. + */ + ftsK?: number; + + /** + * The name of the database to search for documents. + */ + dbName?: string; + + /** + * The system prompt sent to the chat model to extract entities from the query for full-text search on the database. + */ + promptEntityExtractor?: string; + + /** + * Max items to maintain per search type. Default is 3. + */ + maxItemsSearch?: number; + + /** + * The metadata columns to include in the metadata of each returned document. + */ + metadataItems?: string[]; + + /** + * Name of the table to perform vector similarity search. Default is 'documents'. + */ + vectorTable?: string; + + /** + * Name of the table to perform full text search. 
Default is 'document_fts' + */ + ftsTable?: string; + + /** + * Filters to apply to the search. Default is an empty array. + */ + filters?: AzionFilter[]; + + /** Whether the metadata is stored in a single JSON `metadata` column (false) or in separate columns (true). Default is false. */ + expandedMetadata?: boolean; +} + +/** + * Class for performing search operations on Azion's Edge SQL database. + * It extends the 'BaseRetriever' class and supports vector similarity search + * and hybrid search, which combines similarity search with full-text search (FTS). + * Requires the optional `azion` peer dependency. + * + * Example usage: + * ```ts + * // Initialize embeddings and chat model + * const embeddings = new OpenAIEmbeddings(); + * const chatModel = new ChatOpenAI(); + * + * // Create retriever with hybrid search + * const retriever = new AzionRetriever(embeddings, chatModel, { + * searchType: 'hybrid', + * similarityK: 3, + * ftsK: 2, + * dbName: 'my_docs', + * metadataItems: ['category', 'author'], + * vectorTable: 'documents', + * ftsTable: 'documents_fts', + * filters: [ + * { operator: '=', column: 'status', value: 'published' } + * ] + * }); + * + * // Retrieve relevant documents + * const docs = await retriever.invoke( + * "What are coral reefs in Australia?" 
+ * ); + * + * // Create retriever with similarity search only + * const simRetriever = new AzionRetriever(embeddings, chatModel, { + * searchType: 'similarity', + * similarityK: 5, + * dbName: 'my_docs', + * vectorTable: 'documents' + * }); + * + * // Customize the entity extraction prompt (sent as a system message; the query is sent separately) + * const customRetriever = new AzionRetriever(embeddings, chatModel, { + * searchType: 'hybrid', + * similarityK: 3, + * ftsK: 2, + * dbName: 'my_docs', + * promptEntityExtractor: "Extract the key entities from the user's question and return them as a space-separated list of lowercase words." + * }); + * ``` + */ + +export class AzionRetriever extends BaseRetriever { + static lc_name() { + return 'AzionRetriever'; + } + + /** Namespace for the retriever in LangChain */ + lc_namespace = ['langchain', 'retrievers', 'azion']; + + /** Type of search to perform: either hybrid (combining vector + FTS) or similarity only */ + searchType: 'hybrid' | 'similarity'; + + /** Number of results to return from similarity search. Minimum is 1. */ + similarityK: number; + + /** Number of results to return from full text search. Minimum is 1. 
*/ + ftsK: number; + + /** Interface for generating embeddings from text */ + embeddings: EmbeddingsInterface; + + /** Name of the database to search */ + dbName: string; + + /** ChatOpenAI model used to extract entities from queries */ + entityExtractor: ChatOpenAI; + + /** System prompt used for entity extraction */ + promptEntityExtractor: string; + + /** Optional metadata columns to include in results */ + metadataItems?: string[]; + + /** Name of table containing vector embeddings for similarity search */ + vectorTable: string; + + /** Name of table containing documents for full text search */ + ftsTable: string; + + /** Array of filters to apply to search results */ + filters: AzionFilter[]; + + /** Whether the metadata is contained in a single column or multiple columns */ + expandedMetadata: boolean; + + constructor( + embeddings: EmbeddingsInterface, + entityExtractor: ChatOpenAI, + args: AzionRetrieverArgs + ) { + super(args); + + this.ftsTable = this.sanitizeItem(args.ftsTable) || "document_fts"; + this.vectorTable = this.sanitizeItem(args.vectorTable) || "documents"; + this.similarityK = Math.max(1, args.similarityK || 1); + this.ftsK = Math.max(1, args.ftsK || 1); + this.dbName = args.dbName || "azioncopilotprod"; + + this.embeddings = embeddings; + this.searchType = args.searchType || "similarity"; + + this.entityExtractor = entityExtractor; + this.metadataItems = args.metadataItems || undefined; + this.promptEntityExtractor = args.promptEntityExtractor || "Extract the key entities from the user query. Provide them as a space-separated string in lowercase, translated to English."; + this.filters = args.filters || []; + this.expandedMetadata = args.expandedMetadata || false; + } + + /** + * Generates the filter conditions for the WHERE clause of the SQL query. + * @param {AzionFilter[]} filters - The filters to apply to the search. + * @returns {string} The conditions joined with AND and followed by a trailing ' AND ', or an empty string if there are no filters. 
+ */ + protected generateFilters( + filters: AzionFilter[] + ): string { + if (!filters || filters.length === 0) return ''; + + // The trailing ' AND ' lets the conditions be prepended to the rest of the WHERE clause + return filters.map(({operator, column, value}) => { + const columnRef = this.expandedMetadata ? this.sanitizeItem(column) : `metadata->>'$.${this.sanitizeItem(column)}'`; + if (['IS NULL', 'IS NOT NULL'].includes(operator.toUpperCase())) return `${columnRef} ${operator}`; // NULL checks take no value + if (['IN', 'NOT IN'].includes(operator.toUpperCase())) { + return `${columnRef} ${operator} (${this.sanitizeItem(value)})`; + } + return `${columnRef} ${operator} '${this.sanitizeItem(value)}'`; + }).join(' AND ') + ' AND '; + } + + /** + * Generates SQL queries for full-text search and similarity search. + * @param {number[]} embeddedQuery - The embedded query vector. + * @param {string} queryEntities - The entities extracted from the query for full-text search. + * @param {string} metadata - Additional metadata columns to be included in the results. + * @returns An object containing the FTS query and similarity query strings. + */ + protected generateSqlQueries( + embeddedQuery: number[], + queryEntities: string, + metadata: string + ): { ftsQuery: string, similarityQuery: string } { + const filters = this.generateFilters(this.filters); + + let rowsNumber = this.similarityK; + if (this.searchType === "hybrid") { + rowsNumber += this.ftsK; + } + + const ftsQuery = ` + SELECT id, content, ${metadata.replace('hybrid', 'fts')} + FROM ${this.ftsTable} + WHERE ${filters} ${this.ftsTable} MATCH '${queryEntities}' + ORDER BY rank + LIMIT ${rowsNumber} + `; + + const similarityQuery = ` + SELECT id, content, ${metadata.replace('hybrid', 'similarity')} + FROM ${this.vectorTable} + WHERE ${filters} rowid IN vector_top_k('${this.vectorTable}_idx', vector('[${embeddedQuery}]'), ${rowsNumber}) + `; + + return { ftsQuery, similarityQuery }; + } + + /** + * Generates the SQL statements for the similarity search and full-text search. + * @param query The user query. + * @returns An array of SQL statements. 
+ */ + protected async generateStatements( + query: string + ): Promise<string[]> { + const embeddedQuery = await this.embeddings.embedQuery(query); + + const metadata = this.generateMetadata(); + + let queryEntities = ''; + if (this.searchType === 'hybrid') { + queryEntities = await this.extractEntities(query); + } + + const { ftsQuery, similarityQuery } = this.generateSqlQueries(embeddedQuery, queryEntities, metadata); + + if (this.searchType === "similarity") { + return [similarityQuery]; + } + + return [similarityQuery, ftsQuery]; + } + + /** + * Generates the SQL expression that builds each result's JSON `metadata` object, tagged with the search type ('hybrid' is later replaced with 'fts' or 'similarity' in generateSqlQueries). + * @returns {string} The metadata expression for the SELECT clause. + */ + protected generateMetadata(): string { + if (!this.metadataItems) { + return `json_object('searchtype', '${this.searchType}') as metadata`; + } + + if (this.expandedMetadata) { + return `json_object('searchtype','${this.searchType}',${this.metadataItems.map(item => `'${this.sanitizeItem(item)}', ${this.sanitizeItem(item)}`).join(', ')}) as metadata`; + } + + return `json_patch(json_object(${this.metadataItems.map(item => `'${this.sanitizeItem(item)}', metadata->>'$.${this.sanitizeItem(item)}'`).join(', ')}), '{"searchtype":"${this.searchType}"}') as metadata`; + } + + /** + * Performs a similarity search on the vector store and returns the top 'similarityK' similar documents. + * @param query The query string. + * @returns A promise that resolves with the similarity search results when the search is complete. 
+ */ + protected async similaritySearchWithScore( + query: string + ): Promise<[Document][]> { + + const statements = await this.generateStatements(query) + + const { data: response, error: errorQuery } = await useQuery(this.dbName,statements); + + if (!response) { + console.error('RESPONSE ERROR: ', errorQuery); + throw this.searchError(errorQuery) + } + const searches = this.mapRows(response.results) + const result = this.mapSearches(searches) + return result + } + + /** + * Extracts entities from a user query using the entityExtractor model. + * @param query The user query + * @returns A promise that resolves with the extracted entities, OR-joined into an FTS5 match expression. + */ + protected async extractEntities(query: string): Promise<string> { + const entityExtractionPrompt = new SystemMessage( + this.promptEntityExtractor + ); + const entityQuery = await this.entityExtractor.invoke([ + entityExtractionPrompt, + new HumanMessage(query), + ]); + // Split on whitespace runs so characters replaced by spaces do not produce empty terms such as 'a OR OR b' + return entityQuery.content.toString().replace(/[^a-zA-Z0-9\s]/g, ' ').split(/\s+/).filter(Boolean).join(' OR ') + } + + /** + * Performs a hybrid search on the vector store, using cosine similarity and FTS search, and + * returns the top 'similarityK' + 'ftsK' similar documents. + * @param query The user query + * @returns A promise that resolves with the hybrid search results when the search is complete.
+ */ + protected async hybridSearchAzion( + query: string + ): Promise<[Document][]> { + + const statements = await this.generateStatements(query) + + const { data: response, error: errorQuery } = await useQuery(this.dbName,statements) + + if (!response) { + console.error('RESPONSE ERROR: ', errorQuery); + throw this.searchError(errorQuery) + } + + const results = this.mapRows(response.results) + + const finalResults = this.removeDuplicates(results) + + return this.mapSearches(finalResults) + } + + /** + * Builds an Error from the provided error information and throws it. + * @param error The error object containing details about the issue + * @throws {Error} Always throws; the Error return type exists so callers can write `throw this.searchError(...)` + */ + protected searchError( + error: { + message: string; + operation: string;} | undefined + ): Error { + throw new Error(error?.message, { cause: error?.operation }) + } + + /** + * Performs the selected search and returns the documents retrieved. + * @param query The user query + * @returns A promise that resolves with the retrieved documents. + */ + async _getRelevantDocuments( + query: string + ): Promise<Document[]> { + let result: [Document][]; + + if (this.searchType === 'similarity') { + result = await this.similaritySearchWithScore(query); + } else { + result = await this.hybridSearchAzion(query); + } + + return result.map(([doc]) => doc); + + } + + /** + * Removes duplicate results from the search results, prioritizing a mix of similarity and FTS results. + * @param {SearchEmbeddingsResponse[]} results - The array of search results to process. + * @returns {SearchEmbeddingsResponse[]} An array of unique search results, with at most 'similarityK' similarity results and 'ftsK' FTS results.
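The merge keeps results in query order and admits each id once, subject to a per-source quota. A minimal, self-contained sketch of that quota-based deduplication, assuming rows are tagged 'similarity' or 'fts' as the generated SQL does; the Hit type and mergeWithQuotas are hypothetical names, not part of the retriever's API:

```typescript
// Hypothetical stand-alone version of the quota-based merge.
type Hit = { id: number; searchtype: "similarity" | "fts" };

function mergeWithQuotas(hits: Hit[], similarityK: number, ftsK: number): Hit[] {
  const seen = new Set<number>();
  const kept: Hit[] = [];
  let similarityCount = 0;
  let ftsCount = 0;

  for (const hit of hits) {
    if (seen.has(hit.id)) continue; // already kept from either source
    if (hit.searchtype === "similarity" && similarityCount < similarityK) {
      similarityCount++;
    } else if (hit.searchtype === "fts" && ftsCount < ftsK) {
      ftsCount++;
    } else {
      continue; // this source's quota is already full
    }
    seen.add(hit.id);
    kept.push(hit);
    // Stop as soon as both quotas are filled.
    if (similarityCount + ftsCount === similarityK + ftsK) break;
  }
  return kept;
}

const merged = mergeWithQuotas(
  [
    { id: 1, searchtype: "similarity" },
    { id: 2, searchtype: "similarity" },
    { id: 3, searchtype: "similarity" },
    { id: 1, searchtype: "fts" },
    { id: 4, searchtype: "fts" },
    { id: 5, searchtype: "fts" },
  ],
  2,
  2
);
console.log(merged.map((h) => h.id)); // [ 1, 2, 4, 5 ]
```

Assuming the database returns results in statement order, similarity rows come first, so a document found by both searches counts against the similarity quota. In hybrid mode each query fetches similarityK + ftsK rows, which lets later FTS rows fill slots freed by duplicates.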
+ */ + private removeDuplicates( + results: SearchEmbeddingsResponse[] + ): SearchEmbeddingsResponse[] { + const uniqueResults: SearchEmbeddingsResponse[] = []; + const seenIds = new Set(); + + let similarityCount = 0 + let ftsCount = 0 + const maxItems = this.ftsK + this.similarityK + + for (const result of results) { + if (!seenIds.has(result.id)) { + if (result.metadata.searchtype === 'similarity' && similarityCount < this.similarityK) { + seenIds.add(result.id) + uniqueResults.push(result) + similarityCount++ + } else if (result.metadata.searchtype === 'fts' && ftsCount < this.ftsK) { + seenIds.add(result.id) + uniqueResults.push(result) + ftsCount++ + } + } + if (similarityCount + ftsCount === maxItems) break + } + return uniqueResults; + } + +/** + * Converts query results to SearchEmbeddingsResponse objects. + * @param {QueryResult[]} results - The raw query results from the database. + * @returns {SearchEmbeddingsResponse[]} An array of SearchEmbeddingsResponse objects. + */ +private mapRows( + results: QueryResult[] | undefined +): SearchEmbeddingsResponse[] { + + if (!results) { + return [] + } + + return results.flatMap(( + queryResult: QueryResult + ): SearchEmbeddingsResponse[] => { + + if (!queryResult.rows || !queryResult.columns) { + return [] + } + + return queryResult.rows.map( + (row): SearchEmbeddingsResponse => ({ + id: Number(row[0]), + content: String(row[1]), + metadata: JSON.parse(String(row[2])) + }) + ); + } + ); +} + + /** + * Maps search results to Document objects. + * @param {SearchEmbeddingsResponse[]} searches An array of SearchEmbeddingsResponse objects. + * @returns An array of tuples, each containing a single Document object. 
+ */ + protected mapSearches( + searches: SearchEmbeddingsResponse[] + ): [Document][] { + return searches.map((resp: SearchEmbeddingsResponse) => [ + new Document({ + metadata: resp.metadata, + pageContent: resp.content, + id: resp.id.toString(), + }) + ]); + } + + /** + * Sanitizes an item by removing non-alphanumeric characters. + * @param {string} item The item to sanitize. + * @returns {string} The sanitized item. + */ + private sanitizeItem( + item: string | undefined + ): string { + if (item){ + return item.replace(/[^a-zA-Z0-9\s]/g, '') + } + return '' + } +} \ No newline at end of file diff --git a/libs/langchain-community/src/retrievers/tests/azion_edgesql.int.test.ts b/libs/langchain-community/src/retrievers/tests/azion_edgesql.int.test.ts new file mode 100644 index 000000000000..9ad5ab6f63a0 --- /dev/null +++ b/libs/langchain-community/src/retrievers/tests/azion_edgesql.int.test.ts @@ -0,0 +1,44 @@ +/* eslint-disable no-process-env */ +/* eslint-disable @typescript-eslint/no-non-null-assertion */ +import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai"; +import { AzionRetriever } from "@langchain/community/retrievers/azion_edgesql"; +import { jest, test, expect } from "@jest/globals"; + +// Increase timeout to 30 seconds +jest.setTimeout(30000); + +test("Azion search", async () => { + + const embeddings = new OpenAIEmbeddings(); + const entityExtractor = new ChatOpenAI({ + modelName: "gpt-4o-mini", + temperature: 0, + }); + const retrieverHybrid = new AzionRetriever(embeddings, entityExtractor, { + searchType: "hybrid", + similarityK: 2, + ftsK: 2, + dbName: 'langchain', + vectorTable:'documents', + ftsTable:'documents_fts' + }); + + expect(retrieverHybrid).toBeDefined(); + + const results1 = await retrieverHybrid._getRelevantDocuments("hello"); + + expect(results1.length).toBeGreaterThan(0); + + const retrieverSimilarity = new AzionRetriever(embeddings, entityExtractor, { + searchType: "similarity", + similarityK: 2, + ftsK: 2, + dbName: 
'langchain', + vectorTable:'documents', + ftsTable:'documents_fts' + }); + + const results2 = await retrieverSimilarity._getRelevantDocuments("hello"); + + expect(results2.length).toBeGreaterThan(0); +}); \ No newline at end of file diff --git a/libs/langchain-community/src/vectorstores/azion_edgesql.ts b/libs/langchain-community/src/vectorstores/azion_edgesql.ts new file mode 100644 index 000000000000..785504cd9896 --- /dev/null +++ b/libs/langchain-community/src/vectorstores/azion_edgesql.ts @@ -0,0 +1,924 @@ +import { VectorStore } from '@langchain/core/vectorstores'; +import { useQuery, useExecute, getDatabases, createDatabase, getTables, type AzionDatabaseResponse, QueryResult, AzionDatabaseQueryResponse } from 'azion/sql'; +import type { EmbeddingsInterface } from '@langchain/core/embeddings'; +import { Document } from '@langchain/core/documents'; + +/** + * Represents a filter condition for querying the Azion database + * @property operator - The comparison operator to use (e.g. =, !=, >, <, etc) + * @property column - The database column to filter on + * @property value - The value to compare against + */ +export type AzionFilter = {operator: Operator, column: Column, value: string}; + +/** + * Represents a database column name + */ +export type Column = string; + +/** + * Valid SQL operators that can be used in filter conditions + */ +export type Operator = + | '=' | '!=' | '>' | '<>' | '<' // Basic comparison operators + | '>=' | '<=' // Range operators + | 'LIKE' | 'NOT LIKE' // Pattern matching + | 'IN' | 'NOT IN' // Set membership + | 'IS NULL' | 'IS NOT NULL'; // NULL checks + + +/** + * Interface for configuring the Azion vector store setup + * @property {string[]} columns - Additional columns to create in the database table beyond the required ones + * @property {"vector" | "hybrid"} mode - The search mode to enable: + * "vector" - Only vector similarity search + * "hybrid" - Both vector and full-text search capabilities + */ +interface 
AzionSetupOptions { + columns: string[], + mode: "vector" | "hybrid" +} + +/** + * Interface representing the structure of a row in the vector store + * @property content - The text content of the document + * @property embedding - The vector embedding of the content as an array of numbers + * @property metadata - Additional metadata associated with the document as key-value pairs + */ +interface rowsInterface { + content: string; + embedding: number[]; + metadata: Record<string, any>; +} + +export type AzionMetadata = Record<string, any>; + +/** + * Interface for the response returned when searching embeddings. + */ +interface SearchEmbeddingsResponse { + id: number; + content: string; + similarity: number; + metadata: { + searchtype: string; + [key: string]: any; + }; +} + +/** + * Interface for configuring hybrid search options that combine vector and full-text search + * @property {number} kfts - Number of results to return from full-text search + * @property {number} kvector - Number of results to return from vector similarity search + * @property {AzionFilter[]} [filter] - Optional array of filters to apply to search results + * @property {string[]} [metadataItems] - Optional array of metadata fields to include in results + */ +interface hybridSearchOptions { + kfts: number, + kvector: number, + filter?: AzionFilter[], + metadataItems?: string[] +} + +/** + * Interface for configuring full-text search options + * @property {number} kfts - Number of results to return from full-text search + * @property {AzionFilter[]} [filter] - Optional array of filters to apply to search results + * @property {string[]} [metadataItems] - Optional array of metadata fields to include in results + */ +interface fullTextSearchOptions { + kfts: number, + filter?: AzionFilter[], + metadataItems?: string[] +} + +/** + * Interface for configuring vector similarity search options + * @property {number} kvector - Number of results to return from vector similarity search + * @property {AzionFilter[]} [filter]
- Optional array of filters to apply to search results + * @property {string[]} [metadataItems] - Optional array of metadata fields to include in results + */ +interface similaritySearchOptions { + kvector: number, + filter?: AzionFilter[], + metadataItems?: string[] +} + +/** + * Interface for the arguments required to configure an AzionVectorStore. + */ +export interface AzionVectorStoreArgs { + tableName: string; + filter?: AzionMetadata; + dbName: string; + expandedMetadata?: boolean; +} + +/** + * Example usage: + * ```ts + * // Initialize the vector store + * const vectorStore = new AzionVectorStore(embeddings, { + * dbName: "mydb", + * tableName: "documents" + * }); + * + * // Set up the database with hybrid search and metadata columns + * await vectorStore.setupDatabase({ + * columns: ["topic", "language"], + * mode: "hybrid" + * }); + * + * // OR: Create the store and set up the database in one step with the static initialize method + * const vectorStore = await AzionVectorStore.initialize(embeddings, { + * dbName: "mydb", + * tableName: "documents" + * }, { + * columns: ["topic", "language"], + * mode: "hybrid" + * }); + * + * // Add documents to the vector store + * await vectorStore.addDocuments([ + * new Document({ + * pageContent: "Australia is known for its unique wildlife", + * metadata: { topic: "nature", language: "en" } + * }) + * ]); + * + * // Perform similarity search + * const results = await vectorStore.azionSimilaritySearch( + * "coral reefs in Australia", + * { kvector: 2, filter: [{ operator: "=", column: "topic", value: "biology" }] } // Top 2 results, optional AzionFilter + * ); + * + * // Perform full text search + * const ftResults = await vectorStore.azionFullTextSearch( + * "Sydney Opera House", + * { kfts: 1, filter: [{ operator: "=", column: "language", value: "en" }] } // Top result, optional AzionFilter + * ); + * ``` + */ + +export class AzionVectorStore extends VectorStore { + /** Type declaration for filter type */ + declare FilterType: AzionMetadata + + /** Name of the
main table to store vectors and documents */ + tableName: string + + /** Name of the database to use */ + dbName: string + + /** Whether the metadata is contained in a single column or multiple columns */ + expandedMetadata: boolean + + _vectorstoreType(): string { + return 'azionEdgeSQL' + } + + constructor( + embeddings: EmbeddingsInterface, + args: AzionVectorStoreArgs + ) { + super(embeddings, args) + this.tableName = args.tableName + this.dbName = args.dbName + this.expandedMetadata = args.expandedMetadata ?? false + } + + /** + * Creates a new vector store instance and sets up the database. + * @param {EmbeddingsInterface} embeddings - The embeddings interface to use for vectorizing documents + * @param {AzionVectorStoreArgs} args - Configuration options: + * @param {string} args.dbName - Name of the database to create/use + * @param {string} args.tableName - Name of the table to create/use + * @param {AzionSetupOptions} setupOptions - Database setup options: + * @param {string[]} setupOptions.columns - Additional columns to create in the table beyond the required ones + * @param {"vector"|"hybrid"} setupOptions.mode - The search mode to enable: + * - "vector": Only vector similarity search capabilities + * - "hybrid": Both vector and full-text search capabilities + * @returns {Promise} A promise that resolves with the configured vector store instance + */ + static async initialize( + embeddings: EmbeddingsInterface, + args: AzionVectorStoreArgs, + setupOptions: AzionSetupOptions + ): Promise { + const instance = new AzionVectorStore(embeddings, args) + await instance.setupDatabase(setupOptions) + return instance + } + + /** + * Adds documents to the vector store. + * @param {Document[]} documents The documents to add. + * @param {Object} options Optional parameters for adding the documents. + * @returns A promise that resolves when the documents have been added. 
+ */ + async addDocuments( + documents: Document[], + options?: { ids?: string[] | number[] } + ) { + const texts = documents.map((doc) => doc.pageContent) + const embeddings = await this.embeddings.embedDocuments(texts) + return this.addVectors(embeddings, documents, options) + } + + /** + * Adds vectors to the vector store. + * @param {number[][]} vectors The vectors to add. + * @param {Document[]} documents The documents associated with the vectors. + * @param {Object} options Optional parameters for adding the vectors. The `ids` option is currently ignored; row IDs are assigned by the database (INTEGER PRIMARY KEY AUTOINCREMENT). + * @returns A promise that resolves once every insert chunk has been executed. + */ + async addVectors( + vectors: number[][], + documents: Document[], + options?: { ids?: string[] | number[] } + ) { + + const rows = await this.mapRowsFromDocuments(vectors, documents) + const insertStatements = this.createStatements(rows) + const chunks = this.createInsertChunks(insertStatements) + + await this.insertChunks(chunks) + } + + /** + * Gets the dimensions of the embeddings. + * @returns {Promise<number>} The dimensions of the embeddings. + */ + private async getEmbeddingsDimensions( + ): Promise<number> { + return (await this.embeddings.embedQuery("test")).length + } + + /** + * Maps the rows and metadata to the correct format. + * @param vectors The vectors to map. + * @param {Document[]} documents The documents to map. + * @returns {Promise<rowsInterface[]>} The mapped rows and metadata. + */ + private async mapRowsFromDocuments( + vectors: number[][], + documents: Document[] + ): Promise< rowsInterface[] > { + + return vectors.map((embedding, idx) => ({ + content: documents[idx].pageContent, + embedding, + metadata: documents[idx].metadata, + })) + } + + /** + * Sets up the database and tables. + * @param {AzionSetupOptions} setupOptions The setup options: + * - columns: string[] - The metadata columns to add to the table + * - mode: "vector" | "hybrid" - The mode to use for the table.
"vector" for vector search only, "hybrid" for vector and full-text search + * @returns {Promise<void>} A promise that resolves when the database and tables have been set up. + */ + async setupDatabase( + setupOptions:AzionSetupOptions + ): Promise<void>{ + const {columns, mode} = setupOptions + + await this.handleDatabase() + // Fixed 15-second pause before the tables are checked, presumably to give a newly created database time to become available + await new Promise(resolve => setTimeout(resolve, 15000)) + console.log("Database ready") + await this.handleTables(mode, columns) + } + + /** + * Handles the table creation and setup. + * @param {"vector" | "hybrid"} mode The search mode. + * @param {string[]} columns The columns to set up. + * @returns {Promise<void>} A promise that resolves when the tables have been created and set up. + */ + private async handleTables( + mode: "vector" | "hybrid", + columns: string[] + ): Promise<void>{ + + const {data : dataTables, error : errorTables} = await getTables(this.dbName) + + this.errorHandler(errorTables, "Error getting tables") + + const tables = dataTables?.results?.[0]?.rows?.map(row => row[1]) + + if (!this.areTablesSetup(tables, mode)){ + const { error : errorSetupDb} = await this.setupTables(mode, columns) + this.errorHandler(errorSetupDb, "Error setting up tables") + } + } + + /** + * Logs the error and throws if one is present. + * @param {Object} error The error object. + * @param {string} message The message to display. + * @returns {void} Returns normally only when no error is present. + */ + private errorHandler( + error:{ + message: string + operation: string} | undefined, + message: string + ): void { + if (error){ + console.log(message, error) + throw new Error(error?.message ?? message) + } + } + + /** + * Checks whether the required tables already exist. + * @param {(string | number)[] | undefined} tables The table names returned by getTables. + * @param {"vector" | "hybrid"} mode The search mode. + * @returns {boolean} Whether the tables are set up.
+ */ + private areTablesSetup( + tables: (string | number)[] | undefined, + mode: "vector" | "hybrid" + ): boolean { + + if (!tables){ + return false + } + + if (mode === "hybrid"){ + return tables?.includes(this.tableName) && tables?.includes(this.tableName + "_fts") + } + + return tables?.includes(this.tableName) + } + + /** + * Handles the database creation and setup. + * @returns {Promise<void>} A promise that resolves when the database has been created and set up. + */ + private async handleDatabase( + ): Promise<void>{ + const {data : dataGet, error : errorGet} = await getDatabases() + + this.errorHandler(errorGet, "Error getting databases") + + if (!dataGet?.databases?.find((db) => db.name === this.dbName)){ + console.log("Creating database: ",this.dbName) + const {error : errorCreate} = await createDatabase(this.dbName, {debug:true}) + + this.errorHandler(errorCreate, "Error creating database") + } + } + + /** + * Sets up the tables based on the specified mode and columns. + * @param {string} mode The mode to use - either "vector" for vector search only or "hybrid" for vector + full text search + * @param {string[]} columns Additional metadata columns to add to the tables + * @returns {Promise<AzionDatabaseResponse<string>>} A promise that resolves when the tables have been created and set up + */ + private async setupTables( + mode: "vector" | "hybrid", + columns: string[] + ): Promise<AzionDatabaseResponse<string>> { + + const createTableStatement = ` + CREATE TABLE ${this.tableName} ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + content TEXT NOT NULL, + embedding F32_BLOB(${await this.getEmbeddingsDimensions()}) + ${this.expandedMetadata ? + (columns.length > 0 ?
',' + columns.map(key => `${key} TEXT`).join(',') : '') : + ',metadata JSON' + } + );` + + const createIndexStatement = ` + CREATE INDEX ${this.tableName}_idx ON ${this.tableName} ( + libsql_vector_idx(embedding, 'metric=cosine', 'compress_neighbors=float8', 'max_neighbors=20') + )` + + const createFtsStatement = ` + CREATE VIRTUAL TABLE IF NOT EXISTS ${this.tableName}_fts USING fts5( + content, + id UNINDEXED + ${this.expandedMetadata ? + (columns.length > 0 ? ',' + columns.map(key => `${key}`).join(',') : '') : + ',metadata' + }, + tokenize = 'porter' + )` + + const createTriggersStatements = [ + `CREATE TRIGGER IF NOT EXISTS insert_into_${this.tableName}_fts + AFTER INSERT ON ${this.tableName} + BEGIN + INSERT INTO ${this.tableName}_fts(id, content ${this.expandedMetadata ? (columns.length > 0 ? ',' + columns.join(',') : '') : ',metadata'}) + VALUES(new.id, new.content ${this.expandedMetadata ? (columns.length > 0 ? ',' + columns.map(key => `new.${key}`).join(',') : '') : ',new.metadata'}); + END`, + + `CREATE TRIGGER IF NOT EXISTS update_${this.tableName}_fts + AFTER UPDATE ON ${this.tableName} + BEGIN + UPDATE ${this.tableName}_fts + SET content = new.content + ${this.expandedMetadata ? (columns.length > 0 ? ',' + columns.map(key => `${key} = new.${key}`).join(',') : '') : ',metadata = new.metadata'} + WHERE id = old.id; + END`, + + `CREATE TRIGGER IF NOT EXISTS delete_${this.tableName}_fts + AFTER DELETE ON ${this.tableName} + BEGIN + DELETE FROM ${this.tableName}_fts WHERE id = old.id; + END` + ] + + let allStatements = [ + createTableStatement, + createIndexStatement, + createFtsStatement, + ...createTriggersStatements + ] + + if (mode === "vector"){ + allStatements = allStatements.slice(0,2) + } + + const { error } = await useExecute(this.dbName, allStatements) + this.errorHandler(error, "Error setting up tables") + return {data: "Database setup successfully", error: undefined} + } + + /** + * Inserts the chunks into the database. 
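The chunks executed here come from createInsertChunks, which caps each batch at 1000 statements and roughly 0.8 MiB of UTF-8 text. A sketch of that greedy batching under the same two limits, with byte size measured via TextEncoder as getStringBytes does; chunkStatements is a hypothetical helper, and the small limits in the usage line are only for illustration:

```typescript
// Hypothetical greedy batcher: splits SQL statements into ordered chunks that
// respect both a maximum statement count and a maximum UTF-8 byte size.
function chunkStatements(statements: string[], maxCount: number, maxBytes: number): string[][] {
  const encoder = new TextEncoder();
  const chunks: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;

  for (const statement of statements) {
    const bytes = encoder.encode(statement).length;
    // Close the current chunk when adding this statement would break a limit.
    if (current.length > 0 && (current.length >= maxCount || currentBytes + bytes > maxBytes)) {
      chunks.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(statement);
    currentBytes += bytes;
  }
  // Flush the trailing chunk so the last statements are never dropped.
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const chunks = chunkStatements(["INSERT 1", "INSERT 2", "INSERT 3", "INSERT 4", "INSERT 5"], 2, 1024);
console.log(chunks.map((c) => c.length)); // [ 2, 2, 1 ]
```

A single statement larger than the byte limit still gets a chunk of its own, because an individual INSERT cannot be split.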
+ * @param {string[][]} chunks The chunks to insert. + * @returns {Promise<void>} A promise that resolves when the chunks have been inserted. + */ + private async insertChunks( + chunks: string[][] + ): Promise<void> { + console.log("Inserting chunks") + for (const [index, chunk] of chunks.entries()){ + console.log("Inserting chunk", index) + const { error } = await useExecute(this.dbName,chunk) + this.errorHandler(error, "Error inserting chunk") + } + console.log("Chunks inserted!") + } + + /** + * Extracts the metadata columns from the rows. + * @param {rowsInterface[]} rows The rows to extract the metadata columns from. + * @returns {string[]} The metadata columns. + */ + private extractMetadataColumns( + rows: rowsInterface[] + ): string[] { + const metadataColumns: string[] = [] + + for (const row of rows) { + if (row.metadata) { + Object.keys(row.metadata).forEach(key => { + if (!metadataColumns.includes(key)) { + metadataColumns.push(key) + } + }) + } + } + return metadataColumns + } + + /** + * Creates the insert statement for a row. + * @param {rowsInterface} row The row to create the insert statement for. + * @param {string[]} metadataColumns The metadata columns. + * @returns {string} The insert statement. + */ + private createInsertStatement( + row: rowsInterface, + metadataColumns: string[] + ): string { + + if (this.expandedMetadata) { + const columnNames = ['content', 'embedding', ...metadataColumns] + const values = [ + row.content, + row.embedding, + ...metadataColumns.map(col => row.metadata?.[col] ?? null) + ] + return this.createInsertString(columnNames, values) + } + + const columnNames = ['content', 'embedding', 'metadata'] + const values = [ + row.content, + row.embedding, + JSON.stringify(row.metadata) + ]; + + return this.createInsertString(columnNames, values) + } + + /** + * Creates the insert statements for the rows. + * @param {rowsInterface[]} rows The rows to create the insert statements for.
+ * @returns {string[]} The insert statements. + */ + private createStatements( + rows: rowsInterface[] + ): string[] { + const insertStatements: string[] = [] + const metadataColumns = this.extractMetadataColumns(rows) + + for (const row of rows) { + const statement = this.createInsertStatement(row, metadataColumns) + insertStatements.push(statement) + } + + return insertStatements + } + + /** + * Splits the statements into ordered chunks of at most 1000 statements and about 0.8 MiB each. + * @param {string[]} statements The statements to create the insert chunks for. + * @returns {string[][]} The insert chunks, preserving statement order. + */ + private createInsertChunks( + statements: string[] + ): string[][] { + const maxChunkLength = 1000 // maximum number of statements per chunk + const maxChunkBytes = 0.8 * 1024 * 1024 // maximum UTF-8 size of a chunk + const totalSize = this.getStringBytes(statements.join(' ')) + + if (totalSize < maxChunkBytes && statements.length < maxChunkLength) { + return [statements] + } + + console.log("Total size exceeded max size. Initiating chunking...") + const insertChunks: string[][] = [] + let chunk: string[] = [] + let chunkBytes = 0 + for (const statement of statements){ + // One extra byte accounts for the separator between joined statements + const statementBytes = this.getStringBytes(statement) + 1 + // Close the current chunk when adding this statement would exceed either limit + if (chunk.length > 0 && (chunkBytes + statementBytes > maxChunkBytes || chunk.length >= maxChunkLength)){ + insertChunks.push(chunk) + chunk = [] + chunkBytes = 0 + } + chunk.push(statement) + chunkBytes += statementBytes + } + // Flush the last chunk so trailing statements are never dropped + if (chunk.length > 0){ + insertChunks.push(chunk) + } + + return insertChunks + } + + /** + * Gets the number of bytes in a string. + * @param {string} str The string to get the number of bytes for. + * @returns {number} The number of bytes in the string. + */ + private getStringBytes( + str: string + ): number { + return new TextEncoder().encode(str).length; + } + +/** + * Performs a similarity search on the vector store and returns the top 'k' similar documents.
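The query scores each row as `1 - vector_distance_cos(embedding, query)` and passes the query vector as `vector('[...]')` built by interpolating the number array. Assuming libSQL's cosine distance is one minus cosine similarity (which is what the `1 -` conversion implies), both pieces can be sketched in plain TypeScript; cosineSimilarity, cosineDistance, and toVectorLiteral are illustrative helpers, not part of the libSQL or Azion API:

```typescript
// Illustrative helper: cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance as assumed for vector_distance_cos: 1 - similarity.
const cosineDistance = (a: number[], b: number[]): number => 1 - cosineSimilarity(a, b);

// Template interpolation of a number[] joins the items with commas.
const toVectorLiteral = (v: number[]): string => `vector('[${v}]')`;

const query = [1, 0];
console.log(cosineSimilarity(query, [1, 0])); // 1
console.log(cosineSimilarity(query, [0, 1])); // 0
console.log(1 - cosineDistance(query, [1, 1])); // ≈ 0.7071
console.log(toVectorLiteral([0.1, 0.2])); // vector('[0.1,0.2]')
```

Under that assumption, higher similarity values mean closer matches, the opposite of the FTS5 rank ordering used by full-text search.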
+ * @param {number[]} vector The vector to search for. + * @param {number} k The number of documents to return. + * @param {AzionFilter[]} filter Optional filters to apply to the search. + * @param {string[]} metadataItems Optional metadata items to include in the search. + * @returns {Promise<[Document, number][]>} A promise that resolves with the similarity search results when the search is complete. + */ + async similaritySearchVectorWithScore( + vector: number[], + k: number, + filter?: AzionFilter[], + metadataItems?: string[] + ): Promise<[Document, number][]> { + + const metadata = this.generateMetadata(metadataItems, 'similarity') + + const filters = this.generateFilters(filter) + + const similarityQuery = ` + SELECT + id, content, ${metadata}, 1 - vector_distance_cos(embedding, vector('[${vector}]')) as similarity + FROM ${this.tableName} + WHERE ${filters} rowid IN vector_top_k('${this.tableName}_idx', vector('[${vector}]'), ${k})` + + const { data, error } = await useQuery(this.dbName, [similarityQuery]) + + if (!data) { + this.errorHandler(error, "Error performing similarity search") + throw this.searchError(error) + } + + const searches = this.mapRows(data.results) + const results = this.mapSearches(searches) + return results + } + + /** + * Performs a full-text search on the vector store and returns the top 'k' similar documents. + * @param query The query string to search for + * @param options The options for the full-text search, including: + * - kfts: The number of full-text search results to return + * - filter: Optional filters to apply to narrow down the search results + * - metadataItems: Optional metadata fields to include in the results + * @returns A promise that resolves with the full-text search results when the search is complete. 
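The raw query is turned into an FTS5 MATCH expression by stripping non-alphanumeric characters and OR-joining the remaining words. A sketch of that transformation, assuming whitespace runs are collapsed first (splitting on a single space leaves empty terms such as `a OR  OR b`, which FTS5 is likely to reject as a syntax error); toFtsOrQuery is a hypothetical name:

```typescript
// Hypothetical helper: keep letters, digits and whitespace, then OR-join the terms.
function toFtsOrQuery(query: string): string {
  return query
    .replace(/[^a-zA-Z0-9\s]/g, "") // drop punctuation, quotes and FTS5 syntax characters
    .split(/\s+/) // split on runs of whitespace, not a single space
    .filter((term) => term.length > 0)
    .join(" OR ");
}

console.log(toFtsOrQuery("Sydney Opera House")); // Sydney OR Opera OR House
console.log(toFtsOrQuery("  coral   reefs, Australia!")); // coral OR reefs OR Australia
```

Because only ASCII letters and digits survive, accented words lose characters (for example "São" becomes "So"), and a query made entirely of punctuation yields an empty expression.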
+ */ + async azionFullTextSearch( + query: string, + options: fullTextSearchOptions + ): Promise<[Document, number][]> { + const {kfts, filter, metadataItems} = options + const metadata = this.generateMetadata(metadataItems, 'fulltextsearch') + + const filters = this.generateFilters(filter) + + // Keep alphanumeric terms only and OR-join them; splitting on whitespace runs avoids empty terms + const matchExpression = query.replace(/[^a-zA-Z0-9\s]/g, '').split(/\s+/).filter(Boolean).join(' OR ') + + // FTS5 rank is bm25-based: lower (more negative) values are better matches + const fullTextQuery = ` + SELECT id, content, ${metadata}, rank as bm25_similarity + FROM ${this.tableName}_fts + WHERE ${filters} ${this.tableName}_fts MATCH '${matchExpression}' + ORDER BY rank + LIMIT ${kfts}` + + const { data, error } = await useQuery(this.dbName, [fullTextQuery]) + + if (!data) { + this.errorHandler(error, "Error performing full-text search") + throw this.searchError(error) + } + + const searches = this.mapRows(data?.results) + const results = this.mapSearches(searches) + return results + } + + /** + * Performs a hybrid search on the vector store and returns up to 'kfts' + 'kvector' unique documents. + * @param query The query string to search for + * @param options The options for the hybrid search, including: + * - kfts: The number of full-text search results to return + * - kvector: The number of vector search results to return + * - filter: Optional filters to apply to narrow down the search results + * - metadataItems: Optional metadata fields to include in the results + * @returns A promise that resolves with the hybrid search results when the search is complete.
+ */ + async azionHybridSearch( + query: string, + hybridSearchOptions: hybridSearchOptions + ): Promise<[Document, number][]> { + const {kfts, kvector, filter, metadataItems} = hybridSearchOptions + + const vector = await this.embeddings.embedQuery(query) + const ftsResults = await this.azionFullTextSearch(query, {kfts, filter, metadataItems}) + + const vectorResults = await this.similaritySearchVectorWithScore(vector, kvector, filter, metadataItems) + + return this.removeDuplicates([...ftsResults, ...vectorResults], kfts, kvector) + } + + /** + * Performs a similarity search on the vector store and returns the top 'kvector' similar documents. + * @param query The query string. + * @param options The options for the similarity search, including: + * - kvector: The number of vector search results to return + * - filter: Optional filters to apply to the search + * - metadataItems: Optional metadata fields to include in results + * @returns A promise that resolves with the similarity search results when the search is complete. + */ + async azionSimilaritySearch( + query: string, + options: similaritySearchOptions + ): Promise<[Document, number][]>{ + const {kvector, filter, metadataItems} = options + const vector = await this.embeddings.embedQuery(query) + return this.similaritySearchVectorWithScore(vector, kvector, filter, metadataItems) + } + +/** + * Builds an Error from the provided error information and throws it. + * @param {Object} error The error object containing details about the issue + * @throws {Error} Always throws; the Error return type exists so callers can write `throw this.searchError(...)` + */ + private searchError( + error: { + message: string; + operation: string;} | undefined + ): Error { + throw new Error(error?.message, { cause: error?.operation }) + } + + /** + * Deletes documents from the vector store. + * @param {string[]} ids The IDs of the documents to delete.
+ * @returns {Promise<void>} A promise that resolves when the documents have been deleted. + */ + async delete( + ids: string[] + ): Promise<void> { + // Row ids are INTEGER primary keys; validate them so arbitrary strings never reach the SQL text + const numericIds = ids.map((id) => Number(id)) + if (numericIds.some((id) => !Number.isInteger(id))) { + throw new Error(`Invalid document id in: ${ids.join(', ')}`) + } + const deleteStatement = `DELETE FROM ${this.tableName} WHERE id IN (${numericIds.join(',')})` + const { error } = await useExecute(this.dbName, [deleteStatement]) + if (error) { + this.errorHandler(error, `Error deleting document from ${this.tableName}`) + } else { + console.log(`Deleted ${ids.length} items from ${this.tableName}`) + } + } + + /** + * Removes duplicate results from the search results, prioritizing a mix of similarity and FTS results. + * @param {[Document, number][]} results - The array of search results to process, containing document and score pairs + * @param {number} kfts - Maximum number of full-text search results to include + * @param {number} kvector - Maximum number of vector similarity search results to include + * @returns {[Document, number][]} An array of unique search results, limited by kfts and kvector parameters + */ + private removeDuplicates( + results: [Document, number][], + kfts: number, + kvector: number + ): [Document, number][] { + const uniqueResults: [Document, number][] = []; + const seenIds = new Set(); + + let similarityCount = 0 + let ftsCount = 0 + const maxItems = kfts + kvector + + for (const result of results) { + if (!seenIds.has(result[0].id)) { + if (result[0].metadata?.searchtype === 'similarity' && similarityCount < kvector) { + seenIds.add(result[0].id) + uniqueResults.push(result) + similarityCount++ + } else if (result[0].metadata?.searchtype === 'fulltextsearch' && ftsCount < kfts) { + seenIds.add(result[0].id) + uniqueResults.push(result) + ftsCount++ + } + } + if (similarityCount + ftsCount === maxItems) break + } + return uniqueResults; + } + +/** + * Converts query results to SearchEmbeddingsResponse objects. + * @param {QueryResult[]} results - The raw query results from the database. + * @returns {SearchEmbeddingsResponse[]} An array of SearchEmbeddingsResponse objects.
+   */
+  private mapRows(
+    results: QueryResult[] | undefined
+  ): SearchEmbeddingsResponse[] {
+
+    if (!results) {
+      return []
+    }
+
+    return results.flatMap((
+      queryResult: QueryResult
+    ): SearchEmbeddingsResponse[] => {
+
+      if (!queryResult.rows || !queryResult.columns) {
+        return []
+      }
+
+      return queryResult.rows.map(
+        (row): SearchEmbeddingsResponse => ({
+          id: Number(row[0]),
+          content: String(row[1]),
+          metadata: JSON.parse(String(row[2])),
+          similarity: Number(row[3])
+        })
+      );
+    }
+    );
+  }
+
+  /**
+   * Maps search results to Document objects.
+   * @param {SearchEmbeddingsResponse[]} searches An array of SearchEmbeddingsResponse objects.
+   * @returns {[Document, number][]} An array of tuples, each pairing a Document with its similarity score.
+   */
+  private mapSearches(
+    searches: SearchEmbeddingsResponse[]
+  ): [Document, number][] {
+    return searches.map((resp: SearchEmbeddingsResponse) => [
+      new Document({
+        metadata: resp.metadata,
+        pageContent: resp.content,
+        id: resp.id.toString(),
+      }),
+      resp.similarity
+    ]);
+  }
+
+  /**
+   * Builds the SQL expression that selects the requested metadata items, plus the search type, as a JSON `metadata` column.
+   * @param {string[]} metadataItems - The metadata items to include in the query.
+   * @param {string} searchType - The search type (e.g. `similarity` or `fulltextsearch`) stored in the `searchtype` field.
+   * @returns {string} The metadata string.
+   */
+  private generateMetadata(
+    metadataItems: string[] | undefined,
+    searchType: string
+  ): string {
+    if (!metadataItems) {
+      return `json_object('searchtype', '${searchType}') as metadata`
+    }
+
+    if (this.expandedMetadata) {
+      return `json_object('searchtype','${searchType}',${metadataItems.map(item => `'${this.sanitizeItem(item)}', ${this.sanitizeItem(item)}`).join(', ')}) as metadata`
+    }
+
+    return `json_patch(json_object(${metadataItems?.map(item => `'${this.sanitizeItem(item)}', metadata->>'$.${this.sanitizeItem(item)}'`).join(', ')}), '{"searchtype":"${searchType}"}') as metadata`
+  }
+
+  /**
+   * Generates the filters string for the SQL query.
+   * @param {AzionFilter[]} filters The filters to apply to the query.
+   * @returns {string} The filter conditions joined with `AND` and followed by a trailing `AND `, or an empty string when there are no filters.
+   */
+  private generateFilters(
+    filters: AzionFilter[] | undefined
+  ): string {
+    if (!filters || filters?.length === 0) {
+      return '';
+    }
+
+    return filters.map(({operator, column, value}) => {
+      const columnRef = this.expandedMetadata ? this.sanitizeItem(column) : `metadata->>'$.${this.sanitizeItem(column)}'`;
+      if (['IN', 'NOT IN'].includes(operator.toUpperCase())) {
+        return `${columnRef} ${operator} (${this.sanitizeItem(value)})`;
+      }
+      return `${columnRef} ${operator} '${this.sanitizeItem(value)}'`;
+    }).join(' AND ') + ' AND ';
+  }
+
+  /**
+   * Builds the SQL INSERT statement for a single row.
+   * @param {string[]} columnNames The column names.
+   * @param {any[]} values The values, in the same order as `columnNames`.
+   * @returns {string} The INSERT statement.
+   */
+  private createInsertString(
+    columnNames: string[],
+    values: any[]
+  ): string {
+
+    if (this.expandedMetadata) {
+      const string = `INSERT INTO ${this.tableName} (${columnNames.join(', ')})
+        VALUES (${values.map((value, index) => columnNames[index] === 'embedding' ?
+          `vector('[${value}]')` : `'${this.escapeQuotes(value)}'`).join(', ')})`
+
+      return string
+    }
+
+    const string = `INSERT INTO ${this.tableName} (${columnNames.join(', ')})
+      VALUES (${values.map((value, index) => {
+        if (columnNames[index] === 'embedding') {
+          return `vector('[${value}]')`
+        } else if (columnNames[index] === 'metadata') {
+          return `'${value}'`
+        } else {
+          return `'${this.escapeQuotes(value)}'`
+        }
+      }).join(', ')})`
+    return string
+  }
+
+  /**
+   * Neutralizes quotes in the value by replacing every single or double quote with a space.
+   * @param {string} value The value to process.
+   * @returns {string} The value with all quotes replaced by spaces.
+   */
+  private escapeQuotes(
+    value: string
+  ): string {
+    return value.replace(/'/g, " ").replace(/"/g, ' ')
+  }
+
+  /**
+   * Sanitizes an item by removing every character that is not a letter, a digit, or whitespace.
+   * @param {string} item The item to sanitize.
+   * @returns {string} The sanitized item.
+   */
+  private sanitizeItem(
+    item: string | undefined
+  ): string {
+    if (item) {
+      return item.replace(/[^a-zA-Z0-9\s]/g, '')
+    }
+    return ''
+  }
+}
\ No newline at end of file
diff --git a/libs/langchain-community/src/vectorstores/tests/azion_edgesql.int.test.ts b/libs/langchain-community/src/vectorstores/tests/azion_edgesql.int.test.ts
new file mode 100644
index 000000000000..f25a56557fea
--- /dev/null
+++ b/libs/langchain-community/src/vectorstores/tests/azion_edgesql.int.test.ts
@@ -0,0 +1,156 @@
+/* eslint-disable no-process-env */
+/* eslint-disable @typescript-eslint/no-non-null-assertion */
+import { OpenAIEmbeddings } from "@langchain/openai";
+import { AzionVectorStore } from "@langchain/community/vectorstores/azion_edgesql";
+import { Document } from "@langchain/core/documents";
+import { jest, test, expect, describe, beforeAll } from "@jest/globals";
+
+// Increase timeout for database operations
+jest.setTimeout(60000);
+
+describe("AzionVectorStore", () => {
+  let vectorStore: AzionVectorStore;
+  const dbName = "langchain";
+  const tableName = "documents";
+
+  const testDocs = [
+    new Document({
+      pageContent: "Aspirin is good for headaches",
+      metadata: { category: "medicine", type: "pain relief" }
+    }),
+    new Document({
+      pageContent: "Ibuprofen reduces inflammation and pain",
+      metadata: { category: "medicine", type: "pain relief" }
+    }),
+    new Document({
+      pageContent: "Regular exercise helps prevent headaches",
+      metadata: { category: "lifestyle", type: "prevention" }
+    })
+  ];
+
+  beforeAll(async () => {
+    const embeddings = new OpenAIEmbeddings();
+
+    // Test static factory method
+    vectorStore = await AzionVectorStore.initialize(
+      embeddings,
+      {
+        dbName,
+        tableName
+      },
+      {
+        columns: ["category", "type"],
+        mode: "hybrid"
+      }
+    );
+
+    // Add test documents
+    await vectorStore.addDocuments(testDocs);
+  });
+
+  test("should create vector store instance", () => {
+    expect(vectorStore).toBeDefined();
+    expect(vectorStore._vectorstoreType()).toBe("azionEdgeSQL");
+  });
+
+  test("should perform similarity search", async () => {
+    const results = await vectorStore.azionSimilaritySearch(
+      "what helps with headaches?",
+      {
+        kvector: 2,
+        filter: [{ operator: "=", column: "category", value: "medicine" }],
+        metadataItems: ["category", "type"]
+      }
+    );
+
+    expect(results).toBeDefined();
+    expect(results.length).toBeLessThanOrEqual(2);
+    expect(results[0][0].metadata.category).toBe("medicine");
+  });
+
+  test("should perform full text search", async () => {
+    const results = await vectorStore.azionFullTextSearch(
+      "exercise headaches",
+      {
+        kfts: 1,
+        filter: [{ operator: "=", column: "category", value: "lifestyle" }],
+        metadataItems: ["category", "type"]
+      }
+    );
+
+    expect(results).toBeDefined();
+    expect(results.length).toBeLessThanOrEqual(1);
+    expect(results[0][0].metadata.category).toBe("lifestyle");
+  });
+
+  test("should perform hybrid search", async () => {
+    const results = await vectorStore.azionHybridSearch(
+      "pain relief medicine",
+      {
+        kfts: 2,
+        kvector: 2,
+        filter: [{ operator: "=", column: "type", value: "pain relief" }],
+        metadataItems: ["category", "type"]
+      }
+    );
+
+    expect(results).toBeDefined();
+    expect(results.length).toBeLessThanOrEqual(4);
+    expect(results[0][0].metadata.type).toBe("pain relief");
+  });
+
+  test("should handle filters correctly", async () => {
+    const results = await vectorStore.azionSimilaritySearch(
+      "medicine",
+      {
+        kvector: 2,
+        filter: [
+          { operator: "=", column: "category", value: "medicine" },
+          { operator: "=", column: "type", value: "pain relief" }
+        ],
+        metadataItems: ["category", "type"]
+      }
+    );
+
+    expect(results).toBeDefined();
+    expect(results.length).toBeGreaterThan(0);
+    results.forEach(([doc]) => {
+      expect(doc.metadata.category).toBe("medicine");
+      expect(doc.metadata.type).toBe("pain relief");
+    });
+  });
+
+  test("should handle empty search results", async () => {
+    const results = await vectorStore.azionSimilaritySearch(
+      "nonexistent content",
+      {
+        kvector: 2,
+        filter: [{ operator: "=", column: "category", value: "nonexistent" }]
+      }
+    );
+
+    expect(results).toBeDefined();
+    expect(results.length).toBe(0);
+  });
+
+  test("should add new documents", async () => {
+    const newDoc = new Document({
+      pageContent: "Meditation can help with stress headaches",
+      metadata: { category: "lifestyle", type: "stress relief" }
+    });
+
+    await vectorStore.addDocuments([newDoc]);
+
+    const results = await vectorStore.azionFullTextSearch(
+      "meditation stress",
+      {
+        kfts: 1,
+        filter: [{ operator: "=", column: "type", value: "stress relief" }]
+      }
+    );
+
+    expect(results).toBeDefined();
+    expect(results.length).toBe(1);
+    expect(results[0][0].pageContent).toContain("Meditation");
+  });
+});
\ No newline at end of file
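
As a reading aid for the hybrid merge in `removeDuplicates`, here is a minimal standalone sketch of the same selection loop. It is not part of this PR: the `Hit` type and the `mergeHybridResults` name are hypothetical, and a flat `searchtype` field stands in for `Document.metadata.searchtype` so the snippet runs without `@langchain/core`.

```typescript
// Hypothetical, simplified stand-in for the [Document, number] tuples used by removeDuplicates.
type SearchType = "similarity" | "fulltextsearch";

interface Hit {
  id: string;
  searchtype: SearchType;
  score: number;
}

// Keeps the first occurrence of each id, accepting at most `kvector` similarity hits
// and at most `kfts` full-text hits, and stops scanning once both quotas are filled.
function mergeHybridResults(results: Hit[], kfts: number, kvector: number): Hit[] {
  const unique: Hit[] = [];
  const seenIds = new Set<string>();
  let similarityCount = 0;
  let ftsCount = 0;

  for (const hit of results) {
    if (!seenIds.has(hit.id)) {
      if (hit.searchtype === "similarity" && similarityCount < kvector) {
        seenIds.add(hit.id);
        unique.push(hit);
        similarityCount++;
      } else if (hit.searchtype === "fulltextsearch" && ftsCount < kfts) {
        seenIds.add(hit.id);
        unique.push(hit);
        ftsCount++;
      }
    }
    // Same early exit as the original: stop once kfts + kvector items were kept.
    if (similarityCount + ftsCount === kfts + kvector) break;
  }
  return unique;
}

// Similarity rows first, then full-text rows; id "1" is found by both searches.
const hits: Hit[] = [
  { id: "1", searchtype: "similarity", score: 0.91 },
  { id: "2", searchtype: "similarity", score: 0.85 },
  { id: "3", searchtype: "similarity", score: 0.8 },
  { id: "1", searchtype: "fulltextsearch", score: 0.7 },
  { id: "4", searchtype: "fulltextsearch", score: 0.6 },
];

// Keeps ids 1 and 2 from similarity and id 4 from full-text search.
console.log(mergeHybridResults(hits, 1, 2).map((h) => h.id));
```

Because the first occurrence wins and a skipped duplicate does not consume a quota, the order in which similarity and full-text rows are concatenated before the merge decides which search type is credited with a document that both searches return.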