From 8d9b76885d1371906f6b82a54caad3fb3a7fd2da Mon Sep 17 00:00:00 2001 From: InAnYan Date: Wed, 14 Aug 2024 14:52:37 +0300 Subject: [PATCH 1/4] Add explanation of embeddings --- en/ai/README.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/en/ai/README.md b/en/ai/README.md index 5e7f572e7..9336b51d8 100644 --- a/en/ai/README.md +++ b/en/ai/README.md @@ -27,7 +27,16 @@ In this window you can see the following elements: ## How does the AI functionality work? -In the background, JabRef analyses the linked PDF files of library entries. The information used after the indexing is then supplied to the AI, which, to be precise, in our case is a Large Language Model (LLM). The LLM is currently not stored on your computer. Instead, we have many integrations with AI providers (OpenAI, Mistral AI, Hugging Face), so you can choose the one you like the most. These AI providers are available only remotely via the internet. In short: we send chunks of text to AI service and then receive processed responses. In order to use it you need to configure JabRef to use your API key. +In the background, JabRef analyzes the linked PDF files of library entries. The information used after the indexing is then supplied to the AI, +which, to be precise, in our case is a Large Language Model (LLM). The LLM is currently not stored on your computer. Instead, we have many +integrations with AI providers (OpenAI, Mistral AI, Hugging Face), so you can choose the one you like the most. These AI providers are available +only remotely via the internet. In short: we send chunks of text to AI service and then receive processed responses. In order to use it you need +to configure JabRef to use your API key. + +JabRef processes linked files this way: the file is split into parts of fixed-length (also called *chunks*), and then an *embedding* is generated. Embedding is a +representation of a part of text. It's a vector that represents the meaning of the text. 
This vector has a crucial property: texts with +similar meaning have vectors that are close to (this is called *vector similarity*)! So, whenever you ask AI a question, JabRef tries to find relevant pieces +of text from the indexed files using vector similarity. ## More information From f0ce409698f2130ded246413068176a6932d3faf Mon Sep 17 00:00:00 2001 From: InAnYan Date: Wed, 14 Aug 2024 20:22:07 +0300 Subject: [PATCH 2/4] Fix from code review --- en/ai/README.md | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/en/ai/README.md b/en/ai/README.md index 9336b51d8..238043a99 100644 --- a/en/ai/README.md +++ b/en/ai/README.md @@ -27,16 +27,17 @@ In this window you can see the following elements: ## How does the AI functionality work? -In the background, JabRef analyzes the linked PDF files of library entries. The information used after the indexing is then supplied to the AI, -which, to be precise, in our case is a Large Language Model (LLM). The LLM is currently not stored on your computer. Instead, we have many -integrations with AI providers (OpenAI, Mistral AI, Hugging Face), so you can choose the one you like the most. These AI providers are available -only remotely via the internet. In short: we send chunks of text to AI service and then receive processed responses. In order to use it you need -to configure JabRef to use your API key. - -JabRef processes linked files this way: the file is split into parts of fixed-length (also called *chunks*), and then an *embedding* is generated. Embedding is a -representation of a part of text. It's a vector that represents the meaning of the text. This vector has a crucial property: texts with -similar meaning have vectors that are close to (this is called *vector similarity*)! So, whenever you ask AI a question, JabRef tries to find relevant pieces -of text from the indexed files using vector similarity. +In the background, JabRef analyzes the linked PDF files of library entries. 
The information used after the indexing is then supplied to the AI, which, to be precise, in our case is a Large Language Model (LLM). The LLM is currently not stored on your computer. +Instead, we have many integrations with AI providers (OpenAI, Mistral AI, Hugging Face), so you can choose the one you like the most. +These AI providers are availableonly remotely via the internet. +In short: we send chunks of text to AI service and then receive processed responses. +In order to use it you need to configure JabRef to use your API key. + +JabRef processes linked files this way: the file is split into parts of fixed-length (also called *chunks*), and then an *embedding* is generated. +Embedding is a representation of a part of text. +It is a vector that represents the meaning of the text. +This vector has a crucial property: texts with similar meaning have vectors that are close to (this is called *vector similarity*). +So, whenever you ask AI a question, JabRef tries to find relevant pieces of text from the indexed files using vector similarity. ## More information From e8b0c71ba6d90d048f445e8d0575cd6ebdad95de Mon Sep 17 00:00:00 2001 From: Oliver Kopp Date: Fri, 16 Aug 2024 00:35:31 +0200 Subject: [PATCH 3/4] Rework text --- en/ai/README.md | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/en/ai/README.md b/en/ai/README.md index 238043a99..4a26b7c76 100644 --- a/en/ai/README.md +++ b/en/ai/README.md @@ -27,17 +27,15 @@ In this window you can see the following elements: ## How does the AI functionality work? -In the background, JabRef analyzes the linked PDF files of library entries. The information used after the indexing is then supplied to the AI, which, to be precise, in our case is a Large Language Model (LLM). The LLM is currently not stored on your computer. -Instead, we have many integrations with AI providers (OpenAI, Mistral AI, Hugging Face), so you can choose the one you like the most. 
-These AI providers are availableonly remotely via the internet.
-In short: we send chunks of text to AI service and then receive processed responses.
-In order to use it you need to configure JabRef to use your API key.
-
-JabRef processes linked files this way: the file is split into parts of fixed-length (also called *chunks*), and then an *embedding* is generated.
-Embedding is a representation of a part of text.
-It is a vector that represents the meaning of the text.
-This vector has a crucial property: texts with similar meaning have vectors that are close to (this is called *vector similarity*).
-So, whenever you ask AI a question, JabRef tries to find relevant pieces of text from the indexed files using vector similarity.
+JabRef uses external AI providers to do the actual work.
+You can choose between OpenAI, Mistral AI, and Hugging Face.
+They all run "Large Language Models" (LLMs) to process the quests.
+The AI providers need chunks of text to work.
+For this, JabRef parses and indexes linked PDF files of entries:
+The file is split into parts of fixed length (so-called *chunks*) and for each of them, an *embedding* is generated.
+An embedding is a representation of a part of the text: a vector that captures its meaning.
+Each vector has a crucial property: texts with similar meaning have vectors that are close to each other (so-called *vector similarity*).
+As a result, whenever you ask the AI a question, JabRef tries to find relevant pieces of text from the indexed files using vector similarity.
## More information

From 3052a5796225c42429313307aea4c5a4b9131057 Mon Sep 17 00:00:00 2001
From: InAnYan
Date: Fri, 16 Aug 2024 17:52:24 +0300
Subject: [PATCH 4/4] Fix typo

---
 en/ai/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/en/ai/README.md b/en/ai/README.md
index 4a26b7c76..c984828f0 100644
--- a/en/ai/README.md
+++ b/en/ai/README.md
@@ -29,7 +29,7 @@ In this window you can see the following elements:
 JabRef uses external AI providers to do the actual work.
 You can choose between OpenAI, Mistral AI, and Hugging Face.
-They all run "Large Language Models" (LLMs) to process the quests.
+They all run "Large Language Models" (LLMs) to process the requests.
 The AI providers need chunks of text to work.
 For this, JabRef parses and indexes linked PDF files of entries:
 The file is split into parts of fixed length (so-called *chunks*) and for each of them, an *embedding* is generated.
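
The retrieval flow described in these patches (split into fixed-length chunks, embed each chunk, find the chunks most similar to the question) can be sketched as follows. This is a toy illustration only, not JabRef's actual implementation: the function names, the chunk size, and the bag-of-words "embedding" are all simplifications, and real systems use a neural embedding model from the configured AI provider.

```python
import math
from collections import Counter

def chunk(text, size=200):
    """Split text into parts of fixed length (so-called chunks)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy 'embedding': a bag-of-words vector (word -> count).

    A real embedding model maps text to a dense vector whose
    geometry reflects meaning; a word-count vector only captures
    surface overlap, but the similarity machinery is the same.
    """
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Vector similarity: cosine of the angle between two vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_relevant_chunks(question, chunks, top_k=2):
    """Return the chunks whose vectors are closest to the question's."""
    q = embed(question)
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(q, embed(c)),
                    reverse=True)
    return ranked[:top_k]
```

With a real embedding model, "kitten" and "cat" would land near each other even without shared words; with this word-count stand-in, only literal overlap scores, which is why the sketch is illustrative rather than representative of quality.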