
Releases: llm-tools/embedJs

v0.1.4

09 Oct 17:30
dea7e1a

0.1.4 (2024-10-09)

🚀 Features

🩹 Fixes

  • fixed debug message string for createLoaderFromMimeType (abf5901)

  • remove changelog generation from github release (87abd2b)

  • remove changelog generation from github release (4aa3f18)

  • capitalization on contributing.md (0381453)

  • Adhityan K V

v0.1.3

06 Oct 10:16
5d3b526

0.1.3 (2024-10-06)

🚀 Features

  • readded local-path and url loaders (303133c)

🩹 Fixes

  • commit pinecone example package lock file (3d1051e)
  • exclude examples from release process (1382185)
  • downgrade esbuild version to match nx requirements (183308f)

❤️ Thank You

v0.1.2

06 Oct 00:30
c4d4781

0.1.2 (2024-10-06)

🚀 Features

  • readded local-path and url loaders (303133c)

🩹 Fixes

  • exclude examples from release process (1382185)

  • downgrade esbuild version to match nx requirements (183308f)

❤️ Thank You

0.1.1 (2024-10-04)

04 Oct 08:45
afb0d7f

Temporarily disabled the dynamic, URL and local path loaders, as they required installing all modules from the monorepo. Access to the Simple_Models enum was also temporarily removed. Both will be re-enabled soon.

v0.1.0

04 Oct 07:45
ddcea24

0.1.0 (2024-10-03)

This component has been extracted and is now published as part of a workspace monorepo managed by NX. Many reasons prompted this move, but the most critical was to remove the need to install every dependency for a single use case. As we added (and continue to add) more loaders, databases, caches and models, the number of shared dependencies grew considerably. Most projects will not use all these combinations, and it made no sense to install them all for everyone. Further, vulnerabilities in dependent packages affected all projects, which is clearly something we did not intend.

Now what? Starting with version 0.1.0, we have switched to a monorepo-based approach. All packages will share the same version number, but changelogs and dependencies will be independent. You only need to install the addons (loaders, models, databases, etc.) relevant to your use case. Given the shortage of maintainers, we will support the non-monorepo version of the library only with critical bug fixes for the next three months, after which the older version will not receive any security fixes. We strongly recommend upgrading to the newer version as soon as you can.

  • Adhityan K V

Version 0.0.82

01 Jun 19:08
5dbde53

A number of important features and bug fixes make it into this release. Here's a rundown of the top new features -

Loader inference

The library can now infer the type of the loader automatically. You can pass a string, and it will use the MIME type (detected using magic numbers) and the file extension, if available, to decide which loader to invoke. For example -

.addLoader('https://tesla-info.com/sitemap.xml') // will use sitemap loader
.addLoader('https://en.wikipedia.org/wiki/Tesla,_Inc.') // will use the web loader
.addLoader('s4pVFLUlx8g') // will detect this is a youtube video id and use the video loader
.addLoader('https://lamport.azurewebsites.net/pubs/paxos-simple.pdf') // will use the pdf loader

.addLoader('local/paxos-simple.pdf') // will also use the pdf loader
.addLoader('local/data.csv') // will also use the CSV loader

You can also pass it a local directory name and it will recursively load all files within it using the most appropriate loader. Note: It will skip files it does not have a loader for.
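To make the idea of inference concrete, here is a minimal sketch of how a string could be mapped to a loader kind. It is not the library's implementation: it looks only at the shape of the string and the file extension, and skips the magic-number MIME detection mentioned above. The names LoaderKind, EXTENSION_TO_LOADER and inferLoaderKind are hypothetical and exist only for this example.

```typescript
// Hypothetical loader kinds for this sketch; not embedJs identifiers.
type LoaderKind = 'sitemap' | 'web' | 'youtube' | 'pdf' | 'csv' | 'unknown';

// Assumed extension table, limited to the file types shown in the examples above.
const EXTENSION_TO_LOADER = new Map<string, LoaderKind>([
    ['pdf', 'pdf'],
    ['csv', 'csv'],
]);

// YouTube video ids are 11 characters drawn from letters, digits, '_' and '-'.
const YOUTUBE_ID = /^[A-Za-z0-9_-]{11}$/;

function inferLoaderKind(input: string): LoaderKind {
    const isUrl = /^https?:\/\//i.test(input);

    // Ignore any query string or fragment, then take the last path segment.
    const path = input.split(/[?#]/)[0];
    const lastSegment = path.split('/').pop() ?? '';

    // A URL ending in sitemap.xml is treated as a sitemap.
    if (isUrl && lastSegment.toLowerCase() === 'sitemap.xml') return 'sitemap';

    // A known file extension wins for both URLs and local paths.
    const dot = lastSegment.lastIndexOf('.');
    const extension = dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : '';
    const byExtension = EXTENSION_TO_LOADER.get(extension);
    if (byExtension) return byExtension;

    // Any other URL falls back to the generic web loader.
    if (isUrl) return 'web';

    // A bare 11-character token is assumed to be a YouTube video id.
    if (YOUTUBE_ID.test(input)) return 'youtube';

    return 'unknown';
}
```

With this sketch, the six strings from the example above resolve to the sitemap, web, youtube, pdf, pdf and csv kinds respectively. A real implementation would also need to inspect file contents, since an extension can be missing or misleading.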

Alternatively, you can now add loaders by passing in an object with the correct parameters without invoking the loader constructor directly. That is -

//Before
.addLoader(new WebLoader({ urlOrContent: 'https://www.biography.com/business-leaders/steve-jobs' }))

//Now
.addLoader({ type: 'Web', urlOrContent: 'https://www.biography.com/business-leaders/steve-jobs' })

This reads more simply and is consistent across all loaders.

List of added loaders

The library now keeps the list of previously added loaders in its cache, so you can get the list of all loaded content even between restarts. This is useful if you want to keep the state of the RAG application within the library itself.

You can get the list of loaders by calling -

await ragApplication.getLoaders()

The list of added loaders will include all loaders, even those that were implicitly invoked by another loader. To understand this better, let's look at the LocalPathLoader. This loader uses the file system API to scan files and directories. Once it infers the file type, it internally calls other loaders to add and process PPT, CSV, HTML, etc. files. When this happens, the getLoaders() method will give you the list of all loaders including LocalPathLoader, CsvLoader, WebLoader, etc., with metadata around what each loader worked on.

Note: All the data around this is recorded in the cache attached. Therefore this functionality only works when you have a cache set.
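The dependence on a cache can be illustrated with a small sketch. Everything here is an assumption made for the example: LoaderRecord, LoaderCache, InMemoryCache and LoaderRegistry are hypothetical names, and the real library's cache interface and storage layout may differ. The point is only that the list survives as long as the cache does, and that nothing is remembered when no cache is set.

```typescript
// Hypothetical record describing one loader that was run.
interface LoaderRecord {
    uniqueId: string;
    type: string;
    metadata: Record<string, string>;
}

// Hypothetical minimal cache contract (string key to string value).
interface LoaderCache {
    get(key: string): string | undefined;
    set(key: string, value: string): void;
}

// Stand-in for a persistent cache such as Redis; kept in memory for the sketch.
class InMemoryCache implements LoaderCache {
    private readonly store = new Map<string, string>();

    get(key: string): string | undefined {
        return this.store.get(key);
    }

    set(key: string, value: string): void {
        this.store.set(key, value);
    }
}

const LOADER_LIST_KEY = 'loaders';

class LoaderRegistry {
    private readonly cache?: LoaderCache;

    constructor(cache?: LoaderCache) {
        this.cache = cache;
    }

    // Called for every loader, including ones invoked internally by another loader.
    record(entry: LoaderRecord): void {
        if (!this.cache) return; // without a cache there is nowhere to keep the list
        const entries = this.list().filter((e) => e.uniqueId !== entry.uniqueId);
        entries.push(entry);
        this.cache.set(LOADER_LIST_KEY, JSON.stringify(entries));
    }

    // Returns everything recorded so far, or an empty list when no cache is set.
    list(): LoaderRecord[] {
        const raw = this.cache?.get(LOADER_LIST_KEY);
        return raw ? (JSON.parse(raw) as LoaderRecord[]) : [];
    }
}
```

In this sketch, a second LoaderRegistry created over the same cache stands in for a restarted application: it sees the entries recorded by the first one, including a child loader recorded on behalf of a parent.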

CSV Loader

Now you can add CSV files from both local paths and web URLs using the CSV loader. To add a CSV file (or URL) to your embeddings, use CsvLoader. The library will parse the CSV and add each row to its vector database.

.addLoader(new CsvLoader({ filePathOrUrl: '...' }))

Note: You can control how the CsvLoader parses the file in great detail by passing in the optional csvParseOptions constructor parameter.
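As a rough picture of what "each row" means, the sketch below turns CSV text into one text chunk per data row, pairing each value with its column header. This is an illustration under stated assumptions, not the parser the library uses: parseCsvLine and csvToRowChunks are hypothetical helpers, the chunk format is invented for the example, and quoted fields that span multiple lines are not handled.

```typescript
// Splits one CSV line into fields, honouring double-quoted fields and "" escapes.
function parseCsvLine(line: string, delimiter = ','): string[] {
    const fields: string[] = [];
    let current = '';
    let inQuotes = false;

    for (let i = 0; i < line.length; i++) {
        const ch = line[i];
        if (inQuotes) {
            if (ch === '"' && line[i + 1] === '"') {
                current += '"'; // escaped quote inside a quoted field
                i++;
            } else if (ch === '"') {
                inQuotes = false; // closing quote
            } else {
                current += ch;
            }
        } else if (ch === '"') {
            inQuotes = true; // opening quote
        } else if (ch === delimiter) {
            fields.push(current);
            current = '';
        } else {
            current += ch;
        }
    }

    fields.push(current);
    return fields;
}

// Produces one "header: value" chunk per data row; the first line is the header.
function csvToRowChunks(csv: string, delimiter = ','): string[] {
    const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== '');
    if (lines.length < 2) return [];

    const headers = parseCsvLine(lines[0], delimiter);
    return lines.slice(1).map((line) => {
        const values = parseCsvLine(line, delimiter);
        return headers.map((header, i) => `${header}: ${values[i] ?? ''}`).join(', ');
    });
}
```

Each returned string would then be embedded and stored as a separate entry. Options such as a different delimiter are the kind of detail that csvParseOptions is described as controlling.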

Github workflow

The library now uses GitHub Actions to verify that each PR compiles and builds on Node versions 18, 20 and 22. This runs automatically on every PR.

Version 0.0.77 (Stable)

18 May 20:34

The library has now reached stable. It now supports several LLMs (including HuggingFace, Ollama, OpenAI's GPT 4o and Anthropic among others), many Embedding Models and even more Vector Databases (most popular databases are now supported).

Several critical features were added in this time -

  1. It can detect that a document has already been processed and avoid duplicating it in the vector database.
  2. You can automatically filter the embeddings based on a relevance cut-off. This is possible even with databases that do not return a similarity score.
  3. It has extensive debug logging, including the ability to view the raw LLM response.
  4. It returns the sources used in an LLM query.
  5. You can use a variety of caches, including Redis.
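The relevance cut-off in item 2 can be sketched as follows. The release notes do not say how the score is obtained when the database returns none, so this example assumes one plausible approach: computing cosine similarity locally between the query embedding and each chunk embedding. Chunk, cosineSimilarity and filterByRelevance are hypothetical names for this example.

```typescript
// Hypothetical chunk shape; score is present only if the database supplied one.
interface Chunk {
    text: string;
    embedding: number[];
    score?: number;
}

// Cosine similarity of two equal-length vectors; 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) throw new Error('Vectors must have the same length');
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keeps chunks whose score meets the cut-off, most relevant first.
// A database-supplied score is used as-is; otherwise one is computed locally.
function filterByRelevance(query: number[], chunks: Chunk[], cutoff: number): Chunk[] {
    return chunks
        .map((chunk) => ({
            ...chunk,
            score: chunk.score ?? cosineSimilarity(query, chunk.embedding),
        }))
        .filter((chunk) => chunk.score >= cutoff)
        .sort((x, y) => y.score - x.score);
}
```

The fallback keeps the cut-off usable across databases, at the cost of needing the stored embeddings back from the query. It also assumes that database-supplied scores and locally computed ones are on a comparable scale, which may not hold for every database.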

First alpha release

01 Jul 20:22
Pre-release

The package is open sourced under Apache v2 license as of this version.