
TON-Metaspace/DocsHUB

 
 


DocsHUB

Repository to store and share vector stores / embedding for LLM models

DocsHUB is an open-source solution for storing vector stores for LLMs. Think of it as a package manager, but for vector stores.

Say goodbye to time-consuming scraping and embedding, and let DocsHUB speed up your project by providing you with the latest embeddings.

Todo list

  • GitHub workflows to prepare a JSON document with indexes ✅
  • Website with search and index
  • Plugin with DocsGPT
  • Plugins for other projects
  • API for search

Project structure

vectors - where all vector stores live

ingestors - scripts to prepare and ingest data into vector stores

How to use it:

Navigate to the folder you need, vectors/<language>/<library_name>/<version>/, and download docs.index and faiss_store.pkl.

You can also use this index to find the items you need (updated on every push):

https://d3dg1063dc54p9.cloudfront.net/combined.json
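
The folder layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name `store_file_paths` is hypothetical, and the pandas values are borrowed from the metadata example in this README rather than from a verified folder listing.

```python
from pathlib import PurePosixPath

# File names taken from the usage instructions in this README.
STORE_FILES = ("docs.index", "faiss_store.pkl")


def store_file_paths(language: str, library_name: str, version: str) -> list[str]:
    """Return repository-relative paths of the files to download.

    Hypothetical helper: it only joins path segments following the
    vectors/<language>/<library_name>/<version>/ convention.
    """
    folder = PurePosixPath("vectors") / language / library_name / version
    return [str(folder / name) for name in STORE_FILES]


if __name__ == "__main__":
    # Illustrative values; check the repository for what actually exists.
    for path in store_file_paths("python", "pandas", "1.5.3"):
        print(path)
```

The function returns relative paths only, so it makes no assumption about branch names or download hosts.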

How to contribute:

Anyone can create a pull request. It should contain three files:

  1. docs.index
  2. faiss_store.pkl
  3. metadata.json

Ensure the path is correct:

  vectors/<language>/<library_name>/<version>/

If the entry documents a language itself (for example, Python), use:

vectors/python/.project/version/
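
Before opening a pull request, a quick local check along these lines may help. It is a minimal sketch, not part of the repository's tooling: the function names `has_expected_layout` and `missing_files` are hypothetical, and the layout check only counts path segments.

```python
import tempfile
from pathlib import Path

# The three files a contribution should contain, per this README.
REQUIRED_FILES = ("docs.index", "faiss_store.pkl", "metadata.json")


def has_expected_layout(relative_path: str) -> bool:
    """True if the path has the shape vectors/<language>/<library_name>/<version>/."""
    parts = [part for part in relative_path.split("/") if part]
    return len(parts) == 4 and parts[0] == "vectors"


def missing_files(folder: Path) -> list[str]:
    """Return the required file names that are absent from the folder."""
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    print(has_expected_layout("vectors/python/.project/version/"))
    # Demonstrate on a throwaway folder holding only one of the files.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "docs.index").write_bytes(b"")
        print(missing_files(folder))
```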

Place metadata.json in the corresponding path. It is a JSON document with these fields:

  • name
  • language
  • version
  • description (one or two sentences)
  • fullName (full project name, not a slug)
  • date (to know when it was last updated)
  • docLink (link to the documentation that was used for it)

Example of metadata.json

{
  "name": "pandas",
  "language": "python",
  "version": "1.5.3",
  "description": "Pandas is a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.",
  "fullName": "Pandas",
  "date": "07/02/2023",
  "docLink": "https://pandas.pydata.org/docs/"
}
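
A metadata file can be checked against the field list above with a short script. This is a sketch under assumptions: the helper name `missing_metadata_fields` is hypothetical, and it only checks that each field is present, not that its value is well formed.

```python
import json

# Field names copied from the metadata description in this README.
REQUIRED_FIELDS = (
    "name", "language", "version", "description",
    "fullName", "date", "docLink",
)


def missing_metadata_fields(text: str) -> list[str]:
    """Parse metadata.json content and return the required fields it lacks."""
    data = json.loads(text)
    return [field for field in REQUIRED_FIELDS if field not in data]


if __name__ == "__main__":
    # Deliberately incomplete sample to show what gets reported.
    sample = '{"name": "pandas", "language": "python", "version": "1.5.3"}'
    print(missing_metadata_fields(sample))
```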

Built with 🦜️🔗 LangChain
