The RAG Framework is a system for scalable document ingestion and retrieval. It uses Ray for distributed computing, so large document collections can be processed in parallel across multiple CPU and GPU nodes, and Qdrant disk-based indexing to support collections on the scale of billions of vectors.
The RAG Framework uses Ray for distributed computing, so documents are ingested in parallel across multiple CPU and GPU nodes. Ingestion throughput can therefore scale with the resources available.
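The fan-out pattern behind parallel ingestion can be sketched as follows. This is a minimal illustration, not the framework's code: it uses a standard-library thread pool as a stand-in so it runs without a cluster, and `ingest_document` and `ingest_all` are hypothetical names. With Ray, the same shape is usually expressed by decorating the per-document function with `@ray.remote` and collecting results with `ray.get`.

```python
from concurrent.futures import ThreadPoolExecutor


def ingest_document(text: str) -> dict:
    """Hypothetical per-document step; here it only computes simple statistics."""
    return {"chars": len(text), "words": len(text.split())}


def ingest_all(documents: list[str], max_workers: int = 4) -> list[dict]:
    # Documents are independent of each other, so the work maps cleanly
    # across workers. pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ingest_document, documents))


results = ingest_all(["alpha beta", "gamma", "delta epsilon zeta"])
```

Because each document is processed independently, adding workers (or nodes, in the Ray case) increases throughput without changing the per-document logic.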
To support billions of vectors, the framework integrates Qdrant with disk-based indexing. Keeping the index on disk rather than in memory lets collections grow beyond available RAM while still supporting fast, accurate retrieval of relevant information.
The RAG Framework exposes REST APIs for ingesting assets from common sources such as S3 and GitHub. The same APIs handle retrieval, so the framework can be integrated into existing workflows.
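As an illustration of what a client call to an ingestion API might look like, the sketch below builds a JSON POST request without sending it. The endpoint path `/ingest`, the payload fields `source` and `uri`, the base URL, and the bucket name are all assumptions made for this example; check the project's API documentation for the actual routes and schema.

```python
import json
from urllib.request import Request


def build_ingest_request(base_url: str, source: str, uri: str) -> Request:
    """Build (but do not send) a JSON POST request for an ingestion call."""
    payload = {"source": source, "uri": uri}  # hypothetical field names
    return Request(
        url=f"{base_url.rstrip('/')}/ingest",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local address and example bucket; adjust to your deployment.
req = build_ingest_request("http://localhost:8000", "s3", "s3://my-bucket/docs/")
```

Once the service is running, a request like this could be sent with `urllib.request.urlopen(req)`.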
The REST APIs are served with Ray Serve, which scales them across multiple GPU and CPU nodes. This lets the framework adapt to changing load from your application while keeping performance consistent.
The RAG Framework is highly configurable, allowing users to tailor the system to their specific needs. Key configuration options include the number of CPUs/GPUs to use, the choice of embedding model, chunk size, reranker model, and more.
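To show what the chunk size setting controls, here is a minimal character-based chunker. It is a sketch under stated assumptions: the framework's own chunking strategy is not described here and may split on tokens or sentences instead, and the `overlap` parameter is a hypothetical addition for illustration.

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Smaller chunks give more precise matches but produce more vectors to embed and index; larger chunks carry more context per vector.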
Follow these steps to get started with the RAG Framework:
- Clone the repository
- Configure your settings: edit the configuration file (.env) to fit your requirements. A sample is provided in .env.example.
- Run with Docker:
docker compose up
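Before starting the containers, it can help to sanity-check the values in .env. The sketch below parses simple `KEY=VALUE` lines; the keys shown (`NUM_GPUS`, `CHUNK_SIZE`, `EMBEDDING_MODEL`) and their values are hypothetical, so consult .env.example for the names the framework actually reads.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical contents of a .env file, for illustration only.
example = """
# hardware and model settings
NUM_GPUS=1
CHUNK_SIZE=512
EMBEDDING_MODEL="example/embedding-model"
"""

settings = parse_env(example)
chunk_size = int(settings["CHUNK_SIZE"])  # numeric values arrive as strings
```

In practice the text would come from reading the file, for example `parse_env(open(".env").read())`.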