Let's work on a RAG module! #85

Open
fdaudens opened this issue Dec 11, 2024 · 7 comments
Comments

@fdaudens

It could be really cool to work on a module focused on RAG tools. I'm especially thinking about use-cases in journalism, but I'm sure there could be plenty of different pathways.

It could be something like this:

  1. What is RAG (types of docs, etc.) + top use cases in media
  2. HF assistants for a single doc or web domain
  3. Simple RAG pipeline with varied docs
  4. Agentic RAG, for example with this cookbook
  5. Multimodal RAG, [for example with this cookbook](https://huggingface.co/learn/cookbook/en/multimodal_rag_using_document_retrieval_and_reranker_and_vlms)
  6. Video RAG
  7. Evaluating RAG (with one of the Argilla cookbooks)
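
To make item 3 concrete, here is a minimal sketch of what a "simple RAG pipeline" boils down to: retrieve the most relevant passages for a query, then put them into the prompt. Everything here is an assumption for illustration only: the function names (`retrieve`, `build_prompt`), the toy documents, and the bag-of-words cosine scoring, which stands in for a real embedding model. The generation step is left out, so the prompt would still need to be sent to a model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return up to k documents with non-zero similarity, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), i) for i, d in enumerate(docs)]
    # Sort by descending score; ties keep the original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [docs[i] for score, i in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Assemble a prompt that grounds the question in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical toy corpus, standing in for a newsroom's documents.
docs = [
    "The city council approved the 2025 budget on Tuesday.",
    "The mayor announced a new bike lane program.",
    "Local bakery wins regional pastry award.",
]

query = "What did the council approve in the budget?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
```

In a real module the scoring function would likely be swapped for an embedding model and a vector index, but the retrieve-then-prompt shape stays the same.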
@burtenshaw
Collaborator

Nice idea. I would suggest adapting to this structure:

  • 8_rag
    • README: What is RAG?
    • page: HF assistants for a single doc or web domain
    • page: Simple RAG pipeline with varied docs
    • page: Multimodal RAG + Video RAG
    • page: Evaluating RAG
    • notebooks/
      • notebooks on RAG
    • app/
      • example of a gradio RAG app

I would then add a 'pathway' to the main readme, which combines this with other modules for journalists.

I'll share this issue in the community channels and see if anyone else is interested.

@duydl
Collaborator

duydl commented Dec 11, 2024

Great idea! I think this could work very well as a wrap-up or capstone project since it ties together many elements from earlier modules. To keep things consistent, maybe we could structure the tasks in previous modules to naturally build up toward a RAG pipeline, like:

  • Fine-tuning modules: Give examples of datasets and interactions for agentic tasks, like running code, performing calculations, or creating interactive workflows. This ensures that the SMOL models support some RAG concepts.

  • Evaluation module: Introduce methods that naturally lead into RAG evaluation, such as retrieval-based metrics or embedding-based similarity measures. Could also prevent the RAG module from becoming overcrowded.

  • Multimodal and vision modules:

    • Show more on how to integrate visual media into chat usage or combine them with purely text-based models.
    • Shift the focus toward diverse use cases of multimodal models over fine-tuning, as I think the fine-tuning principles are similar to what’s covered in earlier modules.

Additionally, it might help to include some examples of RAG usage, such as small datasets with clear target tasks and queries, and exercises where students need to set up a RAG pipeline and a model to achieve specific outcomes.
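
As a rough illustration of the "retrieval-based metrics or embedding-based similarity measures" mentioned for the evaluation module, here is a small sketch of recall@k, reciprocal rank, and cosine similarity. The function names and the document IDs are hypothetical, and this is plain Python rather than the API of ragas or any other evaluation library.

```python
import math


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical ranking returned by a retriever, and the gold relevant set.
retrieved = ["d3", "d1", "d7"]
relevant = ["d1", "d2"]

r_at_2 = recall_at_k(retrieved, relevant, k=2)
rr = reciprocal_rank(retrieved, relevant)
```

Averaging `reciprocal_rank` over a set of queries gives mean reciprocal rank (MRR). `cosine_similarity` could be used to compare an embedded model answer against an embedded reference answer, assuming some embedding model supplies the vectors.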

@burtenshaw
Collaborator

Thanks for the great input @duydl! Let's work on this.

> I think this could work very well as a wrap-up or capstone project since it ties together many elements from earlier modules.

I agree that RAG would make a great capstone project. However, I'm reluctant to force students in any one direction. So I think that the capstone project should be open.

Also, we have another discussion going on here about a capstone using a student leaderboard. The evaluation module also already has a project in it, which could be converted to a capstone. I'll create another issue about a capstone, and we can discuss all these topics there.

> Evaluation module: Introduce methods that naturally lead into RAG evaluation, such as retrieval-based metrics or embedding-based similarity measures.

It makes sense to add these elements to module_4.

> Could also prevent the RAG module from becoming overcrowded.

There's basically a spectrum between overcrowded and stand-alone. IMO, students should be able to follow any one module and take something away. So whoever implements this can make the best call on their content.

> Show more on how to integrate visual media into chat usage or combine them with purely text-based models.

This is a great suggestion which we could take up here.

@burtenshaw
Collaborator

@duydl I created an issue on the capstone here.

On the RAG module, I think we should do it in two steps: 1. implement a minimal RAG module, 2. update the other modules to be more compatible with RAG.

Let me know if you're up for collaborating on the capstone or RAG module. 😄

@duydl
Collaborator

duydl commented Dec 15, 2024

@burtenshaw Should we agree on some dependencies (or none, preferring from scratch with only HuggingFace projects) for this module? There are a few equivalent options, so should we just default to the most "popular" ones, like langchain, and ragas for evaluation?

@burtenshaw
Collaborator

> @burtenshaw Should we agree on some dependencies (or none, preferring from scratch with only HuggingFace projects) for this module? There are a few equivalent options, so should we just default to the most "popular" ones, like langchain, and ragas for evaluation?

Yeah. This sounds sensible. I would say start with as few as possible, but use obvious popular frameworks like langchain where appropriate.

There are a number of examples in the cookbook that we can start from.

@duydl
Collaborator

duydl commented Jan 2, 2025

There is a great new library to include in this course, particularly in the RAG module:
https://github.com/huggingface/smolagents
