Development Roadmap (2025 S1) #264

Open
3 of 17 tasks
huangyz0918 opened this issue Nov 12, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@huangyz0918
Member

huangyz0918 commented Nov 12, 2024

For anyone who wants to contribute (add features, report bugs, or simply discuss and learn), join our Discord 👋
Or you can just comment here for open discussion! 👨‍💻

RAG Module for Code Indexing

Research Topic

FYI @HuaizhengZhang

  • PoC: How to keep the indexed knowledge in sync with updates (e.g., the code changes frequently)
  • PoC: How to efficiently scan files into memory (embedding is slow for a large codebase or many files)
  • PoC: How to chunk code and different types of textual and other-modality (image/audio) data
  • PoC: Using GraphRAG to summarize overall (code) information
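
As a starting point for the first three PoCs, here is a minimal sketch of one possible approach, not a design the project has adopted: split a Python file into one chunk per top-level function or class with the standard `ast` module, hash each chunk, and re-embed only the chunks whose hash changed. The function names (`chunk_python_source`, `chunk_hashes`, `plan_reindex`) are hypothetical, and the embedding call itself is left out on purpose.

```python
import ast
import hashlib


def chunk_python_source(source: str) -> dict[str, str]:
    """Return one chunk per top-level function or class, keyed by its name."""
    tree = ast.parse(source)
    chunks = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the source text covered by the node.
            chunks[node.name] = ast.get_source_segment(source, node)
    return chunks


def chunk_hashes(chunks: dict[str, str]) -> dict[str, str]:
    """Map each chunk name to the SHA-256 digest of its text."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in chunks.items()
    }


def plan_reindex(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare two hash maps and return (chunks to embed, chunks to delete)."""
    to_embed = sorted(name for name, digest in new.items() if old.get(name) != digest)
    to_delete = sorted(name for name in old if name not in new)
    return to_embed, to_delete


# Two versions of the same (made-up) file: `train` changes, `evaluate` is new.
V1 = "def load(path):\n    return path\n\ndef train(x):\n    return x * 2\n"
V2 = (
    "def load(path):\n    return path\n\n"
    "def train(x):\n    return x * 3\n\n"
    "def evaluate(x):\n    return x\n"
)

old_index = chunk_hashes(chunk_python_source(V1))
new_index = chunk_hashes(chunk_python_source(V2))
to_embed, to_delete = plan_reindex(old_index, new_index)
```

With the two sample versions above, only `train` and `evaluate` need new embeddings; `load` is skipped because its hash is unchanged. Non-Python files and other modalities would need their own chunkers, which this sketch does not cover.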

Enhance mle chat

Prompting

Function Calls

Documentation

@huangyz0918 huangyz0918 pinned this issue Nov 12, 2024
@dosubot dosubot bot added the enhancement New feature or request label Nov 12, 2024
@TimeLordRaps

What are your thoughts on the library nano-graphrag?

@huangyz0918
Member Author

@TimeLordRaps Do you mean this https://github.com/gusye1234/nano-graphrag?

I think it is an elegant, small, and clean implementation of GraphRAG. However, implementing GraphRAG requires a graph-based data store and (usually) a KV store, which raises two problems: 1) extra storage and dependencies, and 2) compatibility with other stores. Moreover, I am not sure how such graph-based indexing performs on code generation tasks.

But I think it is worth trying in the chat mode of our project, since the user may ask very high-level or summary questions about a large code base. It could also help the advisor, but the first problem is how to combine function calls with graph search (maybe use a graph query for project summarization as a pre-retrieval step before calling a function like web search).
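
To make the pre-retrieval idea concrete, here is a rough sketch under heavy assumptions: the graph is a toy in-memory dict rather than a real graph store, the module names are made up, and `web_search` is a stub standing in for whatever function call the agent would actually make. It only shows the order of operations: query the graph for a short project summary first, then attach that summary to the function call.

```python
from collections import deque

# Hypothetical toy code graph: node -> (description, neighbouring nodes).
GRAPH = {
    "app.cli": ("command-line entry point", ["app.advisor", "app.indexer"]),
    "app.advisor": ("suggests a plan for the user's task", ["app.search"]),
    "app.indexer": ("builds the code index", []),
    "app.search": ("wraps external search calls", []),
}


def graph_summary(graph: dict, start: str, max_hops: int = 1) -> str:
    """Breadth-first walk from `start`, one 'name: description' line per node."""
    seen = {start}
    queue = deque([(start, 0)])
    lines = []
    while queue:
        node, depth = queue.popleft()
        description, neighbours = graph[node]
        lines.append(f"{node}: {description}")
        if depth < max_hops:
            for nxt in neighbours:
                if nxt in graph and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return "\n".join(lines)


def web_search(query: str) -> str:
    """Stub for the real function call; it only echoes the query it received."""
    return f"[stub results for: {query}]"


def answer_with_pre_retrieval(graph: dict, entry: str, question: str, search=web_search) -> str:
    """Run the graph query first, then pass its summary along with the question."""
    context = graph_summary(graph, entry)
    return search(f"{question}\n\nProject context:\n{context}")
```

Calling `answer_with_pre_retrieval(GRAPH, "app.cli", "How is the index built?")` sends the question together with a one-hop summary around `app.cli`. Whether that summary should come from a graph query, a cached GraphRAG community report, or something else is exactly the open question raised above.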

@TimeLordRaps

TimeLordRaps commented Nov 12, 2024

https://github.com/CEDARScript/cedarscript-grammar This should help. It's a friend of mine's project that I'm helping move forward into an eventual shared resource of personified coding graphs. I.e., think aider's --edit-format, which is where it's currently being applied; that said, we were just discussing branching out to other coding frameworks. Combining CEDARScript with a graph database has power that hasn't been tested yet. That said, CEDARScript improved gemini-1.5-flash on the refactoring benchmark, with these highlights:

  • 48% of tests (43 total) showed improvements
  • 103% increase in Pass 1 success rate (75 tests)
  • Test duration reduced by 93% (from 5:17:26 to 0:25:17)
  • Token efficiency greatly improved:
      • Sent tokens: -37% (7.59M)
      • Received tokens: -96% (180K)
  • Error reduction:
      • Error outputs: -94% (35 total)
      • Malformed outputs: -94% (6 cases)
      • Syntax errors: -85% (3 cases)
      • Indent errors eliminated (100% reduction)


5 participants