-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Agglomerative summarizer #38
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great effort in implementing the agglomerative summarizer features and integrating them within the existing codebase. However, there are a few areas where improvements could be made to ensure robustness and readability. Particularly, handling default settings and dependencies within class constructors could be ironed out to avoid pitfalls during runtime. Also, error handling during the Daemon
class initialization could be softened to use default configurations or log warnings rather than halting with an error.
Butler is in closed beta. Reply with feedback or to ask Butler to review other parts of the PR. Please give feedback with emoji reacts.
model: Optional[TextModel | str] = DEFAULT_COMPLETION_MODEL, | ||
**kwargs, | ||
): | ||
super().__init__(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider providing default values for chunk_field_id
and summary_field_id
in the SummarizerAgglomerative
class constructor to ensure that they are not inadvertently set to None which could raise an error during runtime if not correctly handled elsewhere in the code.
ragdaemon/daemon.py
Outdated
@@ -68,19 +68,29 @@ def __init__( | |||
if self.verbose: | |||
print("Initialized empty graph.") | |||
|
|||
# Link annotators together as required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there might be excess complexity in how the graph path is handled here. If the graph_path
is always derived by appending to cwd
and the operation does not depend on external input or need to change during runtime, consider simplifying this to reduce potential errors or misunderstandings.
ragdaemon/daemon.py
Outdated
if "summarizer_agglomerative" in annotators: | ||
if chunker_type is None or summarizer_type is None: | ||
raise ValueError( | ||
"Summarizer annotator requires a chunker and summarizer to be specified." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising an error during the initialization of Daemon
might not be the best approach. Consider using a fallback default or configuration verification to prevent initialization failures that could interrupt the service or require debugging during deployment.
Implement hierarchical summaries using agglomerative clustering.
Step 1: (Done)
Step 2: (In progress)
Navigate from the root down in order to match a query with the relevant file(s). At each junction, send the query + both summaries to an LLM and ask which path to follow. I call it 'Semantic Merkle Tree'.