Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agglomerative summarizer #38

Merged
merged 12 commits into from
May 8, 2024
Merged

Agglomerative summarizer #38

merged 12 commits into from
May 8, 2024

Conversation

granawkins
Copy link
Member

Implement hierarchical summaries using agglomerative clustering.
image

Step 1: (Done)

  1. Match all files/chunks into most similar pairs
  2. Generate a merged summary of each pair
  3. Repeat recursively up to the root

Step 2: (In progress)
Navigate from the root down in order to match a query with the relevant file(s). At each junction, send the query + both summaries to an LLM and ask which path to follow. I call it 'Semantic Merkle Tree'.

Copy link

@mentatai mentatai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great effort in implementing the agglomerative summarizer features and integrating them within the existing codebase. However, there are a few areas where improvements could be made to ensure robustness and readability. Particularly, handling default settings and dependencies within class constructors could be ironed out to avoid pitfalls during runtime. Also, error handling during the Daemon class initialization could be softened to use default configurations or log warnings rather than halting with an error.

Butler is in closed beta. Reply with feedback or to ask Butler to review other parts of the PR. Please give feedback with emoji reacts.

model: Optional[TextModel | str] = DEFAULT_COMPLETION_MODEL,
**kwargs,
):
super().__init__(*args, **kwargs)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider providing default values for chunk_field_id and summary_field_id in the SummarizerAgglomerative class constructor to ensure that they are not inadvertently set to None which could raise an error during runtime if not correctly handled elsewhere in the code.

@@ -68,19 +68,29 @@ def __init__(
if self.verbose:
print("Initialized empty graph.")

# Link annotators together as required
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there might be excess complexity in how the graph path is handled here. If the graph_path is always derived by appending to cwd and the operation does not depend on external input or need to change during runtime, consider simplifying this to reduce potential errors or misunderstandings.

if "summarizer_agglomerative" in annotators:
if chunker_type is None or summarizer_type is None:
raise ValueError(
"Summarizer annotator requires a chunker and summarizer to be specified."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an error during the initialization of Daemon might not be the best approach. Consider using a fallback default or configuration verification to prevent initialization failures that could interrupt the service or require debugging during deployment.

@granawkins granawkins merged commit 969cca7 into main May 8, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant