Agglomerative summarizer #38

granawkins · 2024-05-03T01:32:48Z

Implement hierarchical summaries using agglomerative clustering.

Step 1: (Done)

Match all files/chunks into most similar pairs
Generate a merged summary of each pair
Repeat recursively up to the root
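
The three steps above can be sketched roughly as follows. This is a greedy illustration, not the code in this PR: `Node`, `merge_level`, `build_tree`, and the `summarize` callback are hypothetical names, embeddings are assumed to be plain lists of floats, and averaging the two child embeddings to get the parent's embedding is an assumption made for the sketch.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt
from typing import Callable, Optional


@dataclass
class Node:
    # Hypothetical tree node: a summary, its embedding, and up to two children.
    summary: str
    embedding: list[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_level(nodes: list[Node], summarize: Callable[[str, str], str]) -> list[Node]:
    """Greedily pair the most similar nodes and merge each pair into a parent."""
    pairs = sorted(
        combinations(range(len(nodes)), 2),
        key=lambda p: cosine(nodes[p[0]].embedding, nodes[p[1]].embedding),
        reverse=True,
    )
    used: set[int] = set()
    parents: list[Node] = []
    for i, j in pairs:
        if i in used or j in used:
            continue
        used.update((i, j))
        a, b = nodes[i], nodes[j]
        # Assumption: the parent embedding is the mean of its children.
        mean = [(x + y) / 2 for x, y in zip(a.embedding, b.embedding)]
        parents.append(Node(summarize(a.summary, b.summary), mean, a, b))
    # With an odd count, the unpaired node is carried up unchanged.
    parents.extend(n for k, n in enumerate(nodes) if k not in used)
    return parents


def build_tree(leaves: list[Node], summarize: Callable[[str, str], str]) -> Node:
    """Repeat the pairwise merge until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        level = merge_level(level, summarize)
    return level[0]
```

In practice `summarize` would call an LLM to write the merged summary; any two-argument function returning a string works for testing.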

Step 2: (In progress)
Navigate from the root down to match a query with the relevant file(s). At each junction, send the query and both child summaries to an LLM and ask which path to follow. I call it a 'Semantic Merkle Tree'.
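
A minimal sketch of that descent, assuming a binary tree of summaries. `SummaryNode`, `navigate`, and `keyword_choice` are hypothetical names; `keyword_choice` is a word-overlap stand-in for the LLM call so the example runs offline, and it is not how this PR decides which branch to take.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SummaryNode:
    # Hypothetical node: leaves are files/chunks, inner nodes are merged summaries.
    summary: str
    left: Optional["SummaryNode"] = None
    right: Optional["SummaryNode"] = None


def keyword_choice(query: str, left_summary: str, right_summary: str) -> bool:
    """Stand-in for the LLM: return True to go left, False to go right."""
    words = set(query.lower().split())
    left_hits = len(words & set(left_summary.lower().split()))
    right_hits = len(words & set(right_summary.lower().split()))
    return left_hits >= right_hits


def navigate(
    root: SummaryNode,
    query: str,
    choose: Callable[[str, str, str], bool],
) -> list[str]:
    """Walk from the root to a leaf, asking `choose` at every junction.

    Returns the summaries visited, ending with the selected leaf.
    """
    node = root
    path = [node.summary]
    while node.left is not None and node.right is not None:
        go_left = choose(query, node.left.summary, node.right.summary)
        node = node.left if go_left else node.right
        path.append(node.summary)
    return path
```

Swapping `keyword_choice` for a function that prompts a model with the query and both summaries gives the behavior described above; the number of calls per query grows with the depth of the tree rather than with the number of files.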

mentatai

Great effort in implementing the agglomerative summarizer features and integrating them within the existing codebase. However, there are a few areas where improvements could be made to ensure robustness and readability. Particularly, handling default settings and dependencies within class constructors could be ironed out to avoid pitfalls during runtime. Also, error handling during the Daemon class initialization could be softened to use default configurations or log warnings rather than halting with an error.

mentatai · 2024-05-03T13:33:21Z

ragdaemon/annotators/summarizer_agglomerative.py

+        model: Optional[TextModel | str] = DEFAULT_COMPLETION_MODEL,
+        **kwargs,
+    ):
+        super().__init__(*args, **kwargs)


Consider providing default values for chunk_field_id and summary_field_id in the SummarizerAgglomerative class constructor to ensure that they are not inadvertently set to None which could raise an error during runtime if not correctly handled elsewhere in the code.

mentatai · 2024-05-03T13:33:21Z

ragdaemon/daemon.py

@@ -68,19 +68,29 @@ def __init__(
        if self.verbose:
            print("Initialized empty graph.")

+        # Link annotators together as required


It seems there might be excess complexity in how the graph path is handled here. If the graph_path is always derived by appending to cwd and the operation does not depend on external input or need to change during runtime, consider simplifying this to reduce potential errors or misunderstandings.

mentatai · 2024-05-03T13:33:21Z

ragdaemon/daemon.py

+        if "summarizer_agglomerative" in annotators:
+            if chunker_type is None or summarizer_type is None:
+                raise ValueError(
+                    "Summarizer annotator requires a chunker and summarizer to be specified."


Raising an error during the initialization of Daemon might not be the best approach. Consider using a fallback default or configuration verification to prevent initialization failures that could interrupt the service or require debugging during deployment.
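
One way to read this suggestion is sketched below, under stated assumptions. `resolve_summarizer_deps`, the `strict` flag, and both `DEFAULT_*` values are hypothetical and do not come from ragdaemon; only `annotators`, `chunker_type`, `summarizer_type`, and the `"summarizer_agglomerative"` key appear in the diff.

```python
import logging
from typing import Container, Optional

logger = logging.getLogger(__name__)

# Hypothetical fallback names, chosen only for this illustration.
DEFAULT_CHUNKER_TYPE = "chunker"
DEFAULT_SUMMARIZER_TYPE = "summarizer"


def resolve_summarizer_deps(
    annotators: Container[str],
    chunker_type: Optional[str],
    summarizer_type: Optional[str],
    strict: bool = False,
) -> tuple[Optional[str], Optional[str]]:
    """Return the (chunker_type, summarizer_type) pair to use.

    If the agglomerative summarizer is not requested, the inputs pass through.
    Otherwise, missing values either raise (strict) or fall back with a warning.
    """
    if "summarizer_agglomerative" not in annotators:
        return chunker_type, summarizer_type
    if chunker_type is None or summarizer_type is None:
        if strict:
            raise ValueError(
                "Summarizer annotator requires a chunker and summarizer to be specified."
            )
        logger.warning("Missing chunker or summarizer type; using fallback defaults.")
        chunker_type = chunker_type or DEFAULT_CHUNKER_TYPE
        summarizer_type = summarizer_type or DEFAULT_SUMMARIZER_TYPE
    return chunker_type, summarizer_type
```

The trade-off is that a silent fallback can hide a misconfiguration, which is why the sketch keeps a `strict` mode that preserves the original `ValueError`.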

granawkins added 3 commits May 2, 2024 19:56

implement SummarizerAgglomerative annotator

e5a2549

integrate it into the program

9bccf0c

format typing and tests

fc4b01b

mentatai bot reviewed May 3, 2024

granawkins added 9 commits May 3, 2024 10:12

efficiency improvements

17236e8

add scipy dependency

45de25b

Merge branch 'main' into agglomerative-summarizer

aecc7d5

rename agglomerative_summarizer to binary_clusterer

bfbd40b

move annotator initialization checks to __init__

ed2efeb

remove scipy from requirements and lazy-load in clusterer_binary

a9bf59b

clean up commit

a33f216

version bump

53e1cb4

ignore scipy import typecheck

2fca705

granawkins merged commit 969cca7 into main May 8, 2024
3 checks passed
