Context supports comments #2

jakethekoenig · 2024-04-11T18:52:44Z

Context keeps track of and renders comments. Let me know what you think of the approach. Some questions:

How should we render comments? I use 80*"-" because it's 1 token. It may be better to construct something using ┌ and rotations and |, -, to make box shapes. It uses more characters but it would certainly help the human read it. And I think very plausibly it may help the LLM.
What should ragdaemon provide for people adding comments? This PR provides a basic interface for adding comments but I think it may make sense to make a few more specific utilities for adding comments from common formats. For instance it'd be nice to be able to turn the output of pyright --outputjson into inline comments. It'd also be nice to be able to take a list of github comments and render them. There's some creative freedom there though around where/if you put the author's name, how/where you render the timestamp etc.
Currently my comment object accepts an endline but doesn't do anything with it. I'm not sure what the best way to handle it is. We could prefix all the commented lines with something like "|" but then it gets difficult if they overlap. We could also just say something in the comment text "This comment applies to lines 4-11".

Context keeps track of and renders comments

mentatbot · 2024-04-11T18:53:24Z

ragdaemon/annotators/comment.py

+        end_delimiter: Optional[str] = None,
+    ):
+        assert (
+            line or positioning == CommentPosition.File


Consider validating the line parameter more robustly. Currently, the assertion only checks if line is set when positioning is not CommentPosition.File. However, it might be beneficial to also ensure that when line is provided, it is a positive integer.

mentatbot · 2024-04-11T18:53:26Z

ragdaemon/annotators/comment.py

+        assert (
+            line or positioning == CommentPosition.File
+        ), f"Comment positioned via {positioning} must set line number"
+        assert line or not end_line, "Comments with end_line must set line"


The assertion message "Comments with end_line must set line" could be clearer. Consider rephrasing to "A start line must be provided if an end line is specified."

mentatbot · 2024-04-11T18:53:27Z

ragdaemon/annotators/comment.py

+        else:
+            self.end_delimiter = end_delimiter
+
+    def render(self, line_content: Optional[str] = None) -> str:


Suggested change

def render(self, line_content: Optional[str] = None) -> str:

assert (

not line_content or self.positioning not in [CommentPosition.File, CommentPosition.Start, CommentPosition.End],

"Line content should not be provided for file, start, or end positioning comments."

)

This change ensures that the assertion message is more descriptive and covers the case where line_content is provided for start or end positioning comments, which should not happen.

mentatbot · 2024-04-11T18:53:29Z

ragdaemon/context.py

+        if not self.context.get(path_str):
+            self._add_path(path_str)
+        if not line:
+            line = 0


Setting line = 0 as a default for comments without a specified line might be misleading, especially since line numbers typically start from 1 in most contexts. Consider using None as a default value to indicate that no specific line is targeted, and handle this case accordingly in the rendering logic.

mentatbot · 2024-04-11T18:53:31Z

ragdaemon/context.py

@@ -98,13 +132,20 @@ def render(self) -> str:
                output += "\n"
            tags = "" if not data["tags"] else f" ({', '.join(sorted(data['tags']))})"
            output += f"{path_str}{tags}\n"
+            if 0 in data["comments"]:


It's great that comments are now integrated into the context rendering. However, consider adding a brief comment explaining the logic behind checking if 0 is in data["comments"]. This could help clarify the intention to future maintainers, especially regarding how file-level comments are handled.

mentatbot · 2024-04-11T18:53:32Z

tests/annotators/test_comments.py

+
+    context = ContextBuilder(daemon.graph, daemon.db)
+    context.add_ref("src/operations.py")
+    context.add_comment("src/operations.py", "What is this file for?", positioning=CommentPosition.File)


This test case is comprehensive and covers multiple positioning scenarios for comments. To further improve test coverage, consider adding cases where comments overlap in their positioning (e.g., multiple comments above a line) and edge cases like comments at the start or end of a file.

mentatbot · 2024-04-11T18:53:34Z

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

The pull request introduces a solid foundation for handling comments within the context, enhancing the tool's ability to annotate code effectively. The implementation of comment positioning and rendering is well-thought-out. However, there are opportunities to refine the validation of input parameters and improve clarity in certain areas. Additionally, expanding test coverage to include edge cases and overlapping comments could further solidify the feature's robustness. Overall, this is a positive addition to the project, and with a few adjustments, it will be an excellent enhancement.

jakethekoenig · 2024-04-11T19:04:39Z

I think the windows test failing is not because of this PR. It's from a test I didn't write for functionality I don't think I should have changed.

biobootloader · 2024-04-11T19:45:34Z

How should we render comments? I use 80*"-" because it's 1 token. It may be better to construct something using ┌ and rotations and |, -, to make box shapes. It uses more characters but it would certainly help the human read it. And I think very plausibly it may help the LLM.

My guess would be that the box is just more confusing for the LLM. Maybe the best for an LLM would be to use xml tags like <github-comment> ... </github-comment>? Anthropic models are explicitly trained to understand them in prompts: https://docs.anthropic.com/claude/docs/use-xml-tags

We could also just say something in the comment text "This comment applies to lines 4-11"

I think this is the best approach, as long as line numbers are being displayed

jakethekoenig · 2024-04-11T21:57:27Z

How should we render comments? I use 80*"-" because it's 1 token. It may be better to construct something using ┌ and rotations and |, -, to make box shapes. It uses more characters but it would certainly help the human read it. And I think very plausibly it may help the LLM.

My guess would be that the box is just more confusing for the LLM. Maybe the best for an LLM would be to use xml tags like <github-comment> ... </github-comment>? Anthropic models are explicitly trained to understand them in prompts: https://docs.anthropic.com/claude/docs/use-xml-tags

We could also just say something in the comment text "This comment applies to lines 4-11"

I think this is the best approach, as long as line numbers are being displayed

Rewrote it to xml and it's definitely the right approach:

Makes the code much simpler.
Now comments with extra structure such as replies can encode it as subtrees themselves.
Similarly the comment doesn't need to be aware of end line number. The caller can put it in the dictionary if they want.

jakethekoenig · 2024-04-11T22:16:41Z

Rewrote it to xml and it's definitely the right approach:

Makes the code much simpler.

Now comments with extra structure such as replies can encode it as subtrees themselves.

Similarly the comment doesn't need to be aware of end line number. The caller can put it in the dictionary if they want.

Thinking about it some more it's possible the ragdaemon side should be even simpler and only accept/store strings. And then it can be up to the callers to format their comments appropriately. I could see either way. If we really think xml is the right approach maybe it makes sense to be prescriptive about it and simplify our caller's job very marginally.

Edit: Okay I think I figured out the best approach: Don't have comment tags by default just print the string. But accept stringly dictionaries and render them as xml.

granawkins · 2024-04-12T04:08:45Z

I like being able to insert arbitrary text like this, and the xml thing is great.

Comment isn't really an annotator in the same sense as the others, so I think it should just be in ContextBuilder.

(Separately) I think there should be a PullRequestAnnotator, which loads the comments/replies/diffs/linked into the knowledge graph. Then you'd be able to add them with something like daemon.include(<pr_id>:<message_id>), they'd all be searchable/crawlable.

jakethekoenig · 2024-04-12T11:47:44Z

(Separately) I think there should be a PullRequestAnnotator, which loads the comments/replies/diffs/linked into the knowledge graph. Then you'd be able to add them with something like daemon.include(<pr_id>:<message_id>), they'd all be searchable/crawlable.

That sounds like a good idea to me. Do you imagine ragdaeomon anything/scraping GitHub itself?

jakethekoenig · 2024-04-12T12:02:17Z

Comment isn't really an annotator in the same sense as the others, so I think it should just be in ContextBuilder.

Moved it. I also added tags. I figured it might be important for users to be able to add pyright annotations and github annotations and then later just remove one of them.

granawkins · 2024-04-12T12:24:54Z

(Separately) I think there should be a PullRequestAnnotator, which loads the comments/replies/diffs/linked into the knowledge graph. Then you'd be able to add them with something like daemon.include(<pr_id>:<message_id>), they'd all be searchable/crawlable.

That sounds like a good idea to me. Do you imagine ragdaeomon anything/scraping GitHub itself?

Im not sure, hopefully we can use the authenticated setup from Butler somehow.

jakethekoenig · 2024-04-12T13:56:31Z

Im not sure, hopefully we can use the authenticated setup from Butler somehow.

Oh yeah we could probably have it accept a repo, or pull object from the pygithub library and scrape/add everything from it. Then it's super simple for callers in the case of public repos and not too complicated for private ones either. Would make it easy for butler to use.

NOTE: pyright here is v1.1.339 (not most recent) because networkx typing is a circus

Context supports comments

a406cdf

Context keeps track of and renders comments

mentatbot bot reviewed Apr 11, 2024

View reviewed changes

jakethekoenig added 2 commits April 11, 2024 11:58

windows fix

566e0ea

Review fixes

20f863d

Test remove comments

7dad931

jakethekoenig added 3 commits April 11, 2024 14:48

Switch to xml

fb3a8b5

Update requirements

6b2ec77

Make wrap customizeable

c999fb5

Remove wrap

8ce2cd7

jakethekoenig added 2 commits April 12, 2024 04:56

Move comment to context

2d7d063

Add tags

3e0af6b

granawkins added 2 commits April 14, 2024 11:37

Merge branch 'main' into add-comments

6cdd1d5

small typing fix

ed5c91a

NOTE: pyright here is v1.1.339 (not most recent) because networkx typing is a circus

granawkins merged commit 5dc4155 into main Apr 14, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Context supports comments #2

Context supports comments #2

jakethekoenig commented Apr 11, 2024

mentatbot bot Apr 11, 2024

mentatbot bot Apr 11, 2024

mentatbot bot Apr 11, 2024

mentatbot bot Apr 11, 2024

mentatbot bot Apr 11, 2024

mentatbot bot Apr 11, 2024

mentatbot bot commented Apr 11, 2024

jakethekoenig commented Apr 11, 2024

biobootloader commented Apr 11, 2024 •

edited

Loading

jakethekoenig commented Apr 11, 2024

jakethekoenig commented Apr 11, 2024 •

edited

Loading

granawkins commented Apr 12, 2024

jakethekoenig commented Apr 12, 2024

jakethekoenig commented Apr 12, 2024

granawkins commented Apr 12, 2024

jakethekoenig commented Apr 12, 2024

Context supports comments #2

Context supports comments #2

Conversation

jakethekoenig commented Apr 11, 2024

mentatbot bot Apr 11, 2024

Choose a reason for hiding this comment

mentatbot bot Apr 11, 2024

Choose a reason for hiding this comment

mentatbot bot Apr 11, 2024

Choose a reason for hiding this comment

mentatbot bot Apr 11, 2024

Choose a reason for hiding this comment

mentatbot bot Apr 11, 2024

Choose a reason for hiding this comment

mentatbot bot Apr 11, 2024

Choose a reason for hiding this comment

mentatbot bot commented Apr 11, 2024

jakethekoenig commented Apr 11, 2024

biobootloader commented Apr 11, 2024 • edited Loading

jakethekoenig commented Apr 11, 2024

jakethekoenig commented Apr 11, 2024 • edited Loading

granawkins commented Apr 12, 2024

jakethekoenig commented Apr 12, 2024

jakethekoenig commented Apr 12, 2024

granawkins commented Apr 12, 2024

jakethekoenig commented Apr 12, 2024

biobootloader commented Apr 11, 2024 •

edited

Loading

jakethekoenig commented Apr 11, 2024 •

edited

Loading