Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context supports comments #2

Merged
merged 12 commits into from
Apr 14, 2024
Merged

Context supports comments #2

merged 12 commits into from
Apr 14, 2024

Conversation

jakethekoenig
Copy link
Member

Context keeps track of and renders comments. Let me know what you think of the approach. Some questions:

  • How should we render comments? I use 80*"-" because it's 1 token. It may be better to construct something using ┌ and rotations and |, -, to make box shapes. It uses more characters but it would certainly help the human read it. And I think very plausibly it may help the LLM.
  • What should ragdaemon provide for people adding comments? This PR provides a basic interface for adding comments but I think it may make sense to make a few more specific utilities for adding comments from common formats. For instance it'd be nice to be able to turn the output of pyright --outputjson into inline comments. It'd also be nice to be able to take a list of github comments and render them. There's some creative freedom there though around where/if you put the author's name, how/where you render the timestamp etc.
  • Currently my comment object accepts an endline but doesn't do anything with it. I'm not sure what the best way to handle it is. We could prefix all the commented lines with something like "|" but then it gets difficult if they overlap. We could also just say something in the comment text "This comment applies to lines 4-11".

Context keeps track of and renders comments
end_delimiter: Optional[str] = None,
):
assert (
line or positioning == CommentPosition.File
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider validating the line parameter more robustly. Currently, the assertion only checks if line is set when positioning is not CommentPosition.File. However, it might be beneficial to also ensure that when line is provided, it is a positive integer.

assert (
line or positioning == CommentPosition.File
), f"Comment positioned via {positioning} must set line number"
assert line or not end_line, "Comments with end_line must set line"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion message "Comments with end_line must set line" could be clearer. Consider rephrasing to "A start line must be provided if an end line is specified."

else:
self.end_delimiter = end_delimiter

def render(self, line_content: Optional[str] = None) -> str:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def render(self, line_content: Optional[str] = None) -> str:
assert (
not line_content or self.positioning not in [CommentPosition.File, CommentPosition.Start, CommentPosition.End],
"Line content should not be provided for file, start, or end positioning comments."
)

This change ensures that the assertion message is more descriptive and covers the case where line_content is provided for start or end positioning comments, which should not happen.

if not self.context.get(path_str):
self._add_path(path_str)
if not line:
line = 0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting line = 0 as a default for comments without a specified line might be misleading, especially since line numbers typically start from 1 in most contexts. Consider using None as a default value to indicate that no specific line is targeted, and handle this case accordingly in the rendering logic.

@@ -98,13 +132,20 @@ def render(self) -> str:
output += "\n"
tags = "" if not data["tags"] else f" ({', '.join(sorted(data['tags']))})"
output += f"{path_str}{tags}\n"
if 0 in data["comments"]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great that comments are now integrated into the context rendering. However, consider adding a brief comment explaining the logic behind checking if 0 is in data["comments"]. This could help clarify the intention to future maintainers, especially regarding how file-level comments are handled.


context = ContextBuilder(daemon.graph, daemon.db)
context.add_ref("src/operations.py")
context.add_comment("src/operations.py", "What is this file for?", positioning=CommentPosition.File)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is comprehensive and covers multiple positioning scenarios for comments. To further improve test coverage, consider adding cases where comments overlap in their positioning (e.g., multiple comments above a line) and edge cases like comments at the start or end of a file.

Copy link

mentatbot bot commented Apr 11, 2024

MENTAT CODE REVIEW IN ACTIVE DEVELOPMENT. Only in use on mentat and internal repos.
Please Reply with feedback.

The pull request introduces a solid foundation for handling comments within the context, enhancing the tool's ability to annotate code effectively. The implementation of comment positioning and rendering is well-thought-out. However, there are opportunities to refine the validation of input parameters and improve clarity in certain areas. Additionally, expanding test coverage to include edge cases and overlapping comments could further solidify the feature's robustness. Overall, this is a positive addition to the project, and with a few adjustments, it will be an excellent enhancement.

@jakethekoenig
Copy link
Member Author

I think the windows test failing is not because of this PR. It's from a test I didn't write for functionality I don't think I should have changed.

@biobootloader
Copy link
Member

biobootloader commented Apr 11, 2024

How should we render comments? I use 80*"-" because it's 1 token. It may be better to construct something using ┌ and rotations and |, -, to make box shapes. It uses more characters but it would certainly help the human read it. And I think very plausibly it may help the LLM.

My guess would be that the box is just more confusing for the LLM. Maybe the best for an LLM would be to use xml tags like <github-comment> ... </github-comment>? Anthropic models are explicitly trained to understand them in prompts: https://docs.anthropic.com/claude/docs/use-xml-tags

We could also just say something in the comment text "This comment applies to lines 4-11"

I think this is the best approach, as long as line numbers are being displayed

@jakethekoenig
Copy link
Member Author

How should we render comments? I use 80*"-" because it's 1 token. It may be better to construct something using ┌ and rotations and |, -, to make box shapes. It uses more characters but it would certainly help the human read it. And I think very plausibly it may help the LLM.

My guess would be that the box is just more confusing for the LLM. Maybe the best for an LLM would be to use xml tags like <github-comment> ... </github-comment>? Anthropic models are explicitly trained to understand them in prompts: https://docs.anthropic.com/claude/docs/use-xml-tags

We could also just say something in the comment text "This comment applies to lines 4-11"

I think this is the best approach, as long as line numbers are being displayed

Rewrote it to xml and it's definitely the right approach:

  • Makes the code much simpler.
  • Now comments with extra structure such as replies can encode it as subtrees themselves.
  • Similarly the comment doesn't need to be aware of end line number. The caller can put it in the dictionary if they want.

@jakethekoenig
Copy link
Member Author

jakethekoenig commented Apr 11, 2024

Rewrote it to xml and it's definitely the right approach:

  • Makes the code much simpler.
  • Now comments with extra structure such as replies can encode it as subtrees themselves.
  • Similarly the comment doesn't need to be aware of end line number. The caller can put it in the dictionary if they want.

Thinking about it some more it's possible the ragdaemon side should be even simpler and only accept/store strings. And then it can be up to the callers to format their comments appropriately. I could see either way. If we really think xml is the right approach maybe it makes sense to be prescriptive about it and simplify our caller's job very marginally.

Edit: Okay I think I figured out the best approach: Don't have comment tags by default just print the string. But accept stringly dictionaries and render them as xml.

@granawkins
Copy link
Member

I like being able to insert arbitrary text like this, and the xml thing is great.

Comment isn't really an annotator in the same sense as the others, so I think it should just be in ContextBuilder.

(Separately) I think there should be a PullRequestAnnotator, which loads the comments/replies/diffs/linked into the knowledge graph. Then you'd be able to add them with something like daemon.include(<pr_id>:<message_id>), they'd all be searchable/crawlable.

@jakethekoenig
Copy link
Member Author

(Separately) I think there should be a PullRequestAnnotator, which loads the comments/replies/diffs/linked into the knowledge graph. Then you'd be able to add them with something like daemon.include(<pr_id>:<message_id>), they'd all be searchable/crawlable.

That sounds like a good idea to me. Do you imagine ragdaeomon anything/scraping GitHub itself?

@jakethekoenig
Copy link
Member Author

Comment isn't really an annotator in the same sense as the others, so I think it should just be in ContextBuilder.

Moved it. I also added tags. I figured it might be important for users to be able to add pyright annotations and github annotations and then later just remove one of them.

@granawkins
Copy link
Member

(Separately) I think there should be a PullRequestAnnotator, which loads the comments/replies/diffs/linked into the knowledge graph. Then you'd be able to add them with something like daemon.include(<pr_id>:<message_id>), they'd all be searchable/crawlable.

That sounds like a good idea to me. Do you imagine ragdaeomon anything/scraping GitHub itself?

Im not sure, hopefully we can use the authenticated setup from Butler somehow.

@jakethekoenig
Copy link
Member Author

Im not sure, hopefully we can use the authenticated setup from Butler somehow.

Oh yeah we could probably have it accept a repo, or pull object from the pygithub library and scrape/add everything from it. Then it's super simple for callers in the case of public repos and not too complicated for private ones either. Would make it easy for butler to use.

NOTE: pyright here is v1.1.339 (not most recent) because networkx typing is a circus
@granawkins granawkins merged commit 5dc4155 into main Apr 14, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants