Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-modal context management within gptel. #459

Open
metachip opened this issue Nov 4, 2024 · 4 comments
Open

Multi-modal context management within gptel. #459

metachip opened this issue Nov 4, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@metachip
Copy link

metachip commented Nov 4, 2024

First up I want to say:

GPTel is fantastic - it accelerates my Emacs workflow no end.

I want to thank you for creating this tool, in the way you have; lightweight and seamless, across the panopole of what Emacs offers.

My feedback here is in the context of "context":

  1. As we have discussed, the system message management could be easier.

    gptel--system-message is not robustly initialised. #416 (comment)

  2. Context management is great, but the way one manages it is a little difficult, for me at least.

    I am interested in your advice, but the means by which one adds and removes context feels a little rough. Adding context is simple enough; just call gptel-add with a region selected or just a buffer or file selected. Of course, the options via the transient menu are easy enough as well, although in this case, when one seeks to add several pieces of context, jumping around the buffers can be a challenge.

    The difficulty lies in managing context once it has been created.

    What seems to be missing is an easy list of the regions, buffers and files that have been added.

    The C command, via the transient menu, opens a list of the context content, rather than clear references to where that content came from. What I think I would find more convenient is the ability to pop open a consult style list with completions to enable me to select the context I want to remove or edit. The functionality of the existing C command is great when one wants to edit the content, however, given that the content comes from other buffers, or regions within buffers, providing means by which a list of "context references" would be presented that would enable me to jump directly to the origin of that context, to edit in-place there, would be a lower friction workflow.

    There may well be a way of doing this which I've not yet discovered, so please let me know if I've missed something!

  3. Multi-modal context support would be fantastic. For example, integrating advancements like this.

    https://docs.anthropic.com/en/docs/build-with-claude/pdf-support

    whereby one could put PDFs, images and other multi-modal files in context, with gptel-add. This would feed into what I've described at suggestion 2, where instead of viewing the context content directly, one instead has references to the source of that content, especially given that in light of this suggestion 3, that content may not be text and therefore not directly editable with Emacs.

Thanks again for creating such a useful tool.

[2024-11-05 Tue 10:49]

@metachip metachip added the enhancement New feature or request label Nov 4, 2024
karthink added a commit that referenced this issue Nov 5, 2024
* gptel-anthropic.el (gptel-make-anthropic,
gptel--anthropic-parse-multipart, gptel--anthropic-models): Add
support for sending PDFs to the model
`claude-3-5-sonnet-20241022'.  This is the only model that
supports reading PDFs as of now.  Cache sent PDFs so follow up the
input cost of reading the PDF in follow up messages is 90%
cheaper.
@karthink
Copy link
Owner

karthink commented Nov 5, 2024

  1. As we have discussed, the system message management could be easier.

    gptel--system-message is not robustly initialised. #416 (comment)

Will address this when I work on the system messages next, for which, as I mentioned in #416, a fair bit of work is planned.

  1. Context management is great, but the way one manages it is a little
    difficult, for me at least.

    I am interested in your advice, but the means by which one adds and removes
    context feels a little rough. Adding context is simple enough; just call
    gptel-add with a region selected or just a buffer or file selected. Of
    course, the options via the transient menu are easy enough as well, although
    in this case, when one seeks to add several pieces of context, jumping around
    the buffers can be a challenge.

    The difficulty lies in managing context once it has been created.

    What seems to be missing is an easy list of the regions, buffers and files
    that have been added.

The context inspection buffer was supposed to fulfill this purpose. (More explanation below)

The C command, via the transient menu, opens a list of the context content,
rather than clear references to where that content came from. What I think I
would find more convenient is the ability to pop open a consult style list
with completions to enable me to select the context I want to remove or
edit. The functionality of the existing C command is great when one wants to
edit the content,

This is not the case, the context chunks as displayed in the context inspection buffer are read-only. By edit I guess you meant "remove the context chunk from gptel"?

however, given that the content comes from other buffers,
or regions within buffers, providing means by which a list of "context
references" would be presented that would enable me to jump directly to the
origin of that context, to edit in-place there, would be a lower friction
workflow.

There may well be a way of doing this which I've not yet discovered, so
please let me know if I've missed something!

In the context inspection buffer, pressing RET on a context chunk pops up a window with the relevant buffer. You can use this to jump there.

The idea was that the context inspection buffer can fulfill both roles: it provides a listing of added context chunks (like ibuffer, buffer-menu or bookmark-bmenu-list do), but also provides a preview of the context chunks, like Consult commands do. You also have the same keybindings as dired or buffer-menu to move between chunks, delete entries or visit them in their original buffers.

So could you explain what exactly is missing? (If there are minor ergonomic deficiencies with the context buffer, we can address them.)

  1. Multi-modal context support would be fantastic. For example, integrating
    advancements like this.

    https://docs.anthropic.com/en/docs/build-with-claude/pdf-support

    whereby one could put PDFs, images and other multi-modal files in context,
    with gptel-add. This would feed into what I've described at suggestion 2,
    where instead of viewing the context content directly, one instead has
    references to the source of that content, especially given that in light of
    this suggestion 3, that content may not be text and therefore not directly
    editable with Emacs.

Image and media support was added over a month ago, before the 0.9.6 release. You can indeed add supported document types using gptel-add, or as links in Org/Markdown chat buffers. Or did I misunderstand what you're talking about here?

PDF support for Claude 3.5 Sonnet (and only this model) is brand new. Anyway, I added it just now.

@metachip
Copy link
Author

So could you explain what exactly is missing? (If there are minor ergonomic deficiencies with the context buffer, we can address them.)

  1. Collapsible Previews

    I find the context buffer hard to navigate because the preview chunks can be huge. My preference would be for previews to be hidden or not included at all, and the context buffer simply show the list of references. If one wants to see what the content is, one simply hits RET on the reference, as you advised. Another (preferrable) fix for this might be hide the context previews, for each reference, and reveal them with TAB, in much the same way Magit does for its various "chunked views".

    In other respects, the context buffer works well.

  2. Context Management

    There is only one context that can be maintained at a time globally, regardless of how many concurrent chats one might be having in different windows. It would be better if context could be managed and maintained per chat buffer. I sometimes have several conversations with different models on difference subjects concurrently. If those conversations require context, it becomes problematic.

    There is a partial solution to this. If the context is image or PDF media, then standalone links in the conversation thread solves the problem in most cases.

    However, if the context comes from other buffers, regions or local files, this workaround is not possible.

    One can of course simply copy of the content into the chat buffer, but then is changes are made the source, they are not reflected back to the model like they are when included by way of context.

    I have made a suggestion how to improve this here:

    Media (and other) links. #481

    In a nutshell, allow content to be included in a chat by reference, via links in the conversation, to "anything that makes sense, so long as it's text", and not just media mime types that the model accepts as is the case now.

@karthink
Copy link
Owner

karthink commented Nov 24, 2024

I'll reply in detail when I can, but regarding point 2 I wanted to link to #475.

@metachip
Copy link
Author

Thanks I've fleshed out my thinking here: #475 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants