Multi-modal context management within gptel. #459

metachip · 2024-11-04T23:50:26Z

First up I want to say:

GPTel is fantastic: it accelerates my Emacs workflow no end.

I want to thank you for creating this tool the way you have: lightweight and seamless across the panoply of what Emacs offers.

My feedback here is in the context of "context":

1. As we have discussed, the system message management could be easier.

gptel--system-message is not robustly initialised. #416 (comment)

2. Context management is great, but the way one manages it is a little difficult, for me at least.

I'd welcome your advice, but the way one adds and removes context feels a little rough. Adding context is simple enough: just call gptel-add with a region selected, or with a buffer or file selected. The options in the transient menu are easy enough too, although when one wants to add several pieces of context, jumping around between buffers can be a challenge.

The difficulty lies in managing context once it has been created.

What seems to be missing is an easy list of the regions, buffers and files that have been added.

The C command in the transient menu opens a listing of the context content itself, rather than clear references to where that content came from. What I think I would find more convenient is a consult-style list with completions that lets me select the context I want to remove or edit. The existing C command is great when one wants to edit the content; however, since the content comes from other buffers, or regions within buffers, a list of "context references" that lets me jump directly to the origin of each piece of context, and edit it in place there, would make for a lower-friction workflow.

There may well be a way of doing this which I've not yet discovered, so please let me know if I've missed something!

3. Multi-modal context support would be fantastic, for example by integrating advancements like this one:

https://docs.anthropic.com/en/docs/build-with-claude/pdf-support

whereby one could put PDFs, images and other multi-modal files in context with gptel-add. This would feed into what I described in suggestion 2: instead of viewing the context content directly, one would have references to the source of that content, especially since, with this suggestion 3, that content may not be text and therefore not directly editable in Emacs.

Thanks again for creating such a useful tool.

[2024-11-05 Tue 10:49]

* gptel-anthropic.el (gptel-make-anthropic, gptel--anthropic-parse-multipart, gptel--anthropic-models): Add support for sending PDFs to the model `claude-3-5-sonnet-20241022'. This is the only model that supports reading PDFs as of now. Cache sent PDFs so that the input cost of reading the PDF in follow-up messages is 90% cheaper.

karthink · 2024-11-05T03:28:50Z

> As we have discussed, the system message management could be easier.
>
> gptel--system-message is not robustly initialised. #416 (comment)

I'll address this when I next work on system messages; as I mentioned in #416, a fair bit of work is planned there.

> Context management is great, but the way one manages it is a little
> difficult, for me at least.

> I am interested in your advice, but the means by which one adds and removes
> context feels a little rough. Adding context is simple enough; just call
> gptel-add with a region selected or just a buffer or file selected. Of
> course, the options via the transient menu are easy enough as well, although
> in this case, when one seeks to add several pieces of context, jumping around
> the buffers can be a challenge.

> The difficulty lies in managing context once it has been created.

> What seems to be missing is an easy list of the regions, buffers and files
> that have been added.

The context inspection buffer was meant to serve this purpose (more explanation below).

> The C command, via the transient menu, opens a list of the context content,
> rather than clear references to where that content came from. What I think I
> would find more convenient is the ability to pop open a consult style list
> with completions to enable me to select the context I want to remove or
> edit. The functionality of the existing C command is great when one wants to
> edit the content,

This is not the case: the context chunks displayed in the context inspection buffer are read-only. By "edit", I guess you meant "remove the context chunk from gptel"?

> however, given that the content comes from other buffers,
> or regions within buffers, providing means by which a list of "context
> references" would be presented that would enable me to jump directly to the
> origin of that context, to edit in-place there, would be a lower friction
> workflow.

> There may well be a way of doing this which I've not yet discovered, so
> please let me know if I've missed something!

In the context inspection buffer, pressing RET on a context chunk pops up a window with the relevant buffer. You can use this to jump there.

The idea was that the context inspection buffer can fulfill both roles: it provides a listing of added context chunks (like ibuffer, buffer-menu or bookmark-bmenu-list do), but also provides a preview of the context chunks, like Consult commands do. You also have the same keybindings as dired or buffer-menu to move between chunks, delete entries or visit them in their original buffers.

So could you explain what exactly is missing? (If there are minor ergonomic deficiencies with the context buffer, we can address them.)

> Multi-modal context support would be fantastic. For example, integrating
> advancements like this.

> https://docs.anthropic.com/en/docs/build-with-claude/pdf-support

> whereby one could put PDFs, images and other multi-modal files in context,
> with gptel-add. This would feed into what I've described at suggestion 2,
> where instead of viewing the context content directly, one instead has
> references to the source of that content, especially given that in light of
> this suggestion 3, that content may not be text and therefore not directly
> editable with Emacs.

Image and media support was added over a month ago, before the 0.9.6 release. You can indeed add supported document types using gptel-add, or as links in Org/Markdown chat buffers. Or did I misunderstand what you're talking about here?

PDF support for Claude 3.5 Sonnet (and only this model) is brand new. Anyway, I added it just now.

metachip · 2024-11-24T06:56:15Z

> So could you explain what exactly is missing? (If there are minor ergonomic deficiencies with the context buffer, we can address them.)

1. Collapsible Previews

I find the context buffer hard to navigate because the preview chunks can be huge. My preference would be for previews to be hidden or omitted entirely, so that the context buffer simply shows the list of references. To see the content, one simply hits RET on the reference, as you advised. Another (preferable) fix might be to hide the preview for each reference and reveal it with TAB, much as Magit does for its various "chunked views".

In other respects, the context buffer works well.

2. Context Management

Only one context can be maintained at a time, globally, regardless of how many concurrent chats one has open in different windows. It would be better if context could be managed and maintained per chat buffer. I sometimes have several concurrent conversations with different models on different subjects, and if those conversations require context, this becomes problematic.

There is a partial solution to this: if the context is image or PDF media, then standalone links in the conversation thread solve the problem in most cases.

However, if the context comes from other buffers, regions or local files, this workaround is not possible.

One can of course simply copy the content into the chat buffer, but then if changes are made to the source, they are not reflected back to the model the way they are when the content is included as context.

I have made a suggestion for how to improve this here:

Media (and other) links. #481

In a nutshell: allow content to be included in a chat by reference, via links in the conversation, to "anything that makes sense, so long as it's text", and not just to the media MIME types the model accepts, as is the case now.

karthink · 2024-11-24T07:04:50Z

I'll reply in detail when I can, but regarding point 2 I wanted to link to #475.

metachip · 2024-11-24T07:41:36Z

Thanks, I've fleshed out my thinking here: #475 (comment)

metachip added the "enhancement" (New feature or request) label on Nov 4, 2024

metachip mentioned this issue on Nov 27, 2024 in "Quick Context Suspension and Deletion" #486 (Closed)
