Forcing response_format to json #983

Open
andreadimaio opened this issue Oct 14, 2024 · 27 comments

Comments

@andreadimaio
Collaborator

The ChatLanguageModel interface provides a new method that can be implemented to force response_format to json when an AiService method returns a POJO. Quarkus could do this automatically.

@Experimental
default ChatResponse chat(ChatRequest request) {
	throw new UnsupportedOperationException();
}

This should be a simple change to the AiServiceMethodImplementationSupport class, but all current providers will need to be updated to manage this new method.

Does this make sense?

@geoand
Collaborator

geoand commented Oct 14, 2024

I think it does

@geoand
Collaborator

geoand commented Oct 17, 2024

cc @langchain4j

@geoand
Collaborator

geoand commented Oct 17, 2024

@andreadimaio do you want to work on this?

@andreadimaio
Collaborator Author

@andreadimaio do you want to work on this?

Yes, I'll open a new PR

@geoand
Collaborator

geoand commented Oct 17, 2024

🙏🏽

@andreadimaio
Collaborator Author

andreadimaio commented Oct 17, 2024

The implementation is a little more complex than what I had in mind, for the simple reason that OpenAI also has another response_format option called json_schema (watsonx.ai only supports json_object).

If the json_schema option is enabled, the API will also take the schema of the object as input. In this case it is useful to create this schema at build time.

@andreadimaio
Collaborator Author

andreadimaio commented Oct 17, 2024

I'm not an expert on the OpenAI APIs, but I think that if response_format is json_schema, it makes no sense to inject the message "You must answer strictly in the following JSON format: ..." into the prompt, because OpenAI itself makes sure the output follows the schema. The message can still be injected for the other types, TEXT and JSON_OBJECT, but this is a detail that can be overlooked for now.

@langchain4j

@andreadimaio it works like this in vanilla LC4j. If a schema can be passed, we do not append extra instructions.

@langchain4j

But passing a schema is currently supported only by OpenAI and Gemini.
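
As a rough illustration of the rule above (append the format instruction only when the schema cannot be passed as a separate request parameter), here is a minimal sketch. The class name, the method name, and the boolean flag are hypothetical and are not part of LangChain4j; only the instruction text comes from this thread.

```java
public class PromptInstructionSketch {

    // Instruction text quoted earlier in this thread.
    static final String JSON_INSTRUCTION = "You must answer strictly in the following JSON format: ";

    // Hypothetical helper: returns the user message that would be sent to the model.
    static String userMessage(String original, String jsonStructure, boolean schemaPassedSeparately) {
        if (schemaPassedSeparately) {
            // The provider receives the schema as a request parameter, so the prompt stays untouched.
            return original;
        }
        // Otherwise fall back to describing the expected structure in free-form text.
        return original + "\n" + JSON_INSTRUCTION + jsonStructure;
    }

    public static void main(String[] args) {
        String structure = "{\"firstName\": (type: string), \"lastName\": (type: string)}";
        System.out.println(userMessage("Extract the user from the text.", structure, true));
        System.out.println(userMessage("Extract the user from the text.", structure, false));
    }
}
```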

@andreadimaio
Collaborator Author

andreadimaio commented Oct 17, 2024

There's something that's not clear to me. Looking at class DefaultAiServices.java there are these lines:

Response<AiMessage> response;
if (supportsJsonSchema && jsonSchema.isPresent()) {
    ChatRequest chatRequest = ChatRequest.builder()
        .messages(messages)
        .toolSpecifications(toolSpecifications)
        .responseFormat(ResponseFormat.builder()
            .type(JSON)
            .jsonSchema(jsonSchema.get())
            .build())
        .build();

    ChatResponse chatResponse = context.chatModel.chat(chatRequest);

    response = new Response<>(
        chatResponse.aiMessage(),
        chatResponse.tokenUsage(),
        chatResponse.finishReason()
    );
} else {
    // TODO migrate to new API
    response = toolSpecifications == null ?
        context.chatModel.generate(messages) :
        context.chatModel.generate(messages, toolSpecifications);
}

The chat method is invoked only when the provider supports json_schema, but what about json_object? I would like to force the use of the chat method in that case as well. Maybe the Capability class should also contain this type.

Another note is about the default implementation of the chat method. It has all the parameters to call the generate method if the provider doesn't support the response_format. Isn't it better to have this kind of default implementation instead of throwing an exception? In this case it should be easier to handle the chat method for all model providers (maybe I'm missing something?!).
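
To make the suggestion concrete, here is a minimal sketch of a default chat method that delegates to the existing generate methods instead of throwing. The types are simplified, hypothetical stand-ins (plain strings instead of ChatMessage, ToolSpecification and AiMessage), so this is not the real LangChain4j API, only the shape of the idea.

```java
import java.util.List;

public class DefaultChatSketch {

    // Hypothetical, simplified stand-ins for ChatRequest and ChatResponse.
    record ChatRequest(List<String> messages, List<String> toolSpecifications) {}
    record ChatResponse(String aiMessage) {}

    interface ChatModel {
        String generate(List<String> messages);

        String generate(List<String> messages, List<String> toolSpecifications);

        // Default fallback: ignore response_format and reuse the generate methods,
        // so providers that do not support it keep working without any change.
        default ChatResponse chat(ChatRequest request) {
            String answer = request.toolSpecifications() == null
                    ? generate(request.messages())
                    : generate(request.messages(), request.toolSpecifications());
            return new ChatResponse(answer);
        }
    }

    // A toy provider that only implements the generate methods.
    static class EchoModel implements ChatModel {
        @Override
        public String generate(List<String> messages) {
            return "echo: " + messages.get(messages.size() - 1);
        }

        @Override
        public String generate(List<String> messages, List<String> toolSpecifications) {
            return "echo with " + toolSpecifications.size() + " tools: " + messages.get(messages.size() - 1);
        }
    }

    public static void main(String[] args) {
        ChatModel model = new EchoModel();
        System.out.println(model.chat(new ChatRequest(List.of("hello"), null)).aiMessage());
    }
}
```

A provider that does support response_format would simply override chat and read the format from the request.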

@langchain4j

@andreadimaio
Collaborator Author

andreadimaio commented Oct 17, 2024

Or is your idea to use RESPONSE_FORMAT_JSON_SCHEMA for both values (json_object, json_schema)? Maybe yes, because in the end the logic inside the chat method can inspect the value passed and make the correct call to the endpoint.

@langchain4j

I was planning to add another Capability for json_object. It should be easy as we know which providers support Json mode.

Regarding the default implementation of the chat method, you're right, it should call the generate methods. I actually implemented it this way initially, but then rolled back because I had some doubts about it. This is work in progress, I plan to get back to this new API soon. Eventually generate methods will be deprecated and providers will need to implement only one method: chat.

@langchain4j

Chat method is used only when Json capability is present because I had to rush this new chat API in order to enable structured outputs. Otherwise there was no way to pass the schema. WIP...

@andreadimaio
Collaborator Author

andreadimaio commented Oct 17, 2024

I was planning to add another Capability for json_object. It should be easy as we know which providers support Json mode.

Regarding the default implementation of the chat method, you're right, it should call the generate methods. I actually implemented it this way initially, but then rolled back because I had some doubts about it. This is work in progress, I plan to get back to this new API soon. Eventually generate methods will be deprecated and providers will need to implement only one method: chat.

Chat method is used only when Json capability is present because I had to rush this new chat API in order to enable structured outputs. Otherwise there was no way to pass the schema. WIP...

Thank you!

@geoand what do you suggest regarding the implementation of this functionality in quarkus-langchain4j? I could go ahead and implement what is there today, or wait for a new release.

@geoand
Collaborator

geoand commented Oct 18, 2024

I could go ahead and implement what is there today

You can go ahead and do that here, and when the feature lands in LangChain4j we can utilize it.

@langchain4j

I've been thinking about it and I am considering using tools (function calling) instead of JSON mode when return type is POJO and Structured Outputs feature is not supported (e.g. when LLM provider is not OpenAI or Gemini):

  • Tools are supported by 14 LLM providers, JSON mode only by 7: https://docs.langchain4j.dev/integrations/language-models/
  • Tools allow setting the JSON schema in a standardized way (although with no guarantee of 100% schema adherence)
  • Many models are fine-tuned to produce valid JSON for tools
  • No need to modify original user prompt (to append "You must answer strictly in the following format...")

This is how it can work:

if (isStructuredOutputType(methodReturnType)) { // e.g. POJO, enum, List<T>/Set<T>, etc.
    if (chatModel.supportedCapabilities().contains(RESPONSE_FORMAT_JSON_SCHEMA)) {
        // Proceed with generating JSON schema and passing it to the model using structured outputs feature.
        // This will work for OpenAI and Gemini.
    } else if (chatModel.supportedCapabilities().contains(TOOLS)) {
        // Create synthetic tool "answer" and generate JSON schema for it
        if (configuredTools.isEmpty()) {
            // The "answer" is the only tool, so we will *force* the model to call this tool using tool_mode LLM parameter (will be available in the new ChatModel API)
        } else {
            // There are other tools that user has configured. It means that LLM could/should use one or multiple of them before providing the final answer.
            // I am not sure yet what is the best solution in this case. For example, we could add "final_answer" to the list of tools and hope that LLM will use it to provide the answer.
            // We could also append a hint to the prompt (e.g. "Use final_answer tool to provide a final answer").
            // Or we could call the LLM in a loop (if the LLM decides to call tools) until it returns a final answer in plain text, and then call it again with only the "answer" tool available and force it to call it with the tool_mode parameter.
            // There can be multiple strategies and we could make this configurable for the user.
            // Please note that this is probably a pretty rare use case (when the user needs both structured outputs and tools).
        }
    } else {
        // Fallback to appending "You must answer strictly in the following format..." to the prompt.
    }
}

WDYT?

@langchain4j

We can also make "what to use to get structured output from LLM" as a configurable strategy that user can specify explicitly (e.g. USE_STRUCTURED_OUTPUTS, USE_TOOLS, USE_JSON_MODE, USE_PROMPTING, etc.)

@andreadimaio
Collaborator Author

I've been thinking about it and I am considering using tools (function calling) instead of JSON mode when return type is POJO and Structured Outputs feature is not supported

I am not entirely sure how the tools would enforce valid JSON generation from the LLM in this context. Is the primary role of the tools to generate the schema, or is it used to handle response formatting after the model has generated the output?

I think we need to be cautious about tools' functionality. Some model providers, like Ollama, support tools, but not for all the hosted models. In these cases, the chatModel.supportedCapabilities().contains(TOOLS) could introduce issues when certain models do not fully support tools.

I've been thinking about it and I am considering using tools (function calling) instead of JSON mode

Regarding the JSON mode, I think that combined with the "You must answer strictly in the following format..." can give good results even for "small" models. Today, models tend to return the JSON structure given in the prompt, but this does not mean that the desired structure is 100% present.

We can also make "what to use to get structured output from LLM" as a configurable strategy that user can specify explicitly (e.g. USE_STRUCTURED_OUTPUTS, USE_TOOLS, USE_JSON_MODE, USE_PROMPTING, etc.)

I agree. Having this as a configurable option is ideal from my perspective. However, we should be careful with USE_TOOLS.

@andreadimaio
Collaborator Author

andreadimaio commented Oct 22, 2024

If the provider returns an error when trying to use a model that does not support tools, the consideration I made about using USE_TOOLS can be ignored.

@langchain4j

I am not entirely sure how the tools would enforce valid JSON generation from the LLM in this context. Is the primary role of the tools to generate the schema, or is it used to handle response formatting after the model has generated the output?

When an LLM supports tools, you can provide a JSON schema and the LLM will generate valid JSON that follows the schema (in roughly 95% of cases, depending on the complexity of the schema). LC4j automatically generates the JSON schema from the parameters of a @Tool-annotated method. Most modern LLMs are explicitly trained for the "tool calling" use case to produce valid JSON that follows the provided schema. Does this answer your question?

I think we need to be cautious about tools' functionality. Some model providers, like Ollama, support tools, but not for all the hosted models. In these cases, the chatModel.supportedCapabilities().contains(TOOLS) could introduce issues when certain models do not fully support tools.

Good point, this is why we should make this behavior configurable.

Regarding the JSON mode, I think that combined with the "You must answer strictly in the following format..." can give good results even for "small" models. Today, models tend to return the JSON structure given in the prompt, but this does not mean that the desired structure is 100% present.

I agree that JSON mode works pretty well, but tools are more reliable than JSON mode by design. JSON mode only "guarantees" (about 95% of the time) that the returned text is valid JSON. One can provide the JSON schema as free-form text in the prompt, but there is no guarantee that the LLM will follow it. Tools, on the other hand, "guarantee" (again, about 95%) that the returned text is not only valid JSON, but also follows the specified schema. In this case the schema is specified in a standardized way (as a separate LLM request parameter) and not appended as free-form text to the user message. Since tool-calling LLMs are tuned to follow the schema, and there is only a single way to specify it, this is more reliable than appending the schema as free-form text to the user prompt.

I think we need to be cautious about tools' functionality. Some model providers, like Ollama, support tools, but not for all the hosted models. In these cases, the chatModel.supportedCapabilities().contains(TOOLS) could introduce issues when certain models do not fully support tools.

If the provider returns an error when trying to use a model that does not support tools, the consideration I made about using USE_TOOLS can be ignored.

Good point! I guess this concern is applicable mostly for Ollama, as all other LLM providers that support tools, usually support them for all their models (at least I see this trend lately). Ollama throws an error in case tools are not supported by specific model: {"error":"tinydolphin does not support tools"}, so we should be safe.

@andreadimaio
Collaborator Author

When an LLM supports tools, you can provide a JSON schema and the LLM will generate valid JSON that follows the schema (in roughly 95% of cases, depending on the complexity of the schema). LC4j automatically generates the JSON schema from the parameters of a @Tool-annotated method. Most modern LLMs are explicitly trained for the "tool calling" use case to produce valid JSON that follows the provided schema. Does this answer your question?

In part, I want to understand the actual use of tools to solve this problem. I have something in mind, but I don't know if we're on the same page. Suppose I have an LLM that needs to extract some user info, and this is the output pojo:

record User(String firstName, String lastName) {}

Your idea is to have a tool method like this to generate the correct JSON?

@Tool("Generates a response in the required JSON format.")
public User answer(String firstName, String lastName) {
    return new User(firstName, lastName);
}

Good point! I guess this concern is applicable mostly for Ollama, as all other LLM providers that support tools, usually support them for all their models (at least I see this trend lately). Ollama throws an error in case tools are not supported by specific model: {"error":"tinydolphin does not support tools"}, so we should be safe.

👍

@langchain4j

langchain4j commented Oct 22, 2024

@andreadimaio no, the idea is to automatically create

ToolSpecification.builder()
    .name("answer")
    .addParameter("firstName")
    .addParameter("lastName")
    .build()

under the hood of AI Service and inject it in the request to the LLM and force it to use this tool.

In this case user does not have to do anything, LLM will be forced to reply by calling answer tool and provide a valid JSON which we will deserialize into User object and return to the user.
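
A minimal sketch of what "automatically create the tool and deserialize the answer" could look like, using only the JDK. SimpleToolSpec is a hypothetical, simplified stand-in for ToolSpecification, and the tool arguments are assumed to be already parsed from JSON into a Map. A real implementation would also generate types and descriptions for the schema.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AnswerToolSketch {

    record User(String firstName, String lastName) {}

    // Hypothetical stand-in for a tool specification: a tool name plus its parameter names.
    record SimpleToolSpec(String name, List<String> parameters) {}

    // Derive the synthetic "answer" tool from the record components of the return type.
    static SimpleToolSpec answerToolFor(Class<? extends Record> type) {
        List<String> parameters = new ArrayList<>();
        for (RecordComponent component : type.getRecordComponents()) {
            parameters.add(component.getName());
        }
        return new SimpleToolSpec("answer", parameters);
    }

    // Build the record from the arguments of the "answer" tool call (assumed already parsed from JSON).
    static <T extends Record> T fromToolArguments(Class<T> type, Map<String, Object> arguments) {
        RecordComponent[] components = type.getRecordComponents();
        Class<?>[] parameterTypes = new Class<?>[components.length];
        Object[] values = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            parameterTypes[i] = components[i].getType();
            values[i] = arguments.get(components[i].getName());
        }
        try {
            // The canonical constructor takes the components in declaration order.
            Constructor<T> constructor = type.getDeclaredConstructor(parameterTypes);
            constructor.setAccessible(true);
            return constructor.newInstance(values);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot build " + type.getSimpleName() + " from tool arguments", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(answerToolFor(User.class));
        System.out.println(fromToolArguments(User.class, Map.of("firstName", "Ada", "lastName", "Lovelace")));
    }
}
```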

@langchain4j

langchain4j commented Oct 22, 2024

Python version of LC is actually using tool calling as a primary way for structured outputs: https://python.langchain.com/docs/how_to/structured_output/#the-with_structured_output-method

@langchain4j

In this case tools are kind of "misused" for returning structured output

@andreadimaio
Collaborator Author

@andreadimaio no, the idea is to automatically create

ToolSpecification.builder()
    .name("answer")
    .addParameter("firstName")
    .addParameter("lastName")
    .build()

under the hood of AI Service and inject it in the request to the LLM and force it to use this tool.

In this case user does not have to do anything, LLM will be forced to reply by calling answer tool and provide a valid JSON which we will deserialize into User object and return to the user.

Yes, it was just an example, of course everything will be automatic. So we are on the same page :)
Now it is clear to me, thanks!

@langchain4j

Just to document this explicitly, here is the order (from best to worst) of strategies to get structured outputs in AI Services (this is not implemented yet, just a plan):

  1. Use the Structured Outputs feature (currently supported only by OpenAI and Gemini, Azure OpenAI is in progress, I also foresee other providers implementing this feature in the future) and provide the JSON schema in a structured way.
  2. Use the Tools feature (currently supported by most LLM providers and considered a standard feature) and provide the JSON schema in a structured way.
  3. Automatically enable JSON mode (if supported) and append JSON schema in a free-form text to the user message.
  4. Append JSON schema in a free-form text to the user message.

The user should be able to override this logic and explicitly specify which strategy to use.
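
The ordering above, including the explicit override, could be sketched roughly like this. The Strategy names come from the earlier comment in this thread. The Capability enum is a simplified stand-in, and RESPONSE_FORMAT_JSON_OBJECT in particular is a hypothetical name for the planned JSON mode capability.

```java
import java.util.EnumSet;
import java.util.Set;

public class StructuredOutputStrategySketch {

    // Simplified stand-in; RESPONSE_FORMAT_JSON_OBJECT is a hypothetical name.
    enum Capability { RESPONSE_FORMAT_JSON_SCHEMA, TOOLS, RESPONSE_FORMAT_JSON_OBJECT }

    enum Strategy { USE_STRUCTURED_OUTPUTS, USE_TOOLS, USE_JSON_MODE, USE_PROMPTING }

    // Automatic selection: walk the strategies from best to worst.
    static Strategy select(Set<Capability> supported) {
        if (supported.contains(Capability.RESPONSE_FORMAT_JSON_SCHEMA)) {
            return Strategy.USE_STRUCTURED_OUTPUTS;
        }
        if (supported.contains(Capability.TOOLS)) {
            return Strategy.USE_TOOLS;
        }
        if (supported.contains(Capability.RESPONSE_FORMAT_JSON_OBJECT)) {
            return Strategy.USE_JSON_MODE;
        }
        // Always available: append the JSON schema as free-form text to the user message.
        return Strategy.USE_PROMPTING;
    }

    // An explicit user choice (non-null) wins over the automatic selection.
    static Strategy select(Set<Capability> supported, Strategy userOverride) {
        return userOverride != null ? userOverride : select(supported);
    }

    public static void main(String[] args) {
        Set<Capability> toolsAndJsonMode = EnumSet.of(Capability.TOOLS, Capability.RESPONSE_FORMAT_JSON_OBJECT);
        System.out.println(select(toolsAndJsonMode));
        System.out.println(select(toolsAndJsonMode, Strategy.USE_JSON_MODE));
    }
}
```

Whether an override should be validated against the capabilities of the model (for example, rejecting USE_TOOLS for a model without tool support) is left open here, as it is in the plan.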

@langchain4j

cc @glaforge @jdubois @agoncal
