ChatOllama in "astream_events" is switching from "on_chat_model_stream" to "on_llm_stream" #29383

Open
weissenbacherpwc opened this issue Jan 23, 2025 · 0 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature


Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

This is my astream_events code:

import asyncio
import json

from langchain_core.messages import HumanMessage

# embeddings, reranking_model, llm, CONNECTION_STRING, langfuse_handler, chain_states,
# model_type_for_astream_event, load_retriever, create_workflow_app and
# serialize_documents are defined elsewhere in my project.

# The async chain loaded with an LLM from the Groq API
async def langgraph_agentic_pipeline(question: str, collection_name: str, streamlit_username: str, chat_with_doc_use_case: bool = False):
    """
    This function represents the Langgraph RAG pipeline with Chat Model, in this case from the Groq-API.
    First the retriever is initialized.
    Afterwards the response is streamed async, based on the question.
    With Chat Models, events are called: "on_chat_model_end" or "on_chat_model_stream".
    
    params:
    question (str): The question of a user
    collection_name (str): The collection name of the vectordatabase of the confluence export
    chat_with_doc_use_case (bool): If set to True, the retriever will be loaded for the Chat-with-your-document Use Case with lower thresholds than defined in the config.
    """    
    if not chat_with_doc_use_case:
        retriever = load_retriever(
            collection_name=collection_name,
            embeddings=embeddings,
            reranking_model=reranking_model,
            CONNECTION_STRING=CONNECTION_STRING
        )   
        chain = create_workflow_app(retriever=retriever, model=llm)
    else:        
        print("Loading Retriever for 'Chat-with-your-document' use case'")
        retriever = load_retriever(
            collection_name=collection_name,
            embeddings=embeddings,
            reranking_model=reranking_model,
            reranking_top_k=10,
            threshold_after_colbert_reranking=8,
            threshold_after_cross_encoder_reranking=0.3,
            CONNECTION_STRING=CONNECTION_STRING
        )   
        chain = create_workflow_app(retriever=retriever, model=llm, langgraph_mode="basic_agentic_workflow")
    thread_id = streamlit_username # E.g. if thread_id changes from id_1 to id_2, the message history will be empty
    input_message = HumanMessage(content=question)
    config = {
            "configurable": {"thread_id": thread_id}, #for every user, a different thread_id should be selected
            'callbacks': [langfuse_handler] #if you are not using Langfuse, use 'ConsoleCallbackHandler' for 'callbacks' 
        }
    if chain_states.get(thread_id) and bool(chain_states[thread_id]):
        chain.update_state(
            config,
            chain_states[thread_id]
        )
    try:
        async for event in chain.astream_events(
            {"messages":  [input_message]},
            version="v1",
            config=config
        ):
            print(event)
            if event["event"] == f"on_{model_type_for_astream_event}_start" and event.get("metadata", {}).get("langgraph_node") == "generate":
                print("Stream started...")
                if model_type_for_astream_event == "llm":
                    prompt_length = len(event["data"]["input"]["prompts"][0])
                else:
                    prompt_length = len(event["data"]["input"]["messages"][0][0].content)
                yield f'data: {json.dumps({"type": "prompt_length_characters", "content": prompt_length})}\n\n'
                yield f'data: {json.dumps({"type": "prompt_length_tokens", "content": prompt_length / 4})}\n\n'
                
            if event["event"] == f"on_{model_type_for_astream_event}_stream" and event.get("metadata", {}).get("langgraph_node") == "generate":
                if model_type_for_astream_event == "llm":
                    chunks = event["data"]['chunk']
                else:
                    chunks = event["data"]['chunk'].content
                # Serialize the chunk to a JSON-safe format
                yield f'data: {json.dumps({"type": "chunk", "content": chunks})}\n\n'
                await asyncio.sleep(0.1) 
                
            elif event["event"] == "on_chain_end" and event.get("metadata", {}).get("langgraph_node") == "format_docs" and event["name"] == "format_docs":
                retrieved_docs = event["data"]["input"]["raw_docs"]
                serialized_docs = serialize_documents(retrieved_docs)
                yield f'data: {{"type": "docs", "content": {serialized_docs}}}\n\n'
    finally:
        states = chain.get_state(config).values
        #print(states)
        chain_states[thread_id] = states
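
To isolate the behaviour without the SSE formatting, a trimmed helper like the following (a sketch only, reusing the chain and config objects built above) can record which model stream event names show up per run:

# Sketch: collect the model stream event names emitted for one question.
async def dump_stream_event_names(chain, config, question):
    names = set()
    async for event in chain.astream_events(
        {"messages": [HumanMessage(content=question)]},
        version="v1",
        config=config,
    ):
        if event["event"] in ("on_chat_model_stream", "on_llm_stream"):
            names.add(event["event"])
    return names

In my runs this contains 'on_chat_model_stream' for the first question of a session and 'on_llm_stream' when the same question is asked again.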

Error Message and Stack Trace (if applicable)

No response

Description

Hi,

I am using ChatOllama and astream_events to stream my responses, and I see weird behaviour with the astream_events function. When I call my LLM for the first time in a session, streaming works as expected and the event type is 'on_chat_model_stream'. However, when I run the same question again, the event type switches to 'on_llm_stream'. Because of this, my code no longer matches the new event and streaming stops working.

I see there is a metadata field "ls_model_type": "chat" which might be another way to handle this; a workaround sketch based on accepting both event names follows the two event dumps below. Still, I wanted to report this weird behaviour.

First question:
{'event': 'on_chat_model_stream', 'name': 'ChatOllama', 'run_id': '80026f5b-cc9b-443a-a9f2-3f14188d25a6', 'tags': ['seq:step:1'], 'metadata': {'thread_id': 'maxiw', 'langgraph_step': 3, 'langgraph_node': 'generate', 'langgraph_triggers': ['format_docs'], 'langgraph_task_idx': 0, 'thread_ts': '1efd9a0b-1fca-6892-8002-4d1eb12a4ad1', 'ls_provider': 'ollama', 'ls_model_name': 'maxiweissenbacher/sauerkrautlm-nistral-nemo-12b-instruct:fp16', 'ls_model_type': 'chat', 'ls_temperature': 0.1}, 'data': {'chunk': AIMessageChunk(content='-Le', id='run-80026f5b-cc9b-443a-a9f2-3f14188d25a6')}, 'parent_ids': []}

Second question:
{'event': 'on_llm_stream', 'name': 'ChatOllama', 'run_id': 'f16ce854-ba43-4225-8418-fbe5c1a3a6da', 'tags': ['seq:step:1'], 'metadata': {'thread_id': 'maxiw', 'langgraph_step': 5, 'langgraph_node': 'generate', 'langgraph_triggers': ['no_docs_found_question'], 'langgraph_task_idx': 0, 'thread_ts': '1efd9a10-83d9-69dc-8004-84ec81d877c6', 'ls_provider': 'ollama', 'ls_model_name': 'maxiweissenbacher/sauerkrautlm-nistral-nemo-12b-instruct:fp16', 'ls_model_type': 'chat', 'ls_temperature': 0.1}, 'data': {'chunk': AIMessageChunk(content=' Antwort', id='run-f16ce854-ba43-4225-8418-fbe5c1a3a6da')}, 'parent_ids': []}
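
One possible workaround is to accept both event names; a minimal sketch (the helper name is mine and is not part of the pipeline above):

# Workaround sketch: treat an event as a model stream chunk if either event name matches.
def extract_stream_text(event):
    if event["event"] not in ("on_chat_model_stream", "on_llm_stream"):
        return None
    chunk = event["data"]["chunk"]
    # The chunk may be an AIMessageChunk (with .content) or already a plain string,
    # mirroring the llm/chat_model branches in the pipeline above.
    return chunk.content if hasattr(chunk, "content") else chunk

In the astream_events loop above, this would replace the two model_type_for_astream_event branches; the langgraph_node == "generate" check stays the same.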

System Info

ollama 0.4.7
langchain 0.2.7
langchain-community 0.2.7
langchain-core 0.2.43
langchain-experimental 0.0.63
langchain-groq 0.1.5
langchain-huggingface 0.0.3
langchain-ollama 0.1.1
langchain-openai 0.1.20
langchain-postgres 0.0.12

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 23, 2025