-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Introduced a new condition to yield StreamingChatMessageContent
directly when usage data is available.
#9753
base: main
Are you sure you want to change the base?
Conversation
Hi @ymuichiro, thank you for your contribution! If you read the comments made to this particular Did you observe a different behavior with Azure OpenAI? |
@TaoChenOSU Am I misunderstanding something? It does indeed feel odd that it works even with older API versions. ↓ success versions and sample code
The following was created directly in the shell to prevent any misunderstandings due to other causes, but even when using Semantic Kernel, the same error could not be reproduced. payload="{\
\"messages\": [\
{\
\"role\": \"user\",\
\"content\": [\
{\
\"type\": \"text\",\
\"text\": \"hi.\"\
}\
]\
}\
],\
\"temperature\": 0.7,\
\"top_p\": 0.95,\
\"stream\": true,\
\"stream_options\": { \"include_usage\": true },\
\"max_tokens\": 10\
}"
curl "https://${***********************}.openai.azure.com/openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-10-21" \
-H "Content-Type: application/json" \
-H "api-key: **************************" \
-d "$payload" async def stream_sample() -> None:
kernel = sk.Kernel()
service_id: str = "dummy"
kernel.add_service(
AzureChatCompletion(
service_id=service_id,
deployment_name=AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME,
endpoint=AZURE_OPENAI_COMPLETION_ENDPOINT,
api_key=AZURE_OPENAI_COMPLETION_API_KEY,
api_version="2024-06-01",
)
)
service = kernel.get_service(service_id=service_id)
settings = service.get_prompt_execution_settings_class()(service_id=service_id)
if isinstance(settings, AzureChatPromptExecutionSettings):
settings.extra_body = {
"stream_options": {
"include_usage": True,
}
}
history = ChatHistory()
history.add_user_message("hello")
async for chunk in service.get_streaming_chat_message_contents(
chat_history=history,
settings=settings,
kernel=kernel,
arguments=KernelArguments(settings=settings),
):
print(chunk) |
Sorry, I used the wrong account but it's the same person. |
Hi @ymuichiro, I just verified. Seems like they have resolved the issue. Could you remove the override of |
…de of _inner_get_streaming_chat_message_contents has been removed.
hi @TaoChenOSU sure, is this ok? |
python/semantic_kernel/connectors/ai/open_ai/services/azure_chat_completion.py
Outdated
Show resolved
Hide resolved
Yes, this is right. Just one minor comment and we are good! |
We have a unit test failure because of the change. Could you fix that too? |
…atCompletionBase.
Sorry. I missed it. I found out that stream_options is forced, so I added the argument to the test code. I verified that the test runs successfully locally. |
Motivation and Context
issue: #9751
This pull request addresses a bug where setting
stream_options.include_usage
toTrue
does not return token usage, resulting inNone
for theusage
field.The issue occurs when using Azure OpenAI's GPT-4o and GPT-4omini models. In particular, if the last chunk of the response has an empty
choices
list, the chunk is skipped entirely, and the token usage is not processed correctly.In the Azure OpenAI implementation, if
usage
information is included, the chunk should be processed appropriately. However, the current code skips processing whenchoices
is empty. This pull request fixes this behavior so that the chunk is processed whenusage
is present, even ifchoices
is empty.Description
This fix includes the following changes:
azure_chat_completion.py
to ensure that chunks with emptychoices
are not skipped ifusage
information is present.if len(chunk.choices) == 0:
was updated to allow chunks withusage
data to be processed correctly.With these changes, setting
stream_options.include_usage
toTrue
will correctly return token usage data, even for chunks where thechoices
list is empty.Contribution Checklist