awscurl: Missing token metrics when -t option specified #2340
Comments
You can use the "-j" parameter to define a JSON query for the Triton server output; see this test code: https://github.com/deepjavalibrary/djl-serving/blob/master/awscurl/src/test/java/ai/djl/awscurl/AwsCurlTest.java#L455-L456
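For illustration, a run combining token metrics with a JSON query might look like the following sketch (the placeholders and the exact JSONPath expression are illustrative, assuming -j accepts a JSONPath-style query as the linked test suggests):

TOKENIZER=<path_to_tokenizer> ./awscurl -c 1 -N 10 -X POST -n sagemaker <triton_endpoint> \
  --dataset <path_to_dataset> -H 'Content-Type: application/json' -P -t \
  -j "$.outputs[?(@.name=='generated_text')].data"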
Hello Frank, thank you for the valuable suggestion. Should a JSON query like "$.outputs[?(@.name=='generated_text')].data[0]": "$.response" work? Note that data is a JSON string, and I was wondering whether the execution code can process it accordingly by extracting the content of the response field (so that the tokens associated with the key are not counted). All the best!
I noticed that tokenThroughput = (totalTokens * 1000000000d / totalTime * clients). Could you please explain the meaning behind the 1000000000d constant and what tokenThroughput ends up measuring?
The totalTime is in nanoseconds; the constant converts it to seconds.
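As a minimal worked example of that conversion (the numbers below are hypothetical; only the formula comes from the code quoted above):

// totalTime is in nanoseconds, so the 1e9 factor converts
// tokens per nanosecond into tokens per second.
long totalTokens = 2000;         // hypothetical token count across all requests
long totalTime = 4_000_000_000L; // hypothetical: 4 seconds, in nanoseconds
int clients = 2;                 // hypothetical number of concurrent clients
double tokenThroughput = totalTokens * 1000000000d / totalTime * clients;
// 2000 / 4 * 2 = 1000, i.e. tokens per second aggregated across all clients

So tokenThroughput ends up measuring aggregate tokens per second across all concurrent clients.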
I don't think we can parse the nested JSON string.
Thanks for the reply. I also opened another awscurl bug ticket related to the tokenizer behavior. I would be very grateful if you could have a look. All the best!
Hello Frank, token metrics are no longer computed when specifying a JSON query.
Description
When requesting token metrics from an endpoint running an LMI container with a vLLM engine, non-zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (as expected).
When requesting token metrics from an endpoint running a Triton Inference Server, zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (unexpected). The Triton endpoint was tested successfully to verify that it responds to both individual and concurrent requests (it produces the expected output for the given inputs).
One difference between the two setups is the schema of the input requests and the output response. Specifically, the Triton endpoint operates with a different input schema and produces output structured differently from the LMI endpoint. Do you suspect this might be why the token metrics are not computed?
Expected Behavior
Return token metrics when the -t option is specified
Error Message
Zero-valued token metrics
How to Reproduce?
TOKENIZER=<path_to_tokenizer> ./awscurl -c 1 -N 10 -X POST -n sagemaker <triton_endpoint> --dataset <path_to_dataset> -H 'Content-Type: application/json' -P --connect-timeout 60
Triton input schema:
{"inputs": [
{"name": str,
"shape": [int],
"datatype": str,
"data: [str]}
]
}
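For illustration, a concrete request following this schema might look like this (all values are hypothetical):

{
  "inputs": [
    {
      "name": "text_input",
      "shape": [1],
      "datatype": "BYTES",
      "data": ["What is Deep Java Library?"]
    }
  ]
}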
Triton output schema:
{
  "model_name": str,
  "model_version": str,
  "outputs": [
    {
      "name": str,
      "shape": [int],
      "datatype": str,
      "data": [str]
    }
  ]
}
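For illustration, a response following this schema might look like this (all values are hypothetical); note how data carries a serialized JSON string, which is presumably why the token counter cannot reach the nested response field:

{
  "model_name": "ensemble",
  "model_version": "1",
  "outputs": [
    {
      "name": "generated_text",
      "shape": [1],
      "datatype": "BYTES",
      "data": ["{\"response\": \"Deep Java Library is ...\"}"]
    }
  ]
}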