high latency #2721
Comments
Could you share your Kubernetes deployment configuration? I can see you are running with DEBUG log level - can you share logs from the server in your deployment with a few predictions logged?
logs:
Okay, from the logs it looks like prediction takes ~1.4 s. Are you sure there are no resource constraints on your deployment?
I have been using this OpenVINO server for the last two years without any issues. However, we started encountering this problem in the most recent deployments, even though we are using the same resources as in previous deployments. This is the resource column for the deployment:
Your standalone server does not have CPU limitations, but your Kubernetes deployment does. Please try removing CPU from limits and requests in the resources field and check if performance gets closer to the standalone instance.
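A minimal sketch of what that change could look like, assuming a plain OVMS Deployment (the deployment name, image tag, model name, and memory values below are hypothetical, not taken from this issue):

```yaml
# Hypothetical OVMS Deployment with the cpu entries removed from resources.
# Memory requests/limits are kept to protect the node; with no cpu limit,
# the kubelet applies no CFS quota, so inference threads are not throttled.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ovms
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ovms
  template:
    metadata:
      labels:
        app: ovms
    spec:
      containers:
        - name: ovms
          image: openvino/model_server:latest
          args: ["--model_name", "my_model", "--model_path", "/models/my_model", "--port", "9000"]
          resources:
            requests:
              memory: "2Gi"
            limits:
              memory: "4Gi"   # note: no cpu under requests or limits
```

The reasoning behind the suggestion: a cpu limit becomes a CFS quota, and once OpenVINO's worker threads exhaust that quota within a scheduling period the whole container is throttled until the next period, which can show up as multi-second prediction latency.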
Sorry for the delayed response. We have tried it, but it didn't work. Now this issue is occurring in every deployment. We are running inference containers outside the cluster. Could you help us understand what might be causing this? It seems like there could be issues in other parts of the platform as well, right? However, from the logs, it's clear that the server itself is taking more than 2 seconds per inference.
So you observe the same issue outside Kubernetes?
From the logs I would say that only a fraction of the CPU is used, as if some limitation were set. I don't know the details of your platform, but you can try running it in isolation and sending requests manually to make sure it behaves the same.
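One way to run that isolated check (a sketch, not taken from the thread) is to start the same container directly and time single requests against the TensorFlow-Serving-style REST endpoint that OVMS exposes. The port, model name, and input shape below are assumptions based on the 640-pixel model mentioned in this issue:

```python
# Minimal latency probe for an OVMS REST endpoint.
# Assumptions: server reachable on localhost:8000, a model named "my_model",
# and a single NCHW float32 input of shape 1x3x640x640.
import time

import numpy as np
import requests

URL = "http://localhost:8000/v1/models/my_model:predict"  # TFS-compatible API
payload = {"instances": np.random.rand(1, 3, 640, 640).astype("float32").tolist()}

requests.post(URL, json=payload).raise_for_status()  # warm-up request

for _ in range(5):
    start = time.perf_counter()
    resp = requests.post(URL, json=payload)
    resp.raise_for_status()
    print(f"latency: {time.perf_counter() - start:.3f}s")
```

If the same container shows sub-10 ms latency when run this way but takes seconds inside the cluster, the slowdown is coming from the pod's resource configuration rather than from the server itself.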
Outside of Kubernetes, it is working fine. The previous deployments done on Kubernetes are facing the issue. I will try removing all limitations inside the cluster.
Latency in OpenVINO Model Server Inside Kubernetes Cluster
To Reproduce
Steps to reproduce the behavior:
Environment
I am using OpenVINO Model Server for model deployment on edge devices with a 13th Gen Intel Core i7 processor. My PyTorch model is trained with an image size of 640.
Expected Behavior
I expected the inference time for the OpenVINO Model Server inside the Kubernetes cluster to be comparable to that observed outside the cluster (i.e., <0.01 seconds).
model directory format
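For reference, OVMS expects a model directory containing numbered version subdirectories with the OpenVINO IR files; the model name below is a placeholder:

```
models/
└── my_model/          # --model_path points here
    └── 1/             # numeric version directory
        ├── model.xml  # IR topology
        └── model.bin  # IR weights
```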