Poor Inference Parallelism From Different JVMs On Same Host #2132
- Thanks for your report. Could you share a code snippet of your setup, along with a stack trace or flame chart of your JVM process? Please also feel free to email [email protected] to schedule a meeting with us to help unblock you.
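A minimal sketch of one way to capture the requested JVM stacks from inside the process while an inference call appears blocked (the `StackDump` class name is hypothetical; `jstack <pid>` or an async-profiler flame graph would give the same or richer information):

```java
import java.util.Map;

// Minimal sketch: print the stack of every live thread in this JVM.
// Useful for spotting where inference threads are blocked; jstack or a
// profiler would provide the same data with less code.
public final class StackDump {

    public static void dump() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            System.out.println("Thread: " + thread.getName() + " (" + thread.getState() + ")");
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }

    private StackDump() {}
}
```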
- @malcolm-mccarthy Secondly, if you run inference in parallel, you need to tune your PyTorch thread pool to avoid thread contention (even in the multi-process case); see: https://docs.djl.ai/master/docs/development/inference_performance_optimization.html#thread-configuration
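A minimal sketch of that thread configuration, assuming the `ai.djl.pytorch.num_threads` and `ai.djl.pytorch.num_interop_threads` system properties described in the linked page; the values shown (one thread each) are illustrative and should be tuned to the core count and the number of JVMs sharing the host:

```java
// Minimal sketch: cap each JVM's PyTorch intra-op and inter-op thread pools so
// several processes on one host do not oversubscribe the CPU. These properties
// must be set before the PyTorch engine is first loaded (e.g. at the top of
// main, or passed as -D flags on the command line).
public final class PyTorchThreadConfig {

    public static void apply() {
        System.setProperty("ai.djl.pytorch.num_threads", "1");          // intra-op pool
        System.setProperty("ai.djl.pytorch.num_interop_threads", "1");  // inter-op pool
    }

    private PyTorchThreadConfig() {}
}
```

Depending on the native build, capping `OMP_NUM_THREADS` (and `MKL_NUM_THREADS`) per process is an alternative way to limit the native library's thread usage.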
- Hello, we are running multiple copies of a Java application on the same host that use DJL/PyTorch/TorchScript to generate inferences. We expected good parallel behaviour for the inferences, since the JVMs are independent of one another. Oddly, however, we are seeing behaviour consistent with some sort of blocking activity within DJL/PyTorch/TorchScript that causes the inference calls from different JVMs to queue up behind each other.
We are running on Red Hat 8 with the following Gradle dependencies:
implementation "ai.djl.pytorch::pytorch-engine:0.16.0"
implementation "ai.djl.pytorch::pytorch-jni:1.10.0-0.16.0"
implementation "ai.djl.pytorch::pytorch-native-cpu-precxx11:1.10.0:linux-x86_64"