Environment
OS : Linux
Python version : 3.8.3
Transformers version : 4.17.0
Whether to use Docker: Yes
Misc.:
branch: main
How to reproduce
First of all, thanks for this great project!
I'm facing an issue running the test code provided here on Kubernetes.
This is what I'm running inside a Kubeflow pod:
I'm using a g4dn.12xlarge AWS machine with four T4 GPUs.
The pod hangs when executing this line until I manually terminate it.
I suspected this change might have been the culprit, so I ran the same code with v1.2.4 of parallelformers. This time, the pod quits during execution of the same line without outputting any errors, which is odd.
Notably, if I run the same command without --use-pf, it runs fine.
I saw you've reported some problems using Docker. However, memory should not be an issue here since I'm using the Helsinki-NLP/opus-mt-en-zh model, which is relatively small.
I was wondering if the parallelformers code has ever been tested on Kubernetes?
I would also appreciate it if you could look into this issue. Thanks!