I have been using the de-en and de-cs models on the same dataset (a few hundred thousand texts) and noticed that the English model needs a lot more memory than the Czech one. I'm running on an A100 GPU (40 GB of memory).
In practice, I ended up with an English batch size less than half of the Czech one, even though the model configs say the two models are roughly the same size; the only difference is that the de-cs vocabulary is slightly larger.
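For reference, a minimal sketch of how the peak-memory comparison could be reproduced, assuming the models are loaded as the Hugging Face MarianMT checkpoints `Helsinki-NLP/opus-mt-de-en` and `Helsinki-NLP/opus-mt-de-cs` (the exact checkpoints and loading code are my assumption, not stated above):

```python
import torch
from transformers import MarianMTModel, MarianTokenizer

def peak_memory_mb(model_name: str, texts: list[str], batch_size: int) -> float:
    """Translate `texts` in batches and return peak GPU memory in MiB."""
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to("cuda").eval()
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size],
                              return_tensors="pt",
                              padding=True,
                              truncation=True).to("cuda")
            model.generate(**batch)
    return torch.cuda.max_memory_allocated() / 2**20

# Same texts, same batch size, compare the two directions:
# peak_memory_mb("Helsinki-NLP/opus-mt-de-en", sample_texts, 64)
# peak_memory_mb("Helsinki-NLP/opus-mt-de-cs", sample_texts, 64)
```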
On top of that, the English model produces repeating nonsense subsequences a lot more often. I approximated this with a type-to-token ratio below 0.15, which flags about 20 texts for Czech and around 70k for English. I don't see how this would relate to the memory consumption, but maybe there's a connection.
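A minimal sketch of that heuristic: flag a translation as likely degenerate when its type-to-token ratio (unique tokens / total tokens) drops below 0.15. The 0.15 threshold is from the description above; the whitespace tokenization is an assumption on my side.

```python
def type_token_ratio(text: str) -> float:
    """Ratio of unique whitespace tokens to total tokens."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def looks_degenerate(translation: str, threshold: float = 0.15) -> bool:
    """Repeating nonsense subsequences drive the ratio toward zero."""
    return type_token_ratio(translation) < threshold

print(looks_degenerate("the cat sat on the mat"))  # False
print(looks_degenerate("na na na " * 40))          # True
```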