I have access to a GPU server that can handle larger batch sizes, say around 128 (or more). I believe this would reduce the training time by roughly 4x. What learning rate would you recommend for larger batch sizes? In your experience, is there a good heuristic for adjusting batch size and learning rate when training GANs?
On a slightly unrelated note: have you tried using Distributed Data Parallel to speed up training? We've been trying to use it but are encountering weird errors. Maybe you have some insights? If we are able to figure it out, I'd love to share the code with you and contribute it here.
Thanks!
Hi, I have no direct experience with scaling learning rates for larger batch sizes in GANs, but related work suggests that scaling the learning rate with the batch size is a good starting point (see e.g. Section 4.7 in https://www.jmlr.org/papers/volume20/18-789/18-789.pdf). Popular approaches are to scale the learning rate linearly or with the square root of the batch size. So if you increase the batch size you should probably also increase the learning rate, but by how much exactly is difficult to predict.
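To make the two scaling rules mentioned above concrete, here is a minimal sketch. The helper `scale_lr` is hypothetical, and the baseline values (batch size 32, learning rate 2e-4) are placeholders chosen for illustration, not this repository's actual settings. Treat the results as starting points for a sweep rather than as recommendations.

```python
import math


def scale_lr(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Scale a learning rate when the batch size changes.

    rule="linear": lr grows proportionally to the batch-size ratio.
    rule="sqrt":   lr grows with the square root of the batch-size ratio.
    """
    ratio = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown scaling rule: {rule!r}")


# Assumed baseline (placeholder values, not taken from the repository).
BASE_LR = 2e-4
BASE_BATCH_SIZE = 32

for rule in ("linear", "sqrt"):
    new_lr = scale_lr(BASE_LR, BASE_BATCH_SIZE, 128, rule=rule)
    print(f"{rule}: {new_lr:.1e}")
```

With a 4x larger batch, the linear rule multiplies the learning rate by 4 and the square-root rule by 2. For GANs it is probably safest to try both, plus a value in between, and to apply the same rule to the generator and discriminator optimizers.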
Regarding distributed data parallel training, I have not experimented with it, but I'd be happy to hear about your experiences and would welcome your contributions.