
batch size issue in a multi-GPU environment #7

Open

edisonguo opened this issue Jun 2, 2023 · 0 comments
Training with Horovod in a multi-GPU environment invokes setup_datagen on each replica. However, the batch_size argument of setup_datagen is set to self.global_batch_size, which results in an effective batch size of self.global_batch_size * num_replicas. In both single- and multi-GPU settings, setup_datagen's batch_size should be set to self.batch_size, because the global batch size is already handled implicitly by Horovod's distributed data parallelism: each replica consumes its own per-replica batch, and gradients are averaged across replicas.
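
A minimal sketch of the fix, assuming a trainer class shape and a setup_datagen(batch_size=...) signature like the one described above (the class body here is hypothetical, not the repository's actual code):

```python
import horovod.tensorflow as hvd

hvd.init()


class Trainer:
    """Hypothetical trainer illustrating per-replica vs. global batch size."""

    def __init__(self, batch_size):
        # Per-replica batch size: what each GPU actually processes per step.
        self.batch_size = batch_size
        # Global batch size: useful e.g. for learning-rate scaling, but it
        # must NOT be passed to the per-replica data generator.
        self.global_batch_size = batch_size * hvd.size()

    def setup_datagen(self, batch_size):
        # Stand-in for the data-generator setup; under Horovod, each
        # replica calls this independently.
        print(f"replica {hvd.rank()}: generator batch size = {batch_size}")

    def build(self):
        # Buggy:  self.setup_datagen(batch_size=self.global_batch_size)
        #         -> effective batch = self.global_batch_size * hvd.size()
        # Fixed:  pass the per-replica batch size; Horovod's data
        #         parallelism yields the global batch implicitly.
        self.setup_datagen(batch_size=self.batch_size)


if __name__ == "__main__":
    Trainer(batch_size=32).build()
```

With 4 replicas and batch_size=32, the fixed version gives an effective global batch of 128, whereas passing global_batch_size would inflate it to 512.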

Please also see a related discussion here.
