
batch size issue in a multi-GPU environment #7

Open

edisonguo opened this issue Jun 2, 2023 · 0 comments
Training with Horovod in a multi-GPU environment invokes setup_datagen on each replica. However, the batch_size argument of setup_datagen is set to self.global_batch_size, which results in an effective batch size of self.global_batch_size * num_replicas. In both single- and multi-GPU settings, setup_datagen's batch_size should be set to self.batch_size, because the global batch size is already handled implicitly by Horovod's distributed data parallelism: each replica consumes its own per-replica batch, and gradients are averaged across replicas.
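
A minimal sketch of the fix, assuming a trainer class shape and a setup_datagen(batch_size=...) signature like the one described above (the class body here is hypothetical, not the repository's actual code):

```python
import horovod.tensorflow as hvd

hvd.init()


class Trainer:
    """Hypothetical trainer illustrating per-replica vs. global batch size."""

    def __init__(self, batch_size):
        # Per-replica batch size: what each GPU actually processes per step.
        self.batch_size = batch_size
        # Global batch size: useful e.g. for learning-rate scaling, but it
        # must NOT be passed to the per-replica data generator.
        self.global_batch_size = batch_size * hvd.size()

    def setup_datagen(self, batch_size):
        # Stand-in for the data-generator setup; under Horovod, each
        # replica calls this independently.
        print(f"replica {hvd.rank()}: generator batch size = {batch_size}")

    def build(self):
        # Buggy:  self.setup_datagen(batch_size=self.global_batch_size)
        #         -> effective batch = self.global_batch_size * hvd.size()
        # Fixed:  pass the per-replica batch size; Horovod's data
        #         parallelism yields the global batch implicitly.
        self.setup_datagen(batch_size=self.batch_size)


if __name__ == "__main__":
    Trainer(batch_size=32).build()
```

With 4 replicas and batch_size=32, the fixed version gives an effective global batch of 128, whereas passing global_batch_size would inflate it to 512.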

Please also see a related discussion here.
