In https://github.com/microsoft/Megatron-DeepSpeed/blob/main/megatron/training.py#L1119C9-L1119C24, tokens_per_sec is used to measure training throughput, and it is calculated as samples_per_sec * seq_len. What is seq_len here?
For example, suppose I train with a batch size of 4, so each step I give 4 samples to the trainer, and the samples inside a batch have different lengths. In this case, how is tokens_per_sec calculated?
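To make the question concrete, here is a minimal sketch of the formula as I read it. The variable names and numbers below are my own illustration, not the exact code in Megatron-DeepSpeed, and whether seq_len is a fixed (padded) length is precisely what I am asking:

```python
# Illustrative sketch of the throughput formula from training.py.
# All values are made up for the example; names are not Megatron's own.

elapsed_time = 10.0      # seconds spent on the logged interval (example value)
num_iterations = 5       # training iterations in that interval (example value)
batch_size = 4           # samples given to the trainer per iteration
seq_len = 2048           # fixed sequence length? (assumed to be --seq-length)

samples_per_sec = num_iterations * batch_size / elapsed_time
tokens_per_sec = samples_per_sec * seq_len   # the formula in question

print(f"samples/sec: {samples_per_sec:.1f}, tokens/sec: {tokens_per_sec:.1f}")
```

If seq_len is the fixed training sequence length, this would count padding tokens as throughput even when the real samples are shorter, which is why the variable-length case confuses me.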
Thanks.