Thanks for your work.
I'd like to ask why the epoch reported in the log differs from the training progress.
I used the command to run LoRA tuning with 8 GPUs.
My dataset has 162,410 samples, and the effective batch size is per_device_train_batch_size * devices = 4 * 8 = 32, so one pass over the dataset takes 162410 / 32 ≈ 5075.3 steps.
I therefore set max_steps to 50753 to train for 10 epochs.
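For reference, the arithmetic above can be sketched like this (a minimal sketch; the variable names are mine, and it assumes gradient_accumulation_steps = 1):

```python
# Sketch of the step/epoch arithmetic described above.
# Assumes gradient_accumulation_steps = 1 (an assumption, not confirmed here).
dataset_size = 162410
per_device_train_batch_size = 4
num_devices = 8
target_epochs = 10

effective_batch_size = per_device_train_batch_size * num_devices  # 4 * 8 = 32
steps_per_epoch = dataset_size / effective_batch_size             # 162410 / 32 = 5075.3125
max_steps = round(steps_per_epoch * target_epochs)                # 50753

print(effective_batch_size, steps_per_epoch, max_steps)
```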
But although training is nearly finished (step 50711/50753), the logged epoch is still only 1.26.
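One hypothetical check (my own guess, not confirmed behavior of the trainer): if the logged epoch were computed from the per-device batch size only, ignoring the 8 devices, the number would come out close to what I observe:

```python
# Hypothetical explanation (an assumption, not confirmed): epoch computed
# from per-device throughput only, ignoring the other 7 devices.
dataset_size = 162410
per_device_train_batch_size = 4
step = 50711

samples_seen_per_device = step * per_device_train_batch_size
epoch_ignoring_devices = samples_seen_per_device / dataset_size
print(round(epoch_ignoring_devices, 2))  # prints 1.25, close to the logged 1.26
```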
Is this normal? Does the epoch value reflect only a single card, or is something wrong?
Thanks for your response.