Incorrect saving of checkpoints on train_xl.py #165

brycegoh · 2024-11-21T17:00:19Z

Hi, I recently tested the training script with --checkpointing_epoch=1 and gradient_accumulation_steps=1. My dataset has 10k samples and the batch size is 5, so one epoch is 2000 steps and the script should save a checkpoint every 2000 steps. However, the checkpoint folder created is checkpoint-4000, and it is saved when the progress bar shows 4000 steps.
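For reference, the expected checkpoint step can be worked out with a short sketch. This is not code from train_xl.py: the function names and the num_processes parameter are hypothetical, and it assumes a single process and that a partial final batch still counts as one step.

```python
import math


def steps_per_epoch(dataset_size, batch_size,
                    gradient_accumulation_steps=1, num_processes=1):
    """Optimizer steps in one epoch (hypothetical helper, not from train_xl.py)."""
    # Batches each process sees per epoch; a partial last batch counts as one.
    batches = math.ceil(dataset_size / (batch_size * num_processes))
    # One optimizer step is taken every `gradient_accumulation_steps` batches.
    return math.ceil(batches / gradient_accumulation_steps)


def expected_checkpoint_step(dataset_size, batch_size, checkpointing_epoch,
                             gradient_accumulation_steps=1, num_processes=1):
    """Global step at which the first checkpoint would be expected."""
    per_epoch = steps_per_epoch(dataset_size, batch_size,
                                gradient_accumulation_steps, num_processes)
    return per_epoch * checkpointing_epoch


# Values from this report: 10k samples, batch size 5, checkpoint every epoch.
print(expected_checkpoint_step(10_000, 5, checkpointing_epoch=1))
```

With the values above this gives 2000, so the observed checkpoint-4000 corresponds to two epochs' worth of steps rather than one.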


rphly · 2024-11-21T17:00:57Z

Fixed in #164.

brycegoh · 2024-11-21T17:01:13Z

Thanks bro

rphly · 2024-11-21T17:01:30Z

No problem. @yisol @Sang-kyung @subin-kim-cv, please take a look.
