resuming checkpoint without lr schedule or optimizer state #253

eliebak · 2024-11-28T08:47:21Z

Quick PR to add the option to resume training from a checkpoint without loading the optimizer state or LR schedule, for continual learning.

NouamaneTazi

Nice! Left a question.

NouamaneTazi · 2024-11-28T14:55:29Z

src/nanotron/trainer.py

@@ -215,7 +215,7 @@ def __init__(
            )

        # Define iteration start state
-        if self.init_checkpoint_path is not None:
+        if self.init_checkpoint_path is not None and self.config.checkpoints.load_lr_scheduler:


Is this only LR-related?

I think when you want a new LR schedule, you also want different data stages and to reset the "last_train_step" argument.
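
To make the question concrete, here is a minimal sketch of the gating in the diff above. It is not nanotron's actual code: `CheckpointFlags`, `IterationState`, `initial_iteration_state`, and `consumed_train_samples` are hypothetical names used for illustration; only `load_lr_scheduler` and `last_train_step` come from this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointFlags:
    # Hypothetical stand-in for the `checkpoints` section of the config.
    load_lr_scheduler: bool = True


@dataclass
class IterationState:
    # Hypothetical container for the "iteration start state".
    last_train_step: int = 0
    consumed_train_samples: int = 0  # assumed field, for illustration only


def initial_iteration_state(
    init_checkpoint_path: Optional[str],
    flags: CheckpointFlags,
    saved: IterationState,
) -> IterationState:
    """Return the state training starts from.

    The saved counters are restored only when a checkpoint is given AND the
    LR scheduler is being loaded; otherwise training starts again at step 0.
    """
    if init_checkpoint_path is not None and flags.load_lr_scheduler:
        return IterationState(saved.last_train_step, saved.consumed_train_samples)
    return IterationState()


saved = IterationState(last_train_step=1000, consumed_train_samples=512000)
resumed = initial_iteration_state("ckpt/", CheckpointFlags(load_lr_scheduler=True), saved)
fresh = initial_iteration_state("ckpt/", CheckpointFlags(load_lr_scheduler=False), saved)
```

In this sketch a single flag decides whether the step counter is restored, so skipping the LR scheduler also restarts training at step 0. Whether the data stages should follow the same reset is the open question here.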

NouamaneTazi

LGTM!

style fix

3034bd2

NouamaneTazi reviewed Nov 28, 2024


TJ-Solergibert mentioned this pull request Nov 28, 2024

[NEW] Llama3.2 weight converters 🦙 #255

Open

6 tasks

NouamaneTazi approved these changes Dec 3, 2024


NouamaneTazi merged commit fdd5151 into huggingface:main Dec 3, 2024
2 of 3 checks passed
