Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update doc for metric_for_best_model when save_strategy="best". #35389

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

seanswyi
Copy link
Contributor

What does this PR do?

Updates the docstring for TrainingArguments.metric_for_best_model, Trainer._determine_best_metric, and adds a new test.

Specifically, when save_strategy="best" we need to specify a value for metric_for_best_model. This clashes with the previous logic that metric_for_best_model would default to loss.

Brought up in this comment: #31817 (comment)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@muellerzr @SunMarc (cc. @shcheklein - Author of comment)

@seanswyi seanswyi changed the title Fix/update metric for best model default Update doc for metric_for_best_model when save_strategy="best". Dec 22, 2024
@seanswyi seanswyi closed this Dec 22, 2024
@seanswyi seanswyi reopened this Dec 22, 2024
self.assertIn("`args.metric_for_best_model` must be provided", str(context.exception))

# Case 4: Metric name not provided and save_best_strategy is "steps" (i.e., not "best").
with tempfile.TemporaryDirectory() as tmpdir:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not critical / minor: tbh, it seems a bit out of place for the test_save_best_checkpoint (as well as the previous case). I would probably move it into a separate test. Or should it otherwise call at least train and test actual checkpoint saved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I agree that it logically does seem a bit out of place. I think cases 3 and 4 could be grouped into their own methods since the point isn't so much to test the save_strategy = "best" itself but more to test the behavior related to metric_for_best_model.

I'm not sure if actually running training would be necessary, though. Case 3 is simply to check whether a ValueError is being properly thrown at Trainer initialization time, and case 4 is also simply to check whether the __post_init__ method of TrainingArguments properly initializes metric_for_best_model to "loss" when save_strategy != "best" and load_best_model_at_end = True. To me, neither of these seem to require training/evaluation and Trainer instantiation seems sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I would also split it into a separate test (or two test). And, yes, we are testing the init here, that's why it was looking out of place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong opinion. We can split it into a separate test for case 3 and 4.

@@ -477,7 +477,7 @@ class TrainingArguments:
metric_for_best_model (`str`, *optional*):
Use in conjunction with `load_best_model_at_end` to specify the metric to use to compare two different
models. Must be the name of a metric returned by the evaluation with or without the prefix `"eval_"`. Will
default to `"loss"` if unspecified and `load_best_model_at_end=True` (to use the evaluation loss).
default to `"loss"` if unspecified, `load_best_model_at_end=True`, and `save_strategy!="best"`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my 2cs (Disclaimer! I'm not very familiar with the whole scope of the initial change, or reason behind it!): it's a bit hard to read and understand what is going on here and why. E.g. why can't it default to loss when save_strategy == best? What is the major difference with the load_best_model_at_end (and save_strategy!="best")?

Again, apologies if I'm missing some obvious context here. Please feel free to disregard my comment / question then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find the place where we set metric_for_best_model = "loss" when save_strategy!=best. Can you explain a bit why you changed the description ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein That was a design decision made here (#31817 (comment)). It was deemed easier to debug if we don't add a hard-coded value and rather raise an error.


@SunMarc Hmm I'm starting to think that maybe the problem is that we're not able to set load_best_model_at_end = True when save_strategy = "best" since load_best_model_at_end requires eval_strategy == save_strategy but eval_strategy doesn't have a "best" option.

This means that if we want to use save_strategy = "best" then we have to have load_best_model_at_end = False, which in turn means that when save_strategy != "best" and load_best_model_at_end = True then the __post_init__ method of TrainingArguments is setting metric_for_best_model to "loss". https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L1676:L1679

The modified docstring aims to add a bit more detail as to when the metric_for_best_model is set to a default of "loss".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also add best for eval_strategy then ?

Copy link
Contributor Author

@seanswyi seanswyi Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that might sound a bit awkward since it means we'd "perform evaluation at every best checkpoint."

Maybe we could check if save_strategy == "best" and then bypass the eval_strategy == save_strategy condition? That would mean that here we would change the code to:

if self.load_best_model_at_end and self.save_strategy != "best":
    if self.eval_strategy != self.save_strategy:
        raise ValueError(
            "--load_best_model_at_end requires the save and eval strategy to match, but found\n- Evaluation "
            f"strategy: {self.eval_strategy}\n- Save strategy: {self.save_strategy}"
        )

I'm not 100% sure about the history of why eval_strategy == save_strategy but I'm assuming that it's a safe guard to prevent situations where we want to load the best model at the end of training but we never saved it because the two didn't match. If save_strategy == "best" I don't think we'd have that problem since saving is guaranteed to following evaluation.

I think this also means that we may have to either remove the default loss error we're raising or change it to a warning (#31817 (comment)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me for the new condition, just add a comment to explain why we don't need to perform the check when self.save_strategy == "best". Also, which default loss error you are talking about ? I'm not sure why we need to remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to this block of code:

class Trainer:
    def __init__(...):
        ...

        if args.save_strategy == SaveStrategy.BEST or args.load_best_model_at_end:
            if args.metric_for_best_model is None:
                raise ValueError(
                    "`args.metric_for_best_model` must be provided when using 'best' save_strategy or if `args.load_best_model_at_end` is set to `True`."
                )

I believe that this error was added because rather than allowing the metric_for_best_model to default to loss, it would be easier to debug if we force the user to explicitly set a loss.

But if we change the __post_init__ method of TrainingArguments, the metric_for_best_model would default to loss if load_best_model_at_end == True regardless of the value of save_strategy.

I don't think we have to remove the error message, but I just thought that it would be a bit different from what we initially intended.

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! Left a few comments

self.assertIn("`args.metric_for_best_model` must be provided", str(context.exception))

# Case 4: Metric name not provided and save_best_strategy is "steps" (i.e., not "best").
with tempfile.TemporaryDirectory() as tmpdir:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong opinion. We can split it into a separate test for case 3 and 4.

@@ -477,7 +477,7 @@ class TrainingArguments:
metric_for_best_model (`str`, *optional*):
Use in conjunction with `load_best_model_at_end` to specify the metric to use to compare two different
models. Must be the name of a metric returned by the evaluation with or without the prefix `"eval_"`. Will
default to `"loss"` if unspecified and `load_best_model_at_end=True` (to use the evaluation loss).
default to `"loss"` if unspecified, `load_best_model_at_end=True`, and `save_strategy!="best"`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find the place where we set metric_for_best_model = "loss" when save_strategy!=best. Can you explain a bit why you changed the description ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants