Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pydantic 2.x cfg #1239

Merged
merged 14 commits into from
Feb 26, 2024
Merged

Pydantic 2.x cfg #1239

merged 14 commits into from
Feb 26, 2024

Conversation

winglian
Copy link
Collaborator

@winglian winglian commented Feb 1, 2024

Description

This PR migrates the validation to use Pydantic validators. We keep all of the existing tests with some modification. I've attempted to capture all the cfg.* attributes I could find and make sure they are represented in the Pydantic config models. I've also added a GPUCapabilities model to abstract away the underlying hardware checks so that it can be checked offline before sending the config for actual training.

For now, we run the validation and convert it back to a DictDefault that we currently use for the cfg. This is to minimize the blast radius of this change to strictly validation. We can consider down the line how to swap out the various uses of cfg and how the attributes are accessed and if it's compatible with pydantic models.

Motivation and Context

How has this been tested?

Existing unit tests.

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

@winglian winglian marked this pull request as draft February 1, 2024 04:03
@winglian winglian changed the title WIP: Pydantic cfg Pydantic 2.x cfg Feb 20, 2024
@winglian winglian marked this pull request as ready for review February 20, 2024 21:41
Copy link
Collaborator

@NanoCode012 NanoCode012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first review. I'll take a look later on the rest.

Comment on lines 347 to 357
capabilities = GPUCapabilities(
bf16=is_torch_bf16_gpu_available(), n_gpu=os.environ.get("WORLD_SIZE", 1)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved into validate or normalize config?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm avoiding setting this in the validation as there are downstream use cases where a user might want to make sure their configuration works on a configured GPU cluster before firing off the training.

src/axolotl/utils/config/models/input/v0_4_1/__init__.py Outdated Show resolved Hide resolved
src/axolotl/utils/config/models/input/v0_4_1/__init__.py Outdated Show resolved Hide resolved
src/axolotl/utils/config/models/input/v0_4_1/__init__.py Outdated Show resolved Hide resolved
src/axolotl/utils/config/models/input/v0_4_1/__init__.py Outdated Show resolved Hide resolved
src/axolotl/utils/config/models/input/v0_4_1/__init__.py Outdated Show resolved Hide resolved
@model_validator(mode="before")
@classmethod
def check_sample_packing_w_xformers(cls, root):
if root.get("sample_packing") and root.get("xformers_attention"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a few methods check for when sample_packing is on but not actually active due to unsupported model type?

Pseudo code

    def check_sample_packing_active(cls, root):
         if root.get("sample_packing") and not any(llama/mistral/qwen):
             raise ValueError(
                 "sample_packing not compatible with current model type"
             )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that we can reliably detect the model at this step of the validation to raise a ValueError

src/axolotl/utils/config/models/input/v0_4_1/__init__.py Outdated Show resolved Hide resolved
Copy link

@mattt mattt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff. Thanks for taking this on, @winglian! Left a couple non-blocking suggestions.

In a follow-up, it'd be lovely to annotate each field with the documentation from "All yaml options (click me)" in the README.

@@ -543,7 +543,7 @@ is_mistral_derived_model:
is_qwen_derived_model:

# optional overrides to the base model configuration
model_config:
model_config_overrides:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this, but we would need to deprecate the old name (add valueerror)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't actually deprecate it with Pydantic because model_config is an internal variable name for pydantic models

@winglian winglian merged commit cc3cebf into main Feb 26, 2024
7 checks passed
@winglian winglian deleted the pydantic-cfg branch February 26, 2024 17:24
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
* WIP conversion to use pydantic for config validation

* wip, more fields, add capabilities

* wip

* update pydantic validation to match existing tests

* tweak requirements

* setup deprecated paams pydantic model

* more validations

* wrap up rest of the validations

* flesh out the rest of the options from the readme into pydantic

* fix model validators as class methods

remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning

* more missing attributes for cfg

* updates from PR feedback

* fix validation for datasets and pretrain datasets

* fix test for lora check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants