
Allow subgroup argument selection via Enum #37

Open

janvainer opened this issue Sep 24, 2023 · 2 comments

janvainer commented Sep 24, 2023

Is your feature request related to a problem? Please describe.

Hi, first of all, this project has great potential for ML project configuration! Well done <3!!!
There is one use case that is quite common in ML: you have two different sub-configurations, you want to be able to switch between them easily, and then also specify certain sub-options. For example, consider an ML training script where you want to be able to select different optimizers:

from pydantic import BaseModel


class Optimizer(BaseModel):
    lr: float = 0.001
    eps: float = 1e-7

class Adam(Optimizer):
    lr: float = 0.003


class SGD(Optimizer):
    lr: float = 0.002


class Config(BaseModel):
    optimizer: Adam | SGD = Adam()

Now it would be awesome to somehow specify which optimizer to initialize in the config and also be able to set some of its parameters.

Describe the solution you'd like
How the CLI should look:

python train.py --optimizer adam --optimizer.lr 0.01  # specify the optimizer type and then also specify some of its params

There may be an issue with naming the arguments based on the Union type. Instead, perhaps enums could be used:

from enum import Enum


class Optimizers(Optimizer, Enum):
    adam = Adam()
    sgd = SGD()
    sgd_custom = SGD(lr=1.0)


class Config(BaseModel):  # this gets parsed by argdantic into a cli later
    optimizer: Optimizers = Optimizers.adam

The CLI would have to check whether the enum value itself is a BaseModel and, if so, treat it as a nested configuration node that is displayed and settable in the terminal; a rough sketch of that check is shown below.
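
To make this concrete, here is a rough standalone sketch of that check (this is not argdantic code; collect_subgroup_choices is a made-up helper name). A plain Enum whose values are BaseModel instances is enough for a parser to discover both the available choices and their nested fields:

from enum import Enum

from pydantic import BaseModel


class Optimizer(BaseModel):
    lr: float = 0.001
    eps: float = 1e-7


class Adam(Optimizer):
    lr: float = 0.003


class SGD(Optimizer):
    lr: float = 0.002


class Optimizers(Enum):
    adam = Adam()
    sgd = SGD()


def collect_subgroup_choices(enum_cls) -> dict:
    """Map each member name to its BaseModel value, skipping non-model members."""
    return {m.name: m.value for m in enum_cls if isinstance(m.value, BaseModel)}


for name, model in collect_subgroup_choices(Optimizers).items():
    # each choice carries its own fields, which a CLI could render as --optimizer.<field>
    print(name, dict(model))
# e.g. adam {'lr': 0.003, 'eps': 1e-07}
#      sgd {'lr': 0.002, 'eps': 1e-07}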

Describe alternatives you've considered
Hydra allows this kind of sub-grouping via subfolders. simple-parsing solves it via its subgroups type. Unfortunately, neither of them is built on Pydantic, so the user has to take care of validation themselves.

WDYT about the feature? It would allow quite complex configurations, for example:

python train.py \
    --optimizer adam | sgd \
    --optimizer.lr 0.1 \
    --encoder lstm | conv \
    --encoder.channels 512

Edit: after a bit of thought, the Annotated type might be better suited 🤔 It would be something like:

class Config(BaseModel):  # this gets parsed by argdantic into a CLI later
    optimizer: Annotated[
        Optimizer,
        argdantic.Subgroups(adam=Adam(), sgd=SGD(), sgd_custom=SGD(lr=1.0)),
    ] = Adam()

The advantage is that this can be used outside of the CLI world without issues: one would be able to initialize Config without needing to import the enum class. It would simply be Config(optimizer=SGD(lr=0.1)) instead of Config(optimizer=Optimizers.sgd). Another advantage is that it would be possible to pass an optimizer config that is not pre-defined in the Enum.
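
To illustrate the idea end to end, here is a rough sketch in which Subgroups is just a hand-rolled marker class (it does not exist in argdantic); the point is that plain Pydantic validation ignores the extra metadata, while a CLI builder could read it back through typing.get_type_hints:

from typing import Annotated, get_args, get_type_hints

from pydantic import BaseModel


class Optimizer(BaseModel):
    lr: float = 0.001
    eps: float = 1e-7


class Adam(Optimizer):
    lr: float = 0.003


class SGD(Optimizer):
    lr: float = 0.002


class Subgroups:
    """Hypothetical marker carrying the named choices; not an existing argdantic API."""

    def __init__(self, **choices: BaseModel):
        self.choices = choices


class Config(BaseModel):
    optimizer: Annotated[
        Optimizer, Subgroups(adam=Adam(), sgd=SGD(), sgd_custom=SGD(lr=1.0))
    ] = Adam()


# plain Python usage keeps working without importing any enum
cfg = Config(optimizer=SGD(lr=0.1))

# a CLI builder could recover the declared choices from the annotation metadata
hints = get_type_hints(Config, include_extras=True)
marker = next(m for m in get_args(hints["optimizer"]) if isinstance(m, Subgroups))
print(list(marker.choices))  # ['adam', 'sgd', 'sgd_custom']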

edornd (Owner) commented Sep 27, 2023

Hey @janvainer! Thank you for the kind words! I'm glad this tiny library has been helpful; the objective (well, at least mine) was exactly to provide composable configuration for ML purposes.
I see your point and it would indeed be helpful, but it will require some thought to implement 🤔. The major problem I see here (if I understood correctly) is that the arguments would be defined by the choice of the parent, which is not known at creation time.

Off the top of my head, I think it might be feasible by treating a Union of BaseModels as a multi-choice argument, at least for the root arg (i.e., --optimizer=adam), but I don't have a clean solution for the sub-args (--optimizer.lr=...).
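
Not argdantic code, but here is a rough sketch of that multi-choice idea with plain argparse and typing introspection: the Union members become the allowed values for the root argument, and the chosen class is instantiated afterwards. The sub-args are exactly the open problem, since they are only known once the choice has been parsed.

import argparse
from typing import Union, get_args, get_origin, get_type_hints

from pydantic import BaseModel


class Optimizer(BaseModel):
    lr: float = 0.001


class Adam(Optimizer):
    lr: float = 0.003


class SGD(Optimizer):
    lr: float = 0.002


class Config(BaseModel):
    optimizer: Union[Adam, SGD] = Adam()


# turn the Union members into named choices for the root argument
field_type = get_type_hints(Config)["optimizer"]
assert get_origin(field_type) is Union
choices = {cls.__name__.lower(): cls for cls in get_args(field_type)}

parser = argparse.ArgumentParser()
parser.add_argument("--optimizer", choices=choices, default="adam")
args = parser.parse_args(["--optimizer", "sgd"])
print(choices[args.optimizer]())  # SGD(lr=0.002)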

I'll give it a try soon though! In the meantime, I was using something like this for this very purpose:

from enum import Enum

from argdantic import ArgParser


# imagine using torch optimizers
# there should be a base class for this
class Optimizer:
    def __init__(self, name: str, lr: float):
        self.name = name
        self.lr = lr


class SGD(Optimizer):
    def __init__(self, lr: float):
        super().__init__("SGD", lr)


class Adam(Optimizer):
    def __init__(self, lr: float):
        super().__init__("Adam", lr)


# define an Enum where the values are the classes
# or a partial(Class, fixed arguments)
class Optimizers(Enum):
    sgd = SGD
    adam = Adam


cli = ArgParser()


# make the user select the class, then use other arguments
# to define the input parameters
@cli.command()
def main(
    optimizer: Optimizers = Optimizers.sgd,
    lr: float = 0.01,
    epochs: int = 10,
    batch_size: int = 32,
):
    print(optimizer.value(lr))
    print(lr)
    print(epochs)
    print(batch_size)


if __name__ == "__main__":
    cli()
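
For completeness, assuming the snippet is saved as main.py, an invocation would look roughly like this (I haven't double-checked the exact flag spelling argdantic generates for the enum choice):

python main.py --optimizer adam --lr 0.003 --epochs 20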

I agree that this is a workaround, but it at least allows this sort of mechanism to an extent.
I'll see what can be done!

janvainer (Author) commented

Thank you for your response and the code! Yes, it is a bit cumbersome. Please keep me in the loop! ;) I am curious how this unfolds.
