
How to introduce other optimizers into DI-engine? #813

Open · 7 of 11 tasks
weidaolee opened this issue Jul 1, 2024 · 2 comments
Labels
algo (Add new algorithm or improve old one), good first issue (Good for newcomers)

Comments

@weidaolee

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable

Hi,

I would like to incorporate optimizers from the parameterfree library into RL training.
However, I've noticed that DI-engine hard-codes the optimizer in most of its RL algorithms, e.g.:
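Roughly, the pattern looks like this (a simplified sketch, not the actual DI-engine source; the Adam import and its clipping keyword arguments are assumptions about ding.torch_utils):

import torch
from ding.torch_utils import Adam  # DI-engine's Adam variant with built-in clipping options

model = torch.nn.Linear(4, 2)
# Inside a policy's _init_learn, the optimizer type and its clipping options are
# fixed like this, so swapping in another optimizer means editing the policy file.
optimizer = Adam(model.parameters(), lr=1e-3, grad_clip_type='clip_norm', clip_value=0.5)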

I have a few questions regarding this:

  1. Why is the optimizer hard-coded in most of DI-engine's RL algorithms?

  2. What is the best practice if I really want to use a different optimizer?

  3. Why does DI-engine implement its own optimizer variants separate from PyTorch's, instead of using a design pattern such as the strategy pattern or template pattern, which would let users easily compose or implement their own optimizers?

  4. Based on my understanding, the optimizer and the quantities being clipped (the gradients) are independent of each other. Therefore, is decoupling the RL algorithm from the optimizer feasible?

Feel free to correct any misunderstandings or gaps in my knowledge. If you have plans to refactor this part, I am willing to contribute code.

@PaParaZz1 added the labels good first issue (Good for newcomers) and algo (Add new algorithm or improve old one) on Jul 2, 2024
@PaParaZz1
Member

Thanks for your attention to DI-engine. Here are some basic explanations for your questions:

  1. Why is the optimizer hard-coded in most of DI-engine's RL algorithms?

In most RL problems, the AdamW optimizer with a LambdaLR scheduler is enough to obtain a good set of hyper-parameters, so there is usually no need to adjust this part as frequently as in other machine learning tasks. On the other hand, some RL algorithms like DDPG need multiple optimizers rather than the single optimizer used in algorithms like DQN, so it is not easy to abstract a unified optimizer interface for all policies.
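For instance, the difference looks roughly like this (a minimal sketch with placeholder networks, not DI-engine code):

import torch

# DQN-style policy: a single Q-network, so one optimizer (plus an optional
# LambdaLR scheduler) covers the whole policy.
q_net = torch.nn.Linear(4, 2)
dqn_optimizer = torch.optim.AdamW(q_net.parameters(), lr=3e-4)
dqn_scheduler = torch.optim.lr_scheduler.LambdaLR(dqn_optimizer, lr_lambda=lambda step: 1.0)

# DDPG-style policy: separate actor and critic networks, each with its own
# optimizer (often with different learning rates), so a single "optimizer"
# argument does not map cleanly onto every policy.
actor = torch.nn.Linear(4, 2)
critic = torch.nn.Linear(6, 1)
actor_optimizer = torch.optim.AdamW(actor.parameters(), lr=1e-4)
critic_optimizer = torch.optim.AdamW(critic.parameters(), lr=1e-3)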

  2. What is the best practice if I really want to use a different optimizer?

If you want to use a different optimizer, you can indeed modify the relevant policy file to suit your specific requirements. The DI-engine framework is designed with flexibility in mind, allowing users to customize the model, policy, and env modules according to their needs. Due to the complexity and rapid development of the RL field, it is difficult to abstract a fixed set of algorithm configurations and implementations. Thus, we implement these policy files as a kind of template for different users, and we hope they are easy to hack.
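For example, a possible local modification could look like the following sketch (hypothetical: MyDQNPolicy is your own subclass, and parameterfree.COCOB is only an assumed example of that library's API; check its documentation for the exact class names and signatures):

from ding.policy import DQNPolicy
import parameterfree  # assumed to expose torch.optim.Optimizer subclasses, e.g. COCOB


class MyDQNPolicy(DQNPolicy):
    """Hypothetical subclass that only swaps the optimizer."""

    def _init_learn(self) -> None:
        super()._init_learn()
        # Replace the optimizer built by the parent policy. Note that any gradient
        # clipping baked into DI-engine's own optimizer is lost here and would need
        # to be re-applied before each step if required.
        self._optimizer = parameterfree.COCOB(self._model.parameters())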

3 & 4

There are indeed coding design patterns for elegantly composing different modules. However, in the early stage of DI-engine, we thought the optimizer was an important place to hack for large-scale RL training such as DI-star. In that scenario, we modified the optimizer in-place to implement some special gradient-clipping operations with little GPU-memory overhead, and this historical implementation has carried over to today's version. From the viewpoint of most classical RL algorithms, I think it is feasible to decouple the RL algorithm and the optimizer, but it needs some detailed design to work out the problems mentioned above. If you are interested, we can provide the corresponding support for the refactor plan.
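For the classical case, the decoupled form would look roughly like this (plain PyTorch, independent of the concrete optimizer):

import torch

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

loss = model(torch.randn(16, 8)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
# Clipping only touches the .grad fields, so it does not depend on which optimizer
# is used; the in-place variant inside the optimizer was mainly about reducing
# overhead in very large-scale settings such as DI-star.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()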

@weidaolee
Author

weidaolee commented Jul 7, 2024

Hello, thank you so much for your explanation.
Here is my idea for this refactoring:

The TemplateOptimizer has a template method called step. before_step serves as a hook method that subclasses can override as needed. We can use __getattr__ to delegate attribute access to self.optimizer so that TemplateOptimizer can act as a torch.optim.Optimizer.

With this implementation, we can separate the clipping and ignoring logic from the optimizers. Then we can also decouple the RL algorithms from the optimizers.

However, after carefully studying the implementation of DI-engine, I feel that refactoring this part would introduce a lot of complexity. If the maintenance and modification costs outweigh the benefits, perhaps keeping the existing implementation is the better choice.

I've still provided a sketch of this idea below. If you evaluate it and think it is not needed for any reason, feel free to close this issue. Alternatively, if you think this refactoring is valuable, I'd be happy to discuss it further.

Thanks again for your reply. By the way, your code is very well written and a real pleasure to read.

from typing import Union

import torch
from typing_extensions import override  # `from typing import override` on Python 3.12+


class TemplateOptimizer(torch.optim.Optimizer):
    def __init__(self, optimizer: torch.optim.Optimizer):
        # Keep a reference to the wrapped optimizer before calling the base
        # constructor, so __getattr__ delegation never fires on a missing attribute.
        self.optimizer = optimizer
        # Initialize the base Optimizer with the parameters of the passed optimizer
        super(TemplateOptimizer, self).__init__(
            optimizer.param_groups, optimizer.defaults
        )

    def before_step(self):
        # This method can be overridden by subclasses if needed
        ...

    def step(self, closure=None):
        self.before_step()
        self.optimizer.step(closure)

    def __getattr__(self, name):
        return getattr(self.optimizer, name)


class DingOptimizer(TemplateOptimizer):
    def __init__(
        self,
        optimizer: torch.optim.Optimizer,
        optim_type: str = "",
        grad_clip_type: Union[str, None] = None,
        clip_value: Union[float, None] = None,
        clip_coef: float = 5,
        clip_norm_type: float = 2.0,
        clip_momentum_timestep: int = 100,
        grad_norm_type: Union[str, None] = None,
        grad_ignore_type: Union[str, None] = None,
        ignore_value: Union[float, None] = None,
        ignore_coef: float = 5,
        ignore_norm_type: float = 2.0,
        ignore_momentum_timestep: int = 100,
    ):
        super(DingOptimizer, self).__init__(optimizer)
        self.optim_type = optim_type
        self.grad_clip_type = grad_clip_type
        self.clip_value = clip_value
        self.clip_coef = clip_coef
        self.clip_norm_type = clip_norm_type
        self.clip_momentum_timestep = clip_momentum_timestep
        self.grad_norm_type = grad_norm_type
        self.grad_ignore_type = grad_ignore_type
        self.ignore_value = ignore_value
        self.ignore_coef = ignore_coef
        self.ignore_norm_type = ignore_norm_type
        self.ignore_momentum_timestep = ignore_momentum_timestep

    def clipping(self):
        # Gradient-clipping logic would live here (left as a stub in this sketch).
        ...

    def ignoring(self):
        # Gradient-ignoring logic would live here (left as a stub in this sketch).
        ...

    @override
    def before_step(self):
        self.clipping()
        self.ignoring()
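
Usage of the wrapper would then look roughly like this (hypothetical; clipping and ignoring are still stubs in the sketch above):

model = torch.nn.Linear(8, 2)
base_optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
optimizer = DingOptimizer(base_optimizer, grad_clip_type="clip_norm", clip_value=0.5)

loss = model(torch.randn(16, 8)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()  # before_step() runs clipping()/ignoring(), then delegates to AdamW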
