feat: [Trainer controller] Extend trainer controller capabilities to have fine-grained control in multi-node-multi-gpu scenario #182

seshapad · 2024-06-11T16:46:00Z

Is your feature request related to a problem? Please describe.

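In a multi-node, multi-GPU run, the trainer controller cannot currently be controlled at the per-process level. It should be possible to control both metric computation and operation execution for each process individually.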

Describe the solution you'd like

The trainer controller should support process-specific rules, so that a rule can be scoped to particular processes rather than applied uniformly to all of them.
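
As a rough illustration of what a process-specific rule could look like, here is a minimal sketch. It is not the trainer controller's actual API: `ProcessRule`, `current_rank`, and `triggered_operations` are hypothetical names invented for this example, and the only outside assumption is that the launcher (for example `torchrun`) exports the `RANK` environment variable.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


# Hypothetical sketch: none of these names come from the trainer controller itself.
@dataclass
class ProcessRule:
    name: str
    # Condition evaluated against the metrics computed on this process.
    condition: Callable[[Dict[str, float]], bool]
    # Label of the operation to trigger when the condition holds.
    operation: str
    # Global ranks the rule applies to; None means every process.
    ranks: Optional[Set[int]] = None

    def applies_to(self, rank: int) -> bool:
        return self.ranks is None or rank in self.ranks


def current_rank() -> int:
    # Launchers such as torchrun export RANK; fall back to 0 for single-process runs.
    return int(os.environ.get("RANK", "0"))


def triggered_operations(
    rules: List[ProcessRule], metrics: Dict[str, float], rank: int
) -> List[str]:
    # Keep only the rules scoped to this rank whose condition is satisfied.
    return [
        rule.operation
        for rule in rules
        if rule.applies_to(rank) and rule.condition(metrics)
    ]
```

With a rule such as `ProcessRule("stop_on_low_loss", lambda m: m["loss"] < 0.1, "stop_training", ranks={0})`, only the rank 0 process would trigger `stop_training`, while a rule with `ranks=None` would fire on every process. In a real run the rank argument would come from `current_rank()`.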

Describe alternatives you've considered

NA

Additional context

NA

kmehant added the "feat:instructlab" label (instructlab related items) on Jun 12, 2024

seshapad changed the title ~~feat: Extend trainer controller capabilities to have fine-grained control in multi-node-multi-gpu scenario~~ feat: [Trainer controller] Extend trainer controller capabilities to have fine-grained control in multi-node-multi-gpu scenario Aug 3, 2024
