[Feature Proposal] Distributed Training for FL #330

Open
slyviacassell opened this issue Aug 16, 2023 · 3 comments

slyviacassell commented Aug 16, 2023

As the title says: does standalone mode support multiple GPUs to speed up training?

slyviacassell changed the title from "Does FedLab support DDP of pytorch?" to "Does FedLab support multiple GPUs?" on Aug 16, 2023
slyviacassell changed the title from "Does FedLab support multiple GPUs?" to "Does standalone mode support multiple GPUs?" on Aug 16, 2023
Collaborator

dunzeng commented Aug 16, 2023

We don't provide multi-GPU support in the standalone module.
However, you can use PyTorch's DP (DataParallel) module inside the train function of SGDSerialClientTrainer.
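
A minimal sketch of that suggestion, assuming "DP" means `torch.nn.DataParallel`. The helper names `maybe_data_parallel` and `unwrap` are hypothetical and are not part of FedLab; how they would be wired into the train function of SGDSerialClientTrainer is not shown here, since that depends on the trainer's internals.

```python
import torch
import torch.nn as nn


def maybe_data_parallel(model: nn.Module) -> nn.Module:
    """Wrap the model in nn.DataParallel when more than one GPU is visible."""
    if torch.cuda.is_available() and torch.cuda.device_count() > 1:
        # DataParallel splits each input batch across the visible GPUs
        # and gathers the outputs on the first device.
        return nn.DataParallel(model.cuda())
    # Single GPU or CPU only: leave the model unchanged.
    return model


def unwrap(model: nn.Module) -> nn.Module:
    """Return the underlying module so state_dict keys carry no 'module.' prefix."""
    return model.module if isinstance(model, nn.DataParallel) else model


# Hypothetical stand-in for a client's local model.
model = maybe_data_parallel(nn.Linear(4, 2))
params = unwrap(model).state_dict()
```

Unwrapping before reading the parameters matters in a federated setting: a `DataParallel` wrapper prefixes every parameter name with `module.`, which would not match the names the server-side model expects.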

slyviacassell changed the title from "Does standalone mode support multiple GPUs?" to "[Feature Proposal] Distributed Training for FL" on Aug 24, 2023
Author

slyviacassell commented Aug 24, 2023

We define the following variables to further illustrate the idea:

  • K: the number of clients participating in training each round
  • N: the number of available GPUs

When K == N, each selected client is assigned its own GPU.

When K > N, multiple clients are assigned to each GPU and train sequentially on it.

When K < N, only K of the GPUs need to be used for training.
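
The three cases above can be sketched as a single round-robin assignment. This is only an illustration under stated assumptions, not the proposed implementation: the function name `assign_clients_to_gpus` is hypothetical, and round-robin is one possible way to spread clients when K > N.

```python
from collections import defaultdict


def assign_clients_to_gpus(client_ids, num_gpus):
    """Map GPU index -> list of client ids that train sequentially on that GPU."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # When K < N, only the first K GPUs are used.
    used_gpus = min(len(client_ids), num_gpus)
    schedule = defaultdict(list)
    for position, client_id in enumerate(client_ids):
        # Round-robin: client at position p goes to GPU p mod used_gpus.
        schedule[position % used_gpus].append(client_id)
    return dict(schedule)


# K == N: one client per GPU.
equal = assign_clients_to_gpus([3, 7, 9, 12], num_gpus=4)
# K > N: several clients share a GPU and run one after another.
more_clients = assign_clients_to_gpus([3, 7, 9, 12, 15], num_gpus=2)
# K < N: the extra GPUs stay idle.
fewer_clients = assign_clients_to_gpus([3, 7], num_gpus=4)
```

With this scheme the per-GPU load differs by at most one client, so no GPU sits idle while another still has a long queue, provided clients take roughly equal time to train.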

The number of GPUs would be set in gpu, and the distributed-specific settings in the distributed configs.

The implementation is in progress. Would anyone like to help?

@QiTianyu-0403

I'm very interested in the feature you described. Is there any code available that implements it?
