Added support for PyTorch Lightning in the DDP backend. #162
This pull request improves the handling of distributed data parallel (DDP) setups and trial evaluation in the `neps` runtime. The changes add support for evaluating trials in a DDP context and ensure proper state management.

DDP and Trial Evaluation Enhancements:

* Added `_is_ddp_and_not_rank_zero` to check whether the current process is part of a DDP setup and is not the rank zero process. (neps/runtime.py, neps/runtime.pyR49-R66)
* Added the `_launch_ddp_runtime` function to handle the evaluation of trials in a DDP setup. This function ensures that only the rank zero process launches a new worker. (neps/runtime.py, neps/runtime.pyR512-R531)
* Updated the `_launch_runtime` function to use `_launch_ddp_runtime` when in a DDP setup and not rank zero. This prevents non-rank-zero processes from launching new workers. (neps/runtime.py, neps/runtime.pyR550-R556)

State Management Improvements:

* Added an `evaluating` method to the `FileBasedTrialStore` class to retrieve all evaluating trials. (neps/state/filebased.py, neps/state/filebased.pyR212-R220)
* Added a `get_current_evaluating_trial` method to the `NepsState` class to get the current trial being evaluated. (neps/state/neps_state.py, neps/state/neps_state.pyR217-R222)
* Added an `evaluating` method to the `TrialStore` protocol to standardize the retrieval of evaluating trials across different implementations. (neps/state/protocols.py, neps/state/protocols.pyR141-R144)

Together, these changes let the `neps` runtime manage and evaluate trials correctly in distributed computing environments.
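
For readers unfamiliar with the rank-zero pattern, here is a minimal sketch of the kind of check the runtime changes describe. It is not the code from this pull request: it assumes the launcher exports the `WORLD_SIZE`, `RANK`, and `LOCAL_RANK` environment variables (as `torchrun` does), and the names `is_ddp_and_not_rank_zero` and `choose_launch_path` are hypothetical stand-ins for illustration.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def is_ddp_and_not_rank_zero(env: Mapping[str, str] | None = None) -> bool:
    """Return True for a multi-process run where this process is not rank 0.

    Assumption: the launcher sets WORLD_SIZE and RANK (or LOCAL_RANK).
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    if world_size <= 1:
        return False  # a single process is not treated as a DDP setup
    # Prefer the global rank; fall back to the local rank if it is missing.
    rank = int(env.get("RANK", env.get("LOCAL_RANK", "0")))
    return rank != 0


def choose_launch_path(env: Mapping[str, str] | None = None) -> str:
    """Hypothetical dispatch: only rank 0 (or a non-DDP run) starts a worker."""
    if is_ddp_and_not_rank_zero(env):
        return "ddp-runtime"  # join the evaluation, do not start a new worker
    return "new-worker"
```

Passing the environment as a mapping keeps the check easy to test without mutating `os.environ`.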
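
The state-management side can be illustrated in the same spirit. The sketch below shows how a protocol-level `evaluating` method might let any store implementation report trials that are currently being evaluated. Everything except the method name `evaluating` and the function name `get_current_evaluating_trial` is an assumption made for illustration: the `Trial` and `State` types, the in-memory store, and the return types are hypothetical and may differ from the actual `neps` classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class State(Enum):  # hypothetical trial states
    PENDING = "pending"
    EVALUATING = "evaluating"
    SUCCESS = "success"


@dataclass
class Trial:  # hypothetical, much smaller than a real trial record
    id: str
    state: State


class TrialStore(Protocol):
    def evaluating(self) -> dict[str, Trial]:
        """Return all trials currently being evaluated, keyed by id."""
        ...


class InMemoryTrialStore:
    """Illustrative store; a file-based store would read trials from disk."""

    def __init__(self, trials: list[Trial]) -> None:
        self._trials = {t.id: t for t in trials}

    def evaluating(self) -> dict[str, Trial]:
        return {
            tid: t for tid, t in self._trials.items()
            if t.state is State.EVALUATING
        }


def get_current_evaluating_trial(store: TrialStore) -> Trial | None:
    """Return one evaluating trial, or None if nothing is being evaluated."""
    for trial in store.evaluating().values():
        return trial
    return None
```

Because the helper depends only on the protocol, a non-rank-zero process could use it to look up the trial that rank zero has already started, regardless of which store backs the state.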