Merge pull request #22 from Iacob-Alexandru-Andrei/main
Expand Readme
Iacob-Alexandru-Andrei authored Jan 17, 2024
2 parents 5597894 + a765ddc commit afd2241
Showing 2 changed files with 18 additions and 5 deletions.
14 changes: 13 additions & 1 deletion README.md
@@ -109,7 +109,19 @@ After implementing the task, dynamically starting it via ```hydra``` requires ch
- ```dispatch_config``` selects the configs used during fit and eval; you will likely not have to change this, as the default task provides a sensible version.
- The ```project.dispatch``` module requires you to add the task-specific ```dispatch_data```, ```dispatch_train```, and ```dispatch_config``` functions from the ```project.<new_task>.dispatch``` module to the list of possible tasks that can match the config. The statically-declared function order determines which task is selected if multiple tasks match the config (see the sketch below).
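
Below is a minimal sketch of how such a dispatch list might look, assuming hypothetical module names (`project.task_default`, `project.task_new`) and a simplified signature; the template's actual `project.dispatch` module may differ in detail:

```python
# Sketch of project/dispatch.py: try each task's dispatch function in
# order and use the first one that matches the config.
# Module names and signatures here are assumptions for illustration.
from typing import Any, Callable

from project.task_default import dispatch as dispatch_default
from project.task_new import dispatch as dispatch_new  # your new task

# Statically-declared order: earlier entries win when several tasks
# match the same config.
TRAIN_DISPATCHERS: list[Callable[[dict], Any | None]] = [
    dispatch_new.dispatch_train,
    dispatch_default.dispatch_train,
]


def dispatch_train(cfg: dict) -> Any:
    """Return the train components of the first task matching the config."""
    for candidate in TRAIN_DISPATCHERS:
        result = candidate(cfg)
        if result is not None:
            return result
    raise ValueError("No task matched the given config.")
```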

- Yu have now implemneted an entire new FL task without having to touch any of the FL-specific code.
+ You have now implemented an entire new FL task without having to touch any of the FL-specific code.

## Using checkpoints

By default, the entire template is synchronized across server rounds: the model parameters, `RNG` state, `Wandb` run, metric `History`, config files, and logs are all checkpointed either every `freq` rounds or once at the end of training when the process exits. If Wandb is used, any restarted run continues at the exact same link in Wandb, with no cumbersome tracking necessary.

To use the checkpoint system, all you have to do is specify `hydra.run.dir` as a previous execution directory rather than the default timestamped output directory. If you wish to restore a specific round rather than the most recent one, modify the `server_round` in the `fed` config.
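
For example (a hedged sketch: the `project.main` entry point and the output paths are assumptions, and the exact override names besides `hydra.run.dir` may differ in your setup):

```bash
# Resume the most recent checkpoint of a previous run
python -m project.main hydra.run.dir=outputs/2024-01-17/12-00-00

# Restore a specific round instead of the most recent one
python -m project.main hydra.run.dir=outputs/2024-01-17/12-00-00 fed.server_round=5
```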

## Reproducibility

One of the primary functionalities of this template is to allow for easily reproducible FL checkpointing. It achieves this by controlling client sampling and server and client `RNG` seeding, and by saving the `RNG` states for `Random`, `np`, and `torch`. The server and every client are provided with an isolated `RNG` generator, making them usable in a multithreaded context where the global generators may be accessed unpredictably.

The `RNG` states of all the relevant packages and generators are automatically saved and synchronized to the round, allowing for reproducible client sampling and client execution at the same round. Every relevant piece of client functionality also receives the isolated `RNG` state and can use it to guarantee reproducibility (e.g., the `PyTorch` dataloader).
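
As a rough illustration of the isolated-generator idea (an assumed sketch, not the template's actual `get_isolated_rng_tuple`), one can derive a stable per-client seed and build separate `Random`, `np`, and `torch` generators from it:

```python
# Assumed sketch of per-client isolated RNG derivation; the template's
# get_isolated_rng_tuple may use a different scheme.
import hashlib
import random

import numpy as np
import torch


def make_isolated_rng(
    global_seed: int, cid: int | str
) -> tuple[random.Random, np.random.Generator, torch.Generator]:
    """Derive independent generators for one client from a global seed."""
    # Deterministic per-client seed, independent of any global RNG state
    # (hashlib is used because Python's hash() is not stable across runs).
    digest = hashlib.sha256(f"{global_seed}:{cid}".encode()).digest()
    client_seed = int.from_bytes(digest[:4], "big")
    py_rng = random.Random(client_seed)
    np_rng = np.random.default_rng(client_seed)
    torch_rng = torch.Generator().manual_seed(client_seed)
    return py_rng, np_rng, torch_rng
```

The `torch.Generator` can then be passed to a `DataLoader(..., generator=...)` so that shuffling stays reproducible regardless of how other threads touch the global generators.
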
## Template Structure
The template uses poetry with the `project` name for the top-level package. All imports are made from this package, and no relative imports are allowed. The structure is as follows:
9 changes: 5 additions & 4 deletions project/client/client.py
@@ -17,6 +17,7 @@
    get_isolated_rng_tuple,
)
from project.types.common import (
+   CID,
    ClientDataloaderGen,
    ClientGen,
    EvalRes,
@@ -56,7 +57,7 @@ class Client(fl.client.NumPyClient):

    def __init__(
        self,
-       cid: int | str | Path,
+       cid: CID,
        working_dir: Path,
        net_generator: NetGen,
        dataloader_gen: ClientDataloaderGen,
@@ -71,7 +72,7 @@ def __init__(
        Parameters
        ----------
-       cid : int | str
+       cid : int | str | Path
            The client's ID.
        working_dir : Path
            The path to the working directory.
@@ -307,12 +308,12 @@ def get_client_generator(
        The function which creates a new Client.
    """

-   def client_generator(cid: int | str) -> Client:
+   def client_generator(cid: CID) -> Client:
        """Return a new Client.

        Parameters
        ----------
-       cid : int | str
+       cid : int | str | Path
            The client's ID.

        Returns
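
The diff suggests `CID` is a shared type alias for client IDs defined in `project.types.common`. A plausible definition, inferred from the replaced annotations rather than shown in this diff, would be:

```python
# Presumed contents of project/types/common.py (an assumption, not part
# of this diff): CID unifies the client-ID types used across the template.
from pathlib import Path

CID = int | str | Path
```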
