From 94a08977b384ece91ceb0cb808b3f931b43d673c Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Tue, 20 Aug 2024 00:04:15 +0200 Subject: [PATCH 01/12] Update headings in quickstart to align with the typical user-flow in a FL project. --- docs/quickstart.rst | 62 +++++++++++++++++++++++++-------------------- 1 file changed, 35 insertions(+), 27 deletions(-) diff --git a/docs/quickstart.rst b/docs/quickstart.rst index 7723e7f28..b1d8ff748 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -11,21 +11,24 @@ Getting started with FEDn - `A FEDn Studio account `__ -1. Start a FEDn Studio Project ------------------------------- +1. Start the server +-------------------- -Start by creating an account in Studio. Head over to `fedn.scaleoutsystems.com/signup `_ and sign up. +The first step is to start the server side (aggregator, controller). We do this by setting up a new Project in FEDn Studio. +Start by creating an account in Studio: `fedn.scaleoutsystems.com/signup `_. Logged into Studio, create a new project by clicking on the "New Project" button in the top right corner of the screen. -You will see a Studio project similar to the image below. The Studio project provides all the necessary server side components of FEDn. -We will use this project in a later stage to run the federated experiments. But first, we will set up the local client. - +You will see a Studio project similar to the image below. The Studio project provides a secure and managed deployment of all the necessary server side components. .. image:: img/studio_project_overview.png +2. Prepare the clients +----------------------- + +Next, we will prepare and package the ML code to be executed by each client and create a first version of the global model (seed model). +We will work with one of the pre-defined projects in the FEDn repository, ``mnist-pytorch``. -2. 
Install FEDn on your client -------------------------------- +First install the FEDn API on your local machine (client): **Using pip** @@ -49,10 +52,7 @@ It is recommended to use a virtual environment when installing FEDn. .. _package-creation: -Next, we will prepare the client. We will use one of the pre-defined projects in the FEDn repository, ``mnist-pytorch``. - -3. Create the compute package and seed model --------------------------------------------- +**Create the compute package and seed model** In order to train a federated model using FEDn, your Studio project needs to be initialized with a ``compute package`` and a ``seed model``. The compute package is a code bundle containing the code used by the client to execute local training and local validation. The seed model is a first version of the global model. @@ -90,14 +90,13 @@ This will create a file called ``seed.npz`` in the root of the project. Next will now upload these files to your Studio project: -4. Initialize your FEDn Studio Project --------------------------------------- +**Initialize your FEDn Studio Project** -In the Studio UI, navigate to the project you created above and click on the "Sessions" tab. Click on the "New Session" button. Under the "Compute package" tab, select a name and upload the generated package file. Under the "Seed model" tab, upload the generated seed file: +In the Studio UI, navigate to the project you created in step one and click on the "Sessions" tab. Click on the "New Session" button. Under the "Compute package" tab, select a name and upload the generated package file. Under the "Seed model" tab, upload the generated seed file: .. image:: img/upload_package.png -**Upload the package and seed model using the Python APIClient** +** (Alternative) Upload the package and seed model using the Python APIClient** It is also possible to upload a package and seed model using the Python API Client. 
@@ -116,8 +115,13 @@ To upload the package and seed model using the APIClient: >>> client.set_active_model("seed.npz") -5. Configure and attach clients -------------------------------- +3. Start clients +----------------- + +Now we are ready to start FEDn clients on your local machine. There are two steps involved: + +1. Register a new client in your Studio project, issuing an access token. +2. Start up a client process on your local host (using the token to connect securely) **Generate an access token for the client (in Studio)** @@ -138,7 +142,7 @@ A normal laptop should be able to handle several clients for this example. **Modifying the data split (multiple-clients, optional):** -The default traning and test data for this particular example (mnist-pytorch) is for convenience downloaded and split automatically by the client when it starts up (see the 'startup' entrypoint). +The default traning and test data for this particular example (mnist-pytorch) is for convenience downloaded and split automatically by the client when it starts up. The number of splits and which split to use by a client can be controlled via the environment variables ``FEDN_NUM_DATA_SPLITS`` and ``FEDN_DATA_PATH``. For example, to split the data in 10 parts and start a client using the 8th partiton: @@ -161,15 +165,19 @@ For example, to split the data in 10 parts and start a client using the 8th part fedn client start -in client.yaml --secure=True --force-ssl -6. Start a training session ---------------------------- +4. Training +-------------- + +With clients connected, we are now ready to train the global model. This can be done using either the Studio dashboard or the Python API. In FEDn, training is organised +in Sessions. One training session consists of a configurable number of training rounds (local model updates and aggregation). In Studio click on the "Sessions" link, then the "New session" button in the upper right corner. 
Click the "Start session" tab and enter your desirable settings (the default settings are good for this example) and hit the "Start run" button. In the terminal where your are running your client you should now see some activity. When a round is completed, you can see the results on the "Models" page. -**Watch the training progress** +**Watch real-time updates of training progress** -Once a training session is started, you can monitor the progress of the training by navigating to "Sessions" and click on the "Open" button of the active session. The session page will list the models as soon as they are generated. +Once a training session is started, you can monitor the progress by clicking the drop-down button for the active session and then clicking on the "View session" button. The session page will show +metrics related to the training progress (accuracy, loss, etc.), as well as performance data such as total round times and individual client training times. A list of models in the session is updated as soon as new models are generated. To get more information about a particular model, navigate to the model page by clicking the model name. From the model page you can download the model weights and get validation metrics. .. image:: img/studio_model_overview.png @@ -179,7 +187,7 @@ To get more information about a particular model, navigate to the model page by Congratulations, you have now completed your first federated training session with FEDn! Below you find additional information that can be useful as you progress in your federated learning journey. 
-**Control training sessions using the Python APIClient** +**Run training sessions using the Python APIClient** You can also issue training sessions using the APIClient: @@ -214,7 +222,7 @@ You can also access global model updates via the APIClient: **Where to go from here?** -------------------------- -With you first FEDn federated project set up, we suggest that you take a close look at how a FEDn project is structured +With you first FEDn federated project set up, we suggest that you take a closer look at how a FEDn project is structured and how you develop your own FEDn projects: - :ref:`projects-label` @@ -223,8 +231,8 @@ You can also dive into the architecture overview to learn more about how FEDn is - :ref:`architecture-label` -For developers looking to cutomize FEDn and develop own aggregators, check out the local development guide. -This page also has instructions for using Docker to run clients: +For developers looking to customize FEDn and develop own aggregators, check out the local development guide +to learn how to set up an all-in-one deployment. This page also has instructions for using Docker to run clients: - :ref:`developer-label` From e0200e79a0eef20b1d7ea7ae15c42c680a278c55 Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Sun, 25 Aug 2024 17:00:58 +0200 Subject: [PATCH 02/12] Improved introduction section --- docs/index.rst | 2 +- docs/introduction.rst | 58 ++++++++++++++++++++++++++----------------- docs/projects.rst | 13 ++++++++++ 3 files changed, 49 insertions(+), 24 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index 79b862c0e..2a1452790 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -5,12 +5,12 @@ introduction quickstart projects + apiclient .. toctree:: :maxdepth: 1 :caption: Documentation - apiclient architecture aggregators helpers diff --git a/docs/introduction.rst b/docs/introduction.rst index 0a6a59879..a8857ca7d 100644 --- a/docs/introduction.rst +++ b/docs/introduction.rst @@ -1,7 +1,7 @@ -What is FEDn? 
-============= +What is Federated Learning? +=========================== -Federated Learning offers a novel approach to address challenges related to data privacy, security, +Federated Learning is a novel approach to address challenges related to data privacy, security, and decentralized data distribution. In contrast to traditional machine learning setups where data is collected and stored centrally, Federated Learning allows for collaborative model training while keeping data local with the data owner or device. This is particularly advantageous in scenarios where data cannot be easily shared due to privacy regulations, network limitations, or ownership concerns. @@ -12,44 +12,56 @@ each participant computes gradients locally based on its data. These gradients a The server aggregates and combines the gradients from multiple participants to update a global model. This iterative process allows the global model to improve without the need to share the raw data. -FEDn empowers users to create federated learning applications that seamlessly transition from local proofs-of-concept to secure distributed deployments. -We develop the FEDn framework following these core design principles: -- **Seamless transition from proof-of-concepts to real-world FL**. FEDn has been designed to make the journey from R&D to real-world deployments as smooth as possibe. Develop your federated learning use case in a pseudo-local environment, then deploy it to FEDn Studio (cloud or on-premise) for real-world scenarios. No code change is required to go from development and testing to production. +FEDn: An enterprise-ready federated learning framework +------------------------------------------------------- -- **Designed for scalability and resilience.** FEDn enables model aggregation through multiple aggregation servers sharing the workload. A hierarchical architecture makes the framework well suited borh for cross-silo and cross-device use-cases. 
FEDn seamlessly recover from failures in all critical components, and manages intermittent client-connections, ensuring robust deployment in production environments. +Our goal is to provide a federated learning framework that is secure, scalable and easy to use. We believe that minimal code change should be needed to progress from early proof-of-concepts to production. This is reflected in our core design: -- **Secure by design.** FL clients do not need to open any ingress ports, facilitating distributed deployments across a wide variety of settings. Additionally, FEDn utilizes secure, industry-standard communication protocols and supports token-based authentication and RBAC for FL clients (JWT), providing flexible integration in production environments. +- **Minimal server-side complexity for the end-user**. Running a proper distributed FL deployment is hard. With FEDn Studio we seek to handle all server-side complexity and provide a UI, REST API and a Python interface to help users manage FL experiments and track metrics in real time. -- **Developer and data scientist friendly.** Extensive event logging and distributed tracing enables developers to monitor experiments in real-time, simplifying troubleshooting and auditing. Machine learning metrics can be accessed via both a Python API and visualized in an intuitive UI that helps the data scientists analyze and communicate ML-model training progress. +- **Secure by design.** FL clients do not need to open any ingress ports. Industry-standard communication protocols (gRPC) and token-based authentication and RBAC (JSON Web Tokens) provide flexible integration in a range of production environments. +- **ML-framework agnostic**. A black-box client-side architecture lets data scientists interface with their framework of choice. + +- **Cloud native.** By following cloud native design principles, we ensure a wide range of deployment options including private cloud and on-premise infrastructure. 
+ +- **Scalability and resilience.** Multiple aggregation servers (combiners) can share the workload. FEDn seamlessly recovers from failures in all critical components and manages intermittent client-connections. + +- **Developer and DevOps friendly.** Extensive event logging and distributed tracing enable developers to monitor the system in real time, simplifying troubleshooting and auditing. Extensions and integrations are facilitated by a flexible plug-in architecture. Features -========= +-------- -Federated machine learning: +Federated learning: -- Support for any ML framework (e.g. PyTorch, Tensforflow/Keras and Scikit-learn) +- Tiered federated learning architecture enabling massive scalability and resilience. +- Support for any ML framework (examples for PyTorch, TensorFlow/Keras and Scikit-learn) - Extendable via a plug-in architecture (aggregators, load balancers, object storage backends, databases etc.) - Built-in federated algorithms (FedAvg, FedAdam, FedYogi, FedAdaGrad, etc.) -- CLI and Python API client for running FEDn networks and coordinating experiments. +- UI, CLI and Python API. - Implement clients in any language (Python, C++, Kotlin etc.) - No open ports needed client-side. -FEDn Studio - From development to FL in production: +From development to FL in production: + +- Secure deployment of server-side / control-plane on Kubernetes. +- UI with dashboards for orchestrating FL experiments and for visualizing results +- Team features - collaborate with other users in shared project workspaces. +- Features for the trusted third party: Manage access to the FL network, FL clients and training progress. +- REST API for handling experiments/jobs. +- View and export logging and tracing information. +- Public cloud, dedicated cloud and on-premise deployment options. + +Available client APIs: -- Leverage Scaleout's free managed service for development and testing in real-world scenarios (SaaS). 
-- Token-based authentication (JWT) and role-based access control (RBAC) for FL clients. -- REST API and UI. -- Data science dashboard for orchestrating experiments and visualizing results. -- Admin dashboard for managing the FEDn network and users/clients. -- View extensive logging and tracing information. -- Collaborate with other data-scientists on the project specification in a shared workspace. -- Cloud or on-premise deployment (cloud-native design, deploy to any Kubernetes cluster) +- Python client (this repository) +- C++ client (`FEDn C++ client `__) +- Android Kotlin client (`FEDn Kotlin client `__) Support -========= +-------- Community support in available in our `Discord server `__. diff --git a/docs/projects.rst b/docs/projects.rst index 8e8592532..dd34956b2 100644 --- a/docs/projects.rst +++ b/docs/projects.rst @@ -174,4 +174,17 @@ by looking at the code above. Here we assume that the dataset is present in a fi the execution of entrypoint.py. Then, independent of the preferred way to run the client (native, Docker, K8s etc) this structure needs to be maintained for this particular compute package. Note however, that there are many ways to accomplish this on a local operational level. +Where to go from here? +----------------------- + +With an understanding of how FEDn Projects are structured and created, you can explore our library of example projects. They demonstrate different use case scenarios of FEDn +and its integration with popular machine learning frameworks like PyTorch and TensorFlow. 
+ +- `FEDn + PyTorch `__ +- `FEDn + TensorFlow/Keras `__ +- `FEDn + MONAI `__ +- `FEDn + Hugging Face `__ +- `FEDn + Flower `__ +- `FEDn + Self-supervised learning `__ + From 854aae69bb132696b7285ffe0a24323f8040740f Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Sun, 25 Aug 2024 17:23:40 +0200 Subject: [PATCH 03/12] Move APIClient examples from Quickstart to APIClient page --- docs/apiclient.rst | 47 ++++++++++++++++++++++ docs/quickstart.rst | 97 +++++++++++---------------------------------- 2 files changed, 71 insertions(+), 73 deletions(-) diff --git a/docs/apiclient.rst b/docs/apiclient.rst index 4bfb7fe79..6af223654 100644 --- a/docs/apiclient.rst +++ b/docs/apiclient.rst @@ -51,6 +51,25 @@ To set the initial seed model, you can use the following code snippet: client.set_active_model(path="path/to/seed.npz") + +** (Alternative) Upload the package and seed model using the Python APIClient** + +It is also possible to upload a package and seed model using the Python API Client. + +.. note:: + You need to create an API admin token and use the token to authenticate the APIClient. + Do this by going to the 'Settings' tab in FEDn Studio and click 'Generate token'. Copy the access token and use it in the APIClient below. + The controller host can be found on the main Dashboard in FEDn Studio. More information on the use of the APIClient can be found here: :ref:`apiclient-label. + +To upload the package and seed model using the APIClient: + +.. code:: python + + >>> from fedn import APIClient + >>> client = APIClient(host="", token="", secure=True, verify=True) + >>> client.set_active_package("package.tgz", helper="numpyhelper") + >>> client.set_active_model("seed.npz") + **Start a training session** Once the active package and seed model are set, you can connect clients to the network and start training models. 
The following code snippet starts a traing session: @@ -59,6 +78,34 @@ Once the active package and seed model are set, you can connect clients to the n session = client.start_session(id="session_name") + +**Run training sessions using the Python APIClient** + +You can also issue training sessions using the APIClient: + +.. code:: python + + >>> ... + >>> client.start_session(id="test-session", rounds=3) + # Wait for training to complete, when controller is idle: + >>> client.get_controller_status() + # Show model trail: + >>> models = client.get_model_trail() + # Show performance of latest global model: + >>> model_id = models[-1]['model'] + >>> validations = client.get_validations(model_id=model_id) + +**Accessing global models** + +You can also access global model updates via the APIClient: + +.. code:: python + + >>> ... + >>> client.download_model("", path="model.npz") + +Please see :py:mod:`fedn.network.api` for more details on how to use the APIClient. + **List data** Other than starting training sessions, the APIClient can be used to get data from the network, such as sessions, models etc. All entities are represented and they all work in a similar fashion. diff --git a/docs/quickstart.rst b/docs/quickstart.rst index b1d8ff748..bda38b297 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -22,8 +22,8 @@ You will see a Studio project similar to the image below. The Studio project pro .. image:: img/studio_project_overview.png -2. Prepare the clients ------------------------ +2. Prepare the clients and define the global model +--------------------------------------------------- Next, we will prepare and package the ML code to be executed by each client and create a first version of the global model (seed model). We will work with one of the pre-defined projects in the FEDn repository, ``mnist-pytorch``. @@ -88,34 +88,16 @@ This will create a file called ``seed.npz`` in the root of the project. 
When you first exectue the above commands, FEDn will build a venv, and this takes a bit of time. For more information on the various options to manage the environement, see :ref:`projects-label`. -Next will now upload these files to your Studio project: - -**Initialize your FEDn Studio Project** +Next, we will upload these files to your Studio project. +3. Initialize the server-side +------------------------------ +The next step is to initialize the server side with the client code and the initial global model. In the Studio UI, navigate to the project you created in step one and click on the "Sessions" tab. Click on the "New Session" button. Under the "Compute package" tab, select a name and upload the generated package file. Under the "Seed model" tab, upload the generated seed file: .. image:: img/upload_package.png -** (Alternative) Upload the package and seed model using the Python APIClient** - -It is also possible to upload a package and seed model using the Python API Client. - -.. note:: - You need to create an API admin token and use the token to authenticate the APIClient. - Do this by going to the 'Settings' tab in FEDn Studio and click 'Generate token'. Copy the access token and use it in the APIClient below. - The controller host can be found on the main Dashboard in FEDn Studio. More information on the use of the APIClient can be found here: :ref:`apiclient-label. - -To upload the package and seed model using the APIClient: - -.. code:: python - - >>> from fedn import APIClient - >>> client = APIClient(host="", token="", secure=True, verify=True) - >>> client.set_active_package("package.tgz", helper="numpyhelper") - >>> client.set_active_model("seed.npz") - - -3. Start clients +4. Start clients ----------------- Now we are ready to start FEDn clients on your local machine. There are two steps involved: @@ -131,20 +113,10 @@ Rename the file to 'client.yaml'. 
**Start the client (on your local machine)** -Now we can start the client by running the following command: - -.. code-block:: - - fedn run client -in client.yaml --secure=True --force-ssl - -Repeat these two steps (generate an access token and start a local client) for the number of clients you want to use. -A normal laptop should be able to handle several clients for this example. - -**Modifying the data split (multiple-clients, optional):** - -The default traning and test data for this particular example (mnist-pytorch) is for convenience downloaded and split automatically by the client when it starts up. +The default training and test data for this particular example (mnist-pytorch) is for convenience downloaded and split automatically by the client when it starts up. The number of splits and which split to use by a client can be controlled via the environment variables ``FEDN_NUM_DATA_SPLITS`` and ``FEDN_DATA_PATH``. -For example, to split the data in 10 parts and start a client using the 8th partiton: + +Start a client (using a 10-split and the first partition) by running the following commands: .. tabs:: @@ -153,7 +125,7 @@ For example, to split the data in 10 parts and start a client using the 8th part export FEDN_PACKAGE_EXTRACT_DIR=package export FEDN_NUM_DATA_SPLITS=10 - export FEDN_DATA_PATH=./data/clients/8/mnist.pt + export FEDN_DATA_PATH=./data/clients/1/mnist.pt fedn client start -in client.yaml --secure=True --force-ssl .. code-tab:: bash @@ -161,12 +133,14 @@ For example, to split the data in 10 parts and start a client using the 8th part $env:FEDN_PACKAGE_EXTRACT_DIR="package" $env:FEDN_NUM_DATA_SPLITS=10 - $env:FEDN_DATA_PATH="./data/clients/8/mnist.pt" + $env:FEDN_DATA_PATH="./data/clients/1/mnist.pt" fedn client start -in client.yaml --secure=True --force-ssl +Repeat these two steps (generate an access token and start a local client) for the number of clients you want to use. 
+A normal laptop should be able to handle several clients for this example. Remember to use different partitions for each client. -4. Training --------------- +5. Train the global model +----------------------------- With clients connected, we are now ready to train the global model. This can be done using either the Studio dashboard or the Python API. In FEDn, training is organised in Sessions. One training session consists of a configurable number of training rounds (local model updates and aggregation). @@ -187,52 +161,29 @@ To get more information about a particular model, navigate to the model page by Congratulations, you have now completed your first federated training session with FEDn! Below you find additional information that can be useful as you progress in your federated learning journey. -**Run training sessions using the Python APIClient** - -You can also issue training sessions using the APIClient: - -.. code:: python - - >>> ... - >>> client.start_session(id="test-session", rounds=3) - # Wait for training to complete, when controller is idle: - >>> client.get_controller_status() - # Show model trail: - >>> models = client.get_model_trail() - # Show performance of latest global model: - >>> model_id = models[-1]['model'] - >>> validations = client.get_validations(model_id=model_id) - - -Please see :py:mod:`fedn.network.api` for more details on how to use the APIClient. - **Downloading global model updates** .. note:: In FEDn Studio, you can access global model updates by going to the 'Models' or 'Sessions' tab. Here you can download model updates, metrics (as csv) and view the model trail. - -You can also access global model updates via the APIClient: - -.. code:: python - - >>> ... 
- >>> client.download_model("", path="model.npz") - **Where to go from here?** --------------------------- With you first FEDn federated project set up, we suggest that you take a closer look at how a FEDn project is structured -and how you develop your own FEDn projects: +to learn how to develop your own FEDn projects: - :ref:`projects-label` -You can also dive into the architecture overview to learn more about how FEDn is designed and works under the hood: +In this tutorial we relied on the UI. The Python APIClient provides a flexible alternative, with additional functionality +such as use of different aggregators. Learn how to use the APIClient here: + +- :ref:`apiclient-label` + +Study the architecture overview to learn more about how FEDn is designed and works under the hood: - :ref:`architecture-label` For developers looking to customize FEDn and develop own aggregators, check out the local development guide -to learn how to set up an all-in-one deployment. This page also has instructions for using Docker to run clients: +to learn how to set up an all-in-one development environment using Docker and docker-compose: - :ref:`developer-label` From 1403c6a8dea4aa11396afc299aef68ff917b7169 Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Sun, 25 Aug 2024 17:45:50 +0200 Subject: [PATCH 04/12] Work in progress on APIClient page --- docs/apiclient.rst | 43 +++++++++++-------------------------------- docs/quickstart.rst | 12 +++++------- 2 files changed, 16 insertions(+), 39 deletions(-) diff --git a/docs/apiclient.rst b/docs/apiclient.rst index 6af223654..1bb011004 100644 --- a/docs/apiclient.rst +++ b/docs/apiclient.rst @@ -1,9 +1,15 @@ .. _apiclient-label: -APIClient -========= +Using the Python API +==================== FEDn comes with an *APIClient* - a Python3 library that can be used to interact with FEDn programmatically. 
+In this tutorial we show how to use the APIClient to initialize the server-side with the compute package and seed model, +run and control training sessions, use different aggregators, and retrieve models and metrics. + +We assume a basic understanding of the FEDn framework, i.e. that the user has completed the Getting Started tutorial: + +- :ref:`quickstart-label` **Installation** @@ -13,12 +19,11 @@ The APIClient is available as a Python package on PyPI, and can be installed usi $ pip install fedn -**Initialize the APIClient** +**Initialize the APIClient to a FEDn Studio project** The FEDn REST API is available at /api/v1/. To access this API you need the url to the controller-host, as well as an admin API token. The controller host can be found in the project dashboard (top right corner). To obtain an admin API token, navigate to the "Settings" tab in your Studio project and click on the "Generate token" button. Copy the 'access' token and use it to access the API using the instructions below. - .. code-block:: python >>> from fedn import APIClient @@ -36,32 +41,9 @@ Then passing a token as an argument is not required. >>> from fedn import APIClient >>> client = APIClient(host="", secure=True, verify=True) +**Set the active package and seed model** -**Set active package and seed model** - -The active package can be set using the following code snippet: - -.. code-block:: python - - client.set_active_package(path="path/to/package.tgz", helper="numpyhelper") - -To set the initial seed model, you can use the following code snippet: - -.. code-block:: python - - client.set_active_model(path="path/to/seed.npz") - - -** (Alternative) Upload the package and seed model using the Python APIClient** - -It is also possible to upload a package and seed model using the Python API Client. - -.. note:: - You need to create an API admin token and use the token to authenticate the APIClient. - Do this by going to the 'Settings' tab in FEDn Studio and click 'Generate token'. 
Copy the access token and use it in the APIClient below. - The controller host can be found on the main Dashboard in FEDn Studio. More information on the use of the APIClient can be found here: :ref:`apiclient-label. - -To upload the package and seed model using the APIClient: +To set the active compute package in the FEDn Studio Project: .. code:: python @@ -78,11 +60,8 @@ Once the active package and seed model are set, you can connect clients to the n session = client.start_session(id="session_name") - **Run training sessions using the Python APIClient** -You can also issue training sessions using the APIClient: - .. code:: python >>> ... diff --git a/docs/quickstart.rst b/docs/quickstart.rst index bda38b297..26e27053e 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -1,3 +1,5 @@ +.. _quickstart-label: + Getting started with FEDn ========================= @@ -173,8 +175,9 @@ to learn how to develop your own FEDn projects: - :ref:`projects-label` -In this tutorial we relied on the UI. The Python APIClient provides a flexible alternative, with additional functionality -such as use of different aggregators. Learn how to use the APIClient here: +In this tutorial we relied on the UI for running training sessions and retrieving models and results. +The Python APIClient provides a flexible alternative, with additional functionality exposed, +including the use of different aggregators. 
Learn how to use the APIClient here: - :ref:`apiclient-label` @@ -186,8 +189,3 @@ For developers looking to customize FEDn and develop own aggregators, check out to learn how to set up an all-in-one development environment using Docker and docker-compose: - :ref:`developer-label` - - - - - From 2275d1168bf325c08629a54bf07494815a590253 Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Sun, 25 Aug 2024 18:15:28 +0200 Subject: [PATCH 05/12] Move auth into developer guide --- docs/developer.rst | 142 ++++++++++++++++++++++++++++++++++++++------- docs/index.rst | 1 - 2 files changed, 121 insertions(+), 22 deletions(-) diff --git a/docs/developer.rst b/docs/developer.rst index 8a9e4b87d..bb30c6f00 100644 --- a/docs/developer.rst +++ b/docs/developer.rst @@ -1,20 +1,16 @@ .. _developer-label: -Local development and deployment -================================ +Local development sandbox +========================= .. note:: - These instructions are for users wanting to set up a local development deployment of FEDn (i.e. without FEDn Studio). - This requires practical knowledge of Docker and docker-compose. + These instructions are for users wanting to set up a bare-minimum local deployment of FEDn (without FEDn Studio). + We here assume practical knowledge of Docker and docker-compose. We recommend all new users of FEDn to start + by taking the Getting Started tutorial: :ref:`quickstart-label` -Running the FEDn development sandbox (docker-compose) ------------------------------------------------------- - -During development on FEDn, and when working on own aggregators/helpers, it is -useful to have a local development setup of the core FEDn services (controller, combiner, database, object store). -For this, we provide Dockerfiles and docker-compose template. 
- -To start a development sandbox for FEDn using docker-compose: +During development on FEDn, and when working on your own extensions including aggregators and helpers, it is +useful to have a local development setup of the core FEDn server-side services (controller, combiner, database, object store). +We provide Dockerfiles and a docker-compose template for an all-in-one local sandbox: .. code-block:: @@ -24,14 +20,14 @@ To start a development sandbox for FEDn using docker-compose: up This starts up local services for MongoDB, Minio, the API Server, one Combiner and two clients. -You can verify the deployment using these urls: +You can verify the deployment on localhost using these URLs: - API Server: http://localhost:8092/get_controller_status - Minio: http://localhost:9000 - Mongo Express: http://localhost:8081 -This setup does not include the security features of Studio, and thus will not require authentication of clients. -To use the APIClient to test a compute package and seed model against a local FEDn deployment: +This setup does not include any of the security and authentication features available in a Studio Project, +so we will not require authentication of clients (insecure mode) when using the APIClient: .. code-block:: @@ -40,8 +36,7 @@ To use the APIClient to test a compute package and seed model against a local FE client.set_active_package("package.tgz", helper="numpyhelper") client.set_active_model("seed.npz") - -To connect a native FEDn client, you need to make sure that the combiner service can be resolved using the name "combiner". +To connect a native FEDn client to the sandbox deployment, you need to make sure that the combiner service can be resolved by the client using the name "combiner". One way to achieve this is to edit your '/etc/hosts' and add a line '127.0.0.1 combiner'.
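The '/etc/hosts' entry mentioned above can be checked with a few lines of Python. This is a sketch for illustration only: ``hosts_entry_present`` is a hypothetical helper (it is not part of FEDn), and the sample text stands in for the real file content.

```python
def hosts_entry_present(hosts_text, hostname, ip="127.0.0.1"):
    """Return True if hosts_text contains a line mapping `hostname` to `ip`."""
    for raw_line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # A hosts line is: <ip> <name> [<alias> ...]
        if fields[0] == ip and hostname in fields[1:]:
            return True
    return False


# Sample content standing in for '/etc/hosts' (hypothetical).
sample = "127.0.0.1 localhost\n127.0.0.1 combiner  # FEDn sandbox\n"
if hosts_entry_present(sample, "combiner"):
    print("'combiner' resolves to 127.0.0.1 via the hosts file")
else:
    print("no entry found; add the line: 127.0.0.1 combiner")
```

To check a real system, pass the content of '/etc/hosts' instead of the sample string.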
Access message logs and validation data from MongoDB @@ -76,7 +71,6 @@ You can clean up by running docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml down -v - Connecting clients using Docker: ------------------------------------------------------ @@ -93,8 +87,8 @@ and FEDN 0.10.0, run this from the example folder: ghcr.io/scaleoutsystems/fedn/fedn:0.10.0 run client -in client.yaml --force-ssl --secure=True -Self-managed distributed deployment ------------------------------------------------------- +Distributed deployment on a local network +========================================= You can use different hosts for the various FEDn services. These instructions shows how to set up FEDn on a **local network** using a single workstation or laptop as the host for the servier-side components, and other hosts or devices as clients. @@ -116,7 +110,6 @@ the host for the servier-side components, and other hosts or devices as clients. Launch a distributed FEDn Network --------------------------------- - Start by noting your host's local IP address, used within your network. Discover it by running ifconfig on UNIX or ipconfig on Windows, typically listed under inet for Unix and IPv4 for Windows. @@ -159,3 +152,110 @@ Alternatively updating the `/etc/hosts` file, appending the following lines for api-server combiner + + +.. _auth-label: + +Authentication and Authorization (RBAC) +======================================== + +.. warning:: The FEDn RBAC system is an experimental feature and may change in the future. + +FEDn supports Role-Based Access Control (RBAC) for controlling access to the FEDn API and gRPC endpoints. The RBAC system is based on JSON Web Tokens (JWT) and is implemented using the `jwt` package. The JWT tokens are used to authenticate users and to control access to the FEDn API. +There are two types of JWT tokens used in the FEDn RBAC system: +- Access tokens: Used to authenticate users and to control access to the FEDn API. 
+- Refresh tokens: Used to obtain new access tokens when the old ones expire. + +.. note:: Please note that the FEDn RBAC system is not enabled by default and does not issue JWT tokens. It is used to integrate with external authentication and authorization systems such as FEDn Studio. + +The FEDn RBAC system is by default configured with the following roles: +- `admin`: Has full access to the FEDn API. This role is used to manage the FEDn network using the API client or the FEDn CLI. +- `combiner`: Has access to the /add_combiner endpoint in the API. +- `client`: Has access to the /add_client endpoint in the API and various gRPC endpoints to participate in federated learning sessions. + +A full list of the "roles to endpoint" mappings for gRPC can be found in `fedn/network/grpc/auth.py`. For the API, the mappings are defined using custom decorators defined in `fedn/network/api/auth.py`. + +.. note:: The roles are handled by a custom claim in the JWT token called `role`. The claim is used to control access to the FEDn API and gRPC endpoints. + +To enable the FEDn RBAC system, you need to set the following environment variables in the controller and combiner: + +Authentication Environment Variables +------------------------------------- + +.. line-block:: + + **FEDN_JWT_SECRET_KEY** + - **Type:** str + - **Required:** yes + - **Default:** None + - **Description:** The secret key used to sign and verify JWT tokens. + + **FEDN_JWT_ALGORITHM** + - **Type:** str + - **Required:** no + - **Default:** "HS256" + - **Description:** The algorithm used to sign JWT tokens. + + **FEDN_AUTH_SCHEME** + - **Type:** str + - **Required:** no + - **Default:** "Token" + - **Description:** The authentication scheme used in the FEDn API and gRPC interceptors. + +Additional Environment Variables +-------------------------------- + +For further flexibility, you can also set the following environment variables: + +.. 
line-block:: + + **FEDN_CUSTOM_URL_PREFIX** + - **Type:** str + - **Required:** no + - **Default:** None + - **Description:** Add a custom URL prefix used in the FEDn API, such as /internal or /v1. + + **FEDN_AUTH_WHITELIST_URL** + - **Type:** str + - **Required:** no + - **Default:** None + - **Description:** A URL pattern to the API that should be excluded from the FEDn RBAC system. For example, /internal (to enable internal API calls). + + **FEDN_JWT_CUSTOM_CLAIM_KEY** + - **Type:** str + - **Required:** no + - **Default:** None + - **Description:** The custom claim key used in the JWT token. + + **FEDN_JWT_CUSTOM_CLAIM_VALUE** + - **Type:** str + - **Required:** no + - **Default:** None + - **Description:** The custom claim value used in the JWT token. + +Client Environment Variables +----------------------------- + +For the client, you need to set the following environment variables: + +.. line-block:: + + **FEDN_AUTH_REFRESH_TOKEN_URI** + - **Type:** str + - **Required:** no + - **Default:** None + - **Description:** The URI used to obtain new access tokens when the old ones expire. + + **FEDN_AUTH_REFRESH_TOKEN** + - **Type:** str + - **Required:** no + - **Default:** None + - **Description:** The refresh token used to obtain new access tokens when the old ones expire. + + **FEDN_AUTH_SCHEME** + - **Type:** str + - **Required:** no + - **Default:** "Token" + - **Description:** The authentication scheme used in the FEDn API and gRPC interceptors. + +You can use `--token` flags in the FEDn CLI to set the access token. diff --git a/docs/index.rst b/docs/index.rst index 2a1452790..07f2dcdc9 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -14,7 +14,6 @@ architecture aggregators helpers - auth developer .. 
toctree:: From 5872de77def9cb24046a8d530d60721090230cd0 Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Mon, 26 Aug 2024 00:00:00 +0200 Subject: [PATCH 06/12] latest wip --- docs/apiclient.rst | 4 +--- docs/index.rst | 1 - 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/docs/apiclient.rst b/docs/apiclient.rst index 1bb011004..6097db570 100644 --- a/docs/apiclient.rst +++ b/docs/apiclient.rst @@ -7,9 +7,7 @@ FEDn comes with an *APIClient* - a Python3 library that can be used to interact In this tutorial we show how to use the APIClient to initialize the server-side with the compute package and seed models, run and control training sessions, use different aggregators, and to retrieve models and metrics. -We assume a basic understanding of the FEDn framework, i.e. that the user have taken the Getting Started tutorial: - -- :ref:`apiclient-label` +We assume a basic understanding of the FEDn framework, i.e. that the user have taken the Getting Started tutorial: :ref:`quickstart-label` **Installation** diff --git a/docs/index.rst b/docs/index.rst index 07f2dcdc9..d8f1c3541 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -23,7 +23,6 @@ faq modules - Indices and tables ================== From 5576c0422b9697a2c23f38c031404770494f37e3 Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Mon, 26 Aug 2024 00:02:38 +0200 Subject: [PATCH 07/12] Removed prerequisite in projects page --- docs/projects.rst | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/docs/projects.rst b/docs/projects.rst index dd34956b2..3dc2f8dfd 100644 --- a/docs/projects.rst +++ b/docs/projects.rst @@ -1,22 +1,17 @@ .. _projects-label: ================================================ -Develop your own FEDn project +Develop your own project ================================================ This guide explains how a FEDn project is structured, and details how to develop and run your own projects. 
**In this article** -`Prerequisites`_ + `Overview`_ `Build a FEDn project`_ `Deploy a FEDn project`_ - -Prerequisites -============== - - Overview ========== From 4495dc93b0f357edc37069364a3845d76466335e Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Tue, 27 Aug 2024 16:26:03 +0200 Subject: [PATCH 08/12] wip --- docs/quickstart.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/quickstart.rst b/docs/quickstart.rst index 26e27053e..47ddeaa2e 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -128,7 +128,7 @@ Start a client (using a 10-split and the first partition) by running the followi export FEDN_PACKAGE_EXTRACT_DIR=package export FEDN_NUM_DATA_SPLITS=10 export FEDN_DATA_PATH=./data/clients/1/mnist.pt - fedn client start -in client.yaml --secure=True --force-ssl + fedn run client -in client.yaml --secure=True --force-ssl .. code-tab:: bash :caption: Windows (Powershell) @@ -136,7 +136,7 @@ Start a client (using a 10-split and the first partition) by running the followi $env:FEDN_PACKAGE_EXTRACT_DIR="package" $env:FEDN_NUM_DATA_SPLITS=10 $env:FEDN_DATA_PATH="./data/clients/1/mnist.pt" - fedn client start -in client.yaml --secure=True --force-ssl + fedn run client -in client.yaml --secure=True --force-ssl Repeat these two steps (generate an access token and start a local client) for the number of clients you want to use. A normal laptop should be able to handle several clients for this example. Remember to use different partitions for each client. 
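To make the last point concrete, the sketch below builds the environment settings for several clients, one partition each. It assumes that partition ``i`` is stored at ``./data/clients/<i>/mnist.pt``, which matches the path used for the first client above but is otherwise an assumption; ``client_env`` is a hypothetical helper, not part of FEDn.

```python
def client_env(partition, num_splits=10):
    """Environment variables for the client training on `partition` (1-based)."""
    if not 1 <= partition <= num_splits:
        raise ValueError(f"partition must be in 1..{num_splits}, got {partition}")
    return {
        "FEDN_PACKAGE_EXTRACT_DIR": "package",
        "FEDN_NUM_DATA_SPLITS": str(num_splits),
        # Assumed layout: one sub-folder per partition.
        "FEDN_DATA_PATH": f"./data/clients/{partition}/mnist.pt",
    }


# Print the settings for the first three clients.
for partition in (1, 2, 3):
    env = client_env(partition)
    exports = " ".join(f"{key}={value}" for key, value in env.items())
    print(f"client {partition}: {exports}")
```

Each client is then started in its own shell with its own settings, so that no two clients read the same partition.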
From 263d1fe23ecda560c74fbd6126e5b60dbf234e0a Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Wed, 28 Aug 2024 15:35:02 +0200 Subject: [PATCH 09/12] Rework the project page, add code example --- docs/conf.py | 4 +- docs/projects.rst | 370 +++++++++++++++++++++++++++++++++++++--------- 2 files changed, 306 insertions(+), 68 deletions(-) diff --git a/docs/conf.py b/docs/conf.py index 6edc9b490..f52a282b3 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -25,6 +25,7 @@ "sphinx.ext.viewcode", "sphinx_rtd_theme", "sphinx_code_tabs", + "sphinx_design", ] # The master toctree document. @@ -97,7 +98,8 @@ # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ - (master_doc, "fedn", "FEDn Documentation", author, "fedn", "One line description of project.", "Miscellaneous"), + (master_doc, "fedn", "FEDn Documentation", author, "fedn", + "One line description of project.", "Miscellaneous"), ] # Bibliographic Dublin Core info. diff --git a/docs/projects.rst b/docs/projects.rst index 3dc2f8dfd..07d38ef9c 100644 --- a/docs/projects.rst +++ b/docs/projects.rst @@ -4,29 +4,24 @@ Develop your own project ================================================ -This guide explains how a FEDn project is structured, and details how to develop and run your own -projects. - -**In this article** - -`Overview`_ -`Build a FEDn project`_ -`Deploy a FEDn project`_ +This guide explains how a FEDn project is structured, and details how to develop your own +project. We assume knowledge of how to run a federated learning project with FEDn, corresponding to +the tutorial: :ref:`quickstart-label`. Overview ========== A FEDn project is a convention for packaging/wrapping machine learning code to be used for federated learning with FEDn. 
At the core, a project is a directory of files (often a Git repository), containing your machine learning code, FEDn entry points, and a specification -of the runtime environment (python environment or a Docker image). The FEDn API and command-line tools provides functionality +of the runtime environment for the client (python environment or a Docker image). The FEDn API and command-line tools provide functionality to help a user automate deployment and management of a project that follows the conventions. +The structure of a FEDn project +================================ -Build a FEDn project -===================== - -We recommend that projects have roughly the following folder and file structure: +We recommend that projects have the following folder and file structure, here illustrated by the 'mnist-pytorch' example from +the Getting Started Guide: | project | ├ client @@ -40,22 +35,28 @@ We recommend that projects have roughly the following folder and file structure: | │ └ mnist.npz | ├ README.md | ├ scripts / notebooks -| └ Dockerfile / docker-compose.yaml +| └ Dockerfile | -The ``client`` folder is commonly referred to as the *compute package* and it contains files with logic specific to a single client. The file ``fedn.yaml`` is the FEDn Project File and contains information about the commands that fedn will run when a client recieves a new train or validation request. These fedn commmands are referred to as ``entry points`` and there are up to four entry points in the project folder example given above that need to be specified, namely: -**build** - used for any kind of setup that needs to be done before the client starts up, such as initializing the global seed model. In the `quickstart tutorial`_, it runs model.py when called -**startup** - used immediately after the client starts up and the environment has been initalized. 
In the `quickstart tutorial`_, it runs data.py when invoked -**train** - runs train.py when called -**validate** - runs validate.py when called +The content of the ``client`` folder is what we commonly refer to as the *compute package*. It contains modules and files specifying the logic of a single client. +The file ``fedn.yaml`` is the FEDn Project File. It is used by FEDn to get information about the specific commands to run when building the initial 'seed model', +and when a client receives a training request or a validation request from the server. +These commands are referred to as the ``entry points``. -The compute package content (client folder) -------------------------------------------- +The compute package (client folder) +==================================== **The Project File (fedn.yaml)** -FEDn uses a project file named 'fedn.yaml' to specify which entry points to execute when the client recieves a training or validation request, and -what environment to execute those entry points in. +FEDn uses a project file 'fedn.yaml' to specify which entry points to execute when the client receives a training or validation request, +and (optionally) what runtime environment to execute those entry points in. There are up to four entry points: + +- **build** - used for any kind of setup that needs to be done before the client starts up, such as initializing the global seed model. +- **startup** - invoked immediately after the client starts up and the environment has been initialized. +- **train** - invoked by the FEDn client to perform a model update. +- **validate** - invoked by the FEDn client to perform a model validation. + +To illustrate this, we look at the ``fedn.yaml`` from the 'mnist-pytorch' project used in the Getting Started Guide: .. code-block:: yaml @@ -71,106 +72,341 @@ what environment to execute those entry points in. 
validate: command: python validate.py +In this example, all entrypoints are python scripts (model.py, data.py, train.py and validate.py). +They are executed by FEDn using the system default python interpreter 'python', in an environment with dependencies specified by "python_env.yaml". +Next, we look at the environment specification and each entry point in more detail. **Environment (python_env.yaml)** -It is assumed that all entry points are executable within the client runtime environment. As a user, you have two main options -to specify the environment: +FEDn assumes that all entry points (build, startup, train, validate) are executable within the client's runtime environment. You have two main options +to handle the environment: - 1. Provide a ``python_env`` in the ``fedn.yaml`` file. In this case, FEDn will create an isolated virtual environment and install the project dependencies into it before starting up the client. FEDn currently supports Virtualenv environments, with packages on PyPI. + 1. Let FEDn create and initialize the environment automatically by specifying ``python_env``. FEDn will then create an isolated virtual environment and install the dependencies specified in ``python_env.yaml`` into it before starting up the client. FEDn currently supports Virtualenv environments, with packages on PyPI. 2. Manage the environment manually. Here you have several options, such as managing your own virtualenv, running in a Docker container, etc. Remove the ``python_env`` tag from ``fedn.yaml`` to handle the environment manually. **build (optional):** -This entry point is used for any kind of setup that **needs to be done before the client starts up**, such as initializing the global seed model, and is intended to be called **once**. +This entry point is used for any kind of setup that **needs to be done to initialize FEDn prior to federated training**. 
+This is the only entrypoint not used by the client during global training rounds - rather it is used by the project initiator. +Most often it is used to build the seed model. + +In the 'mnist-pytorch' example, ``build`` executes 'model.py' (shown below). This script contains the definition of the model (a small fully connected network) along with a main method +that instantiates a model object (with random weights), extracts its parameters into a list of numpy arrays and writes them to a file "seed.npz". + + +.. code-block:: python + + import collections + + import torch + + from fedn.utils.helpers.helpers import get_helper + + HELPER_MODULE = "numpyhelper" + helper = get_helper(HELPER_MODULE) + + + def compile_model(): + """Compile the pytorch model. + + :return: The compiled model. + :rtype: torch.nn.Module + """ + + class Net(torch.nn.Module): + def __init__(self): + super(Net, self).__init__() + self.fc1 = torch.nn.Linear(784, 64) + self.fc2 = torch.nn.Linear(64, 32) + self.fc3 = torch.nn.Linear(32, 10) + + def forward(self, x): + x = torch.nn.functional.relu(self.fc1(x.reshape(x.size(0), 784))) + x = torch.nn.functional.dropout(x, p=0.5, training=self.training) + x = torch.nn.functional.relu(self.fc2(x)) + x = torch.nn.functional.log_softmax(self.fc3(x), dim=1) + return x + + return Net() + + + def save_parameters(model, out_path): + """Save model parameters to file. + + :param model: The model to serialize. + :type model: torch.nn.Module + :param out_path: The path to save to. + :type out_path: str + """ + parameters_np = [val.cpu().numpy() for _, val in model.state_dict().items()] + helper.save(parameters_np, out_path) + + + def load_parameters(model_path): + """Load model parameters from file and populate model. + + :param model_path: The path to load from. + :type model_path: str + :return: The loaded model. 
+ :rtype: torch.nn.Module + """ + model = compile_model() + parameters_np = helper.load(model_path) + + params_dict = zip(model.state_dict().keys(), parameters_np) + state_dict = collections.OrderedDict({key: torch.tensor(x) for key, x in params_dict}) + model.load_state_dict(state_dict, strict=True) + return model + + + def init_seed(out_path="seed.npz"): + """Initialize seed model and save it to file. + + :param out_path: The path to save the seed model to. + :type out_path: str + """ + # Init and save + model = compile_model() + save_parameters(model, out_path) + + + if __name__ == "__main__": + init_seed("../seed.npz") **startup (optional):** -Like the 'build' entry point, 'startup' is also called **once**, immediately after the client starts up and the environment has been initalized. -It can be used to do runtime configurations of the local execution environment. For example, in the `quickstart tutorial`_, -the startup entry point invokes a script that downloads the MNIST dataset and creates a partition to be used by that client. -This is a convenience useful for automation of experiments and not all clients will specify such a script. +The entry point 'startup' is used by the client. It is called **once**, immediately after the client starts up and the environment has been initalized. +It can be used to do runtime configurations of the client's local execution environment. +In the 'mnist-pytorch' project, the startup entry point invokes a script that downloads the MNIST dataset from an external server and creates a partition to be used by that client. +Not all projects will specify a startup script. In the case of the mnist-pytorch example it is simply used as a convenience to automate experiments by splitting +a publicly available dataset. However, in real-world settings with truly private data, the client will have the data locally. **train (mandatory):** -This entry point is invoked every time the client recieves a new model update (training) request. 
The training entry point must be a single-input single-output (SISO) program. It will be invoked by FEDn as such: +This entry point is invoked when the client receives a new model update (training) request from the server. The training entry point must be a single-input single-output (SISO) program. +Upon receipt of a training request, the FEDn client will download the latest version of the global model, write it to a (temporary) file and execute the command specified in the entrypoint: .. code-block:: python python train.py model_in model_out -where 'model_in' is the **file** containing the current global model to be updated, and 'model_out' is a **path** to write the new model update to. -Download and upload of these files are handled automatically by the FEDn client, the user only specifies how to read and parse the data contained in them (see `examples`_). +where 'model_in' is the **file** containing the current global model (parameters) to be updated, and 'model_out' is a **path** to write the new model update to (FEDn substitutes a temporary file location for this path). +When a training update is complete, FEDn reads the updated parameters from 'model_out' and streams them back to the server for aggregation. + +.. note:: + The training entrypoint must also write metadata to a json-file. The entry ``num_examples`` is mandatory - it is used by the aggregators to compute a weighted average. The user can in addition choose to log other variables such as hyperparameters. These will then be stored in the backend database and accessible via the API and UI. + +In our 'mnist-pytorch' example, upon startup a client downloads the MNIST image dataset and creates partitions (one for each client). This partition is in turn divided +into a train/test split. The file 'train.py' (shown below) reads the train split, runs an epoch of training and writes the updated parameters to file. -The format of the input and output files (model updates) are using numpy ndarrays. 
A helper instance :py:mod:`fedn.utils.helpers.plugins.numpyhelper` is used to handle the serialization and deserialization of the model updates. +To learn more about how model serialization and model marshalling works in FEDn, see :ref:`helper-label` and :ref:`agg-label`. + +.. code-block:: python + + import math + import os + import sys + + import torch + from model import load_parameters, save_parameters + + from data import load_data + from fedn.utils.helpers.helpers import save_metadata + + dir_path = os.path.dirname(os.path.realpath(__file__)) + sys.path.append(os.path.abspath(dir_path)) + + + def train(in_model_path, out_model_path, data_path=None, batch_size=32, epochs=1, lr=0.01): + """Complete a model update. + + Load model paramters from in_model_path (managed by the FEDn client), + perform a model update, and write updated paramters + to out_model_path (picked up by the FEDn client). + + :param in_model_path: The path to the input model. + :type in_model_path: str + :param out_model_path: The path to save the output model to. + :type out_model_path: str + :param data_path: The path to the data file. + :type data_path: str + :param batch_size: The batch size to use. + :type batch_size: int + :param epochs: The number of epochs to train. + :type epochs: int + :param lr: The learning rate to use. 
+ :type lr: float + """ + # Load data + x_train, y_train = load_data(data_path) + + # Load parmeters and initialize model + model = load_parameters(in_model_path) + + # Train + optimizer = torch.optim.SGD(model.parameters(), lr=lr) + n_batches = int(math.ceil(len(x_train) / batch_size)) + criterion = torch.nn.NLLLoss() + for e in range(epochs): # epoch loop + for b in range(n_batches): # batch loop + # Retrieve current batch + batch_x = x_train[b * batch_size : (b + 1) * batch_size] + batch_y = y_train[b * batch_size : (b + 1) * batch_size] + # Train on batch + optimizer.zero_grad() + outputs = model(batch_x) + loss = criterion(outputs, batch_y) + loss.backward() + optimizer.step() + # Log + if b % 100 == 0: + print(f"Epoch {e}/{epochs-1} | Batch: {b}/{n_batches-1} | Loss: {loss.item()}") + + # Metadata needed for aggregation server side + metadata = { + # num_examples are mandatory + "num_examples": len(x_train), + "batch_size": batch_size, + "epochs": epochs, + "lr": lr, + } + + # Save JSON metadata file (mandatory) + save_metadata(metadata, out_model_path) + + # Save model update (mandatory) + save_parameters(model, out_model_path) + + + if __name__ == "__main__": + train(sys.argv[1], sys.argv[2]) **validate (optional):** -The validation entry point is invoked every time the client recieves a validation request. It can be used to specify how a client should validate the current global -model on local test/validation data. It should read a model update from file, validate it (in any way suitable to the user), and write a **json file** containing validation data: +When training a global model with FEDn, the data scientist can choose to ask clients to perform local model validation of each new global model version +by specifying an entry point called 'validate'. + +Similar to the training entrypoint, the validation entry point must be a SISO program. 
It should read a model update from file, validate it (in any way suitable to the user), and write a **json file** containing validation data: .. code-block:: python python validate.py model_in validations.json -The validate entry point is optional. +The content of the file 'validations.json' is captured by FEDn, passed on to the server and then stored in the database backend. The validate entry point is optional. +In our 'mnist-pytorch' example, upon startup a client downloads the MNIST image dataset and creates partitions (one for each client). This partition is in turn divided +into a train/test split. The file 'validate.py' (shown below) reads both the train and test splits and computes accuracy scores and the loss. -Deploy a FEDn project -=================== +It is a requirement that the output of validate.py is valid json. Furthermore, the FEDn Studio UI will be able to capture and visualize all **scalar metrics** +specified in this file. The entire content of the json file will be retrievable programmatically using the FEDn APIClient, and can be downloaded from the Studio UI. -We recommend you to test your entry points locally before deploying your FEDn project. You can test *train* and *validate* by (example for the mnist-keras -project): +.. code-block:: python + + import os + import sys + + import torch + from model import load_parameters + + from data import load_data + from fedn.utils.helpers.helpers import save_metrics + + dir_path = os.path.dirname(os.path.realpath(__file__)) + sys.path.append(os.path.abspath(dir_path)) + + + def validate(in_model_path, out_json_path, data_path=None): + """Validate model. + + :param in_model_path: The path to the input model. + :type in_model_path: str + :param out_json_path: The path to save the output JSON to. + :type out_json_path: str + :param data_path: The path to the data file. 
+ :type data_path: str + """ + # Load data + x_train, y_train = load_data(data_path) + x_test, y_test = load_data(data_path, is_train=False) + + # Load model + model = load_parameters(in_model_path) + model.eval() + + # Evaluate + criterion = torch.nn.NLLLoss() + with torch.no_grad(): + train_out = model(x_train) + training_loss = criterion(train_out, y_train) + training_accuracy = torch.sum(torch.argmax(train_out, dim=1) == y_train) / len(train_out) + test_out = model(x_test) + test_loss = criterion(test_out, y_test) + test_accuracy = torch.sum(torch.argmax(test_out, dim=1) == y_test) / len(test_out) + + # JSON schema + report = { + "training_loss": training_loss.item(), + "training_accuracy": training_accuracy.item(), + "test_loss": test_loss.item(), + "test_accuracy": test_accuracy.item(), + } + + # Save JSON + save_metrics(report, out_json_path) -.. code-block:: bash + if __name__ == "__main__": + validate(sys.argv[1], sys.argv[2]) + +Testing the entrypoints +======================= + +We recommend you to test your training and validation entry points locally before creating the compute package and uploading it to Studio. +You can test *train* and *validate* by (example for the mnist-keras project): + +.. code-block:: bash + python train.py ../seed.npz ../model_update.npz --data_path ../data/mnist.npz python validate.py ../model_update.npz ../validation.json --data_path ../data/mnist.npz Note that we here assume execution in the correct Python environment. -To deploy a project to FEDn (Studio or pseudo-local) we simply compress the compute package as a .tgz file. using fedn command line tool or manually: + +Packaging for training on FEDn +=============================== + +To run a project on FEDn we compress the entire client folder as a .tgz file. There is a utility command in the FEDn CLI to do this: .. code-block:: bash fedn package create --path client +To learn how to initialize FEDn with the package seed model, see :ref:`quickstart-label`. 
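Before uploading, it can be useful to confirm that the archive actually contains the Project File. The sketch below inspects a gzip-compressed tarball with the Python standard library. Both function names are hypothetical helpers, not part of the FEDn API, and no assumption is made about the folder layout inside the archive beyond the presence of a file named ``fedn.yaml``.

```python
import tarfile


def package_members(package_path):
    """Return the names of all entries in a gzip-compressed tar archive."""
    with tarfile.open(package_path, "r:gz") as archive:
        return archive.getnames()


def contains_project_file(package_path, project_file="fedn.yaml"):
    """Return True if some entry in the archive is named `project_file`."""
    return any(
        name.split("/")[-1] == project_file
        for name in package_members(package_path)
    )
```

For a package written to ``package.tgz``, calling ``contains_project_file("package.tgz")`` should return ``True``; ``package_members("package.tgz")`` lists everything that will be shipped to the clients.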
+ +How is FEDn using the project? +=============================== -The created file package.tgz can then be uploaded to the FEDn network using the :py:meth:`fedn.network.api.client.APIClient.set_package` API. FEDn then manages the distribution of the compute package to each client. -Upon receipt of the package, a client will unpack it and stage it locally. +With an understanding of the FEDn project and the compute package (entry points), we can take a closer look at how FEDn +is using the project during federated training. The figure below shows the logical view of how a training request +is handled. + +A training round is initiated by the controller. It asks a Combiner for a model update. The Combiner in turn asks clients to compute a model update, by publishing a training request +to its request stream. The FEDn Client, :py:mod:`fedn.network.client`, subscribes to the stream and picks up the request. It then calls upon the Dispatcher, :py:mod:`fedn.utils.Dispatcher`. +The dispatcher reads the Project File, 'fedn.yaml', looks up the entry point definition and executes that command. Upon successful execution, the FEDn Client reads the +model update and metadata from file, and streams the content back to the combiner for aggregation. .. image:: img/ComputePackageOverview.png :alt: Compute package overview :width: 100% :align: center -The above figure provides a logical view of how FEDn uses the compute package. When the :py:mod:`fedn.network.client` -recieves a model update or validation request, it calls upon a Dispatcher that looks up entry point definitions -in the compute package from the FEDn Project File to determine which code files to execute. - -Before starting a training or validation session, the global seed model needs to be initialized which in our example is done by invoking the build entry point. - -To invoke the build entry point using the CLI: - -.. 
code-block:: bash - fedn run build --path client - - -More on local data access --------------------------- - -There are many possible ways to interact with the local dataset. In principle, the only requirement is that the train and validate end points are able to correctly -read and use the data. In practice, it is then necessary to make some assumption on the local environemnt when writing entrypoint.py. This is best explained -by looking at the code above. Here we assume that the dataset is present in a file called "mnist.npz" in a folder "data" one level up in the file hierarchy relative to -the execution of entrypoint.py. Then, independent of the preferred way to run the client (native, Docker, K8s etc) this structure needs to be maintained for this particular -compute package. Note however, that there are many ways to accomplish this on a local operational level. Where to go from here? ------------------------ +====================== With an understanding of how FEDn Projects are structured and created, you can explore our library of example projects. They demonstrate different use case scenarios of FEDn and its integration with popular machine learning frameworks like PyTorch and TensorFlow. From 7ce942c681fe5b5ab52265f7e87536b5b2c746a4 Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Wed, 28 Aug 2024 15:53:50 +0200 Subject: [PATCH 10/12] Fixed test compute package section --- docs/projects.rst | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/docs/projects.rst b/docs/projects.rst index 07d38ef9c..4ff75549f 100644 --- a/docs/projects.rst +++ b/docs/projects.rst @@ -366,15 +366,24 @@ Testing the entrypoints ======================= We recommend you to test your training and validation entry points locally before creating the compute package and uploading it to Studio. 
-You can test *train* and *validate* by (example for the mnist-keras project): +To run the 'build' entrypoint and create the seed model (default filename 'seed.npz'): -.. code-block:: bash - - python train.py ../seed.npz ../model_update.npz --data_path ../data/mnist.npz - python validate.py ../model_update.npz ../validation.json --data_path ../data/mnist.npz +.. code-block:: bash + + fedn run build --path client + +Run the 'startup' entrypoint to download the dataset: -Note that we here assume execution in the correct Python environment. +.. code-block:: bash + + fedn run startup --path client + +Then, standing inside the 'client' folder, you can test *train* and *validate* by: +.. code-block:: bash + + python train.py ../seed.npz ../model_update.npz --data_path data/clients/1/mnist.pt + python validate.py ../model_update.npz ../validation.json --data_path data/clients/1/mnist.pt Packaging for training on FEDn =============================== From 110c18887cb42b238d6c07e07c69ce6caf82040b Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Wed, 28 Aug 2024 15:54:16 +0200 Subject: [PATCH 11/12] Removed unused file --- docs/studio.rst | 90 ------------------------------------------------- 1 file changed, 90 deletions(-) delete mode 100644 docs/studio.rst diff --git a/docs/studio.rst b/docs/studio.rst deleted file mode 100644 index e52a9fa6a..000000000 --- a/docs/studio.rst +++ /dev/null @@ -1,90 +0,0 @@ -.. _studio: - -Studio -=============== - -FEDn Studio is a web-based tool for managing and monitoring federated learning experiments. It provides the FEDn network as a managed service, as well as a user-friendly interface for monitoring the progress of training and visualizing the results. FEDn Studio is available as a SaaS at `fedn.scaleoutsystems.com `_ . It is free for development, testing and research (one project per user, backend compute resources sized for dev/test). 
- -Scaleout can also support users to scale up experiments and demonstrators on Studio, by granting custom resource quotas. Additionally, charts are available for self-managed deployment on-premise or in your cloud VPC (all major cloud providers). Contact the Scaleout team for more information. - -Getting started ---------------- - -Before you can start using Studio, you will need an account. Head over to `fedn.scaleoutsystems.com/signup `_ and sign up. - -**Create a project** - -Start by creating a new project. A project can be used to organize your work. It can be shared with other users, allowing you to collaborate on experiments. - -1. Click on the "New Project" button in the top right corner of the screen. -2. Continue by clicking the "Create button". The FEDn template contains all the services necessary to start a federation. -3. Enter the project name (mandatory). The project description is optional. -4. Click the "Create" button to create the project. - -**Project overview** - -Once you have created a project, you can find it via the sidebar link Projects. Here you will find the list of all your projects. When inside a project you can see the following tabs in the sidebar: - -1. **Dashboard**: The dashboard provides an overview of the project. The controller and combiner(s) are listed under "Network". This is also where you can find the current FEDn version and have the option to upgrade to a newer version if available. -2. **Clients**: management of client configurations and a list of current clients. Observe that this feature does not deploy clients, instead it configures a client config that contains a unique token which is required to connect to the reducer and the combiner. -3. **Combiners**: a list of combiners. Observe number of active clients for each combiner. -4. **Sessions**: a list of sessions with related models. Configure and start a new session. Upload compute package and seed model, set number of rounds, timeout limit etc. -5. 
**Models**: a list of models generated across sessions, and dashboards for visualizing training progress. -6. **Events**: a log of events from the combiner and the clients of the federated network. -7. **Settings**: project settings, including the option to give access to other users and to delete the project. - -.. image:: img/studio_project_overview.png - - -Package and seed model ----------------------- - -Please see :ref:`package-creation` for instructions on how to create a package and a seed model. - -.. _studio-upload-files: - -Upload files ------------- - -In the Studio UI, navigate to the project you created and click on the "Sessions" tab. Click on the "New Session" button. Under the "Compute package" tab, select a name and upload the generated package file. Under the "Seed model" tab, upload the generated seed file: - -.. image:: img/upload_package.png - -Connect a client ----------------- - -Navigate to "Clients" in the sidebar. - -Click on the "Connect client" button. Follow the instructions on the site to connect the client. -Alternatively, you can connect the client using a docker container by running the following command: - -.. code-block:: bash - - docker run \ - -v $PWD/client.yaml:/app/client.yaml \ - ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client --secure=True --force-ssl -in client.yaml - -If the client is successfully connected, you should see the client listed in the "Clients log" list. - -Start a training session ------------------------- - -In Studio click on the "Sessions" link, then the "New session" button in the upper right corner. Click the "Start session" tab and enter your desirable settings (or use default) and hit the "Start run" button. In the terminal where your are running your client you should now see some activity. When the round is completed, you can see the results in the FEDn Studio UI on the "Models" page. 
- -Watch the training progress ---------------------------- - -Once a training session is started, you can monitor the progress of the training by navigating to "Sessions" and click on the "Open" button of the active session. The session page will list the models as soon as they are generated. To get more information about a particular model, navigate to the model page by clicking the model name. From the model page you can download the model weights and get validation metrics. - -To get an overview of how the models have evolved over time, navigate to the "Models" tab in the sidebar. Here you can see a list of all models generated across sessions along with a graph showing some metrics of how the models are performing. - -.. image:: img/studio_model_overview.png - -.. _studio-api: - -Accessing the API ------------------ - -The FEDn Studio API is available at /api/v1/. The controller host can be found in the project dashboard. Further, to access the API you need an admin API token. -Navigate to the "Settings" tab in the project and click on the "Generate token" button. Copy the token and use it to access the API. Please see :py:mod:`fedn.network.api` for how to pass the token to the APIClient. - From 8eec5404e166e320185ca4547bd184d3d11804c9 Mon Sep 17 00:00:00 2001 From: Andreas Hellander Date: Fri, 30 Aug 2024 13:11:33 +0200 Subject: [PATCH 12/12] Fixed API page and some headings --- docs/apiclient.rst | 54 ++++++++++++++++++++++++++++--------------- docs/developer.rst | 9 ++++++-- docs/helpers.rst | 2 +- docs/introduction.rst | 9 ++++---- 4 files changed, 48 insertions(+), 26 deletions(-) diff --git a/docs/apiclient.rst b/docs/apiclient.rst index 6097db570..034709a06 100644 --- a/docs/apiclient.rst +++ b/docs/apiclient.rst @@ -39,7 +39,7 @@ Then passing a token as an argument is not required. 
>>> from fedn import APIClient >>> client = APIClient(host="", secure=True, verify=True) -**Set the active package and seed model** +**Set the active compute package and seed model** To set the active compute package in the FEDn Studio Project: @@ -52,13 +52,8 @@ To set the active compute package in the FEDn Studio Project: **Start a training session** -Once the active package and seed model are set, you can connect clients to the network and start training models. The following code snippet starts a traing session: - -.. code-block:: python - - session = client.start_session(id="session_name") - -**Run training sessions using the Python APIClient** +Once the active package and seed model are set, you can connect clients to the network and start training models. To run a training session +using the default aggregator (FedAvg): .. code:: python @@ -72,17 +67,38 @@ Once the active package and seed model are set, you can connect clients to the n >>> model_id = models[-1]['model'] >>> validations = client.get_validations(model_id=model_id) -**Accessing global models** +To run a session with the FedAdam aggregator and custom hyperparameters: + +.. code-block:: python + + >>> session_id = "experiment_fedadam" -You can also access global model updates via the APIClient: + >>> session_config = { + "helper": "numpyhelper", + "id": session_id, + "aggregator": "fedopt", + "aggregator_kwargs": { + "serveropt": "adam", + "learning_rate": 1e-2, + "beta1": 0.9, + "beta2": 0.99, + "tau": 1e-4 + }, + "model_id": seed_model['model'], + "rounds": 10 + } + + >>> result_fedadam = client.start_session(**session_config) + +**Download a global model** + +To download a global model and write it to file: .. code:: python >>> ... >>> client.download_model("", path="model.npz") -Please see :py:mod:`fedn.network.api` for more details on how to use the APIClient. 
- **List data** Other than starting training sessions, the APIClient can be used to get data from the network, such as sessions, models etc. All entities are represented and they all work in a similar fashion. @@ -101,17 +117,17 @@ Entities represented in the APIClient are: * statuses * validations -The following code snippet shows how to list all sessions: - +To list all sessions: .. code-block:: python - sessions = client.get_sessions() + >>> sessions = client.get_sessions() -And the following code snippet shows how to get a specific session: +To get a specific session: .. code-block:: python - session = client.get_session(id="session_name") - + >>> session = client.get_session(id="session_name") -For more information on how to use the APIClient, see the :py:mod:`fedn.network.api.client`, and the example `Notebooks `_. +For more information on how to use the APIClient, see the :py:mod:`fedn.network.api.client`, and the collection of example Jupyter Notebooks: + +- `API Example `_ . \ No newline at end of file diff --git a/docs/developer.rst b/docs/developer.rst index bb30c6f00..a631657f7 100644 --- a/docs/developer.rst +++ b/docs/developer.rst @@ -1,7 +1,12 @@ .. _developer-label: -Local development sandbox -========================= +================ +Developer guide +================ + + +Pseudo-distributed sandbox +=========================== .. note:: These instructions are for users wanting to set up a bare-minimum local deployment of FEDn (without FEDn Studio). diff --git a/docs/helpers.rst b/docs/helpers.rst index 8f89a8317..277ec1c40 100644 --- a/docs/helpers.rst +++ b/docs/helpers.rst @@ -1,6 +1,6 @@ .. 
_helper-label: -Model Serialization/Deserialization +Model marshalling =================================== In federated learning, model updates need to be serialized and deserialized in order to be diff --git a/docs/introduction.rst b/docs/introduction.rst index a8857ca7d..e4d77283e 100644 --- a/docs/introduction.rst +++ b/docs/introduction.rst @@ -13,10 +13,11 @@ The server aggregates and combines the gradients from multiple participants to u This iterative process allows the global model to improve without the need to share the raw data. -FEDn: An enterprise-ready federated learning framework -------------------------------------------------------- +The FEDn framework +-------------------- -Our goal is to provide a federated learning framework that is both secure, scalable and easy-to-use. We believe that that minimal code change should be needed to progress from early proof-of-concepts to production. This is reflected in our core design: +The goal with FEDn is to provide a federated learning framework that is secure, scalable and easy-to-use. Our ambition is that FEDn supports the full journey from early +testing/exploration, through pilot projects, to real-world deployments and integration. We believe that minimal code change should be needed to progress from early proof-of-concepts to production. This is reflected in our core design: - **Minimal server-side complexity for the end-user**. Running a proper distributed FL deployment is hard. With FEDn Studio we seek to handle all server-side complexity and provide a UI, REST API and a Python interface to help users manage FL experiments and track metrics in real time. @@ -66,4 +67,4 @@ Support Community support in available in our `Discord server `__. -Options are available for `Enterprise support `__. \ No newline at end of file +For professionals and enterprises, we offer `Dedicated support `__. \ No newline at end of file