From 30360de886700cbbc66404aba4b3779288ce5670 Mon Sep 17 00:00:00 2001 From: Andreas Hellander <andreas@scaleoutsystems.com> Date: Wed, 17 Jul 2024 11:06:40 +0200 Subject: [PATCH 1/8] Docs/SK-000 | Update main readme (#652) * Update README.rst Update main readme to clarify use of Studio a bit more. * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst --- README.rst | 30 ++++++++++++++---------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/README.rst b/README.rst index a13d7463f..7c32fc7bd 100644 --- a/README.rst +++ b/README.rst @@ -9,50 +9,48 @@ .. |pic3| image:: https://readthedocs.org/projects/fedn/badge/?version=latest&style=flat :target: https://fedn.readthedocs.io -FEDn --------- +FEDn: An enterprise-ready federated learning framework +------------------------------------------------------- -FEDn empowers its users to create federated learning applications that seamlessly transition from local proofs-of-concept to secure distributed deployments. +Our goal is to provide a federated learning framework that is both secure, scalable and easy to use. We believe that that minimal code change should be needed to progress from early proof-of-concepts to production. This is reflected in our core design principles: -Leverage a flexible pseudo-local sandbox to rapidly transition your existing ML project to a federated setting. Test and scale in real-world scenarios using FEDn Studio - a fully managed, secure deployment of all server-side components (SaaS). +- **Data-scientist friendly**. A ML-framework agnostic design lets data scientists implement use-cases using their framework of choice. 
A UI and a Python API enables users to manage complex FL experiments and track metrics in real time. -We develop the FEDn framework following these core design principles: +- **Secure by design.** FL clients do not need to open any ingress ports. Industry-standard communication protocols (gRPC) and token-based authentication and RBAC (JWT) provides flexible integration in a range of production environments. -- **Seamless transition from proof-of-concepts to real-world FL**. FEDn has been designed to make the journey from R&D to real-world deployments as smooth as possibe. Develop your federated learning use case in a pseudo-local environment, then deploy it to FEDn Studio (cloud or on-premise) for real-world scenarios. No code change is required to go from development and testing to production. +- **Cloud native.** By following cloud native design principles, we ensure a wide range of deployment options including private cloud and on-premise infrastructure. Reference deployment here: https://fedn.scaleoutsystems.com. -- **Designed for scalability and resilience.** FEDn enables model aggregation through multiple aggregation servers sharing the workload. A hierarchical architecture makes the framework well suited borh for cross-silo and cross-device use-cases. FEDn seamlessly recover from failures in all critical components, and manages intermittent client-connections, ensuring robust deployment in production environments. +- **Scalability and resilience.** Multiple aggregation servers (combiners) can share the workload. FEDn seamlessly recover from failures in all critical components and manages intermittent client-connections. -- **Secure by design.** FL clients do not need to open any ingress ports, facilitating distributed deployments across a wide variety of settings. 
Additionally, FEDn utilizes secure, industry-standard communication protocols and supports token-based authentication and RBAC for FL clients (JWT), providing flexible integration in production environments. - -- **Developer and data scientist friendly.** Extensive event logging and distributed tracing enables developers to monitor experiments in real-time, simplifying troubleshooting and auditing. Machine learning metrics can be accessed via both a Python API and visualized in an intuitive UI that helps the data scientists analyze and communicate ML-model training progress. +- **Developer friendly.** Extensive event logging and distributed tracing enables developers to monitor the sytem in real-time, simplifying troubleshooting and auditing. +We provide a fully managed deployment free of charge for for testing, academic, and personal use. Sign up for a `FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__ and take the `Quickstart tutorial <https://fedn.readthedocs.io/en/stable/quickstart.html>`__. Features ========= -Core FL framework (this repository): +Federated learning: - Tiered federated learning architecture enabling massive scalability and resilience. - Support for any ML framework (examples for PyTorch, Tensforflow/Keras and Scikit-learn) - Extendable via a plug-in architecture (aggregators, load balancers, object storage backends, databases etc.) - Built-in federated algorithms (FedAvg, FedAdam, FedYogi, FedAdaGrad, etc.) -- CLI and Python API. +- UI, CLI and Python API. - Implement clients in any language (Python, C++, Kotlin etc.) - No open ports needed client-side. -- Flexible deployment of server-side components using Docker / docker compose. -FEDn Studio - From development to FL in production: +From development to FL in production: - Secure deployment of server-side / control-plane on Kubernetes. 
-- UI with dashboards for orchestrating experiments and visualizing results +- UI with dashboards for orchestrating FL experiments and for visualizing results - Team features - collaborate with other users in shared project workspaces. - Features for the trusted-third party: Manage access to the FL network, FL clients and training progress. - REST API for handling experiments/jobs. - View and export logging and tracing information. - Public cloud, dedicated cloud and on-premise deployment options. -Available clients: +Available client APIs: - Python client (this repository) - C++ client (`FEDn C++ client <https://github.com/scaleoutsystems/fedn-cpp-client>`__) From a746e4e671faf0ae90443453e0d2badc7a5c324a Mon Sep 17 00:00:00 2001 From: Andreas Hellander <andreas@scaleoutsystems.com> Date: Tue, 23 Jul 2024 14:08:00 +0200 Subject: [PATCH 2/8] Docs/SK-924 | Update local development guide (#657) * Reorganize menu, remove local dev instructions from readme in git repo * Rename distributed.rst to developer.rst * Improved quickstart * Updated project page * Improved readibility a bit * Remove Studio from the index (keep the file), most info is now in quickstart * Make the APIClient page Studio-centric * Less detailed instructions for setting up the Studio project * Add instructions to clone the repo. * Clarify that we only use the example from the local clone of fedn * Fix path * Remove Docker client instructions from quickstart * Fixed typos and some broken refs * Updated example readme in the github repo * Fix subheadings in developer guide * Remove local sandbox instructions from all examples. 
--------- Co-authored-by: Andreas Hellander <andreas.hellander@gmail.com> --- docs/developer.rst | 25 +++----- docs/projects.rst | 15 +++-- examples/FedSimSiam/README.rst | 2 +- examples/flower-client/README.rst | 44 ++------------ examples/monai-2D-mednist/README.rst | 89 +--------------------------- 5 files changed, 24 insertions(+), 151 deletions(-) diff --git a/docs/developer.rst b/docs/developer.rst index 3e05357b4..8a9e4b87d 100644 --- a/docs/developer.rst +++ b/docs/developer.rst @@ -1,14 +1,14 @@ .. _developer-label: -Local development -================= +Local development and deployment +================================ .. note:: These instructions are for users wanting to set up a local development deployment of FEDn (i.e. without FEDn Studio). This requires practical knowledge of Docker and docker-compose. Running the FEDn development sandbox (docker-compose) -===================================================== +------------------------------------------------------ During development on FEDn, and when working on own aggregators/helpers, it is useful to have a local development setup of the core FEDn services (controller, combiner, database, object store). @@ -45,8 +45,7 @@ To connect a native FEDn client, you need to make sure that the combiner service One way to achieve this is to edit your '/etc/hosts' and add a line '127.0.0.1 combiner'. Access message logs and validation data from MongoDB -==================================================== - +------------------------------------------------------ You can access and download event logs and validation data via the API, and you can also as a developer obtain the MongoDB backend data using pymongo or via the MongoExpress interface: @@ -55,7 +54,7 @@ the MongoDB backend data using pymongo or via the MongoExpress interface: Username and password are found in 'docker-compose.yaml'. 
Access global models -==================== +------------------------------------------------------ You can obtain global model updates from the 'fedn-models' bucket in Minio: @@ -64,13 +63,13 @@ You can obtain global model updates from the 'fedn-models' bucket in Minio: Username and password are found in 'docker-compose.yaml'. Reset the FEDn deployment -========================= +------------------------------------------------------ To purge all data from a deployment incuding all session and round data, access the MongoExpress UI interface and delete the entire ``fedn-network`` collection. Then restart all services. Clean up -======== +------------------------------------------------------ You can clean up by running .. code-block:: @@ -79,7 +78,7 @@ You can clean up by running Connecting clients using Docker: -================================ +------------------------------------------------------ For convenience, we distribute a Docker image hosted on ghrc.io with FEDn preinstalled. For example, to start a client for the MNIST PyTorch example using Docker and FEDN 0.10.0, run this from the example folder: @@ -95,7 +94,7 @@ and FEDN 0.10.0, run this from the example folder: Self-managed distributed deployment -=================================== +------------------------------------------------------ You can use different hosts for the various FEDn services. These instructions shows how to set up FEDn on a **local network** using a single workstation or laptop as the host for the servier-side components, and other hosts or devices as clients. @@ -160,9 +159,3 @@ Alternatively updating the `/etc/hosts` file, appending the following lines for <host local ip> api-server <host local ip> combiner - - -Start a training session ------------------------- - -After connecting with your clients, you are ready to start training sessions from the host machine. 
\ No newline at end of file diff --git a/docs/projects.rst b/docs/projects.rst index 2b86faa62..2cf31f23f 100644 --- a/docs/projects.rst +++ b/docs/projects.rst @@ -4,7 +4,7 @@ Develop your own project ================================================ This guide explains how a FEDn project is structured, and details how to develop your own -projects for your own use-cases. +projects. A FEDn project is a convention for packaging/wrapping machine learning code to be used for federated learning with FEDn. At the core, a project is a directory of files (often a Git repository), containing your machine learning code, FEDn entry points, and a specification @@ -71,11 +71,12 @@ to specify the environment: 1. Provide a ``python_env`` in the ``fedn.yaml`` file. In this case, FEDn will create an isolated virtual environment and install the project dependencies into it before starting up the client. FEDn currently supports Virtualenv environments, with packages on PyPI. 2. Manage the environment manually. Here you have several options, such as managing your own virtualenv, running in a Docker container, etc. Remove the ``python_env`` tag from ``fedn.yaml`` to handle the environment manually. -**Entry Points** +Entry Points +------------- There are up to four Entry Points to be specified. -**Build Entrypoint (build, optional):** +**build (optional):** This entrypoint is intended to be called **once** for building artifacts such as initial seed models. However, it not limited to artifacts, and can be used for any kind of setup that needs to be done before the client starts up. @@ -85,16 +86,14 @@ To invoke the build entrypoint using the CLI: fedn build -- - -**Startup Entrypoint (startup, optional):** - +**startup (optional):** This entrypoint is called **once**, immediately after the client starts up and the environment has been initalized. It can be used to do runtime configurations of the local execution environment. 
For example, in the quickstart tutorial example, the startup entrypoint invokes a script that downloads the MNIST dataset and creates a partition to be used by that client. This is a convenience useful for automation of experiments and not all clients will specify such a script. -**Training Entrypoint (train, mandatory):** +**train (mandatory):** This entrypoint is invoked every time the client recieves a new model update request. The training entry point must be a single-input single-output (SISO) program. It will be invoked by FEDn as such: @@ -105,7 +104,7 @@ This entrypoint is invoked every time the client recieves a new model update req where 'model_in' is the file containing the current global model to be updated, and 'model_out' is a path to write the new model update to. Download and upload of these files are handled automatically by the FEDn client, the user only specifies how to read and parse the data contained in them (see examples) . -**Validation Entrypoint (validate, optional):** +**validate (optional):** The validation entry point works in a similar was as the trainig entrypoint. It can be used to specify how a client should validate the current global model on local test/validation data. It should read a model update from file, validate it (in any way suitable to the user), and write a **json file** containing validation data: diff --git a/examples/FedSimSiam/README.rst b/examples/FedSimSiam/README.rst index e9afd02e5..5831fd3ea 100644 --- a/examples/FedSimSiam/README.rst +++ b/examples/FedSimSiam/README.rst @@ -68,4 +68,4 @@ In the figure below we can see that the kNN accuracy increases over the training indicating that the training of FedSimSiam is proceeding as intended. .. 
image:: figs/fedsimsiam_monitoring.png - :width: 50% \ No newline at end of file + :width: 50% diff --git a/examples/flower-client/README.rst b/examples/flower-client/README.rst index fff8e20b3..4207ee019 100644 --- a/examples/flower-client/README.rst +++ b/examples/flower-client/README.rst @@ -47,10 +47,10 @@ a FEDn network. Here you have two main options: using FEDn Studio (recommended for new users), or a self-managed pseudo-distributed deployment on your own machine. -If you are using FEDn Studio (recommended): +Using FEDn Studio: ------------------------------------------- -Follow instructions here to register for Studio and start a project: https://fedn.readthedocs.io/en/stable/studio.html. +Follow instructions here to register for Studio and start a project: https://fedn.readthedocs.io/en/stable/quickstart.html. In your Studio project: @@ -73,47 +73,13 @@ Or, if you prefer to use Docker (this might take a long time): -v $PWD/client.yaml:/app/client.yaml \ -e CLIENT_NUMBER=0 \ -e FEDN_PACKAGE_EXTRACT_DIR=package \ - ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --secure=True --force-ssl - - -If you are running FEDn in local development mode: --------------------------------------------------- - -Deploy a FEDn network on local host (see `https://fedn.readthedocs.io/en/stable/quickstart.html#local-development-deployment-using-docker-compose`). - -Use the FEDn API Client to initalize FEDn with the compute package and seed model: - -.. code-block:: - - python init_fedn.py - -Create a file 'client.yaml' with the following content: - -.. code-block:: - - network_id: fedn-network - discover_host: api-server - discover_port: 8092 - name: myclient - -Then start the client (using Docker) - -.. 
code-block:: - - docker run \ - -v $PWD/client.yaml:/app/client.yaml \ - --network=fedn_default \ - -e CLIENT_NUMBER=0 \ - -e FEDN_PACKAGE_EXTRACT_DIR=package \ - ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml - + ghcr.io/scaleoutsystems/fedn/fedn:0.11.1 run client -in client.yaml --secure=True --force-ssl Scaling to multiple clients ------------------------------------------------------------------ -To scale the experiment with additional clients on the same host, execute the run command -again from another terminal. If running from another host, add another 'client.yaml', install -fedn, and execute the run command. In both cases inject a client number as an environment +To scale the experiment with additional clients on the same host, generate another 'client.yaml' and execute the run command +again from another terminal. Inject a client number as an environment varible which is used for distributing data (see 'flwr_task.py'). For Unix Operating Systems: diff --git a/examples/monai-2D-mednist/README.rst b/examples/monai-2D-mednist/README.rst index cb46047ed..f61820682 100644 --- a/examples/monai-2D-mednist/README.rst +++ b/examples/monai-2D-mednist/README.rst @@ -19,11 +19,6 @@ Using FEDn Studio: - `Python 3.8, 3.9, 3.10 or 3.11 <https://www.python.org/downloads>`__ - `A FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__ -If using pseudo-distributed mode with docker-compose: - -- `Docker <https://docs.docker.com/get-docker>`__ -- `Docker Compose <https://docs.docker.com/compose/install>`__ - Creating the compute package and seed model ------------------------------------------- @@ -74,11 +69,10 @@ below command we divide the dataset into 10 parts. python prepare_data.py 10 - Using FEDn Studio ----------------- -Follow the guide here to set up your FEDn Studio project and learn how to connect clients (using token authentication): `Studio guide <https://fedn.readthedocs.io/en/stable/studio.html>`__. 
+Follow the guide here to set up your FEDn Studio project and learn how to connect clients (using token authentication): `Studio guide <https://fedn.readthedocs.io/en/stable/quickstart.html>`__. On the step "Upload Files", upload 'package.tgz' and 'seed.npz' created above. Connecting clients: @@ -110,83 +104,4 @@ For convenience, there is a Docker image hosted on ghrc.io with fedn preinstalle -e FEDN_DATA_PATH=/app/data/ \ -e FEDN_CLIENT_SETTINGS_PATH=/app/client_settings.yaml \ -e FEDN_DATA_SPLIT_INDEX=0 \ - ghcr.io/scaleoutsystems/fedn/fedn:0.9.0 run client -in client.yaml --force-ssl --secure=True - - -**NOTE: The following instructions are only for SDK-based client communication and for local development environments using Docker.** - - -Local development mode using Docker/docker compose --------------------------------------------------- - -Follow the steps above to install FEDn, generate 'package.tgz' and 'seed.tgz'. - -Start a pseudo-distributed FEDn network using docker-compose: - -.. code-block:: - - docker compose \ - -f ../../docker-compose.yaml \ - -f docker-compose.override.yaml \ - up - -This starts up local services for MongoDB, Minio, the API Server, one Combiner and two clients. -You can verify the deployment using these urls: - -- API Server: http://localhost:8092/get_controller_status -- Minio: http://localhost:9000 -- Mongo Express: http://localhost:8081 - -Upload the package and seed model to FEDn controller using the APIClient. In Python: - -.. code-block:: - - from fedn import APIClient - client = APIClient(host="localhost", port=8092) - client.set_active_package("package.tgz", helper="numpyhelper") - client.set_active_model("seed.npz") - -You can now start a training session with 5 rounds (default): - -.. 
code-block:: - - client.start_session() - -Automate experimentation with several clients -============================================= - -If you want to scale the number of clients, you can do so by modifying ``docker-compose.override.yaml``. For example, -in order to run with 3 clients, change the environment variable ``FEDN_NUM_DATA_SPLITS`` to 3, and add one more client -by copying ``client1``. - - -Access message logs and validation data from MongoDB -==================================================== - -You can access and download event logs and validation data via the API, and you can also as a developer obtain -the MongoDB backend data using pymongo or via the MongoExpress interface: - -- http://localhost:8081/db/fedn-network/ - -The credentials are as set in docker-compose.yaml in the root of the repository. - -Access global models -==================== - -You can obtain global model updates from the 'fedn-models' bucket in Minio: - -- http://localhost:9000 - -Reset the FEDn deployment -========================= - -To purge all data from a deployment incuding all session and round data, access the MongoExpress UI interface and -delete the entire ``fedn-network`` collection. Then restart all services. - -Clean up -======== -You can clean up by running - -.. 
code-block:: - - docker-compose -f ../../docker-compose.yaml -f docker-compose.override.yaml down -v + ghcr.io/scaleoutsystems/fedn/fedn:0.11.1 run client -in client.yaml --force-ssl --secure=True \ No newline at end of file From efcea628dfdc84470bfd6237e01546d7bf04f748 Mon Sep 17 00:00:00 2001 From: Andreas Hellander <andreas@scaleoutsystems.com> Date: Wed, 24 Jul 2024 23:19:05 +0200 Subject: [PATCH 3/8] Docs/SK-000 | Update main readme (#665) * Update readme * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst * Update README.rst --- README.rst | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/README.rst b/README.rst index 7c32fc7bd..c0fbc2836 100644 --- a/README.rst +++ b/README.rst @@ -12,19 +12,21 @@ FEDn: An enterprise-ready federated learning framework ------------------------------------------------------- -Our goal is to provide a federated learning framework that is both secure, scalable and easy to use. We believe that that minimal code change should be needed to progress from early proof-of-concepts to production. This is reflected in our core design principles: +Our goal is to provide a federated learning framework that is both secure, scalable and easy-to-use. We believe that that minimal code change should be needed to progress from early proof-of-concepts to production. This is reflected in our core design: -- **Data-scientist friendly**. A ML-framework agnostic design lets data scientists implement use-cases using their framework of choice. A UI and a Python API enables users to manage complex FL experiments and track metrics in real time. +- **Minimal server-side complexity for the end-user**. Running a proper distributed FL deployment is hard. 
With FEDn Studio we seek to handle all server-side complexity and provide a UI, a REST API, and a Python interface to help users manage FL experiments and track metrics in real time. -- **Secure by design.** FL clients do not need to open any ingress ports. Industry-standard communication protocols (gRPC) and token-based authentication and RBAC (JWT) provides flexible integration in a range of production environments. +- **Secure by design.** FL clients do not need to open any ingress ports. Industry-standard communication protocols (gRPC), token-based authentication, and RBAC (JSON Web Tokens) provide flexible integration in a range of production environments. -- **Cloud native.** By following cloud native design principles, we ensure a wide range of deployment options including private cloud and on-premise infrastructure. Reference deployment here: https://fedn.scaleoutsystems.com. +- **ML-framework agnostic**. A black-box client-side architecture lets data scientists interface with their framework of choice. + +- **Cloud native.** By following cloud native design principles, we ensure a wide range of deployment options including private cloud and on-premise infrastructure. - **Scalability and resilience.** Multiple aggregation servers (combiners) can share the workload. FEDn seamlessly recover from failures in all critical components and manages intermittent client-connections. -- **Developer friendly.** Extensive event logging and distributed tracing enables developers to monitor the sytem in real-time, simplifying troubleshooting and auditing. +- **Developer and DevOps friendly.** Extensive event logging and distributed tracing enable developers to monitor the system in real time, simplifying troubleshooting and auditing. Extensions and integrations are facilitated by a flexible plug-in architecture. -We provide a fully managed deployment free of charge for for testing, academic, and personal use.
Sign up for a `FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__ and take the `Quickstart tutorial <https://fedn.readthedocs.io/en/stable/quickstart.html>`__. +We provide a fully managed deployment for testing, academic, and personal use. Sign up for a `FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__ and take the `Quickstart tutorial <https://fedn.readthedocs.io/en/stable/quickstart.html>`__ to get started with FEDn. Features ========= @@ -62,11 +64,11 @@ Getting started Get started with FEDn in two steps: -1. Sign up for a `Free FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__ +1. Register for a `FEDn Studio account <https://fedn.scaleoutsystems.com/signup>`__ 2. Take the `Quickstart tutorial <https://fedn.readthedocs.io/en/stable/quickstart.html>`__ -FEDn Studio (SaaS) is free for academic use and personal development / small-scale testing and exploration. For users and teams requiring -additional project resources, dedicated support or other hosting options, `explore our plans <https://www.scaleoutsystems.com/start#pricing>`__. +Use of our multi-tenant, managed deployment of FEDn Studio (SaaS) is free forever for academic research and personal development/testing purposes. +For users and teams requiring additional resources, more storage and CPU, dedicated support, and other hosting options (private cloud, on-premise), `explore our plans <https://www.scaleoutsystems.com/start#pricing>`__.
Documentation ============= From 264c6fec30dfdd6c58115de2e1b6767cdf16e34b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonas=20Frankem=C3=B6lle?= <48800769+FrankJonasmoelle@users.noreply.github.com> Date: Wed, 14 Aug 2024 11:56:25 +0200 Subject: [PATCH 4/8] Feature/SK-829 | Link to examples in readme (#671) --- README.rst | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/README.rst b/README.rst index c0fbc2836..1ed7a394f 100644 --- a/README.rst +++ b/README.rst @@ -77,6 +77,18 @@ More details about the architecture, deployment, and how to develop your own app - `Documentation <https://fedn.readthedocs.io>`__ +FEDn Project Examples +===================== + +Our example projects demonstrate different use case scenarios of FEDn +and its integration with popular machine learning frameworks like PyTorch and TensorFlow. + +- `FEDn + PyTorch <https://github.com/scaleoutsystems/fedn/tree/master/examples/mnist-pytorch>`__ +- `FEDn + TensorFlow/Keras <https://github.com/scaleoutsystems/fedn/tree/master/examples/mnist-keras>`__ +- `FEDn + MONAI <https://github.com/scaleoutsystems/fedn/tree/master/examples/monai-2D-mednist>`__ +- `FEDn + Hugging Face <https://github.com/scaleoutsystems/fedn/tree/master/examples/huggingface>`__ +- `FEDn + Flower <https://github.com/scaleoutsystems/fedn/tree/master/examples/flower-client>`__ +- `FEDn + Self-supervised learning <https://github.com/scaleoutsystems/fedn/tree/master/examples/FedSimSiam>`__ FEDn Studio Deployment options ============================== From ceee280aaa1d36e90fdbc773f36405ef7ea80b76 Mon Sep 17 00:00:00 2001 From: Katja Hellgren <96579188+KatHellg@users.noreply.github.com> Date: Wed, 14 Aug 2024 14:11:33 +0200 Subject: [PATCH 5/8] Docs/SK-949 | Develop your own project update (#670) * Docs/SK-949 | Develop your own project review and update * draft 2 * Docs/SK-949 | Develop your own project update --------- Co-authored-by: KatHellg <katja@scaleoutsystems.com> --- docs/projects.rst | 227
++++++++++++++++------------------------------ 1 file changed, 80 insertions(+), 147 deletions(-) diff --git a/docs/projects.rst b/docs/projects.rst index 2cf31f23f..8e8592532 100644 --- a/docs/projects.rst +++ b/docs/projects.rst @@ -1,18 +1,35 @@ .. _projects-label: -Develop your own project +================================================ +Develop your own FEDn project ================================================ -This guide explains how a FEDn project is structured, and details how to develop your own +This guide explains how a FEDn project is structured, and details how to develop and run your own projects. +**In this article** +`Prerequisites`_ +`Overview`_ +`Build a FEDn project`_ +`Deploy a FEDn project`_ + +Prerequisites +============== + + + +Overview +========== + A FEDn project is a convention for packaging/wrapping machine learning code to be used for federated learning with FEDn. At the core, a project is a directory of files (often a Git repository), containing your machine learning code, FEDn entry points, and a specification of the runtime environment (python environment or a Docker image). The FEDn API and command-line tools provides functionality to help a user automate deployment and management of a project that follows the conventions. - -Overview ------------------------------- + + + +Build a FEDn project +===================== We recommend that projects have roughly the following folder and file structure: @@ -20,8 +37,8 @@ We recommend that projects have roughly the following folder and file structure: | ├ client | │ ├ fedn.yaml | │ ├ python_env.yaml -| │ ├ data.py | │ ├ model.py +| │ ├ data.py | │ ├ train.py | │ └ validate.py | ├ data @@ -31,30 +48,27 @@ We recommend that projects have roughly the following folder and file structure: | └ Dockerfile / docker-compose.yaml | -The ``client`` folder is commonly referred to as the *compute package*. The file ``fedn.yaml`` is the FEDn Project File. 
It contains information about the ``entry points``. The entry points are used by the client to compute model updates (local training) and local validations (optional) . -To run a project in FEDn, the client folder is compressed as a .tgz bundle and pushed to the FEDn controller. FEDn then manages the distribution of the compute package to each client. -Upon recipt of the package, a client will unpack it and stage it locally. +The ``client`` folder is commonly referred to as the *compute package*, and it contains files with logic specific to a single client. The file ``fedn.yaml`` is the FEDn Project File and contains the commands that FEDn runs on the client, for example when the client receives a new training or validation request. These commands are referred to as ``entry points``, and up to four entry points can be specified for the example project folder above, namely: +**build** - used for any kind of setup that needs to be done before the client starts up, such as initializing the global seed model. In the `quickstart tutorial <https://fedn.readthedocs.io/en/stable/quickstart.html>`_, it runs ``model.py``. +**startup** - called immediately after the client starts up and the environment has been initialized. In the `quickstart tutorial <https://fedn.readthedocs.io/en/stable/quickstart.html>`_, it runs ``data.py``. +**train** - runs ``train.py`` when called. +**validate** - runs ``validate.py`` when called. -.. image:: img/ComputePackageOverview.png - :alt: Compute package overview - :width: 100% - :align: center - -The above figure provides a logical view of how FEDn uses the compute package (client folder). When the :py:mod:`fedn.network.clients` -recieves a model update request, it calls upon a Dispatcher that looks up entry point definitions -in the compute package from the FEDn Project File.
+The compute package content (client folder) +------------------------------------------- -The Project File (fedn.yaml) ------------------------------- +**The Project File (fedn.yaml)** -FEDn uses on a project file named 'fedn.yaml' to specify which entrypoints to execute when the client recieves a training or validation request, and -what environment to execute those entrypoints in. +FEDn uses a project file named ``fedn.yaml`` to specify which entry points to execute when the client receives a training or validation request, and +what environment to execute those entry points in. .. code-block:: yaml python_env: python_env.yaml entry_points: + build: + command: python model.py startup: command: python data.py train: @@ -63,7 +77,7 @@ what environment to execute those entrypoints in. command: python validate.py -**Environment** +**Environment (python_env.yaml)** It is assumed that all entry points are executable within the client runtime environment. As a user, you have two main options to specify the environment: @@ -71,174 +85,93 @@ to specify the environment: 1. Provide a ``python_env`` in the ``fedn.yaml`` file. In this case, FEDn will create an isolated virtual environment and install the project dependencies into it before starting up the client. FEDn currently supports Virtualenv environments, with packages on PyPI. 2. Manage the environment manually. Here you have several options, such as managing your own virtualenv, running in a Docker container, etc. Remove the ``python_env`` tag from ``fedn.yaml`` to handle the environment manually. -Entry Points -------------- - -There are up to four Entry Points to be specified. **build (optional):** -This entrypoint is intended to be called **once** for building artifacts such as initial seed models. However, it not limited to artifacts, and can be used for any kind of setup that needs to be done before the client starts up. - -To invoke the build entrypoint using the CLI: - -..
code-block:: bash - fedn build -- +This entry point is used for any kind of setup that **needs to be done before the client starts up**, such as initializing the global seed model, and is intended to be called **once**. **startup (optional):** -This entrypoint is called **once**, immediately after the client starts up and the environment has been initalized. -It can be used to do runtime configurations of the local execution environment. For example, in the quickstart tutorial example, -the startup entrypoint invokes a script that downloads the MNIST dataset and creates a partition to be used by that client. +Like the 'build' entry point, 'startup' is also called **once**, immediately after the client starts up and the environment has been initialized. +It can be used to configure the local execution environment at runtime. For example, in the `quickstart tutorial <https://fedn.readthedocs.io/en/stable/quickstart.html>`_, +the startup entry point invokes a script that downloads the MNIST dataset and creates a partition to be used by that client. This is a convenience useful for automation of experiments and not all clients will specify such a script. + **train (mandatory):** -This entrypoint is invoked every time the client recieves a new model update request. The training entry point must be a single-input single-output (SISO) program. It will be invoked by FEDn as such: +This entry point is invoked every time the client receives a new model update (training) request. The training entry point must be a single-input single-output (SISO) program. FEDn invokes it as follows: .. code-block:: python python train.py model_in model_out -where 'model_in' is the file containing the current global model to be updated, and 'model_out' is a path to write the new model update to. -Download and upload of these files are handled automatically by the FEDn client, the user only specifies how to read and parse the data contained in them (see examples) .
+where 'model_in' is the **file** containing the current global model to be updated, and 'model_out' is a **path** to write the new model update to. +Downloading and uploading these files is handled automatically by the FEDn client; the user only specifies how to read and parse the data contained in them (see `examples <https://github.com/scaleoutsystems/fedn/tree/master/examples>`_). + +The input and output files (model updates) store the model parameters as numpy ndarrays. The helper module :py:mod:`fedn.utils.helpers.plugins.numpyhelper` handles the serialization and deserialization of the model updates. + **validate (optional):** -The validation entry point works in a similar was as the trainig entrypoint. It can be used to specify how a client should validate the current global +The validation entry point is invoked every time the client receives a validation request. It can be used to specify how a client should validate the current global model on local test/validation data. It should read a model update from file, validate it (in any way suitable to the user), and write a **json file** containing validation data: .. code-block:: python python validate.py model_in validations.json - The validate entrypoint is optional. +The validate entry point is optional. -**Example train entry point** -Below is an example training entry point taken from the PyTorch getting stated project. +Deploy a FEDn project +===================== -.. code-block:: python +We recommend testing your entry points locally before deploying your FEDn project.
You can test *train* and *validate* by (example for the mnist-keras +project): - import math - import os - import sys - - import torch - from data import load_data - from model import load_parameters, save_parameters - - from fedn.utils.helpers.helpers import save_metadata - - dir_path = os.path.dirname(os.path.realpath(__file__)) - sys.path.append(os.path.abspath(dir_path)) - - - def train(in_model_path, out_model_path, data_path=None, batch_size=32, epochs=1, lr=0.01): - """ Complete a model update. - - Load model paramters from in_model_path (managed by the FEDn client), - perform a model update, and write updated paramters - to out_model_path (picked up by the FEDn client). - - :param in_model_path: The path to the input model. - :type in_model_path: str - :param out_model_path: The path to save the output model to. - :type out_model_path: str - :param data_path: The path to the data file. - :type data_path: str - :param batch_size: The batch size to use. - :type batch_size: int - :param epochs: The number of epochs to train. - :type epochs: int - :param lr: The learning rate to use. 
- :type lr: float - """ - # Load data - x_train, y_train = load_data(data_path) - - # Load parmeters and initialize model - model = load_parameters(in_model_path) - - # Train - optimizer = torch.optim.SGD(model.parameters(), lr=lr) - n_batches = int(math.ceil(len(x_train) / batch_size)) - criterion = torch.nn.NLLLoss() - for e in range(epochs): # epoch loop - for b in range(n_batches): # batch loop - # Retrieve current batch - batch_x = x_train[b * batch_size:(b + 1) * batch_size] - batch_y = y_train[b * batch_size:(b + 1) * batch_size] - # Train on batch - optimizer.zero_grad() - outputs = model(batch_x) - loss = criterion(outputs, batch_y) - loss.backward() - optimizer.step() - # Log - if b % 100 == 0: - print( - f"Epoch {e}/{epochs-1} | Batch: {b}/{n_batches-1} | Loss: {loss.item()}") - - # Metadata needed for aggregation server side - metadata = { - # num_examples are mandatory - 'num_examples': len(x_train), - 'batch_size': batch_size, - 'epochs': epochs, - 'lr': lr - } - - # Save JSON metadata file (mandatory) - save_metadata(metadata, out_model_path) - - # Save model update (mandatory) - save_parameters(model, out_model_path) - - - if __name__ == "__main__": - train(sys.argv[1], sys.argv[2]) - - +.. code-block:: bash -The format of the input and output files (model updates) are using numpy ndarrays. A helper instance :py:mod:`fedn.utils.helpers.plugins.numpyhelper` is used to handle the serialization and deserialization of the model updates. -The first function (_compile_model) is used to define the model architecture and creates an initial model (which is then used by _init_seed). The second function (_load_data) is used to read the data (train and test) from disk. -The third function (_save_model) is used to save the model to disk using the numpy helper module :py:mod:`fedn.utils.helpers.plugins.numpyhelper`. The fourth function (_load_model) is used to load the model from disk, again -using the pytorch helper module. 
The fifth function (_init_seed) is used to initialize the seed model. The sixth function (_train) is used to train the model, observe the two first arguments which will be set by the FEDn client. -The seventh function (_validate) is used to validate the model, again observe the two first arguments which will be set by the FEDn client. + python train.py ../seed.npz ../model_update.npz --data_path ../data/mnist.npz + python validate.py ../model_update.npz ../validation.json --data_path ../data/mnist.npz +Note that we here assume execution in the correct Python environment. -Build a compute package --------------------------- -To deploy a project to FEDn (Studio or pseudo-local) we simply compress the *client* folder as .tgz file. using fedn command line tool or manually: +To deploy a project to FEDn (Studio or pseudo-local) we simply compress the compute package as a .tgz file. using fedn command line tool or manually: .. code-block:: bash fedn package create --path client -The created file package.tgz can then be uploaded to the FEDn network using the :py:meth:`fedn.network.api.client.APIClient.set_package`. - +The created file package.tgz can then be uploaded to the FEDn network using the :py:meth:`fedn.network.api.client.APIClient.set_package` API. FEDn then manages the distribution of the compute package to each client. +Upon receipt of the package, a client will unpack it and stage it locally. -More on local data access -------------------------- +.. image:: img/ComputePackageOverview.png + :alt: Compute package overview + :width: 100% + :align: center -There are many possible ways to interact with the local dataset. In principle, the only requirement is that the train and validate endpoints are able to correctly -read and use the data. In practice, it is then necessary to make some assumption on the local environemnt when writing entrypoint.py. This is best explained -by looking at the code above. 
Here we assume that the dataset is present in a file called "mnist.npz" in a folder "data" one level up in the file hierarchy relative to -the exection of entrypoint.py. Then, independent on the preferred way to run the client (native, Docker, K8s etc) this structure needs to be maintained for this particular -compute package. Note however, that there are many ways to accompish this on a local operational level. +The above figure provides a logical view of how FEDn uses the compute package. When the :py:mod:`fedn.network.client` +receives a model update or validation request, it calls upon a Dispatcher that looks up entry point definitions +in the compute package from the FEDn Project File to determine which code files to execute. -Testing the entry points locally --------------------------------- +Before starting a training or validation session, the global seed model needs to be initialized, which in this example is done by invoking the build entry point. -We recommend you to test your entrypoints locally before uploading the compute package to Studio. You can test *train* and *validate* by (example for the mnist-keras -project): +To invoke the build entry point using the CLI: .. code-block:: bash + fedn run build --path client + + +More on local data access +-------------------------- + +There are many possible ways to interact with the local dataset. In principle, the only requirement is that the train and validate entry points are able to correctly +read and use the data. In practice, it is then necessary to make some assumptions about the local environment when writing the entry point scripts. This is best explained +by looking at the commands above. Here we assume that the dataset is present in a file called "mnist.npz" in a folder "data" one level up in the file hierarchy relative to +the execution of the entry point scripts. Then, independent of the preferred way to run the client (native, Docker, K8s etc.), this structure needs to be maintained for this particular +compute package.
Note however, that there are many ways to accomplish this on a local operational level. - python train.py ../seed.npz ../model_update.npz --data_path ../data/mnist.npz - python validate.py ../model_update.npz ../validation.json --data_path ../data/mnist.npz -Note that we here assume execution in the correct Python environment. From b8b368b42b3b372633b37d2e8ac219e395c0e670 Mon Sep 17 00:00:00 2001 From: sowmyasris <sowmya@scaleoutsystems.com> Date: Fri, 16 Aug 2024 13:35:07 +0200 Subject: [PATCH 6/8] Bug/SK-931 | --preferred-combiner should not be boolean (#667) --- .github/workflows/branch-name-check.yaml | 2 +- examples/async-clients/run_clients.py | 1 - fedn/cli/client_cmd.py | 2 +- fedn/cli/run_cmd.py | 3 +-- fedn/network/combiner/roundhandler.py | 3 +-- pyproject.toml | 5 +++-- 6 files changed, 7 insertions(+), 9 deletions(-) diff --git a/.github/workflows/branch-name-check.yaml b/.github/workflows/branch-name-check.yaml index 41f431cc1..eeefbe400 100644 --- a/.github/workflows/branch-name-check.yaml +++ b/.github/workflows/branch-name-check.yaml @@ -7,7 +7,7 @@ on: - master env: - BRANCH_REGEX: '^((feature|github|dependabot|hotfix|bugfix|fix|bug|docs|refactor)\/.+)|(release\/v((([0-9]+)\.([0-9]+)\.([0-9]+)(?:-([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?)(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?))$' + BRANCH_REGEX: '(?i)^((feature|github|dependabot|hotfix|bugfix|fix|bug|docs|refactor)\/.+)|(release\/v((([0-9]+)\.([0-9]+)\.([0-9]+)(?:-([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?)(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?))$' jobs: branch-name-check: diff --git a/examples/async-clients/run_clients.py b/examples/async-clients/run_clients.py index f2ce72291..78dc1ed55 100644 --- a/examples/async-clients/run_clients.py +++ b/examples/async-clients/run_clients.py @@ -47,7 +47,6 @@ "secure": False, "preshared_cert": False, "verify": False, - "preferred_combiner": False, "validator": True, "trainer": True, "init": None, diff --git a/fedn/cli/client_cmd.py b/fedn/cli/client_cmd.py 
index 80b0b3353..0d3da5f7e 100644 --- a/fedn/cli/client_cmd.py +++ b/fedn/cli/client_cmd.py @@ -79,7 +79,7 @@ def list_clients(ctx, protocol: str, host: str, port: str, token: str = None, n_ @click.option("-s", "--secure", required=False, default=False) @click.option("-pc", "--preshared-cert", required=False, default=False) @click.option("-v", "--verify", is_flag=True, help="Verify SSL/TLS for REST service") -@click.option("-c", "--preferred-combiner", required=False, default=False) +@click.option("-c", "--preferred-combiner", type=str, required=False, default="combiner", help="Name of the preferred combiner") @click.option("-va", "--validator", required=False, default=True) @click.option("-tr", "--trainer", required=False, default=True) @click.option("-in", "--init", required=False, default=None, help="Set to a filename to (re)init client from file state.") diff --git a/fedn/cli/run_cmd.py b/fedn/cli/run_cmd.py index 0aa069046..bf6c5f36c 100644 --- a/fedn/cli/run_cmd.py +++ b/fedn/cli/run_cmd.py @@ -182,7 +182,7 @@ def build_cmd(ctx, path): @click.option("-s", "--secure", required=False, default=False) @click.option("-pc", "--preshared-cert", required=False, default=False) @click.option("-v", "--verify", is_flag=True, help="Verify SSL/TLS for REST service") -@click.option("-c", "--preferred-combiner", required=False, default=False) +@click.option("-c", "--preferred-combiner", type=str, required=False, default="combiner", help="URL to the combiner or name of the preferred combiner") @click.option("-va", "--validator", required=False, default=True) @click.option("-tr", "--trainer", required=False, default=True) @click.option("-in", "--init", required=False, default=None, help="Set to a filename to (re)init client from file state.") @@ -262,7 +262,6 @@ def client_cmd( apply_config(init, config) click.echo(f"\nClient configuration loaded from file: {init}") click.echo("Values set in file override defaults and command line arguments...\n") - try:
validate_client_config(config) except InvalidClientConfig as e: diff --git a/fedn/network/combiner/roundhandler.py b/fedn/network/combiner/roundhandler.py index 816957323..54cfd189c 100644 --- a/fedn/network/combiner/roundhandler.py +++ b/fedn/network/combiner/roundhandler.py @@ -311,8 +311,7 @@ def _assign_round_clients(self, n, type="trainers"): logger.error("(ERROR): {} is not a supported type of client".format(type)) # If the number of requested trainers exceeds the number of available, use all available. - if n > len(clients): - n = len(clients) + n = min(n, len(clients)) # If not, we pick a random subsample of all available clients. clients = random.sample(clients, n) diff --git a/pyproject.toml b/pyproject.toml index 3806fee56..8278029ee 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -63,7 +63,7 @@ include-package-data = true [tool.setuptools.packages.find] where = ["."] include = ["fedn*"] -exclude = ["tests", "tests.*"] +exclude = ["tests", "tests.*", "examples/notebooks/*.ipynb"] [tool.ruff] line-length = 160 @@ -105,7 +105,8 @@ exclude = [ "fedn_pb2.py", "fedn_pb2_grpc.py", ".ci", - "test*" + "test*", + "**/*.ipynb" ] lint.ignore = [ From 67662170eacaa454c0a6221b90cd00ef881f3af5 Mon Sep 17 00:00:00 2001 From: Fredrik Wrede <fredrik@scaleoutsystems.com> Date: Fri, 16 Aug 2024 12:36:45 +0000 Subject: [PATCH 7/8] bump --- docs/conf.py | 2 +- pyproject.toml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/conf.py b/docs/conf.py index c45e90846..54dd859d5 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -12,7 +12,7 @@ author = "Scaleout Systems AB" # The full version, including alpha/beta/rc tags -release = "0.11.1" +release = "0.12.0" # Add any Sphinx extension module names here, as strings extensions = [ diff --git a/pyproject.toml b/pyproject.toml index 8278029ee..2bcde6e1d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta" [project] name = "fedn" -version = "0.11.1" 
+version = "0.12.0" description = "Scaleout Federated Learning" authors = [{ name = "Scaleout Systems AB", email = "contact@scaleoutsystems.com" }] readme = "README.rst" From b6cf292eccb2e4715ed0faecbff9309d376095ae Mon Sep 17 00:00:00 2001 From: Salman Toor <salman.toor@gmail.com> Date: Fri, 16 Aug 2024 14:43:50 +0200 Subject: [PATCH 8/8] Docs/SK-955 | Adding information about setting up environment variable (#672) --- docs/apiclient.rst | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/docs/apiclient.rst b/docs/apiclient.rst index 2806ebe86..4bfb7fe79 100644 --- a/docs/apiclient.rst +++ b/docs/apiclient.rst @@ -24,6 +24,19 @@ To obtain an admin API token, navigate to the "Settings" tab in your Studio proj >>> from fedn import APIClient >>> client = APIClient(host="<controller-host>", token="<access-token>", secure=True, verify=True) +Alternatively, the access token can be read from the ``FEDN_AUTH_TOKEN`` environment variable. + +.. code-block:: bash + + $ export FEDN_AUTH_TOKEN=<access-token> + +In that case, the token does not need to be passed as an argument. + +.. code-block:: python + + >>> from fedn import APIClient + >>> client = APIClient(host="<controller-host>", secure=True, verify=True) + **Set active package and seed model** @@ -78,4 +91,4 @@ And the following code snippet shows how to get a specific session: session = client.get_session(id="session_name") -For more information on how to use the APIClient, see the :py:mod:`fedn.network.api.client`, and the example `Notebooks <https://github.com/scaleoutsystems/fedn/blob/master/examples/mnist-pytorch/API_Example.ipynb>`_. +For more information on how to use the APIClient, see the :py:mod:`fedn.network.api.client`, and the example `Notebooks <https://github.com/scaleoutsystems/fedn/tree/master/examples/notebooks>`_.
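The ``FEDN_AUTH_TOKEN`` fallback described in the APIClient documentation can be illustrated with a short, standard-library-only sketch. It is not FEDn's implementation: ``resolve_token`` is a hypothetical helper. The precedence it applies is also an assumption: an explicitly passed token wins over the environment variable, and an empty variable counts as unset. The real ``APIClient`` may resolve the token differently.

```python
import os
from typing import Optional


def resolve_token(token: Optional[str] = None, env_var: str = "FEDN_AUTH_TOKEN") -> Optional[str]:
    """Hypothetical helper (not part of FEDn): choose the API token to use.

    Assumed precedence: an explicitly passed, non-empty token is returned as-is;
    otherwise the value of ``env_var`` is used. An unset or empty variable yields None.
    """
    if token:
        return token
    # os.environ.get returns None when the variable is not set at all;
    # "or None" also maps an empty string to None.
    return os.environ.get(env_var) or None
```

In a script, the result could be passed on as ``APIClient(host="<controller-host>", token=resolve_token(), secure=True, verify=True)``, which matches the constructor call shown in the documentation. Resolving the token explicitly like this keeps the precedence visible in your own code, whatever the library does internally.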