We recommend using Kubernetes as the underlying infrastructure to create and manage FATE clusters in a production environment. KubeFATE supports deploying multiple FATE clusters in a single Kubernetes instance, using different namespaces for development, testing, and production. Because IT designs and standards differ between companies, the actual deployment should be customized, and KubeFATE keeps the FATE configuration flexible for that purpose.
If you want to start using KubeFATE quickly, jump to Use Scenarios.
The high-level architecture of a multi-party federated learning deployment (e.g. two parties) is shown as follows:
- KubeFATE: Orchestrates a FATE cluster of a party. It offers APIs for FATE-Cloud Manager and other management portals.
- Harbor (Optional): Manages versioned FATE deployments and images.
- Kubernetes: Container orchestration engine.
KubeFATE is responsible for:
- Day 1 initialization: Provisions a FATE cluster on Kubernetes
- Day 2 operations: Provides RESTful APIs to manage FATE clusters
The high-level architecture of KubeFATE is shown as follows:
The numbers in the diagram denote:
1. Accepting external API calls, with authentication and authorization
2. Rendering templates via Helm
3. Storing the jobs and configuration of a FATE deployment
4. Running KubeFATE as a service in Kubernetes
There are two parts of KubeFATE:
- The KubeFATE CLI. The KubeFATE CLI is an executable that helps to quickly initialize and manage a FATE cluster in an interactive mode. It does not rely on Kubernetes itself. Ultimately, the CLI calls the KubeFATE service to perform operations, using a KubeFATE user token.
- The KubeFATE Service. The KubeFATE service provides RESTful APIs for managing FATE clusters. It is deployed in Kubernetes and exposes its APIs via Ingress. For authentication and authorization, the KubeFATE service implements JWT and is neutral to other security solutions, which can be added at the Kubernetes Ingress.
KubeFATE is designed to handle different versions of FATE. Normally, the KubeFATE CLI and KubeFATE service can work with several FATE releases.
Suppose an organization has two roles:
- System Admin: responsible for infrastructure management as well as Kubernetes administration
- ML Infrastructure Operators: responsible for managing machine learning clusters such as FATE
Recommended version of dependent software:
Kubernetes: v1.23.5
Ingress-nginx: v1.1.3
The example YAML can be found in rbac-config.yaml. In this example, we create a kube-fate namespace for the KubeFATE service. Resource constraints can be applied to the kube-fate namespace; refer to Kubernetes Namespace and Configure Memory and CPU Quotas for Namespace.
Run the following command to create the namespace:
$ kubectl apply -f ./rbac-config.yaml
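The resource constraints mentioned above can be sketched with a ResourceQuota on the namespace. The helper name, the quota name, and every CPU and memory value below are placeholders chosen for illustration, not values recommended by KubeFATE; size them for your own cluster.

```shell
# Hypothetical helper: cap the total CPU and memory that pods in a namespace
# may request or be limited to. The quota name and all values are assumptions.
apply_namespace_quota() {
  namespace="${1:-kube-fate}"
  kubectl create quota kubefate-quota \
    --namespace "$namespace" \
    --hard=requests.cpu=2,requests.memory=4Gi,limits.cpu=4,limits.memory=8Gi
}

# Usage (requires access to the cluster):
#   apply_namespace_quota kube-fate
#   kubectl describe quota kubefate-quota --namespace kube-fate
```

Note that once a compute quota is active in a namespace, Kubernetes may reject pods that do not declare the corresponding requests and limits, so check the pod specs (or add a LimitRange with defaults) before applying one.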
Note that the default username and password of the KubeFATE service can be set in rbac-config.yaml, under Secret->kubefate-secret->stringData:
stringData:
  kubefateUsername: admin
  kubefatePassword: admin
Because the KubeFATE service exposes RESTful APIs for external access, the system admin needs to prepare a domain name for it. In our example, the domain name is example.com. The system admin should also create a namespace (e.g. fate-9999) for the FATE deployment.
$ kubectl apply -f ./kubefate.yaml
$ kubectl create namespace fate-9999
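Before handing over to the ML Infrastructure Operator, it can help to confirm that the service came up. The following is a minimal sketch using standard kubectl queries; the helper name is hypothetical, and it assumes the service was deployed into the kube-fate namespace as in the example above.

```shell
# Hypothetical helper: list the pods and the Ingress in the namespace that
# hosts the KubeFATE service (kube-fate by default, as assumed in this guide).
check_kubefate_service() {
  namespace="${1:-kube-fate}"
  kubectl get pods --namespace "$namespace" &&
    kubectl get ingress --namespace "$namespace"
}

# Usage (requires access to the cluster):
#   check_kubefate_service
```

The pods should report a Running status, and the Ingress should list the domain name prepared for the service (example.com in this example).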
For more about the configuration of the KubeFATE service, please refer to the KubeFATE Service Configuration Guide.
After the system admin has deployed the KubeFATE service and prepared the namespace for FATE, the ML Infrastructure Operator can start deploying FATE. The kubefate CLI requires a config.yaml file, which contains the username and password for KubeFATE access and the KubeFATE service URL:
log:
  level: info
user:
  username: admin
  password: admin
serviceurl: example.com
Name | Type | Description |
---|---|---|
log | scalars | The log level of command line. |
user | mappings | User name and password when logging into KubeFATE service. |
serviceurl | scalars | KubeFATE service's ingress domain name, defined in kubefate.yaml. |
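The file can also be generated from parameters, which is convenient when the same operator manages several KubeFATE services. This is a sketch only: the helper name and its argument order are hypothetical, and it assumes the CLI is later run from the directory that holds the generated file.

```shell
# Hypothetical helper: write a kubefate CLI config.yaml into a directory.
# Arguments: username, password, service URL, target directory (default: .).
write_kubefate_config() {
  username="$1"
  password="$2"
  serviceurl="$3"
  target_dir="${4:-.}"
  cat > "$target_dir/config.yaml" <<EOF
log:
  level: info
user:
  username: $username
  password: $password
serviceurl: $serviceurl
EOF
  # The file holds a password, so restrict it to the current user.
  chmod 600 "$target_dir/config.yaml"
}

# Usage:
#   write_kubefate_config admin admin example.com
```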
Create a cluster.yaml for the FATE cluster configuration. The details of the configuration can be found in the FATE Cluster Configuration Guide.
NOTE: For users in China, specifying a local image registry in cluster.yaml can accelerate image downloads. For example:
registry: "hub.c.163.com/federatedai"
Next, install the FATE cluster:
$ kubefate cluster install -f ./cluster.yaml
create job success, job id=d92d7a56-7002-46a4-9363-da9c7346e05a
NOTE: If you want to deploy FATE on Spark, you can use cluster-spark.yaml.
After the above command finishes, a job is created to install the FATE cluster. Run kubefate job describe to check the status of the job until "Status" changes to Success.
$ kubefate job describe d92d7a56-7002-46a4-9363-da9c7346e05a
UUID d92d7a56-7002-46a4-9363-da9c7346e05a
StartTime 2022-04-12 07:34:09
EndTime 2022-04-12 07:48:14
Duration 14m
Status Success
Creator admin
ClusterId 24bb75ff-f636-4c64-8c04-1b9073f89a2f
States - update job status to Running
- create Cluster in DB Success
- helm install Success
- checkout Cluster status [794]
- job run Success
SubJobs nodemanager-0 ModuleStatus: Available, SubJobStatus: Success, Duration: 13m, StartTime:
2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:26
nodemanager-1 ModuleStatus: Available, SubJobStatus: Success, Duration: 13m, StartTime:
2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:18
python ModuleStatus: Available, SubJobStatus: Success, Duration: 14m, StartTime:
2022-04-12 07:34:09, EndTime: 2022-04-12 07:48:14
rollsite ModuleStatus: Available, SubJobStatus: Success, Duration: 13m, StartTime:
2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:24
client ModuleStatus: Available, SubJobStatus: Success, Duration: 11m, StartTime:
2022-04-12 07:34:09, EndTime: 2022-04-12 07:45:22
clustermanager ModuleStatus: Available, SubJobStatus: Success, Duration: 13m, StartTime:
2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:11
mysql ModuleStatus: Available, SubJobStatus: Success, Duration: 13m, StartTime:
2022-04-12 07:34:09, EndTime: 2022-04-12 07:47:11
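Checking the status by hand can be replaced with a small polling loop. The sketch below parses the Status row of the kubefate job describe output shown above; the helper names, the polling interval, and the retry limit are assumptions, and the loop only recognizes Success, so any other outcome ends in a timeout rather than an early failure.

```shell
# Read `kubefate job describe` output on stdin and print the value of its
# Status row (the row whose first field is exactly "Status").
job_status() {
  awk '$1 == "Status" { print $2; exit }'
}

# Hypothetical helper: poll a job until its Status is Success.
# Arguments: job id, seconds between polls (default 30), max polls (default 60).
wait_for_job() {
  job_id="$1"
  interval="${2:-30}"
  max_polls="${3:-60}"
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    status="$(kubefate job describe "$job_id" | job_status)"
    echo "job $job_id status: ${status:-unknown}"
    if [ "$status" = "Success" ]; then
      return 0
    fi
    polls=$((polls + 1))
    sleep "$interval"
  done
  echo "job $job_id did not reach Success after $max_polls polls" >&2
  return 1
}

# Usage:
#   wait_for_job d92d7a56-7002-46a4-9363-da9c7346e05a
```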
After the cluster installation job has succeeded, use kubefate cluster describe to check the FATE access information:
$ kubefate cluster describe 24bb75ff-f636-4c64-8c04-1b9073f89a2f
UUID         24bb75ff-f636-4c64-8c04-1b9073f89a2f
Name         fate-9999
NameSpace    fate-9999
ChartName    fate
ChartVersion v1.8.0
Revision     1
Age          44h
Status       Running
Spec         backend: eggroll
             chartName: fate
             chartVersion: v1.8.0
             imagePullSecrets:
             - name: myregistrykey
             imageTag: 1.8.0-release
             ingress:
               client:
                 hosts:
                 - name: party9999.notebook.example.com
               fateboard:
                 hosts:
                 - name: party9999.fateboard.example.com
             ingressClassName: nginx
             istio:
               enabled: false
             modules:
             - rollsite
             - clustermanager
             - nodemanager
             - mysql
             - python
             - fateboard
             - client
             name: fate-9999
             namespace: fate-9999
             partyId: 9999
             persistence: false
             podSecurityPolicy:
               enabled: false
             pullPolicy: null
             python:
               grpcNodePort: 30092
               httpNodePort: 30097
               logLevel: INFO
               type: NodePort
             registry: ""
             rollsite:
               nodePort: 30091
               partyList:
               - partyId: 10000
                 partyIp: 192.168.10.1
                 partyPort: 30101
               type: NodePort
             servingIp: 192.168.9.2
             servingPort: 30095
Info         dashboard:
             - party9999.notebook.example.com
             - party9999.fateboard.example.com
             ip: 192.168.9.1
             port: 30091
             status:
               containers:
                 client: Running
                 clustermanager: Running
                 fateboard: Running
                 mysql: Running
                 nodemanager-0: Running
                 nodemanager-0-eggrollpair: Running
                 nodemanager-1: Running
                 nodemanager-1-eggrollpair: Running
                 python: Running
                 rollsite: Running
               deployments:
                 client: Available
                 clustermanager: Available
                 mysql: Available
                 nodemanager-0: Available
                 nodemanager-1: Available
                 python: Available
                 rollsite: Available
If the fateboard and client components are installed, you can use party9999.fateboard.example.com and party9999.notebook.example.com, obtained in the previous step, to access the FATEBoard and Notebook UIs. Once name resolution for these two domain names is configured, they can be opened in a browser:
http://party9999.fateboard.example.com
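Where no DNS is available, one way to configure the name resolution is through hosts-file entries. The helper below is a hypothetical sketch that only prints the entries; which address to use depends on how your ingress controller is exposed, so it is taken as an argument rather than assumed.

```shell
# Hypothetical helper: print hosts-file lines that map each host name to one
# address. Arguments: the address, followed by one or more host names.
hosts_entries() {
  address="$1"
  shift
  for host in "$@"; do
    printf '%s %s\n' "$address" "$host"
  done
}

# Usage (replace <ingress-address> with the address your ingress controller
# listens on, then append the printed lines to /etc/hosts):
#   hosts_entries <ingress-address> \
#     party9999.fateboard.example.com party9999.notebook.example.com
```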
Access to the FATEBoard UI requires a login user name and password, which can be found in cluster.yaml; see [Configuration](../docs/configurations/FATE_cluster_configuration.md#fateboard mappings).