Merge pull request #20 from almaslennikov/docs
Add documentation for the operator
Showing 11 changed files with 1,085 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,175 @@ | ||
# nic-configuration-operator | ||
Nvidia Networking NIC Configuration Operator For Kubernetes | ||
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0) | ||
[![Go Report Card](https://goreportcard.com/badge/github.com/Mellanox/nic-configuration-operator)](https://goreportcard.com/report/github.com/Mellanox/nic-configuration-operator) | ||
[![Coverage Status](https://coveralls.io/repos/github/Mellanox/nic-configuration-operator/badge.svg)](https://coveralls.io/github/Mellanox/nic-configuration-operator) | ||
[![Build, Test, Lint](https://github.com/Mellanox/nic-configuration-operator/actions/workflows/build-test-lint.yml/badge.svg?event=push)](https://github.com/Mellanox/nic-configuration-operator/actions/workflows/build-test-lint.yml) | ||
[![CodeQL](https://github.com/Mellanox/nic-configuration-operator/actions/workflows/codeql.yml/badge.svg)](https://github.com/Mellanox/nic-configuration-operator/actions/workflows/codeql.yml) | ||
[![Image push](https://github.com/Mellanox/nic-configuration-operator/actions/workflows/image-push-main.yml/badge.svg?event=push)](https://github.com/Mellanox/nic-configuration-operator/actions/workflows/image-push-main.yml) | ||
|
||
# NVIDIA Nic Configuration Operator | ||
|
||
The NVIDIA NIC Configuration Operator provides a Kubernetes API (Custom Resource Definitions) for configuring firmware (FW) on NVIDIA NICs | ||
in a coordinated manner. It deploys a configuration daemon on each of the desired nodes to configure the NVIDIA NICs there. | ||
The operator uses the [maintenance operator](https://github.com/Mellanox/maintenance-operator) to prepare a node for maintenance before the actual configuration. | ||
|
||
## Deployment | ||
|
||
### Prerequisites | ||
|
||
* Kubernetes cluster | ||
* [Maintenance operator](https://github.com/Mellanox/maintenance-operator) deployed | ||
|
||
### Helm | ||
|
||
#### Deploy latest from project sources | ||
|
||
```bash | ||
# Clone project | ||
git clone https://github.com/Mellanox/nic-configuration-operator.git ; cd nic-configuration-operator | ||
|
||
# Install Operator | ||
helm install -n nic-configuration-operator --create-namespace --set operator.image.tag=latest nic-configuration ./deployment/nic-configuration-operator-chart | ||
|
||
# View deployed resources | ||
kubectl -n nic-configuration-operator get all | ||
``` | ||
|
||
> [!NOTE] | ||
> Refer to the [helm values documentation](deployment/nic-configuration-operator-chart/README.md) for more information. | ||
#### Deploy the latest release from the OCI repo | ||
|
||
```bash | ||
helm install -n nic-configuration-operator --create-namespace nic-configuration-operator oci://ghcr.io/mellanox/nic-configuration-operator-chart | ||
``` | ||
|
||
## CRDs | ||
|
||
### NICConfigurationTemplate | ||
|
||
The NICConfigurationTemplate CRD is used to request FW configuration for a subset of devices. | ||
|
||
The NIC Configuration Operator selects NIC devices in the cluster that match the template's selectors and applies the configuration spec to them. | ||
|
||
If more than one template matches a single device, none of them is applied and the error is reported in the status of each matching template. | ||
|
||
For more information, refer to the [api-reference](docs/api-reference.md). | ||
|
||
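The "exactly one matching template" rule described above can be sketched in Go as follows. This is only an illustration under stated assumptions, not the operator's code: the `Device` and `Template` types, the `Matches` and `SelectTemplate` helpers, and the selector semantics (an empty optional list matches any device, node selectors are ignored, a device has a single PCI address) are all hypothetical simplifications.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a hypothetical, simplified stand-in for a NicDevice.
type Device struct {
	Name         string
	Type         string // e.g. "101b"
	PCIAddress   string
	SerialNumber string
}

// Template is a hypothetical, simplified stand-in for the nicSelector
// of a NICConfigurationTemplate.
type Template struct {
	Name          string
	NicType       string   // mandatory
	PCIAddresses  []string // optional; assumed: empty matches any address
	SerialNumbers []string // optional; assumed: empty matches any serial
}

func contains(list []string, v string) bool {
	for _, s := range list {
		if s == v {
			return true
		}
	}
	return false
}

// Matches reports whether the template's NIC selector covers the device.
func (t Template) Matches(d Device) bool {
	if t.NicType != d.Type {
		return false
	}
	if len(t.PCIAddresses) > 0 && !contains(t.PCIAddresses, d.PCIAddress) {
		return false
	}
	if len(t.SerialNumbers) > 0 && !contains(t.SerialNumbers, d.SerialNumber) {
		return false
	}
	return true
}

// SelectTemplate returns the single matching template, nil if nothing
// matches, or an error naming every matching template on a conflict.
func SelectTemplate(d Device, templates []Template) (*Template, error) {
	var matched []*Template
	for i := range templates {
		if templates[i].Matches(d) {
			matched = append(matched, &templates[i])
		}
	}
	switch len(matched) {
	case 0:
		return nil, nil
	case 1:
		return matched[0], nil
	}
	names := make([]string, len(matched))
	for i, t := range matched {
		names[i] = t.Name
	}
	sort.Strings(names)
	return nil, fmt.Errorf("device %s matches %d templates %v; none applied", d.Name, len(names), names)
}

func main() {
	dev := Device{Name: "node1-101b-mt2116x09299", Type: "101b", PCIAddress: "0000:03:00.0", SerialNumber: "MT2116X09299"}
	templates := []Template{
		{Name: "all-cx6", NicType: "101b"},
		{Name: "by-serial", NicType: "101b", SerialNumbers: []string{"MT2116X09299"}},
	}
	if _, err := SelectTemplate(dev, templates); err != nil {
		fmt.Println("conflict:", err)
	}
}
```

Returning the names of all conflicting templates mirrors the requirement that the error be visible in the status of each of them.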
#### Example NICConfigurationTemplate | ||
|
||
```yaml | ||
apiVersion: configuration.net.nvidia.com/v1alpha1 | ||
kind: NICConfigurationTemplate | ||
metadata: | ||
  name: connectx6-config | ||
  namespace: nic-configuration-operator | ||
spec: | ||
  nodeSelector: | ||
    feature.node.kubernetes.io/network-sriov.capable: "true" | ||
  nicSelector: | ||
    # nicType selector is mandatory; the rest are optional. Only a single type can be specified. | ||
    nicType: 101b | ||
    pciAddress: | ||
      - "0000:03:00.0" | ||
      - "0000:04:00.0" | ||
    serialNumbers: | ||
      - "MT2116X09299" | ||
  resetToDefault: false # if set, template is ignored, device configuration should reset | ||
  template: | ||
    numVfs: 2 | ||
    linkType: Ethernet | ||
    pciPerformanceOptimized: | ||
      enabled: true | ||
      maxAccOutRead: 44 | ||
      maxReadRequest: 5 | ||
    roceOptimized: | ||
      enabled: true | ||
      qos: | ||
        trust: dscp | ||
        pfc: "0,0,0,1,0,0,0,0" | ||
    gpuDirectOptimized: | ||
      enabled: true | ||
      env: Baremetal | ||
    rawNvConfig: | ||
      THIS_IS_A_SPECIAL_NVCONFIG_PARAM: "55" | ||
      SOME_ADVANCED_NVCONFIG_PARAM: "true" | ||
``` | ||
#### Configuration details | ||
* `numVfs`: if provided, configures SR-IOV VFs via nvconfig. | ||
  * E.g. if `numVfs=2`, then `SRIOV_EN=1` and `SRIOV_NUM_OF_VFS=2`. | ||
  * If `numVfs=0`, then `SRIOV_EN=0` and `SRIOV_NUM_OF_VFS=0`. | ||
* `linkType`: if provided, configures the link type for all ports of the NIC. | ||
  * E.g. if `linkType=Infiniband`, then `LINK_TYPE_P1=IB` is set, along with `LINK_TYPE_P2=IB` if a second PCI function is present. | ||
* `pciPerformanceOptimized`: performs PCI performance optimizations. If enabled, the following happens by default: | ||
  * Set the `MAX_ACC_OUT_READ` nvconfig parameter. | ||
    * Set the value of `MAX_ACC_OUT_READ` to `44` if the PCI link is gen4. | ||
    * Set the value of `MAX_ACC_OUT_READ` to `0` (use device defaults) if the PCI link is gen5 or newer. | ||
  * Set the PCI max read request size for each PF to `4096` (note: this is a runtime config and is not persistent). | ||
  * Users can override the values via `maxAccOutRead` and `maxReadRequest`. | ||
* `roceOptimized`: performs RoCE-related optimizations. If enabled, the following happens by default: | ||
  * Nvconfig is set for both ports (can be applied from PF0). | ||
    * The `_P2` values are applied only if a second port is present. | ||
    * `ROCE_CC_PRIO_MASK_P1=255`, `ROCE_CC_PRIO_MASK_P2=255` | ||
    * `CNP_DSCP_P1=4`, `CNP_DSCP_P2=4` | ||
    * `CNP_802P_PRIO_P1=6`, `CNP_802P_PRIO_P2=6` | ||
  * Configure PFC (Priority Flow Control) for priority 3 and set trust to DSCP on each PF. | ||
    * Non-persistent (needs to be applied after each boot). | ||
    * Users can override the values via the `trust` and `pfc` parameters. | ||
* `gpuDirectOptimized`: performs GPUDirect optimizations. Currently, only optimizations for the Baremetal environment are supported. If enabled, the following happens: | ||
  * Set nvconfig `ATS_ENABLED=0`. | ||
  * Can only be enabled when `pciPerformanceOptimized` is enabled. | ||
* `rawNvConfig`: a `map[string]string` which contains NVConfig parameters to apply for a NIC on all of its PFs. | ||
  * For per-port parameters (suffix `_P1`, `_P2`), parameters with the `_P2` suffix are ignored if the device is single-port. | ||
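The mapping from template fields to nvconfig parameters listed above can be sketched in Go as follows. This is a minimal illustration, not the daemon's implementation: `TemplateSpec` and `DesiredNvConfig` are hypothetical names, the `ETH` value for Ethernet is an assumption (the list above only states `IB` for Infiniband), and leaving `MAX_ACC_OUT_READ` untouched below gen4 is also an assumption because the README does not cover that case.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strconv"
)

// TemplateSpec is a hypothetical, trimmed-down view of the `template` section.
type TemplateSpec struct {
	NumVfs                  *int   // nil means "not provided"
	LinkType                string // "Ethernet", "Infiniband", or "" if not provided
	PCIPerformanceOptimized bool
	MaxAccOutRead           *int // optional user override
	GPUDirectOptimized      bool
}

// DesiredNvConfig derives nvconfig parameters from the template fields.
func DesiredNvConfig(t TemplateSpec, dualPort bool, pciGen int) (map[string]string, error) {
	if t.GPUDirectOptimized && !t.PCIPerformanceOptimized {
		return nil, errors.New("gpuDirectOptimized requires pciPerformanceOptimized to be enabled")
	}
	cfg := map[string]string{}
	if t.NumVfs != nil {
		cfg["SRIOV_EN"] = "0"
		if *t.NumVfs > 0 {
			cfg["SRIOV_EN"] = "1"
		}
		cfg["SRIOV_NUM_OF_VFS"] = strconv.Itoa(*t.NumVfs)
	}
	if t.LinkType != "" {
		value := "ETH" // assumption: nvconfig value used for Ethernet
		if t.LinkType == "Infiniband" {
			value = "IB"
		}
		cfg["LINK_TYPE_P1"] = value
		if dualPort {
			cfg["LINK_TYPE_P2"] = value
		}
	}
	if t.PCIPerformanceOptimized {
		switch {
		case t.MaxAccOutRead != nil: // user override wins
			cfg["MAX_ACC_OUT_READ"] = strconv.Itoa(*t.MaxAccOutRead)
		case pciGen >= 5:
			cfg["MAX_ACC_OUT_READ"] = "0" // use device defaults
		case pciGen == 4:
			cfg["MAX_ACC_OUT_READ"] = "44"
		}
		// Assumption: below gen4 the parameter is left untouched.
	}
	if t.GPUDirectOptimized {
		cfg["ATS_ENABLED"] = "0"
	}
	return cfg, nil
}

func main() {
	numVfs := 2
	spec := TemplateSpec{NumVfs: &numVfs, LinkType: "Infiniband", PCIPerformanceOptimized: true}
	cfg, err := DesiredNvConfig(spec, true, 4)
	if err != nil {
		fmt.Println("invalid template:", err)
		return
	}
	keys := make([]string, 0, len(cfg))
	for k := range cfg {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, cfg[k])
	}
}
```

Runtime-only settings such as the PCI max read request size, trust mode, and PFC are deliberately left out, since the list above describes them as non-persistent rather than nvconfig parameters.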
|
||
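Two smaller rules from the configuration details, ignoring `_P2` parameters on single-port devices and enabling PFC for priority 3, can be sketched as below. The helper names `FilterRawNvConfig` and `PFCString` are hypothetical, and the assumption that the `pfc` string holds one flag per priority 0 to 7 is inferred from the `"0,0,0,1,0,0,0,0"` value in the example template.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterRawNvConfig drops parameters with a `_P2` suffix when the device
// has a single port; all other parameters are kept as-is.
func FilterRawNvConfig(raw map[string]string, dualPort bool) map[string]string {
	out := make(map[string]string, len(raw))
	for name, value := range raw {
		if !dualPort && strings.HasSuffix(name, "_P2") {
			continue
		}
		out[name] = value
	}
	return out
}

// PFCString builds a comma-separated flag list with PFC enabled for the
// given priorities (assumed range: 0-7).
func PFCString(priorities ...int) (string, error) {
	flags := make([]string, 8)
	for i := range flags {
		flags[i] = "0"
	}
	for _, p := range priorities {
		if p < 0 || p > 7 {
			return "", fmt.Errorf("priority %d out of range 0-7", p)
		}
		flags[p] = "1"
	}
	return strings.Join(flags, ","), nil
}

func main() {
	raw := map[string]string{"LINK_TYPE_P1": "ETH", "LINK_TYPE_P2": "ETH"}
	fmt.Println(len(FilterRawNvConfig(raw, false))) // parameters kept on a single-port device

	pfc, err := PFCString(3)
	if err != nil {
		fmt.Println("invalid priority:", err)
		return
	}
	fmt.Println(pfc)
}
```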
### NicDevice | ||
|
||
The NicDevice CRD is created automatically by the configuration daemon and represents a specific NVIDIA NIC on a specific K8s node. | ||
The name of the device combines the node name, device type and its serial number for easier tracking. | ||
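As a rough illustration of that naming scheme, the sketch below joins the three parts with dashes and lowercases the result. The exact format is an assumption inferred from the sample resource name in this README (`co-node-25-101b-mt2232t13210`), and `DeviceName` is a hypothetical helper, not a function of the operator.

```go
package main

import (
	"fmt"
	"strings"
)

// DeviceName combines node name, device type and serial number into a
// lowercase, dash-separated resource name (format assumed from the sample).
func DeviceName(node, deviceType, serialNumber string) string {
	return strings.ToLower(strings.Join([]string{node, deviceType, serialNumber}, "-"))
}

func main() {
	fmt.Println(DeviceName("co-node-25", "101b", "MT2232T13210"))
	// prints "co-node-25-101b-mt2232t13210"
}
```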
|
||
The `ConfigUpdateInProgress` status condition can be used to track the state of the FW configuration update on a specific device. If an error occurs during the FW configuration update, it is reflected in this condition. | ||
|
||
For more information, refer to the [api-reference](docs/api-reference.md). | ||
|
||
#### Example NicDevice | ||
|
||
```yaml | ||
apiVersion: configuration.net.nvidia.com/v1alpha1 | ||
kind: NicDevice | ||
metadata: | ||
  name: co-node-25-101b-mt2232t13210 | ||
  namespace: nic-configuration-operator | ||
spec: | ||
  configuration: | ||
    template: | ||
      linkType: Ethernet | ||
      numVfs: 8 | ||
      pciPerformanceOptimized: | ||
        enabled: true | ||
      rawNvConfig: | ||
        - name: TLS_OPTIMIZE | ||
          value: "1" | ||
status: | ||
  conditions: | ||
    - reason: UpdateSuccessful | ||
      status: "False" | ||
      type: ConfigUpdateInProgress | ||
  firmwareVersion: 20.42.1000 | ||
  node: co-node-25 | ||
  partNumber: mcx632312a-hdat | ||
  ports: | ||
    - networkInterface: enp4s0f0np0 | ||
      pci: "0000:04:00.0" | ||
      rdmaInterface: mlx5_0 | ||
    - networkInterface: enp4s0f1np1 | ||
      pci: "0000:04:00.1" | ||
      rdmaInterface: mlx5_1 | ||
  psid: mt_0000000225 | ||
  serialNumber: mt2232t13210 | ||
  type: 101b | ||
``` | ||
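A client that wants to act on the status shown above could interpret the `ConfigUpdateInProgress` condition roughly as follows. This is a sketch with a hypothetical local `Condition` type and `DescribeUpdate` helper; a real client would more likely work with the Kubernetes `metav1.Condition` type. Only the condition type, the `"False"` status, and the `UpdateSuccessful` reason come from the example, and no other reason values are assumed.

```go
package main

import "fmt"

// Condition is a hypothetical local mirror of a Kubernetes status condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
	Reason string
}

const configUpdateInProgress = "ConfigUpdateInProgress"

// DescribeUpdate summarizes the FW configuration update state of a device
// based on its ConfigUpdateInProgress condition.
func DescribeUpdate(conditions []Condition) string {
	for _, c := range conditions {
		if c.Type != configUpdateInProgress {
			continue
		}
		if c.Status == "True" {
			return "update in progress, reason: " + c.Reason
		}
		return "no update in progress, reason: " + c.Reason
	}
	return "no " + configUpdateInProgress + " condition reported"
}

func main() {
	conditions := []Condition{
		{Type: configUpdateInProgress, Status: "False", Reason: "UpdateSuccessful"},
	}
	fmt.Println(DescribeUpdate(conditions))
	// prints "no update in progress, reason: UpdateSuccessful"
}
```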
|
||
#### Implementation details: | ||
|
||
The NicDevice CRD is created and reconciled by the configuration daemon. A diagram of the reconciliation logic can be found [here](docs/nic-configuration-reconcile-diagram.png). | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
/* | ||
2024 NVIDIA CORPORATION & AFFILIATES | ||
Licensed under the Apache License, Version 2.0 (the License); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an AS IS BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
*/ | ||
|
||
// Package v1alpha1 contains API Schema definitions for the configuration.net v1alpha1 API group | ||
// +kubebuilder:object:generate=true | ||
// +groupName=configuration.net.nvidia.com | ||
package v1alpha1 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# nic-configuration-operator-chart | ||
|
||
![Version: 0.0.1](https://img.shields.io/badge/Version-0.0.1-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: latest](https://img.shields.io/badge/AppVersion-latest-informational?style=flat-square) | ||
|
||
A Helm chart for NIC Configuration Operator | ||
|
||
## Values | ||
|
||
| Key | Type | Default | Description | | ||
|-----|------|---------|-------------| | ||
| configDaemon.image.name | string | `"nic-configuration-operator-daemon"` | | | ||
| configDaemon.image.repository | string | `"ghcr.io/mellanox"` | repository to use for the config daemon image | | ||
| configDaemon.image.tag | string | `"latest"` | image tag to use for the config daemon image | | ||
| configDaemon.nodeSelector | object | `{}` | node selector for the config daemon | | ||
| configDaemon.resources | object | `{"limits":{"cpu":"500m","memory":"128Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}` | resources and limits for the config daemon | | ||
| imagePullSecrets | list | `[]` | image pull secrets for both the operator and the config daemon | | ||
| operator.affinity | object | `{"nodeAffinity":{"preferredDuringSchedulingIgnoredDuringExecution":[{"preference":{"matchExpressions":[{"key":"node-role.kubernetes.io/master","operator":"Exists"}]},"weight":1},{"preference":{"matchExpressions":[{"key":"node-role.kubernetes.io/control-plane","operator":"Exists"}]},"weight":1}]}}` | node affinity for the operator | | ||
| operator.image.name | string | `"nic-configuration-operator"` | | | ||
| operator.image.repository | string | `"ghcr.io/mellanox"` | repository to use for the operator image | | ||
| operator.image.tag | string | `"latest"` | image tag to use for the operator image | | ||
| operator.logLevel | string | `"info"` | log level configuration | | ||
| operator.nodeSelector | object | `{}` | node selector for the operator | | ||
| operator.replicas | int | `1` | operator deployment number of replicas | | ||
| operator.resources | object | `{"limits":{"cpu":"500m","memory":"128Mi"},"requests":{"cpu":"10m","memory":"64Mi"}}` | specify resource requests and limits for the operator | | ||
| operator.serviceAccount.annotations | object | `{}` | set annotations for the operator service account | | ||
| operator.tolerations | list | `[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane","operator":"Exists"}]` | tolerations for the operator | | ||
|