admin-bake.yml
# Page tree
nav:
- 首页: index.md
- 终端用户手册:
- 什么是 AI 算力平台: end-user/index.md
- 用户注册: end-user/register/index.md
- 算力服务:
- 云主机:
- 创建云主机: end-user/host/createhost.md
- 使用云主机: end-user/host/usehost.md
- 容器管理:
- 云上 K8s 集群:
- 接入集群: end-user/kpanda/clusters/integrate-cluster.md
- 访问集群: end-user/kpanda/clusters/access-cluster.md
- 集群角色: end-user/kpanda/clusters/cluster-role.md
- 集群状态: end-user/kpanda/clusters/cluster-status.md
- 卸载/解除接入集群: end-user/kpanda/clusters/delete-cluster.md
- 节点管理:
- 节点调度: end-user/kpanda/nodes/schedule.md
- 标签与注解: end-user/kpanda/nodes/labels-annotations.md
- 污点管理: end-user/kpanda/nodes/taints.md
- 节点详情: end-user/kpanda/nodes/node-details.md
- 工作负载:
- 创建 Deployment: end-user/kpanda/workloads/create-deployment.md
- 创建 StatefulSet: end-user/kpanda/workloads/create-statefulset.md
- 创建 DaemonSet: end-user/kpanda/workloads/create-daemonset.md
- 创建 CronJob: end-user/kpanda/workloads/create-cronjob.md
- 创建 Job: end-user/kpanda/workloads/create-job.md
- 工作负载参数配置:
- 工作负载状态: end-user/kpanda/workloads/pod-config/workload-status.md
- Job 参数: end-user/kpanda/workloads/pod-config/job-parameters.md
- 生命周期: end-user/kpanda/workloads/pod-config/lifecycle.md
- 环境变量: end-user/kpanda/workloads/pod-config/env-variables.md
- 健康检查: end-user/kpanda/workloads/pod-config/health-check.md
- 集群调度: end-user/kpanda/workloads/pod-config/scheduling-policy.md
- Helm 应用:
- Helm 模板: end-user/kpanda/helm/README.md
- 上传 Helm 模板: end-user/kpanda/helm/upload-helm.md
- 管理 Helm 应用: end-user/kpanda/helm/helm-app.md
- 管理 Helm 仓库: end-user/kpanda/helm/helm-repo.md
- Operator 应用: end-user/kpanda/olm/import-miniooperator.md
- 容器网络:
- 创建服务: end-user/kpanda/network/create-services.md
- 创建路由: end-user/kpanda/network/create-ingress.md
- 网络策略: end-user/kpanda/network/network-policy.md
- 自定义资源: end-user/kpanda/custom-resources/create.md
- 容器存储:
- 数据卷声明: end-user/kpanda/storage/pvc.md
- 数据卷: end-user/kpanda/storage/pv.md
- 存储池: end-user/kpanda/storage/sc.md
- 共享存储池: end-user/kpanda/storage/sc-share.md
- 配置与密钥:
- 创建配置项: end-user/kpanda/configmaps-secrets/create-configmap.md
- 使用配置项: end-user/kpanda/configmaps-secrets/use-configmap.md
- 创建密钥: end-user/kpanda/configmaps-secrets/create-secret.md
- 使用密钥: end-user/kpanda/configmaps-secrets/use-secret.md
- Configmap/Secret热加载: end-user/kpanda/configmaps-secrets/configmap-hot-loading.md
- 命名空间:
- 创建命名空间: end-user/kpanda/namespaces/createns.md
- 命名空间独享节点: end-user/kpanda/namespaces/exclusive.md
- 容器组安全策略: end-user/kpanda/namespaces/podsecurity.md
- 集群运维:
- 最近操作: end-user/kpanda/clusterops/latest-operations.md
- 集群设置: end-user/kpanda/clusterops/cluster-settings.md
- 集群巡检:
- 介绍: end-user/kpanda/inspect/index.md
- 创建巡检配置: end-user/kpanda/inspect/config.md
- 执行巡检: end-user/kpanda/inspect/inspect.md
- 查看巡检报告: end-user/kpanda/inspect/report.md
- 备份恢复:
- 介绍: end-user/kpanda/backup/index.md
- 安装 Velero 插件: end-user/kpanda/backup/install-velero.md
- 应用备份: end-user/kpanda/backup/deployment.md
- ETCD 备份: end-user/kpanda/backup/etcd-backup.md
- 安全管理:
- 安全扫描类型: end-user/kpanda/security/index.md
- 合规性扫描:
- 扫描配置: end-user/kpanda/security/cis/config.md
- 扫描策略: end-user/kpanda/security/cis/policy.md
- 扫描报告: end-user/kpanda/security/cis/report.md
- 权限扫描: end-user/kpanda/security/audit.md
- 漏洞扫描: end-user/kpanda/security/hunter.md
- 权限管理:
- 权限体系介绍: end-user/kpanda/permissions/permission-brief.md
- 集群和命名空间授权: end-user/kpanda/permissions/cluster-ns-auth.md
- 增加容器管理内置权限点: end-user/kpanda/permissions/custom-kpanda-role.md
- 算法开发:
- 快速入门: end-user/baize/quick-start.md
- 创建 AI 工作负载: end-user/share/workload.md
- 使用 Notebook: end-user/share/notebook.md
- 创建训练任务:
- 创建训练任务: end-user/baize/jobs/create.md
- 创建 Pytorch 任务: end-user/baize/jobs/pytorch.md
- 创建 Tensorflow 任务: end-user/baize/jobs/tensorflow.md
- 创建 MPI 任务: end-user/baize/jobs/mpi.md
- 创建 MXNet 任务: end-user/baize/jobs/mxnet.md
- 创建 PaddlePaddle 任务: end-user/baize/jobs/paddle.md
- 删除任务: end-user/baize/jobs/delete.md
- 查看任务负载: end-user/baize/jobs/view.md
- 任务分析: end-user/baize/jobs/tensorboard.md
- 数据管理:
- 数据集列表: end-user/baize/dataset/create-use-delete.md
- 环境管理: end-user/baize/dataset/environments.md
- 模型服务:
- 模型支持情况: end-user/baize/inference/models.md
- 创建 Triton 推理服务: end-user/baize/inference/triton-inference.md
- 创建 vLLM 推理服务: end-user/baize/inference/vllm-inference.md
- 管理:
- 可观测性:
- 仪表盘:
- 概览: end-user/insight/dashboard/overview.md
- 仪表盘: end-user/insight/dashboard/dashboard.md
- 以管理员登录 Grafana: end-user/insight/dashboard/login-grafana.md
- 导入自定义仪表盘: end-user/insight/dashboard/import-dashboard.md
- 基础设施:
- 集群: end-user/insight/infra/cluster.md
- 节点: end-user/insight/infra/node.md
- 命名空间: end-user/insight/infra/namespace.md
- 工作负载: end-user/insight/infra/container.md
- 事件查询: end-user/insight/infra/event.md
- 拨测: end-user/insight/infra/probe.md
- 指标: end-user/insight/data-query/metric.md
- 日志: end-user/insight/data-query/log.md
- 链路追踪:
- 服务拓扑: end-user/insight/trace/topology.md
- 服务: end-user/insight/trace/service.md
- 链路查询: end-user/insight/trace/trace.md
- 告警:
- 告警策略: end-user/insight/alert-center/alert-policy.md
- 告警模板: end-user/insight/alert-center/alert-template.md
- 通知配置: end-user/insight/alert-center/message.md
- 配置通知服务器: end-user/insight/alert-center/sms-provider.md
- 消息模板: end-user/insight/alert-center/msg-template.md
- 告警静默: end-user/insight/alert-center/silent.md
- 告警抑制: end-user/insight/alert-center/inhibition.md
- 采集管理:
- 采集管理: end-user/insight/collection-manag/collection-manag.md
- 服务监控: end-user/insight/collection-manag/service-monitor.md
- 采集组件insight-agent状态: end-user/insight/collection-manag/agent-status.md
- 全局管理:
- 工作空间与层级:
- 工作空间与层级: end-user/ghippo/workspace/ws-folder.md
- 资源配额: end-user/ghippo/workspace/quota.md
- 资源组与共享资源的区别: end-user/ghippo/workspace/res-gp-and-shared-res.md
- 资源绑定权限说明: end-user/ghippo/workspace/wsbind-permission.md
- 个人中心:
- 安全设置: end-user/ghippo/personal-center/security-setting.md
- 访问密钥: end-user/ghippo/personal-center/accesstoken.md
- SSH 公钥: end-user/ghippo/personal-center/ssh-key.md
- 语言设置: end-user/ghippo/personal-center/language.md
- 管理员手册:
- 什么是 AI 算力平台: admin/index.md
- 算力服务:
- 云主机:
- 安装:
- 安装云主机: admin/virtnest/install/index.md
- 安装依赖及前提条件: admin/virtnest/install/install-dependency.md
- 离线升级: admin/virtnest/install/offline-install.md
- 安装 virtnest-agent: admin/virtnest/install/virtnest-agent.md
- 快速入门:
- 创建云主机: admin/virtnest/quickstart/index.md
- 更新云主机: admin/virtnest/quickstart/update.md
- 连接云主机: admin/virtnest/quickstart/access.md
- 通过 NodePort 访问云主机: admin/virtnest/quickstart/nodeport.md
- 云主机详情: admin/virtnest/quickstart/detail.md
- 云主机管理:
- 创建密钥: admin/virtnest/vm/create-secret.md
- 克隆云主机: admin/virtnest/vm/clone.md
- 快照管理: admin/virtnest/vm/snapshot.md
- 定时快照: admin/virtnest/vm/scheduled-snapshot.md
- 实时迁移: admin/virtnest/vm/live-migration.md
- 集群内冷迁移: admin/virtnest/vm/migratiom.md
- 云主机跨集群迁移: admin/virtnest/vm/cross-cluster-migrate.md
- 云主机监控: admin/virtnest/vm/monitor.md
- 云主机网络: admin/virtnest/vm/vm-network.md
- 云主机存储: admin/virtnest/vm/vm-sc.md
- 云主机漂移: admin/virtnest/vm/auto-migrate.md
- 云主机健康检查: admin/virtnest/vm/health-check.md
- 云主机 GPU:
- 云主机 GPU: admin/virtnest/gpu/vm-gpu.md
- 云主机 vGPU: admin/virtnest/gpu/vm-vgpu.md
- 云主机模板:
- 通过模板创建云主机: admin/virtnest/template/index.md
- 云主机模板: admin/virtnest/template/tep.md
- 云主机镜像: admin/virtnest/vm-image/index.md
- 最佳实践:
- 从 VMware 导入传统 Linux 云主机: admin/virtnest/best-practice/import-ubuntu.md
- 从 VMware 导入传统 Windows 云主机: admin/virtnest/best-practice/import-windows.md
- 创建 Windows 云主机: admin/virtnest/best-practice/vm-windows.md
- 容器管理:
- 集群管理:
- 创建集群: admin/kpanda/clusters/create-cluster.md
- 接入集群: admin/kpanda/clusters/integrate-cluster.md
- 访问集群: admin/kpanda/clusters/access-cluster.md
- 升级集群: admin/kpanda/clusters/upgrade-cluster.md
- 卸载/解除接入集群: admin/kpanda/clusters/delete-cluster.md
- 集群角色: admin/kpanda/clusters/cluster-role.md
- 集群状态: admin/kpanda/clusters/cluster-status.md
- 如何选择运行时: admin/kpanda/clusters/runtime.md
- 集群版本支持范围: admin/kpanda/clusters/cluster-version.md
- 接入 Rancher 集群: admin/kpanda/clusters/integrate-rancher-cluster.md
- 集群中部署第二调度器: admin/kpanda/clusters/cluster-scheduler-plugin.md
- 集群证书更新: admin/kpanda/clusters/k8s-cert.md
- 节点管理:
- 节点可用性检查: admin/kpanda/nodes/node-check.md
- 节点认证: admin/kpanda/nodes/node-authentication.md
- 节点扩容: admin/kpanda/nodes/add-node.md
- 节点缩容: admin/kpanda/nodes/delete-node.md
- 污点管理: admin/kpanda/nodes/taints.md
- 节点调度: admin/kpanda/nodes/schedule.md
- 节点详情: admin/kpanda/nodes/node-details.md
- 标签与注解: admin/kpanda/nodes/labels-annotations.md
- 添加工作节点: admin/k8s/add-node.md
- 移除 GPU 工作节点: admin/k8s/remove-node.md
- 命名空间:
- 创建命名空间: admin/kpanda/namespaces/createns.md
- 命名空间独享节点: admin/kpanda/namespaces/exclusive.md
- 容器组安全策略: admin/kpanda/namespaces/podsecurity.md
- 弹性伸缩:
- 安装 metrics-server 插件: admin/kpanda/scale/install-metrics-server.md
- 安装 kubernetes-cronhpa-controller: admin/kpanda/scale/install-cronhpa.md
- 安装 VPA 插件: admin/kpanda/scale/install-vpa.md
- 基于内置指标创建 HPA: admin/kpanda/scale/create-hpa.md
- 基于自定义指标创建 HPA: admin/kpanda/scale/custom-hpa.md
- 创建 VPA 策略: admin/kpanda/scale/create-vpa.md
- HPA和CronHPA兼容规则: admin/kpanda/scale/hpa-cronhpa-compatibility-rules.md
- 其他方案:
- Knative 介绍: admin/kpanda/scale/knative/knative.md
- 安装 Knative: admin/kpanda/scale/knative/install.md
- Kantive 场景: admin/kpanda/scale/knative/scene.md
- Knative 实践: admin/kpanda/scale/knative/playground.md
- Helm 应用:
- Helm 模板: admin/kpanda/helm/README.md
- 上传 Helm 模板: admin/kpanda/helm/upload-helm.md
- 管理 Helm 应用: admin/kpanda/helm/helm-app.md
- 管理 Helm 仓库: admin/kpanda/helm/helm-repo.md
- 多架构 Helm 应用: admin/kpanda/helm/multi-archi-helm.md
- 自定义Helm应用导入Addon: admin/kpanda/helm/Import-addon.md
- Operator 应用: admin/kpanda/olm/import-miniooperator.md
- 容器网络:
- 创建服务: admin/kpanda/network/create-services.md
- 创建路由: admin/kpanda/network/create-ingress.md
- 网络策略: admin/kpanda/network/network-policy.md
- 容器存储:
- 数据卷声明: admin/kpanda/storage/pvc.md
- 数据卷: admin/kpanda/storage/pv.md
- 存储池: admin/kpanda/storage/sc.md
- 共享存储池: admin/kpanda/storage/sc-share.md
- 自定义资源: admin/kpanda/custom-resources/create.md
- 配置与密钥:
- 创建配置项: admin/kpanda/configmaps-secrets/create-configmap.md
- 使用配置项: admin/kpanda/configmaps-secrets/use-configmap.md
- 创建密钥: admin/kpanda/configmaps-secrets/create-secret.md
- 使用密钥: admin/kpanda/configmaps-secrets/use-secret.md
- Configmap/Secret热加载: admin/kpanda/configmaps-secrets/configmap-hot-loading.md
- 集群运维:
- 最近操作: admin/kpanda/clusterops/latest-operations.md
- 集群升级: admin/kpanda/clusters/upgrade-cluster.md
- 集群设置: admin/kpanda/clusterops/cluster-settings.md
- 集群动态资源超卖: admin/kpanda/clusterops/cluster-oversold.md
- 集群巡检:
- 介绍: admin/kpanda/inspect/index.md
- 创建巡检配置: admin/kpanda/inspect/config.md
- 执行巡检: admin/kpanda/inspect/inspect.md
- 查看巡检报告: admin/kpanda/inspect/report.md
- 备份恢复:
- 介绍: admin/kpanda/backup/index.md
- 安装 Velero 插件: admin/kpanda/backup/install-velero.md
- 应用备份: admin/kpanda/backup/deployment.md
- ETCD 备份: admin/kpanda/backup/etcd-backup.md
- 安全管理:
- 安全扫描类型: admin/kpanda/security/index.md
- 合规性扫描:
- 扫描配置: admin/kpanda/security/cis/config.md
- 扫描策略: admin/kpanda/security/cis/policy.md
- 扫描报告: admin/kpanda/security/cis/report.md
- 权限扫描: admin/kpanda/security/audit.md
- 漏洞扫描: admin/kpanda/security/hunter.md
- 集成 Falco 安全工具:
- 介绍: admin/security/falco.md
- 安装: admin/security/falco-install.md
- Falco-exporter: admin/security/falco-exporter.md
- 权限管理:
- 权限体系介绍: admin/kpanda/permissions/permission-brief.md
- 集群和命名空间授权: admin/kpanda/permissions/cluster-ns-auth.md
- 增加容器管理内置权限点: admin/kpanda/permissions/custom-kpanda-role.md
- GPU 管理:
- GPU 管理概述: admin/kpanda/gpu/index.md
- GPU 支持矩阵: admin/kpanda/gpu/gpu_matrix.md
- NVIDIA GPU 管理:
- NVIDIA GPU模式说明: admin/kpanda/gpu/nvidia/index.md
- GPU Operator:
- 离线安装GPU Operator: admin/kpanda/gpu/nvidia/install_nvidia_driver_of_operator.md
- 上传Red Hat GPU Operator离线镜像: admin/kpanda/gpu/nvidia/push_image_to_repo.md
- 构建Red Hat 8.4离线yum源: admin/kpanda/gpu/nvidia/upgrade_yum_source_redhat8_4.md
- 构建Red Hat 7.9离线yum源: admin/kpanda/gpu/nvidia/yum_source_redhat7_9.md
- 构建Red Hat 9.2离线yum源: admin/kpanda/gpu/nvidia/rhel9.2_offline_install_driver.md
- Ubuntu22.04 离线安装GPU驱动: admin/kpanda/gpu/nvidia/ubuntu22.04_offline_install_driver.md
- NVIDIA GPU整卡模式: admin/kpanda/gpu/nvidia/full_gpu_userguide.md
- NVIDIA vGPU模式:
- 安装 vGPU Addon: admin/kpanda/gpu/nvidia/vgpu/vgpu_addon.md
- 使用 NVIDIA vGPU: admin/kpanda/gpu/nvidia/vgpu/vgpu_user.md
- 构建 vGPU 显存超配镜像: admin/kpanda/gpu/nvidia/vgpu/hami.md
- NVIDIA MIG 模式:
- NVIDIA MIG 概述: admin/kpanda/gpu/nvidia/mig/index.md
- 开启 MIG 功能: admin/kpanda/gpu/nvidia/mig/create_mig.md
- 使用 NVIDIA MIG: admin/kpanda/gpu/nvidia/mig/mig_usage.md
- MIG 相关命令: admin/kpanda/gpu/nvidia/mig/mig_command.md
- GPU 配额管理: admin/kpanda/gpu/vgpu_quota.md
- GPU 动态调节: admin/kpanda/gpu/dynamic-regulation.md
- GPU 监控告警:
- GPU 监控指标: admin/kpanda/gpu/nvidia/gpu-monitoring-alarm/gpu-metrics.md
- GPU 告警规则: admin/kpanda/gpu/nvidia/gpu-monitoring-alarm/gpu-alarm.md
- 使用 Volcano:
- 安装 Volcano: admin/kpanda/gpu/volcano/volcano_user_guide.md
- 使用Volcano Gang Scheduler: admin/kpanda/gpu/volcano/volcano-gang-scheduler.md
- 使用 Volcano 优先级抢占策略: admin/kpanda/gpu/volcano/volcano_priority.md
- 使用 Volcano Binpack 策略: admin/kpanda/gpu/volcano/volcano_binpack.md
- 使用 Volcano DRF 策略: admin/kpanda/gpu/volcano/drf.md
- 使用 Volcano NUMA 亲和性调度: admin/kpanda/gpu/volcano/numa.md
- GPU 调度配置: admin/kpanda/gpu/gpu_scheduler_config.md
- 昇腾 NPU 管理:
- 昇腾 NPU 组件安装: admin/kpanda/gpu/ascend/ascend_driver_install.md
- 使用昇腾 NPU: admin/kpanda/gpu/ascend/ascend_usage.md
- 启用昇腾 VNPU: admin/kpanda/gpu/ascend/vnpu.md
- 天数 GPU 管理: admin/kpanda/gpu/Iluvatar_usage.md
- 沐曦 GPU 管理: admin/kpanda/gpu/metax/usemetax.md
- 寒武纪 GPU 管理: admin/kpanda/gpu/mlu/use-mlu.md
- GPU 相关FAQ: admin/kpanda/gpu/FAQ.md
- 最佳实践:
- 对 etcd 进行备份还原: admin/kpanda/best-practice/etcd-backup.md
- 跨集群备份恢复MySQL: admin/kpanda/best-practice/backup-mysql-on-nfs.md
- 工作集群相关实践:
- 加固自建工作集群: admin/kpanda/best-practice/hardening-cluster.md
- 离线部署/升级工作集群: admin/kpanda/best-practice/update-offline-cluster.md
- 工作集群添加异构节点: admin/kpanda/best-practice/multi-arch.md
- 工作集群的控制节点扩容: admin/kpanda/best-practice/add-master-node.md
- 替换工作集群首个控制节点: admin/kpanda/best-practice/replace-first-master-node.md
- 全局集群工作节点扩容: admin/kpanda/best-practice/add-worker-node-on-global.md
- CentOS上创建Ubuntu工作集群: admin/kpanda/best-practice/create-ubuntu-on-centos-platform.md
- CentOS上创建Red Hat工作集群: admin/kpanda/best-practice/create-redhat9.2-on-centos-platform.md
- 非主流操作系统上创建集群: admin/kpanda/best-practice/use-otherlinux-create-custer.md
- 部署/升级Kubean兼容版本: admin/kpanda/best-practice/kubean-low-version.md
- 限制Docker单容器磁盘空间: admin/kpanda/best-practice/limit-disk-usage-docker.md
- 边缘集群部署和管理实践: admin/kpanda/best-practice/k3s-lcm.md
- 在离线混部:
- 在离线混部概述: admin/kpanda/best-practice/co-located/index.md
- koordinator 安装: admin/kpanda/best-practice/co-located/install.md
- 算法开发:
- 运维管理:
- 介绍: admin/baize/oam/index.md
- 资源管理: admin/baize/oam/resource.md
- 队列管理:
- 创建队列: admin/baize/oam/queue/create.md
- 删除队列: admin/baize/oam/queue/delete.md
- 最佳实践:
- 部署 NFS 做数据集预热: admin/baize/best-practice/deploy-nfs-in-worker.md
- 更新 Notebook 内置镜像: admin/baize/best-practice/change-notebook-image.md
- Checkpoint 机制及使用介绍: admin/baize/best-practice/checkpoint.md
- 使用 AI Lab 微调 ChatGLM3: admin/baize/best-practice/finetunel-llm.md
- 提交 DeepSpeed 训练任务: admin/baize/best-practice/train-with-deepspeed.md
- 训练任务增加调度器选项: admin/baize/best-practice/add-scheduler.md
- 部署 Label Studio: admin/baize/best-practice/label-studio.md
- 故障排查:
- 故障排查索引: admin/baize/troubleshoot/index.md
- 集群下拉列表中找不到集群: admin/baize/troubleshoot/cluster-not-found.md
- Notebook 不受队列配额控制: admin/baize/troubleshoot/notebook-not-controlled-by-quotas.md
- 队列初始化失败: admin/baize/troubleshoot/local-queue-initialization-failed.md
- 管理:
- 可观测性:
- 快速入门:
- 部署资源规划:
- Prometheus 资源规划: admin/insight/quickstart/res-plan/prometheus-res.md
- vmstorage 磁盘规划: admin/insight/quickstart/res-plan/vms-res-plan.md
- 调整 vmstorage 磁盘: admin/insight/quickstart/res-plan/modify-vms-disk.md
- 安装与升级:
- 在线安装 insight-agent: admin/insight/quickstart/install/install-agent.md
- 获取全局服务集群的数据存储地址: admin/insight/quickstart/install/gethosturl.md
- 升级注意事项: admin/insight/quickstart/install/upgrade-note.md
- 已知问题: admin/insight/quickstart/install/knownissues.md
- 大规模日志部署调整: admin/insight/best-practice/insight-kafka.md
- 定制Insight组件调度策略: admin/insight/quickstart/install/component-scheduling.md
- 开启大日志和大链路模式: admin/insight/quickstart/install/big-log-and-trace.md
- 在OpenShift 4.x上安装insight-agent: admin/insight/quickstart/other/install-agent-on-ocp.md
- 开始观测:
- OpenTelemetry 观测:
- 使用OTel赋予应用可观测: admin/insight/quickstart/otel/otel.md
- Operator无侵入增强应用: admin/insight/quickstart/otel/operator.md
- 向Insight发送链路数据: admin/insight/quickstart/otel/send_tracing_to_insight.md
- 链路数据尾部采样方案: admin/insight/best-practice/tail-based-sampling.md
- Java 应用观测:
- 使用JMX Exporter暴露JVM监控指标: admin/insight/quickstart/otel/java/jvm-monitor/jmx-exporter.md
- 使用OTel Java Agent暴露JVM监控指标: admin/insight/quickstart/otel/java/jvm-monitor/otel-java-agent.md
- 已有JVM指标的Java应用对接可观测性: admin/insight/quickstart/otel/java/jvm-monitor/legacy-jvm.md
- 通过 OTel SDK 增强 Go 应用: admin/insight/quickstart/otel/golang/golang.md
- 使用OTel接收SkyWalking链路数据: admin/insight/best-practice/sw-to-otel.md
- 仪表盘:
- 概览: admin/insight/dashboard/overview.md
- 仪表盘: admin/insight/dashboard/dashboard.md
- 以管理员登录 Grafana: admin/insight/dashboard/login-grafana.md
- 导入自定义仪表盘: admin/insight/dashboard/import-dashboard.md
- 基础设施:
- 集群: admin/insight/infra/cluster.md
- 节点: admin/insight/infra/node.md
- 命名空间: admin/insight/infra/namespace.md
- 工作负载: admin/insight/infra/container.md
- 事件查询: admin/insight/infra/event.md
- 拨测: admin/insight/infra/probe.md
- 指标: admin/insight/data-query/metric.md
- 日志: admin/insight/data-query/log.md
- 链路追踪:
- 服务拓扑: admin/insight/trace/topology.md
- 服务: admin/insight/trace/service.md
- 调用链: admin/insight/trace/trace.md
- 告警:
- 告警策略: admin/insight/alert-center/alert-policy.md
- 告警模板: admin/insight/alert-center/alert-template.md
- 通知配置: admin/insight/alert-center/message.md
- 配置通知服务器: admin/insight/alert-center/sms-provider.md
- 消息模板: admin/insight/alert-center/msg-template.md
- 告警静默: admin/insight/alert-center/silent.md
- 告警抑制: admin/insight/alert-center/inhibition.md
- 数据采集:
- 采集管理: admin/insight/collection-manag/collection-manag.md
- 服务监控: admin/insight/collection-manag/service-monitor.md
- 采集组件insight-agent状态: admin/insight/collection-manag/agent-status.md
- 系统配置:
- 系统组件: admin/insight/system-config/system-component.md
- 系统配置: admin/insight/system-config/system-config.md
- 修改配置: admin/insight/system-config/modify-config.md
- 网络监控之集成 DeepFlow: admin/insight/best-practice/integration_deepflow.md
- 参考文档:
- 可观性参考指标说明: admin/insight/reference/used-metric-in-insight.md
- Insight Grafana 持久化到数据库: admin/insight/best-practice/grafana-use-db.md
- 自定义探测方式: admin/insight/collection-manag/probe-module.md
- 指标抓取说明: admin/insight/collection-manag/metric-collect.md
- 告警通知流程说明: admin/insight/reference/alertnotification.md
- 通知模板使用说明: admin/insight/reference/notify-helper.md
- Lucene 语法使用方法: admin/insight/reference/lucene.md
- 通过 Sidecar 采集容器日志: admin/insight/reference/tailing-sidecar.md
- 兼容性测试:
- Kubernetes 兼容性测试: admin/insight/compati-test/k8s-compatibility.md
- Rancher 兼容性测试: admin/insight/compati-test/rancher-compatibility.md
- Openshift 兼容性测试: admin/insight/compati-test/ocp-compatibility.md
- 常见问题:
- 链路数据的时钟偏移: admin/insight/faq/traceclockskew.md
- 日志采集排障指南: admin/insight/best-practice/debug-log.md
- 链路采集排障指南: admin/insight/best-practice/debug-trace.md
- 使用Insight定位异常: admin/insight/best-practice/find_root_cause.md
- ES 数据塞满时如何操作: admin/insight/faq/expand-once-es-full.md
- 容器日志黑名单: admin/insight/faq/ignore-pod-log-collect.md
- 全局管理:
- 安装、登录和升级:
- 自定义反向代理服务器地址: admin/ghippo/install/reverse-proxy.md
- 开启Folder/WS的隔离模式: admin/ghippo/install/user-isolation.md
- 国密网关: admin/ghippo/install/gm-gateway.md
- 登录: admin/ghippo/install/login.md
- 用户与访问控制:
- 什么是用户与访问控制: admin/ghippo/access-control/iam.md
- 用户: admin/ghippo/access-control/user.md
- 用户组: admin/ghippo/access-control/group.md
- 角色:
- 角色: admin/ghippo/access-control/role.md
- 系统角色: admin/ghippo/access-control/global.md
- 自定义角色: admin/ghippo/access-control/custom-role.md
- 身份提供商:
- 身份提供商: admin/ghippo/access-control/idprovider.md
- LDAP: admin/ghippo/access-control/ldap.md
- OIDC: admin/ghippo/access-control/oidc.md
- OAuth 2.0 之企业微信: admin/ghippo/access-control/oauth2.0.md
- 接入管理:
- 接入管理: admin/ghippo/access-control/docking.md
- Webhook: admin/ghippo/access-control/webhook.md
- 工作空间与层级:
- 工作空间与层级: admin/ghippo/workspace/ws-folder.md
- 绑定工作空间: admin/register/bindws.md
- 为工作空间分配资源: admin/register/wsres.md
- 创建和删除工作空间: admin/ghippo/workspace/workspace.md
- 工作空间权限: admin/ghippo/workspace/ws-permission.md
- 创建和删除文件夹: admin/ghippo/workspace/folders.md
- 文件夹权限: admin/ghippo/workspace/folder-permission.md
- 资源配额: admin/ghippo/workspace/quota.md
- 资源组与共享资源的区别: admin/ghippo/workspace/res-gp-and-shared-res.md
- 资源绑定权限说明: admin/ghippo/workspace/wsbind-permission.md
- 审计日志:
- 采集 K8s 审计日志: admin/ghippo/audit/open-audit.md
- 生成 K8s 审计日志: admin/ghippo/audit/open-k8s-audit.md
- 下载和导出审计日志: admin/ghippo/audit/audit-log.md
- 获取审计日志源IP: admin/ghippo/audit/source-ip.md
- 审计项汇总:
- 容器管理审计项: admin/ghippo/audit/gproduct-audit/kpanda.md
- 云主机审计项: admin/ghippo/audit/gproduct-audit/virtnest.md
- 可观测性审计项: admin/ghippo/audit/gproduct-audit/insight.md
- 全局管理审计项: admin/ghippo/audit/gproduct-audit/ghippo.md
- 运营管理:
- 运营管理: admin/ghippo/report-billing/index.md
- 报表管理: admin/ghippo/report-billing/report.md
- 计费计量: admin/ghippo/report-billing/billing.md
- 平台设置:
- 安全策略: admin/ghippo/platform-setting/security.md
- 邮箱服务器设置: admin/ghippo/platform-setting/mail-server.md
- 外观定制: admin/ghippo/platform-setting/appearance.md
- 关于平台: admin/ghippo/platform-setting/about.md
- 密码重置: admin/ghippo/password.md
- 个人中心:
- 安全设置: admin/ghippo/personal-center/security-setting.md
- 访问密钥: admin/ghippo/personal-center/accesstoken.md
- SSH 公钥: admin/ghippo/personal-center/ssh-key.md
- 语言设置: admin/ghippo/personal-center/language.md
- 权限说明:
- 容器管理权限说明: admin/ghippo/permissions/kpanda.md
- AI Lab 权限说明: admin/ghippo/permissions/baize.md
- 最佳实践:
- 工作空间最佳实践: admin/ghippo/best-practice/ws-best-practice.md
- 工作空间绑定命名空间: admin/ghippo/best-practice/ws-to-ns.md
- 单集群分配给多个工作空间: admin/ghippo/best-practice/cluster-for-multiws.md
- 文件夹最佳实践: admin/ghippo/best-practice/folder-practice.md
- 普通用户授权规划: admin/ghippo/best-practice/authz-plan.md
- 超大型企业的架构管理: admin/ghippo/best-practice/super-group.md
- GProduct 对接全局管理:
- GProduct 对接全局管理: admin/ghippo/best-practice/gproduct/intro.md
- 接入导航栏: admin/ghippo/best-practice/gproduct/nav.md
- 接入路由和登录认证: admin/ghippo/best-practice/gproduct/route-auth.md
- 集成与被集成(OEM IN/OUT):
- OEM IN: admin/ghippo/best-practice/oem/oem-in.md
- OEM OUT: admin/ghippo/best-practice/oem/oem-out.md
- 定制对接 IdP: admin/ghippo/best-practice/oem/custom-idp.md
- Keycloak 自定义 IdP: admin/ghippo/best-practice/oem/keycloak-idp.md
- 定制导航栏:
- 定制导航栏: admin/ghippo/best-practice/menu/navigator.md
- 基于权限显示/隐藏导航栏菜单: admin/ghippo/best-practice/menu/menu-display-or-hiding.md
- 故障排查:
- ingressgateway 无法启动: admin/ghippo/troubleshooting/ghippo01.md
- 登录报错 401 或 403: admin/ghippo/troubleshooting/ghippo02.md
- keycloak 无法启动: admin/ghippo/troubleshooting/ghippo03.md
- 单独升级全局管理时失败: admin/ghippo/troubleshooting/ghippo04.md
- 开发者手册:
- OpenAPI 访问密钥: openapi/index.md
- 云主机 OpenAPI 文档: openapi/virtnest/index.md
- AI Lab OpenAPI 文档: openapi/baize/index.md
- 容器管理 OpenAPI 文档: openapi/kpanda/index.md
- 可观测性 OpenAPI 文档: openapi/insight/index.md
- 全局管理 OpenAPI 文档: openapi/ghippo/index.md
# i18n
plugins:
i18n:
docs_structure: folder
reconfigure_material: true
reconfigure_search: true
languages:
- locale: zh
name: 中文
default: true
build: true
- locale: en
name: English
build: true
nav_translations:
首页: Home
终端用户手册: User Manual
什么是 AI 算力平台: What is AI Platform
用户注册: User Registration
算力服务: Computing Services
云主机: Cloud Host
创建云主机: Create Cloud Host
使用云主机: Use Cloud Host
容器管理: Container Management
云上 K8s 集群: K8s Cluster on Cloud
接入集群: Integrate Cluster
访问集群: Access Cluster
集群角色: Cluster Roles
集群状态: Cluster Status
卸载/解除接入集群: Delete/Remove Cluster
节点管理: Nodes
节点调度: Node Scheduling
标签与注解: Labels and Annotations
污点管理: Node Taints
节点详情: Node Details
工作负载: Workloads
创建 Deployment: Create Deployment
创建 StatefulSet: Create StatefulSet
创建 DaemonSet: Create DaemonSet
创建 CronJob: Create CronJob
创建 Job: Create Job
工作负载参数配置: Workload Parameters
工作负载状态: Workload Status
Job 参数: Job Parameters
生命周期: Lifecycle
环境变量: Environment Variables
健康检查: Health Check
集群调度: Cluster Scheduling
Helm 应用: Helm Apps
Helm 模板: Helm Charts
上传 Helm 模板: Upload Helm Chart
管理 Helm 应用: Manage Helm Apps
管理 Helm 仓库: Manage Helm Repo
Operator 应用: Operator Apps
容器网络: Container Networking
创建服务: Create Services
创建路由: Create Ingress
网络策略: Network Policies
自定义资源: Custom Resources
容器存储: Container Storage
数据卷声明: PersistentVolumeClaim
数据卷: PersistentVolume
存储池: StorageClass
共享存储池: Shared StorageClass
配置与密钥: ConfigMaps and Secrets
创建配置项: Create ConfigMap
使用配置项: Use ConfigMap
创建密钥: Create Secret
使用密钥: Use Secret
Configmap/Secret热加载: Hot Reloading ConfigMap/Secret
命名空间: Namespaces
创建命名空间: Create Namespace
命名空间独享节点: Nodes Exclusive to a Namespace
容器组安全策略: Pod Security Policies
集群运维: Cluster Operations
最近操作: Recent Operations
集群设置: Cluster Settings
集群巡检: Cluster Inspection
介绍: Introduction
创建巡检配置: Configure Inspection
执行巡检: Run Inspection
查看巡检报告: View Inspection Report
备份恢复: Backup and Recovery
安全管理: Security Management
安全扫描类型: Scanning Types
合规性扫描: Compliance Scanning
扫描配置: Scanning Settings
扫描策略: Scanning Policy
扫描报告: Scanning Report
权限扫描: Permission Scanning
漏洞扫描: Vulnerability Scanning
权限管理: Permissions
权限体系介绍: Introduction
集群和命名空间授权: Authorize Cluster and Namespace
增加容器管理内置权限点: Add Built-in Permission
算法开发: AI Lab
创建 AI 工作负载: Create AI Workload
使用 Notebook: Use Notebook
创建训练任务: Create Training Job
删除任务: Delete Job
查看任务负载: View Job Workload
任务分析: Job Analysis
数据管理: Data Management
数据集列表: Datasets
环境管理: Environments
模型服务: Model Services
模型支持情况: Supported Models
创建 Triton 推理服务: Create Triton Inference
创建 vLLM 推理服务: Create vLLM Inference
管理: Management
可观测性: Insight
仪表盘: Dashboard
概览: Overview
以管理员登录 Grafana: Log in to Grafana as Admin
导入自定义仪表盘: Import Custom Dashboard
基础设施: Infrastructure
集群: Cluster
节点: Node
事件查询: Event Query
拨测: Probe
指标: Metrics
日志: Logs
链路追踪: Trace
服务拓扑: Service Topology
服务: Service
告警: Alerts
告警策略: Alert Policy
告警模板: Alert Template
通知配置: Notification Configuration
配置通知服务器: Configure Notification Server
消息模板: Message Template
告警静默: Alert Silence
告警抑制: Alert Inhibition
采集管理: Collection Management
服务监控: Service Monitoring
采集组件insight-agent状态: insight-agent Status
全局管理: Global Management
工作空间与层级: Workspace and Folder
资源配额: Resource Quota
资源组与共享资源的区别: Resource Group and Shared Resource
资源绑定权限说明: Resource Binding Permissions
个人中心: Personal Center
安全设置: Security Settings
访问密钥: Access Key
SSH 公钥: SSH Public Key
语言设置: Language Settings
管理员手册: Administrator Manual
什么是 AI 算力平台: What is AI Platform
安装: Installation
安装云主机: Install VM
安装依赖及前提条件: Install Dependencies and Prerequisites
离线升级: Offline Upgrade
安装 virtnest-agent: Install virtnest-agent
快速入门: Quick Start
更新云主机: Update VM
连接云主机: Connect VM
通过 NodePort 访问云主机: Access VM via NodePort
云主机详情: VM Details
云主机管理: VM Management
创建密钥: Create Secret
克隆云主机: Clone VM
快照管理: Snapshot Management
定时快照: Scheduled Snapshots
实时迁移: Live Migration
云主机跨集群迁移: Cross-cluster Migration of VM
云主机监控: VM Monitoring
云主机网络: VM Network
云主机存储: VM Storage
云主机漂移: VM Drift
云主机健康检查: VM Health Check
云主机 GPU: VM GPU
云主机 vGPU: VM vGPU
云主机模板: VM Templates
通过模板创建云主机: Create VM with Template
云主机镜像: VM Images
最佳实践: Best Practices
从 VMware 导入传统 Linux 云主机: Import Traditional Linux VM from VMware
从 VMware 导入传统 Windows 云主机: Import Traditional Windows VM from VMware
创建 Windows 云主机: Create Windows VM
集群管理: Manage Clusters
创建集群: Create Cluster
升级集群: Upgrade Cluster
节点可用性检查: Node Availability Check
节点认证: Node Authentication
节点扩容: Node Scale-up
节点缩容: Node Scale-down
添加工作节点: Add Worker Node
移除 GPU 工作节点: Remove GPU Worker Node
弹性伸缩: Autoscaling
安装 metrics-server 插件: Install Metrics Server Plugin
安装 kubernetes-cronhpa-controller: Install Kubernetes CronHPA Controller
安装 VPA 插件: Install VPA Plugin
基于内置指标创建 HPA: Create HPA Based on Built-in Metrics
基于自定义指标创建 HPA: Create HPA Based on Custom Metrics
创建 VPA 策略: Create VPA Policy
HPA和CronHPA兼容规则: HPA and CronHPA Compatibility Rules
其他方案: Other Solutions
Knative 介绍: Introduction to Knative
安装 Knative: Install Knative
Kantive 场景: Knative Scenarios
Knative 实践: Knative Practices
多架构 Helm 应用: Multi-arch Helm Apps
自定义Helm应用导入Addon: Import Custom Helm App Addon
NVIDIA GPU 管理: NVIDIA GPUs
NVIDIA GPU模式说明: Explanation of NVIDIA GPU Modes
GPU Operator: GPU Operator
离线安装GPU Operator: Offline Install GPU Operator
上传Red Hat GPU Operator离线镜像: Upload Red Hat GPU Operator Offline Image
构建Red Hat 8.4离线yum源: Build Red Hat 8.4 Offline Yum Source
构建Red Hat 7.9离线yum源: Build Red Hat 7.9 Offline Yum Source
构建Red Hat 9.2离线yum源: Build Red Hat 9.2 Offline Yum Source
Ubuntu22.04 离线安装GPU驱动: Offline Install GPU Driver for Ubuntu 22.04
NVIDIA GPU整卡模式: NVIDIA Full GPU Mode
NVIDIA vGPU模式: NVIDIA vGPU Mode
安装 vGPU Addon: Install vGPU Addon
使用 NVIDIA vGPU: Use NVIDIA vGPU
构建 vGPU 显存超配镜像: Build vGPU Memory Overprovisioning Image
NVIDIA MIG 模式: NVIDIA MIG Mode
NVIDIA MIG 概述: Overview of NVIDIA MIG
开启 MIG 功能: Enable MIG
使用 NVIDIA MIG: Use NVIDIA MIG
MIG 相关命令: MIG Related Commands
GPU 配额管理: GPU Quota Management
GPU 动态调节: GPU Dynamic Regulation
GPU 监控告警: GPU Alerts
GPU 监控指标: GPU Metrics
GPU 告警规则: GPU Alert Policies
使用 Volcano: Use Volcano
安装 Volcano: Install Volcano
使用Volcano Gang Scheduler: Use Volcano Gang Scheduler
使用 Volcano 优先级抢占策略: Use Volcano Priority Preemption Policy
使用 Volcano Binpack 策略: Use Volcano Binpack Policy
使用 Volcano DRF 策略: Use Volcano DRF Policy
使用 Volcano NUMA 亲和性调度: Scheduling with Volcano NUMA Affinity
GPU 调度配置: GPU Scheduling Configuration
昇腾 NPU 管理: Ascend NPUs
昇腾 NPU 组件安装: Install Ascend NPU Components
使用昇腾 NPU: Use Ascend NPU
启用昇腾 VNPU: Enable Ascend VNPU
天数 GPU 管理: Iluvatar GPUs
沐曦 GPU 管理: MetaX GPUs
寒武纪 GPU 管理: Cambricon GPUs
GPU 相关FAQ: GPU Related FAQ
对 etcd 进行备份还原: Backup and Restore etcd
跨集群备份恢复MySQL: Cross-cluster Backup and Recovery for MySQL
工作集群相关实践: Practices Related to Worker Clusters
加固自建工作集群: Harden Self-built Worker Cluster
离线部署/升级工作集群: Offline Deployment/Upgrade of Worker Cluster
工作集群添加异构节点: Add Heterogeneous Nodes to Worker Cluster
工作集群的控制节点扩容: Expand Control Nodes of Worker Cluster
替换工作集群首个控制节点: Replace First Control Node of Worker Cluster
全局集群工作节点扩容: Global Cluster Worker Node Expansion
CentOS上创建Ubuntu工作集群: Create Ubuntu Worker Cluster on CentOS
CentOS上创建Red Hat工作集群: Create Red Hat Worker Cluster on CentOS
非主流操作系统上创建集群: Create Cluster on Non-mainstream Operating Systems
部署/升级Kubean兼容版本: Deploy/Upgrade Kubean Compatible Version
限制Docker单容器磁盘空间: Limit Disk Space for Single Docker Container
边缘集群部署和管理实践: Edge Cluster Deployment and Management Practices
在离线混部: Online and Offline Colocation
在离线混部概述: Overview
koordinator 安装: Install Koordinator
运维管理: Operations Management
资源管理: Resource Management
队列管理: Queue Management
创建队列: Create Queue
删除队列: Delete Queue
最佳实践: Best Practices
部署 NFS 做数据集预热: Deploy NFS for Dataset Warm-up
更新 Notebook 内置镜像: Update Notebook Built-in Image
Checkpoint 机制及使用介绍: Checkpoint Mechanism and Usage Introduction
使用 AI Lab 微调 ChatGLM3: Fine-tune ChatGLM3 with AI Lab
提交 DeepSpeed 训练任务: Submit DeepSpeed Training Job
训练任务增加调度器选项: Add Scheduler Options to Training Jobs
部署 Label Studio: Deploy Label Studio
故障排查: Troubleshooting
故障排查索引: Troubleshooting Index
集群下拉列表中找不到集群: Cluster Not Found in Dropdown List
Notebook 不受队列配额控制: Notebook Not Controlled by Queue Quota
队列初始化失败: Queue Initialization Failed
开发者手册: Developer Manual
OpenAPI 访问密钥: OpenAPI Access Key
云主机 OpenAPI 文档: VM OpenAPI
AI Lab OpenAPI 文档: AI Lab OpenAPI
容器管理 OpenAPI 文档: Container Management OpenAPI
可观测性 OpenAPI 文档: Insight OpenAPI
全局管理 OpenAPI 文档: Global Management OpenAPI
安装 Velero 插件: Install Velero Plugin
应用备份: Backup Apps
ETCD 备份: etcd Backup
创建 Pytorch 任务: Create PyTorch Job
创建 Tensorflow 任务: Create TensorFlow Job
创建 MPI 任务: Create MPI Job
创建 MXNet 任务: Create MXNet Job
创建 PaddlePaddle 任务: Create PaddlePaddle Job
链路查询: Tracing
集群内冷迁移: Cold Migration in Cluster
如何选择运行时: Choose a Runtime
集群版本支持范围: Supported K8s Versions
接入 Rancher 集群: Integrate Rancher Cluster
集群中部署第二调度器: Deploy 2nd Scheduler in Cluster
集群证书更新: Update Cluster Certificate
集群动态资源超卖: Dynamic Resource Overprovision
集成 Falco 安全工具: Integrate Falco
GPU 管理: Manage GPUs
GPU 支持矩阵: GPU Support Matrix
部署资源规划: Resource Planning
Prometheus 资源规划: Prometheus Resource Planning
vmstorage 磁盘规划: vmstorage Disk Planning
调整 vmstorage 磁盘: Adjust vmstorage Disk
安装与升级: Install and Upgrade
在线安装 insight-agent: Online Install insight-agent
获取全局服务集群的数据存储地址: Get Storage Address in Global Service Cluster
升级注意事项: Upgrade Notes
已知问题: Known Issues
大规模日志部署调整: Adjust Large-Scale Log Deployment
定制Insight组件调度策略: Customize Insight Scheduling Policy
开启大日志和大链路模式: Enable Large Log and Large Trace Mode
在OpenShift 4.x上安装insight-agent: Install insight-agent on OpenShift 4.x
开始观测: Start Observing
OpenTelemetry 观测: OpenTelemetry Observation
使用OTel赋予应用可观测: Use OTel to Enhance Observability
Operator无侵入增强应用: Non-Intrusive Enhancement with Operator
向Insight发送链路数据: Send Traces to Insight
链路数据尾部采样方案: Tail Trace Sampling Plan
Java 应用观测: Java App Observation
通过 OTel SDK 增强 Go 应用: Enhance Go App via OTel SDK
使用OTel接收SkyWalking链路数据: Use OTel to Receive SkyWalking Traces
调用链: Call Chain
数据采集: Data Collection
系统配置: System Settings
系统组件: System Components
修改配置: Modify Settings
网络监控之集成 DeepFlow: Network Monitoring with DeepFlow
参考文档: Reference
可观性参考指标说明: Insight Reference Metrics
Insight Grafana 持久化到数据库: Insight Grafana Persistence to Database
自定义探测方式: Custom Probe Method
指标抓取说明: Collect Metrics
告警通知流程说明: Alert Notification Process
通知模板使用说明: Use Notification Template
Lucene 语法使用方法: Lucene Syntax
通过 Sidecar 采集容器日志: Collect Container Logs via Sidecar
兼容性测试: Compatibility Test
Kubernetes 兼容性测试: Kubernetes Compatibility Test
Rancher 兼容性测试: Rancher Compatibility Test
Openshift 兼容性测试: OpenShift Compatibility Test
常见问题: FAQs
链路数据的时钟偏移: Clock Offset of Traces
日志采集排障指南: Log Collection Troubleshooting
链路采集排障指南: Trace Collection Troubleshooting
使用Insight定位异常: Use Insight to Locate Errors
ES 数据塞满时如何操作: What to Do When ES Storage Is Full
容器日志黑名单: Container Log Blacklist
安装、登录和升级: Install, Login, and Upgrade
离线升级全局管理: Offline Upgrade Global Management
自定义反向代理服务器地址: Custom Reverse Proxy Address
开启Folder/WS的隔离模式: Enable Isolation Mode for Folder/WS
国密网关: National Cryptography Gateway
登录: Login
用户与访问控制: Access Control
什么是用户与访问控制: What is Access Control
用户: User
用户组: Group
角色: Role
系统角色: System Role
自定义角色: Custom Role
身份提供商: Identity Provider
OAuth 2.0 之企业微信: OAuth 2.0 for WeWork
接入管理: Docking Portal
工作空间与层级: Workspace and Folder
绑定工作空间: Bind Workspace
为工作空间分配资源: Allocate Resources to Workspace
创建和删除工作空间: Create and Delete Workspace
工作空间权限: Workspace Permissions
创建和删除文件夹: Create and Delete Folder
文件夹权限: Folder Permissions
资源配额: Resource Quotas
资源组与共享资源的区别: Resource Groups and Shared Resources
资源绑定权限说明: Permissions to Bind Resource
审计日志: Audit Logs
采集 K8s 审计日志: Collect K8s Audit Logs
生成 K8s 审计日志: Generate K8s Audit Logs
下载和导出审计日志: Download and Export Audit Logs
获取审计日志源IP: Get Source IP of Audit Logs
审计项汇总: Summary of Audit Items
容器管理审计项: Container Management Audit Items
云主机审计项: VM Audit Items
可观测性审计项: Insight Audit Items
全球管理审计项: Global Management Audit Items
运营管理: Operations Management
报表管理: Reports
计费计量: Billing and Accounting
平台设置: Platform Settings
安全策略: Security Policy
邮箱服务器设置: Email Settings
外观定制: Appearance
关于平台: About
密码重置: Reset Password
工作空间最佳实践: Workspace Best Practices
工作空间绑定命名空间: Bind Namespace to Workspace
单集群分配给多个工作空间: Allocate a Single Cluster to Multiple Workspaces
文件夹最佳实践: Folder Best Practices
普通用户授权规划: Regular User Authorization
超大型企业的架构管理: Architecture for Large Enterprises
GProduct 对接全局管理: GProduct Integration with Global Management
接入导航栏: Docking Navigation Bar
接入路由和登录认证: Docking Ingress and Login Authentication
集成与被集成(OEM IN/OUT): OEM IN/OUT
定制对接 IdP: Custom Integration with IdP
Keycloak 自定义 IdP: Keycloak Custom IdP
定制导航栏: Custom Navigation Bar
基于权限显示/隐藏导航栏菜单: Show/Hide Navigation Menu
故障排查: Troubleshooting
ingressgateway 无法启动: ingressgateway Fails to Start
登录报错 401 或 403: Login Error 401 or 403
keycloak 无法启动: Keycloak Fails to Start
单独升级全局管理时失败: Failure When Upgrading Global Management
权限说明: Permissions
容器管理权限说明: Container Management Permissions
AI Lab 权限说明: AI Lab Permissions