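PaddleFlow-cli is a command-line tool built on top of the PaddleFlow python-sdk. It supports user management (user), storage management (fs), and the other modules listed in the help output below.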
- python_requires >= 3.6.1
- click==7.1.2
- tabulate==0.8.9
- setuptools >= 21.0.0
$ paddleflow --help
Usage: paddleflow [OPTIONS] COMMAND [ARGS]...
paddleflow is the command line interface to paddleflow service.
provide `user`, `queue`, `fs`, `run`, `pipeline`, `cluster`, `flavour`
operation commands
--pf_config TEXT the path of default config.
--output [table|json|text] The formatting style for command output.
[default: table]
--help Show this message and exit.
cluster manage cluster resources
flavour manage flavour resources
fs manage fs resources
job manage job resources
log manage log resources
pipeline manage pipeline resources
queue manage queue resources
run manage run resources
schedule manage schedule resources
statistics show resources statistics
user manage user resources
The CLI uses a multi-level command structure. Every command starts with paddleflow and takes the form:
paddleflow [options] [<submodule> <verb> <parameters> [options]]
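As a small illustration of this command shape, the sketch below assembles an argument list in Python. The helper name build_command is hypothetical and not part of PaddleFlow; the --output option and the queue list subcommand are taken from the help text in this document.

```python
import shlex
from typing import List, Sequence


def build_command(submodule: str, verb: str,
                  parameters: Sequence[str] = (),
                  global_options: Sequence[str] = ()) -> List[str]:
    """Assemble: paddleflow [options] <submodule> <verb> <parameters>."""
    return ["paddleflow", *global_options, submodule, verb, *parameters]


argv = build_command("queue", "list", global_options=["--output", "json"])
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in argv))
```

A list built this way can be passed directly to subprocess.run, which avoids shell quoting problems.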
Installing the PaddleFlow Python SDK also installs the PaddleFlow CLI. After installation, you only need to fill in the configuration file before you can start using the CLI.
name = <account name>
password = <account password>
# PaddleFlow server host address
paddleflow_server_host =
# PaddleFlow server port
paddleflow_server_port = 8999
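A minimal sketch of generating such a file with Python's configparser follows. The section names user and server are an assumption (section headers are not shown in this document), and the host value is a placeholder; check both against your installation before relying on them.

```python
import configparser
import io
from typing import Optional


def render_config(name: str, password: str, host: str,
                  port: Optional[int] = None) -> str:
    """Render the configuration as INI text.

    Assumption: the keys live under [user] and [server] sections.
    """
    parser = configparser.ConfigParser()
    parser["user"] = {"name": name, "password": password}
    server = {"paddleflow_server_host": host}
    if port is not None:
        # Only written when the server does not use its default port.
        server["paddleflow_server_port"] = str(port)
    parser["server"] = server
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Placeholder credentials and host, for illustration only.
print(render_config("alice", "secret", "127.0.0.1"))
```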
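The paddleflow_server_port field is optional: if the port of the PaddleFlow server has not been changed, it can be left out.
For user management, the PaddleFlow CLI provides four operations: add, delete, list, and set. Examples of each: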
paddleflow user add name password // add a user; root account only
paddleflow user delete name // delete a user; root account only
paddleflow user set name password // update a user's password
paddleflow user list // list users; root account only
Add a user: run paddleflow user add test pass****
user[test] add success
Delete a user: as root, run paddleflow user delete ****
user[****] delete success
Update a user's password: run paddleflow user set test pass***
user[test] update success
List users: as root, run paddleflow user list
| name | create time |
| aaa | 2021-09-01T22:51:33+08:00 |
| aaa1 | 2021-09-01T23:11:06+08:00 |
| aaa3 | 2021-09-01T23:11:25+08:00 |
For queue management, the PaddleFlow CLI provides eight operations: create, delete, list, show, update, grant, ungrant, and grantlist. Examples of each:
$ paddleflow queue --help
Usage: paddleflow queue [OPTIONS] COMMAND [ARGS]...
manage queue resources
--help Show this message and exit.
create create queue.
delete delete queue.
grant add grant.
grantlist list grant
list list queue.
show show queue info.
ungrant delete grant.
update update queue.
List queues: run paddleflow queue list
| name | namespace | status | cluster name | create time | update time |
| default-queue | default | open | default-cluster | 2022-08-03 20:18:31 | 2022-08-18 15:45:57 |
| default | default | open | default-cluster | 2022-08-03 20:18:31 | 2022-08-03 20:18:31 |
| pf-queue | default | open | default-cluster | 2022-08-03 20:18:31 | 2022-08-03 20:18:31 |
| root | default | open | default-cluster | 2022-08-03 20:18:31 | 2022-08-03 20:18:31 |
marker: None
Queue details: run paddleflow queue show default-queue
| name | namespace | status | quota type | cluster name | create time | update time |
| default-queue | default | open | volcanoCapabilityQuota | default-cluster | 2022-08-03 20:18:31 | 2022-08-18 15:45:57 |
queue info:
{
  "max resources": {
    "cpu": "20",
    "mem": "20Gi"
  },
  "min resources": {
    "cpu": "0",
    "mem": "0"
  },
  "used resources": {
    "cpu": "0",
    "mem": "0"
  },
  "idle resources": {
    "cpu": "20",
    "mem": "20Gi"
  }
}
Create a queue: run paddleflow queue create queuename default 10 20Gi --clustername default-cluster
queue[queuename] create success
Update a queue: run paddleflow queue update queuename --maxcpu 20
queue[queuename] update success
Delete a queue: run paddleflow queue delete queuename
queue[queuename] delete success
Grant queue access: as root, run paddleflow queue grant username queuename
queue[queuename] add username[username] success
Revoke queue access: as root, run paddleflow queue ungrant username queuename
queue[queuename] delete username[username] success
List queue grants: as root, run paddleflow queue grantlist
| user name | queue name |
| aaa3 | aa |
| binbin | aa |
| binbin | aa1 |
For flavour management, the PaddleFlow CLI provides five operations: create, delete, list, show, and update. Examples:
$ paddleflow flavour --help
Usage: paddleflow flavour [OPTIONS] COMMAND [ARGS]...
manage flavour resources
--help Show this message and exit.
create create flavour.
delete delete flavour.
list list flavour.
show show flavour info.
update update info from flavourname.
List flavours: run paddleflow flavour list
| name | cpu | mem | scalarResources |
| flavour1 | 1 | 1Gi | null |
| flavour2 | 4 | 8Gi | {"": "1"} |
| flavour3 | 4 | 8Gi | {"": "2"} |
Flavour details: run paddleflow flavour show flavour1
| name | cpu | mem | scalarResources |
| flavour1 | 4 | 8G | {"": "8"} |
Create a flavour: run paddleflow flavour create flavour_gpu -c 4 -m 8G -s
flavour[flavour_gpu] create success
Update a flavour: run paddleflow flavour update flavour_gpu -m 8
update [flavour_gpu] success
Delete a flavour: run paddleflow flavour delete flavour_gpu
flavour[flavour_gpu] delete success
For storage management (fs), the PaddleFlow CLI provides nine operations: create, show, delete, list, mount, link, unlink, listlink, and showlink. Examples of each:
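paddleflow fs list -u username // list file systems; -u selects a specific user's file systems
paddleflow fs show fsname -u username // show details of a file system; -u selects a specific user's file systems
paddleflow fs delete fsname -u username // delete a file system; -u selects a specific user's file systems
paddleflow fs create fsname url -o ak=xxx -o sk=xxx // create a file system
paddleflow fs mount fsname /home/mountpath -u username // mount a file system to a local path
paddleflow fs link fsname fspath url -o ak=xxx -o sk=xxx -u username // create a link; -u selects a specific user's file systems
paddleflow fs unlink fsname fspath -u username // delete a link under a specific file system of a specific user; -u selects a specific user's file systems
paddleflow fs listlink fsname -u username // list the links under a file system; -u selects a specific user's file systems
paddleflow fs showlink fsname fspath -u username // show details of a link; -u selects a specific user's file systems
List storage: run paddleflow fs list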
| name | owner | type | server address | sub path | properties |
| sftp1236 | root | sftp | localhost:8001 | /data2 | {'password': 'xxx', 'user': 'xxx'} |
| elsiefs1 | root | hdfs |, | /elsiefs | {'dfs.namenode.address': ',', 'group': 'test', 'user': 'test'}
Storage details: run paddleflow fs show {fs_name}
| name | owner | type | server address | sub path | properties |
| s3-1 | root | s3 | | /myfs1 | {'accessKey': 'xxx', 'bucket': 'test', 'endpoint': '', 'region': 'bj', 'secretKey': 'xxx'} |
Create storage: run paddleflow fs create {fs_name} {fs_url} -o {}
For details on the available options, run paddleflow fs create --help
fs[{fs_name}] create success
Delete storage: run paddleflow fs delete {fs_name}
fs[{fs_name}] delete success
Mount storage: run paddleflow fs mount {fs_name} {mountpath}
mount success
List the links under a file system: run paddleflow fs listlink {fsname}
| name | owner | type | fs path | server address | sub path | properties |
| local12 | root | local | /linktest1 | | /home/yoursubpath/1| {'debug': 'true'} |
| local12 | root | local | /linktest2 | | /home/yoursubpath/1| {'debug': 'true'} |
Link details: run paddleflow fs showlink {fsname} {fspath}
| name | owner | type | fs path | server address | sub path | properties |
| local12 | root | local | /linktest1 | | /home/yoursubpath/7| {'debug': 'true'} |
Create a link: run paddleflow fs link {fs_name} {fspath} {fs_url} -o {}
For details on the available options, run paddleflow fs link --help
fs[{fs_name}] create link success
Delete a link: run paddleflow fs unlink {fs_name} {fspath}
fs[{fs_name}] delete link success
For run management, the PaddleFlow CLI provides ten operations: create, list, status, stop, retry, delete, listcache, showcache, delcache, and artifact. Examples of each:
paddleflow run create -f(--fsname) fs_name -n(--name) run_name -d(--desc) xxx -u(--username) username -p(--param) data_file=xxx -p regularization=*** -yp(--runyamlpath) ./run.yaml -pplid(--pipelineid) ppl-000666 -pplver(--pplversionid) 1 -yr(--runyamlraw) xxx --disabled some_step_names -de(--dockerenv) docker_env
// create a pipeline run; -yp, -pplid, and -yr are the three ways to launch a run, and only one of them may be used at a time
paddleflow run list -f(--fsname) fsname -u(--username) username -r(--runid) runid -n(--name) name -s(--status) running -m(--maxsize) 10 -mk(--marker) xxx
// list pipeline runs (filter by fsname for a specific fs; by username for a specific user, root only; by runid for a specific run; by name; by status)
paddleflow run show runid // show the details of a pipeline run, including its list of jobs
paddleflow run stop runid -f(--force) // stop a pipeline run
paddleflow run retry runid // rerun a pipeline run
paddleflow run delete runid -not-cc(-notcheckcache) // delete a workflow run
paddleflow run listcache -u(--userfilter) username -f(--fsfilter) fsname -r(--runfilter) run-000666 -m(--maxsize) 10 -mk(--marker) xxx // list all workflow caches
paddleflow run showcache cacheid // show the details of a workflow cache
paddleflow run deletecache cacheid // delete the specified workflow cache
paddleflow run listartifact -u(--userfilter) username -f(--fsfilter) fsname -r(--runfilter) run-000666 -t(--typefilter) type -p(--pathfilter) path -m(--maxsize) 10 -mk(--marker) xxx // list all workflow artifacts
Create a workflow: run paddleflow run create -f {fs_name} -n {run_name} -d {main} -yp {yaml_path}
run[{run_name}] create success with runid[{runid}]
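Parameter | Required | Description |
-f --fsname | optional | storage name |
-n --name | optional | run name |
-d --desc | optional | run description |
-u --username | optional | username; may only be set when logged in as root |
-p --param | optional | used for parameter substitution |
--disabled | optional | specifies the steps that should not be run |
-de --dockerenv | optional | specifies the global DockerEnv; either an image URL or the path of an image tar package on the fs |
-yp --runyamlpath | optional | one of the launch methods: path of the YAML file on the fs |
-yr --runyamlraw | optional | one of the launch methods: base64-encoded content of the YAML file |
-pplid --pipelineid | optional | one of the launch methods: ID of a workflow template; see the workflow template section later in this document for how to create one |
Creating a workflow requires at least one parameter, and it must be one of -yp/-yr/-pplid, which selects how the workflow is created. When -yp is used, -f must also be specified. All other parameters are optional.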
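The rule above (exactly one of -yp/-yr/-pplid, with -f required alongside -yp) can be sketched as a small pre-flight check. The function check_run_create is hypothetical and only mirrors the rule as stated in this document; the CLI performs its own validation.

```python
from typing import Optional


def check_run_create(fsname: Optional[str] = None,
                     runyamlpath: Optional[str] = None,
                     runyamlraw: Optional[str] = None,
                     pipelineid: Optional[str] = None) -> str:
    """Return the launch flag in use, or raise ValueError if the rule is broken."""
    candidates = (("-yp", runyamlpath), ("-yr", runyamlraw), ("-pplid", pipelineid))
    chosen = [flag for flag, value in candidates if value]
    if len(chosen) != 1:
        raise ValueError("exactly one of -yp, -yr, -pplid must be given")
    if chosen[0] == "-yp" and not fsname:
        raise ValueError("-yp also requires -f")
    return chosen[0]


print(check_run_create(fsname="testfs", runyamlpath="./run.yaml"))
```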
- runyamlpath:
paddleflow run create -f testfs -yp ./run.yaml
In the command above, both -f and -yp are required.
name: base_pipeline
command: bash base_pipeline/shells/ {{data_path}}
docker_env: kfp_mysql:1.7.0
PF_JOB_FLAVOUR: flavour1
PF_JOB_QUEUE_NAME: ppl-queue
PF_JOB_TYPE: vcjob
data_path: ./base_pipeline/data/{{PF_RUN_ID}}
command: bash base_pipeline/shells/ {{epoch}} {{train_data}} {{model_path}}
deps: preprocess
PF_JOB_FLAVOUR: flavour1
PF_JOB_QUEUE_NAME: ppl-queue
PF_JOB_TYPE: vcjob
epoch: 5
model_path: ./output/{{PF_RUN_ID}}
train_data: '{{preprocess.data_path}}'
command: bash base_pipeline/shells/ {{model_path}}
deps: train
PF_JOB_FLAVOUR: flavour1
PF_JOB_QUEUE_NAME: ppl-queue
PF_JOB_TYPE: vcjob
model_path: '{{train.model_path}}'
parallelism: 1
docker_env: nginx:1.7.9
- runyamlraw:
paddleflow run create -yr {{base64yaml}}
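Here {{base64yaml}} is the base64-encoded content of the YAML file.
Note that with this method, or with the pipelineid method described next, -f is no longer required, but features that depend on fs, such as Artifact, are unavailable.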
- pipelineid:
paddleflow run create -pplid ppl-000666 -pplver 1
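For the runyamlraw method, the value passed to -yr is the base64 form of the YAML text. A minimal sketch of producing it in Python is shown here; the helper name encode_run_yaml and the sample YAML are illustrative, and the standard base64 alphabet is assumed.

```python
import base64


def encode_run_yaml(yaml_text: str) -> str:
    """Base64-encode YAML text (standard alphabet, no line breaks)."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


sample_yaml = "name: demo\n"  # illustrative content, not a complete pipeline
encoded = encode_run_yaml(sample_yaml)
print(encoded)
```

The resulting string can then be supplied as the argument of paddleflow run create -yr.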
List workflows: run paddleflow run list to display all workflow runs. The marker is the starting position of the next page and is meant to be passed to the -mk/--marker option.
| run id | fs name | username | status | name | description | run msg | source | schedule id | scheduled time | create time | activate time | update time |
| run-000001 | fs | root | succeeded | myproject | | begin to running, update status to running | runDag.yaml | schedule-000003 | 2022-08-15 16:47:00 | 2022-08-04 11:45:25 | 2022-08-04 11:45:25 | 2022-08-04 11:48:48 |
| run-000002 | fs | root | succeeded | myproject | | begin to running, update status to running | runDag.yaml | | | 2022-08-04 11:45:29 | 2022-08-04 11:45:29 | 2022-08-04 11:48:59 |
| run-000003 | | root | succeeded | test | | begin to running, update status to running | 0daaa6deade1ff7f10531da958e0c298 | | | 2022-08-04 11:45:52 | 2022-08-04 11:45:53 | 2022-08-04 11:46:11 |
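The marker-based paging used by the list commands can be sketched generically. Here fetch_page is a hypothetical callable standing in for whatever retrieves one page (for example, a wrapper around paddleflow run list -mk <marker>); it is not a PaddleFlow API. The loop assumes the last page reports no marker, as in the "marker: None" lines shown in this document.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[Dict[str, str]], Optional[str]]


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, str]]:
    """Yield every item, following markers until a page returns no marker."""
    marker: Optional[str] = None
    while True:
        items, marker = fetch_page(marker)
        yield from items
        if not marker:
            break


# In-memory stand-in for a paged listing, keyed by the marker that requests it.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: ([{"run id": "run-000001"}, {"run id": "run-000002"}], "m1"),
    "m1": ([{"run id": "run-000003"}], None),
}


def fake_fetch(marker: Optional[str]) -> Page:
    return FAKE_PAGES[marker]


run_ids = [item["run id"] for item in iterate_all(fake_fetch)]
print(run_ids)
```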
Workflow details: run paddleflow run show run-cb96cf93 (the run id)
| run id | status | name | desc | param | source | run msg | create time | update time | activate time |
| run-000060 | running | dagProject | | | runDag.yaml | begin to running, update status to running | 2022-08-18 19:11:33 | 2022-08-18 19:12:51 | 2022-08-18 19:12:51 |
| fs name | username | docker env | schedule id | fs options(json) | failure options(json) | disabled | run cached ids |
| cyang14 | root | docekrenv | | {'mainFS': {'id': '', 'name': 'abcdefg', 'mountPath': '', 'subPath': '', 'readOnly': False}} | {'strategy': 'fail_fast'} | | |
| run yaml detail |
| name: dagProject |
| ...... |
| runtime in json |
| { |
| "disDag": [ |
| { |
| "dagID": "dag-run-000059-8c6fb65ec1d2d3d1", |
| "name": "dag-run-000059-disDag", |
| Omitted here ...... |
| postProcess in json |
| { |
| "last": { |
| "artifacts": { |
| "input": {}, |
| "output": {} |
| }, |
| Omitted here ...... |
Stop a run: run paddleflow run stop {runid}
run[runid] stop success
Retry a run: run paddleflow run retry {runid}
run[runid] retry success, new run id is [run-newid]
Delete a run: run paddleflow run delete {runid}
run[runid] delete success
List run caches: run paddleflow run listcache
| cache id | run id | jobid | fsname | username | expired time | create time | update time |
| cch-000001 | run-000064 | job-xxxxxx | mxy | root | 400 | 2021-12-10 09:56:42 | 2021-12-10 09:56:42 |
| cch-000002 | run-000064 | job-xxxxxx | mxy | root | 400 | 2021-12-10 09:57:00 | 2021-12-10 09:57:00 |
| cch-000003 | run-000064 | job-xxxxxx | mxy | root | 400 | 2021-12-10 09:57:12 | 2021-12-10 09:57:12 |
| cch-000005 | run-000076 | job-xxxxxx | zzc | root | 400 | 2021-12-10 11:51:04 | 2021-12-10 11:51:04 |
marker: 075edd00394b91e6af9d3e02acecc722
Run cache details: run paddleflow run showcache {cacheid}
| cache id | run id | step | fsname | username | expired time | strategy | custom | create time | update time |
| cch-000001 | run-000064 | preprocess | mxy | root | 400 | conservative | | 2021-12-10 09:56:42 | 2021-12-10 09:56:42 |
| first fp | second fp |
| xxx | xxx |
Delete a run cache: run paddleflow run deletecache {cacheid}
runcache[cacheid] delete success
List workflow artifacts: run paddleflow run listartifact
| run id | fsname | username | artifact path | type | step | artifact name | meta | create time | update time |
| run-000043 | xiangy2 | root | ./data/run-000043 | preprocess | input | | | 2021-12-10 07:55:28 | 2021-12-10 07:55:28 |
| run-000043 | xiangy2 | root | ./data/run-000043 | preprocess | input | | | 2021-12-10 07:55:28 | 2021-12-10 07:55:28 |
| run-000265 | xiangy2 | root | ./data/ | preprocess | input | | | 2021-12-10 08:08:08 | 2021-12-13 11:28:09 |
| run-000047 | xiangy2 | root | ./data/run-000047 | preprocess | input | | | 2021-12-10 08:11:52 | 2021-12-10 08:18:14 |
marker: f990bc858cbd2a8d5eae9243970a2d8c
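For workflow template management (pipeline), the PaddleFlow CLI provides seven operations: create, show, list, delete, update, showversion, and deleteversion. Examples of each: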
paddleflow pipeline create fsname:required -yp(--yamlpath) path -n(--name) pipeline_name -u(--username) username // create a pipeline template (optionally set the template name and the user who owns it)
paddleflow pipeline list -u(--userfilter) user -n(--namefilter) pipeline_name -m(--maxkeys) int -mk(--marker) xxx // list pipeline templates (filter by username for a specific user, root only; by fsname for a specific fs; by pipeline name; limit the number returned; start listing from a marker)
paddleflow pipeline show pipelineid // show the details of a pipeline template, including its YAML
paddleflow pipeline delete pipelineid // delete a pipeline template
paddleflow pipeline update pipelineid fsname yamlpath -n(--name) pipeline_name -u(--username) username // update a pipeline template (creates a new template version)
paddleflow pipeline showversion pipelineid pipelineversionid // show a pipeline template version
paddleflow pipeline deleteversion pipelineid pipelineversionid // delete a pipeline template version
Create a workflow template: run paddleflow pipeline create fsname -yp yamlpath
pipeline[dagProject] create success, id[ppl-000009], versionID[1]
List workflow templates: run paddleflow pipeline list
| pipeline id | name | username | desc | create time | update time |
| ppl-000004 | myproject | root | | 2022-08-11 12:03:39 | 2022-08-11 15:01:54 |
| ppl-000006 | dagProject_test | root | | 2022-08-17 15:31:29 | 2022-08-17 15:31:29 |
| ppl-000008 | dagProject | root | | 2022-08-18 19:02:42 | 2022-08-18 19:04:23 |
marker: None
Workflow template details: run paddleflow pipeline show {pipelineid}
| pipeline id | name | username | pipeline desc | create time | update time |
| ppl-000009 | dagProject | root | | 2022-08-19 16:04:38 | 2022-08-19 16:04:38 |
| ppl ver id | fs name | yaml path | username | create time | update time |
| 1 | abcdefg | runDag.yaml | root | 2022-08-19 16:04:38 | 2022-08-19 16:04:38 |
Delete a workflow template: run paddleflow pipeline delete pipelineid
pipelineid[pipelineid] delete success
Update a workflow template: run paddleflow pipeline update ppl-000001 fsName yamlPath
pipeline[ppl-000009] update success, new version id [2]
Workflow template version details: run paddleflow pipeline showversion ppl-000001 1
| pipeline id | name | username | pipeline desc | create time | update time |
| ppl-000001 | dagProject | root | | 2022-08-19 16:04:38 | 2022-08-19 16:04:38 |
| ppl ver id | fs name | yaml path | username | create time | update time |
| 1 | fs_name | runDag.yaml | root | 2022-08-19 16:04:38 | 2022-08-19 16:04:38 |
| ppl ver id | pipeline yaml |
| 1 | name: dagProject |
| | |
| | fs_options: |
| | main_fs: {name: lalala} |
| | omitted here ...... |
Delete a workflow template version: run paddleflow pipeline deleteversion ppl-000001 1
pipeline version [ppl-000001] of pipeline [1] delete success
paddleflow schedule create name pplid pplverid crontab -d(--desc) xxx -s(--starttime) xxx -e(--endtime) xxx -c(--concurrency) xxx -cp(--concurrencypolicy) xxx -ei(--expireinterval) xxx (--catchup) xxx -u(--username) xxx
// create a schedule
paddleflow schedule list -u(--userfilter) user -n(--namefilter) name -p(--pplfilter) xxx -pv(--pplverfilter) xx -s(--statusfilter) xx -m(--maxkeys) 50 -mk(--marker) xxx
// list schedules
paddleflow schedule show scheduleid -r(--runfilter) xxx -s(--statusfilter) xxx -m(--maxkeys) 50 -mk(--marker) xxx
// show the details of a schedule, together with the runs it has already launched
paddleflow schedule stop scheduleid
// stop a schedule
paddleflow schedule delete scheduleid
// delete a schedule
Create a schedule: run paddleflow schedule create NAME ppl-000001 1 '*/10 * * * *'
Note that the crontab expression must be wrapped in quotes, for example '*/10 * * * *'
schedule [NAME] create success with schedule id[schedule-000001]
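The reason for quoting the crontab expression can be illustrated with Python's shlex, which splits a line using POSIX shell quoting rules. This is only a sketch of the tokenization; a real shell would additionally try to expand each unquoted * as a filename pattern.

```python
import shlex

quoted = "paddleflow schedule create NAME ppl-000001 1 '*/10 * * * *'"
unquoted = "paddleflow schedule create NAME ppl-000001 1 */10 * * * *"

quoted_args = shlex.split(quoted)
unquoted_args = shlex.split(unquoted)

# Quoted: the crontab expression arrives as one argument.
print(len(quoted_args), repr(quoted_args[-1]))
# Unquoted: the five crontab fields become five separate arguments.
print(len(unquoted_args), unquoted_args[-5:])
```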
Stop a schedule: run paddleflow schedule stop schedule-000001
schedule with id [schedule-000001] stop success
Schedule details: run paddleflow schedule show schedule-000001
| schedule id | name | desc | pipeline id | pipeline version id | crontab | username | fs config | options | message | status |
| schedule-000006 | test3 | | ppl-000005 | 1 | */5 * * * * | root | {'username': ''} | {'catchup': False, 'expireInterval': 0, 'concurrency': 0, 'concurrencyPolicy': 'suspend'} | | terminated |
| schedule id | start time | end time | create time | update time | next run time |
| schedule-000006 | | | 2022-08-18 16:36:07 | 2022-08-18 17:16:29 | 2022-08-18 17:20:00 |
| run id | fs name | username | status | name | description | run msg | source | schedule id | scheduled time | create time | activate time | update time |
| run-000048 | fs_name | root | succeeded | test3 | | begin to running, update status to running | ppl-000005-1 | schedule-000006 | 2022-08-18 16:40:00 | 2022-08-18 16:40:00 | 2022-08-18 16:40:00 | 2022-08-18 16:42:13 |
| run-000049 | fs_name | root | succeeded | test3 | | begin to running, update status to running | ppl-000005-1 | schedule-000006 | 2022-08-18 16:45:00 | 2022-08-18 16:45:00 | 2022-08-18 16:45:00 | 2022-08-18 16:47:14 |
| run-000050 | fs_name | root | succeeded | test3 | | begin to running, update status to running | ppl-000005-1 | schedule-000006 | 2022-08-18 16:50:00 | 2022-08-18 16:50:00 | 2022-08-18 16:50:00 | 2022-08-18 16:52:19 |
List schedules: run paddleflow schedule list
| schedule id | name | desc | pipeline id | pipeline version id | crontab | username | fs config | options | message | status |
| schedule-000002 | test | | ppl-000003 | 2 | * */10 * * * | root | {'username': ''} | {'catchup': True, 'expireInterval': 0, 'concurrency': 0, 'concurrencyPolicy': 'suspend'} | | terminated |
| schedule-000005 | test2 | | ppl-000005 | 1 | */5 * * * * | root | {'username': ''} | {'catchup': False, 'expireInterval': 0, 'concurrency': 0, 'concurrencyPolicy': 'suspend'} | | terminated |
| schedule id | start time | end time | create time | update time | next run time |
| schedule-000002 | | | 2022-08-10 20:14:41 | 2022-08-11 11:14:10 | 2022-08-11 20:00:00 |
| schedule-000005 | | | 2022-08-18 16:30:00 | 2022-08-18 16:33:47 | 2022-08-18 16:35:00 |
Delete a schedule: run paddleflow schedule delete schedule-000001
schedule with id [schedule-000001] delete success
For cluster management, the PaddleFlow CLI provides six operations: create, show, list, delete, update, and resource. Examples of each:
$ paddleflow cluster --help
Usage: paddleflow cluster [OPTIONS] COMMAND [ARGS]...
manage cluster resources
--help Show this message and exit.
create create cluster.
delete delete cluster.
list list cluster.
resource Get the remaining resource information of the cluster.
show show cluster info.
update update info from clustername.
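paddleflow cluster list -cn(--clustername) cluster_name -cs(--clusterstatus) cluster_status -m(--maxkeys) int -mk(--marker) xxx // list clusters (filter by cluster_name; filter by cluster_status; limit the number returned; start listing from a marker)
paddleflow cluster show clustername // show the details of a cluster, including its credential information
paddleflow cluster delete clustername // delete a cluster
paddleflow cluster create clustername:required (cluster name) endpoint:required (endpoint) clustertype:required (cluster type) -c(--credential) absolute path of the credential file -id(--clusterid) clusterid -d(--description) description --source Source --setting setting --status status -ns(--namespacelist) namespacelist // create a cluster (custom cluster name; cluster endpoint; cluster type; credential used for cluster authentication, given as a local file path; custom cluster id; cluster description; cluster source [AWS, CCE, etc]; cluster settings; cluster status; namespace list such as ['NS1','NS2'], passing the content inside the brackets)
paddleflow cluster update clustername:required (cluster name) -e(--endpoint) endpoint -t(--clustertype) cluster type -c(--credential) absolute path of the credential file -id(--clusterid) clusterid -d(--description) description --source Source --setting setting --status status -ns(--namespacelist) namespacelist // update a cluster (name of the cluster to update; cluster endpoint; cluster type; credential used for cluster authentication, given as a local file path; custom cluster id; cluster description; cluster source [AWS, CCE, etc]; cluster settings; cluster status; namespace list such as ['NS1','NS2'], passing the content inside the brackets)
paddleflow cluster resource -cn(--clustername) cluster_name // list the remaining resources of all clusters (or only of the specified cluster)
Create a cluster: run paddleflow cluster create clustername endpoint clustertype
cluster[cluster name] create success, id[cluster id]
List clusters: run paddleflow cluster list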
| cluster id | cluster name | description | cluster type | status | create time | update time |
| generated by PF | incididunt | culpa ipsum pariatur | Duis | online | 2021-12-14T16:23:31+08:00 | 2021-12-17T12:59:55+08:00 |
| cluster-81cbff32 | test_zzc | culpa ipsum pariatur | kubernetes-v1.16 | online | 2021-12-14T16:32:03+08:00 | 2021-12-15T17:51:32+08:00 |
| cluster-461ac0b9 | test | test | k8s1.16 | online | 2021-12-15T17:53:32+08:00 | 2021-12-15T17:53:32+08:00 |
| generated by PF | test_whq | culpa ipsum pariatur | kubernetes-v1.16 | offline | 2021-12-20T14:24:40+08:00 | 2021-12-20T14:24:40+08:00 |
marker: 879629d9a18721b9a4d1ea6e875e6eaf
Cluster details: run paddleflow cluster show clustername
| cluster id | cluster name | description | endpoint | source | cluster type | status | setting | namespace list | create time | update time |
| cluster-13797361fc624d7cb17c7635 | yyjtest | | | OnPremise | kusba | online | | | 2021-12-29T10:15:41+08:00 | 2021-12-29T10:35:47+08:00 |
credential value:
apiVersion: v1
Delete a cluster: run paddleflow cluster delete clustername
cluster[clustername] delete success
Update a cluster: run paddleflow cluster update clustername
cluster[clustername] update success
Show cluster resources: run paddleflow cluster resource
| cluster name | cluster info |
| test_zzc | { |
| | "nodeList": [ |
| | { |
| | "nodeName": "", |
| | "schedulable": true, |
| | "total": { |
| | "cpu": , |
| | "memory": "", |
| | "storage": "" |
| | }, |
| | "idle": { |
| | "cpu": , |
| | "memory": "", |
| | "storage": "" |
| | } |
| | }, |
| | { |
| | "nodeName": "c-2l7plfgf-tpawy7pj", |
| | "schedulable": true, |
| | "total": { |
| | "cpu": , |
| | "memory": "", |
| | "storage": "", |
| | "scalarResources": { |
| | "": "" |
| | } |
| | }, |
| | "idle": { |
| | "cpu": , |
| | "memory": "", |
| | "storage": "", |
| | "scalarResources": { |
| | "": "" |
| | } |
| | } |
| | } |
| | ], |
| | "summary": { |
| | "total": { |
| | "cpu": , |
| | "memory": "", |
| | "storage": "", |
| | "scalarResources": { |
| | "": "" |
| | } |
| | }, |
| | "idle": { |
| | "cpu": , |
| | "memory": "", |
| | "storage": "", |
| | "scalarResources": { |
| | "": "" |
| | } |
| | } |
| | }, |
| | "errMsg": "" |
| | } |
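paddleflow log show runid -j(--jobid) jobid -ps(--pagesize) pagesize -pn(--pageno) pageno -fp(--logfileposition) logfileposition
// (required) runid is the id of the run whose logs are queried;
// (optional) jobid is the id of a specific job within the run; by default all jobs are returned;
// (optional) pagesize is the number of lines per page of returned log content; the default is 100;
// (optional) pageno is the page number of the returned log content; the default is 1;
// (optional) logfileposition is the order in which the log is read: begin reads from the start, end reads from the tail; by default reading starts from the tail
Query the logs of a run: run paddleflow log show run-000419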
| run id | job id | task id | has next page | truncated | page no | page size | log content |
| run-000419 | job-run-000419-main-219bc706 | 627e0603-96fd-4fad-91dc-7d0d5fffaf12_container | False | False | 1 | 100 | 2022-02-23T09:10:13.673403692Z bash: line 0: cd: /home/work/model-factory/model_: No such file or directory |
| | | | | | | | 2022-02-23T09:10:13.693219789Z sh: No such file or directory |
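The statistics module provides the job operation:
paddleflow statistics job jobid -d(--detail) -s(--start) start -e(--end) end -st(--step) step
// (required) jobid is the id of the job whose statistics are queried;
// (optional) add detail to query more detailed statistics; it is off by default;
// (optional) start queries statistics from the given start time; empty by default; must be used together with -d(--detail);
// (optional) end queries statistics up to the given end time; empty by default; must be used together with -d(--detail) and -s(--start), and must be greater than start;
// (optional) step queries statistics within the given time range; empty by default; must be used together with -d(--detail);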
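The dependencies between these options can be sketched as a small check. The function check_statistics_options is hypothetical, and treating start and end as plain numbers is an assumption, since the expected time format is not stated here.

```python
from typing import Optional


def check_statistics_options(detail: bool = False,
                             start: Optional[float] = None,
                             end: Optional[float] = None,
                             step: Optional[float] = None) -> None:
    """Raise ValueError if the option combination breaks the stated rules."""
    if start is not None and not detail:
        raise ValueError("--start requires --detail")
    if end is not None:
        if not detail or start is None:
            raise ValueError("--end requires --detail and --start")
        if end <= start:
            raise ValueError("--end must be greater than --start")
    if step is not None and not detail:
        raise ValueError("--step requires --detail")


# A valid combination: detail with a start/end window and a step.
check_statistics_options(detail=True, start=1657807709, end=1657807829, step=60)
print("options ok")
```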
Query the statistics of a job: run paddleflow statistics job {jobid}
| cpu usage rate | disk read rate | disk usage | disk write rate | gpu memory usage | gpu memory util | gpu util | memory usage | memory usage rate | net receive bytes | net send bytes |
| 96.49% | 0.00(B/s) | 265420.80(Bytes) | 0.00(B/s) | 0.00(Bytes) | 0.00% | 0.00% | 4067737.60(bytes) | 0.38% | 1.71(B/s) | 0.00(B/s) |
Query the detailed statistics of a job: run paddleflow statistics job {jobid} -d(--detail)
| timestamp | cpu usage rate | memory usage | disk usage | net receive bytes | net send bytes | gpu util | gpu memory util |
| 1657807709 | 0.265535 | 0.245772 | 1961984 | 0 | 0 | 0 | 0.082387 |
| 1657807769 | 0.264947 | 0.245773 | 1982464 | 0 | 0 | 0 | 0.082387 |
| 1657807829 | 0.264354 | 0.245773 | 2007040 | 0 | 0 | 0 | 0.082387 |
# If the returned message is too long, the server truncates it
results has been truncated due to server side limitation
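The timestamp column in the detailed statistics output looks like Unix epoch seconds. Under that assumption (it is not confirmed in this document), the sample values can be converted to readable UTC times and their spacing checked as follows.

```python
from datetime import datetime, timezone
from typing import List


def to_utc(timestamp_seconds: int) -> datetime:
    """Interpret the value as Unix epoch seconds and return an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)


# Timestamps copied from the sample detailed output.
samples: List[int] = [1657807709, 1657807769, 1657807829]
for ts in samples:
    print(ts, to_utc(ts).isoformat())

# Spacing between consecutive samples, in seconds.
intervals = [later - earlier for earlier, later in zip(samples, samples[1:])]
print(intervals)
```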