
Upstream Kubernetes scalability evaluations


To get a sense of how well upstream Kubernetes performs with large-scale configurations, we conducted kubemark perf tests and recorded the results. The purpose of these exercises is to compare upstream Kubernetes with Arktos in terms of scalability, and to estimate how much room the Arktos 1x1 setup has to improve.
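
For reference, the sketch below shows the typical kubemark bring-up flow on GCE; the PROJECT/KUBE_GCE_ZONE values are placeholders, and the exact env vars used in each run are listed in the sections below.

# Minimal sketch of the kubemark bring-up flow (assumes a kubernetes repo checkout
# and GCE credentials; PROJECT/KUBE_GCE_ZONE values below are placeholders).
export KUBERNETES_PROVIDER=gce
export PROJECT=my-gcp-project KUBE_GCE_ZONE=us-central1-b
# 1. Bring up the "external" cluster that hosts the hollow-node pods.
./cluster/kube-up.sh
# 2. Start the kubemark master and the hollow nodes (reads KUBEMARK_NUM_NODES, NUM_NODES, etc.).
./test/kubemark/start-kubemark.sh
# Tear down when done:
# ./test/kubemark/stop-kubemark.sh && ./cluster/kube-down.sh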

The perf tool used is clusterloader2 corresponding to k8s 1.18.5 (commit d9a552b4f2e00dfc4c12e74eddae0a5e10aeed71, 2020/4/9). This version was chosen so that we use the "same" tool as the Arktos perf runs. The density configuration artifact is from the arktos repo (with PVC disabled).
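
For context, here is a hedged sketch of how such a density run is typically launched from the perf-tests/clusterloader2 directory; the kubeconfig, report directory, and density config paths are placeholders rather than the exact values used in these runs.

# Sketch of a clusterloader2 density run against the kubemark cluster.
# All paths below are placeholders; the density config is the artifact taken from the arktos repo.
cd perf-tests/clusterloader2
go run cmd/clusterloader.go \
  --testconfig=/path/to/arktos/density/config.yaml \
  --provider=kubemark \
  --kubeconfig=/path/to/kubemark/kubeconfig \
  --nodes=15000 \
  --report-dir=/path/to/report-dir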

It is a known issue that the perf tool reports the test as failed against k8s 1.21 because it cannot extract metrics from the scheduler pod. Since we only care about pod latency for the sake of scalability, this failure is ignored.

Version 1.21

15K hollow nodes

  • Significant vars in use
export KUBEMARK_NUM_NODES=15000 NUM_NODES=160
export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB MASTER_SIZE=n1-highmem-96
export NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB
export LOGROTATE_FILES_MAX_COUNT=200 LOGROTATE_MAX_SIZE=200M

export KUBE_CONTROLLER_EXTRA_ARGS="--kube-api-qps=100 --kube-api-burst=150"
export KUBE_SCHEDULER_EXTRA_ARGS="--kube-api-qps=200 --kube-api-burst=300"
export KUBE_APISERVER_EXTRA_ARGS="--max-mutating-requests-inflight=20000 --max-requests-inflight=40000"
  • Latency result (an extraction sketch follows below the detailed log entry)
phase                 P50 (ms)    P90 (ms)    P99 (ms)
pod_startup           1543.16     2612.09     5645.06
create_to_schedule     639.83     1137.97     3503.60
schedule_to_run        -98.76      292.26      809.40
run_to_watch          1066.72     1677.73     4841.41
schedule_to_watch      934.70     1460.66     4730.00
  • Detailed log

host: hw-k8s-test
folder: home/howell/logs/perf-test/gce-15000/hwtest-k8s121-0715-15k/
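
The latency percentiles in the table above come from the PodStartupLatency artifacts that clusterloader2 writes into its report directory. Below is a minimal extraction sketch using jq, assuming the usual PodStartupLatency_*.json naming; the exact file name varies per run.

# Print P50/P90/P99 per phase from a clusterloader2 PodStartupLatency artifact.
# The file name pattern is an assumption; adjust it to the actual report-dir contents.
jq -r '.dataItems[] | "\(.labels.Metric): P50=\(.data.Perc50) P90=\(.data.Perc90) P99=\(.data.Perc99)"' PodStartupLatency_*.json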

20K hollow nodes

  • Significant vars in use
export KUBEMARK_NUM_NODES=20000 NUM_NODES=236
export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB MASTER_SIZE=n1-highmem-96
export NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB
export LOGROTATE_FILES_MAX_COUNT=200 LOGROTATE_MAX_SIZE=200M

export KUBE_CONTROLLER_EXTRA_ARGS="--kube-api-qps=100 --kube-api-burst=150"
export KUBE_SCHEDULER_EXTRA_ARGS="--kube-api-qps=200 --kube-api-burst=300"
export KUBE_APISERVER_EXTRA_ARGS="--max-mutating-requests-inflight=20000 --max-requests-inflight=40000"
  • Test summary

The test was aborted, as we observed that the scheduler and KCM had been restarted several times due to "leader election lost" errors (a log-check sketch follows below).

  • Detailed log

No test result files are available because the test was aborted.
We did collect some log files, saved at //hw-k8s-test:/home/howell/logs/perf-test/gce-20000/hwtest-k8s121-20k-0716/
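
A minimal sketch for spotting these restarts on the kubemark master, assuming the standard GCE kube-up log locations; paths may differ in other setups.

# Restart counts of the control-plane components (run with the kubemark master kubeconfig).
kubectl get pods -n kube-system | grep -E 'kube-scheduler|kube-controller-manager'
# On the master VM itself, look for the leader-election failures in the component logs.
grep -i "leader election lost" /var/log/kube-scheduler.log /var/log/kube-controller-manager.log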

20K hollow nodes (leader election disabled)

  • On top of the env vars used in the regular 20K run above, append:
export KUBE_CONTROLLER_EXTRA_ARGS="${KUBE_CONTROLLER_EXTRA_ARGS} --leader-elect=false"
export KUBE_SCHEDULER_EXTRA_ARGS="${KUBE_SCHEDULER_EXTRA_ARGS} --leader-elect=false"
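
A quick way to confirm the override took effect is to check the generated static pod manifests on the kubemark master; the manifest paths below assume the GCE kube-up layout and may differ in other setups.

# Confirm --leader-elect=false landed in the static pod manifests on the master.
# Manifest paths/extensions are an assumption based on the GCE kube-up layout.
grep -- "--leader-elect" /etc/kubernetes/manifests/kube-scheduler.manifest \
  /etc/kubernetes/manifests/kube-controller-manager.manifest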