Node annotator periodically pulls real-time node load metrics from Prometheus and synchronizes them to node annotations.
The node annotations generated by node-annotator have the following format:
[name] = "[timestamp]:[interval]:[value]:[threshold]:[weight]"
Note:
- name: node load metric name, starts with caih.com/
- timestamp: timestamp of the node load metric
- interval: annotation interval
- value: [0-1], the higher the value, the higher the node load
- threshold: [0-1], the threshold for the pre-selection policy
- weight: [0-1], the score weight for the preferential selection policy, sum(weight)=1
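The annotation value format above can be decoded with a small parser. The following is a minimal sketch, not the project's own code: the type `LoadAnnotation` and the functions `ParseLoadAnnotation` and `parseUnit` are hypothetical names. Because the exact layout of the timestamp is not specified here, the sketch reads the last four fields from the right and keeps everything before them as the raw timestamp. It also assumes that a metric configured without a threshold yields an empty threshold field.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LoadAnnotation holds the fields of one node load annotation value.
type LoadAnnotation struct {
	Timestamp    string  // kept as raw text; its layout is not specified in this document
	Interval     string  // annotation interval, for example "5m"
	Value        float64 // node load in [0, 1]
	Threshold    float64 // pre-selection threshold in [0, 1]
	HasThreshold bool    // false when the threshold field is empty (assumption)
	Weight       float64 // score weight in [0, 1]
}

// ParseLoadAnnotation splits "[timestamp]:[interval]:[value]:[threshold]:[weight]".
// The last four fields are taken from the right, so a timestamp that itself
// contains ':' is still kept whole.
func ParseLoadAnnotation(raw string) (LoadAnnotation, error) {
	parts := strings.Split(raw, ":")
	n := len(parts)
	if n < 5 {
		return LoadAnnotation{}, fmt.Errorf("expected at least 5 fields, got %d", n)
	}
	a := LoadAnnotation{
		Timestamp: strings.Join(parts[:n-4], ":"),
		Interval:  parts[n-4],
	}
	var err error
	if a.Value, err = parseUnit(parts[n-3], "value"); err != nil {
		return LoadAnnotation{}, err
	}
	if parts[n-2] != "" {
		if a.Threshold, err = parseUnit(parts[n-2], "threshold"); err != nil {
			return LoadAnnotation{}, err
		}
		a.HasThreshold = true
	}
	if a.Weight, err = parseUnit(parts[n-1], "weight"); err != nil {
		return LoadAnnotation{}, err
	}
	return a, nil
}

// parseUnit parses a decimal number and checks that it lies in [0, 1].
func parseUnit(s, field string) (float64, error) {
	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", field, err)
	}
	if !(v >= 0 && v <= 1) { // written this way so NaN is rejected too
		return 0, fmt.Errorf("%s %v is outside [0, 1]", field, v)
	}
	return v, nil
}

func main() {
	// The sample value below is made up for illustration.
	a, err := ParseLoadAnnotation("1620000000:5m:0.42:0.75:0.15")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", a)
}
```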
groups:
  - name: cpu_mem_usage_active
    interval: 30s
    rules:
      - record: scheduler_cpu_usage_active_percent
        expr: 1 - avg by (node_name) (irate(node_cpu_seconds_total{mode="idle"}[2m]))
      - record: scheduler_mem_usage_active_percent
        expr: 1 - node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes
  - name: cpu-mem-usage-total-avg
    interval: 5m
    rules:
      - record: scheduler_cpu_utilization_total_avg_percent
        expr: 1 - avg(irate(node_cpu_seconds_total{mode="idle"}[2m]))
      - record: scheduler_mem_utilization_total_avg_percent
        expr: 1 - sum(node_memory_MemAvailable_bytes)/sum(node_memory_MemTotal_bytes)
  - name: cpu-usage-5m
    interval: 5m
    rules:
      - record: scheduler_cpu_usage_max_avg_1h_percent
        expr: max_over_time(scheduler_cpu_usage_avg_5m_percent[1h])
      - record: scheduler_cpu_usage_max_avg_1d_percent
        expr: max_over_time(scheduler_cpu_usage_avg_5m_percent[1d])
  - name: cpu-usage-1m
    interval: 1m
    rules:
      - record: scheduler_cpu_usage_avg_5m_percent
        expr: avg_over_time(scheduler_cpu_usage_active_percent[5m])
  - name: mem-usage-5m
    interval: 5m
    rules:
      - record: scheduler_mem_usage_max_avg_1h_percent
        expr: max_over_time(scheduler_mem_usage_avg_5m_percent[1h])
      - record: scheduler_mem_usage_max_avg_1d_percent
        expr: max_over_time(scheduler_mem_usage_avg_5m_percent[1d])
  - name: mem-usage-1m
    interval: 1m
    rules:
      - record: scheduler_mem_usage_avg_5m_percent
        expr: avg_over_time(scheduler_mem_usage_active_percent[5m])
curl -XPOST <prometheus-url>/-/reload
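Once the rules are loaded, the recorded series can be read back through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below shows one way to build such a query URL and decode a vector response into per-node values. How node-annotator itself queries Prometheus is not shown in this document, so `QueryURL`, `NodeValues`, and the sample response are illustrative assumptions. The `node_name` label is taken from the CPU rule above; other series may carry different labels.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// queryResponse mirrors the parts of an instant-query response used here.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix time, "sample value as string"]
		} `json:"result"`
	} `json:"data"`
}

// QueryURL builds the instant-query URL for an expression.
func QueryURL(base, expr string) string {
	return strings.TrimRight(base, "/") + "/api/v1/query?query=" + url.QueryEscape(expr)
}

// NodeValues extracts one value per node_name label from a vector result.
func NodeValues(body []byte) (map[string]float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" || r.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status=%q type=%q", r.Status, r.Data.ResultType)
	}
	out := make(map[string]float64, len(r.Data.Result))
	for _, s := range r.Data.Result {
		text, ok := s.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("sample value is not a string")
		}
		v, err := strconv.ParseFloat(text, 64)
		if err != nil {
			return nil, err
		}
		out[s.Metric["node_name"]] = v
	}
	return out, nil
}

func main() {
	fmt.Println(QueryURL("http://prometheus.example:9090", "scheduler_cpu_usage_avg_5m_percent"))

	// Made-up response body in the shape returned by /api/v1/query.
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"node_name":"node-a"},"value":[1620000000,"0.42"]},
		{"metric":{"node_name":"node-b"},"value":[1620000000,"0.81"]}]}}`)
	values, err := NodeValues(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(values)
}
```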
- name: scheduler_cpu_usage_avg_5m_percent
  expr: scheduler_cpu_usage_avg_5m_percent
  weight: 0.15
  threshold: 0.75
- name: scheduler_cpu_usage_max_avg_1h_percent
  expr: scheduler_cpu_usage_max_avg_1h_percent
  weight: 0.05
  threshold: 0.85
- name: scheduler_cpu_usage_max_avg_1d_percent
  expr: scheduler_cpu_usage_max_avg_1d_percent
  weight: 0.05
- name: scheduler_mem_usage_avg_5m_percent
  expr: scheduler_mem_usage_avg_5m_percent
  weight: 0.4
  threshold: 0.75
- name: scheduler_mem_usage_max_avg_1h_percent
  expr: scheduler_mem_usage_max_avg_1h_percent
  weight: 0.2
- name: scheduler_mem_usage_max_avg_1d_percent
  expr: scheduler_mem_usage_max_avg_1d_percent
  weight: 0.15
Note:
- name: node load metric name, starts with scheduler_
- expr: node load metric expression
- threshold: [0-1], the threshold for the pre-selection policy
- weight: [0-1], the score weight for the preferential selection policy
- --config: Kubernetes config file path, default: '' (uses the in-cluster config)
- --prometheus-url: Prometheus URL, default: ''
- --pushgateway-url: Pushgateway URL, used by the hotspot-prevention policy, default: ''
- --scheduler-name: scheduler name, default: 'caihcloud-scheduler'
- --lease-lock-namespace: leader election lock namespace, default: 'monitor'
- --annotator-config: annotator config file, default: '/config/annotator-config.yaml'
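For reference, the flags and defaults listed above can be expressed with Go's standard `flag` package as in the sketch below. This is only an illustration: the `Options` type and `ParseOptions` function are hypothetical, and the project may use a different flag library or different internal names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options collects the command-line settings described above (hypothetical type).
type Options struct {
	KubeConfig         string
	PrometheusURL      string
	PushgatewayURL     string
	SchedulerName      string
	LeaseLockNamespace string
	AnnotatorConfig    string
}

// ParseOptions parses args using the documented flag names and defaults.
func ParseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("node-annotator", flag.ContinueOnError)
	fs.StringVar(&o.KubeConfig, "config", "", "Kubernetes config file path; empty means in-cluster config")
	fs.StringVar(&o.PrometheusURL, "prometheus-url", "", "Prometheus URL")
	fs.StringVar(&o.PushgatewayURL, "pushgateway-url", "", "Pushgateway URL for the hotspot-prevention policy")
	fs.StringVar(&o.SchedulerName, "scheduler-name", "caihcloud-scheduler", "scheduler name")
	fs.StringVar(&o.LeaseLockNamespace, "lease-lock-namespace", "monitor", "leader election lock namespace")
	fs.StringVar(&o.AnnotatorConfig, "annotator-config", "/config/annotator-config.yaml", "annotator config file")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	return o, nil
}

func main() {
	o, err := ParseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```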
$ kubectl apply -f deploy/node-annotator.yaml
$ go run main.go --config [kube-config path] --prometheus-url [prometheus-url] --pushgateway-url [pushgateway-url] --scheduler-name [scheduler-name] --annotator-config [annotator-config]
$ sh build.sh