Envoy's lb_policy: ROUND_ROBIN is not balancing cluster hosts #37369

Open
rookrunner opened this issue Nov 26, 2024 · 5 comments

rookrunner commented Nov 26, 2024

Title: Envoy's lb_policy: ROUND_ROBIN is not balancing cluster hosts

Description:

I have an Envoy configuration like this:

  - name: testserverpolling
    address:
      socket_address:
        address: 127.0.0.1
        port_value: 8129
        protocol: TCP
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          codec_type: AUTO
          stat_prefix: testserverpolling
          route_config:
            name: testserverpolling
            virtual_hosts:
            - name: testserverpolling
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: testserverpolling
                  timeout:
                    seconds: 1800
                decorator:
                  operation: testserverpolling
              require_tls: NONE
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          server_name: envoy-ingress-testserverpolling
          drain_timeout:
            seconds: 1800
          common_http_protocol_options:
            idle_timeout: 1800s
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: "/var/DebugTrace/envoylb/envoylb_testserverpolling.log"
          generate_request_id: true
  - name: testserverpolling
    type: STRICT_DNS
    connect_timeout:
      seconds: 1800
    lb_policy: ROUND_ROBIN
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_params:
            tls_minimum_protocol_version: "TLSv1_2"
          tls_certificates:
          - certificate_chain:
              filename: "/certs/cert.pem"
            private_key:
              filename: "/certs/key.pem"
            password:
              filename: "/etc/pkencryptkey/password"
    load_assignment:
      cluster_name: testserverpolling
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: test-ckey-svc-headless.{{ .Release.Namespace }}.{{ .Values.global.servicedomainName }}.
                    port_value: 8124
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http_protocol_options: { }

I have 6 replicas of the pods behind the service test-ckey-svc-headless.{{ .Release.Namespace }}.{{ .Values.global.servicedomainName }}.
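(For reference, the endpoint list behind that headless service can be checked with something like the following sketch; the testns namespace is taken from the kubectl commands further down:)

kubectl -n testns get endpoints test-ckey-svc-headless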

My expectation was that when I send 6 requests to Envoy, it would evenly distribute one request to each of the pods. But it is not distributing evenly; some pods do not get any requests at all.

[testvm ~]$ kubectl exec -it test-sd-ckey-0 -n testns -- bash -c "wc -l /var/DebugTrace/envoy/test_polling_service.log"
3 /var/DebugTrace/envoy/test_polling_service.log
[testvm ~]$ kubectl exec -it test-sd-ckey-1 -n testns -- bash -c "wc -l /var/DebugTrace/envoy/test_polling_service.log"
0 /var/DebugTrace/envoy/test_polling_service.log
[testvm ~]$ kubectl exec -it test-sd-ckey-2 -n testns -- bash -c "wc -l /var/DebugTrace/envoy/test_polling_service.log"
2 /var/DebugTrace/envoy/test_polling_service.log
[testvm ~]$ kubectl exec -it test-sd-ckey-3 -n testns -- bash -c "wc -l /var/DebugTrace/envoy/test_polling_service.log"
0 /var/DebugTrace/envoy/test_polling_service.log
[testvm ~]$ kubectl exec -it test-sd-ckey-4 -n testns -- bash -c "wc -l /var/DebugTrace/envoy/test_polling_service.log"
1 /var/DebugTrace/envoy/test_polling_service.log
[testvm ~]$ kubectl exec -it test-sd-ckey-5 -n testns -- bash -c "wc -l /var/DebugTrace/envoy/test_polling_service.log"
0 /var/DebugTrace/envoy/test_polling_service.log

Is there anything wrong in my configuration? Can anyone please help?

I tried setting max_requests_per_connection = 1 in the cluster section, but that didn't help either.
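A minimal sketch of that attempt, assuming the cluster-level field was used (newer Envoy versions expose the same setting under common_http_protocol_options inside typed_extension_protocol_options):

  - name: testserverpolling
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    # Deprecated cluster-level placement (assumed to be what was tried):
    max_requests_per_connection: 1
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        # Non-deprecated equivalent placement:
        common_http_protocol_options:
          max_requests_per_connection: 1
        explicit_http_config:
          http_protocol_options: {}

Only one of the two placements would normally be used; both are shown here just to illustrate where the field lives.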

@rookrunner rookrunner added the triage Issue requires triage label Nov 26, 2024
@RyanTheOptimist RyanTheOptimist added bug area/load balancing and removed triage Issue requires triage labels Nov 26, 2024
@RyanTheOptimist
Contributor

@wbpcode @tonya11en @nezdolik

@tonya11en
Member

Are there log files showing the behavior?

@wbpcode
Member

wbpcode commented Nov 27, 2024

How much concurrency are you using? Note that round robin is a thread-local load balancer; it only ensures that requests on the same thread are distributed to the hosts in turn.
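A quick way to observe this is to run Envoy with a single worker thread, so that every request shares one round-robin sequence (a sketch only, reusing the flags shown elsewhere in this thread):

/usr/local/bin/envoy --config-path /tmp/envoy.yaml --base-id 0 --concurrency 1 --drain-time-s 30 --drain-strategy immediate --parent-shutdown-time-s 40 --restart-epoch 1

With --concurrency 8, each of the 8 workers keeps its own round-robin position, so 6 requests spread across different workers will not necessarily land on 6 different hosts.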

@rookrunner
Author

@wbpcode I am using a concurrency of 8.
I referred to this: https://www.envoyproxy.io/docs/envoy/latest/faq/load_balancing/concurrency_lb

/Envoy $ ps -aef | grep envoy

1001050+ 436 293 0 Nov26 pts/0 00:04:13 /usr/local/bin/envoy --config-path /tmp/envoy_bkp.yaml --base-id 0 --concurrency 8 --drain-time-s 30 --drain-strategy immediate --parent-shutdown-time-s 40 --restart-epoch 1
1001050+ 23935 23808 0 03:28 pts/2 00:00:00 grep envoy

@rookrunner
Author

@tonya11en

I tried running Envoy with debug logging and got this log.
/usr/local/bin/envoy --config-path /tmp/envoy.yaml --base-id 0 --concurrency 8 --drain-time-s 30 --log-level debug --drain-strategy immediate --parent-shutdown-time-s 40 --restart-epoch 1
It does not have much info, though.

[2024-11-27 03:52:00.102][24477][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:302] dns resolution for test-ckey-svc-headless.testns.svc.cluster.local. completed with status 0
[2024-11-27 03:52:00.102][24477][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 192.168.46.68:8002
[2024-11-27 03:52:00.102][24477][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 192.168.61.84:8002
[2024-11-27 03:52:00.102][24477][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 192.168.27.220:8002
[2024-11-27 03:52:00.102][24477][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 192.168.28.217:8002
[2024-11-27 03:52:00.102][24477][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 192.168.45.147:8002
[2024-11-27 03:52:00.102][24477][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 192.168.40.191:8002
[2024-11-27 03:52:00.102][24477][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:193] DNS refresh rate reset for test-ckey-svc-headless.testns.svc.cluster.local.,, refresh rate 5000 ms

This is how I am sending requests to port 8129 exposed by Envoy:
for i in $(seq 1 6); do echo "Attempt #$i"; curl -vvv http://127.0.0.1:8129/ack_route; echo; done
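Another way to see where Envoy itself routed the requests is the per-host counters on the admin interface (a sketch; no admin section is shown in the config above, so port 9901 is only an assumed default):

curl -s http://127.0.0.1:9901/clusters | grep 'testserverpolling::.*::rq_total'

Each matching line is one resolved pod IP with its rq_total counter, so the counts should sum to the 6 requests from the loop above.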
