
outbound: Add a request_duration histogram for route backends #2624

Closed
olix0r wants to merge 2 commits


@olix0r olix0r commented Jan 5, 2024

a105b79 - Extract the request counting middleware to an http-prom crate

0218487 - outbound: Add a request_duration histogram for route backends
The outbound proxy reports a counter,
outbound_http_route_backend_requests_total, that illustrates how requests are
dispatched over a logical service's backends.

This change augments these metrics with "request duration" histograms. This
terminology is consistent with that of the Prometheus Go client library.

# HELP outbound_http_route_backend_request_duration_seconds The durations between sending an HTTP request and receiving response headers.
# TYPE outbound_http_route_backend_request_duration_seconds histogram
# UNIT outbound_http_route_backend_request_duration_seconds seconds
outbound_http_route_backend_request_duration_seconds_sum{status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 0.07080217000000001
outbound_http_route_backend_request_duration_seconds_count{status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="0.025",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="0.05",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="0.1",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="0.25",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="0.5",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="1.0",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="2.5",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="5.0",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54
outbound_http_route_backend_request_duration_seconds_bucket{le="+Inf",status_code="200",parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 54

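The histogram uses a constrained set of buckets (eight finite bounds from 25 ms
to 5 s, plus +Inf) to balance accuracy against the cost of exporting an
additional time series per bucket.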
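
As a rough illustration of the bucket layout, the sketch below models a fixed-bucket cumulative histogram in Python, using the eight finite bounds visible in the `le` labels above. The `Histogram` class and its methods are hypothetical names chosen for this example; this is not the proxy's implementation (which is written in Rust), and the 1.3 ms latency is an assumed value close to the mean implied by the sample `_sum` and `_count`.

```python
import bisect

# Upper bounds (seconds) taken from the `le` labels in the sample output.
BOUNDS = [0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]


class Histogram:
    """Hypothetical fixed-bucket histogram; not the proxy's actual type."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        # One slot per finite bound, plus a final slot for +Inf.
        self.counts = [0] * (len(self.bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, seconds):
        # bisect_left returns the index of the first bound >= seconds,
        # which is the smallest `le` bucket that contains the observation.
        self.counts[bisect.bisect_left(self.bounds, seconds)] += 1
        self.sum += seconds
        self.count += 1

    def cumulative(self):
        # Exported buckets are cumulative: each one includes all smaller ones.
        total, out = 0, []
        for n in self.counts:
            total += n
            out.append(total)
        return out


h = Histogram(BOUNDS)
for _ in range(54):
    h.observe(0.0013)  # assumed latency, roughly the mean in the sample

print(h.cumulative())
print(h.sum / h.count)
```

Because every observation falls under the first bound, all nine cumulative buckets report the same count, which matches the shape of the sample output. Each label set exports nine bucket series plus `_sum` and `_count`, so adding finer buckets increases the series count for every parent, route, and backend combination.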

Additionally, a basic counter is added to track errors emitted from backends.
Given the current proxy configuration, these can only indicate load shedding
errors:

# HELP outbound_http_route_backend_request_errors The total number of errors encountered while waiting for a response.
# TYPE outbound_http_route_backend_request_errors counter
outbound_http_route_backend_request_errors_total{parent_group="core",parent_kind="Service",parent_namespace="emojivoto",parent_name="emoji-svc",parent_port="8080",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",backend_group="core",backend_kind="Service",backend_namespace="emojivoto",backend_name="emoji-svc",backend_port="8080",backend_section_name=""} 0

codecov bot commented Jan 5, 2024

Codecov Report

Attention: Patch coverage is 80.89888%, with 17 lines in your changes missing coverage. Please review.

Project coverage is 67.83%. Comparing base (96124bc) to head (0218487).
Report is 218 commits behind head on main.

❗ Current head 0218487 differs from pull request most recent head f6b1351. Consider uploading reports for the commit f6b1351 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2624      +/-   ##
==========================================
+ Coverage   67.68%   67.83%   +0.15%     
==========================================
  Files         332      330       -2     
  Lines       15158    15065      -93     
==========================================
- Hits        10259    10220      -39     
+ Misses       4899     4845      -54     
Files                                                  Coverage Δ
...kerd/app/outbound/src/http/logical/policy/route.rs  76.92% <100.00%> (ø)
.../outbound/src/http/logical/policy/route/backend.rs  80.95% <100.00%> (ø)
...d/src/http/logical/policy/route/backend/metrics.rs  100.00% <100.00%> (ø)
...erd/app/outbound/src/http/logical/policy/router.rs  69.47% <ø> (ø)
linkerd/http-prom/src/count_reqs.rs                    70.83% <75.00%> (ø)
linkerd/http-prom/src/lib.rs                           81.81% <81.81%> (ø)
linkerd/http-prom/src/count_rsps.rs                    79.59% <79.59%> (ø)

... and 23 files with indirect coverage changes


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d1a957...f6b1351.

@olix0r olix0r force-pushed the ver/http-route-metrics branch from 2be4c74 to 641fc05 Compare May 1, 2024 21:38
olix0r added 2 commits May 2, 2024 01:53
The outbound proxy reports a counter,
outbound_http_route_backend_requests_total, that illustrates how requests are
dispatched over a logical service's backends.

This change complements these metrics with "request duration" histograms. This
terminology is consistent with that of the Prometheus Go client library.

    # HELP outbound_http_route_backend_request_duration_seconds The durations between sending an HTTP request and receiving response headers.
    # TYPE outbound_http_route_backend_request_duration_seconds histogram
    # UNIT outbound_http_route_backend_request_duration_seconds seconds
    outbound_http_route_backend_request_duration_seconds_sum{status_code="200",...} 0.07080217000000001
    outbound_http_route_backend_request_duration_seconds_count{status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="0.025",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="0.05",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="0.1",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="0.25",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="0.5",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="1.0",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="2.5",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="5.0",status_code="200",...} 54
    outbound_http_route_backend_request_duration_seconds_bucket{le="+Inf",status_code="200",...} 54

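The histogram uses a constrained set of buckets (eight finite bounds from 25 ms
to 5 s, plus +Inf) to balance accuracy against the cost of exporting an
additional time series per bucket.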
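
The accuracy side of this tradeoff can be sketched by estimating a quantile from the sample bucket counts. The function below is a simplified linear-interpolation estimate in the spirit of PromQL's `histogram_quantile`; it is an illustrative assumption about how a consumer might read these buckets, not code from this change, and `estimate_quantile` is a hypothetical name.

```python
import math

# (upper bound in seconds, cumulative count) pairs from the sample output.
BUCKETS = [
    (0.025, 54), (0.05, 54), (0.1, 54), (0.25, 54), (0.5, 54),
    (1.0, 54), (2.5, 54), (5.0, 54), (math.inf, 54),
]


def estimate_quantile(q, buckets):
    """Estimate a quantile by linear interpolation within one bucket.

    Assumes sorted, cumulative buckets whose last entry is +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # No finite upper edge (or no samples) to interpolate over.
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


p99 = estimate_quantile(0.99, BUCKETS)
mean = 0.07080217000000001 / 54  # _sum / _count from the sample output

print(f"estimated p99: {p99 * 1000:.2f} ms")
print(f"mean:          {mean * 1000:.2f} ms")
```

With the sample data, every request lands in the first bucket, so the interpolated p99 comes out just under the 25 ms bound (about 24.75 ms) even though the mean is roughly 1.31 ms. Quantile estimates cannot be finer than the bucket that contains them, which is the accuracy given up in exchange for fewer series.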

Errors are counted with a "0" status code. We expect errors to be reported
elsewhere, and it's not useful to distinguish between different error types in
this context.
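
A minimal sketch of the labeling rule described above, assuming a simplified model in which a finished request is either an integer HTTP status or an exception. The `status_label` helper and the outcome values are hypothetical and exist only for illustration; the proxy itself is written in Rust and is not shown here.

```python
from collections import Counter


def status_label(outcome):
    """Return the status_code label value for a finished request.

    `outcome` is either an int HTTP status or an Exception (assumed model).
    Every error collapses to "0", regardless of its type.
    """
    if isinstance(outcome, Exception):
        return "0"
    return str(outcome)


# Hypothetical outcomes: three responses and two different error types.
outcomes = [200, 200, 503, TimeoutError("timed out"), ConnectionError("reset")]
counts = Counter(status_label(o) for o in outcomes)

for label, n in sorted(counts.items()):
    print(f'status_code="{label}" {n}')
```

Both error types end up under the same `status_code="0"` series, so the label distinguishes "no response" from real status codes without adding a label value per error kind.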
@olix0r olix0r force-pushed the ver/http-route-metrics branch from 641fc05 to f6b1351 Compare May 2, 2024 02:19
@olix0r olix0r closed this Jul 26, 2024