
feat: Instrument latency without streaming duration #290

Merged

trallnag merged 10 commits into trallnag:master from dosuken123:track-response-start-duration on Mar 11, 2024

Conversation

dosuken123
Contributor

@dosuken123 dosuken123 commented Mar 4, 2024

What does this do?


This PR adds an option to track HTTP response duration without streaming duration.

Config Example:

    from prometheus_fastapi_instrumentator import Instrumentator, metrics

    instrumentator = Instrumentator()
    instrumentator.add(
        metrics.latency(
            should_include_handler=True,
            should_include_method=True,
            should_include_status=True,
            buckets=(0.5, 1, 2.5, 5, 10, 30, 60),
        ),
        metrics.latency(
            metric_name="http_request_duration_without_streaming_seconds",
            should_include_handler=True,
            should_include_method=True,
            should_include_status=True,
            buckets=(0.5, 1, 2.5, 5, 10, 30, 60),
            should_exclude_streaming_duration=True,  # <= New option
        ),
    )

https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/blob/main/ai_gateway/app.py?ref_type=heads#L51-58

Output example:

# HELP http_request_response_start_duration_seconds Duration of HTTP requests in seconds
# TYPE http_request_response_start_duration_seconds histogram
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="0.5",method="POST",status="2xx"} 0.0
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="1.0",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="2.5",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="5.0",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="10.0",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="30.0",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="60.0",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_bucket{handler="/v2/code/generations",le="+Inf",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_count{handler="/v2/code/generations",method="POST",status="2xx"} 1.0
http_request_response_start_duration_seconds_sum{handler="/v2/code/generations",method="POST",status="2xx"} 0.6706487989995367
# HELP http_request_response_start_duration_seconds_created Duration of HTTP requests in seconds
# TYPE http_request_response_start_duration_seconds_created gauge
http_request_response_start_duration_seconds_created{handler="/v2/code/generations",method="POST",status="2xx"} 1.7095186511967359e+09
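To read the bucket lines above: a Prometheus histogram observation increments every bucket whose upper bound `le` is at least the observed value, so the single ~0.67 s observation lands in the `1.0` bucket and every wider one, but not in `0.5`. A stdlib-only sketch of that cumulative-bucket logic (the function name and defaults are illustrative, not part of the library):

```python
def histogram_buckets(observation, bounds=(0.5, 1, 2.5, 5, 10, 30, 60)):
    """Return cumulative bucket counts for a single observation,
    mirroring Prometheus histogram semantics (plus the +Inf bucket)."""
    counts = {}
    for le in bounds:
        # Buckets are cumulative: each counts observations <= its bound.
        counts[str(float(le))] = 1.0 if observation <= le else 0.0
    counts["+Inf"] = 1.0  # every observation falls in the +Inf bucket
    return counts

buckets = histogram_buckets(0.6706487989995367)
# Matches the exposition output above: the 0.5 bucket misses the
# observation, 1.0 and every larger bucket catch it.
assert buckets["0.5"] == 0.0 and buckets["1.0"] == 1.0
```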

Fixes #291

Why do we need it?

Because LLM inference APIs usually support HTTP streaming to improve the UX, users perceive latency as the arrival of the first chunk rather than the last. We want to instrument that duration.
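The difference between the two durations can be sketched with a toy streaming consumer (all names and delays below are hypothetical, purely to illustrate what the new option measures versus the default):

```python
import time

def stream_chunks(n_chunks=3, inference_delay=0.05, chunk_delay=0.02):
    """Toy LLM streaming response: one up-front inference delay,
    then chunks trickling out over the wire."""
    time.sleep(inference_delay)  # time until the first token is ready
    for i in range(n_chunks):
        yield f"chunk-{i}"
        time.sleep(chunk_delay)  # streaming time per chunk

start = time.perf_counter()
time_to_first_chunk = None
for chunk in stream_chunks():
    if time_to_first_chunk is None:
        # roughly what should_exclude_streaming_duration=True observes
        time_to_first_chunk = time.perf_counter() - start
# roughly what the default latency metric observes
total_duration = time.perf_counter() - start

assert time_to_first_chunk < total_duration
```

For a streaming endpoint the gap between the two values grows with response length, which is why tracking both histograms side by side (as in the config example above) is useful.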

Who is this for?

GitLab, software developers, LLM app optimizations

Linked issues

Related to https://gitlab.com/gitlab-com/runbooks/-/merge_requests/6928#note_1796949998

Reviewer notes


This commit adds a feature to track the latency excluding
streaming duration.
@dosuken123 dosuken123 force-pushed the track-response-start-duration branch from ff9169e to 019c8a5 Compare March 4, 2024 02:27
@dosuken123 dosuken123 changed the title Track response start duration Instrument latency without streaming duration Mar 4, 2024
@trallnag trallnag changed the title Instrument latency without streaming duration feat: Instrument latency without streaming duration Mar 11, 2024

codecov bot commented Mar 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.98%. Comparing base (c608c4e) to head (3ae762e).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #290      +/-   ##
==========================================
+ Coverage   95.79%   95.98%   +0.19%     
==========================================
  Files           5        5              
  Lines         357      374      +17     
==========================================
+ Hits          342      359      +17     
  Misses         15       15              


@trallnag
Owner

Hi @dosuken123, thanks for the proposal and implementation. It will be included in the next version, which will be released sometime this week.

@trallnag trallnag merged commit 4530ba4 into trallnag:master Mar 11, 2024
6 checks passed
@dosuken123
Contributor Author

@trallnag Thanks for help! Much appreciated 🙇
