
feat(pyroscope/scrape): add support for configuring CPU profile's duration scraped by pyroscope.scrape #591

Merged
Changes from 3 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -25,6 +25,8 @@ Main (unreleased)
- In `prometheus.exporter.kafka`, the interpolation table used to compute estimated lag metrics is now pruned
on `metadata_refresh_interval` instead of `prune_interval_seconds`. (@wildum)

- Add support for configuring CPU profile's duration scraped by `pyroscope.scrape`. (@hainenber)

### Bugfixes

- Fixed issue with defaults for Beyla component not being applied correctly. (marctc)
7 changes: 5 additions & 2 deletions docs/sources/reference/components/pyroscope.scrape.md
@@ -65,6 +65,7 @@ Name | Type | Description
`params` | `map(list(string))` | A set of query parameters with which the target is scraped. | | no
`scrape_interval` | `duration` | How frequently to scrape the targets of this scrape configuration. | `"15s"` | no
`scrape_timeout` | `duration` | The timeout for scraping targets of this configuration. Must be larger than `scrape_interval`. | `"18s"` | no
`profiling_duration`| `duration` | The duration for a delta profiling to be scraped. Must be larger than 1 second. | `"14s"` | no
Member commented:
Sorry, one last thing. Could we name this with a little more descriptive name?

Suggested change:
- `profiling_duration`| `duration` | The duration for a delta profiling to be scraped. Must be larger than 1 second. | `"14s"` | no
+ `delta_profiling_duration`| `duration` | The duration for a delta profiling to be scraped. Must be larger than 1 second. | `"14s"` | no

Contributor (author) replied:

I've changed this in HEAD. PTAL

`scheme` | `string` | The URL scheme with which to fetch metrics from targets. | `"http"` | no
`bearer_token_file` | `string` | File containing a bearer token to authenticate with. | | no
`bearer_token` | `secret` | Bearer token to authenticate with. | | no
@@ -395,8 +396,10 @@ When the `delta` argument is `false`, the [pprof][] HTTP query will be instantaneous.
When the `delta` argument is `true`:
* The [pprof][] HTTP query will run for a certain amount of time.
* A `seconds` parameter is automatically added to the HTTP request.
- * The `seconds` used will be equal to `scrape_interval - 1`.
-   For example, if `scrape_interval` is `"15s"`, `seconds` will be 14 seconds.
+ * The default value for the `seconds` query parameter is `scrape_interval - 1`.
+   If you set `profiling_duration`, then `seconds` is assigned the same value as `profiling_duration`.
+   For example, if you set `scrape_interval` to `"15s"`, then `seconds` defaults to `14s`.
+   If you set `profiling_duration` to `16s`, then `seconds` is set to `16s` regardless of the `scrape_interval` value.
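As a sketch, overriding the duration in a `pyroscope.scrape` block might look like this (target address and values are hypothetical, and at this commit the attribute is still named `profiling_duration`):

```alloy
pyroscope.scrape "example" {
  targets    = [{"__address__" = "localhost:9090"}]
  forward_to = null

  scrape_interval    = "20s"
  profiling_duration = "16s"
}
```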
Member commented:

cc @simonswine @korniltsev does it make sense for the profiling_duration to ever be larger than the scrape_interval here? Or similar to the Prometheus scrape_timeout it should be always less than the scrape interval?

A reviewer replied:

It should always be less. If you try to profile a service which is already running a profile, you'll get an error

server returned HTTP status (500) Could not enable CPU profiling: cpu profiling already in use

I think we should either add some validation, or default back to min(scrape_interval - 1, profiling_duration)

Member replied:

I think that makes sense as well, yeah cc @hainenber

Contributor (author) replied:

I've added the validation in the HEAD. Looks A-OK to you guys?

If the HTTP endpoint is `/debug/pprof/profile`, then the HTTP query will become `/debug/pprof/profile?seconds=14`

## Exported fields
36 changes: 22 additions & 14 deletions internal/component/pyroscope/scrape/scrape.go
@@ -21,15 +21,17 @@ import (
)

const (
-	pprofMemory            string = "memory"
-	pprofBlock             string = "block"
-	pprofGoroutine         string = "goroutine"
-	pprofMutex             string = "mutex"
-	pprofProcessCPU        string = "process_cpu"
-	pprofFgprof            string = "fgprof"
-	pprofGoDeltaProfMemory string = "godeltaprof_memory"
-	pprofGoDeltaProfBlock  string = "godeltaprof_block"
-	pprofGoDeltaProfMutex  string = "godeltaprof_mutex"
+	pprofMemory              string        = "memory"
+	pprofBlock               string        = "block"
+	pprofGoroutine           string        = "goroutine"
+	pprofMutex               string        = "mutex"
+	pprofProcessCPU          string        = "process_cpu"
+	pprofFgprof              string        = "fgprof"
+	pprofGoDeltaProfMemory   string        = "godeltaprof_memory"
+	pprofGoDeltaProfBlock    string        = "godeltaprof_block"
+	pprofGoDeltaProfMutex    string        = "godeltaprof_mutex"
+	defaultScrapeInterval    time.Duration = 15 * time.Second
+	defaultProfilingDuration time.Duration = 14 * time.Second
)

func init() {
@@ -60,6 +62,8 @@ type Arguments struct {
ScrapeTimeout time.Duration `alloy:"scrape_timeout,attr,optional"`
// The URL scheme with which to fetch metrics from targets.
Scheme string `alloy:"scheme,attr,optional"`
// The duration for a profile to be scraped.
ProfilingDuration time.Duration `alloy:"profiling_duration,attr,optional"`

// todo(ctovena): add support for limits.
// // An uncompressed response body larger than this many bytes will cause the
@@ -192,11 +196,12 @@ var DefaultArguments = NewDefaultArguments()
// NewDefaultArguments create the default settings for a scrape job.
func NewDefaultArguments() Arguments {
return Arguments{
-		Scheme:           "http",
-		HTTPClientConfig: component_config.DefaultHTTPClientConfig,
-		ScrapeInterval:   15 * time.Second,
-		ScrapeTimeout:    10 * time.Second,
-		ProfilingConfig:  DefaultProfilingConfig,
+		Scheme:            "http",
+		HTTPClientConfig:  component_config.DefaultHTTPClientConfig,
+		ScrapeInterval:    15 * time.Second,
+		ScrapeTimeout:     10 * time.Second,
+		ProfilingConfig:   DefaultProfilingConfig,
+		ProfilingDuration: defaultProfilingDuration,
}
}

@@ -218,6 +223,9 @@ func (arg *Arguments) Validate() error {
if target.Enabled && target.Delta && arg.ScrapeInterval.Seconds() < 2 {
return fmt.Errorf("scrape_interval must be at least 2 seconds when using delta profiling")
}
if target.Enabled && target.Delta && arg.ProfilingDuration.Seconds() <= 1 {
return fmt.Errorf("profiling_duration must be larger than 1 second when using delta profiling")
}
}

// We must explicitly Validate because HTTPClientConfig is squashed and it won't run otherwise
10 changes: 10 additions & 0 deletions internal/component/pyroscope/scrape/scrape_test.go
@@ -151,6 +151,16 @@ func TestUnmarshalConfig(t *testing.T) {
`,
expectedErr: "scrape_interval must be at least 2 seconds when using delta profiling",
},
"invalid cpu profiling_duration": {
in: `
targets = []
forward_to = null
scrape_timeout = "1s"
scrape_interval = "10s"
profiling_duration = "1s"
`,
expectedErr: "profiling_duration must be larger than 1 second when using delta profiling",
},
"allow short scrape_intervals without delta": {
in: `
targets = []
6 changes: 5 additions & 1 deletion internal/component/pyroscope/scrape/target.go
@@ -406,7 +406,11 @@ func targetsFromGroup(group *targetgroup.Group, cfg Arguments, targetTypes map[s
}

if pcfg, found := targetTypes[profType]; found && pcfg.Delta {
-			params.Add("seconds", strconv.Itoa(int((cfg.ScrapeInterval)/time.Second)-1))
+			seconds := (cfg.ScrapeInterval)/time.Second - 1
+			if cfg.ProfilingDuration != defaultProfilingDuration {
+				seconds = (cfg.ProfilingDuration) / time.Second
+			}
+			params.Add("seconds", strconv.Itoa(int(seconds)))
}
targets = append(targets, NewTarget(lbls, origLabels, params))
}
63 changes: 63 additions & 0 deletions internal/component/pyroscope/scrape/target_test.go
@@ -4,6 +4,7 @@ import (
"net/url"
"sort"
"testing"
"time"

"github.com/prometheus/common/model"
"github.com/prometheus/prometheus/discovery/targetgroup"
@@ -236,3 +237,65 @@ func Test_targetsFromGroup(t *testing.T) {
require.Equal(t, expected, active)
require.Empty(t, dropped)
}

func Test_targetsFromGroup_withSpecifiedProfilingDuration(t *testing.T) {
args := NewDefaultArguments()
args.ProfilingConfig.Block.Enabled = false
args.ProfilingConfig.Goroutine.Enabled = false
args.ProfilingConfig.Mutex.Enabled = false
args.ProfilingDuration = 20 * time.Second

active, dropped, err := targetsFromGroup(&targetgroup.Group{
Targets: []model.LabelSet{
{model.AddressLabel: "localhost:9090"},
},
Labels: model.LabelSet{
"foo": "bar",
},
}, args, args.ProfilingConfig.AllTargets())
expected := []*Target{
// unspecified
NewTarget(
labels.FromMap(map[string]string{
model.AddressLabel: "localhost:9090",
serviceNameLabel: "unspecified",
model.MetricNameLabel: pprofMemory,
ProfilePath: "/debug/pprof/allocs",
model.SchemeLabel: "http",
"foo": "bar",
"instance": "localhost:9090",
}),
labels.FromMap(map[string]string{
model.AddressLabel: "localhost:9090",
model.MetricNameLabel: pprofMemory,
ProfilePath: "/debug/pprof/allocs",
model.SchemeLabel: "http",
"foo": "bar",
}),
url.Values{}),
NewTarget(
labels.FromMap(map[string]string{
model.AddressLabel: "localhost:9090",
serviceNameLabel: "unspecified",
model.MetricNameLabel: pprofProcessCPU,
ProfilePath: "/debug/pprof/profile",
model.SchemeLabel: "http",
"foo": "bar",
"instance": "localhost:9090",
}),
labels.FromMap(map[string]string{
model.AddressLabel: "localhost:9090",
model.MetricNameLabel: pprofProcessCPU,
ProfilePath: "/debug/pprof/profile",
model.SchemeLabel: "http",
"foo": "bar",
}),
url.Values{"seconds": []string{"20"}}),
}

require.NoError(t, err)
sort.Sort(Targets(active))
sort.Sort(Targets(expected))
require.Equal(t, expected, active)
require.Empty(t, dropped)
}