Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add details for setting memory limit automatically. #655

Merged
merged 12 commits into from
Apr 25, 2024
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,10 @@ Main (unreleased)
- In `prometheus.exporter.kafka`, the interpolation table used to compute estimated lag metrics is now pruned
on `metadata_refresh_interval` instead of `prune_interval_seconds`. (@wildum)

### Features

- Add support for setting GOMEMLIMIT based on cgroup setting. (@mattdurham)

### Bugfixes

- Fixed issue with defaults for Beyla component not being applied correctly. (marctc)
Expand Down
7 changes: 7 additions & 0 deletions docs/sources/reference/cli/run.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ The following flags are supported:
* `--config.bypass-conversion-errors`: Enable bypassing errors when converting (default `false`).
* `--config.extra-args`: Extra arguments from the original format used by the converter.
* `--stability.level`: The minimum permitted stability level of functionality to run. Supported values: `experimental`, `public-preview`, `generally-available` (default `"generally-available"`).
* `--memory.auto-limit`: Enables automatic memory limit configuration when using cgroups (default `true`).

## Update the configuration file

Expand Down Expand Up @@ -164,6 +165,12 @@ original configuration.
Include `--config.extra-args` to pass additional command line flags from the original format to the converter.
Refer to [alloy convert][] for more details on how `extra-args` work.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO I think we should find a way to document this somewhere just so people know which improvements lower stability levels include. I don't know what page it should be on, but I'll leave that up to you since I'm not too opinionated.

## Memory

* `--memory.auto-limit` This value automatically sets the memory limit if using cgroups. Does nothing if cgroups are not available.
Defaults to true.

[alloy convert]: ../convert/
[clustering]: ../../../concepts/clustering/
[go-discover]: https://github.com/hashicorp/go-discover
Expand Down
3 changes: 3 additions & 0 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -258,6 +258,7 @@ require (
)

require (
github.com/KimMachineGun/automemlimit v0.6.0
github.com/Shopify/sarama v1.38.1
github.com/grafana/kafka_exporter v0.0.0-20240409084445-5e3488ad9f9a
)
Expand Down Expand Up @@ -355,6 +356,7 @@ require (
github.com/cloudflare/golz4 v0.0.0-20150217214814-ef862a3cdc58 // indirect
github.com/cncf/xds/go v0.0.0-20231128003011-0fa0005c9caa // indirect
github.com/containerd/cgroups v1.1.0 // indirect
github.com/containerd/cgroups/v3 v3.0.2 // indirect
github.com/containerd/console v1.0.3 // indirect
github.com/containerd/containerd v1.7.11 // indirect
github.com/containerd/continuity v0.4.2 // indirect
Expand Down Expand Up @@ -579,6 +581,7 @@ require (
github.com/ovh/go-ovh v1.4.3 // indirect
github.com/packethost/packngo v0.1.1-0.20180711074735-b9cb5096f54c // indirect
github.com/patrickmn/go-cache v2.1.0+incompatible // indirect
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 // indirect
github.com/pelletier/go-toml/v2 v2.1.0 // indirect
github.com/pierrec/lz4/v4 v4.1.21 // indirect
github.com/pjbgf/sha1cd v0.3.0 // indirect
Expand Down
6 changes: 6 additions & 0 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,8 @@ github.com/IBM/sarama v1.43.0/go.mod h1:zlE6HEbC/SMQ9mhEYaF7nNLYOUyrs0obySKCckWP
github.com/Jeffail/gabs v1.1.0/go.mod h1:6xMvQMK4k33lb7GUUpaAPh6nKMmemQeg5d4gn7/bOXc=
github.com/JohnCGriffin/overflow v0.0.0-20211019200055-46fa312c352c h1:RGWPOewvKIROun94nF7v2cua9qP+thov/7M50KEoeSU=
github.com/JohnCGriffin/overflow v0.0.0-20211019200055-46fa312c352c/go.mod h1:X0CRv0ky0k6m906ixxpzmDRLvX58TFUKS2eePweuyxk=
github.com/KimMachineGun/automemlimit v0.6.0 h1:p/BXkH+K40Hax+PuWWPQ478hPjsp9h1CPDhLlA3Z37E=
github.com/KimMachineGun/automemlimit v0.6.0/go.mod h1:T7xYht7B8r6AG/AqFcUdc7fzd2bIdBKmepfP2S1svPY=
github.com/Knetic/govaluate v3.0.1-0.20171022003610-9aa49832a739+incompatible/go.mod h1:r7JcOSlj0wfOMncg0iLm8Leh48TZaKVeNIfJntJ2wa0=
github.com/Lusitaniae/apache_exporter v0.11.1-0.20220518131644-f9522724dab4 h1:UEm6tMvLH2t2honcWG0q+h0qr/60++9cuh/fy7hLyMc=
github.com/Lusitaniae/apache_exporter v0.11.1-0.20220518131644-f9522724dab4/go.mod h1:IfUbHkoXypdKnVGHWQ3DLRRRZzIU/JIExQAd9xgxRfw=
Expand Down Expand Up @@ -474,6 +476,8 @@ github.com/cockroachdb/datadriven v0.0.0-20190809214429-80d97fb3cbaa/go.mod h1:z
github.com/codahale/hdrhistogram v0.0.0-20161010025455-3a0bb77429bd/go.mod h1:sE/e/2PUdi/liOCUjSTXgM1o87ZssimdTWN964YiIeI=
github.com/containerd/cgroups v1.1.0 h1:v8rEWFl6EoqHB+swVNjVoCJE8o3jX7e8nqBGPLaDFBM=
github.com/containerd/cgroups v1.1.0/go.mod h1:6ppBcbh/NOOUU+dMKrykgaBnK9lCIBxHqJDGwsa1mIw=
github.com/containerd/cgroups/v3 v3.0.2 h1:f5WFqIVSgo5IZmtTT3qVBo6TzI1ON6sycSBKkymb9L0=
github.com/containerd/cgroups/v3 v3.0.2/go.mod h1:JUgITrzdFqp42uI2ryGA+ge0ap/nxzYgkGmIcetmErE=
github.com/containerd/console v1.0.2/go.mod h1:ytZPjGgY2oeTkAONYafi2kSj0aYggsf8acV1PGKCbzQ=
github.com/containerd/console v1.0.3 h1:lIr7SlA5PxZyMV30bDW0MGbiOPXwc63yRuCP0ARubLw=
github.com/containerd/console v1.0.3/go.mod h1:7LqA/THxQ86k76b8c/EMSiaJ3h1eZkMkXar0TQ1gf3U=
Expand Down Expand Up @@ -1855,6 +1859,8 @@ github.com/pascaldekloe/goe v0.1.0/go.mod h1:lzWF7FIEvWOWxwDKqyGYQf6ZUaNfKdP144T
github.com/patrickmn/go-cache v0.0.0-20180527043350-9f6ff22cfff8/go.mod h1:3Qf8kWWT7OJRJbdiICTKqZju1ZixQ/KpMGzzAfe6+WQ=
github.com/patrickmn/go-cache v2.1.0+incompatible h1:HRMgzkcYKYpi3C8ajMPV8OFXaaRUnok+kx1WdO15EQc=
github.com/patrickmn/go-cache v2.1.0+incompatible/go.mod h1:3Qf8kWWT7OJRJbdiICTKqZju1ZixQ/KpMGzzAfe6+WQ=
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 h1:onHthvaw9LFnH4t2DcNVpwGmV9E1BkGknEliJkfwQj0=
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58/go.mod h1:DXv8WO4yhMYhSNPKjeNKa5WY9YCIEBRbNzFFPJbWO6Y=
github.com/pborman/getopt v0.0.0-20180811024354-2b5b3bfb099b/go.mod h1:85jBQOZwpVEaDAr341tbn15RS4fCAsIst0qp7i8ex1o=
github.com/pborman/uuid v1.2.0/go.mod h1:X/NO0urCmaxf9VXbdlT7C2Yzkj2IKimNn4k+gtPdI/k=
github.com/pelletier/go-toml v1.7.0/go.mod h1:vwGMzjaWMwyfHwgIBhI2YUM4fB6nL6lVAvS1LBMMhTE=
Expand Down
11 changes: 11 additions & 0 deletions internal/alloycli/cmd_run.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@ import (
"context"
"errors"
"fmt"
"github.com/KimMachineGun/automemlimit/memlimit"
"io/fs"
"log/slog"
"os"
"os/signal"
"path/filepath"
Expand Down Expand Up @@ -138,6 +140,7 @@ depending on the nature of the reload error.
BoolVar(&r.disableReporting, "disable-reporting", r.disableReporting, "Disable reporting of enabled components to Grafana.")
cmd.Flags().StringVar(&r.storagePath, "storage.path", r.storagePath, "Base directory where components can store data")
cmd.Flags().Var(&r.minStability, "stability.level", fmt.Sprintf("Minimum stability level of features to enable. Supported values: %s", strings.Join(featuregate.AllowedValues(), ", ")))
cmd.Flags().BoolVar(&r.enableAutoMemoryLimit, "memory.auto-limit", true, "Enables automatic memory limit configuration when using cgroups.")
return cmd
}

Expand All @@ -161,9 +164,11 @@ type alloyRun struct {
configFormat string
configBypassConversionErrors bool
configExtraArgs string
enableAutoMemoryLimit bool
}

func (fr *alloyRun) Run(configPath string) error {

var wg sync.WaitGroup
defer wg.Wait()

Expand Down Expand Up @@ -192,6 +197,12 @@ func (fr *alloyRun) Run(configPath string) error {

level.Info(l).Log("boringcrypto enabled", boringcrypto.Enabled)

// Set the memory limit, this will honor GOMEMLIMIT if set
// If there is a cgroup will follow that
if fr.enableAutoMemoryLimit {
memlimit.SetGoMemLimitWithOpts(memlimit.WithLogger(slog.Default()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should use slog.New(l.Handler()) here instead of slog.Default(); the controller's logger implements the slog.Handler interface. I think this is important so that log entries are emitted in the proper format if the user requests json vs the default.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that. The format isnt set by this point so it errors, its possible we could default to one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could always move it after the logger is resolved but that feels like an odd time to set memory limits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a bug in the logging code, it's supposed to buffer logs until the controller evaluates, but it only seems to do that when you're using the go-kit interface and not the slog.Handler interface. I think it might be worth fixing that so we can use the right logger here.

I don't think defaulting to a format is the right move, since when we used to do this, users complained that some of the early log lines weren't in the format they were expecting and parsing the log lines at ingestion time was failing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth fixing that so we can use the right logger here.

Since we've fixed issues about this in the past I'd say not using the right logger would be a regression here. IMO I think we need to do something to use the right logger, either fixing the buffering issue with the slog.Handler implementation or moving this later in evaluation at least temporarily (I agree it's weird for setting the memory limit to not happen as early as possible)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to fork this out into its own PR, making the logger work with slog is bigger than this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattdurham Is creating a PR to fix the logger still on your radar? Given this was merged first, there's a known regression in main now, and we're running out of time for 1.1 (RC gets cut this Thursday).

If we can't fix this by Thursday we'll have to discuss this feature skipping 1.1 or putting out the release with a known regression.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, wrapping up testing now, it ended up being slightly more complicated than expected.

}

Copy link
Member

@rfratto rfratto Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than exposing this as a new configuration option, what do you think about having this automatically be enabled for users using experimental/public-preview, and then once we get enough information on it, we can decide whether it should always be on or needs to be configurable?

(The larger our API surface is the more annoying backwards compatibility can be, so avoiding new configuration options when possible is preferable for me)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE if stability is experimental automatically enable? I can also set the flag as experimental, though unsure if we have a precedent for that with alloy.

Copy link
Member

@rfratto rfratto Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE if stability is experimental automatically enable?

Yeah, like we would say that "auto memory limit is an experimental performance improvement" and only applies when --stability.level=experimental. This is something I expect we'll be doing at some frequency, including for the eventual controller refactor.

I can also set the flag as experimental

We don't have an existing precedent but I expect that most things will support feature gating at some point, including flags.

However in this case I don't think marking the flag as experimental fully addresses my concerns, since presumably we intend for this to GA, and I prefer not adding too many more GA flags because of the increased API surface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked as public preview and added a helper function.

// Enable the profiling.
setMutexBlockProfiling(l)

Expand Down
Loading