Implement an initial version of the support bundle in Alloy (#2009)
* Implement an initial version of the support bundle in Alloy

* Add documentation for support bundle

* Update changelog

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Initial PR feedback

* Rewrite http service to use logging library internal to alloy

* Revert accidental commit of e2e test changes

* Fix comment on exported function

* Clean up added host variable that is no longer used

* Refactor usage of logger in http service

* Update internal/service/http/http.go

Co-authored-by: Piotr <[email protected]>

* implement PR feedback

* Hide support bundle behind public preview stability level

* Update docs based on feedback

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* Update docs/sources/troubleshoot/support_bundle.md

Co-authored-by: Clayton Cornell <[email protected]>

* More PR feedback in docs

* Fix race condition in logger

* Add a note about backward-compatibility exception

---------

Co-authored-by: Clayton Cornell <[email protected]>
Co-authored-by: Piotr <[email protected]>
3 people authored Nov 11, 2024
1 parent f3108e7 commit 3cf2bcd
Showing 10 changed files with 445 additions and 27 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -32,6 +32,8 @@ v1.5.0-rc.0

### Features

- Add support bundle generation via the API endpoint /-/support (@dehaansa)

- Add the function `path_join` to the stdlib. (@wildum)

- Add `pyroscope.receive_http` component to receive and forward Pyroscope profiles (@marcsanmi)
2 changes: 2 additions & 0 deletions docs/sources/reference/cli/run.md
@@ -42,6 +42,7 @@ The following flags are supported:
* `--server.http.ui-path-prefix`: Base path where the UI is exposed (default `/`).
* `--storage.path`: Base directory where components can store data (default `data-alloy/`).
* `--disable-reporting`: Disable [data collection][] (default `false`).
* `--disable-support-bundle`: Disable [support bundle][] endpoint (default `false`).
* `--cluster.enabled`: Start {{< param "PRODUCT_NAME" >}} in clustered mode (default `false`).
* `--cluster.node-name`: The name to use for this node (defaults to the environment's hostname).
* `--cluster.join-addresses`: Comma-separated list of addresses to join the cluster at (default `""`). Mutually exclusive with `--cluster.discover-peers`.
@@ -178,6 +179,7 @@ Refer to [alloy convert][] for more details on how `extra-args` work.
[go-discover]: https://github.com/hashicorp/go-discover
[in-memory HTTP traffic]: ../../../get-started/component_controller/#in-memory-traffic
[data collection]: ../../../data-collection/
[support bundle]: ../../../troubleshoot/support_bundle
[components]: ../../get-started/components/
[component controller]: ../../../get-started/component_controller/
[UI]: ../../../troubleshoot/debug/#clustering-page
51 changes: 51 additions & 0 deletions docs/sources/troubleshoot/support_bundle.md
@@ -0,0 +1,51 @@
---
canonical: https://grafana.com/docs/alloy/latest/troubleshoot/support_bundle/
description: Learn how to generate a support bundle
title: Generate a support bundle
menuTitle: Generate a support bundle
weight: 300
---

<span class="badge docs-labels__stage docs-labels__item">Public preview</span>

# Generate a support bundle

{{< docs/public-preview product="Generate support bundle" >}}

The `/-/support?duration=N` endpoint returns a support bundle, a zip file that contains information
about a running {{< param "PRODUCT_NAME" >}} instance, and can be used as a baseline of information when trying
to debug an issue.

This feature is not covered by our [backward-compatibility][backward-compatibility] guarantees.

{{< admonition type="note" >}}
This endpoint is enabled by default, but may be disabled using the `--disable-support-bundle` runtime flag.
{{< /admonition >}}

The duration parameter is optional, must be less than or equal to the
configured HTTP server write timeout, and if not provided, defaults to it.
The endpoint is only exposed to the {{< param "PRODUCT_NAME" >}} HTTP server listen address, which
defaults to `localhost:12345`.

The support bundle contains all information in plain text, so you can
inspect it before sharing to verify that no sensitive information has leaked.

In addition, you can inspect the [supportbundle implementation](https://github.com/grafana/alloy/tree/internal/service/http/supportbundle.go)
to verify the code used to generate these bundles.

A support bundle contains the following data:
* `alloy-components.json` contains information about the [components][components] running on this {{< param "PRODUCT_NAME" >}} instance, generated by the
`/api/v0/web/components` endpoint.
* `alloy-logs.txt` contains the logs during the bundle generation.
* `alloy-metadata.yaml` contains the {{< param "PRODUCT_NAME" >}} build version and the installation's operating system, architecture, and uptime.
* `alloy-metrics.txt` contains a snapshot of the internal metrics for {{< param "PRODUCT_NAME" >}}.
* `alloy-peers.json` contains information about the identified cluster peers of this {{< param "PRODUCT_NAME" >}} instance, generated by the
`/api/v0/web/peers` endpoint.
* `alloy-runtime-flags.txt` contains the values of the runtime flags available in {{< param "PRODUCT_NAME" >}}.
* The `pprof/` directory contains Go runtime profiling data (CPU, heap, goroutine, mutex, block profiles) as exported by the pprof package.
Refer to the [profile][profile] documentation for more details on how to use this information.

[profile]: ../profile
[components]: ../../get-started/components/
[alloy-repo]: https://github.com/grafana/alloy/issues
[backward-compatibility]: ../../introduction/backward-compatibility
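
The documentation above describes fetching a zip file from `/-/support` and inspecting its contents before sharing it. A minimal sketch of a client that does this follows. It assumes that `duration` is a whole number of seconds, which the text does not confirm. The names `fetchBundle`, `listBundle`, `fakeBundleServer`, and `demoList` are hypothetical and not part of the commit, and the fake server exists only so the example runs without a live instance.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
)

// fetchBundle requests a support bundle from baseURL and returns the raw zip
// bytes. A positive seconds value is sent as the optional duration query
// parameter (assumed here to be in seconds); zero omits the parameter.
func fetchBundle(baseURL string, seconds int) ([]byte, error) {
	url := baseURL + "/-/support"
	if seconds > 0 {
		url = fmt.Sprintf("%s?duration=%d", url, seconds)
	}
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// listBundle returns the sorted names of the files stored in a zip archive.
func listBundle(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	sort.Strings(names)
	return names, nil
}

// fakeBundleServer is a hypothetical stand-in for a running instance.
// It serves a tiny zip with placeholder files at /-/support.
func fakeBundleServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/-/support" {
			http.NotFound(w, r)
			return
		}
		zw := zip.NewWriter(w)
		for _, name := range []string{"alloy-metadata.yaml", "alloy-logs.txt"} {
			f, err := zw.Create(name)
			if err != nil {
				return
			}
			f.Write([]byte("placeholder\n"))
		}
		zw.Close()
	}))
}

// demoList fetches a bundle from the fake server and lists its files.
func demoList() ([]string, error) {
	srv := fakeBundleServer()
	defer srv.Close()
	data, err := fetchBundle(srv.URL, 5)
	if err != nil {
		return nil, err
	}
	return listBundle(data)
}

func main() {
	names, err := demoList()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

Against a real instance, `fetchBundle` would be called with the HTTP listen address from the documentation, for example `http://localhost:12345`, and the returned bytes could be written to disk for manual review.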
2 changes: 2 additions & 0 deletions go.mod
@@ -845,6 +845,8 @@ require (
go.opentelemetry.io/otel/exporters/stdout/stdoutlog v0.7.0 // indirect
)

require github.com/mackerelio/go-osstat v0.2.5

// NOTE: replace directives below must always be *temporary*.
//
// Adding a replace directive to change a module to a fork of a module will
2 changes: 2 additions & 0 deletions go.sum
@@ -1702,6 +1702,8 @@ github.com/lufia/plan9stats v0.0.0-20220913051719-115f729f3c8c h1:VtwQ41oftZwlMn
github.com/lufia/plan9stats v0.0.0-20220913051719-115f729f3c8c/go.mod h1:JKx41uQRwqlTZabZc+kILPrO/3jlKnQ2Z8b7YiVw5cE=
github.com/lyft/protoc-gen-validate v0.0.0-20180911180927-64fcb82c878e/go.mod h1:XbGvPuh87YZc5TdIa2/I4pLk0QoUACkjt2znoq26NVQ=
github.com/lyft/protoc-gen-validate v0.0.13/go.mod h1:XbGvPuh87YZc5TdIa2/I4pLk0QoUACkjt2znoq26NVQ=
github.com/mackerelio/go-osstat v0.2.5 h1:+MqTbZUhoIt4m8qzkVoXUJg1EuifwlAJSk4Yl2GXh+o=
github.com/mackerelio/go-osstat v0.2.5/go.mod h1:atxwWF+POUZcdtR1wnsUcQxTytoHG4uhl2AKKzrOajY=
github.com/magefile/mage v1.15.0 h1:BvGheCMAsG3bWUDbZ8AyXXpCNwU9u5CB6sM+HNb9HYg=
github.com/magefile/mage v1.15.0/go.mod h1:z5UZb/iS3GoOSn0JgWuiw7dxlurVYTu+/jHXqQg881A=
github.com/magiconair/properties v1.8.1/go.mod h1:PppfXfuXeibc/6YijjN8zIbojt8czPbwD3XqdrwzmxQ=
23 changes: 20 additions & 3 deletions internal/alloycli/cmd_run.go
@@ -23,6 +23,7 @@ import (
"github.com/grafana/ckit/peer"
"github.com/prometheus/client_golang/prometheus"
"github.com/spf13/cobra"
"github.com/spf13/pflag"
"go.opentelemetry.io/otel"
"golang.org/x/exp/maps"

@@ -64,6 +65,7 @@ func runCommand() *cobra.Command {
clusterAdvInterfaces: advertise.DefaultInterfaces,
clusterMaxJoinPeers: 5,
clusterRejoinInterval: 60 * time.Second,
disableSupportBundle: false,
}

cmd := &cobra.Command{
@@ -100,7 +102,7 @@ depending on the nature of the reload error.
SilenceUsage: true,

RunE: func(cmd *cobra.Command, args []string) error {
- return r.Run(args[0])
+ return r.Run(cmd, args[0])
},
}

@@ -111,6 +113,8 @@ depending on the nature of the reload error.
cmd.Flags().StringVar(&r.uiPrefix, "server.http.ui-path-prefix", r.uiPrefix, "Prefix to serve the HTTP UI at")
cmd.Flags().
BoolVar(&r.enablePprof, "server.http.enable-pprof", r.enablePprof, "Enable /debug/pprof profiling endpoints.")
cmd.Flags().
BoolVar(&r.disableSupportBundle, "server.http.disable-support-bundle", r.disableSupportBundle, "Disable /-/support support bundle retrieval.")

// Cluster flags
cmd.Flags().
@@ -184,9 +188,10 @@ type alloyRun struct {
configBypassConversionErrors bool
configExtraArgs string
enableCommunityComps bool
disableSupportBundle bool
}

- func (fr *alloyRun) Run(configPath string) error {
+ func (fr *alloyRun) Run(cmd *cobra.Command, configPath string) error {
var wg sync.WaitGroup
defer wg.Wait()

@@ -275,8 +280,15 @@ func (fr *alloyRun) Run(configPath string) error {
return err
}

runtimeFlags := []string{}
if !fr.disableSupportBundle {
cmd.Flags().VisitAll(func(f *pflag.Flag) {
runtimeFlags = append(runtimeFlags, fmt.Sprintf("%s=%s", f.Name, f.Value.String()))
})
}

httpService := httpservice.New(httpservice.Options{
- Logger: log.With(l, "service", "http"),
+ Logger: l,
Tracer: t,
Gatherer: prometheus.DefaultGatherer,

@@ -286,6 +298,11 @@ func (fr *alloyRun) Run(configPath string) error {
HTTPListenAddr: fr.httpListenAddr,
MemoryListenAddr: fr.inMemoryAddr,
EnablePProf: fr.enablePprof,
MinStability: fr.minStability,
BundleContext: httpservice.SupportBundleContext{
RuntimeFlags: runtimeFlags,
DisableSupportBundle: fr.disableSupportBundle,
},
})

remoteCfgService, err := remotecfgservice.New(remotecfgservice.Options{
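
The change above collects every registered flag as a `name=value` string through `VisitAll`, so the bundle records defaults as well as explicitly set values. The following sketch shows the same pattern under one loudly stated assumption: it uses the standard library `flag` package as a stand-in for `pflag`, because both expose a `VisitAll` method with the same shape. `snapshotFlags` and `demoFlags` are hypothetical names, and the two flags registered here are only illustrative.

```go
package main

import (
	"flag"
	"fmt"
)

// snapshotFlags returns every flag registered on fs as "name=value",
// including flags that were left at their default values.
func snapshotFlags(fs *flag.FlagSet) []string {
	var out []string
	fs.VisitAll(func(f *flag.Flag) {
		out = append(out, fmt.Sprintf("%s=%s", f.Name, f.Value.String()))
	})
	return out
}

// demoFlags builds a small, hypothetical flag set, parses args, and
// returns the resulting snapshot.
func demoFlags(args []string) ([]string, error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.Bool("disable-support-bundle", false, "illustrative flag")
	fs.String("storage.path", "data-alloy/", "illustrative flag")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return snapshotFlags(fs), nil
}

func main() {
	flags, err := demoFlags([]string{"-storage.path=/tmp/alloy"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range flags {
		fmt.Println(f)
	}
}
```

`VisitAll` differs from `Visit` in that it also reports flags that were never set on the command line, so the snapshot contains defaults too.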
71 changes: 61 additions & 10 deletions internal/runtime/logging/logger.go
@@ -55,6 +55,12 @@ func New(w io.Writer, o Options) (*Logger, error) {
return l, nil
}

// NewNop returns a logger that does nothing
func NewNop() *Logger {
l, _ := NewDeferred(io.Discard)
return l
}

// NewDeferred creates a new logger with the default log level and format.
// The logger is not updated during initialization.
func NewDeferred(w io.Writer) (*Logger, error) {
@@ -63,7 +69,6 @@ func NewDeferred(w io.Writer) (*Logger, error) {
format formatVar
writer writerVar
)

l := &Logger{
inner: w,

@@ -104,11 +109,10 @@ func (l *Logger) Update(o Options) error {
l.level.Set(slogLevel(o.Level).Level())
l.format.Set(o.Format)

- newWriter := l.inner
+ l.writer.SetInnerWriter(l.inner)
if len(o.WriteTo) > 0 {
- newWriter = io.MultiWriter(l.inner, &lokiWriter{o.WriteTo})
+ l.writer.SetLokiWriter(&lokiWriter{o.WriteTo})
}
- l.writer.Set(newWriter)

// Build all our deferred handlers
if l.deferredSlog != nil {
@@ -133,6 +137,14 @@ func (l *Logger) Update(o Options) error {
return nil
}

func (l *Logger) SetTemporaryWriter(w io.Writer) {
l.writer.SetTemporaryWriter(w)
}

func (l *Logger) RemoveTemporaryWriter() {
l.writer.RemoveTemporaryWriter()
}

// Log implements log.Logger.
func (l *Logger) Log(kvps ...interface{}) error {
// Buffer logs before confirming log format is configured in `logging` block
@@ -215,24 +227,63 @@ func (f *formatVar) Set(format Format) {

type writerVar struct {
mut sync.RWMutex
- w io.Writer
+
+ lokiWriter *lokiWriter
+ innerWriter io.Writer
+ tmpWriter io.Writer
}

- func (w *writerVar) Set(inner io.Writer) {
+ func (w *writerVar) SetTemporaryWriter(writer io.Writer) {
w.mut.Lock()
defer w.mut.Unlock()
- w.w = inner
+ w.tmpWriter = writer
}

- func (w *writerVar) Write(p []byte) (n int, err error) {
+ func (w *writerVar) RemoveTemporaryWriter() {
+ w.mut.Lock()
+ defer w.mut.Unlock()
+ w.tmpWriter = nil
+ }
+
+ func (w *writerVar) SetInnerWriter(writer io.Writer) {
+ w.mut.Lock()
+ defer w.mut.Unlock()
+ w.innerWriter = writer
+ }
+
+ func (w *writerVar) SetLokiWriter(writer *lokiWriter) {
+ w.mut.Lock()
+ defer w.mut.Unlock()
+ w.lokiWriter = writer
+ }
+
+ func (w *writerVar) Write(p []byte) (int, error) {
w.mut.RLock()
defer w.mut.RUnlock()

- if w.w == nil {
+ if w.innerWriter == nil {
return 0, fmt.Errorf("no writer available")
}

- return w.w.Write(p)
+ // The following is effectively an io.Multiwriter, but without updating
+ // the Multiwriter each time tmpWriter is added or removed.
+ if _, err := w.innerWriter.Write(p); err != nil {
+ return 0, err
+ }
+
+ if w.lokiWriter != nil {
+ if _, err := w.lokiWriter.Write(p); err != nil {
+ return 0, err
+ }
+ }
+
+ if w.tmpWriter != nil {
+ if _, err := w.tmpWriter.Write(p); err != nil {
+ return 0, err
+ }
+ }
+
+ return len(p), nil
}

type bufferedItem struct {
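
The `writerVar` change above replaces a single swappable writer with a fixed fan-out, so a temporary writer can be attached and detached without rebuilding an `io.MultiWriter`. A reduced sketch of that idea follows. It is not the commit's code: `fanoutWriter`, `SetTemporary`, `RemoveTemporary`, and `captureDemo` are hypothetical names, and the Loki writer is omitted to keep the example short.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"sync"
)

// fanoutWriter always writes to primary and, while one is attached,
// also writes to a temporary writer.
type fanoutWriter struct {
	mut     sync.RWMutex
	primary io.Writer
	tmp     io.Writer
}

// SetTemporary attaches a writer that receives a copy of every write.
func (w *fanoutWriter) SetTemporary(t io.Writer) {
	w.mut.Lock()
	defer w.mut.Unlock()
	w.tmp = t
}

// RemoveTemporary detaches the temporary writer, if any.
func (w *fanoutWriter) RemoveTemporary() {
	w.mut.Lock()
	defer w.mut.Unlock()
	w.tmp = nil
}

// Write sends p to the primary writer, then to the temporary writer if
// one is attached, and stops at the first error.
func (w *fanoutWriter) Write(p []byte) (int, error) {
	w.mut.RLock()
	defer w.mut.RUnlock()

	if w.primary == nil {
		return 0, errors.New("no writer available")
	}
	if _, err := w.primary.Write(p); err != nil {
		return 0, err
	}
	if w.tmp != nil {
		if _, err := w.tmp.Write(p); err != nil {
			return 0, err
		}
	}
	return len(p), nil
}

// captureDemo writes three lines and captures only the one written
// while the temporary writer is attached.
func captureDemo() (all string, captured string) {
	var primary, bundle bytes.Buffer
	w := &fanoutWriter{primary: &primary}

	fmt.Fprint(w, "before\n")
	w.SetTemporary(&bundle)
	fmt.Fprint(w, "during\n")
	w.RemoveTemporary()
	fmt.Fprint(w, "after\n")

	return primary.String(), bundle.String()
}

func main() {
	all, captured := captureDemo()
	fmt.Print("primary:\n", all)
	fmt.Print("captured:\n", captured)
}
```

One design point worth noting: the read lock in `Write` protects the writer fields, but it does not serialize concurrent writes to the underlying writers. In this sketch the demo is single-goroutine; with concurrent loggers, the attached writers would need to be safe for concurrent use themselves.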