Panic with multiple OTLP exporters for a single opentelemetry batch processor #2448

Open
bengesoff opened this issue Jan 20, 2025 · 3 comments
Labels
bug (Something isn't working)

Comments

@bengesoff

What's wrong?

I just restarted Alloy with a new config that has multiple OTLP exporters and got a nil-pointer panic. I've attached the logs below.

Steps to reproduce

Start Alloy with the config below.

System information

No response

Software version

v1.5.1

Configuration

otelcol.receiver.otlp "default" {
    grpc {
        endpoint = "0.0.0.0:4317"
    }

    http {
        endpoint = "0.0.0.0:4318"
    }

    output {
        logs    = [otelcol.processor.batch.default.input]
        traces  = [otelcol.processor.batch.default.input]
    }
}

otelcol.processor.batch "default" {
    output {
        logs    = [otelcol.exporter.otlphttp.default.input, otelcol.exporter.otlphttp.grafana_cloud_pov.input]
        traces  = [otelcol.exporter.otlphttp.default.input, otelcol.exporter.otlphttp.grafana_cloud_pov.input]
    }
}

// Exporters:
otelcol.auth.basic "default" {
    username = sys.env("ALLOY_OTLP_USERNAME")
    password = sys.env("ALLOY_API_KEY")
}

otelcol.exporter.otlphttp "default" {
    client {
        endpoint = sys.env("ALLOY_OTLP_ENDPOINT")
        auth     = otelcol.auth.basic.default.handler
    }
}

otelcol.auth.basic "grafana_cloud_pov" {
    username = sys.env("ALLOY_OTLP_USERNAME2")
    password = sys.env("ALLOY_API_KEY2")
}

otelcol.exporter.otlphttp "grafana_cloud_pov" {
    client {
        endpoint = sys.env("ALLOY_OTLP_ENDPOINT2")
        auth     = otelcol.auth.basic.grafana_cloud_pov.handler
    }
}

Logs

interrupt received
ts=2025-01-20T07:48:52.455842581Z level=error msg="failed to start reporter" err="context canceled"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x375ad87]

goroutine 1 [running]:
github.com/grafana/alloy/internal/component/otelcol/internal/fanoutconsumer.Traces({0xc0020c17a0, 0x2, 0x0?})
        /src/alloy/internal/component/otelcol/internal/fanoutconsumer/traces.go:46 +0x227
github.com/grafana/alloy/internal/component/otelcol/processor.(*Processor).Update(0xc0000197a0, {0xa54d780?, 0xc004e69ae0?})
        /src/alloy/internal/component/otelcol/processor/processor.go:200 +0x9da
github.com/grafana/alloy/internal/component/otelcol/processor.New({{0x7f1528570668, 0xc003434ba0}, {0xc002dd7980, 0x1f}, {0xc002900, 0xc003438000}, {0xc002ddab80, 0x33}, 0xc0033d3a50, {0xc08e228, ...}, ...}, ...)
        /src/alloy/internal/component/otelcol/processor/processor.go:125 +0x528
github.com/grafana/alloy/internal/component/otelcol/processor/batch.init.0.func1({{0x7f1528570668, 0xc003434ba0}, {0xc002dd7980, 0x1f}, {0xc002900, 0xc003438000}, {0xc002ddab80, 0x33}, 0xc0033d3a50, {0xc08e228, ...}, ...}, ...)
        /src/alloy/internal/component/otelcol/processor/batch/batch.go:28 +0xf9
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).evaluate(0xc002deefc8, 0xc0033d3dd0)
        /src/alloy/internal/runtime/internal/controller/node_builtin_component.go:275 +0x4b6
github.com/grafana/alloy/internal/runtime/internal/controller.(*BuiltinComponentNode).Evaluate(0xc002deefc8, 0x9f5b800?)
        /src/alloy/internal/runtime/internal/controller/node_builtin_component.go:248 +0x1c
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).evaluate(0xc002ddc9c0, {0xc002900, 0xc0034393b0}, {0xc0b8e98, 0xc002deefc8})
        /src/alloy/internal/runtime/internal/controller/loader.go:837 +0x49
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).Apply.func2({0x7f1528570688, 0xc002deefc8})
        /src/alloy/internal/runtime/internal/controller/loader.go:207 +0x1085
github.com/grafana/alloy/internal/runtime/internal/dag.WalkTopological(0xc0034295a0, {0xc003389200, 0x12, 0x117dcd70?}, 0xc004547088)
        /src/alloy/internal/runtime/internal/dag/walk.go:83 +0x222
github.com/grafana/alloy/internal/runtime/internal/controller.(*Loader).Apply(0xc002ddc9c0, {0x0, {0xc003387700, 0x1a, 0x20}, {0x0, 0x0, 0x0}, {0x0, 0x0, ...}, ...})
        /src/alloy/internal/runtime/internal/controller/loader.go:190 +0xb6d
github.com/grafana/alloy/internal/runtime.(*Runtime).applyLoaderConfig(0xc003387300, {0x0, {0xc003387700, 0x1a, 0x20}, {0x0, 0x0, 0x0}, {0x0, 0x0, ...}, ...})
        /src/alloy/internal/runtime/alloy.go:334 +0xd8
github.com/grafana/alloy/internal/runtime.(*Runtime).LoadSource(0xc003387300, 0xc0024547e0, 0x0, {0x7ffe18104d59, 0x17})
        /src/alloy/internal/runtime/alloy.go:307 +0x365
github.com/grafana/alloy/internal/alloycli.(*alloyRun).Run.func5()
        /src/alloy/internal/alloycli/cmd_run.go:359 +0x285
github.com/grafana/alloy/internal/alloycli.(*alloyRun).Run(0xc002dfe140, 0xc00338a908, {0x7ffe18104d59, 0x17})
        /src/alloy/internal/alloycli/cmd_run.go:392 +0x1455
github.com/grafana/alloy/internal/alloycli.runCommand.func1(0xc003386300?, {0xc003297be0?, 0x4?, 0xa8d8ad5?})
        /src/alloy/internal/alloycli/cmd_run.go:105 +0x2e
github.com/spf13/cobra.(*Command).execute(0xc00338a908, {0xc003297bc0, 0x2, 0x2})
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:985 +0xaca
github.com/spf13/cobra.(*Command).ExecuteC(0xc00338a008)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:1041
github.com/grafana/alloy/internal/alloycli.Run()
        /src/alloy/internal/alloycli/alloycli.go:33 +0x2f8
main.main()
        /src/alloy/main.go:35 +0xf
bengesoff added the bug label on Jan 20, 2025
@bengesoff
Author

Found a workaround (fanning out before the batch processor and having two batch processors):

otelcol.receiver.otlp "default" {
    grpc {
        endpoint = "0.0.0.0:4317"
    }

    http {
        endpoint = "0.0.0.0:4318"
    }

    output {
        metrics = [otelcol.processor.batch.default.input, otelcol.processor.batch.pov.input]
        logs    = [otelcol.processor.batch.default.input, otelcol.processor.batch.pov.input]
        traces  = [otelcol.processor.batch.default.input, otelcol.processor.batch.pov.input]
    }
}

otelcol.processor.batch "default" {
    output {
        metrics = [otelcol.exporter.prometheus.default.input]
        logs    = [otelcol.exporter.otlphttp.default.input]
        traces  = [otelcol.exporter.otlphttp.default.input]
    }
}

otelcol.processor.batch "pov" {
    output {
        metrics = [otelcol.exporter.prometheus.default.input]
        logs    = [otelcol.exporter.otlphttp.grafana_cloud_pov.input]
        traces  = [otelcol.exporter.otlphttp.grafana_cloud_pov.input]
    }
}


// Exporters:
otelcol.auth.basic "default" {
    username = sys.env("ALLOY_OTLP_USERNAME")
    password = sys.env("ALLOY_API_KEY")
}

otelcol.exporter.otlphttp "default" {
    client {
        endpoint = sys.env("ALLOY_OTLP_ENDPOINT")
        auth     = otelcol.auth.basic.default.handler
    }
}

otelcol.auth.basic "grafana_cloud_pov" {
    username = sys.env("ALLOY_OTLP_USERNAME2")
    password = sys.env("ALLOY_API_KEY2")
}

otelcol.exporter.otlphttp "grafana_cloud_pov" {
    client {
        endpoint = sys.env("ALLOY_OTLP_ENDPOINT2")
        auth     = otelcol.auth.basic.grafana_cloud_pov.handler
    }
}

@wildum
Contributor

wildum commented Jan 21, 2025

Hi, thanks for opening a ticket. I have not been able to reproduce the bug so far: I ran the exact same config on v1.5.1 but did not get a panic.

  • Do you always get a panic when you try to run the config?
  • Do you also get a panic if you replace the sys.env() calls with literal "foobar" strings? (This would show whether the loaded environment variables have an impact.)
  • The next time you trigger the panic, could you please share the full logs from the very start of Alloy?

@tpaschalis
Member

tpaschalis commented Jan 21, 2025

I think I've hit a similar panic before (grafana/agent#6746), but we'd solved it.

It looks like the offending line in fanoutconsumer/traces.go:46 should be protected by the nilness check a few lines above, so I'm wondering whether there's a reason the consumer had indeed exited. As William mentioned, logs might help provide some context around the behavior of the individual components.
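
To make the guard concrete, here's a minimal, self-contained Go sketch of the pattern I mean. It is illustrative only (not Alloy's actual fanoutconsumer code), and every name in it (tracesConsumer, newFanout, noopConsumer) is made up:

// Illustrative sketch only: this is not Alloy's fanoutconsumer implementation.
// It shows why the nil check matters: calling a method through a nil
// interface value in the fan-out list panics with the same
// "invalid memory address or nil pointer dereference" seen in the stack trace.
package main

import (
    "context"
    "errors"
    "fmt"
)

// tracesConsumer is a hypothetical stand-in for an OTLP traces consumer.
type tracesConsumer interface {
    ConsumeTraces(ctx context.Context, batch string) error
}

// noopConsumer accepts and discards everything.
type noopConsumer struct{}

func (noopConsumer) ConsumeTraces(context.Context, string) error { return nil }

// fanout forwards each batch to all of its children.
type fanout struct {
    children []tracesConsumer
}

// newFanout builds the child list defensively: nil entries (for example, an
// exporter that has not been evaluated yet) are skipped, and an empty list
// degrades to a no-op consumer instead of ever being dereferenced.
func newFanout(in []tracesConsumer) tracesConsumer {
    children := make([]tracesConsumer, 0, len(in))
    for _, c := range in {
        if c == nil {
            continue // without this guard, ConsumeTraces below would panic
        }
        children = append(children, c)
    }
    if len(children) == 0 {
        return noopConsumer{}
    }
    return &fanout{children: children}
}

func (f *fanout) ConsumeTraces(ctx context.Context, batch string) error {
    var errs error
    for _, c := range f.children {
        errs = errors.Join(errs, c.ConsumeTraces(ctx, batch))
    }
    return errs
}

func main() {
    // Two downstream consumers are wired up, but one of them is still nil.
    fo := newFanout([]tracesConsumer{noopConsumer{}, nil})
    fmt.Println(fo.ConsumeTraces(context.Background(), "example batch"))
}

Removing the nil check and re-running main reproduces the same class of panic, so if Alloy hit this in fanoutconsumer/traces.go:46, it suggests a nil consumer somehow made it past (or around) the existing guard.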
