Inject SPLUNK_ACCESS_TOKEN as secret #129

Open
dude0001 opened this issue Mar 23, 2023 · 6 comments

dude0001 commented Mar 23, 2023

In our environment, we are asked not to put the ingestion token as plaintext in the SPLUNK_ACCESS_TOKEN environment variable, since anyone who can describe the Lambda in the AWS console or via the APIs can read it. To work around this, we created our own Lambda layer that is a Lambda execution wrapper around the Splunk-provided wrapper. Our wrapper expects an AWS Secrets Manager ARN as an environment variable. It fetches the secret, parses out the token, and sets the SPLUNK_ACCESS_TOKEN environment variable. It then calls the Splunk wrapper to continue as normal.
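For illustration, here is a minimal sketch of what our wrapper does (the environment variable name SPLUNK_TOKEN_SECRET_ARN and the path to the Splunk-provided wrapper are placeholders, not the actual values):

#!/usr/bin/env python3
# Minimal sketch of our secret-fetching exec wrapper. The environment
# variable name and the Splunk wrapper path below are placeholders.
import os
import sys

import boto3  # included in the AWS Lambda Python runtime


def main() -> None:
    # ARN of the Secrets Manager secret holding the ingest token (placeholder name).
    secret_arn = os.environ["SPLUNK_TOKEN_SECRET_ARN"]

    # Fetch the secret and expose the token to everything started after this point.
    client = boto3.client("secretsmanager")
    token = client.get_secret_value(SecretId=secret_arn)["SecretString"]
    os.environ["SPLUNK_ACCESS_TOKEN"] = token

    # Hand off to the Splunk-provided wrapper (placeholder path), passing
    # through the runtime command line so startup continues as normal.
    splunk_wrapper = "/opt/splunk-wrapper"
    os.execv(splunk_wrapper, [splunk_wrapper] + sys.argv[1:])


if __name__ == "__main__":
    main()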

The change in #114 has broken this flow for us. It looks like the OTEL collector starts up before our own wrapper is able to execute and set up the environment variable.

Is there a way we can delay the OTEL collector starting up? Is there another way to keep the token secret and out of the AWS Lambda console as plaintext?

Or can a mechanism be added to the Lambda Layer that fetches the token from a secret supplied as an environment variable? The script could either use the plaintext value of the secret, or expect JSON and use syntax similar to what AWS ECS uses, where the secret value is JSON and the reference specifies which key to pull the token from, e.g. arn:aws:secretsmanager:region:aws_account_id:secret:secret-name:json-key:version-stage:version-id.
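To make this concrete, here is a rough sketch of how such a reference could be resolved (this is not an existing feature of the Layer; the function name and parsing are only illustrative). It uses the plain secret value when no json-key is given and pulls a key out of a JSON secret otherwise:

# Rough sketch of the proposed token resolution (illustrative only; not part
# of the Layer today).
import json

import boto3


def resolve_splunk_access_token(secret_ref: str) -> str:
    # A plain secret ARN has 7 colon-separated fields. Anything beyond that
    # follows the ECS-style convention: json-key, version-stage, version-id.
    parts = secret_ref.split(":")
    base_arn = ":".join(parts[:7])
    json_key = parts[7] if len(parts) > 7 and parts[7] else None
    version_stage = parts[8] if len(parts) > 8 and parts[8] else None
    version_id = parts[9] if len(parts) > 9 and parts[9] else None

    kwargs = {"SecretId": base_arn}
    if version_stage:
        kwargs["VersionStage"] = version_stage
    if version_id:
        kwargs["VersionId"] = version_id

    secret_string = boto3.client("secretsmanager").get_secret_value(**kwargs)["SecretString"]

    if json_key is None:
        # Plaintext secret: the whole value is the token.
        return secret_string
    # JSON secret: pull the requested key out of the document.
    return json.loads(secret_string)[json_key]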

Our wrapper mechanism works with arn:aws:lambda:us-east-2:254067382080:layer:splunk-apm:222. Trying it with the latest version of the Lambda Layer, arn:aws:lambda:us-east-2:254067382080:layer:splunk-apm:365, this is what we see in the logs.

Lambda starts up

INIT_START Runtime Version: python:3.9.v18	Runtime Version ARN: arn:aws:lambda:us-east-2::runtime:edb5a058bfa782cb9cedc6d534ac8b8c193bc28e9a9879d9f5ebaaf619cd0fc0

We see this error, which we have always gotten and which doesn't seem to cause a problem, but it would be nice if we didn't see it.

2023/03/23 01:13:16 [ERROR] Exporter endpoint must be set when SPLUNK_REALM is not set. To export data, set either a realm and access token or a custom exporter endpoint.

The commit SHA of the Splunk wrapper is logged

[splunk-extension-wrapper] splunk-extension-wrapper, version: 4552de7

The OTEL collector listening on localhost starts up successfully. The SPLUNK_ACCESS_TOKEN is not set yet in our case.

{
    "level": "info",
    "ts": 1679533996.8630877,
    "msg": "Launching OpenTelemetry Lambda extension",
    "version": "v0.69.1"
}

{
    "level": "info",
    "ts": 1679533996.8672311,
    "logger": "telemetryAPI.Listener",
    "msg": "Listening for requests",
    "address": "sandbox:53612"
}

{
    "level": "info",
    "ts": 1679533996.8673244,
    "logger": "telemetryAPI.Client",
    "msg": "Subscribing",
    "baseURL": "http://127.0.0.1:9001/2022-07-01/telemetry"
}

TELEMETRY	Name: collector	State: Subscribed	Types: [Platform]
{
    "level": "info",
    "ts": 1679533996.8688502,
    "logger": "telemetryAPI.Client",
    "msg": "Subscription success",
    "response": "\"OK\""
}

{
    "level": "info",
    "ts": 1679533996.874017,
    "caller": "service/telemetry.go:90",
    "msg": "Setting up own telemetry..."
}

{
    "level": "Basic",
    "ts": 1679533996.8743467,
    "caller": "service/telemetry.go:116",
    "msg": "Serving Prometheus metrics",
    "address": ":8888"
}

{
    "level": "info",
    "ts": 1679533996.8772216,
    "caller": "service/service.go:128",
    "msg": "Starting otelcol-lambda...",
    "Version": "v0.69.1",
    "NumCPU": 2
}

{
    "level": "info",
    "ts": 1679533996.8773112,
    "caller": "extensions/extensions.go:41",
    "msg": "Starting extensions..."
}

{
    "level": "info",
    "ts": 1679533996.8773668,
    "caller": "service/pipelines.go:86",
    "msg": "Starting exporters..."
}

{
    "level": "info",
    "ts": 1679533996.877425,
    "caller": "service/pipelines.go:90",
    "msg": "Exporter is starting...",
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlphttp"
}

{
    "level": "info",
    "ts": 1679533996.8788476,
    "caller": "service/pipelines.go:94",
    "msg": "Exporter started.",
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlphttp"
}

{
    "level": "info",
    "ts": 1679533996.8789244,
    "caller": "service/pipelines.go:98",
    "msg": "Starting processors..."
}

{
    "level": "info",
    "ts": 1679533996.8789926,
    "caller": "service/pipelines.go:110",
    "msg": "Starting receivers..."
}

{
    "level": "info",
    "ts": 1679533996.8790362,
    "caller": "service/pipelines.go:114",
    "msg": "Receiver is starting...",
    "kind": "receiver",
    "name": "otlp",
    "pipeline": "traces"
}

{
    "level": "warn",
    "ts": 1679533996.8790877,
    "caller": "internal/warning.go:51",
    "msg": "Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks",
    "kind": "receiver",
    "name": "otlp",
    "pipeline": "traces",
    "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}

{
    "level": "info",
    "ts": 1679533996.8791919,
    "caller": "[email protected]/otlp.go:94",
    "msg": "Starting GRPC server",
    "kind": "receiver",
    "name": "otlp",
    "pipeline": "traces",
    "endpoint": "0.0.0.0:4317"
}

{
    "level": "warn",
    "ts": 1679533996.8792677,
    "caller": "internal/warning.go:51",
    "msg": "Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks",
    "kind": "receiver",
    "name": "otlp",
    "pipeline": "traces",
    "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"
}

{
    "level": "info",
    "ts": 1679533996.8793197,
    "caller": "[email protected]/otlp.go:112",
    "msg": "Starting HTTP server",
    "kind": "receiver",
    "name": "otlp",
    "pipeline": "traces",
    "endpoint": "0.0.0.0:4318"
}

{
    "level": "info",
    "ts": 1679533996.879386,
    "caller": "service/pipelines.go:118",
    "msg": "Receiver started.",
    "kind": "receiver",
    "name": "otlp",
    "pipeline": "traces"
}

{
    "level": "info",
    "ts": 1679533996.8794274,
    "caller": "service/service.go:145",
    "msg": "Everything is ready. Begin running and processing data."
}

Our own wrapper starts executing, fetching the token from the input secret and setting the SPLUNK_ACCESS_TOKEN environment variable

[WRAPPER] - INFO - START
[WRAPPER] - INFO - Fetching Splunk token
[WRAPPER] - INFO - Fetching arn:aws:secretsmanager:us-east-2:my-aws-acct-id:secret:splunk-token-secret
[WRAPPER] - INFO - END

The Splunk extension wrapper begins executing as called from our own wrapper. With the change in #114, this script unsets SPLUNK_ACCESS_TOKEN, so traces are sent to the localhost collector that is expected to already be set up with the token.

EXTENSION	Name: collector	State: Ready	Events: [INVOKE, SHUTDOWN]
EXTENSION	Name: splunk-extension-wrapper	State: Ready	Events: [INVOKE, SHUTDOWN]

We get a request. Ingesting traces through the localhost collector fails with 401 Unauthorized, and the retries eventually time out the Lambda.

START RequestId: 2bdc5088-8c42-42eb-9013-79f41f191fd4 Version: $LATEST

[WARNING]	2023-03-23T01:13:20.564Z	2bdc5088-8c42-42eb-9013-79f41f191fd4	Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
{
    "level": "error",
    "ts": 1679534000.7317784,
    "caller": "exporterhelper/queued_retry.go:394",
    "msg": "Exporting failed. The error is not retryable. Dropping data.",
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlphttp",
    "error": "Permanent error: error exporting items, request to https://ingest.us1.signalfx.com:443/v2/trace/otlp responded with HTTP Status Code 401",
    "dropped_items": 8,
    "stacktrace": "go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:394\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:137\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:294\ngo.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:116\ngo.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces\n\tgo.opentelemetry.io/collector/[email protected]/traces.go:36\ngo.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export\n\tgo.opentelemetry.io/collector/receiver/[email protected]/internal/trace/otlp.go:55\ngo.opentelemetry.io/collector/receiver/otlpreceiver.handleTraces\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlphttp.go:47\ngo.opentelemetry.io/collector/receiver/otlpreceiver.(*otlpReceiver).registerTraceConsumer.func1\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlp.go:210\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2462\ngo.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1\n\tgo.opentelemetry.io/[email protected]/config/confighttp/compression.go:162\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:210\ngo.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP\n\tgo.opentelemetry.io/[email protected]/config/confighttp/clientinfohandler.go:39\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2916\nnet/http.(*conn).serve\n\tnet/http/server.go:1966"
}

{
    "level": "error",
    "ts": 1679534000.731938,
    "caller": "exporterhelper/queued_retry.go:296",
    "msg": "Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.",
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlphttp",
    "dropped_items": 8,
    "stacktrace": "go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:296\ngo.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:116\ngo.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces\n\tgo.opentelemetry.io/collector/[email protected]/traces.go:36\ngo.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export\n\tgo.opentelemetry.io/collector/receiver/[email protected]/internal/trace/otlp.go:55\ngo.opentelemetry.io/collector/receiver/otlpreceiver.handleTraces\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlphttp.go:47\ngo.opentelemetry.io/collector/receiver/otlpreceiver.(*otlpReceiver).registerTraceConsumer.func1\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlp.go:210\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2462\ngo.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1\n\tgo.opentelemetry.io/[email protected]/config/confighttp/compression.go:162\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:210\ngo.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP\n\tgo.opentelemetry.io/[email protected]/config/confighttp/clientinfohandler.go:39\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2916\nnet/http.(*conn).serve\n\tnet/http/server.go:1966"
}

[WARNING]	2023-03-23T01:13:20.734Z	2bdc5088-8c42-42eb-9013-79f41f191fd4	Transient error Internal Server Error encountered while exporting span batch, retrying in 1s.
[WARNING]	2023-03-23T01:13:21.797Z	2bdc5088-8c42-42eb-9013-79f41f191fd4	Transient error Internal Server Error encountered while exporting span batch, retrying in 2s.
[WARNING]	2023-03-23T01:13:23.856Z	2bdc5088-8c42-42eb-9013-79f41f191fd4	Transient error Internal Server Error encountered while exporting span batch, retrying in 4s.
[WARNING]	2023-03-23T01:13:27.919Z	2bdc5088-8c42-42eb-9013-79f41f191fd4	Transient error Internal Server Error encountered while exporting span batch, retrying in 8s.
{
    "level": "error",
    "ts": 1679534015.9845555,
    "caller": "exporterhelper/queued_retry.go:394",
    "msg": "Exporting failed. The error is not retryable. Dropping data.",
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlphttp",
    "error": "Permanent error: error exporting items, request to https://ingest.us1.signalfx.com:443/v2/trace/otlp responded with HTTP Status Code 401",
    "dropped_items": 8,
    "stacktrace": "go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:394\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*tracesExporterWithObservability).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:137\ngo.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:294\ngo.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:116\ngo.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces\n\tgo.opentelemetry.io/collector/[email protected]/traces.go:36\ngo.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export\n\tgo.opentelemetry.io/collector/receiver/[email protected]/internal/trace/otlp.go:55\ngo.opentelemetry.io/collector/receiver/otlpreceiver.handleTraces\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlphttp.go:47\ngo.opentelemetry.io/collector/receiver/otlpreceiver.(*otlpReceiver).registerTraceConsumer.func1\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlp.go:210\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2462\ngo.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1\n\tgo.opentelemetry.io/[email protected]/config/confighttp/compression.go:162\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:210\ngo.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP\n\tgo.opentelemetry.io/[email protected]/config/confighttp/clientinfohandler.go:39\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2916\nnet/http.(*conn).serve\n\tnet/http/server.go:1966"
}

{
    "level": "error",
    "ts": 1679534015.984702,
    "caller": "exporterhelper/queued_retry.go:296",
    "msg": "Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures.",
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlphttp",
    "dropped_items": 8,
    "stacktrace": "go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).send\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:296\ngo.opentelemetry.io/collector/exporter/exporterhelper.NewTracesExporter.func2\n\tgo.opentelemetry.io/[email protected]/exporter/exporterhelper/traces.go:116\ngo.opentelemetry.io/collector/consumer.ConsumeTracesFunc.ConsumeTraces\n\tgo.opentelemetry.io/collector/[email protected]/traces.go:36\ngo.opentelemetry.io/collector/receiver/otlpreceiver/internal/trace.(*Receiver).Export\n\tgo.opentelemetry.io/collector/receiver/[email protected]/internal/trace/otlp.go:55\ngo.opentelemetry.io/collector/receiver/otlpreceiver.handleTraces\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlphttp.go:47\ngo.opentelemetry.io/collector/receiver/otlpreceiver.(*otlpReceiver).registerTraceConsumer.func1\n\tgo.opentelemetry.io/collector/receiver/[email protected]/otlp.go:210\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\nnet/http.(*ServeMux).ServeHTTP\n\tnet/http/server.go:2462\ngo.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1\n\tgo.opentelemetry.io/[email protected]/config/confighttp/compression.go:162\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2084\ngo.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP\n\tgo.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:210\ngo.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP\n\tgo.opentelemetry.io/[email protected]/config/confighttp/clientinfohandler.go:39\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2916\nnet/http.(*conn).serve\n\tnet/http/server.go:1966"
}

[WARNING]	2023-03-23T01:13:35.985Z	2bdc5088-8c42-42eb-9013-79f41f191fd4	Transient error Internal Server Error encountered while exporting span batch, retrying in 16s.
[WARNING]	2023-03-23T01:13:50.564Z	2bdc5088-8c42-42eb-9013-79f41f191fd4	Timeout was exceeded in force_flush().
END RequestId: 2bdc5088-8c42-42eb-9013-79f41f191fd4
@tsloughter-splunk
Contributor

I'm thinking something similar to what you were doing before, but as a wrapper for starting the collector, would be best. That is, if there is a way to have your wrapper run before the collector, it could set the access token first. What do you think?

@dude0001
Author

I agree with that idea. It crossed my mind and is why I asked, "Is there a way we can delay the OTEL collector starting up?" I think in general it would be nice to be able to control when the collector starts up if needed. Another reason is being able to redirect the logs. We have another compliance requirement that all our logs should go to Splunk, not CloudWatch, so the collector and the wrapper code in this Layer logging to CloudWatch is problematic for us. That might be a separate issue we need to open, but it is another benefit of having some control over when the collector starts.

We were sending traces directly from our app to SignalFX. I definitely see the value in routing through the collector and that being an async process. I think the original change that broke us is a good direction for this Layer.

@tsloughter-splunk
Contributor

@dude0001 ah, the logs issue should be resolvable with a custom collector config. We can make this simpler in future versions.

@dude0001
Author

dude0001 commented Apr 6, 2023

@tsloughter-splunk should I create a separate issue for the logs concern? Is there an example of using a custom collector config?

@tsloughter-splunk
Contributor

@dude0001 yes, another issue would be good for tracking this. Sadly the custom collector config isn't actually going to work (at this time); there is work needed on the OpenTelemetry collector first. I've been trying to come up with a suggestion for the time being that doesn't hit CloudWatch, but there may not be a good one. Disabling CloudWatch will just lose the collector logs, and I doubt that is acceptable? So it may be that, until there is a way to do this with the collector, a way to bypass it is needed.

@dude0001
Author

dude0001 commented Apr 6, 2023

I created #132 for the logs issue.
