Why is exporting failing? Not seeing any data go to Splunk HF #986

Closed
xxPhuNguyenxx opened this issue Oct 23, 2023 · 7 comments

Comments

xxPhuNguyenxx commented Oct 23, 2023

We're currently migrating from Splunk Connect for Kubernetes (SCK) to Splunk OpenTelemetry. We've uninstalled SCK and are following the instructions for installing Splunk OTel. Our values.yaml is below.

values.yaml
clusterName: ocp-sbx1
splunkPlatform:
  token: xxxxxxxxxx
  endpoint: https://SplunkHF/services/collector/event
  index: sotl
  metricsIndex: sotm
  logsEnabled: true
  metricsEnabled: false
  tracesEnabled: false
  maxConnections: 10 # Maximum HTTP connections to use simultaneously when sending data.
  disableCompression: false # Whether to disable gzip compression over HTTP. Defaults to true.
  timeout: 10s # HTTP timeout when sending data. Defaults to 10s.
  idleConnTimeout: 5s # Idle connection timeout. defaults to 10s
  insecureSkipVerify: false # default to true. Once this works, we'll need to put Ca certs in place
  retryOnFailure:
    enabled: true
    initialInterval: 30s # Time to wait after the first failure before retrying; ignored if enabled is false. Defaults to 5s
    maxInterval: 60s # The upper bound on backoff; ignored if enabled is false. Default is 30s
    maxElapsedTime: 600s # The maximum amount of time spent trying to send a batch; ignored if enabled is false. Default is 300s
logsEngine: otel
cloudProvider: "aws"
distribution: "eks"
environment: sandbox

Otel Logging:
kubectl logs pod/ocp-aws-sbx1-splunk-otel-collector-agent-tgb87 -f
2023/10/04 14:29:12 settings.go:399: Set config to [/conf/relay.yaml]
2023/10/04 14:29:12 settings.go:452: Set ballast to 165 MiB
2023/10/04 14:29:12 settings.go:468: Set memory limit to 450 MiB
2023-10-04T14:29:12.823Z info service/telemetry.go:84 Setting up own telemetry...
2023-10-04T14:29:12.823Z info service/telemetry.go:201 Serving Prometheus metrics {"address": "0.0.0.0:58889", "level": "Basic"}
2023-10-04T14:29:12.825Z info service/service.go:138 Starting otelcol... {"Version": "v0.85.0", "NumCPU": 32}
2023-10-04T14:29:12.825Z info extensions/extensions.go:31 Starting extensions...
2023-10-04T14:29:12.825Z info extensions/extensions.go:34 Extension is starting... {"kind": "extension", "name": "file_storage"}
2023-10-04T14:29:12.825Z info extensions/extensions.go:38 Extension started. {"kind": "extension", "name": "file_storage"}
2023-10-04T14:29:12.825Z info extensions/extensions.go:34 Extension is starting... {"kind": "extension", "name": "health_check"}
2023-10-04T14:29:12.825Z info [email protected]/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2023-10-04T14:29:12.825Z warn [email protected]/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-04T14:29:12.825Z info extensions/extensions.go:38 Extension started. {"kind": "extension", "name": "health_check"}
2023-10-04T14:29:12.825Z info extensions/extensions.go:34 Extension is starting... {"kind": "extension", "name": "k8s_observer"}
2023-10-04T14:29:12.825Z info extensions/extensions.go:38 Extension started. {"kind": "extension", "name": "k8s_observer"}
2023-10-04T14:29:12.825Z info extensions/extensions.go:34 Extension is starting... {"kind": "extension", "name": "memory_ballast"}
2023-10-04T14:29:12.908Z info [email protected]/memory_ballast.go:41 Setting memory ballast {"kind": "extension", "name": "memory_ballast", "MiBs": 165}
2023-10-04T14:29:13.008Z info extensions/extensions.go:38 Extension started. {"kind": "extension", "name": "memory_ballast"}
2023-10-04T14:29:13.008Z info extensions/extensions.go:34 Extension is starting... {"kind": "extension", "name": "zpages"}
2023-10-04T14:29:13.008Z info [email protected]/zpagesextension.go:53 Registered zPages span processor on tracer provider {"kind": "extension", "name": "zpages"}
2023-10-04T14:29:13.008Z info [email protected]/zpagesextension.go:63 Registered Host's zPages {"kind": "extension", "name": "zpages"}
2023-10-04T14:29:13.009Z info [email protected]/zpagesextension.go:75 Starting zPages extension {"kind": "extension", "name": "zpages", "config": {"TCPAddr":{"Endpoint":"localhost:55679"}}}
2023-10-04T14:29:13.009Z info extensions/extensions.go:38 Extension started. {"kind": "extension", "name": "zpages"}
2023-10-04T14:29:13.009Z warn [email protected]/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-04T14:29:13.009Z info [email protected]/otlp.go:83 Starting GRPC server {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
2023-10-04T14:29:13.009Z warn [email protected]/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
2023-10-04T14:29:13.009Z info [email protected]/otlp.go:101 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4318"}
2023-10-04T14:29:13.009Z info adapter/receiver.go:45 Starting stanza receiver {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2023-10-04T14:29:13.011Z info healthcheck/handler.go:132 Health Check state change {"kind": "extension", "name": "health_check", "status": "ready"}
2023-10-04T14:29:13.011Z info service/service.go:161 Everything is ready. Begin running and processing data.
2023-10-04T14:29:13.213Z info fileconsumer/file.go:194 Started watching file {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/argo-cd_argocd-application-controller-0_5e3a9022-da1d-4987-9d5a-8ab6d91155b3/argocd-application-controller/0.log"}
2023-10-04T14:29:13.708Z info exporterhelper/queued_retry.go:351 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "splunk_hec/platform_logs", "error": "HTTP 404 "Not Found"", "interval": "36.084025499s"}
2023-10-04T14:29:30.534Z info exporterhelper/queued_retry.go:351 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "splunk_hec/platform_logs", "error": "HTTP 404 "Not Found"", "interval": "1m1.421101624s"}
2023-10-04T14:29:41.624Z info exporterhelper/queued_retry.go:351 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "splunk_hec/platform_logs", "error": "HTTP 404 "Not Found"", "interval": "27.609176856s"}

Contributor

atoulme commented Oct 24, 2023

Is this a duplicate of #987? Please reach out to Splunk Support.

Author

xxPhuNguyenxx commented Oct 24, 2023

No, it's actually a different issue. The other one is from our on-prem OCP environment and shows a different error. This one is from our AWS EKS cluster, where we're sending to a custom port because of a port conflict.

With the other one, #987, we're getting a weird connection issue even though a manual curl command to the HEC endpoint works from the pod.
2023-10-18T20:30:00.323Z info exporterhelper/retry_sender.go:177 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "splunk_hec/platform_logs", "error": "Post "https://splunkhf/services/collector/event\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)", "interval": "37.248737291s"}

This one just throws the "Exporting failed" error. Regarding the HTTP 404 "Not Found" error, what exactly is not found? What's causing the 404?

2023-10-04T14:29:41.624Z info exporterhelper/queued_retry.go:351 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "splunk_hec/platform_logs", "error": "HTTP 404 "Not Found"", "interval": "27.609176856s"}

Contributor

atoulme commented Oct 24, 2023

The HEC exporter is hitting an HTTP server that returns a 404 Not Found status code. You need to check whether the endpoint is correct. It might be a good idea to hit this endpoint from inside your cluster to check that it works properly. For more help, it's best to open a support case.
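
One way to act on the advice above is to break the configured endpoint into its parts and see which host, port, and path an HTTP client would actually contact. The following is a minimal sketch using only the Python standard library. The function name `describe_endpoint` is hypothetical, and the port in the second example is an assumed value for illustration, not one taken from this thread.

```python
from urllib.parse import urlsplit

# Default ports implied by the URL scheme when no explicit port is given.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_endpoint(url: str) -> dict:
    """Split an HEC endpoint URL into the parts an HTTP client will use."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # urlsplit lowercases the hostname
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "explicit_port": parts.port is not None,
        "path": parts.path,
    }


if __name__ == "__main__":
    # Endpoint as written in the values.yaml from this issue.
    print(describe_endpoint("https://SplunkHF/services/collector/event"))
    # Hypothetical variant with an explicit port (8088 is an assumed value).
    print(describe_endpoint("https://SplunkHF:8088/services/collector/event"))
```

The endpoint shown in the values.yaml carries no explicit port. If that is how it is deployed, an HTTPS client would connect to port 443, so it may be worth confirming that the custom port mentioned for the EKS cluster actually appears in the URL.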

@xxPhuNguyenxx
Author

I'm a bit hesitant to submit a Splunk support case, mainly because we'd spend a lot of time on the HF, which is definitely working (over 5 TB goes to it daily with no issues), or get sidetracked. Is there a way to set the logging to debug, or to confirm which endpoint is being used?

When I run a curl command from the pod, it always returns success.

curl -k https://splunkhf/services/collector/event -H "Authorization: Splunk xxxx-xxxx-xxxx-xxxx-xxxx" -d '{"index":"main","event":"testing"}'
{"text":"Success","code":0}%

Contributor

atoulme commented Oct 24, 2023

@matthewmodestino

Has to be something in the way your values.yaml endpoint is being set in the configmap.

Try reviewing the configmap

kubectl get cm <name_of_agent_configmap> -o yaml

If you open a case, the support team can reach out internally to grab one of us to help. Or holler at your SE and tell them to ping me, or if you are in the Splunk community Slack, we can review there.
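
To make the configmap review suggested above quicker, the rendered YAML can be scanned for every `endpoint:` line. This is a rough sketch, assuming the output of `kubectl get cm <name_of_agent_configmap> -o yaml` is piped to the script on standard input. `find_endpoints` is a hypothetical helper name, and the script only does a line-based text match rather than parsing the YAML.

```python
import re
import sys

# Matches lines such as "    endpoint: https://host/path", optionally quoted.
ENDPOINT_RE = re.compile(r"^\s*endpoint:\s*[\"']?([^\"'\s#]+)")


def find_endpoints(text: str) -> list[tuple[int, str]]:
    """Return (line_number, value) for each endpoint setting in the text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = ENDPOINT_RE.match(line)
        if match:
            found.append((number, match.group(1)))
    return found


if __name__ == "__main__":
    # Reads the configmap YAML from stdin and lists every endpoint it finds.
    for number, value in find_endpoints(sys.stdin.read()):
        print(f"line {number}: {value}")
```

Saved under an arbitrary file name, it could be run as `kubectl get cm <name_of_agent_configmap> -o yaml | python find_endpoints.py`. The value listed for the `splunk_hec/platform_logs` exporter can then be compared with the URL that works in the manual curl test.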

Contributor

atoulme commented Nov 17, 2023

Please follow up with a support case if the problem persists.

atoulme closed this as completed Nov 17, 2023