Broken Sidekiq <> Datadog Integration #68617

Closed · LindseySaari opened this issue Oct 30, 2023 · 19 comments
@LindseySaari (Contributor)

We encountered an issue with the official Datadog Sidekiq integration on our VA.gov API after upgrading the dogstatsd gem to version 5.6.1. Metrics (only for Sidekiq) stopped coming into Datadog. After downgrading back to 5.6.0, the metrics resumed as expected. For context, our Sidekiq pods operate within EKS. Additionally, we followed the direct Datadog instructions for setting up the Datadog <> Sidekiq integration.

A ticket was opened in the gem repo, and the maintainers have asked us to open a ticket because it may be related to our Sidekiq Pro integration.
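
For reference, the metrics path that broke looks roughly like the sketch below: a dogstatsd-ruby client pointed at the agent's DogStatsD port, with a Sidekiq server middleware emitting counters through it. The initializer path, constant names, and middleware are hypothetical; the actual vets-api wiring may differ.

    # config/initializers/statsd.rb -- hypothetical names; the real setup may differ
    require 'datadog/statsd'
    require 'sidekiq'

    # Client pointed at the Datadog agent's DogStatsD port (8125/udp).
    STATSD = Datadog::Statsd.new(ENV.fetch('STATSD_HOST', 'localhost'), 8125)

    # Minimal Sidekiq server middleware that emits a counter after each
    # successfully processed job.
    class SidekiqStatsMiddleware
      def call(worker, _job, queue)
        yield
        STATSD.increment('sidekiq.jobs.processed',
                         tags: ["queue:#{queue}", "worker:#{worker.class.name}"])
      end
    end

    Sidekiq.configure_server do |config|
      config.server_middleware do |chain|
        chain.add SidekiqStatsMiddleware
      end
    end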

@LindseySaari (Contributor, Author)

I opened an official issue with Datadog.

@LindseySaari (Contributor, Author)

Private Zenhub Image

Link to the ticket opened up with Datadog: https://help.datadoghq.com/hc/requests/1414562

@LindseySaari (Contributor, Author)

I am in communication with a Datadog engineer, working through some back and forth on the configuration that engineers on the Datadog side need in order to analyze the difference in Sidekiq metrics between the working (5.6.0) and broken (5.6.1) versions.
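
As a quick sanity check during that back and forth, something like the following (hypothetical console usage) confirms which client version is actually loaded in a given environment:

    # e.g. in a Rails console on a Sidekiq pod
    require 'datadog/statsd'
    puts Gem.loaded_specs['dogstatsd-ruby'].version  # => 5.6.0 (working) or 5.6.1 (broken)
    puts Datadog::Statsd::VERSION                    # same check, read from the loaded constant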

@laineymajor (Contributor)

Looking at Monday or Tuesday, before the daily sync, to act on this... TBD.

To do

  • research how to send a flare

@laineymajor (Contributor)

Lindsey is actively working this ticket today.
Changes are in review with appropriate team(s).

@laineymajor (Contributor)

Waiting on Datadog to analyze the flare that was sent.

@laineymajor (Contributor)

Waiting on DD. This is carrying over to the new sprint.

@LindseySaari (Contributor, Author)

Datadog is still analyzing the logs this afternoon. The solutions engineer assigned to the ticket will pass along any updates.

@LindseySaari (Contributor, Author)

Update from Datadog

"Just reaching out to give you an update on this one. The team is still reviewing this ticket on our side and will let me know when they have next steps. In the meantime, feel free to reach out if you have any other questions on this ticket."

@LindseySaari (Contributor, Author)

I heard back from Datadog, and they need us to execute a few more steps to help with the debugging process. I will aim to get these changes into staging Friday morning and execute the necessary commands. It's important to start early to maximize the amount of time before the production deploy. Once the necessary information is gathered, this change will need to be reverted and the agent restarted before it goes on the conveyor belt to production.

Steps

  1. Merge in the dogstatsd gem update (see the Gemfile sketch after this list) and make sure it deploys
  2. Merge in the datadog-agent change (this should autosync)
  3. Make sure it syncs
  4. Restart the agent: kubectl rollout restart daemonset datadog-agent
  5. Run: kubectl exec -it ds/datadog-agent -- agent dogstatsd-stats
  6. Run the tcpdump: kubectl exec -it ds/datadog-agent -- tcpdump -i any "udp port 8125" -w output.pcap
  7. Provide these reports to Datadog
  8. Revert the Vets API and Datadog agent changes
  9. Merge and verify the deploy/sync
  10. Restart the agent: kubectl rollout restart daemonset datadog-agent
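
For steps 1 and 8, the gem change itself is just a Gemfile pin along these lines (illustrative only; the actual Gemfile entry and version constraint in vets-api may differ):

    # Gemfile
    gem 'dogstatsd-ruby', '5.6.1'    # version under investigation (step 1)
    # gem 'dogstatsd-ruby', '5.6.0'  # known-good pin to revert to (step 8)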

@LindseySaari (Contributor, Author)

I adjusted the config this morning and ran the tcpdump command. After that, I copied the output/files to my local machine and forwarded them to Datadog.

@laineymajor (Contributor)

Adding this to our next sprint, as we need some additional DevOps help to move this work forward and close it out.

@LindseySaari (Contributor, Author)

Still following up with Datadog. They were mistaken about where the "proxy" resides in our setup. They thought it sat between the agent and the Datadog endpoint, but it is actually on the application side, where we use socat to proxy metrics from our Rails app to the agent.

@laineymajor (Contributor)

@flooose to review ticket and sync with Lindsey if needed.

@LindseySaari (Contributor, Author)

Chris is looking at a possible workaround to test the issue here

@laineymajor (Contributor)

Need time to test in the cluster

@RachalCassity (Member)

The dogstatsd gem 5.6.1 was deployed to prod.

Setting the host to 127.0.0.1 enforces an IPv4 connection.

@LindseySaari (Contributor, Author)

This has been fixed via the change that forces IPv4 by using 127.0.0.1 as the host.
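
For the record, a minimal sketch of that kind of fix, assuming the client host was previously given as a hostname (e.g. localhost) that could resolve to the IPv6 loopback ::1; the actual change in vets-api may instead have been made through an environment variable:

    # config/initializers/statsd.rb -- sketch only
    require 'datadog/statsd'

    # The literal IPv4 loopback forces the DogStatsD UDP socket onto IPv4,
    # avoiding the hostname resolution that, per this thread, stopped working
    # for Sidekiq metrics under the 5.6.1 client.
    STATSD = Datadog::Statsd.new('127.0.0.1', 8125)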
