Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set GOTRACEBACK=crash when running cockroach #2088

Merged
merged 3 commits into from
Dec 22, 2022
Merged

set GOTRACEBACK=crash when running cockroach #2088

merged 3 commits into from
Dec 22, 2022

Conversation

davepacheco
Copy link
Collaborator

@davepacheco davepacheco commented Dec 21, 2022

By default, when Go encounters a fatal runtime error, it dumps some state (some Goroutine stack traces) and exits the process with a non-zero status code. It'd be nice to get core files for these fatal failures. Such core files proved invaluable for debugging #1146 and friends. Go provides a mechanism to dump core on fatal failures by setting GOTRACEBACK=crash in the environment. This causes it to dump its state like normal and then raise SIGABRT. This is how I generated most of the core files that I used to debug those issues.

Getting a core file also requires having configured coreadm properly and potentially updating ulimit -c. These are properties of the developer's environment and I think we should not take ownership of these. In production, I assume we have already configured them or will do so.

There are two places where this needs to be changed:

  • in omicron-test-utils: this covers both the test suite and the omicron-dev db-run developer tool
  • in the SMF manifest: this covers production

This change:

  • updates the dev/db CockroachDB starter infrastructure:
    • add support for setting environment variables, preserving the current default behavior of propagating the current environment into the cockroach process
    • uses that to set GOTRACEBACK=crash by default (this can be overridden)
    • print out the environment variables used
    • tests the above on illumos systems only (because testing this is platform-specific
      -- I'm using pargs -e PID to get the environment of the child process)
    • separately adds GOTRACEBACK=crash to the one-off invocation that we use to run cockroach version
  • updates the SMF manifest to set GOTRACEBACK=crash on deployed systems

@davepacheco davepacheco marked this pull request as ready for review December 21, 2022 22:33
@davepacheco davepacheco requested a review from smklein December 21, 2022 22:57
@smklein
Copy link
Collaborator

smklein commented Dec 22, 2022

Getting a core file also requires having configured coreadm properly and potentially updating ulimit -c. These are properties of the developer's environment and I think we should not take ownership of these. In production, I assume we have already configured them or will do so.

Should we priority boost #1597 ? Is this ultimately the sled agent's job?

@davepacheco
Copy link
Collaborator Author

Getting a core file also requires having configured coreadm properly and potentially updating ulimit -c. These are properties of the developer's environment and I think we should not take ownership of these. In production, I assume we have already configured them or will do so.

Should we priority boost #1597 ?

I do think this is critical for MVP. It's valuable as soon as it can be done. I'm not sure whether it's urgent enough to displace other work.

Is this ultimately the sled agent's job?

Will move this to #1597.

@davepacheco
Copy link
Collaborator Author

Also just to be clear this PR is not in any way blocked on #1597.

Comment on lines +1161 to +1163
let mut env_overrides = BTreeMap::new();
env_overrides.insert("GOTRACEBACK", "bogus");
env_overrides.insert("OMICRON_DUMMY", "dummy");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, can we declare this first, and use it to call builder.env? Just to "set the values once"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e71b56b.

@davepacheco davepacheco enabled auto-merge (squash) December 22, 2022 18:08
@davepacheco davepacheco merged commit cdf3207 into main Dec 22, 2022
@davepacheco davepacheco deleted the go-crash branch December 22, 2022 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants