Remove in memory sink from fanout and add latency sampling methods #28

Merged 2 commits on Oct 11, 2023

Conversation

@jayy04 jayy04 commented Oct 10, 2023

Description

  • Remove in memory sink from fanout
  • Add latency sampling methods.

Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@@ -98,7 +98,7 @@ func New(cfg Config) (_ *Metrics, rerr error) {
}()

m := &Metrics{memSink: memSink}

Don't create the fanout sink. You should be able to use a single MetricSink variable and assign either promSink or memSink to it.
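A minimal sketch of that suggestion follows. It assumes a cut-down, locally defined `MetricSink` interface and two placeholder sink types, since the real sink types are not shown in this thread; `chooseSink` and its `usePrometheus` flag are hypothetical names used only for illustration.

```go
package main

import "fmt"

// MetricSink is a local stand-in for the sink interface, reduced to one
// method so the sketch compiles without external dependencies.
type MetricSink interface {
	IncrCounter(key []string, val float32)
}

// memSink and promSink are hypothetical placeholders for the in-memory
// and Prometheus sinks mentioned in the review comment.
type memSink struct{ total float32 }

func (s *memSink) IncrCounter(_ []string, val float32) { s.total += val }

type promSink struct{ total float32 }

func (s *promSink) IncrCounter(_ []string, val float32) { s.total += val }

// chooseSink returns exactly one sink behind the interface instead of
// wrapping both in a fanout that forwards every call twice.
func chooseSink(usePrometheus bool) MetricSink {
	if usePrometheus {
		return &promSink{}
	}
	return &memSink{}
}

func main() {
	sink := chooseSink(true)
	sink.IncrCounter([]string{"tx", "count"}, 1)
	fmt.Printf("%T\n", sink) // prints the concrete type behind the interface
}
```

With a single sink, each metric call is dispatched once rather than once per fanout member.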

// ModuleMeasureSinceWithSampling samples latency metrics given the sample rate.
// This is intended to be used in hot code paths.
func ModuleMeasureSinceWithSampling(module string, start time.Time, sampleRate float64, keys ...string) {
if rand.Float64() < sampleRate {

There are a lot of issues with sampling based upon a probabilistic rate since:

  • it usually requires a lot of tuning over time.
  • you can't be sure whether the telemetry information is skewed by a low frequency of events or by a bad random number sequence.

Sampling once per N units of time makes a lot more sense, for example allowing this metric to be sampled at most once every second. It would also be good to scale each sample by the number of events it represents, but that seems like it would require changing the go-metrics package to accept bulk updates for samples.

If you still want to go down this path, it would make sense to use a faster random number generator such as https://github.com/flyingmutant/rand. You should also perform the sampling check before computing the time, which would require the sampling to happen in the caller's method (which is annoying without macros).
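For comparison, the time-based alternative described above could look roughly like the sketch below. `IntervalSampler`, `NewIntervalSampler`, `Allow`, and `demo` are hypothetical names, not part of this PR or of go-metrics. Note that this approach still needs a clock reading to make its decision.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// IntervalSampler admits at most one sample per interval.
type IntervalSampler struct {
	lastNanos     int64 // UnixNano of the last admitted sample; accessed atomically
	intervalNanos int64 // minimum spacing between admitted samples
}

func NewIntervalSampler(interval time.Duration) *IntervalSampler {
	return &IntervalSampler{intervalNanos: int64(interval)}
}

// Allow reports whether a sample may be taken at time now.
func (s *IntervalSampler) Allow(now time.Time) bool {
	n := now.UnixNano()
	last := atomic.LoadInt64(&s.lastNanos)
	if last != 0 && n-last < s.intervalNanos {
		return false // still inside the current interval
	}
	// Compare-and-swap so only one of several racing goroutines wins the slot.
	return atomic.CompareAndSwapInt64(&s.lastNanos, last, n)
}

// demo exercises the sampler with fixed timestamps so the result is deterministic.
func demo() [3]bool {
	s := NewIntervalSampler(time.Second)
	base := time.Unix(1700000000, 0)
	return [3]bool{
		s.Allow(base),                              // first call is admitted
		s.Allow(base.Add(500 * time.Millisecond)),  // inside the interval
		s.Allow(base.Add(1500 * time.Millisecond)), // interval has elapsed
	}
}

func main() {
	fmt.Println(demo()) // prints [true false true]
}
```

Unlike a probabilistic rate, the sample volume here is bounded by the interval regardless of how often the hot path runs, so it needs less tuning.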

As discussed on Slack, we will go with this as is.

We should switch to float32, since we don't need the extra precision that float64 provides and generating 32 random bits takes less effort.

Author

it actually takes more effort because Float32 internally calls Float64 😅

That is terrible. Likely why other rand libraries are so common in golang.

@@ -19,6 +20,18 @@ func NewLabel(name, value string) metrics.Label {
return metrics.Label{Name: name, Value: value}
}

// ModuleMeasureSinceWithSampling samples latency metrics given the sample rate.

Update the comment to state that sampleRate should be in the range [0, 1.0).

@@ -70,3 +83,11 @@ func SetGaugeWithLabels(keys []string, val float32, labels []metrics.Label) {
func MeasureSince(start time.Time, keys ...string) {
metrics.MeasureSinceWithLabels(keys, start.UTC(), globalLabels)
}

// MeasureSinceWithSampling provides a wrapper functionality for emitting a time measure
// metric with sampling.

Ditto on comment.

// ModuleMeasureSinceWithSampling samples latency metrics given the sample rate.
// This is intended to be used in hot code paths.
func ModuleMeasureSinceWithSampling(module string, start time.Time, sampleRate float64, keys ...string) {
if rand.Float64() < sampleRate {

It would be nice to move the sampling check outside of the method so that we don't have to pay the cost of defer or time.Now(). That would also make this method pointless.
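As a rough illustration of moving the check into the caller, the sketch below makes the sampling decision first and only reads the clock and registers a `defer` when the call is sampled. `hotPath`, `recordLatency`, and `recorded` are hypothetical stand-ins and not code from this PR.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// recorded counts emitted measurements; recordLatency is a hypothetical
// stand-in for the real telemetry call so the sketch compiles on its own.
var recorded int

func recordLatency(start time.Time) {
	_ = time.Since(start) // a real implementation would emit this duration
	recorded++
}

// hotPath decides whether to sample before touching the clock, so unsampled
// calls pay for one random number and nothing else.
func hotPath(sampleRate float64) {
	if rand.Float64() < sampleRate {
		start := time.Now()
		defer recordLatency(start) // deferred calls run when hotPath returns
	}
	// ... the work being measured goes here ...
}

func main() {
	for i := 0; i < 1000; i++ {
		hotPath(0.1)
	}
	fmt.Println("sampled calls:", recorded)
}
```

Because `rand.Float64()` returns a value in [0, 1.0), a rate of 0 never samples and a rate of 1 always does.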

Author

yeah maybe we should just do it that way. let me revert

metrics.MeasureSinceWithLabels(
keys,
start.UTC(),
append([]metrics.Label{NewLabel(MetricLabelNameModule, module)}, globalLabels...),

Should we add a label that states the sampling rate?
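One way such a label might be attached is sketched below. The `Label` type is redefined locally to mirror the Name/Value shape of `metrics.Label` seen in the diff, and `sampleRateLabelName` and `withSampleRate` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Label mirrors the Name/Value shape of metrics.Label from the diff; it is
// redefined here so the sketch compiles without go-metrics.
type Label struct {
	Name  string
	Value string
}

// sampleRateLabelName is a hypothetical label name, not one defined by the PR.
const sampleRateLabelName = "sample_rate"

// withSampleRate returns a copy of labels with the sampling rate appended,
// leaving the caller's slice untouched.
func withSampleRate(labels []Label, sampleRate float64) []Label {
	out := make([]Label, 0, len(labels)+1)
	out = append(out, labels...)
	return append(out, Label{
		Name:  sampleRateLabelName,
		Value: strconv.FormatFloat(sampleRate, 'f', -1, 64),
	})
}

func main() {
	labels := withSampleRate([]Label{{Name: "module", Value: "example"}}, 0.01)
	fmt.Println(labels)
}
```

Recording the rate alongside the metric would let a dashboard scale sampled values back up, at the cost of one extra label per series.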

@github-actions github-actions bot removed the C:Store label Oct 11, 2023
@jayy04 jayy04 merged commit b95c66d into dydx-fork-v0.47.4 Oct 11, 2023
8 of 9 checks passed
@jayy04 jayy04 deleted the jy/metrics-improvement branch October 11, 2023 19:25