Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka: Emit production rate #17491

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

arunramani
Copy link
Contributor

@arunramani arunramani commented Nov 19, 2024

Description

Add metric to track Kafka message production rate which can be useful for correlating with Kafka lag. For each collection period, the latest Kafka broker offset is compared with the previous minute to calculate the production rate.

One interesting use case is being able to use this metric to roughly calculate lag as time. This can be done by calculating the time t such that sum(time = now to t) of productionRate = lag i.e. how many minutes of production equals the current lag.

Release note

Added a new streaming ingest metric ingest/kafka/partitionProduction


Key changed/added classes in this PR
  • SeekableStreamingSupervisor
  • KafkaSupervisor

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz
Copy link
Contributor

kfaraz commented Nov 20, 2024

@arunramani , could you please share some details as to how this metric would provide additional insight that is not already covered by metrics like Kafka lag, partition lag and/or message gap?

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arunramani , there are some problems with the current approach.
There is no guarantee that metrics would be emitted strictly every minute. This could lead to incorrect rate values (and thus an incorrect calculation of time lag downstream) since we are really just emitting the diff between the latestSequence at the last metric emission time at the latestSequence at the current time.

IIUC, we are trying to calculate the lag in terms of "how many minutes worth of records is the supervisor yet to process".
This value is fairly similar to the message gap but not quite the same.
The message gap is the "difference between the current timestamp of the system and the timestamp of the latest ingested record".
Whereas here, I think we want the "difference between the timestamp of the latest ingested record and the timestamp of the latest record in the stream".

If that is the case, I wonder if we even need the production rate.
Instead, do you think it would make more sense to simply calculate this timestamp difference and emit it?
Or are there challenges with that approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants