-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metric -- count of queries waiting for merge buffers #15025
Add metric -- count of queries waiting for merge buffers #15025
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your first contribution!
I took a quick pass of the code and have some suggestions - specifically in the MergeBufferPoolMonitor
that should make it easier to deploy.
Since this monitor runs only once on every emission period, the metrics only provide a snapshot in time of how many queries were waiting on merge buffers at that instant. Have you considered reporting metrics like mean, P98 and max time spent waiting on merge buffers instead?
That should provide visibility into how long queries are taking to acquire merge buffers on all the different services.
Another metric that may be interesting is the number of queries that timed out while waiting to acquire merge buffers.
processing/src/main/java/org/apache/druid/collections/DefaultBlockingPool.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/collections/DefaultBlockingPool.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/collections/DefaultBlockingPool.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/java/util/metrics/MergeBufferPoolMonitor.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/java/util/metrics/MergeBufferPoolMonitor.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/java/util/metrics/MergeBufferPoolMonitor.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/java/util/metrics/MergeBufferPoolMonitor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the first PR! Left a few review comments.
Also, we should add the monitor to docs/configuration/index.md
along with the other monitors.
processing/src/main/java/org/apache/druid/java/util/metrics/MergeBufferPoolMonitor.java
Outdated
Show resolved
Hide resolved
|
||
@Inject | ||
public MergeBufferPoolMonitor( | ||
@Merging BlockingPool<ByteBuffer> mergeBufferPoolIn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The monitor itself shouldn't care about the actual blocking pool. It should have a "stats provider" (interface) which exposes the method getPendingQueries
.
You can take a look at the example: QueryCountStatsProvider
and QueryResource
for the pattern. You will need to wire it up in Guice as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LakshSingla is there a reason you want to do it that way? what structure do you have in mind exactly?
processing/src/test/java/org/apache/druid/java/util/metrics/MergeBufferPoolMonitorTest.java
Outdated
Show resolved
Hide resolved
processing/src/test/java/org/apache/druid/java/util/metrics/MergeBufferPoolMonitorTest.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/collections/BlockingPool.java
Outdated
Show resolved
Hide resolved
We don't need to add another monitor. The new metrics can be emitted from the QueryCountStatsMonitor. |
@abhishekagarwal87, just want to pan out the overall structure of add this metric from the
So here, I can add a parameter in In the interface of We will also need to keep the same changes in this PR to add a method in Let me know if I missed anything. Or next I will update the PR to reflect this changes discussed here. |
@kaisun2000 - How about injecting BlockingPool directly into the QueryCountStatsMonitor? Though seems like Laksh has objections to injecting pool into a monitor. I will let him chime in too. |
@abhishekagarwal87, thx for coordinating this. @LakshSingla , let me know your take here. In this case, I will inject the |
…rgebuffer-waiting-queries-count
I had concerns with the monitor directly interacting with the merge buffer pool, instead of the stats provider that it was supposed to work with. However, I don't have a strong stance on the same, since its mostly internal implementation. The changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM left a few comments. TY for taking this up!
server/src/test/java/org/apache/druid/server/metrics/QueryCountStatsMonitorTest.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/collections/BlockingPool.java
Outdated
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/collections/DefaultBlockingPool.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/metrics/QueryCountStatsMonitor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code coverage issues, however, the coverage looks good to me. The insufficiency is due to the DummyBlockingPool
not being tested anywhere, which is okay to me.
LGTM 🚀
ty for the contribution, I am merging the PR. Can you please add a release note to the description @kaisun2000? (a few line user-facing summary of the change) It helps in prepping up the release notes and calling out user-facing changes. |
@LakshSingla, just added the release notes section. Thx for help to get this PR reviewed and checked in. |
thank you for your contribution @kaisun2000 |
Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools.
Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools.
) * Add metric -- count of queries waiting for merge buffers (apache#15025) Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools. * Update addMetric to build --------- Co-authored-by: kaisun2000 <[email protected]>
) (#223) * Add metric -- count of queries waiting for merge buffers (apache#15025) Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools. * Update addMetric to build --------- Co-authored-by: kaisun2000 <[email protected]>
…des (apache#16992) apache#15025 adds mergeBuffer/pendingRequests metric in QueryCountStatsMonitor. Since real-time nodes also use the same merge buffers for queries and have QueryCountStatsMonitor , the documentation is being updated to include this metric.
Description
Currently there is no metric to indicate the contention level for the shared resource -- merge buffer pools. Thus, it is hard to know that the query performance suffers from the contention for shared merge buffer pools. We did meet such performance degradation issue in the production environment.
This PR exposed the count of waiting queries (threads) blocking at in the merge buffers pools.
Summary of the changes for a quick reference:
BlockingPool
interface was updated with a new methodgetPendingQueries
to return the count of pending queries.DefaultBlockingPool
class was updated with the implementation of gathering the number of pending queries with anAtomicLong
counter.MergeBufferPoolMonitor
class was created with the shared merge buffer pool (annotated with@Merging
) injected. User can add this monitor to the configuration file such asdruid.monitoring.monitors=["org.apache.druid.java.util.metrics.MergeBufferPoolMonitor", ...]
MergeBufferPoolMonitor
would emit correct count of pending queries with a query blocking at theDefaultBlockingPool
by taking more than the number of available buffers in theDefaultBlockingPool
Release note
NEW: You can now add a metric
mergeBuffer\pendingRequests
to monitor the contention level of the merge buffer pool. This metric is exposed through theQueryCountStatsMonitor
Key changed/added classes in this PR
mergeBuffer/pendingRequest
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.MergeBufferPoolMonitor", ...]
This PR has: