support for federated clusters #15181
processing/src/main/java/org/apache/druid/query/QueryContexts.java
@599166320
I really don't understand what this means. Are you talking about ingestion tasks or query scheduling? The change in its current form looks hackish.
@cryptoe Usually, we deploy a Druid cluster within a single data center, with approximately a hundred servers in each data center, and the cluster as a whole is quite stable. However, we've noticed that the master nodes run in a master-slave configuration, store a significant amount of metadata, and handle heavy scheduling work. In theory this could become a bottleneck, so we wanted to bring it up for discussion. What we are more concerned about, though, is the cost of dedicated network traffic.
@cryptoe However, if stricter constraints are necessary, Query.queryContext might need some improvements. What are your thoughts on this?
Things like lookups and post-aggregators need to be thought through with the current approach. I think the correct way to do this would be to use something like https://github.com/lyft/presto-gateway and either pass a query context to select the cluster you want, or maintain a datasource -> cluster mapping on the gateway nodes.
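The gateway-side routing suggested here could look roughly like the following minimal sketch. It assumes a hypothetical `cluster` query-context key and an in-memory datasource -> cluster map; `ClusterRouter`, the cluster names, and the broker URLs are illustrative only and are not part of Druid or presto-gateway.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical gateway-side router that picks the broker of one Druid cluster for a query.
 * The "cluster" context key, cluster names and broker URLs are illustrative assumptions,
 * not part of Druid or presto-gateway.
 */
final class ClusterRouter {
  // Hypothetical query-context key a client could set to choose a cluster explicitly.
  static final String CLUSTER_CONTEXT_KEY = "cluster";

  private final Map<String, String> brokerUrlByCluster;  // cluster name -> broker URL
  private final Map<String, String> clusterByDataSource; // datasource -> cluster name
  private final String defaultCluster;

  ClusterRouter(Map<String, String> brokerUrlByCluster,
                Map<String, String> clusterByDataSource,
                String defaultCluster) {
    this.brokerUrlByCluster = brokerUrlByCluster;
    this.clusterByDataSource = clusterByDataSource;
    this.defaultCluster = defaultCluster;
  }

  /** Returns the broker URL that should receive a query on the given datasource. */
  String route(String dataSource, Map<String, Object> queryContext) {
    String cluster = Optional.ofNullable(queryContext.get(CLUSTER_CONTEXT_KEY))
        .map(Object::toString) // 1. an explicit context value wins
        .orElseGet(() -> clusterByDataSource.getOrDefault(dataSource, defaultCluster)); // 2. mapping, then default
    String url = brokerUrlByCluster.get(cluster);
    if (url == null) {
      throw new IllegalArgumentException("Unknown cluster: " + cluster);
    }
    return url;
  }

  public static void main(String[] args) {
    ClusterRouter router = new ClusterRouter(
        Map.of("dc1", "http://broker-dc1:8082", "dc2", "http://broker-dc2:8082"),
        Map.of("wikipedia", "dc2"),
        "dc1");
    System.out.println(router.route("wikipedia", Map.of()));                 // http://broker-dc2:8082
    System.out.println(router.route("wikipedia", Map.of("cluster", "dc1"))); // http://broker-dc1:8082
    System.out.println(router.route("metrics", Map.of()));                   // http://broker-dc1:8082
  }
}
```

In this sketch an explicit context value takes precedence over the datasource mapping; that ordering is a design choice, and a real gateway might instead reject conflicting settings.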
Another thing I was thinking about is how ORDER BYs would work. IIRC, the broker expects results to be sorted by the grouping key and then sorts them on the ORDER BY key.
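To make the ordering concern concrete, here is a minimal sketch (Java 16+) of a broker-side merge, assuming each cluster, or each forwarded broker, returns GroupBy rows already sorted by grouping key with a partial count. `Row`, `mergeByGroupingKey`, and `orderByMetric` are hypothetical names, not Druid internals. The streaming merge only combines equal keys correctly if every input is key-sorted; the ORDER BY on the metric and the LIMIT are applied afterwards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical broker-side merge of per-cluster GroupBy results.
 * Assumes every input list is already sorted by grouping key.
 */
final class FederatedMerge {
  record Row(String key, long count) {}

  // Cursor over one cluster's key-sorted result list.
  private record Cursor(List<Row> rows, int pos) {
    Row head() { return rows.get(pos); }
    boolean hasNext() { return pos + 1 < rows.size(); }
    Cursor next() { return new Cursor(rows, pos + 1); }
  }

  /** k-way merge of key-sorted partial results; equal keys are combined by summing counts. */
  static List<Row> mergeByGroupingKey(List<List<Row>> perCluster) {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head().key()));
    for (List<Row> rows : perCluster) {
      if (!rows.isEmpty()) heap.add(new Cursor(rows, 0));
    }
    List<Row> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      Row r = c.head();
      int last = merged.size() - 1;
      if (last >= 0 && merged.get(last).key().equals(r.key())) {
        // Same grouping key from another cluster: combine the partial aggregates.
        merged.set(last, new Row(r.key(), merged.get(last).count() + r.count()));
      } else {
        merged.add(r);
      }
      if (c.hasNext()) heap.add(c.next());
    }
    return merged;
  }

  /** Final ORDER BY count DESC (ties by key) with a LIMIT, applied only after merging. */
  static List<Row> orderByMetric(List<Row> merged, int limit) {
    return merged.stream()
        .sorted(Comparator.comparingLong(Row::count).reversed().thenComparing(Row::key))
        .limit(limit)
        .toList();
  }

  public static void main(String[] args) {
    List<Row> dc1 = List.of(new Row("a", 2), new Row("c", 5));
    List<Row> dc2 = List.of(new Row("a", 3), new Row("b", 1), new Row("c", 1));
    List<Row> merged = mergeByGroupingKey(List.of(dc1, dc2));
    System.out.println(merged);                   // [Row[key=a, count=5], Row[key=b, count=1], Row[key=c, count=6]]
    System.out.println(orderByMetric(merged, 2)); // [Row[key=c, count=6], Row[key=a, count=5]]
  }
}
```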
@cryptoe Based on your description, are you concerned about sorting like in the example below?
In fact, when a query like this is forwarded from one broker to another broker in a cluster, the
So, you don't need to worry about this sorting issue.
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that
Fixes #14535.
Description
Through federated cluster queries, Druid provides users with a friendly, unified query gateway across multiple data centers and clusters.
As the number of nodes and the amount of metadata grow, a single Druid cluster can become excessively large, leading to interference between tasks and degraded scheduling performance. Federated queries help avoid these issues: they allow a large cluster to be split into relatively independent clusters as needed, making scheduling more agile and lightweight.
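As an illustration of the federated query idea described here, one plausible shape for the scatter-gather step is sketched below: the same query is sent to the broker of each member cluster in parallel, and the per-cluster results are collected. `RemoteCluster`, `StaticCluster`, and `fanOut` are hypothetical names; this is not how this PR wires the change into `CachingClusteredClient` or `BrokerServerView`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

/**
 * Hypothetical scatter-gather step of a federated query.
 * RemoteCluster stands in for an HTTP client to a remote broker; it is not a Druid API.
 */
final class FederatedFanOut {
  interface RemoteCluster {
    String name();
    List<String> run(String query); // rows returned by that cluster's broker
  }

  // In-memory stand-in used for the demo; a real client would issue an HTTP request.
  record StaticCluster(String name, List<String> rows) implements RemoteCluster {
    public List<String> run(String query) { return rows; }
  }

  /** Runs the query on all clusters in parallel and waits for every result. */
  static Map<String, List<String>> fanOut(List<RemoteCluster> clusters, String query,
                                          ExecutorService pool) {
    Map<String, CompletableFuture<List<String>>> pending = clusters.stream()
        .collect(Collectors.toMap(RemoteCluster::name,
            c -> CompletableFuture.supplyAsync(() -> c.run(query), pool)));
    // join() blocks until each cluster answers and rethrows failures as CompletionException.
    return pending.entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue().join()));
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<RemoteCluster> clusters = List.of(
          new StaticCluster("dc1", List.of("row-1")),
          new StaticCluster("dc2", List.of("row-2", "row-3")));
      Map<String, List<String>> results = fanOut(clusters, "SELECT ...", pool);
      System.out.println(results.get("dc2")); // [row-2, row-3]
    } finally {
      pool.shutdown();
    }
  }
}
```

The collected rows would still need to be merged afterwards (combining partial aggregates, then applying ORDER BY and LIMIT).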
Key changed/added classes in this PR
QueryContexts
BrokerServerView
CachingClusteredClient
TimelineServerView
This PR has: