support for federated clusters #15181

599166320 · 2023-10-17T09:35:31Z

Fixes #14535.

Description

This PR adds cluster federation queries, which give users a single, unified query gateway across multiple data centers and clusters.
As the number of nodes and the volume of metadata grow, a single Druid cluster can become excessively large, leading to mutual interference among tasks and deteriorating scheduling performance. Federated queries help avoid these issues: a large cluster can be broken down into relatively independent ones as needed, making scheduling more agile and lightweight.

Key changed/added classes in this PR

QueryContexts
BrokerServerView
CachingClusteredClient
TimelineServerView


cryptoe · 2023-10-19T14:43:10Z

@599166320
How big is the deployment you are working with that is causing things like

> leading to mutual interference among tasks and deteriorating scheduling performance.

I really do not understand what this means. Are you talking about ingestion tasks or query scheduling?

The change in its current form looks hackish.
I think it is a better design pattern for one cluster to own one data source if you really want to break things up, since you can then configure load rules, compaction, etc. on only one cluster for a data source. Ingestion also gets simpler.

599166320 · 2023-10-19T15:12:25Z

@cryptoe
Thank you for your response. I also hesitated over whether to include the second point, but I decided to keep it to see what everyone thinks.

Usually, we deploy a Druid cluster in one data center, with approximately a hundred servers in each data center. The entire cluster is quite stable. However, we've noticed that the master node operates in a master-slave configuration, storing a significant amount of metadata and handling heavy scheduling tasks. In theory, there might be bottlenecks, so we wanted to bring it up for discussion.

Of course, what we are more concerned about is the cost of dedicated network traffic.

599166320 · 2023-10-20T01:49:37Z

@cryptoe
Is there a better way to implement federated queries in practice? @abhishekagarwal87 mentioned that it might break certain protocols. In my opinion, native queries on Historicals and Brokers currently share the same code, with QueryResource as the entry point, and the parameter structures are almost identical.

However, if strict constraints are necessary, Query.queryContext might require some improvements. What are your thoughts on this?

cryptoe · 2023-10-27T08:37:25Z

Things like lookups and post-aggregators need to be thought through with the current approach.

I think the correct way to do it would be to use something like https://github.com/lyft/presto-gateway and pass a query context to select the cluster you want, or maintain a data source -> cluster mapping on the gateway nodes.
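
The gateway idea above could look roughly like the sketch below. It is only an illustration, not presto-gateway's API and not anything that exists in Druid: the `DATASOURCE_TO_CLUSTER` mapping, the `targetCluster` context key, and the broker URLs are all hypothetical.

```python
# Hypothetical mapping kept on the gateway: data source -> cluster name.
DATASOURCE_TO_CLUSTER = {"wikipedia": "cluster-1", "clickstream": "cluster-2"}

# Hypothetical broker endpoints, one per cluster.
CLUSTER_TO_BROKER = {
    "cluster-1": "http://broker.cluster-1.example:8082",
    "cluster-2": "http://broker.cluster-2.example:8082",
}


def select_broker(native_query: dict) -> str:
    """Pick the broker URL that should receive a native query.

    An explicit `targetCluster` context value (an assumed key, not a Druid
    one) wins; otherwise fall back to the data source -> cluster mapping.
    """
    cluster = native_query.get("context", {}).get("targetCluster")
    if cluster is None:
        data_source = native_query["dataSource"]
        # A table data source may be a plain string or {"type": "table", "name": ...}.
        name = data_source if isinstance(data_source, str) else data_source["name"]
        cluster = DATASOURCE_TO_CLUSTER.get(name)
    if cluster not in CLUSTER_TO_BROKER:
        raise LookupError(f"no cluster configured for query: {cluster!r}")
    return CLUSTER_TO_BROKER[cluster]
```

With this shape each query goes to exactly one cluster, so no broker ever has to merge another broker's results.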

cryptoe · 2023-11-06T05:01:19Z

Another thing I was thinking about is how ORDER BY would work. The broker expects rows to be sorted by the grouping key and then sorts them on the ORDER BY key, IIRC.
In this case the cluster 2 broker will return rows already sorted on the ORDER BY key, which will break the merging logic for the grouping keys on the cluster 1 broker.
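
The concern can be illustrated with a toy k-way merge. This is a simplified model, not Druid's merge code: `merge_partials` and the sample rows are made up, and the only assumption carried over from the comment is that partial results must arrive sorted by grouping key for equal keys to be combined.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_partials(*streams):
    """Merge per-cluster partial results of (grouping_key, count) rows.

    Assumes every stream is sorted by grouping key, so equal keys come out
    adjacent after the k-way merge and can be combined in a single pass.
    """
    merged = heapq.merge(*streams, key=itemgetter(0))
    return [(key, sum(count for _, count in rows))
            for key, rows in groupby(merged, key=itemgetter(0))]


# Both clusters return rows sorted by the grouping key: groups combine correctly.
cluster1 = [("Asia", 3), ("Europe", 5)]
cluster2 = [("Asia", 4), ("Europe", 1)]
by_key = merge_partials(cluster1, cluster2)

# Cluster 2 returns rows sorted by count DESC instead: "Asia" ends up split.
cluster2_by_count = [("Europe", 6), ("Asia", 4)]
broken = merge_partials(cluster1, cluster2_by_count)
```

In the second call the assumption is violated, so `broken` contains two separate "Asia" rows instead of one combined row.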

599166320 · 2023-11-07T17:06:48Z

@cryptoe According to your description, are you concerned about a sorting like the one below?

```sql
SELECT
  COUNT(*) c,
  regionName
FROM wikipedia
GROUP BY regionName
ORDER BY regionName, c DESC
LIMIT 10
```

In fact, when a query like this is forwarded from one broker to another broker in a cluster, the LIMIT 10 part is removed. It will be transformed into a native query similar to the one below:

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "table",
    "name": "wikipedia"
  },
  "intervals": {
    "type": "intervals",
    "intervals": [
      "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
    ]
  },
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "default",
      "dimension": "regionName",
      "outputName": "d0",
      "outputType": "STRING"
    }
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "a0"
    }
  ],
  "limitSpec": {
    "type": "NoopLimitSpec"
  },
  "context": {
    "applyLimitPushDown": false,
    "defaultTimeout": 300000,
    "federatedClusterBrokers": "",
    "finalize": false,
    "fudgeTimestamp": "-4611686018427387904",
    "groupByOutermost": false,
    "groupByStrategy": "v2",
    "maxQueuedBytes": 5242880,
    "maxScatterGatherBytes": 9223372036854775807,
    "queryFailTime": 1699369214285,
    "queryId": "605a751b-f0ee-43fe-a754-b702035622df",
    "resultAsArray": true,
    "sqlQueryId": "e8bed46a-f2de-4fc3-89c6-febfe048debc",
    "timeout": 29981
  }
}
```

So, you don't need to worry about this sorting issue.
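
A rough sketch of the flow described here, under the assumption that the forwarding broker strips the limit and only applies ORDER BY and LIMIT after merging. This is not the PR's code: `without_limit` and `order_and_limit` are hypothetical helpers, and the sample rows are invented.

```python
import copy


def without_limit(native_query: dict) -> dict:
    """Return a copy of a groupBy query with its limit removed before forwarding."""
    forwarded = copy.deepcopy(native_query)
    forwarded["limitSpec"] = {"type": "NoopLimitSpec"}
    forwarded.setdefault("context", {})["applyLimitPushDown"] = False
    return forwarded


def order_and_limit(merged_rows, limit):
    """Apply ORDER BY regionName, c DESC and LIMIT to merged (regionName, c) rows."""
    ordered = sorted(merged_rows, key=lambda row: (row[0], -row[1]))
    return ordered[:limit]


original = {
    "queryType": "groupBy",
    "dataSource": {"type": "table", "name": "wikipedia"},
    "limitSpec": {"type": "default", "limit": 10},
}
forwarded = without_limit(original)  # sent to the other cluster's broker
top = order_and_limit([("Europe", 6), ("Asia", 7), ("Africa", 2)], limit=2)
```

The original query is left untouched, so the first broker still knows the limit to apply once all partial results have been merged.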

github-actions · 2024-03-09T00:15:43Z

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the [email protected] list.
Thank you for your contributions.

github-actions · 2024-04-07T00:18:29Z

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

599166320 added 2 commits July 6, 2023 21:56

support for federated clusters

c03f0f4

Merge branch 'master' into feature-federated-cluster

60ee3c1

599166320 changed the title ~~Feature federated cluster~~ support for federated clusters Oct 17, 2023

Add some test code

1f615df

funguy-tech reviewed Oct 18, 2023

processing/src/main/java/org/apache/druid/query/QueryContexts.java

Add relevant documentation and modify variable names

16e0383

github-actions bot added the Area - Documentation label Oct 19, 2023

github-actions bot added the stale label Mar 9, 2024

github-actions bot closed this Apr 7, 2024
