Skip to content

Example Query Profile

Paul Rogers edited this page Oct 18, 2021 · 15 revisions

The following is an early Query Profile version for a SQL query which translates to a native scan query. See below for an explanation.

{
   "version":1,
   "type":"sql",
   "host":"localhost:8082",
   "service":"druid/broker",
   "queryId":"544aa5ac-7d5d-41fb-a319-caad2ff06613",
   "remoteAddress":"127.0.0.1",
   "columns":"None",
   "startTime":1634517420445,
   "timeNs":39823612401,
   "cpuNs":18479000,
   "rows":10,
   "query":{
      "query":"\nSELECT \"__time\", \"page\", \"delta\"\nFROM wikiticker\nWHERE \"__time\" >= TIMESTAMP \\'2015-09-12 13:00:00\\'\n  AND \"__time\" < TIMESTAMP \\'2015-09-12 15:00:00\\'\n  AND \"channel\" = \\'#en.wikipedia\\'\n  AND \"isRobot\" = \\'false\\'\n  AND \"page\" LIKE \\'User talk:D%\\'\nLIMIT 10\n",
      "resultFormat":"OBJECT",
      "header":false,
      "context":{ },
      "parameters":[ ]
   },
   "plan":{
      "rels":[
         {
            "id":"0",
            "relOp":"LogicalTableScan",
            "table":[
               "druid",
               "wikiticker"
            ],
            "inputs":[ ]
         },
         {
            "id":"1",
            "relOp":"LogicalFilter",
            "condition":{
               "op":{
                  "name":"AND",
                  "kind":"AND",
                  "syntax":"BINARY"
               },
               "operands":[
                  {
                     "op":{
                        "name":">=",
                        "kind":"GREATER_THAN_OR_EQUAL",
                        "syntax":"BINARY"
                     },
                     "operands":[
                        {
                           "input":0,
                           "name":"$0"
                        },
                        {
                           "literal":1442062800000,
                           "type":{
                              "type":"TIMESTAMP",
                              "nullable":false,
                              "precision":0
                           }
                        }
                     ]
                  },
                  {
                     "op":{
                        "name":"<",
                        "kind":"LESS_THAN",
                        "syntax":"BINARY"
                     },
                     "operands":[
                        {
                           "input":0,
                           "name":"$0"
                        },
                        {
                           "literal":1442070000000,
                           "type":{
                              "type":"TIMESTAMP",
                              "nullable":false,
                              "precision":0
                           }
                        }
                     ]
                  },
                  {
                     "op":{
                        "name":"=",
                        "kind":"EQUALS",
                        "syntax":"BINARY"
                     },
                     "operands":[
                        {
                           "input":2,
                           "name":"$2"
                        },
                        {
                           "literal":"#en.wikipedia",
                           "type":{
                              "type":"VARCHAR",
                              "nullable":false,
                              "precision":-1
                           }
                        }
                     ]
                  },
                  {
                     "op":{
                        "name":"=",
                        "kind":"EQUALS",
                        "syntax":"BINARY"
                     },
                     "operands":[
                        {
                           "input":12,
                           "name":"$12"
                        },
                        {
                           "literal":"false",
                           "type":{
                              "type":"VARCHAR",
                              "nullable":false,
                              "precision":-1
                           }
                        }
                     ]
                  },
                  {
                     "op":{
                        "name":"LIKE",
                        "kind":"LIKE",
                        "syntax":"SPECIAL"
                     },
                     "operands":[
                        {
                           "input":16,
                           "name":"$16"
                        },
                        {
                           "literal":"User talk:D%",
                           "type":{
                              "type":"CHAR",
                              "nullable":false,
                              "precision":12
                           }
                        }
                     ]
                  }
               ]
            }
         },
         {
            "id":"2",
            "relOp":"LogicalProject",
            "fields":[
               "__time",
               "page",
               "delta"
            ],
            "exprs":[
               {
                  "input":0,
                  "name":"$0"
               },
               {
                  "input":16,
                  "name":"$16"
               },
               {
                  "input":8,
                  "name":"$8"
               }
            ]
         },
         {
            "id":"3",
            "relOp":"LogicalSort",
            "collation":[
               
            ],
            "fetch":{
               "literal":10,
               "type":{
                  "type":"INTEGER",
                  "nullable":false
               }
            }
         }
      ]
   },
   "rootOperator":{
      "kind":"native",
      "query":{
         "queryType":"scan",
         "dataSource":{
            "type":"table",
            "name":"wikiticker"
         },
         "intervals":{
            "type":"intervals",
            "intervals":[
               "2015-09-12T13:00:00.000Z/2015-09-12T15:00:00.000Z"
            ]
         },
         "virtualColumns":[
            
         ],
         "resultFormat":"compactedList",
         "batchSize":20480,
         "limit":10,
         "order":"none",
         "filter":{
            "type":"and",
            "fields":[
               {
                  "type":"selector",
                  "dimension":"channel",
                  "value":"#en.wikipedia",
                  "extractionFn":"None"
               },
               {
                  "type":"selector",
                  "dimension":"isRobot",
                  "value":"false",
                  "extractionFn":"None"
               },
               {
                  "type":"like",
                  "dimension":"page",
                  "pattern":"User talk:D%",
                  "escape":"None",
                  "extractionFn":"None"
               }
            ]
         },
         "columns":[
            "__time",
            "delta",
            "page"
         ],
         "legacy":false,
         "context":{
            "queryId":"f6039b4b-ad18-4262-bd72-f0f36c0488a8",
            "sqlQueryId":"544aa5ac-7d5d-41fb-a319-caad2ff06613"
         },
         "descending":false,
         "granularity":{
            "type":"all"
         }
      },
      "child":{
         "kind":"limit",
         "limit":10,
         "rows":10,
         "batches":1,
         "child":{
            "kind":"merge",
            "children":[
               {
                  "kind":"retry",
                  "children":[
                     {
                        "kind":"scatter",
                        "segments":2,
                        "servers":1,
                        "children":[
                           {
                              "kind":"gather",
                              "timeNs":3021951687,
                              "host":"localhost:8083",
                              "url":"http://localhost:8083/druid/v2/",
                              "succeeded":true,
                              "firstByteNs":3019757515,
                              "rows":10,
                              "bytes":1943,
                              "response":{
                                 
                              },
                              "fragment":{
                                 "host":"localhost:8083",
                                 "service":"druid/historical",
                                 "rows":10,
                                 "startTime":1634517457241,
                                 "timeNs":3000618423,
                                 "cpuNs":143560000,
                                 "query":{
                                    "intervals":[
                                       "2015-09-12T13:00:00.000Z/2015-09-12T15:00:00.000Z"
                                    ],
                                    "context":{
                                       "defaultTimeout":300000,
                                       "finalize":false,
                                       "maxQueuedBytes":5242880,
                                       "maxScatterGatherBytes":9223372036854775807,
                                       "queryFailTime":1634517726522,
                                       "queryId":"f6039b4b-ad18-4262-bd72-f0f36c0488a8",
                                       "scanOutermost":false,
                                       "sqlQueryId":"544aa5ac-7d5d-41fb-a319-caad2ff06613",
                                       "timeout":269292
                                    }
                                 },
                                 "rootOperator":{
                                    "kind":"limit",
                                    "limit":10,
                                    "rows":10,
                                    "batches":1,
                                    "child":{
                                       "kind":"merge",
                                       "children":[
                                          {
                                             "kind":"scan-query",
                                             "strategy":"concat",
                                             "limit":10,
                                             "child":{
                                                "kind":"concat",
                                                "children":[
                                                   {
                                                      "kind":"segment-scan",
                                                      "segment":"wikiticker_2015-09-12T13:00:00.000Z_2015-09-12T14:00:00.000Z_2021-09-27T23:17:45.428Z",
                                                      "interval":"2015-09-12T13:00:00.000Z/2015-09-12T14:00:00.000Z",
                                                      "columnCount":3,
                                                      "batchSize":20480,
                                                      "cursors":[
                                                         {
                                                            "kind":"index-scan",
                                                            "interval":"2015-09-12T13:00:01.344Z/2015-09-12T14:00:00.000Z",
                                                            "indexRows":2073,
                                                            "granularity":"AllGranularity",
                                                            "indexFilters":[
                                                               {
                                                                  "kind":"bitmap",
                                                                  "dimension":"channel",
                                                                  "cardinality":44,
                                                                  "value":"#en.wikipedia",
                                                                  "rows":568
                                                               },
                                                               {
                                                                  "kind":"bitmap",
                                                                  "dimension":"isRobot",
                                                                  "cardinality":2,
                                                                  "value":"false",
                                                                  "rows":1294
                                                               }
                                                            ],
                                                            "unknownIndexCount":1,
                                                            "preFilteredRows":11,
                                                            "bitmapTimeNs":33456463,
                                                            "cursors":[
                                                               {
                                                                  "type":"index",
                                                                  "offsetType":"AscendingTimestampCheckingOffset",
                                                                  "preFilterRows":10,
                                                                  "postFilterRows":10
                                                               }
                                                            ]
                                                         }
                                                      ],
                                                      "rows":10,
                                                      "timeNs":111455749
                                                   },
                                                   {
                                                      "kind":"segment-scan",
                                                      "segment":"wikiticker_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2021-09-27T23:17:45.428Z",
                                                      "interval":"None",
                                                      "columnCount":0,
                                                      "batchSize":20480,
                                                      "limited":true,
                                                      "rows":0,
                                                      "timeNs":4918
                                                   }
                                                ]
                                             }
                                          }
                                       ]
                                    }
                                 }
                              }
                           }
                        ]
                     }
                  ]
               }
            ]
         }
      }
   }
}

Explanation

The overall object represents a request received by a Broker, and the way the Broker "scattered" the query to the (one) data nodes.

This is a SQL query, which is first planned using the Calcite engine. The profile includes Calcite's logical plan. Druid then converts the plan to a native query as part of query execution.

Druid queries are fully recursive: a Broker query is just a query that happens to come from a client rather than another Druid node. Since the term "query" is overloaded in Druid (it can mean a SQL query, a native query, a subquery, an internal rewritten query...), we adopt some distributed systems terms.

A fragment is a Druid query running on one node and corresponds to Druid's HTTP request handler for queries. The root fragment is the node which processes a query received from the client. (Recall that clients can, but usually don't, send queries directly to a data node.) In the typical case, the client sends the query to the Broker, and the Broker farms out the work to a set of data nodes (one in this case). Each such scattered query is a child fragment: a fragment that is part of a larger, overall query. SQL queries are always processed in a root fragment. Native queries can start with a root or child fragment.

The root fragment gives information about the query as a whole. Here we can see the address of the client, the full query as sent by the client, and the list of all the columns which this query uses. Druid allows the client to provide a query ID. Here we're using the pydruid client which did assign an ID as we can see in the copy of the received query. Otherwise, the Broker assigns a query ID.

Every fragment (root or child) is comprised of a series of (conceptual) operators: steps which read or transform data. (The actual Druid code is more complex, but we as users only care about the transform steps which Druid applied to run our query.)

This root fragment for the SQL query starts with a "native" operator which translates the SQL logical plan to a Druid native query physical plan. Here we see the native query generated. This query has a SQL LIMIT clause, which translates into a native limit and is shown in the profile as a limit operator. Next is a "merge" operator to combine results from data nodes. Next is a "retry" operator which handles missing fragments. If fragments had been missing, this operator profile would list them, along with details of the retry attempts. In this case, everything worked correctly the first time.

Below this we see a "scatter" operator which decides how to distribute the native query across our (single) data node. The node hosts multiple fragments, so the rewritten native query for the data node will span two fragments. On a realistic cluster, the scatter operator will send queries to multiple data nodes.

Below the scatter operator is a set of "gather" operators (often called "receiver" operators in other systems.) These are the parts of Druid which receive results from the data node and are the lowest operator running on the root Broker fragment. In a typical query, the Broker would also have a merge operator on top of the receiver, but here there is only one scatter node, so the Broker just uses a "concat" (concatenation) operator to "combine" the single result set.

The data node executes the rewritten native query as a child fragment, and returns data for that fragment to the Broker. The Broker inserts that fragment as a child of the "gather" operator: the gather operator has "gathered" fragment stats from the data node. The child fragment shows only the parts of the query which we believe to have been rewritten.

This is a scan query, so within the child fragment we see an operator for that task. The scan is made up of a set of segment scans, each of which is implemented by an index scan (only one here.) The details give us visibility into the strategies that Druid takes and some amount of information about times and row counts.

We then drill down further: we see that the query has three filters (WHERE) clause and that Druid is able to implement these filter via an index lookup. The segment has 2073 rows. Our value of "#en.wikipedia" against the channel dimension matches 568 rows. We can see that the there are 44 different channel values, so the average match should be 2073/44 = 47, but we found 568, meaning that the data is heavily skewed toward this particular value. We also see that the regex filter is reduced to a bitmap filter which is then union'ed with the bitmaps from the other two dimension filters to provide a "pre-filter" row count of 11 rows in one segment, 8 in another.

Druid calls the above "pre-filters" which is a bit of a misnomer. They are index lookups or predicates, after which the matching rows pass through filters. This query, however, has no filters. If it did, we'd see the rows further reduced by the use of a filter that can't be resolved via an index. Finally, the query has a limit (equivalent of a SQL LIMIT) clause, so the rows are truncated to just the first 20, which is the count of rows returned by the query.

Open Issues

  • Overall query time, bytes, rows are missing.
  • Receiver time is wrong: maybe wall clock time rather than elapsed time?
  • Response is missing from fragment.
  • limited = false, but the query has a limit?
  • Offset counts are 0. Gathered at the wrong time?
  • Convert granularity into a string value rather than using the class name.
  • Though the code suggests multiple cursors per segment, it seems in practice there is (usually? always?) one. Clean up somehow.
  • Since the limit seems to be applied at the scan-query level, indicate that in the profile.
  • Test with aggregates.
  • Expand to other query types.
  • In a real system, there will be many fragments. Perhaps separate out the planning-level information from the execution information, and merge planning information so we have one copy, not a copy per fragment.
Clone this wiki locally