[docs] Updating Rollup tutorial #16762

Merged 21 commits on Jul 26, 2024
Changes from 4 commits
25 changes: 22 additions & 3 deletions docs/tutorials/tutorial-rollup.md
@@ -76,8 +76,13 @@ GROUP BY 1, 2, 3
PARTITIONED BY DAY
```

In the query, you group by dimensions, `timestamp`, `srcIP`, and `dstIP`. Note that the query uses the `FLOOR` function to bucket rows based on MINUTE granularity.
For the metrics, you apply aggregations to sum the `bytes` and `packets` columns and add a column that counts the number of rows that get rolled-up.
Note the following aspects of the ingestion statement:
* You transform the timestamp field using the `FLOOR` function to round timestamps down to the minute.
* You group by the dimensions `timestamp`, `srcIP`, and `dstIP`.
* You create the `bytes` and `packets` metrics, which are summed from their respective input fields.
* You also create the `count` metric, which records the number of input rows that were rolled up into each row in the datasource.

With rollup, Druid combines rows that have identical timestamp and dimension values after timestamp truncation. Druid computes and stores the metric values using the specified aggregation function over each set of rolled-up rows.
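
To make these points concrete, here is a minimal sketch of an ingestion query that matches the aspects listed above. The datasource name `rollup_tutorial` and the input relation `inline_data` are illustrative assumptions; the full statement, including the input definition, appears earlier in the tutorial.

```sql
INSERT INTO "rollup_tutorial"
SELECT
  -- truncate each event timestamp down to its minute and use it as the primary timestamp
  FLOOR(TIME_PARSE("timestamp") TO MINUTE) AS __time,
  -- dimensions to group on
  "srcIP",
  "dstIP",
  -- metrics: sums of the input fields, plus a count of input rows per rolled-up row
  SUM("bytes") AS "bytes",
  SUM("packets") AS "packets",
  COUNT(*) AS "count"
FROM "inline_data"   -- assumed input relation defined earlier in the tutorial
GROUP BY 1, 2, 3     -- __time, srcIP, dstIP
PARTITIONED BY DAY
```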

After the ingestion completes, you can query the data.
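
For example, a query along the following lines returns the rolled-up rows. The datasource name `rollup_tutorial` is an assumption for illustration and should match the name used in the ingestion query.

```sql
-- each returned row is one rolled-up combination of minute, srcIP, and dstIP
SELECT * FROM "rollup_tutorial"
```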

@@ -99,7 +104,7 @@ Returns the following:
| `2018-01-02T21:33:00.000Z` | `7.7.7.7` | `8.8.8.8` | `100,288` | `2` | `161` |
| `2018-01-02T21:35:00.000Z` | `7.7.7.7` | `8.8.8.8` | `2,818` | `1` | `12` |

Notice there are only six rows as opposed to the nine rows in the example data. The next section covers how ingestion with rollup accomplishes this.
Notice there are only five rows as opposed to the nine rows in the example data. In the next section, you explore the components of the rolled-up rows.

## View rollup in action

@@ -147,3 +152,17 @@ Therefore, no rollup takes place:
| `2018-01-01T01:03:00.000Z` | `1.1.1.1` | `2.2.2.2` | `10,204` | `1` | `49` |
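
A quick way to confirm that a row was not combined with any others is to check its `count` metric: a value of `1` means only one input row landed in that combination of truncated minute, `srcIP`, and `dstIP`. A minimal sketch, again assuming the datasource is named `rollup_tutorial`:

```sql
-- rows with "count" = 1 were stored as-is; no other input row shared their
-- truncated minute, srcIP, and dstIP, so no rollup took place for them
SELECT __time, "srcIP", "dstIP", "count"
FROM "rollup_tutorial"
WHERE "count" = 1
```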


## Learn more

See the following topics for more information:

* [SQL-based ingestion query examples](../multi-stage-query/examples.md#insert-with-rollup) for another example of data rollup during ingestion.
* [SQL-based ingestion concepts](../multi-stage-query/concepts.md#rollup) for more details on the concept of rollup.
* [Data rollup](../ingestion/rollup.md) for suggestions and best practices when performing rollup.
* [Druid schema model](../ingestion/schema-model.md) for more details on timestamps, dimensions, and metrics.

