Faster dimension deserialization on the brokers #16740

LakshSingla · 2024-07-16T06:35:42Z

Description

After allowing complex dimensions to be grouped, we have seen a regression in queries that operate on large amounts of data which is solely due to the custom deserialization that has been added. After investigation, it looks like that happens because deserializing everything in a single go is a lot faster than deserializing individual dimensions, due to the additional overhead involved with Jackson deserialization when trying to parse a value. While the patch mentioned is important for allowing grouping on complex dimensions, this PR aims to improve the performance of the deserializing code. Equally importantly, I have duplicated the logic for deserialization to make the hot loop more effective, and if the query doesn't contain any complex dimensions (or nested columns), we resort to the original logic to remove any regression that was introduced by the linked patch.

This regression was originally discovered in a query that takes about 5 minutes to complete, and transfers a lot of data between the historicals and the broker as a part of the grouping process. The time before and after introducing the regressing patch increased from 270 seconds to 330 seconds.

Following improvements are made to the

Reuse deserialization context - This will reduce the overhead of creating a new deserialization context every time while calling jp.readValueAs(...), which is per entity, per row. This change alone reduces the time taken by the query by 10-20 seconds.
Use JavaType instead of Class - This is another free bit of optimization that can help with the runtime. Class is converted into JavaType before the mapper deserializes the value. Directly having the JavaType ready means that the per entity deserialization doesn't have to incur this cost.
Remove conditionals from the hot loop - This patch tries to make the hot loop as efficient as possible by removing the conditionals from the serde code.
Have redundant serializers and deserializers - Depending on backward compatibility and if the query has complex dimensions, serde logic can have many conditionals. This patch breaks those conditionals out so that the hot loop is more optimized than before.
Skip cloning and decorating the mapper - This patch introduces back the original code we had - if there isn't any backward compatibility to take care off, skip cloning the mapper and adding custom serde logic to it - with a minor change. We skip cloning if we don't have to worry about backward compatibility and we don't have any complex dimension.
Treat JSON columns differently from other complex types - JSON columns are complex types that can be optimised due to the fact that they can directly operate on the deserialized types. Therefore they can bypass all the logic of custom serde required for complex dimensions. This optimises the handling of nested types, which is one of the major group of complex columns that need to be grouped upon.

Benchmarking the `readValue` call on `ResultRow`

Current master (after regression)

Benchmark                                              (backwardCompatibility)      (complexDimensionType)  (numDimensions)  (primitiveToComplexDimensionRatio)  Mode  Cnt   Score   Error  Units
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                   0  avgt    5  30.200 ± 0.470  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 0.5  avgt    5  18.732 ± 0.316  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 1.0  avgt    5   5.774 ± 0.043  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                   0  avgt    5  30.384 ± 0.681  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 0.5  avgt    5  19.705 ± 0.450  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 1.0  avgt    5   5.565 ± 0.083  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                   0  avgt    5  31.146 ± 0.256  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 0.5  avgt    5  19.404 ± 1.294  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 1.0  avgt    5   5.681 ± 0.024  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                   0  avgt    5  30.445 ± 0.696  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 0.5  avgt    5  19.345 ± 0.361  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 1.0  avgt    5   5.608 ± 0.229  us/op

Before the regression
(Note that tests on serializablePairLongString are not representative of the performance we should expect, due to the additional overhead of serializing to the correct type. This was the original motivation behind the custom deserialization

Benchmark                                              (backwardCompatibility)      (complexDimensionType)  (numDimensions)  (primitiveToComplexDimensionRatio)  Mode  Cnt   Score   Error  Units
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                   0  avgt    5  18.449 ± 0.766  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 0.5  avgt    5  10.432 ± 0.095  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 1.0  avgt    5   2.986 ± 0.087  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                   0  avgt    5  17.439 ± 0.197  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 0.5  avgt    5  10.366 ± 0.213  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 1.0  avgt    5   3.198 ± 0.054  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                   0  avgt    5  17.492 ± 0.333  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 0.5  avgt    5  10.196 ± 0.335  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 1.0  avgt    5   3.081 ± 0.025  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                   0  avgt    5  18.919 ± 0.485  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 0.5  avgt    5  10.755 ± 0.189  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 1.0  avgt    5   3.436 ± 0.930  us/op

After the patch

Benchmark                                              (backwardCompatibility)      (complexDimensionType)  (numDimensions)  (primitiveToComplexDimensionRatio)  Mode  Cnt   Score   Error  Units
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                   0  avgt    5  18.771 ± 0.886  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 0.5  avgt    5  10.588 ± 0.170  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 1.0  avgt    5   3.272 ± 0.032  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                   0  avgt    5  29.902 ± 0.615  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 0.5  avgt    5  18.609 ± 0.230  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 1.0  avgt    5   2.955 ± 0.034  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                   0  avgt    5  17.859 ± 0.335  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 0.5  avgt    5  10.596 ± 0.079  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 1.0  avgt    5   3.097 ± 0.023  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                   0  avgt    5  30.298 ± 0.493  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 0.5  avgt    5  18.608 ± 0.678  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 1.0  avgt    5   3.120 ± 0.121  us/op

This PR has:

This reverts commit 2f54dd9.

clintropolis

seems fine to me 👍 (though then again so did the last change)

clintropolis · 2024-07-24T10:44:36Z

processing/src/main/java/org/apache/druid/query/groupby/ResultRowObjectMapperDecoratorUtil.java

+    return dimensionType.is(ValueType.COMPLEX) && !ColumnType.NESTED_DATA.equals(dimensionType);
+  }
+
+  private static JavaType[] createJavaTypesForResultRow(final GroupByQuery groupByQuery)


ah i see, inside the version that takes a class its going to call typeFactory.constructType anyway, 👍

clintropolis · 2024-07-24T10:45:17Z

processing/src/main/java/org/apache/druid/query/groupby/ResultRowObjectMapperDecoratorUtil.java

+  {
+    final boolean resultAsArray = query.context().getBoolean(GroupByQueryConfig.CTX_KEY_ARRAY_RESULT_ROWS, false);
+    final boolean intermediateCompatMode = groupByQueryConfig.isIntermediateResultAsMapCompat();
+    final boolean arrayBasedRows = resultAsArray && !intermediateCompatMode;


not a blocker, but i wonder if we still need this config...

We can probably remove the config safely and all the artifacts associated with maintaining the legacy version of the result rows.

LakshSingla · 2024-07-26T09:05:21Z

Tested the patch on the query that was regressing (285 seconds to 300+ seconds) with the custom deserialization and it is back to its previous performance. All the other queries tested with this patch also do not show any perf regressions. Benchmark numbers are mentioned in the description.

FrankChen021 · 2024-07-26T10:41:49Z

Use JavaType instead of Class - This is another free bit of optimization that can help with the runtime. Class is converted into JavaType before the mapper deserializes the value. Directly having the JavaType ready means that the per entity deserialization doesn't have to incur this cost.

Hi @LakshSingla I'm wondering how much performance gain this would bring in? I didn't find any doc about such performance overhead by using the Class object. And if we need to forbide the readValues method with the Class type?

Speedier dimension deserialization on the brokers.

faster serde

fb15cf5

LakshSingla changed the title ~~Reuese deserialization context when deserializing ResultRows~~ Reuse deserialization context when deserializing ResultRows Jul 16, 2024

LakshSingla added 7 commits July 16, 2024 19:52

remove backward compat

2f54dd9

Revert "remove backward compat"

6affc61

This reverts commit 2f54dd9.

more changes

b6ede34

add benchmark, checkstyle wont pass

cf554f0

changes

8512321

fix

a49b0af

null fix

98ea5bb

clintropolis approved these changes Jul 24, 2024

View reviewed changes

LakshSingla changed the title ~~Reuse deserialization context when deserializing ResultRows~~ Faster dimension deserialization on the brokers Jul 25, 2024

LakshSingla added 2 commits July 26, 2024 10:04

javadocs

c7351d7

static check

751ff9e

LakshSingla added Performance Area - Querying labels Jul 26, 2024

LakshSingla merged commit 725d442 into apache:master Jul 26, 2024
88 checks passed

LakshSingla deleted the reuse-deser-context branch July 26, 2024 09:06

sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Aug 6, 2024

Faster dimension deserialization on the brokers (apache#16740)

5f3d5f5

Speedier dimension deserialization on the brokers.

kfaraz added this to the 31.0.0 milestone Oct 4, 2024

kfaraz mentioned this pull request Oct 11, 2024

[DRAFT] 31.0.0 Release Notes #17332

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster dimension deserialization on the brokers #16740

Faster dimension deserialization on the brokers #16740

LakshSingla commented Jul 16, 2024 •

edited

Loading

clintropolis left a comment

clintropolis Jul 24, 2024

clintropolis Jul 24, 2024

LakshSingla Jul 26, 2024

LakshSingla commented Jul 26, 2024

FrankChen021 commented Jul 26, 2024

Faster dimension deserialization on the brokers #16740

Faster dimension deserialization on the brokers #16740

Conversation

LakshSingla commented Jul 16, 2024 • edited Loading

Description

Benchmarking the readValue call on ResultRow

clintropolis left a comment

Choose a reason for hiding this comment

clintropolis Jul 24, 2024

Choose a reason for hiding this comment

clintropolis Jul 24, 2024

Choose a reason for hiding this comment

LakshSingla Jul 26, 2024

Choose a reason for hiding this comment

LakshSingla commented Jul 26, 2024

FrankChen021 commented Jul 26, 2024

LakshSingla commented Jul 16, 2024 •

edited

Loading

Benchmarking the `readValue` call on `ResultRow`