Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster dimension deserialization on the brokers #16740

Merged
merged 10 commits into from
Jul 26, 2024

Conversation

LakshSingla
Copy link
Contributor

@LakshSingla LakshSingla commented Jul 16, 2024

Description

After allowing complex dimensions to be grouped, we have seen a regression in queries that operate on large amounts of data which is solely due to the custom deserialization that has been added. After investigation, it looks like that happens because deserializing everything in a single go is a lot faster than deserializing individual dimensions, due to the additional overhead involved with Jackson deserialization when trying to parse a value. While the patch mentioned is important for allowing grouping on complex dimensions, this PR aims to improve the performance of the deserializing code. Equally importantly, I have duplicated the logic for deserialization to make the hot loop more effective, and if the query doesn't contain any complex dimensions (or nested columns), we resort to the original logic to remove any regression that was introduced by the linked patch.

This regression was originally discovered in a query that takes about 5 minutes to complete, and transfers a lot of data between the historicals and the broker as a part of the grouping process. The time before and after introducing the regressing patch increased from 270 seconds to 330 seconds.

Following improvements are made to the

  1. Reuse deserialization context - This will reduce the overhead of creating a new deserialization context every time while calling jp.readValueAs(...), which is per entity, per row. This change alone reduces the time taken by the query by 10-20 seconds.

  2. Use JavaType instead of Class - This is another free bit of optimization that can help with the runtime. Class is converted into JavaType before the mapper deserializes the value. Directly having the JavaType ready means that the per entity deserialization doesn't have to incur this cost.

  3. Remove conditionals from the hot loop - This patch tries to make the hot loop as efficient as possible by removing the conditionals from the serde code.

  4. Have redundant serializers and deserializers - Depending on backward compatibility and if the query has complex dimensions, serde logic can have many conditionals. This patch breaks those conditionals out so that the hot loop is more optimized than before.

  5. Skip cloning and decorating the mapper - This patch introduces back the original code we had - if there isn't any backward compatibility to take care off, skip cloning the mapper and adding custom serde logic to it - with a minor change. We skip cloning if we don't have to worry about backward compatibility and we don't have any complex dimension.

  6. Treat JSON columns differently from other complex types - JSON columns are complex types that can be optimised due to the fact that they can directly operate on the deserialized types. Therefore they can bypass all the logic of custom serde required for complex dimensions. This optimises the handling of nested types, which is one of the major group of complex columns that need to be grouped upon.

Benchmarking the readValue call on ResultRow

  • Current master (after regression)
Benchmark                                              (backwardCompatibility)      (complexDimensionType)  (numDimensions)  (primitiveToComplexDimensionRatio)  Mode  Cnt   Score   Error  Units
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                   0  avgt    5  30.200 ± 0.470  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 0.5  avgt    5  18.732 ± 0.316  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 1.0  avgt    5   5.774 ± 0.043  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                   0  avgt    5  30.384 ± 0.681  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 0.5  avgt    5  19.705 ± 0.450  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 1.0  avgt    5   5.565 ± 0.083  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                   0  avgt    5  31.146 ± 0.256  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 0.5  avgt    5  19.404 ± 1.294  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 1.0  avgt    5   5.681 ± 0.024  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                   0  avgt    5  30.445 ± 0.696  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 0.5  avgt    5  19.345 ± 0.361  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 1.0  avgt    5   5.608 ± 0.229  us/op
  • Before the regression
    (Note that tests on serializablePairLongString are not representative of the performance we should expect, due to the additional overhead of serializing to the correct type. This was the original motivation behind the custom deserialization
Benchmark                                              (backwardCompatibility)      (complexDimensionType)  (numDimensions)  (primitiveToComplexDimensionRatio)  Mode  Cnt   Score   Error  Units
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                   0  avgt    5  18.449 ± 0.766  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 0.5  avgt    5  10.432 ± 0.095  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 1.0  avgt    5   2.986 ± 0.087  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                   0  avgt    5  17.439 ± 0.197  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 0.5  avgt    5  10.366 ± 0.213  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 1.0  avgt    5   3.198 ± 0.054  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                   0  avgt    5  17.492 ± 0.333  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 0.5  avgt    5  10.196 ± 0.335  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 1.0  avgt    5   3.081 ± 0.025  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                   0  avgt    5  18.919 ± 0.485  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 0.5  avgt    5  10.755 ± 0.189  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 1.0  avgt    5   3.436 ± 0.930  us/op
  • After the patch
Benchmark                                              (backwardCompatibility)      (complexDimensionType)  (numDimensions)  (primitiveToComplexDimensionRatio)  Mode  Cnt   Score   Error  Units
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                   0  avgt    5  18.771 ± 0.886  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 0.5  avgt    5  10.588 ± 0.170  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true                        json              100                                 1.0  avgt    5   3.272 ± 0.032  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                   0  avgt    5  29.902 ± 0.615  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 0.5  avgt    5  18.609 ± 0.230  us/op
GroupByDeserializationBenchmark.deserializeResultRows                     true  serializablePairLongString              100                                 1.0  avgt    5   2.955 ± 0.034  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                   0  avgt    5  17.859 ± 0.335  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 0.5  avgt    5  10.596 ± 0.079  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false                        json              100                                 1.0  avgt    5   3.097 ± 0.023  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                   0  avgt    5  30.298 ± 0.493  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 0.5  avgt    5  18.608 ± 0.678  us/op
GroupByDeserializationBenchmark.deserializeResultRows                    false  serializablePairLongString              100                                 1.0  avgt    5   3.120 ± 0.121  us/op

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@LakshSingla LakshSingla changed the title Reuese deserialization context when deserializing ResultRows Reuse deserialization context when deserializing ResultRows Jul 16, 2024
Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine to me 👍 (though then again so did the last change)

return dimensionType.is(ValueType.COMPLEX) && !ColumnType.NESTED_DATA.equals(dimensionType);
}

private static JavaType[] createJavaTypesForResultRow(final GroupByQuery groupByQuery)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see, inside the version that takes a class its going to call typeFactory.constructType anyway, 👍

{
final boolean resultAsArray = query.context().getBoolean(GroupByQueryConfig.CTX_KEY_ARRAY_RESULT_ROWS, false);
final boolean intermediateCompatMode = groupByQueryConfig.isIntermediateResultAsMapCompat();
final boolean arrayBasedRows = resultAsArray && !intermediateCompatMode;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocker, but i wonder if we still need this config...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably remove the config safely and all the artifacts associated with maintaining the legacy version of the result rows.

@LakshSingla LakshSingla changed the title Reuse deserialization context when deserializing ResultRows Faster dimension deserialization on the brokers Jul 25, 2024
@LakshSingla
Copy link
Contributor Author

Tested the patch on the query that was regressing (285 seconds to 300+ seconds) with the custom deserialization and it is back to its previous performance. All the other queries tested with this patch also do not show any perf regressions. Benchmark numbers are mentioned in the description.

@LakshSingla LakshSingla merged commit 725d442 into apache:master Jul 26, 2024
88 checks passed
@LakshSingla LakshSingla deleted the reuse-deser-context branch July 26, 2024 09:06
@FrankChen021
Copy link
Member

Use JavaType instead of Class - This is another free bit of optimization that can help with the runtime. Class is converted into JavaType before the mapper deserializes the value. Directly having the JavaType ready means that the per entity deserialization doesn't have to incur this cost.

Hi @LakshSingla I'm wondering how much performance gain this would bring in? I didn't find any doc about such performance overhead by using the Class object. And if we need to forbide the readValues method with the Class type?

sreemanamala pushed a commit to sreemanamala/druid that referenced this pull request Aug 6, 2024
Speedier dimension deserialization on the brokers.
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants