Faster dimension deserialization on the brokers #16740
Conversation
seems fine to me 👍 (though then again so did the last change)
return dimensionType.is(ValueType.COMPLEX) && !ColumnType.NESTED_DATA.equals(dimensionType);
}

private static JavaType[] createJavaTypesForResultRow(final GroupByQuery groupByQuery)
ah i see, inside the version that takes a class it's going to call typeFactory.constructType anyway, 👍
{
  final boolean resultAsArray = query.context().getBoolean(GroupByQueryConfig.CTX_KEY_ARRAY_RESULT_ROWS, false);
  final boolean intermediateCompatMode = groupByQueryConfig.isIntermediateResultAsMapCompat();
  final boolean arrayBasedRows = resultAsArray && !intermediateCompatMode;
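The flags above are computed once per query. The "decide the serde path up front" idea from this PR can be sketched in plain Java; the names below (`RowReader`, `chooseReader`, `hasComplexDimensions`) are hypothetical illustrations, not the actual Druid code:

```java
import java.util.Arrays;

public class BranchFreeLoopSketch {
  interface RowReader {
    Object[] read(Object[] raw);
  }

  // Select the row-reading strategy once, based on query-level flags, so the
  // per-row hot loop never re-evaluates the same conditionals.
  static RowReader chooseReader(boolean hasComplexDimensions) {
    if (hasComplexDimensions) {
      // Slow-path stub: per-column custom deserialization would go here.
      return raw -> Arrays.copyOf(raw, raw.length);
    }
    // Fast path: rows are usable as-is; no per-value work.
    return raw -> raw;
  }

  public static void main(String[] args) {
    Object[] row = {"dim", 42L};
    RowReader reader = chooseReader(false); // decided once, outside the hot loop
    Object[] out = null;
    for (int i = 0; i < 1000; i++) {        // hot loop: branch-free
      out = reader.read(row);
    }
    System.out.println(out == row);         // fast path returns the same array
  }
}
```

The point is that the branch cost (and any mispredictions) is paid once per query rather than once per row.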
not a blocker, but i wonder if we still need this config...
We can probably remove the config safely and all the artifacts associated with maintaining the legacy version of the result rows.
Tested the patch on the query that was regressing (285 seconds to 300+ seconds) with the custom deserialization and it is back to its previous performance. All the other queries tested with this patch also do not show any perf regressions. Benchmark numbers are mentioned in the description.
Hi @LakshSingla, I'm wondering how much performance gain this would bring. I didn't find any documentation about such performance overhead from using the Class object. And do we need to forbid the readValues method with the Class type?
Speedier dimension deserialization on the brokers.
Description
After allowing complex dimensions to be grouped, we have seen a regression in queries that operate on large amounts of data, caused solely by the custom deserialization that was added. Investigation showed that deserializing everything in a single pass is much faster than deserializing individual dimensions, due to the additional overhead Jackson incurs when parsing each value separately. While the patch mentioned is important for allowing grouping on complex dimensions, this PR aims to improve the performance of the deserializing code. Equally importantly, I have duplicated the deserialization logic to make the hot loop more effective, and if the query doesn't contain any complex dimensions (or nested columns), we fall back to the original logic, removing the regression introduced by the linked patch.
This regression was originally discovered in a query that takes about 5 minutes to complete and transfers a lot of data between the historicals and the broker as part of the grouping process. With the regressing patch, the query time increased from 270 seconds to 330 seconds.
The following improvements are made:
Reuse deserialization context - This reduces the overhead of creating a new deserialization context on every call to jp.readValueAs(...), which happens per entity, per row. This change alone reduces the time taken by the query by 10-20 seconds.
Use JavaType instead of Class - This is another free bit of optimization that helps at runtime. A Class is converted into a JavaType before the mapper deserializes the value; having the JavaType ready up front means the per-entity deserialization doesn't incur this cost.
Remove conditionals from the hot loop - This patch tries to make the hot loop as efficient as possible by removing the conditionals from the serde code.
Have redundant serializers and deserializers - Depending on backward compatibility and whether the query has complex dimensions, the serde logic can have many conditionals. This patch breaks those conditionals out so that the hot loop is more optimized than before.
Skip cloning and decorating the mapper - This patch brings back the original code, with a minor change: if there is no backward compatibility to take care of and the query has no complex dimensions, we skip cloning the mapper and adding custom serde logic to it.
Treat JSON columns differently from other complex types - JSON columns are complex types that can be optimised because they can operate directly on the deserialized values, so they can bypass the custom serde logic required for other complex dimensions. This optimises the handling of nested types, one of the major groups of complex columns that are grouped upon.
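The "reuse deserialization context" item generalizes to any helper that is expensive to build but cheap to reuse. A stdlib-only analogy (not the Druid/Jackson code itself) using DateTimeFormatter shows the shape of the fix:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class ReuseHelperSketch {
  public static void main(String[] args) {
    String[] rows = {"2024-01-01", "2024-06-15", "2024-12-31"};

    // Before: analogous to creating a fresh deserialization context per value,
    // a new formatter is constructed on every iteration of the hot loop.
    int yearNaive = 0;
    for (String row : rows) {
      yearNaive = LocalDate.parse(row, DateTimeFormatter.ofPattern("yyyy-MM-dd")).getYear();
    }

    // After: build the helper once and reuse it across the loop, as the patch
    // does with the deserialization context and the precomputed per-column type.
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    int yearReused = 0;
    for (String row : rows) {
      yearReused = LocalDate.parse(row, fmt).getYear();
    }

    System.out.println(yearNaive == yearReused); // same results, less allocation
  }
}
```

The reused-helper loop produces identical results while avoiding a per-value allocation and setup cost, which is exactly the kind of overhead that dominates when the work per value is small.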
Benchmarking the readValue call on ResultRow: (Note that tests on serializablePairLongString are not representative of the performance we should expect, due to the additional overhead of serializing to the correct type. This was the original motivation behind the custom deserialization.)

This PR has: