
Clint's Filter Writeup


Notes prepared by Clint.

Druid query processing with filtering and indexes

Many filters are split into a DimFilter and a Filter: the former (the JSON request type) can build the latter, which does the actual filtering when processing individual segments. This split isn't strictly necessary; it is a leftover from earlier decisions to isolate the JSON request types from the mechanical segment-processing types, and we've started combining some implementations. Filter is the interesting one from the query engine's perspective.

Filtering in Druid can currently happen in two ways: using indexes, and using value matchers. https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/filter/Filter.java#L50

Every filter must implement a value matcher (and a vector value matcher if it supports the vectorized engines), which is typically done as a predicate that accepts row values as input and decides whether the row matches.

https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/filter/ValueMatcher.java

https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/filter/vector/VectorValueMatcher.java
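As a rough illustration of the shape of the value-matcher interface just described (a minimal sketch; RowValueSelector and PredicateValueMatcher are invented stand-ins, not Druid's real ValueMatcher and selector types):

```java
import java.util.function.Predicate;

// Simplified stand-ins for Druid's selector and matcher types; the real
// interfaces differ in their details.
interface RowValueSelector {
  String currentValue(); // value of the filtered column for the cursor's current row
}

final class PredicateValueMatcher {
  private final RowValueSelector selector;
  private final Predicate<String> predicate;

  PredicateValueMatcher(RowValueSelector selector, Predicate<String> predicate) {
    this.selector = selector;
    this.predicate = predicate;
  }

  // The cursor positions the selector on a row; the engine then asks the
  // matcher whether that row passes the filter.
  boolean matches() {
    return predicate.test(selector.currentValue());
  }
}
```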

Filters may also use indexes, if the underlying column supports them. Currently, filters only support indexes that can produce a bitmap corresponding to the row numbers of the column which match a given filter. That computation is captured in this interface https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/BitmapColumnIndex.java#L30 and produces our bitmap wrapper type, which supports various implementations:

https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/collections/bitmap/ImmutableBitmap.java. BitmapResultFactory is a wrapper around ImmutableBitmap operations that instruments them with metrics, and is what BitmapColumnIndex produces its results through. Roaring is the default bitmap implementation.
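For intuition, here is a toy version of what a bitmap index provides, using java.util.BitSet as a stand-in for ImmutableBitmap: the index maps each value to the set of matching row ids, and boolean filter combinations become cheap bitmap operations:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class ToyBitmapIndex {
  private final Map<String, BitSet> valueToRows = new HashMap<>();

  void add(int rowId, String value) {
    valueToRows.computeIfAbsent(value, v -> new BitSet()).set(rowId);
  }

  // Rows whose value equals 'value'; empty bitmap if the value never occurs.
  BitSet rowsMatching(String value) {
    BitSet rows = valueToRows.get(value);
    return rows == null ? new BitSet() : (BitSet) rows.clone();
  }

  public static void main(String[] args) {
    ToyBitmapIndex country = new ToyBitmapIndex();
    country.add(0, "US");
    country.add(1, "FR");
    country.add(2, "US");

    ToyBitmapIndex device = new ToyBitmapIndex();
    device.add(0, "phone");
    device.add(2, "tablet");

    // country = 'US' AND device = 'phone' => intersect the two bitmaps.
    BitSet result = country.rowsMatching("US");
    result.and(device.rowsMatching("phone"));
    System.out.println(result); // {0}
  }
}
```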

Both indexes and value matchers are used when constructing the cursor for processing the segment columns, which currently (and unfortunately) knows a bit more than it should about how filters work: https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/QueryableIndexStorageAdapter.java#L338 partitions them into 'pre' filters (indexes) and 'post' filters (value matchers).

The indexes are computed into a https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/BitmapOffset.java or https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/vector/BitmapVectorOffset.java as appropriate, and similarly the remaining filters are put into https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/FilteredOffset.java or https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/vector/FilteredVectorOffset.java. This is something I am working towards changing: we already have the underlying machinery, so cursor construction wouldn't have to know how to partition filters and could instead let filters produce Offsets/VectorOffsets directly. That would allow filters to use any type of index, even one that doesn't produce an ImmutableBitmap, since the filter would have complete control over the Offsets/VectorOffsets that the cursor is built around.
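The effect of the pre/post split looks roughly like this (a sketch, again with BitSet standing in for the real bitmap and offset types): the pre-filter bitmap decides which row offsets the cursor visits at all, and the post-filter value matcher is consulted once per visited row:

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

final class ToyFilteredCursor {
  // preFilterRows: rows selected up front by the index-backed ('pre') filters.
  // postFilter: per-row check for filters that could not use an index.
  static void scan(BitSet preFilterRows, IntPredicate postFilter) {
    for (int row = preFilterRows.nextSetBit(0); row >= 0; row = preFilterRows.nextSetBit(row + 1)) {
      if (postFilter.test(row)) {
        System.out.println("process row " + row);
      }
    }
  }

  public static void main(String[] args) {
    BitSet pre = new BitSet();
    pre.set(0);
    pre.set(2);
    pre.set(5);
    scan(pre, row -> row != 2); // post-filter rejects row 2; rows 0 and 5 processed
  }
}
```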

Anyway, back to filters. Filters which use indexes are given a https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/filter/ColumnIndexSelector.java#L30, which can provide a https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/column/ColumnIndexSupplier.java#L44 for a given column; the supplier is what produces indexes for that column.

Filters use the supplier's 'as' method to ask whether it has the type of index they are looking for, for the column they are looking at. All columns which exist have a ColumnIndexSupplier, so if the supplier itself is null, the filter can build an 'all true' or 'all false' bitmap, depending on whether or not the filter matches nulls. (For most filters it should probably not match nulls, to be SQL compatible; this is a bit of a problem area with Druid filtering/indexes right now, and another area I'd like to work on fixing in the near term.)

Using LikeFilter as an example, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/filter/LikeFilter.java#L74 first checks whether the supplier exists and, if not, makes a null-matching index. If the supplier does exist, it checks for a few different specialized indexes (such as one for equality, and one that exploits the fact that the value dictionary and bitmap index of string columns are lexicographically sorted), and finally falls back to a predicate index (for example, when the comparator does not match the column type), which can use the same predicate as the value matcher.

ColumnIndexSupplier is what actually provides the index gizmos from the columns (or virtual columns, but we'll ignore those for now) to the filters. For strings, the supplier is https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedStringIndexSupplier.java#L78, which is attached to the column when reading it from the segment: https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/serde/DictionaryEncodedColumnPartSerde.java#L364. Segments themselves load from here: https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/IndexIO.java#L539
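To make the 'as' pattern concrete, here is a hedged sketch of that selection flow. The types below (IndexSupplierSketch, ValueSetIndexSketch, PredicateIndexSketch) are simplified stand-ins invented for illustration, not Druid's actual ColumnIndexSupplier API:

```java
import java.util.Optional;
import java.util.function.Predicate;

// Stand-ins for the index supplier and the specialized index interfaces a
// filter can request via 'as'; none of these are Druid's real signatures.
interface IndexSupplierSketch {
  <T> Optional<T> as(Class<T> indexType);
}

interface ValueSetIndexSketch {
  Object forValue(String value); // exact-match index
}

interface PredicateIndexSketch {
  Object forPredicate(Predicate<String> predicate); // catch-all fallback index
}

final class FilterIndexSelection {
  // Mirrors the LikeFilter flow: prefer the most specialized index the column
  // offers, fall back to the predicate index, and return null if no index is
  // usable so the engine uses the value matcher instead.
  static Object buildIndex(IndexSupplierSketch supplier, String prefix) {
    Optional<ValueSetIndexSketch> exact = supplier.as(ValueSetIndexSketch.class);
    if (exact.isPresent()) {
      return exact.get().forValue(prefix);
    }
    return supplier.as(PredicateIndexSketch.class)
                   .map(index -> index.forPredicate(v -> v != null && v.startsWith(prefix)))
                   .orElse(null);
  }
}
```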

Building and writing indexes to segments

This part is maybe a bit clunky, and unfortunately indexing and segment merging are more coupled than they should be, but it's the way things currently are.

So, segment building happens through a thing called IncrementalIndex, which is a write-optimized segment that is later persisted to disk as an immutable segment.

Column processing for ingested columns is done via DimensionHandler, which provides a DimensionIndexer that processes the row values while building an in-memory facts table, and a DimensionMergerV9 that is used both to persist the DimensionIndexer and to merge persisted segments into a bigger segment. Ingest-time aggregators may also be defined to produce columns, but they are not considered for grouping when rollup is enabled (instead they are combined with each other on the grouped rows, normal OLAP stuff) and are not really interesting here (probably...).

https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/DimensionHandler.java https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/DimensionIndexer.java https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/DimensionMergerV9.java

For strings, the string dimension indexer builds a value dictionary, https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/StringDimensionIndexer.java and the merger merges both the value dictionaries and the indexes: https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/StringDimensionMergerV9.java. However, indexes do not strictly need to be created while indexing, nor merged in the index merger. I don't have any examples of this today (soon though), but they may also be built 'late' while writing out the column itself during merging, in the 'serializer' that is fed the row values to physically write to the segment file. That means those values could also be processed by code that builds, and then writes out, whatever kind of index you're trying to add.
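For intuition, a string dimension indexer conceptually does something like the following toy sketch while rows stream in: assign each distinct value a dictionary id and track which rows contain it. This is an invented simplification, not StringDimensionIndexer's actual structure:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ToyStringDimensionIndexer {
  private final Map<String, Integer> dictionary = new HashMap<>(); // value -> dictionary id
  private final List<String> idToValue = new ArrayList<>();        // dictionary id -> value
  private final List<BitSet> idToRows = new ArrayList<>();         // dictionary id -> row bitmap

  // Called once per ingested row; real string columns additionally sort the
  // dictionary at persist time so ids end up in lexicographic order.
  void processRow(int rowId, String value) {
    Integer id = dictionary.get(value);
    if (id == null) {
      id = idToValue.size();
      dictionary.put(value, id);
      idToValue.add(value);
      idToRows.add(new BitSet());
    }
    idToRows.get(id).set(rowId);
  }
}
```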

https://github.com/apache/druid/blob/c877d8a98119f2240c1335bb052a3d90e9649a86/processing/src/main/java/org/apache/druid/segment/DimensionMerger.java#L104 is the method that processes rows (https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/IndexMergerV9.java#L693), so the index could also be built up there and then written out in the 'writeTo' method of the serializer (https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/IndexMergerV9.java#L633), instead of merging prebuilt indexes with the dedicated index-merging methods, which are pretty specific to how Druid string columns are currently built (where it remaps dictionary ids and then combines them, etc.).
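The dictionary-id remapping mentioned above looks roughly like this toy sketch: each input segment has its own sorted dictionary, the merger builds a merged sorted dictionary, and each segment gets an old-id to new-id mapping that is then applied to its bitmaps and encoded rows (assuming, for simplicity, already-sorted and deduplicated input dictionaries):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class ToyDictionaryMerger {
  // For one input segment, compute the mapping oldId -> newId into the merged
  // dictionary; the same mapping is then applied to that segment's bitmaps
  // and encoded row values.
  static int[] remap(List<String> segmentDictionary, List<String> mergedDictionary) {
    int[] oldToNew = new int[segmentDictionary.size()];
    for (int oldId = 0; oldId < segmentDictionary.size(); oldId++) {
      oldToNew[oldId] = mergedDictionary.indexOf(segmentDictionary.get(oldId));
    }
    return oldToNew;
  }

  public static void main(String[] args) {
    List<String> a = List.of("apple", "cherry");
    List<String> b = List.of("banana", "cherry");
    TreeSet<String> mergedSet = new TreeSet<>(a);
    mergedSet.addAll(b);
    List<String> merged = new ArrayList<>(mergedSet);      // [apple, banana, cherry]
    System.out.println(Arrays.toString(remap(a, merged))); // [0, 2]
    System.out.println(Arrays.toString(remap(b, merged))); // [1, 2]
  }
}
```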

Some amount of consideration needs to go into how we add different types of indexes to existing column types. It might make sense to vary which dimension merger is provided by the dimension handler based on options in the spec, but there are a few ways this could be done.

SQL operators

So far we have covered 'native' Druid filters and the underlying indexes in the columns, but there is some additional work to wire things up in SQL. This is split into two parts: the operator binding, and a native Druid expression to handle cases where the operator is used in some composition with other functions and is unable to use the 'native' filter. The interface is https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/expression/SqlOperatorConversion.java and provides the methods to make expressions and to make native filters, where appropriate. The contract for filters is that if a native filter can be produced, it will be used; otherwise an expression filter is used as a fallback. I would recommend checking out things that implement 'toDruidFilter', such as https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/expression/builtin/LikeOperatorConversion.java, which extends https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/expression/DirectOperatorConversion.java, basically a pass-through to native Druid expressions. These are represented here as https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/expression/DruidExpression.java, which eventually becomes https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/math/expr/Expr.java, the native expression syntax tree type and where/how expressions are evaluated.
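A hedged sketch of that contract, with invented stand-in types rather than Druid's real SqlOperatorConversion/DimFilter classes, might look like this:

```java
// Stand-in types sketching the contract: produce a native filter when the
// operand is a direct column access, otherwise fall back to an expression
// filter over the equivalent native expression. None of this is Druid's API.
interface FilterSketch {}

final class NativeLikeFilterSketch implements FilterSketch {
  final String column;
  final String pattern;

  NativeLikeFilterSketch(String column, String pattern) {
    this.column = column;
    this.pattern = pattern;
  }
}

final class ExpressionFilterSketch implements FilterSketch {
  final String expression;

  ExpressionFilterSketch(String expression) {
    this.expression = expression;
  }
}

final class LikeConversionSketch {
  static FilterSketch toFilter(String operandExpression, boolean isDirectColumnAccess, String pattern) {
    if (isDirectColumnAccess) {
      // e.g. WHERE col LIKE 'foo%' binds straight to the native like filter
      return new NativeLikeFilterSketch(operandExpression, pattern);
    }
    // e.g. WHERE lower(col) LIKE 'foo%' cannot use the native filter, so the
    // planner wraps the native 'like' expression in an expression filter
    return new ExpressionFilterSketch("like(" + operandExpression + ", '" + pattern + "')");
  }
}
```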

To add the native Druid expression function, there are two ways to do it: built-in functions, https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/math/expr/Function.java -> https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/math/expr/FunctionalExpr.java#L160, and 'macro' functions, which are like extension points for the native expression system: https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/math/expr/ExprMacroTable.java. ExprMacros are a bit more flexible since they provide more control over how the Expr is produced and can, for example, dynamically change which Expr is generated based on the arguments, while built-in functions are always used through a FunctionExpr. The like expression uses the macro table system: https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/expression/LikeExprMacro.java
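Conceptually, the LIKE machinery (both the native filter and the expression) boils down to compiling the SQL LIKE pattern into a matcher. Here is a toy version of that translation (an illustration, not LikeExprMacro's actual code):

```java
import java.util.regex.Pattern;

final class ToyLikeMatcher {
  // Translate a SQL LIKE pattern into a regex: '%' matches any sequence,
  // '_' matches exactly one character, everything else matches literally.
  static Pattern compile(String likePattern) {
    StringBuilder regex = new StringBuilder();
    for (char c : likePattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString());
  }

  public static void main(String[] args) {
    Pattern p = compile("dru_d%");
    System.out.println(p.matcher("druid rocks").matches()); // true
    System.out.println(p.matcher("a druid").matches());     // false: pattern has no leading %
  }
}
```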
