Quantization involves converting fine-grained or continuous values, to discrete or coarse-grained values. A Quantizer is a function which takes fine-grained values as input, and maps those values to coarse-grained counterparts as its output, by discarding some precision.
Discrete values (e.g. Integer
, Long
, BigInteger
) are values which have only a finite number of possible values, or which have a fixed spacing between possible values. Continuous values (e.g. Float
, Double
, BigDecimal
) are values which do not have fixed spacing and which therefore can have an arbitrarily high precision.
Building indexes on continuous values can be challenging.
Consider a collection of Car
objects having Car.price
values 5000.00
, 5000.000001
, 5000.0000011
.
If an index was built on Car.price
using full precision - cars.addIndex(NavigableIndex.onAttribute(Car.PRICE))
- it would look like this:
5000.00 -> { Car{name=A, price=5000.00} }
5000.000001 -> { Car{name=B, price=5000.000001} }
5000.0000011 -> { Car{name=C, price=5000.0000011} }
This index would contain many entries, consuming a lot of memory, for only minor variations in Car.price
.
The same index using a quantizer - cars.addIndex(NavigableIndex.withQuantizerOnAttribute(DoubleQuantizer.withCompressionFactor(1), Car.Price))
- would look like this:
5000.0 -> { Car{name=A, price=5000.00}, Car{name=B, price=5000.000001}, Car{name=C, price=5000.0000011} }
Above, compression factor 1, for DoubleQuantizer
, means "truncate everything after the decimal point". Compression factor 5, would mean truncate everything after the decimal point AND group every 5 adjacent values in the index into the same entry.
CQEngine uses the following algorithm to retrieve objects matching a query from a quantized index:
- Apply the same quantization function as used to create the index, to values contained in the query
- Retrieve candidate sets of objects matching the quantized values from the index
- Apply lazily-evaluated filters to the result set, which when iterated or queried, causes the result set to filter out objects which do not match the original query thus restoring full precision for objects returned
* For equality-type queries such as
equal
orin
, apply the filter to all candidate sets matching the query * For range-type queries such asbetween
,lessThan
,greaterThan
as supported byNavigableIndex
, apply the filter to only the candidate sets at the ends of the range; objects in candidate sets in the middle of the range inherently will not require filtering and can be retrieved at full speed
As such quantization can greatly reduce the memory requirements of indexes, and often without significant CPU overhead.
CQEngine provides the following Quantizers:
IntegerQuantizer, LongQuantizer, BigIntegerQuantizer, FloatQuantizer, DoubleQuantizer, BigDecimalQuantizer
Additional Quantizers can be used by implementing the Quantizer interface.