Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce natural comparator for types that don't have a StringComparator #15145

Merged
merged 4 commits into from
Oct 16, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -311,8 +311,17 @@ private static void validateQuery(final GroupByQuery query)
}
}

/**
* Only allow ordering the queries from the MSQ engine, ignoring the comparator that is set in the query. This
* function checks if it is safe to do so, which is the case if the natural comparator is used for the dimension.
* Since MSQ executes the queries planned by the SQL layer, this is a sanity check as we always add the natural
* comparator for the dimensions there
*/
private static boolean isNaturalComparator(final ValueType type, final StringComparator comparator)
{
if (StringComparators.NATURAL.equals(comparator)) {
return true;
}
return ((type == ValueType.STRING && StringComparators.LEXICOGRAPHIC.equals(comparator))
|| (type.isNumeric() && StringComparators.NUMERIC.equals(comparator)))
&& !type.isArray();
Expand Down

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,6 @@
import org.mockito.ArgumentMatchers;
import org.mockito.Mockito;

import javax.annotation.Nonnull;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
Expand Down Expand Up @@ -2381,7 +2380,6 @@ public void testUnionAllUsingUnionDataSource()
.verifyResults();
}

@Nonnull
private List<Object[]> expectedMultiValueFooRowsGroup()
{
ArrayList<Object[]> expected = new ArrayList<>();
Expand All @@ -2400,7 +2398,6 @@ private List<Object[]> expectedMultiValueFooRowsGroup()
return expected;
}

@Nonnull
private List<Object[]> expectedMultiValueFooRowsGroupByList()
{
ArrayList<Object[]> expected = new ArrayList<>();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,8 @@ public StringComparatorModule()
new NamedType(StringComparators.AlphanumericComparator.class, StringComparators.ALPHANUMERIC_NAME),
new NamedType(StringComparators.StrlenComparator.class, StringComparators.STRLEN_NAME),
new NamedType(StringComparators.NumericComparator.class, StringComparators.NUMERIC_NAME),
new NamedType(StringComparators.VersionComparator.class, StringComparators.VERSION_NAME)
new NamedType(StringComparators.VersionComparator.class, StringComparators.VERSION_NAME),
new NamedType(StringComparators.NaturalComparator.class, StringComparators.NATURAL_NAME)
);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ public BoundDimFilter(
// will be used if the new 'ordering'
// property is missing. If both 'ordering' and 'alphaNumeric' are present,
// make sure they are consistent.
if (ordering == null) {
if (ordering == null || ordering.equals(StringComparators.NATURAL)) {
if (alphaNumeric == null || !alphaNumeric) {
this.ordering = StringComparators.LEXICOGRAPHIC;
} else {
Expand All @@ -100,7 +100,8 @@ public BoundDimFilter(
boolean orderingIsAlphanumeric = this.ordering.equals(StringComparators.ALPHANUMERIC);
Preconditions.checkState(
alphaNumeric == orderingIsAlphanumeric,
"mismatch between alphanumeric and ordering property");
"mismatch between alphanumeric and ordering property"
);
}
}
this.extractionFn = extractionFn;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -434,6 +434,9 @@ public static Grouper.BufferComparator makeNullHandlingBufferComparatorForNumeri

private static boolean isPrimitiveComparable(boolean pushLimitDown, @Nullable StringComparator stringComparator)
{
return !pushLimitDown || stringComparator == null || stringComparator.equals(StringComparators.NUMERIC);
return !pushLimitDown
|| stringComparator == null
|| stringComparator.equals(StringComparators.NUMERIC)
|| stringComparator.equals(StringComparators.NATURAL);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -1116,7 +1116,8 @@ private static int compareDimsInRowsWithAggs(
final StringComparator comparator = comparators.get(i);

final ColumnType fieldType = fieldTypes.get(i);
if (fieldType.isNumeric() && comparator.equals(StringComparators.NUMERIC)) {
if (fieldType.isNumeric()
&& (comparator.equals(StringComparators.NUMERIC) || comparator.equals(StringComparators.NATURAL))) {
// use natural comparison
if (fieldType.is(ValueType.DOUBLE)) {
// sometimes doubles can become floats making the round trip from serde, make sure to coerce them both
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,8 @@ public static StringComparator fromString(String type)
return StringComparators.NUMERIC;
case StringComparators.VERSION_NAME:
return StringComparators.VERSION;
case StringComparators.NATURAL_NAME:
return StringComparators.NATURAL;
default:
throw new IAE("Unknown string comparator[%s]", type);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,8 @@
import com.google.common.collect.Ordering;
import com.google.common.primitives.Ints;
import org.apache.druid.common.guava.GuavaUtils;
import org.apache.druid.error.DruidException;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.maven.artifact.versioning.DefaultArtifactVersion;

import java.math.BigDecimal;
Expand All @@ -34,25 +36,28 @@ public class StringComparators
public static final String NUMERIC_NAME = "numeric";
public static final String STRLEN_NAME = "strlen";
public static final String VERSION_NAME = "version";
public static final String NATURAL_NAME = "natural";

public static final StringComparator LEXICOGRAPHIC = new LexicographicComparator();
public static final StringComparator ALPHANUMERIC = new AlphanumericComparator();
public static final StringComparator NUMERIC = new NumericComparator();
public static final StringComparator STRLEN = new StrlenComparator();
public static final StringComparator VERSION = new VersionComparator();
public static final StringComparator NATURAL = new NaturalComparator();

public static final int LEXICOGRAPHIC_CACHE_ID = 0x01;
public static final int ALPHANUMERIC_CACHE_ID = 0x02;
public static final int NUMERIC_CACHE_ID = 0x03;
public static final int STRLEN_CACHE_ID = 0x04;
public static final int VERSION_CACHE_ID = 0x05;
public static final int NATURAL_CACHE_ID = 0x06;

/**
* Comparison using the natural comparator of {@link String}.
*
* Note that this is not equivalent to comparing UTF-8 byte arrays; see javadocs for
* {@link org.apache.druid.java.util.common.StringUtils#compareUnicode(String, String)} and
* {@link org.apache.druid.java.util.common.StringUtils#compareUtf8UsingJavaStringOrdering(byte[], byte[])}.
* {@link StringUtils#compareUnicode(String, String)} and
* {@link StringUtils#compareUtf8UsingJavaStringOrdering(byte[], byte[])}.
*/
public static class LexicographicComparator extends StringComparator
{
Expand Down Expand Up @@ -492,4 +497,51 @@ public byte[] getCacheKey()
return new byte[]{(byte) VERSION_CACHE_ID};
}
}

/**
* NaturalComparator refers to the natural ordering of the type that it refers.
*
* For example, if the type is Long, the natural ordering would be numeric
* if the type is an array, the natural ordering would be lexicographic comparison of the natural ordering of the
* elements in the arrays.
*
* It is a sigil value for the dimension that we can handle in the execution layer, and don't need the comparator for.
* It is also a placeholder for dimensions that we don't have a comparator for (like arrays), but is a required for
* planning
*/
public static class NaturalComparator extends StringComparator
{
@Override
public int compare(String o1, String o2)
{
throw DruidException.defensive("compare() should not be called for the NaturalComparator");
}

@Override
public String toString()
{
return StringComparators.NATURAL_NAME;
}

@Override
public boolean equals(Object o)
{
if (this == o) {
return true;
}
return o != null && getClass() == o.getClass();
}

@Override
public int hashCode()
{
return 0;
}

@Override
public byte[] getCacheKey()
{
return new byte[]{(byte) NATURAL_CACHE_ID};
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -197,9 +197,10 @@ public static BoundDimFilter lessThanOrEqualTo(final BoundRefKey boundRefKey, fi

public static BoundDimFilter interval(final BoundRefKey boundRefKey, final Interval interval)
{
if (!boundRefKey.getComparator().equals(StringComparators.NUMERIC)) {
// Interval comparison only works with NUMERIC comparator.
throw new ISE("Comparator must be NUMERIC but was[%s]", boundRefKey.getComparator());
if (!boundRefKey.getComparator().equals(StringComparators.NUMERIC)
&& !boundRefKey.getComparator().equals(StringComparators.NATURAL)) {
// Interval comparison only works with NUMERIC comparator or the NATURAL comparator
throw new ISE("Comparator must be NUMERIC or NATURAL but was[%s]", boundRefKey.getComparator());
}

return new BoundDimFilter(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,6 @@
import org.apache.calcite.util.TimestampString;
import org.apache.druid.java.util.common.DateTimes;
import org.apache.druid.java.util.common.IAE;
import org.apache.druid.java.util.common.ISE;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.math.expr.ExpressionProcessing;
import org.apache.druid.math.expr.ExpressionProcessingConfig;
Expand Down Expand Up @@ -208,20 +207,26 @@ public static boolean isLongType(SqlTypeName sqlTypeName)
SqlTypeName.INT_TYPES.contains(sqlTypeName);
}

/**
* Returns the natural StringComparator associated with the RelDataType
*/
public static StringComparator getStringComparatorForRelDataType(RelDataType dataType)
{
final ColumnType valueType = getColumnTypeForRelDataType(dataType);
return getStringComparatorForValueType(valueType);
}

/**
* Returns the natural StringComparator associated with the given ColumnType
*/
public static StringComparator getStringComparatorForValueType(ColumnType valueType)
{
if (valueType.isNumeric()) {
return StringComparators.NUMERIC;
} else if (valueType.is(ValueType.STRING)) {
return StringComparators.LEXICOGRAPHIC;
} else {
throw new ISE("Unrecognized valueType[%s]", valueType);
return StringComparators.NATURAL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, we might need to be careful with this, for the two things that use this, i know bound filters do not support array types, and pretty sure topN doesn't either. That said, bound filters are only used if druid.generic.useDefaultValueForNull=true which is no longer default, or if useBoundsAndSelectors context flag is set to true, so i'm not entirely sure how big of a deal that part is, but because of this filtering with arrays is really only well supported if sql compatible nulls are enabled, since we use equality and range filters which do support arrays instead of selectors and bounds which do not.

Copy link
Contributor Author

@LakshSingla LakshSingla Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I still think it makes sense that we plan the queries through the planner instead of failing them with the exception, and do this handling at the filter level.
NATURAL represents no type. If array ordering is not supported in the topN & the bound filter part, I guess it should be handled already (unless it bypasses it which should be a diff in this patch).
That's why I felt comfortable with this change - The places where we are already hitting this would leave users with an error message. And if we are actually using the comparator.compareTo(), we will get a different error message (which we can then refine at the site where it surfaces from)

The only place where this would change the logic would be where we are setting it to NATURAL and also bypassing it, effectively leaving the orderBy useless, which should be the changes in the diff (where we bypass). I'll revert some equality I did related to the bound filter so that this doesn't happen.

NATURAL for comparator refers to the comparator of the type, therefore the comparator shouldn't be used to gatekeep what types are allowed, there should be clear explicit type checks, I think. Also, as long as we don't bypass the natural comparator inadvertently, we won't be producing incorrect results, and throw back an error to the user. If we do see these cases, we can perhaps handle them at their call site then, instead of here.

}
}

Expand Down