Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce natural comparator for types that don't have a StringComparator #15145

Merged
merged 4 commits into from
Oct 16, 2023

Conversation

LakshSingla
Copy link
Contributor

@LakshSingla LakshSingla commented Oct 12, 2023

Description

Planning queries with ordering on arrays throws an error, since there isn't a StringComparator associated with it. StringComparator is an old concept, when dimensions were strings only, and a string dimension could have its ordering defined based on what the string represented. For example, for numeric strings, it could be set to NUMERIC.

However, in the native engine, the comparators are bypassed altogether if the type of the dimension is the natural comparator for that dimension. In the case of numeric dimensions, the NUMERIC comparator is the natural comparator. SQL always sets the comparator such that it gets bypassed in the native layer, therefore comparators can used only by used via native queries.

With MSQ, we can now have order by's with scan queries on arrays. To represent a natural ordering on a type for which we don't have a defined ordering, we add a NATURAL comparator, which is similar to setting the NUMERIC comparator on the numeric types, i.e. it should get bypassed and the natural ordering should be used. Alternatively is a dummy value for representing StringComparator = null and its a no-op.

Fixes a bug when executing queries like

SELECT int_array FROM foo ORDER BY int_array`

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Oct 12, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
public static StringComparator getStringComparatorForValueType(ColumnType valueType)
{
if (valueType.isNumeric()) {
return StringComparators.NUMERIC;
} else if (valueType.is(ValueType.STRING)) {
return StringComparators.LEXICOGRAPHIC;
} else {
throw new ISE("Unrecognized valueType[%s]", valueType);
return StringComparators.NATURAL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, we might need to be careful with this, for the two things that use this, i know bound filters do not support array types, and pretty sure topN doesn't either. That said, bound filters are only used if druid.generic.useDefaultValueForNull=true which is no longer default, or if useBoundsAndSelectors context flag is set to true, so i'm not entirely sure how big of a deal that part is, but because of this filtering with arrays is really only well supported if sql compatible nulls are enabled, since we use equality and range filters which do support arrays instead of selectors and bounds which do not.

Copy link
Contributor Author

@LakshSingla LakshSingla Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I still think it makes sense that we plan the queries through the planner instead of failing them with the exception, and do this handling at the filter level.
NATURAL represents no type. If array ordering is not supported in the topN & the bound filter part, I guess it should be handled already (unless it bypasses it which should be a diff in this patch).
That's why I felt comfortable with this change - The places where we are already hitting this would leave users with an error message. And if we are actually using the comparator.compareTo(), we will get a different error message (which we can then refine at the site where it surfaces from)

The only place where this would change the logic would be where we are setting it to NATURAL and also bypassing it, effectively leaving the orderBy useless, which should be the changes in the diff (where we bypass). I'll revert some equality I did related to the bound filter so that this doesn't happen.

NATURAL for comparator refers to the comparator of the type, therefore the comparator shouldn't be used to gatekeep what types are allowed, there should be clear explicit type checks, I think. Also, as long as we don't bypass the natural comparator inadvertently, we won't be producing incorrect results, and throw back an error to the user. If we do see these cases, we can perhaps handle them at their call site then, instead of here.

@LakshSingla
Copy link
Contributor Author

Thanks for the review @clintropolis

@LakshSingla LakshSingla merged commit dc8d219 into apache:master Oct 16, 2023
81 checks passed
LakshSingla added a commit to LakshSingla/druid that referenced this pull request Oct 16, 2023
…ator (apache#15145)

Fixes a bug when executing queries with the ordering of arrays
@LakshSingla LakshSingla deleted the natural-comparator branch October 16, 2023 05:53
cryptoe pushed a commit that referenced this pull request Oct 16, 2023
…ator (#15145) (#15166)

Fixes a bug when executing queries with the ordering of arrays
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
…ator (apache#15145)

Fixes a bug when executing queries with the ordering of arrays
CaseyPan pushed a commit to CaseyPan/druid that referenced this pull request Nov 17, 2023
…ator (apache#15145)

Fixes a bug when executing queries with the ordering of arrays
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants