Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing optional replaceMissingValueWith in lookup function and macros #14956

Merged

Conversation

pranavbhole
Copy link
Contributor

@pranavbhole pranavbhole commented Sep 9, 2023

Description

Adding the optional 3rd argument replaceMissingValueWith to lookup extractor function that replaces the missing value from given lookups. Default value for replaceMissingValueWith is null. It is already supported in native layer. Just exposing this value to populated in SQL layer as well. On same line, updated the lookup macros to support replaceMissingValueWith.

Release note

Lookup function also accepts the 3rd argument called '$replaceMissingValueWith' as constant string, if you value is missing given lookups for queried key then lookup function return result value from replaceMissingValueWith eg:

LOOKUP(store, 'store_to_country', 'NA')

If value is missing from store_to_country lookup for given key 'store' then it will return 'NA'.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -62,6 +62,12 @@ SELECT
FROM sales
GROUP BY 1
```
Lookup function also accepts the 3rd argument called `$replaceMissingValueWith` as constant string, if you value is missing given lookups for queried key then lookup function return result value from `replaceMissingValueWith`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. The lookup function also accepts the 3rd argument called $replaceMissingValueWith as a constant string. If your value is missing a lookup for the queried key, the lookup function returns the result value from replaceMissingValueWith

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if (args.size() > 2) {
final Expr missingValExpr = args.get(2);
if (missingValExpr.isLiteral()) {
return missingValExpr.getLiteralValue().toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q. when used in sqlCompatibleMode if there is a value like empty string should it be converted to null ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, regardless of the sqlCompatibleMode, replaceMissingValueWith is literal value passed by user and feature expect to replace it as it is.

@pranavbhole pranavbhole force-pushed the replaceMissingValueWithLookup branch 5 times, most recently from 72624ed to 0eb390c Compare September 11, 2023 18:56
@@ -54,10 +54,11 @@ public String name()
@Override
public Expr apply(final List<Expr> args)
{
validationHelperCheckArgumentCount(args, 2);
validationHelperCheckMinArgumentCount(args, 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use maybe validationHelperCheckArgumentRange since it requires 2 or 3 arguments exactly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return missingValExpr.getLiteralValue().toString();
}
}
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you are not going to handle a dynamic argument for the 'missing value' parameter, this should be an exception if the argument is not a literal. There is validationHelperCheckArgIsLiteral that can be called if we only want to support this argument being a constant.

Silently returning null seems incorrect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added validationHelperCheckArgIsLiteral check

Comment on lines +65 to +70
The lookup function also accepts the 3rd argument called `replaceMissingValueWith` as a constant string. If your value is missing a lookup for the queried key, the lookup function returns the result value from `replaceMissingValueWith`
For example:
```
LOOKUP(store, 'store_to_country', 'NA')
```
If value is missing from `store_to_country` lookup for given key 'store' then it will return `NA`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably indicate that this optional 3rd argument must be a constant.

math-expr.md, sql-functions.md, and sql-scalar.md should also be updated since they all document this function signature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@pranavbhole pranavbhole force-pushed the replaceMissingValueWithLookup branch 3 times, most recently from c5ab145 to d112231 Compare September 12, 2023 06:43
Copy link
Contributor

@soumyava soumyava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Test
public void testTooFewArgs()
{
expectException(IllegalArgumentException.class, "Function[lookup] requires 2 to 3 arguments");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suppose if the range is 1 we could change this messaging to be like 2 or 3 but we don't have to worry about that in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Range 1, will also throw the same message.

if (args.size() > 2) {
final Expr missingValExpr = args.get(2);
validationHelperCheckArgIsLiteral(missingValExpr, "third argument");
return String.valueOf(missingValExpr.getLiteralValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, sorry I didn't notice this earlier, I think this should use Evals.asString so that if the value is null it doesn't become "null" as in https://github.com/apache/druid/pull/14956/files#diff-989c5e92f71c63c48629a62588a78bec198184ff896791058509176dffe91338R54

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, retained the original expression here. also checking literal in macro and added test to make sure that null literalValue is passed correctly same as native layer.

Comment on lines 114 to 117
if (missingValExpr.isLiteral()) {
return missingValExpr.getLiteralValue().toString();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should maybe have a similar validation to the native layer to make sure that the argument doesn't silently become null in the event that the expression is null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test in CalciteQueryTest to make sure that null is wired up correctly to native extraction fn.
modified the returned value to(String) missingValExpr.getLiteralValue()

if (inputExpressions.size() > 2) {
final Expr missingValExpr = plannerContext.parseExpression(inputExpressions.get(2).getExpression());
if (missingValExpr.isLiteral()) {
return missingValExpr.getLiteralValue().toString();
Copy link
Member

@clintropolis clintropolis Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this argument be null? (add a test with the argument as NULL if it parses correctly, ignore if not)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it can be null, newly added test validates the correct behavior.
Thank you

SqlTypeFamily.CHARACTER,
SqlTypeFamily.CHARACTER
)
.requiredOperandCount(2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the lookup expr only supports a literal "missing value replacement", this should include .literalOperands(2) (meaning operand 2 must be a literal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -98,7 +98,7 @@ String functions accept strings, and return a type appropriate to the function.
|`CHAR_LENGTH(expr)`|Alias for `LENGTH`.|
|`CHARACTER_LENGTH(expr)`|Alias for `LENGTH`.|
|`STRLEN(expr)`|Alias for `LENGTH`.|
|`LOOKUP(expr, lookupName)`|Look up `expr` in a registered [query-time lookup table](lookups.md). Note that lookups can also be queried directly using the [`lookup` schema](sql.md#from).|
|`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a registered [query-time lookup table](lookups.md). Note that lookups can also be queried directly using the [`lookup` schema](sql.md#from).|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the text to say what replaceMissingValueWith does, and to point out that it must be a constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

FN_NAME,
arg.stringify(),
lookupExpr.stringify(),
replaceMissingValueWith
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle escaping properly. replaceMissingValueWith might contain ' itself, or newlines, etc.

The simplest way to fix this is to keep the expr around and stringify() it here, like we do with lookupExpr (which is also required to be a literal).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@pranavbhole pranavbhole force-pushed the replaceMissingValueWithLookup branch from d112231 to 3fbd640 Compare October 1, 2023 05:03
@pranavbhole pranavbhole force-pushed the replaceMissingValueWithLookup branch from 3fbd640 to bd0cac9 Compare October 1, 2023 05:49
@@ -82,7 +82,7 @@ The following built-in functions are available.
|concat|concat(expr, expr...) concatenate a list of strings|
|format|format(pattern[, args...]) returns a string formatted in the manner of Java's [String.format](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#format-java.lang.String-java.lang.Object...-).|
|like|like(expr, pattern[, escape]) is equivalent to SQL `expr LIKE pattern`|
|lookup|lookup(expr, lookup-name) looks up expr in a registered [query-time lookup](../querying/lookups.md)|
|lookup|lookup(expr, lookup-name[,replaceMissingValueWith]) looks up expr in a registered [query-time lookup](../querying/lookups.md)|
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should probably specify replaceMissingValueWith must be a constant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

docs/querying/sql-functions.md Outdated Show resolved Hide resolved
docs/querying/sql-scalar.md Show resolved Hide resolved
@@ -69,7 +70,9 @@ public Expr apply(final List<Expr> args)
lookupExtractorFactoryContainerProvider,
lookupName,
false,
null,
replaceMissingValueWith != null && replaceMissingValueWith.isLiteral()
? (String) replaceMissingValueWith.getLiteralValue()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose since other types can also be literals it might be safer to use Evals.asString instead of casting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if (inputExpressions.size() > 2) {
final Expr missingValExpr = plannerContext.parseExpression(inputExpressions.get(2).getExpression());
if (missingValExpr.isLiteral()) {
return (String) missingValExpr.getLiteralValue();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otoh, this one is probably fine as is and doesn't need to use Evals.asString like the native layer should since the operand type checker specifies the 3rd argument as character type, though it would also probably be harmless

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

.build()
),
ImmutableList.of(
new Object[]{"Missing_Value", NullHandling.sqlCompatible() ? null : "", 5L},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can use NullHandling.defaultStringValue()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@clintropolis clintropolis merged commit f1edd67 into apache:master Oct 3, 2023
74 checks passed
@pranavbhole pranavbhole deleted the replaceMissingValueWithLookup branch October 3, 2023 01:20
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
ektravel pushed a commit to ektravel/druid that referenced this pull request Oct 16, 2023
…os (apache#14956)

* Exposing optional replaceMissingValueWith in lookup function and macros

* args range validation

* Updating docs

* Addressing comments

* Update docs/querying/sql-scalar.md

Co-authored-by: Clint Wylie <[email protected]>

* Update docs/querying/sql-functions.md

Co-authored-by: Clint Wylie <[email protected]>

* Addressing comments

---------

Co-authored-by: Clint Wylie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants