Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add implicit casting to TypeSignature::String #13404

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jonathanc-n
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

#13402 #13394 were running into some problems with implicit casting due to TypeSignature::String not dealing with implicit casting properly

What changes are included in this PR?

Added implicit casting to TypeSignature::String

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) functions labels Nov 14, 2024
@Omega359
Copy link
Contributor

Please remove the changes to regexp_match as that change adds in support for utf8view which breaks that function (see ci output). It's a much larger issue than those changes. See #11911 (comment) for some details.

@jonathanc-n
Copy link
Contributor Author

Will fix this tommorow.

@jonathanc-n jonathanc-n marked this pull request as draft November 14, 2024 03:33
@jonathanc-n jonathanc-n marked this pull request as ready for review November 14, 2024 19:08
NULL

query T
SELECT ltrim(12345, '1')
Copy link
Contributor

@jayzhan211 jayzhan211 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DuckDB doesn't allow this query 🤔
Numeric string casting is quite confusing to me

D SELECT ltrim(12345, '1');
Binder Error: No function matches the given name and argument types 'ltrim(INTEGER_LITERAL, STRING_LITERAL)'. You might need to add explicit type casts.
	Candidate functions:
	ltrim(VARCHAR) -> VARCHAR
	ltrim(VARCHAR, VARCHAR) -> VARCHAR

LINE 1: SELECT ltrim(12345, '1');

Same with postgres

postgres=# SELECT ltrim(12345, '1');
ERROR:  function ltrim(integer, unknown) does not exist
LINE 1: SELECT ltrim(12345, '1');
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

@jayzhan211 jayzhan211 marked this pull request as draft November 19, 2024 06:02
@jayzhan211
Copy link
Contributor

Mark this as draft as it is not ready for review

@jonathanc-n
Copy link
Contributor Author

@alamb @jayzhan211 I'm a bit confused with this pr, are we looking to support implicit casting with TypeSignature or not? As can be seen above, this behaviour seems to have been deprecated in DuckDB.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Nov 26, 2024

this behaviour seems to have been deprecated in DuckDB.

I think we still prefer the actual result in Postgres and DuckDB

@jonathanc-n
Copy link
Contributor Author

@jayzhan211 So should this PR be closed? Since there is no need for implicit casting in TypeSignature::String? I just worked on this because of your comment on implicit casting in #13301

@jayzhan211
Copy link
Contributor

@jayzhan211 So should this PR be closed? Since there is no need for implicit casting in TypeSignature::String? I just worked on this because of your comment on implicit casting in #13301

Sure.

@jayzhan211 jayzhan211 closed this Nov 26, 2024
@Omega359
Copy link
Contributor

I don't agree with the decision to not support implicit coercion to string for function arguments.

Firstly, it's a regression from currently supported features and will break things for users, myself included.

Secondly Postgresql's requirement for explicit casting for so many things imho is one of it's most annoying 'features' and is generally not required by just about every other database out there. The reasoning given for the DuckDB decision was nothing to do with functions and everything to do with casting issues with general sql and comparisons which I tend to agree with. See the PR for details.

select length(42:VARCHAR);

should not be required.

select length(42);

should just work.

@alamb
Copy link
Contributor

alamb commented Nov 26, 2024

Firstly, it's a regression from currently supported features and will break things for users, myself included.

For this reason alone I agree , and it would definitely fall into the category of implicit semantic changes causing troubles described on

I don't fully understand the current state -- is there a regression between 43 and 44 (not yet released)? If so, is it tracked with a ticket?

@Omega359
Copy link
Contributor

Omega359 commented Nov 26, 2024

The regression I think was in 43 but I am not 100% sure. I just built 42 off the tag and ran it in docker:

root@f93d03a93bc1:/apache_datafusion# docker run --rm -it datafusion-cli_42
DataFusion CLI v42.2.0
> select ascii(42);
+------------------+
| ascii(Int64(42)) |
+------------------+
| 52               |
+------------------+
1 row(s) fetched. 
Elapsed 0.001 seconds.

The slt test I just added in main:

External error: query failed: DataFusion error: Error during planning: Error during planning: The signature expected NativeType::String but received NativeType::Int64 No function matches the given name and argument types 'ascii(Int64)'. You might need to add explicit type casts.
        Candidate functions:
        ascii(String(1))
[SQL] SELECT ascii(2)
at test_files/expr.slt:322

I suspect any function switched in #12751 will no longer work with any type coercion.

Edit: just verified the above sql works with DF 41 as well.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Nov 29, 2024

@Omega359
How about we make this configurable?
Enable implicit coercion if we want the ease of use and the casting cost is acceptable, disable it if we prefer explicit casting without any surprise? By default, we disable implicit coercion to be consistent with Postgres and DuckDB

@jonathanc-n
Copy link
Contributor Author

@jayzhan211 That sounds good. However, I think implicit coercion should be the default or it'll cause regressions for users. Are you able to open this pr back up, i can add the changes.

@jayzhan211 jayzhan211 reopened this Nov 29, 2024
@jayzhan211
Copy link
Contributor

jayzhan211 commented Nov 29, 2024

An alternative approach is that we need to differentiate string literal and varchar like Postgres an DuckDB. Only untyped string literal is able to cast to any other types, so we have the implicit casting for query like select ascii(42); but not select ascii(42::INT)

@jonathanc-n Your solution in this PR might not return error for select ascii(42::INT), you might need to find a way to take string literal as an unknown or untyped.

@Omega359
Copy link
Contributor

@Omega359 How about we make this configurable? Enable implicit coercion if we want the ease of use and the casting cost is acceptable, disable it if we prefer explicit casting without any surprise? By default, we disable implicit coercion to be consistent with Postgres and DuckDB

I am fine with it being configurable but I would suggest that it would be enabled by default. Ideally the behaviour would be tied into the sql dialect used but that I think is something for the future.

@Omega359
Copy link
Contributor

An alternative approach is that we need to differentiate string literal and varchar like Postgres an DuckDB. Only untyped string literal is able to cast to any other types, so we have the implicit casting for query like select ascii(42); but not select ascii(42::INT)

That to me makes perfect sense.

@Omega359
Copy link
Contributor

Omega359 commented Dec 4, 2024

Here is another implicit casting behaviour that changed from 40 -> 43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants