SQL Feature extention for SSB #940

DerSchmidt · 2025-01-14T08:55:15Z

This PR adds String Support to the SQL Parser and makes it possible to use BETWEEN. Both features are commonly needed for benchmarks like SSB

It is now possible to use Strings in the SQL Parser, for this the LEXER and Parser needed to be adjusted to reflect Strings. Furthermore, the Visitor got extended by a String builder. All the other support gets provided by the DAPHNE infrastructure. Strings as per SQL standard are using single quotes, given that most SQL database servers also support double-quoted string literals, we opted to support them too. Because the sqlOp has a String as an input, we have to escape these double-quotes.
-- result = sql("SELECT ... FROM ... WHERE ... AND p.P_CATEGORY = 'MFGR#12' ... ;" );
-- result = sql("SELECT ... FROM ... WHERE ... AND p.P_CATEGORY = \"MFGR#12\" ... ;" );
Furthermore, we now support BETWEEN. This is a simple implementation to simply enable BETWEEN and not be performant. The Between operation generates four comparisons operations, two and operations and one or operation. It checks both a <= x <= b and b <= x <= a.

During the implementation of between, it became clear that, the SQL parser is vulnerable to the Issue #203. This PR doesn't resolve this issue but tries to provide some renewed attention to this issue.

The SSB Benchmark uses strings for comparisions, which this implementation supports, by useing the same infrastructure the comparisions used before. This does not include type checking, such that this general implementation might make it possible for unexpected errors. For instance the SQL Grammar allows a multiplication between a String and an Integer. For handeling of such an Error I trust the DAPHNE infrastructure.

The current implementation is really inefficient because it evaluates first a <= x <= b followed by a >= x >= b. There has to be a better solution for this kind of problem. However for the first step, makeing the ssb run in DAPHNE, this solution is acceptable

The Problem is in a query conatining ... x Between a AND b AND y < 5; The First AND binds to hard makeing it x > (a AND b) AND x < (y < 5) even tho we want x > a AND x < b AND y < 5. This splits the general Expr up, and makes some generalized behaviour impossible but fixes this particular Bug. I will look into a different Solution for this problem

Test for String and Between support where added. The test writing exposed an already an existing issue for the current implementation, which lead to a quick fix. Issue #203 is a persistent issue for the sql parser, for instance in the where clause it is legal to: f.a < 1 however it is illegal to 1 > f.a, because the parser can't diferentiate between the two. This had an implication for between, since in the code gen the objects where switched and the operator state the same. This got now rectified, such that LessEqual and GreaterEqual Ops get used instead of just LessEqual.

pdamme

Thanks for this contribution, @DerSchmidt! Overall, the code looks good to me. However, some points need to be addressed:

Necessary changes: (need to be addressed before the merge)

I don't fully understand why you created a separate generalExpr2 in the SQL grammar. At a quick glance, it seems to me like the only effect is to avoid nested use of BETWEEN. Though nested BETWEEN doesn't seem to be very useful, I don't see why we should prohibit expressions like a BETWEEN 0 AND (3 * CAST((2 BETWEEN 1 AND 3) AS INT)). At the same time, the parser code seems to get more complicated through the additional generalExpr2 (e.g., duplication of parantheses expression for generalExpr and generalExpr2). Can't we treat BETWEEN like any other boolean operation? I would just like to clarify that point, if there is a good reason, we can also leave it as it is.
The SQL grammer now supports string literals and they can contain escaped special characters. However, the SQL parser (visitiLiteral()) does not replace these escaped characters. For instance, "ab\"cd" would remain ab\"cd, but should become ab"cd. Please either add the unescaping to the SQL parser (have a look at how the the DaphneDSL parser does it) or remove the escapes from the SQL grammar.
Test cases, *.daphne files: Please update the comment line at the top of each file to describe what the test case does.

Optional changes: (could be addressed now or after the merge)

All your test cases for strings in SQL contain only strings with exactly one character. To be on the safe side, it would be a good idea to also involve some other cases, such as multiple characters, empty strings, strings containing escaped special characters.

Besides that, thanks for pointing out #203 in this context. I saw in your code that you circumvented this problem by creating the elementwise comparison operations with a suitable operand order. #203 is referenced by a commit that fixes the problem for some binary operators. I think similar fixes could be implemented for >= and <=, e.g., X >= c to not (c < X). Feel free to give it a try if you like (but not required for this PR).

…ests for SQL Strings

DerSchmidt · 2025-01-16T08:35:59Z

Thank you @pdamme,

Concerning 1:
I needed to add generalExpr2 because otherwise the AND in BETWEEN wouldn't bind strongly enough.
So, for instance: ... f.a BETWEEN 1 AND 3 AND f.c < 5 where the second AND is just another filter. ANTLR parsed this, no matter the order in of the BETWEEN and AND rules, as: f.a BETWEEN (1 AND 3) AND f.c < 5 which results obviously in a wrong result (The wanted parsing would be: (f.a BETWEEN 1 AND 3) AND f.c < 5 ). The introduction of generalExpr2 fixes this issue, since we don't have any Boolean Operators inside generalExpr2.
Fixing nested BETWEEN's is simply a happy accident.
There might be a better solution for this problem, but this was the least invasive step I could think of.

Concerning 2:
I applied the code from the DSL Parser onto the SQL Parser. See f86b9e2

Concerning 3 and the Optional Changes:
I updated the Comments on top of the tests to correctly reflect what is tested.
For completeness’s sake, two test cases were added for multi character strings. See 2635ae1

I am now aware of #203, and as soon as I am free I might address it, but as you said, it is outside the scope of this PR.

DerSchmidt · 2025-01-28T06:12:17Z

@pdamme can you maybe have a quick look again?
Thank you very much!

- Removed superfluous parentheses in FLOAT_LITERAL. - Edited comment lines in test scripts.

pdamme

Thanks for addressing all requested required and optional points, @DerSchmidt. This PR is ready to be merged now.

DerSchmidt added 9 commits January 7, 2025 10:30

Added missing test to Execution

23feb84

hopefully resolving clang-format check

046bdbf

hopefully resolving clang-format check 2

495cf15

final resolve of clang-format check

78a11bc

final resolve of clang-format check

3976f30

pdamme self-requested a review January 15, 2025 23:27

pdamme requested changes Jan 15, 2025

View reviewed changes

DerSchmidt added 2 commits January 16, 2025 08:59

Added Correct Escaping of Escape Sequences in the SQL Parser

f86b9e2

Updated Comments on top of the Test cases and added multi Character T…

2635ae1

…ests for SQL Strings

DerSchmidt requested a review from pdamme January 16, 2025 09:32

Fixed an Issue with ' not being correctly escaped.

aba5206

Finalization.

fa39aa3

- Removed superfluous parentheses in FLOAT_LITERAL. - Edited comment lines in test scripts.

pdamme approved these changes Jan 28, 2025

View reviewed changes

pdamme merged commit e6fd3e7 into main Jan 28, 2025
3 checks passed

DerSchmidt deleted the ssb_sql_extension branch January 28, 2025 14:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL Feature extention for SSB #940

SQL Feature extention for SSB #940

DerSchmidt commented Jan 14, 2025

pdamme left a comment

DerSchmidt commented Jan 16, 2025 •

edited

Loading

DerSchmidt commented Jan 28, 2025

pdamme left a comment

SQL Feature extention for SSB #940

SQL Feature extention for SSB #940

Conversation

DerSchmidt commented Jan 14, 2025

pdamme left a comment

Choose a reason for hiding this comment

DerSchmidt commented Jan 16, 2025 • edited Loading

DerSchmidt commented Jan 28, 2025

pdamme left a comment

Choose a reason for hiding this comment

DerSchmidt commented Jan 16, 2025 •

edited

Loading