Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Feature extention for SSB #940

Merged
merged 13 commits into from
Jan 28, 2025
Merged

SQL Feature extention for SSB #940

merged 13 commits into from
Jan 28, 2025

Conversation

DerSchmidt
Copy link
Collaborator

This PR adds String Support to the SQL Parser and makes it possible to use BETWEEN. Both features are commonly needed for benchmarks like SSB

  • It is now possible to use Strings in the SQL Parser, for this the LEXER and Parser needed to be adjusted to reflect Strings. Furthermore, the Visitor got extended by a String builder. All the other support gets provided by the DAPHNE infrastructure. Strings as per SQL standard are using single quotes, given that most SQL database servers also support double-quoted string literals, we opted to support them too. Because the sqlOp has a String as an input, we have to escape these double-quotes.
    -- result = sql("SELECT ... FROM ... WHERE ... AND p.P_CATEGORY = 'MFGR#12' ... ;" );
    -- result = sql("SELECT ... FROM ... WHERE ... AND p.P_CATEGORY = \"MFGR#12\" ... ;" );

  • Furthermore, we now support BETWEEN. This is a simple implementation to simply enable BETWEEN and not be performant. The Between operation generates four comparisons operations, two and operations and one or operation. It checks both a <= x <= b and b <= x <= a.

During the implementation of between, it became clear that, the SQL parser is vulnerable to the Issue #203. This PR doesn't resolve this issue but tries to provide some renewed attention to this issue.

The SSB Benchmark uses strings for comparisions, which
this implementation supports, by useing the same
infrastructure the comparisions used before.
This does not include type checking, such that this
general implementation might make it possible for
unexpected errors. For instance the SQL Grammar allows
a multiplication between a String and an Integer.
For handeling of such an Error I trust the DAPHNE
infrastructure.
The current implementation is really inefficient because
it evaluates first a <= x <= b followed by a >= x >= b.
There has to be a better solution for this kind of problem.
However for the first step, makeing the ssb run in DAPHNE,
this solution is acceptable
The Problem is in a query conatining ... x Between a AND b AND y < 5;
The First AND binds to hard makeing it x > (a AND b) AND x < (y < 5)
even tho we want x > a AND x < b AND y < 5.
This splits the general Expr up, and makes some generalized behaviour
impossible but fixes this particular Bug. I will look into a different
Solution for this problem
Test for String and Between support where added.
The test writing exposed an already an existing issue for the current implementation,
which lead to a quick fix.
Issue #203 is a persistent issue for the sql parser,
for instance in the where clause it is legal to: f.a < 1 however it is illegal to 1 > f.a,
because the parser can't diferentiate between the two. This had an implication for between,
since in the code gen the objects where switched and the operator state the same. This got
now rectified, such that LessEqual and GreaterEqual Ops get used instead of just LessEqual.
@pdamme pdamme self-requested a review January 15, 2025 23:27
Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution, @DerSchmidt! Overall, the code looks good to me. However, some points need to be addressed:

Necessary changes: (need to be addressed before the merge)

  1. I don't fully understand why you created a separate generalExpr2 in the SQL grammar. At a quick glance, it seems to me like the only effect is to avoid nested use of BETWEEN. Though nested BETWEEN doesn't seem to be very useful, I don't see why we should prohibit expressions like a BETWEEN 0 AND (3 * CAST((2 BETWEEN 1 AND 3) AS INT)). At the same time, the parser code seems to get more complicated through the additional generalExpr2 (e.g., duplication of parantheses expression for generalExpr and generalExpr2). Can't we treat BETWEEN like any other boolean operation? I would just like to clarify that point, if there is a good reason, we can also leave it as it is.
  2. The SQL grammer now supports string literals and they can contain escaped special characters. However, the SQL parser (visitiLiteral()) does not replace these escaped characters. For instance, "ab\"cd" would remain ab\"cd, but should become ab"cd. Please either add the unescaping to the SQL parser (have a look at how the the DaphneDSL parser does it) or remove the escapes from the SQL grammar.
  3. Test cases, *.daphne files: Please update the comment line at the top of each file to describe what the test case does.

Optional changes: (could be addressed now or after the merge)

  1. All your test cases for strings in SQL contain only strings with exactly one character. To be on the safe side, it would be a good idea to also involve some other cases, such as multiple characters, empty strings, strings containing escaped special characters.

Besides that, thanks for pointing out #203 in this context. I saw in your code that you circumvented this problem by creating the elementwise comparison operations with a suitable operand order. #203 is referenced by a commit that fixes the problem for some binary operators. I think similar fixes could be implemented for >= and <=, e.g., X >= c to not (c < X). Feel free to give it a try if you like (but not required for this PR).

@DerSchmidt
Copy link
Collaborator Author

DerSchmidt commented Jan 16, 2025

Thank you @pdamme,

Concerning 1:
I needed to add generalExpr2 because otherwise the AND in BETWEEN wouldn't bind strongly enough.
So, for instance: ... f.a BETWEEN 1 AND 3 AND f.c < 5 where the second AND is just another filter. ANTLR parsed this, no matter the order in of the BETWEEN and AND rules, as: f.a BETWEEN (1 AND 3) AND f.c < 5 which results obviously in a wrong result (The wanted parsing would be: (f.a BETWEEN 1 AND 3) AND f.c < 5 ). The introduction of generalExpr2 fixes this issue, since we don't have any Boolean Operators inside generalExpr2.
Fixing nested BETWEEN's is simply a happy accident.
There might be a better solution for this problem, but this was the least invasive step I could think of.

Concerning 2:
I applied the code from the DSL Parser onto the SQL Parser. See f86b9e2

Concerning 3 and the Optional Changes:
I updated the Comments on top of the tests to correctly reflect what is tested.
For completeness’s sake, two test cases were added for multi character strings. See 2635ae1

I am now aware of #203, and as soon as I am free I might address it, but as you said, it is outside the scope of this PR.

@DerSchmidt DerSchmidt requested a review from pdamme January 16, 2025 09:32
@DerSchmidt
Copy link
Collaborator Author

@pdamme can you maybe have a quick look again?
Thank you very much!

- Removed superfluous parentheses in FLOAT_LITERAL.
- Edited comment lines in test scripts.
Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all requested required and optional points, @DerSchmidt. This PR is ready to be merged now.

@pdamme pdamme merged commit e6fd3e7 into main Jan 28, 2025
3 checks passed
@DerSchmidt DerSchmidt deleted the ssb_sql_extension branch January 28, 2025 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants