-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL Feature extention for SSB #940
Conversation
The SSB Benchmark uses strings for comparisions, which this implementation supports, by useing the same infrastructure the comparisions used before. This does not include type checking, such that this general implementation might make it possible for unexpected errors. For instance the SQL Grammar allows a multiplication between a String and an Integer. For handeling of such an Error I trust the DAPHNE infrastructure.
The current implementation is really inefficient because it evaluates first a <= x <= b followed by a >= x >= b. There has to be a better solution for this kind of problem. However for the first step, makeing the ssb run in DAPHNE, this solution is acceptable
The Problem is in a query conatining ... x Between a AND b AND y < 5; The First AND binds to hard makeing it x > (a AND b) AND x < (y < 5) even tho we want x > a AND x < b AND y < 5. This splits the general Expr up, and makes some generalized behaviour impossible but fixes this particular Bug. I will look into a different Solution for this problem
Test for String and Between support where added. The test writing exposed an already an existing issue for the current implementation, which lead to a quick fix. Issue #203 is a persistent issue for the sql parser, for instance in the where clause it is legal to: f.a < 1 however it is illegal to 1 > f.a, because the parser can't diferentiate between the two. This had an implication for between, since in the code gen the objects where switched and the operator state the same. This got now rectified, such that LessEqual and GreaterEqual Ops get used instead of just LessEqual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution, @DerSchmidt! Overall, the code looks good to me. However, some points need to be addressed:
Necessary changes: (need to be addressed before the merge)
- I don't fully understand why you created a separate
generalExpr2
in the SQL grammar. At a quick glance, it seems to me like the only effect is to avoid nested use ofBETWEEN
. Though nestedBETWEEN
doesn't seem to be very useful, I don't see why we should prohibit expressions likea BETWEEN 0 AND (3 * CAST((2 BETWEEN 1 AND 3) AS INT))
. At the same time, the parser code seems to get more complicated through the additionalgeneralExpr2
(e.g., duplication of parantheses expression forgeneralExpr
andgeneralExpr2
). Can't we treatBETWEEN
like any other boolean operation? I would just like to clarify that point, if there is a good reason, we can also leave it as it is. - The SQL grammer now supports string literals and they can contain escaped special characters. However, the SQL parser (
visitiLiteral()
) does not replace these escaped characters. For instance,"ab\"cd"
would remainab\"cd
, but should becomeab"cd
. Please either add the unescaping to the SQL parser (have a look at how the the DaphneDSL parser does it) or remove the escapes from the SQL grammar. - Test cases,
*.daphne
files: Please update the comment line at the top of each file to describe what the test case does.
Optional changes: (could be addressed now or after the merge)
- All your test cases for strings in SQL contain only strings with exactly one character. To be on the safe side, it would be a good idea to also involve some other cases, such as multiple characters, empty strings, strings containing escaped special characters.
Besides that, thanks for pointing out #203 in this context. I saw in your code that you circumvented this problem by creating the elementwise comparison operations with a suitable operand order. #203 is referenced by a commit that fixes the problem for some binary operators. I think similar fixes could be implemented for >=
and <=
, e.g., X >= c
to not (c < X)
. Feel free to give it a try if you like (but not required for this PR).
Thank you @pdamme, Concerning 1: Concerning 2: Concerning 3 and the Optional Changes: I am now aware of #203, and as soon as I am free I might address it, but as you said, it is outside the scope of this PR. |
@pdamme can you maybe have a quick look again? |
- Removed superfluous parentheses in FLOAT_LITERAL. - Edited comment lines in test scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all requested required and optional points, @DerSchmidt. This PR is ready to be merged now.
This PR adds String Support to the SQL Parser and makes it possible to use BETWEEN. Both features are commonly needed for benchmarks like SSB
It is now possible to use Strings in the SQL Parser, for this the LEXER and Parser needed to be adjusted to reflect Strings. Furthermore, the Visitor got extended by a String builder. All the other support gets provided by the DAPHNE infrastructure. Strings as per SQL standard are using single quotes, given that most SQL database servers also support double-quoted string literals, we opted to support them too. Because the sqlOp has a String as an input, we have to escape these double-quotes.
--
result = sql("SELECT ... FROM ... WHERE ... AND p.P_CATEGORY = 'MFGR#12' ... ;" );
--
result = sql("SELECT ... FROM ... WHERE ... AND p.P_CATEGORY = \"MFGR#12\" ... ;" );
Furthermore, we now support BETWEEN. This is a simple implementation to simply enable BETWEEN and not be performant. The Between operation generates four comparisons operations, two and operations and one or operation. It checks both
a <= x <= b
andb <= x <= a
.During the implementation of between, it became clear that, the SQL parser is vulnerable to the Issue #203. This PR doesn't resolve this issue but tries to provide some renewed attention to this issue.