INSERT/REPLACE dimension target column types are validated against source input expressions #15962
Conversation
sql/src/main/java/org/apache/druid/sql/calcite/planner/CalcitePlanner.java (resolved thread, outdated)
```java
// Scan query lists columns in alphabetical order independent of the
// SQL project list or the defined schema. Here we just check that the
// set of columns is correct, but not their order.
.columns("b", "e", "v0", "v1", "v2", "v3")
```
It's hard to interpret the plan like this... what will permute `v0` to be the first column?
Shouldn't that be in the plan? ...or the rename of the columns to their output names?
This seems to be an existing issue, as the comment above describes; the results are still written in the proper order and mapped to the appropriate columns.
I manually tested this: the columns are written in the proper order, and the results are what you'd expect. It seems that none of the existing DML unit tests check results; the DML test engine in use does not allow SELECTs. Maybe an excellent addition in the near future. We can add full integration tests once this feature is more complete. How does that sound to you?
```java
// with any coercions applied. We update the validated node type of the WITH node here so
// that they are consistent.
if (source instanceof SqlWith) {
  final RelDataType withBodyType = getValidatedNodeTypeIfKnown(((SqlWith) source).body);
```
I wonder if this is the right approach; is the issue coming from something like:
- the WITH may supply `null`s
- the target table doesn't accept `null` values

I wonder why not make this `replaceWithDefault` explicit by adding `COALESCE(colVal, 0)` and the like, so that Calcite is also aware of it... the runtime could remove the coalesce if it's known to be pointless.
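For illustration, a minimal sketch of that rewrite using Calcite's standard operator table (not the PR's code; `columnNode` stands for whatever `SqlNode` currently references the source column):

```java
import org.apache.calcite.sql.SqlLiteral;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.parser.SqlParserPos;

final class CoalesceRewriteSketch
{
  // Wraps a numeric source column as COALESCE(col, 0) so the plan reflects
  // default-value replacement explicitly; native translation could later
  // drop the call when it is provably redundant.
  static SqlNode coalesceToZero(SqlNode columnNode)
  {
    return SqlStdOperatorTable.COALESCE.createCall(
        SqlParserPos.ZERO,
        columnNode,
        SqlLiteral.createExactNumeric("0", SqlParserPos.ZERO)
    );
  }
}
```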
It seems that this goes beyond null handling; implicitly coerced types coming from Calcite's default-enabled coercion rules do not seem to be updated in the WITH node type, only in the WITH select body node type. This would still be needed.
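A minimal sketch of that fix, assuming it sits in a `SqlValidatorImpl` subclass like the validator shown in the truncated diff above (`insert` is the `SqlInsert` being validated; the `withBodyType != null` continuation is my guess, not necessarily the PR's exact code):

```java
// After the INSERT source has been validated: re-stamp the WITH node with
// its body's validated type, so implicit coercions applied to the body are
// visible on the WITH node as well.
final SqlNode source = insert.getSource();
if (source instanceof SqlWith) {
  final RelDataType withBodyType = getValidatedNodeTypeIfKnown(((SqlWith) source).body);
  if (withBodyType != null) {
    // setValidatedNodeType is a protected method on SqlValidatorImpl.
    setValidatedNodeType(source, withBodyType);
  }
}
```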
This seems to be a consequence of the long comment around line 223 about not knowing the types, so just passing unknown -> I think that should be fixed somehow; then this wouldn't be needed either...
I think we can get around this and the `WITH` issue with the following approach:
- override `inferUnknownTypes` - the node will have the full expression including the alias -> access by fieldName will be ok
- instead of passing an `unknownType`, I've passed some type with artificially created fields to a reference to the Druid table (this could probably be done differently) - this way `validateTargetType` could become kind of a part of the validation

However... it has led to consistency errors as well -> we end up reading a select which produces a BIGINT as DOUBLE and similar... which is clearly invalid.

I think the right way to fix this is to add proper rewrites at the right point in the query and let the system do its job - that wouldn't even be a hack, as opposed to multiple long comments about doing something which is clearly questionable.

A way to do that could be something like the following (a rough sketch follows this list):
- override `performUnconditionalRewrites` to be able to process `INSERT` nodes as well
- identify the columns of interest by name from `SqlInsert#source`
- rewrite those columns with a dummy typed function like `druidDouble` or similar
- use the method's validation logic to ensure that the conversion will happen and is okay
- the output type of the function will be the supported type for that column, so that won't cause any problems
- leave the function in place - since it was just a placeholder, it can be removed during native translation without any risk.
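A rough sketch of what that override could look like in a `SqlValidatorImpl` subclass. `DRUID_DOUBLE` is an invented placeholder operator for illustration (not a real Druid or Calcite function), and the by-name column matching and alias handling are elided:

```java
// Hypothetical placeholder: a function whose return type pins the column
// to DOUBLE; validation will then coerce the wrapped expression.
private static final SqlFunction DRUID_DOUBLE = new SqlFunction(
    "DRUID_DOUBLE",
    SqlKind.OTHER_FUNCTION,
    ReturnTypes.DOUBLE,
    null,
    OperandTypes.ANY,
    SqlFunctionCategory.USER_DEFINED_FUNCTION
);

@Override
protected SqlNode performUnconditionalRewrites(SqlNode node, boolean underFrom)
{
  node = super.performUnconditionalRewrites(node, underFrom);
  if (node instanceof SqlInsert) {
    final SqlNode source = ((SqlInsert) node).getSource();
    if (source instanceof SqlSelect) {
      final SqlNodeList selectList = ((SqlSelect) source).getSelectList();
      for (int i = 0; i < selectList.size(); i++) {
        // In the real thing, only columns matched by name against the
        // catalog schema would be wrapped; here every item is, for brevity.
        selectList.set(i, DRUID_DOUBLE.createCall(SqlParserPos.ZERO, selectList.get(i)));
      }
    }
  }
  return node;
}
```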
```java
    colName,
    typeFactory.createTypeWithNullability(relType, true)
));
if (NullHandling.replaceWithDefault()) {
```
I wonder if the result set may contain `null` even when `replaceWithDefault()` is `true`.
FYI: `RowSignatures#toRelDataType` creates nullable strings regardless of `replaceWithDefault`.
There is a test that includes a String column, and it seems to produce a `targetType` of `VARCHAR NOT NULL` when this is false, and a nullable `VARCHAR` when true, so I think this is handled correctly, but let me know if you think otherwise.
For string types, default-value mode treats `null` and `''` interchangeably, so it probably should be marked as nullable? TBH it's not super well defined, but we do always mark string values as nullable in default-value mode since empty strings behave as null.
For numbers, though, default-value mode effectively means that nulls don't exist.
Default-value mode is deprecated now, so we probably don't have to spend too much time thinking about how things should behave; we should probably match the computed-schema behavior (strings always nullable in either mode, numbers only nullable in SQL-compatible mode).
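For reference, that rule could be written as something like this (a sketch; `withDruidNullability` is an illustrative helper, not an existing Druid method):

```java
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.type.SqlTypeName;
import org.apache.druid.common.config.NullHandling;

final class NullabilitySketch
{
  // Strings are always nullable (empty string behaves as null in default
  // value mode); numbers are nullable only in SQL-compatible mode.
  static RelDataType withDruidNullability(RelDataTypeFactory typeFactory, RelDataType relType)
  {
    final boolean nullable = SqlTypeName.CHAR_TYPES.contains(relType.getSqlTypeName())
        || !NullHandling.replaceWithDefault();
    return typeFactory.createTypeWithNullability(relType, nullable);
  }
}
```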
Thanks! Fixed
I think if that's the case, the logic should not live here in this file; instead it should be in a more central location... it's bad to see these conditions flow everywhere.
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java (resolved thread, outdated)
sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java (resolved thread)
```java
/**
 * Test the use of catalog specs to drive MSQ ingestion.
 */
public class CatalogInsertTest extends CalciteCatalogInsertTest
```
There is a lot of duplication between `CatalogInsertTest` and `CatalogReplaceTest` (4 lines differ), and `CatalogQueryTest` is also pretty similar - it seems to just have a different method order...
Could you put all of these into some common place (a rule?) and reuse that everywhere? (A possible shape is sketched below.)
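One possible shape for that deduplication (illustrative names only; the real shared setup would be lifted from the existing tests):

```java
// Hypothetical common base; CatalogInsertTest, CatalogReplaceTest, and
// CatalogQueryTest would extend it and keep only the few lines that differ.
public abstract class CatalogIngestionTestBase
{
  // Shared catalog + MSQ test wiring goes here (the setup currently
  // duplicated across the three classes).

  // The handful of differing lines, e.g. which DML statement is exercised.
  protected abstract String dmlPrefix();
}
```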
```java
import org.apache.druid.sql.calcite.table.DatasourceTable;
import org.apache.druid.sql.calcite.table.DruidTable;

public class CalciteCatalogIngestionDmlTest extends CalciteIngestionDmlTest
```
Why is there no `@Test` in this "Test"?
This seems like a base class; maybe Javadoc to explain that would be useful.
I think in that case it should be `abstract` and not have a `Test` ending; like `TestBase` or something.
```java
}

@Test
public void testInsertIntoExistingWithIncompatibleTypeAssignment()
```
Nit: it might be nice to add a few more tests like this to encode some of these behaviors.
Description
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner, #13686 by @paul-rogers. Tables that are defined in the catalog will have their simple dimension column types validated against the source input expressions mapped to them during DML INSERT/REPLACE operations. Complex measure-type columns will not be validated at this time; that will come in a follow-up PR. This PR also enforces sealed / non-sealed mode: if a table is sealed, no undefined columns may be added to it during ingestion. It also addresses remaining comments from #15836 and #15908.
This PR has: