[BUG] The canonicalized versions of GpuFileSourceScanExecs that are supposed to be semantically equal can be different #10136
I'm trying to understand the problem. Is it that the partition column |
The problem is that the GpuFileSourceScanExec node doesn't canonicalize properly, so semantically identical copies are not recognized as such. The same partition columns are pruned from both, but some filter expressions are not canonicalized properly. There's a bug in the expression lists we pass to the canonicalization steps for the partition and data filter parameters. That bug can cause us to skip some filter expressions that need canonicalization, which in turn causes the semantic equivalence comparison to fail. |
Column But column |
The canonicalized versions of GpuFileSourceScanExecs that are supposed to be semantically equal may not be equal to each other after the prunePartitionForFileSourceScan rule applies in some cases.

For some patterns, this rule removes partition columns that are not used by the first downstream ProjectExec as an optimization, so those partition columns no longer exist in the finalized output. The AttributeReferences that appear in some filters (e.g. partitionFilters) but are excluded from the finalized output are then not canonicalized correctly.

Repro steps:
Launch a GPU Spark session and run
You will get
But they should be equal to each other.
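
The mechanism described above can be illustrated with a small toy model. This is only a sketch, not the plugin's or Spark's actual code: `Attr`, `normalize`, and `canonicalize` are hypothetical names, and the model assumes that canonicalization rewrites each attribute's expression id to its position in the scan output and leaves the id untouched when the attribute is not found there.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an attribute reference: a name plus a unique expression id."""
    name: str
    expr_id: int


def normalize(attr: Attr, output: Sequence[Attr]) -> Attr:
    """Rewrite expr_id to the attribute's position in `output` when it is found there.

    If the attribute is missing from `output`, the original expr_id is kept,
    so a plan-specific id leaks into the canonical form (assumed behavior).
    """
    for ordinal, out in enumerate(output):
        if out.expr_id == attr.expr_id:
            return Attr("none", ordinal)
    return Attr("none", attr.expr_id)


def canonicalize(
    output: Sequence[Attr], partition_filters: Sequence[Attr]
) -> Tuple[Tuple[Attr, ...], Tuple[Attr, ...]]:
    """Canonical form of a toy scan: normalized output plus normalized filter attributes."""
    return (
        tuple(normalize(a, output) for a in output),
        tuple(normalize(a, output) for a in partition_filters),
    )


# Two scans of the same table; only the expression ids differ.
a1, p1 = Attr("a", 1), Attr("p", 2)
a2, p2 = Attr("a", 11), Attr("p", 12)

# Partition column p is still in the output: the canonical forms match.
before_1 = canonicalize([a1, p1], [p1])
before_2 = canonicalize([a2, p2], [p2])
print(before_1 == before_2)  # True

# Partition column p is pruned from the output but still referenced by a filter:
# its expr_id cannot be mapped to an ordinal, so the canonical forms differ.
after_1 = canonicalize([a1], [p1])
after_2 = canonicalize([a2], [p2])
print(after_1 == after_2)  # False
```

In this model the two scans stop comparing equal only because the filter attribute is normalized against an output list that no longer contains it, which matches the symptom reported here.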