Sync plugin support as of 2024-12-31 #1478
base: dev
Conversation
Signed-off-by: cindyyuanjiang <[email protected]>
Thanks @cindyyuanjiang for working on that.
It is important to stay up to date with the plugin since many of the new operators affect our customers.
I updated the issue description with the latest sync-plugin job, so it would be nice to include the remaining execs/expressions if they are easy to knock off.
@@ -1,2 +1,2 @@
 App Name,App ID,SQL DF Duration,SQL Dataframe Task Duration,App Duration,GPU Opportunity,Executor CPU Time Percent,SQL Ids with Failures,Unsupported Read File Formats and Types,Unsupported Write Data Format,Complex Types,Nested Complex Types,Potential Problems,Longest SQL Duration,SQL Stage Durations Sum,NONSQL Task Duration Plus Overhead,Unsupported Task Duration,Supported SQL DF Task Duration,App Duration Estimated,Unsupported Execs,Unsupported Expressions,Estimated Job Frequency (monthly),Total Core Seconds
-"Spark shell","local-1624371544219",4575,20421,175293,1523,72.15,"","JSON[string:double:date:int:bigint];Text[*]","JSON","","","",1859,5372,176916,13622,6799,false,"CollectLimit;Scan text;Execute InsertIntoHadoopFsRelationCommand json;Scan json","",30,2096
+"Spark shell","local-1624371544219",4575,20421,175293,4365,72.15,"","Text[*]","JSON","","","",1859,5372,176916,938,19483,false,"CollectLimit;Scan text;Execute InsertIntoHadoopFsRelationCommand json","",30,2096
QQ: the legacy behavior was identifying JSON[string:double:date:int:bigint] as unsupported.
If we are checking that the datatypes are supported, what about BigInt? I don't see it in the datasource columns.
I need to investigate this. These are the results from the Qual tool after we updated the plugin changes. I am not certain why bigint was removed.
Perhaps bigInt has never been one of the columns to begin with.
There are two main questions here to answer:
- I expected this readFormat to be unsupported based on the current columns of the CSV files. Why does the new output indicate that this is a supported read although it has a field in the schema (BigInt) that is not supported in the CSV files? @nartal1 can help in confirming what the correct behavior is.
- How should we handle BigInt? We can reach out to Bobby to confirm whether:
  - it needs to be added to the columns, in which case the fix needs to be done on the plugin side; or
  - it should be mapped to another datatype like Int.
There is probably a bug here, because if BigInt is not a valid column, then the readSchema should have been considered unsupported.
For the second part: the correct way to handle that is to map the datatypes to their aliases. I filed a new issue #1492 for that.
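The alias mapping proposed in #1492 could look roughly like the sketch below. The object name, the alias table, and the canonical names are assumptions for illustration, not the tools' actual implementation:

```scala
object TypeAliases {
  // Hypothetical alias table: Spark SQL type-name aliases normalized to a
  // canonical name before the supported-datatype column lookup.
  private val aliases: Map[String, String] = Map(
    "bigint"   -> "long",
    "int"      -> "integer",
    "smallint" -> "short",
    "tinyint"  -> "byte",
    "numeric"  -> "decimal")

  // Normalize a type name so that e.g. "bigint" and "long" resolve to the
  // same supported-datatype column.
  def normalize(typeName: String): String = {
    val lower = typeName.toLowerCase
    aliases.getOrElse(lower, lower)
  }
}
```

With a mapping like this, a schema field such as `bigint` in `JSON[string:double:date:int:bigint]` would be checked against the `long` column instead of silently missing the lookup.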
@@ -1169,6 +1169,103 @@ class SQLPlanParserSuite extends BasePlanParserSuite {
    }
  }

  test("MonthsBetween is supported in ProjectExec") {
The following 3 tests are nearly identical.
Is it possible to create a parameterized test for all of them?
TrampolineUtil.withTempDir { parquetoutputLoc =>
  TrampolineUtil.withTempDir { eventLogDir =>
    val (eventLog, _) = ToolTestUtils.generateEventLog(eventLogDir,
      "ProjectExprsSupported") { spark =>
ProjectExprsSupported -> if we end up having 3 unit tests, it would be nice to have different names for the app. Different names for unit tests help when we debug and extract the eventlogs for further debugging and testing; the app name can then be used to link back to the unit test that created that eventlog.
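The two review suggestions above (folding the near-identical tests into one parameterized loop, and giving each generated eventlog a distinct app name) could be combined along these lines. This is only a sketch under assumptions: `test` is stubbed locally here, whereas in the suite it would be ScalaTest's `test(...)` registration, and the expression names other than `MonthsBetween` are illustrative placeholders, not the suite's real tests.

```scala
import scala.collection.mutable.ListBuffer

object ParamTestSketch {
  // Stand-in for ScalaTest's `test(name) { body }`; records the names of
  // the tests registered by the loop below.
  val registered: ListBuffer[String] = ListBuffer[String]()
  def test(name: String)(body: => Unit): Unit = { registered += name; body }

  // Illustrative expression list; only MonthsBetween appears in the diff.
  val exprs: Seq[String] = Seq("MonthsBetween", "SomeExprA", "SomeExprB")

  def main(args: Array[String]): Unit = {
    exprs.foreach { exprName =>
      test(s"$exprName is supported in ProjectExec") {
        // A per-expression app name lets an extracted eventlog be traced
        // back to the unit test that generated it.
        val appName = s"ProjectExprsSupported_$exprName"
        assert(appName.endsWith(exprName))
        // ...shared body: generate the eventlog under `appName`, parse the
        // plan, and assert the expression is reported as supported.
      }
    }
    assert(registered.size == exprs.size)
  }
}
```

Registering tests from a loop keeps the shared body in one place, so a new expression only needs one more entry in the sequence.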
WIP.
Fixes #1452