Qualification tool: Add more information for unsupported operators #680

Merged Dec 15, 2023 (10 commits)

@nartal1 nartal1 commented Dec 7, 2023

This PR fixes #626 and fixes #656

For #626: the output file listed the same Exec twice, once on its own and once with its unsupported expressions.

"Exec" "Filter" ""
"Exec" "Filter" "Filter Exec is not supported as expressions are not supported -  `AtLeastNNulls`"

This PR deduplicates the Exec entries in the output files: if an Exec is unsupported because of unsupported expressions, the output contains the Exec name together with those expressions. The PR also identifies UDF and Dataset operations and reports them as the reason for the unsupported Exec. In addition, it adds a new column, "Ignore Operator", which indicates whether the operator is something that can be ignored, such as a DDL or table operation. This way every unsupported operator is still documented in the output file, while the operators to prioritize can be filtered using the "Ignore Operator" column.
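The deduplication rule described above could be sketched roughly as follows. This is an illustrative Python sketch, not the PR's actual Scala implementation; the function name `dedupe_exec_rows` and the `(type, name, note)` tuple layout are assumptions made for the example.

```python
def dedupe_exec_rows(rows):
    """Collapse duplicate (type, name) rows.

    rows: iterable of (unsupported_type, name, note) tuples (hypothetical layout).
    If an operator has at least one non-empty note, only the distinct noted
    rows are kept; otherwise a single row with an empty note is kept.
    """
    notes_by_key = {}
    for utype, name, note in rows:
        notes = notes_by_key.setdefault((utype, name), [])
        if note and note not in notes:
            notes.append(note)

    result = []
    # dicts preserve insertion order, so first-seen operators come first
    for (utype, name), notes in notes_by_key.items():
        if notes:
            result.extend((utype, name, n) for n in notes)
        else:
            result.append((utype, name, ""))
    return result
```

With the two `Filter` rows from #626 as input, this sketch keeps only the row that carries the `AtLeastNNulls` note.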

Sample Output:

App ID,Unsupported Type,Details,Notes,Ignore Operator
"application_1700986078276_0001","Exec","LocalTableScan","","true"
"application_1700986078276_0001","Exec","AdaptiveSparkPlan","","true"
"application_1700986078276_0001","Exec","Scan hive cust2cc_v1.stats","","false"
"application_1700986078276_0001","Exec","SetCatalogAndNamespace","","true"
"application_1700986078276_0001","Exec","Execute CreateDataSourceTableAsSelectCommand","","true"
"application_1700986078276_0001","Exec","Filter","Filter Exec is not supported as this operation is from dataset which is not supported","false"
"application_1700992056158_0001","Exec","BatchEvalPython","BatchEvalPython Exec is not supported as it contains UDF which is not supported","false"
"application_1700992056158_0001","Exec","Filter","Filter Exec is not supported as it contains UDF which is not supported","false"
"application_1700992056158_0001","Exec","Project","Project Exec is not supported as expressions are not supported -  `normalizeIp`","false"
"application_1700992056158_0001","Exec","Filter","Filter Exec is not supported as expressions are not supported -  `validateIP;normalizeIp`","false"
"application_1700992056158_0001","Exec","SortMergeJoin","SortMergeJoin Exec is not supported as expressions are not supported -  `normalizeEmail`","false"
"application_1700992056158_0001","Exec","Sort","Sort Exec is not supported as expressions are not supported -  `normalizeEmail`","false"
"application_1700992056158_0001","Expression","normalizeName","","false"
"application_1700992056158_0001","Expression","normalizeEmail","","false"

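As a rough illustration of how the "Ignore Operator" column might be used downstream, the following Python sketch keeps only the rows that are not marked as ignorable. The helper name `operators_to_prioritize` is hypothetical, and the sample rows are a subset copied from the output above.

```python
import csv
import io

# A few rows copied from the sample output above.
UNSUPPORTED_OPS_SAMPLE = '''App ID,Unsupported Type,Details,Notes,Ignore Operator
"application_1700986078276_0001","Exec","LocalTableScan","","true"
"application_1700986078276_0001","Exec","Scan hive cust2cc_v1.stats","","false"
"application_1700992056158_0001","Exec","BatchEvalPython","BatchEvalPython Exec is not supported as it contains UDF which is not supported","false"
"application_1700992056158_0001","Expression","normalizeName","","false"
'''


def operators_to_prioritize(csv_text):
    """Return the rows whose "Ignore Operator" value is "false"."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["Ignore Operator"].strip().lower() == "false"
    ]


for row in operators_to_prioritize(UNSUPPORTED_OPS_SAMPLE):
    print(row["App ID"], row["Unsupported Type"], row["Details"])
```
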
For #656: this PR generates a new output file (rapids_4_spark_qualification_output_unsupportedOperatorsStageDuration.csv), based on the comment in #656 (comment). The file makes it possible to determine the ratio of wallclockStageDuration to appDuration and the unsupported operators in the corresponding stage, which helps in prioritizing the unsupported operators.

Sample output:

App ID,Unsupported Type,Stage ID,Stage Duration,App Duration,Recommendation,Ignore Operator
"application_1700992056158_0001","LocalTableScan",65,4323,1691909,Recommended,true
"application_1700992056158_0001","Execute InsertIntoHiveTable",107,3940,1691909,Recommended,false
"application_1700992056158_0001","BatchEvalPython",81,141962,1691909,Recommended,false
"application_1700992056158_0001","Filter",81,141962,1691909,Recommended,false
"application_1700992056158_0001","BatchEvalPython",81,141962,1691909,Recommended,false
"application_1700992056158_0001","Project",81,141962,1691909,Recommended,false
"application_1700992056158_0001","Filter",7,77,1691909,Recommended,false
"application_1700992056158_0001","LocalTableScan",7,77,1691909,Recommended,true

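One possible way to use this file for prioritization is sketched below in Python: compute Stage Duration / App Duration for each row and sort in descending order. The helper name `rank_by_stage_ratio`, the skipping of ignored operators, and the dropping of repeated (app, operator, stage) rows are assumptions of this sketch, not behavior of the qualification tool.

```python
import csv
import io

# A few rows copied from the sample output above.
STAGE_DURATION_SAMPLE = '''App ID,Unsupported Type,Stage ID,Stage Duration,App Duration,Recommendation,Ignore Operator
"application_1700992056158_0001","LocalTableScan",65,4323,1691909,Recommended,true
"application_1700992056158_0001","Execute InsertIntoHiveTable",107,3940,1691909,Recommended,false
"application_1700992056158_0001","BatchEvalPython",81,141962,1691909,Recommended,false
"application_1700992056158_0001","BatchEvalPython",81,141962,1691909,Recommended,false
"application_1700992056158_0001","Filter",7,77,1691909,Recommended,false
'''


def rank_by_stage_ratio(csv_text):
    """Return (app_id, operator, stage_id, ratio) tuples, largest ratio first."""
    seen = set()
    ranked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: operators flagged as ignorable are not worth ranking.
        if row["Ignore Operator"].strip().lower() == "true":
            continue
        key = (row["App ID"], row["Unsupported Type"], row["Stage ID"])
        if key in seen:  # skip repeated rows for the same operator and stage
            continue
        seen.add(key)
        app_duration = int(row["App Duration"])
        ratio = int(row["Stage Duration"]) / app_duration if app_duration else 0.0
        ranked.append(
            (row["App ID"], row["Unsupported Type"], int(row["Stage ID"]), ratio)
        )
    ranked.sort(key=lambda item: item[3], reverse=True)
    return ranked


for app_id, operator, stage_id, ratio in rank_by_stage_ratio(STAGE_DURATION_SAMPLE):
    print(f"{app_id} stage {stage_id} {operator}: {ratio:.4f}")
```
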
@nartal1 nartal1 added the feature request (New feature or request) and core_tools (Scope the core module (scala)) labels Dec 7, 2023
@nartal1 nartal1 self-assigned this Dec 7, 2023
parthosa previously approved these changes Dec 13, 2023
@parthosa parthosa left a comment

Thanks @nartal1.

nartal1 commented Dec 13, 2023

Thanks @parthosa for the review. I updated the PR based on @viadea's offline feedback and updated the description accordingly. PTAL.
@viadea - Would be good to get your feedback.

@amahussein amahussein left a comment

Thanks @nartal1

@amahussein amahussein merged commit b54ae84 into NVIDIA:dev Dec 15, 2023
9 checks passed
Labels
core_tools Scope the core module (scala) feature request New feature or request
3 participants