-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip processing apps with invalid platform and spark runtime configurations #1421
Conversation
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened a discussion on the issue. I am not convinced that this is the best way the tools should behave.
# Conflicts: # core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
@amahussein Updated the behavior to skip processing apps that have a runtime not supported by the platform. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parthosa
I was thinking about postponing the creation of the platform as much as possible. The platform would be created after there is enough information to decide which platform it is. If not, then we use the argument.
This is a risky change and it requires some considerable amount of testing; especially that we need to revisit the python side.
- At least the platform argument in python would have to be "non-optional". This will reduce the possibility of user hitting the problem because of a wrong CLI guess
- Revisit the workflow from python, to Scala, to AutoTuner, then back to Python. Previously, we had to initialize the platform prior to processing the eventlogs because we were using the CSP SDK. With the StorageLib in place, we can finally avoid that flow on python side.
- I am also concerned about the impact of that on the user + qualX
We can discuss further online and get feedback from @leewyang about how this would impact QualX
@@ -42,7 +42,8 @@ import org.apache.spark.util.Utils | |||
|
|||
abstract class AppBase( | |||
val eventLogInfo: Option[EventLogInfo], | |||
val hadoopConf: Option[Configuration]) extends Logging | |||
val hadoopConf: Option[Configuration], | |||
val platform: Option[Platform] = None) extends Logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not what I exactly thought we are heading to. Creating that platform before processing the eventlog implies that we are not using the information/cluster-detection logic from the eventlog.
Thanks @amahussein for the review. I agree that platform should be decided after processing of the event log. However, I think the concerns related to incorrect platform detection is outside the scope of this issue. This issue aims to solve the problem that if the platform (user provided or detected by our CLI) is incompatible with the Spark Runtime, then we should skip processing it.
|
Based on offline discussion, this requires that the correct platform is always provided. Converting this to draft untill #1462 is merged. |
This PR is ready to be reviewed as #1462 is merged. |
The Error Message when platform is not specified:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTME
Fixes #1420.
This PR updated the tools behavior to skip processing apps that have a runtime not supported by the platform.
Changes
Enhancements to Platform Support:
core/src/main/scala/com/nvidia/spark/rapids/tool/Platform.scala
: Added a default runtime and supported runtimes to thePlatform
class, and introduced a method to check if a given runtime is supported. UpdatedDatabricksPlatform
to includePHOTON
as a supported runtime. [1] [2]Runtime Validation:
core/src/main/scala/org/apache/spark/sql/rapids/tool/AppBase.scala
: Added avalidateSparkRuntime
method to ensure the parsed Spark runtime is supported by the platform, throwing anUnsupportedSparkRuntimeException
if not. Updated the constructor to include the platform. [1] [2]Integration with Profiling and Qualification Tools:
core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/Profiler.scala
: Integrated platform support into theProfiler
class, ensuring that the platform is correctly instantiated and passed toApplicationInfo
. [1] [2]core/src/main/scala/org/apache/spark/sql/rapids/tool/qualification/QualificationAppInfo.scala
: Updated to include platform support in theQualificationAppInfo
class.Test
Unit Tests
core/src/test/scala/com/nvidia/spark/rapids/tool/ToolTestUtils.scala
: Modified test utilities to create platform instances based on the provided platform name.core/src/test/scala/com/nvidia/spark/rapids/tool/planparser/BasePlanParserSuite.scala
: Updated test cases to include platform names when creating applications from event logs.core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/AnalysisSuite.scala
: Enhanced test cases to validate platform-specific behavior, including handling of Databricks Photon runtime logs. [1] [2]Behave Tests
user_tools/tests/spark_rapids_tools_e2e/features/event_log_processing.feature
]: Updated Python behave tests to include case for onprem and databricks-aws platforms for photon event logs.