-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor Exec Parsers - remove individual parser classes #1396
Conversation
This PR is to remove redundant code in many ExecParsers. We have several ExecParsers with the same functionality in parse code. In this PR, following changes are done: 1. If the Execs don't have expressions, then execName is assigned and ExecInfo object is created from GenericExecParser. 2. If the Execs have expressions then reflection is used to assign the appropriate parser to be used. 3. If there is any additional calculation in the parse function, then only that part is overriden in the ExecParser and the rest of the function uses the GenereicExecParser base code 4.Removed all the files which are not needed anymore Signed-off-by: Niranjan Artal <[email protected]>
Can you please provide more details on the benefits of using a json file for this is better then just having it in the code? The base classes all make sense but having extra json file based on what I see implemented here seems like extra complexity that I want to understand. |
I added json file so that we have all the Execs information at one place and can add or update the Execs/expressions/configs easily. Also, if there are any fields to be added - like
|
So its definitely an interesting solutions but without further information or reasons for this design we seem to be introducing more complexity, less readable, performance overhead, and losing type checking and binary compatibility checks for no reasons. I honestly don't even know that you need a map. This can just be a case statement and some subclasses to reduce most of the code duplication. |
There are some other reasons why would need to configure the tools' handling for different execs:
|
I think you might be missing my main point. I'm not against refactoring it to make use of common classes. I mentioned this years ago. I'm questioning the need for a separate json file and using reflection.
I don't know what this means, you have to run the tool to get the output. how does this change magically report output?
I don't understand how this has anything to do with json/reflection vs doing it in code. Either case you are adding a few lines somewhere and do the testing. I guess adding to a json file could be easier but you have the negatives already listed as well. A developer should be easily be able to add a couple lines of code, adding in code does all the proper type checking with less complexity.
I already stated ways to remove the new class requirement. That was mostly done originally to parallelize development, other ways to reduce code without introducing json file and reflection.
This again doesn't require json and reflection to reuse code. The question isn't about refactoring the code, the question is more about adding this json file and reflection. |
Anyway, most important part that we agree on the refactoring. |
…ve Parsers Signed-off-by: Niranjan Artal <[email protected]>
@tgravescs @amahussein - Please take another look and let me know if this approach seems okay. |
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/CustomShuffleReaderExecParser.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/CustomShuffleReaderExecParser.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nartal1
I posted a few questions
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/ObjectHashAggregateExecParser.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/GenericExecParser.scala
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/GenericExecParser.scala
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/GenericExecParser.scala
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/GenericExecParser.scala
Outdated
Show resolved
Hide resolved
1. default Exec name 2. added argument to GenericParser constructor
Signed-off-by: Niranjan Artal <[email protected]>
…-issue-1325-part1
Thanks @amahussein and @tgravescs for the review. I have addressed review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nartal1
LGTME!
This PR contributes to #1325 . This is the first iteration to clean up ExecParsers code.
This pull request refactors several execution parsers in the
planparser
package to use a new generic execution parser class. The changes aim to simplify the codebase by consolidating common logic into a single class,GenericExecParser
.In this PR, following changes are done:
Refactoring to use
GenericExecParser
:core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/CustomShuffleReaderExecParser.scala
: Modified to extendGenericExecParser
instead of implementing its own parsing logic.core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/GenericExecParser.scala
: Added a newGenericExecParser
class to handle common parsing logic for execution nodes.Removal of Individual Parser Classes:
AggregateInPandasExecParser
,ArrowEvalPythonExecParser
,CartesianProductExecParser
,CoalesceExecParser
,CollectLimitExecParser
,ExpandExecParser
,FilterExecParser
,FlatMapGroupsInPandasExecParser
,GenerateExecParser
, and others. These classes were responsible for parsing specific execution nodes but are now replaced by a more generic approach. [1] [2] [3] [4] [5] [6] [7] [8] [9]These changes streamline the parser configuration process, reduce redundancy, and make it easier to manage and extend the parser functionality.