
[FEA] Enable AQE related recommendations in Profiler Auto-tuner #688

Merged

amahussein merged 8 commits into NVIDIA:dev from cindyyuanjiang:autotuner-aqe

Jan 11, 2024

Collaborator

cindyyuanjiang commented Dec 12, 2023

Fixes #576

This PR adds the following AQE-related setting recommendations to the AutoTuner:

| Spark Property | Recommendation |
|---|---|
| spark.rapids.sql.batchSizeBytes | Set to just under 2GB (default is 1GB). |
| spark.sql.adaptive.autoBroadcastJoinThreshold | If the setting is above 100MB (default is 10MB), recommend that the user set it to a lower value. |
| spark.sql.adaptive.advisoryPartitionSizeInBytes | If Input Size > 35KB and Shuffle Read > 50KB: for A100, set to 64MB; for T4, set to 32MB. Otherwise, set to 128MB. |
| spark.sql.adaptive.coalescePartitions.initialPartitionNum | If Input Size > 35KB, Shuffle Read > 50KB, and the current value is low (< 200): for A100, set to 400; for T4, set to 800. |
| spark.sql.adaptive.coalescePartitions.parallelismFirst | If Input Size > 35KB and Shuffle Read > 50KB, set to 'false' so that 'advisoryPartitionSizeInBytes' takes priority over 'minPartitionSize', for better performance. |
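As an illustration only, the decision rules in the table can be sketched in Scala, the language the AutoTuner is written in. This is not the code merged in this PR: the object `AqeRecommender`, the `JobProfile` case class, the `GpuType` values and the `recommend` function are hypothetical names. The sketch also assumes that KB/MB are 1024-based units, that "just under 2GB" means `Int.MaxValue` bytes (2^31 - 1), that an unset `initialPartitionNum` counts as a low value, and that GPUs other than A100 and T4 get no size or partition-count recommendation for large jobs, since the table does not cover them.

```scala
import scala.collection.mutable

// Hypothetical sketch of the AQE rules in the table; not the AutoTuner's real API.
object AqeRecommender {
  sealed trait GpuType
  case object A100 extends GpuType
  case object T4 extends GpuType
  case object OtherGpu extends GpuType

  // Assumption: "KB" and "MB" in the table are 1024-based units.
  private val KiB: Long = 1024L
  private val MiB: Long = 1024L * KiB
  private val InputSizeThreshold: Long = 35L * KiB
  private val ShuffleReadThreshold: Long = 50L * KiB
  private val LowInitialPartitionNum: Int = 200
  private val BroadcastThresholdLimit: Long = 100L * MiB

  private val BatchSizeKey = "spark.rapids.sql.batchSizeBytes"
  private val AdvisoryKey = "spark.sql.adaptive.advisoryPartitionSizeInBytes"
  private val InitialPartitionsKey = "spark.sql.adaptive.coalescePartitions.initialPartitionNum"
  private val ParallelismFirstKey = "spark.sql.adaptive.coalescePartitions.parallelismFirst"

  // Hypothetical input: observed metrics plus the relevant current settings.
  final case class JobProfile(
      inputSizeBytes: Long,
      shuffleReadBytes: Long,
      gpu: GpuType,
      initialPartitionNum: Option[Int],
      autoBroadcastJoinThresholdBytes: Option[Long])

  /** Returns (recommended property values, comments for the user). */
  def recommend(p: JobProfile): (Map[String, String], List[String]) = {
    val props = mutable.LinkedHashMap[String, String]()
    val comments = mutable.ListBuffer[String]()

    // "Just under 2GB": assumed here to be 2^31 - 1 bytes (Int.MaxValue).
    props(BatchSizeKey) = Int.MaxValue.toString

    // Both thresholds must be exceeded for the large-job rules to apply.
    val largeJob = p.inputSizeBytes > InputSizeThreshold &&
      p.shuffleReadBytes > ShuffleReadThreshold

    if (largeJob) {
      p.gpu match {
        case A100 => props(AdvisoryKey) = "64m"
        case T4 => props(AdvisoryKey) = "32m"
        case OtherGpu => () // not covered by the table: no recommendation
      }
      // Assumption: an unset initialPartitionNum counts as a low value.
      if (p.initialPartitionNum.forall(_ < LowInitialPartitionNum)) {
        p.gpu match {
          case A100 => props(InitialPartitionsKey) = "400"
          case T4 => props(InitialPartitionsKey) = "800"
          case OtherGpu => ()
        }
      }
      // Let advisoryPartitionSizeInBytes take priority over minPartitionSize.
      props(ParallelismFirstKey) = "false"
    } else {
      props(AdvisoryKey) = "128m"
    }

    // Only a comment, not a value: the table just asks for a lower number.
    if (p.autoBroadcastJoinThresholdBytes.exists(_ > BroadcastThresholdLimit)) {
      comments += "spark.sql.adaptive.autoBroadcastJoinThreshold is above 100MB; " +
        "consider setting it to a lower value."
    }

    (props.toMap, comments.toList)
  }
}
```

For example, `JobProfile(100L * 1024, 100L * 1024, A100, None, None)` yields `64m`, `400` and `parallelismFirst = false`, whereas a profile below either threshold only receives the `128m` advisory size (plus the batch size). Returning the autoBroadcastJoinThreshold rule as a comment rather than a value mirrors the table, which only asks the user to lower the setting.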

parthosa force-pushed the autotuner-aqe branch from 6da5607 to 52e5039

December 12, 2023 21:10

parthosa assigned parthosa and cindyyuanjiang

cindyyuanjiang requested review from amahussein, mattahrens, kuhushukla and nartal1

December 12, 2023 22:04


Add recommendations in AutoTuner for AQE configs and unit tests

6ec1aa6

Signed-off-by: Partho Sarthi

parthosa force-pushed the autotuner-aqe branch from 3c945bb to 6ec1aa6

December 12, 2023 22:12

nartal1 reviewed

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (resolved)

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (outdated, resolved)

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (outdated, resolved)

amahussein requested changes


Collaborator

amahussein left a comment

Thanks @parthosa and @cindyyuanjiang!
I left a few comments.

Can you please also check what the AQE recommendations look like in the user-tools?
One thing to keep an eye on is that the tables in the profiler stdout are readable and match the expected output from the profiler core.

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (outdated, resolved)

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (resolved)

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (resolved)

core/src/main/scala/org/apache/spark/sql/rapids/tool/ToolUtils.scala (outdated, resolved)

core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/AutoTunerSuite.scala (outdated, resolved)

core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/AutoTunerSuite.scala (outdated, resolved)

parthosa and others added 4 commits

December 18, 2023 16:14


          Address review comments

7b48ab3

Signed-off-by: Partho Sarthi <[email protected]>


          remove magic number for batchSizeBytes

632a93d

Signed-off-by: cindyyuanjiang <[email protected]>


          break long comments into separate lines

5d9b8dc

Signed-off-by: cindyyuanjiang <[email protected]>


          Merge branch 'autotuner-aqe' of https://github.com/cindyyuanjiang/spa…

b379ccf

…rk-rapids-tools into autotuner-aqe

amahussein reviewed

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (resolved)

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (outdated, resolved)


added white space in comment message

14b2dfb

Signed-off-by: cindyyuanjiang

cindyyuanjiang mentioned this pull request

[FEA] Enable AQE autoBroadcastJoinThreshold configuration recommendation in Auto-tuner #719

Open

Collaborator Author

cindyyuanjiang commented Jan 8, 2024

Filed a follow-up issue, tracked here: #719

revans2 requested changes

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (resolved)

revans2 previously approved these changes


Collaborator

revans2 left a comment

Switching my review to approved. I saw the benchmark results and I am okay with the change now. I am a little concerned about what happens if the number of shuffle partitions is very small, the AQE target shuffle size is very large, or maxPartitionBytes is very large. But as long as we keep running benchmarks, we can improve them over time as more corner cases from customers show up.

cindyyuanjiang requested a review from amahussein

January 10, 2024 02:11

amahussein requested changes


Collaborator

amahussein left a comment

The PR has conflicts and needs an upmerge.

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (outdated, resolved)

core/src/main/scala/org/apache/spark/sql/rapids/tool/ToolUtils.scala (resolved)

core/src/main/scala/org/apache/spark/sql/rapids/tool/ToolUtils.scala (resolved)

core/src/main/scala/com/nvidia/spark/rapids/tool/profiling/AutoTuner.scala (outdated, resolved)

Collaborator Author

cindyyuanjiang commented Jan 10, 2024

> Switching my review to approved. I saw the benchmark results and I am okay with the change now. I am a little concerned ...

Thank you for the feedback @revans2!


merge conflict

c383dea

Signed-off-by: cindyyuanjiang

cindyyuanjiang dismissed revans2’s stale review via c383dea

January 10, 2024 22:45


addressed review feedback on adding gpu types

ba5529d

Signed-off-by: cindyyuanjiang

cindyyuanjiang requested a review from amahussein

January 11, 2024 01:04

amahussein approved these changes


amahussein merged commit c0b4ddf into NVIDIA:dev

13 checks passed


Reviewers

nartal1 left review comments

parthosa left review comments

amahussein approved these changes

revans2 left review comments

Awaiting requested review from mattahrens

Awaiting requested review from kuhushukla

Labels

None yet