
Fix inconsistent shuffle write time sum results in Profiler output #1450

Open
wants to merge 4 commits into base: dev

Conversation

@cindyyuanjiang (Collaborator) commented Dec 5, 2024

Fixes #1408

Changes

  • In the TaskModel class, keep shuffle write time in nanoseconds
  • Convert it to milliseconds when generating output

This improves the shuffle write time metrics output by avoiding the potential precision loss that comes from converting each task's nanoseconds to milliseconds and then summing the converted values. It also separates TaskModel from output reporting, so that all metrics keep their original units until output generation.
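As a rough illustration of the precision concern (hypothetical values, not the project's actual code): summing per-task millisecond conversions truncates each task's sub-millisecond remainder, whereas converting the nanosecond sum truncates only once.

```scala
import java.util.concurrent.TimeUnit

object ShuffleWriteTimePrecisionSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical per-task shuffle write times in nanoseconds (1.9, 1.8, 1.7, 1.6 ms).
    val taskWriteTimesNs = Seq(1900000L, 1800000L, 1700000L, 1600000L)

    // Previous behavior: convert each task value to ms first, then sum.
    // Every conversion truncates, so the sub-millisecond remainders are lost.
    val sumOfConverted = taskWriteTimesNs.map(ns => TimeUnit.NANOSECONDS.toMillis(ns)).sum // 4 ms

    // New behavior: keep nanoseconds in TaskModel, sum first, convert once at output time.
    val convertedSum = TimeUnit.NANOSECONDS.toMillis(taskWriteTimesNs.sum) // 7 ms

    println(s"sum of per-task conversions: $sumOfConverted ms")
    println(s"conversion of the sum:       $convertedSum ms")
  }
}
```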

Testing

  • Existing unit tests
  • Manually confirmed that the shuffle write time values are consistent across all places in the Profiler output

Before/After Values (shuffle write time sum)

core/src/test/resources/ProfilingExpectations/rapids_join_eventlog_jobmetricsaggmulti_expectation.csv
944 --> 1001
849 --> 901

core/src/test/resources/ProfilingExpectations/rapids_join_eventlog_sqlmetricsaggmulti_expectation.csv
944 --> 1001
849 --> 901

core/src/test/resources/ProfilingExpectations/rapids_join_eventlog_stagemetricsaggmulti_expectation.csv
397 --> 400
505 --> 508
42 --> 93
373 --> 376
473 --> 475
3 --> 50

@cindyyuanjiang cindyyuanjiang self-assigned this Dec 5, 2024
@cindyyuanjiang cindyyuanjiang added bug Something isn't working core_tools Scope the core module (scala) labels Dec 5, 2024
nartal1 previously approved these changes Dec 9, 2024

@nartal1 (Collaborator) left a comment


LGTM. Thanks @cindyyuanjiang for the fix!
Nit: It would be nice to include the before and after values in the description. I understand that we can confirm the fix from the expected_files.


parthosa previously approved these changes Dec 9, 2024

@parthosa (Collaborator) left a comment


Thanks @cindyyuanjiang for this change.

@amahussein Unrelated, but should we take a similar approach for executorCpuTime and executorDeserializeCpuTime?

@cindyyuanjiang (Collaborator, Author)

Thanks @nartal1! Updated the before/after values in the PR description.

@cindyyuanjiang (Collaborator, Author) commented Dec 9, 2024

Thanks @parthosa! Agree we should discuss the requirements for executorCpuTime and executorDeserializeCpuTime.

@amahussein (Collaborator)

> Thanks @cindyyuanjiang for this change.
>
> @amahussein Unrelated, but should we take a similar approach for executorCpuTime and executorDeserializeCpuTime?

Thanks @parthosa. Yes, it would have been better to fix the inconsistency for the other metrics within this very PR, since the change is small compared to the overhead of filing another bug and then dealing with a new PR.

@amahussein (Collaborator)

@cindyyuanjiang
Is this ready to merge? Or is there something you are going to address?

@amahussein (Collaborator) left a comment


The implementation is still not accurate, because we need to convert the units after all the tasks are aggregated at each level.

@@ -438,7 +438,7 @@ class AppSparkMetricsAnalyzer(app: AppBase) extends AppAnalysisBase(app) {
       val peakMemoryValues = tasksInStage.map(_.peakExecutionMemory)
       val shuffleWriteTime = tasksInStage.map(_.sw_writeTime)
       (AppSparkMetricsAnalyzer.maxWithEmptyHandling(peakMemoryValues),
-        shuffleWriteTime.sum)
+        TimeUnit.NANOSECONDS.toMillis(shuffleWriteTime.sum))
Collaborator

This still does not fix the problem because the conversion is done at the stage level.
The correct way is to convert after the metrics are aggregated at each level,
for example perStage/perSql/perJob.

Collaborator Author

The per SQL and per job results are computed based on cached per stage results. Please correct me if I am wrong.

Collaborator

Correct!
But when we are aggregating perSql, this PR actually aggregates the stages per SQL after the time is converted to milliseconds.
If we want to be more accurate, the cached per-stage results should still be in nanoseconds; the per-SQL value is then the sum in nanoseconds; and finally it gets converted to milliseconds.
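A minimal sketch of the ordering being described here (hypothetical names, not the project's code): the cached per-stage values stay in nanoseconds, the per-SQL/per-job sum is taken over those nanosecond values, and the millisecond conversion happens once at output time.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical cached per-stage aggregate kept in nanoseconds.
case class StageAggSketch(stageId: Int, shuffleWriteTimeNs: Long)

object AggregationOrderSketch {
  def main(args: Array[String]): Unit = {
    val cachedStageAggs = Seq(
      StageAggSketch(0, 1600000L),
      StageAggSketch(1, 1900000L),
      StageAggSketch(2, 1700000L))

    // Per-SQL aggregation sums the per-stage values while they are still nanoseconds...
    val perSqlNs = cachedStageAggs.map(_.shuffleWriteTimeNs).sum

    // ...and the value is converted to milliseconds only when the output row is produced.
    val perSqlMs = TimeUnit.NANOSECONDS.toMillis(perSqlNs)
    println(s"per-SQL shuffle write time: $perSqlMs ms") // 5 ms, versus 3 ms if each stage were rounded first
  }
}
```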

Collaborator Author

understood, thanks @amahussein! I will address this now.

Collaborator Author

Discussed offline. We will keep the current implementation to avoid potential overflow if we aggregate at the SQL/job level.
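For context on the overflow concern, a rough back-of-the-envelope bound (illustrative arithmetic, not from the PR): a signed 64-bit nanosecond sum overflows only past roughly 292 years of accumulated task time, so the risk applies mainly to very large aggregations.

```scala
// Rough bound on when a Long sum of nanosecond values could overflow (illustrative arithmetic only).
object NanosecondOverflowBound {
  def main(args: Array[String]): Unit = {
    val seconds = Long.MaxValue / 1000000000L   // ~9.2e9 seconds
    val years = seconds / (365L * 24 * 60 * 60) // ~292 years of accumulated task time
    println(s"a Long nanosecond sum overflows after roughly $years years of summed time")
  }
}
```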

@cindyyuanjiang cindyyuanjiang dismissed stale reviews from parthosa and nartal1 via 8e03d94 December 17, 2024 02:33
@cindyyuanjiang (Collaborator, Author)

Applied the same approach for executorCpuTime and executorDeserializeCpuTime.

Signed-off-by: cindyyuanjiang <[email protected]>
}

override def convertToCSVSeq: Seq[String] = {
Seq(appIndex.toString, StringUtils.reformatCSVString(appId), rootsqlID.getOrElse("").toString,
sqlID.toString, durStr, containsDataset.toString, appDurStr,
StringUtils.reformatCSVString(potentialStr), execCpuTimePercent)
Collaborator Author

Updated format only for better readability.

@@ -950,14 +992,27 @@ case class SQLDurationExecutorTimeProfileResult(appIndex: Int, appId: String,
}

override def convertToSeq: Seq[String] = {
Seq(appIndex.toString, rootsqlID.getOrElse("").toString, appId, sqlID.toString, durStr,
containsDataset.toString, appDurStr, potentialStr, execCpuTimePercent)
Collaborator Author

Updated format only for better readability.

"resultSerializationTime_sum", "resultSize_max", "sr_fetchWaitTime_sum",
"sr_localBlocksFetched_sum", "sr_localBytesRead_sum", "sr_remoteBlocksFetched_sum",
"sr_remoteBytesRead_sum", "sr_remoteBytesReadToDisk_sum", "sr_totalBytesRead_sum",
"sw_bytesWritten_sum", "sw_recordsWritten_sum", "sw_writeTime_sum")
Collaborator Author

Updated format only for better readability.

@@ -924,12 +951,27 @@ case class IOAnalysisProfileResult(
}
}

case class SQLDurationExecutorTimeProfileResult(appIndex: Int, appId: String,
rootsqlID: Option[Long], sqlID: Long, duration: Option[Long], containsDataset: Boolean,
appDuration: Option[Long], potentialProbs: String,
Collaborator Author

Updated format only for better readability.

executorCpuRatio: Double) extends ProfileResult {
override val outputHeaders = Seq("appIndex", "App ID", "RootSqlID", "sqlID", "SQL Duration",
"Contains Dataset or RDD Op", "App Duration", "Potential Problems", "Executor CPU Time Percent")
Collaborator Author

Updated format only for better readability.

@cindyyuanjiang (Collaborator, Author) commented Dec 17, 2024

@amahussein @parthosa @nartal1
Question: After we make the changes, I see an Executor CPU Time Percent of 103.45 > 100 in core/src/test/resources/ProfilingExpectations/rapids_duration_and_cpu_expectation.csv. Do we want to limit/upper-bound this ratio to 100.0, or is it okay to have >100 percentages?

@amahussein (Collaborator)

> @amahussein @parthosa @nartal1 Question: After we make the changes, I see an Executor CPU Time Percent of 103.45 > 100 in core/src/test/resources/ProfilingExpectations/rapids_duration_and_cpu_expectation.csv. Do we want to limit/upper-bound this ratio to 100.0, or is it okay to have >100 percentages?

It seems more like a bug.

amahussein previously approved these changes Dec 17, 2024

@amahussein (Collaborator) left a comment


Thanks @cindyyuanjiang. Minor styling issue.

@@ -473,7 +471,7 @@ class AppSparkMetricsAnalyzer(app: AppBase) extends AppAnalysisBase(app) {
         tasksInStage.map(_.sr_totalBytesRead).sum,
         tasksInStage.map(_.sw_bytesWritten).sum,
         tasksInStage.map(_.sw_recordsWritten).sum,
-        shuffleWriteTimeSum
+        TimeUnit.NANOSECONDS.toMillis(shuffleWriteTimeSum) // nanoseconds
Collaborator

nit:
// nanoseconds -> It took me a minute until I realized that this is a comment and not a division :)
Can we move that to a separate line above the conversion?

Collaborator Author

thanks, removed this.

executorCPUTime: Long,
executorDeserializeCPUTime: Long, // nanoseconds
executorRunTime: Long, // milliseconds
executorCPUTime: Long, // nanoseconds
Collaborator

💯
Thanks for doing that

@@ -443,20 +443,44 @@ trait BaseJobStageAggTaskMetricsProfileResult extends ProfileResult {
   def srTotalBytesReadSum: Long
   def swBytesWrittenSum: Long
   def swRecordsWrittenSum: Long
-  def swWriteTimeSum: Long
+  def swWriteTimeSum: Long // milliseconds
Collaborator

💯

Signed-off-by: cindyyuanjiang <[email protected]>
@cindyyuanjiang (Collaborator, Author) commented Dec 17, 2024

> It seems more like a bug.

thanks @amahussein! Filed issue: #1469

@parthosa (Collaborator) left a comment


Thanks @cindyyuanjiang for this change. The discussion above about overflow concerns makes sense.

@amahussein (Collaborator) left a comment


> thanks @amahussein! Filed issue: #1469

I am not sure we should fix the percentage in a follow-up issue. That would mean we fix the inconsistent view across the two files while introducing another bug.

@cindyyuanjiang (Collaborator, Author)

> I am not sure we should fix the percentage in a follow-up issue. That would mean we fix the inconsistent view across the two files while introducing another bug.

I investigated this. It looks more like a rounding issue than a bug to me:

  1. executorRunTime is in milliseconds in its raw form while executorCpuTime is in nanoseconds, so executorRunTime may already have lost precision before we take the sum over all tasks.
  2. The runtime is very low where execCPURatio = 103.45: execCpuTime = 30 ms and execRunTime = 29 ms (see the worked example after this comment).

WDYT? @amahussein @nartal1 @parthosa
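A small worked example (hypothetical task values, chosen to mirror the 30 ms / 29 ms numbers above) of how the unit mismatch can push the ratio over 100% even when CPU time never exceeds run time:

```scala
import java.util.concurrent.TimeUnit

// Illustrative only: executorRunTime is reported per task already truncated to milliseconds,
// while executorCpuTime stays in nanoseconds and is converted only after the sum.
object CpuRatioRoundingSketch {
  def main(args: Array[String]): Unit = {
    // (run time ns, cpu time ns) for two hypothetical tasks; cpu <= run for both.
    val tasks = Seq((14900000L, 14800000L), (15900000L, 15800000L))

    // Run time loses its sub-millisecond remainder per task before the sum: 14 + 15 = 29 ms.
    val runTimeMsSum = tasks.map { case (runNs, _) => TimeUnit.NANOSECONDS.toMillis(runNs) }.sum

    // CPU time is summed in nanoseconds and converted once: 30.6 ms -> 30 ms.
    val cpuTimeMsSum = TimeUnit.NANOSECONDS.toMillis(tasks.map(_._2).sum)

    val ratio = 100.0 * cpuTimeMsSum / runTimeMsSum
    println(f"executor CPU time percent: $ratio%.2f") // 103.45
  }
}
```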

@parthosa (Collaborator) commented Dec 20, 2024

> I investigated this. It looks more like a rounding issue than a bug to me:
>
> 1. executorRunTime is in milliseconds in its raw form while executorCpuTime is in nanoseconds, so executorRunTime may already have lost precision before we take the sum over all tasks.
> 2. The runtime is very low where execCPURatio = 103.45: execCpuTime = 30 ms and execRunTime = 29 ms.
>
> WDYT? @amahussein @nartal1 @parthosa

Thanks @cindyyuanjiang for looking into this. I think this was always a bug, but we are now able to catch it due to the changes in this PR. If they are measured in different units in their raw form, I do not think we can fix this problem.

I could not find a reason why Spark reports runtime in ms and CPU time in ns.

Ref:
https://github.com/apache/spark/blob/a2e3188b4997001f4dbc1eb364d61ca55d438208/core/src/main/scala/org/apache/spark/executor/Executor.scala#L715-L720

Development

Successfully merging this pull request may close these issues.

[BUG] Profiler output shows inconsistent shuffleWriteTime results