Add metrics GpuPartitioning.CopyToHostTime #11882
Conversation
Signed-off-by: sperlingxx <[email protected]>
Force-pushed from db876e4 to 967d345
```scala
// The SQLMetric key for MemoryCopyFromDeviceToHost
val CopyToHostTime: String = "d2hMemCpyTime"
```
This should be in GpuMetric along with the description. Copy to host time is not a metric specific to partitioning, and we should be consistent about it.
Moved it into GpuMetric
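For context, a minimal sketch of how such a key and its description could live in `GpuMetric` (the constant names here are illustrative assumptions; the actual names in the plugin may differ):

```scala
object GpuMetric {
  // Illustrative: metric key plus its human-readable description,
  // kept together so all operators report it consistently.
  val COPY_TO_HOST_TIME: String = "d2hMemCpyTime"
  val DESCRIPTION_COPY_TO_HOST_TIME: String = "device to host copy time"
}
```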
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala
```scala
  new NvtxRange("PartitionD2H", NvtxColor.CYAN))
// Wait for copyToHostAsync
withResource(memCpyNvtxRange) { _ =>
  Cuda.DEFAULT_STREAM.sync()
```
This is not the only time spent on copy to host. The copyToHostAsync calls above are not guaranteed to be asynchronous (e.g. when the copy is from pageable memory, since we're not guaranteed to be using pinned memory). Therefore the metric and NVTX range need to cover the copyToHostAsync calls above.
I refined the code to wrap them all.
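The resulting shape can be sketched as follows, assuming the `withResource` and `NvtxWithMetrics` helpers seen elsewhere in this thread (simplified, not the exact diff):

```scala
// Sketch: one timed NVTX range covers both the copy calls and the
// stream sync, so copies that turn out to be synchronous (pageable
// memory) are still counted in the metric.
withResource(new NvtxWithMetrics("PartitionD2H", NvtxColor.CYAN, memCopyTime)) { _ =>
  // ... issue the copyToHostAsync calls here ...
  Cuda.DEFAULT_STREAM.sync() // copies complete inside the timed range
}
```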
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuPartitioning.scala
Mostly the same comments as Jason
```diff
@@ -132,7 +135,15 @@ trait GpuPartitioning extends Partitioning {
       }
     }
     withResource(hostPartColumns) { _ =>
-      Cuda.DEFAULT_STREAM.sync()
+      lazy val memCpyNvtxRange = memCopyTime.map(
+        new NvtxWithMetrics("PartitionD2H", NvtxColor.CYAN, _))
```
NvtxWithMetrics has an apply that already does this for you:

```scala
withResource(NvtxWithMetrics("PartitionD2H", NvtxColor.CYAN, memCopyTime)) { _ =>
  ...
}
```
Fixed.
…ing.scala Co-authored-by: Jason Lowe <[email protected]>
…ing.scala Co-authored-by: Jason Lowe <[email protected]>
Signed-off-by: sperlingxx <[email protected]>
Close #11878

This PR adds the GpuMetric `GpuPartitioning.CopyToHostTime`. Since `GpuPartitioning` is a GpuExpression rather than a GpuPlan, a specialized method `GpuPartitioning.setupMetrics` was created to set up the detailed GpuPartitioning metrics at planning time. In local testing, the newly added metric works well.