host watermark metric #11725
Conversation
Signed-off-by: Zach Puller <[email protected]>
build
Overall LGTM. Could we add a unit test for this?
Thanks @zpuller. Looks good to me overall, left some minor comments. Let me know what you think.
hostBytesAllocated -= bytes
// For some reason it's possible for the task to start out by releasing resources,
// possibly from a previous task; in such a case we should probably just ignore it.
hostBytesAllocated = hostBytesAllocated.max(0)
Would it be worth logging a warning if this happens?
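A minimal, self-contained sketch of what such a warning could look like; the object and method names are hypothetical, and println stands in for the plugin's logWarning so the snippet runs on its own.

```scala
// Hypothetical standalone sketch of the warn-then-clamp suggestion above.
object HostBytesWarningSketch {
  private var hostBytesAllocated: Long = 0L

  def onFree(bytes: Long): Unit = synchronized {
    hostBytesAllocated -= bytes
    if (hostBytesAllocated < 0) {
      // A free arrived without a matching tracked allocation (e.g. carried over from a
      // previous task): surface it in the logs, then clamp to zero as the change does.
      println(s"WARN: host bytes allocated went negative ($hostBytesAllocated); resetting to 0")
      hostBytesAllocated = 0
    }
  }
}
```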
This is weird behavior. Could there be allocations here from threads that are not task threads? I don't think there is a case where a task could begin by freeing a bunch of memory, unless we didn't tie in correctly at the beginning and are missing the allocations, or the byte counts we get here are somehow padded and differ from the allocated amounts.
It's possible, and normally I would be in favor of doing so, but in this case I opted not to for the following reasons:
If you or others feel strongly, I can still find a way to add tests regardless.
I'd like us to investigate the case where we can go negative on deallocation; that seems odd.
FYI, I didn't actually observe this here; it was a carryover from having observed it in the disk spill metric implementation. In any case, I can still double-check if/why that may happen.
Signed-off-by: Zach Puller <[email protected]>
Thanks @zpuller for addressing the comments. The code change looks good to me. As for the metric being negative, I agree that it would be better to understand when and why it happens. But since that issue doesn't seem to be introduced by this change, I would be fine with merging this PR and continuing the investigation if it takes long.
Signed-off-by: Zach Puller <[email protected]>
Signed-off-by: Zach Puller <[email protected]>
Looks good. Just a small comment about the logging level.
@@ -186,6 +203,9 @@ private class HostAlloc(nonPinnedLimit: Long) extends HostMemoryAllocator with L
    allocAttemptFinishedWithoutException = true
  } finally {
    if (ret.isDefined) {
      val metrics = GpuTaskMetrics.get
      metrics.incHostBytesAllocated(amount)
      logDebug(getHostAllocMetricsLogStr(metrics))
Can we make these logTrace instead of logDebug? These calls happen frequently enough that I would prefer an even lower logging level.
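A sketch of that request, mirroring the diff above rather than standing alone, and assuming the Logging trait that provides logDebug in HostAlloc also provides logTrace (as Spark's Logging does):

```scala
if (ret.isDefined) {
  val metrics = GpuTaskMetrics.get
  metrics.incHostBytesAllocated(amount)
  // Emit the per-allocation metric line at trace level: allocations are frequent,
  // so even debug-level output would be noisy.
  logTrace(getHostAllocMetricsLogStr(metrics))
}
```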
Signed-off-by: Zach Puller <[email protected]>
I don't think the way the accumulators here are combined (max) works in this case, unless I am missing something. Because a pinned allocation (or the disk case, or the device case) could be allocated on thread 1 and freed on thread 2, we could end up with a negative value on thread 2 and over-count thread 1's max. I thought the idea behind the metric was to capture the "max used memory", which to me means the task should sample the tracker (device/host/disk) for the maximum used memory over the lifecycle of the task. Yes, that means a task accounts for other tasks' memory, but the max is what we are after. So when we allocate/free, instead of sending the delta to a task-specific counter to inc/dec, I think we should set the watermark and recompute the max.
Thanks @abellina, I missed that. Yes, the max and the currently allocated values need to be stored in a singleton/static location and protected with some kind of lock or atomic; I think that would be enough to make this all work. A rough sketch of that approach is below.
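A minimal sketch of that shared-tracker idea (hypothetical names, not the code in this PR): one static object holds the currently allocated host bytes and the high-water mark, both updated atomically, and each task samples the watermark over its lifetime.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: a single shared tracker for host allocations, so a free on a
// different thread than the allocation cannot drive any per-task counter negative.
object HostWatermarkSketch {
  private val currentBytes = new AtomicLong(0L)
  private val maxBytes = new AtomicLong(0L)

  def onAlloc(bytes: Long): Unit = {
    val now = currentBytes.addAndGet(bytes)
    // Recompute the watermark: raise maxBytes if this allocation set a new high.
    var prevMax = maxBytes.get()
    while (now > prevMax && !maxBytes.compareAndSet(prevMax, now)) {
      prevMax = maxBytes.get()
    }
  }

  def onFree(bytes: Long): Unit = currentBytes.addAndGet(-bytes)

  // A task would sample this (e.g. when it finishes) to report its max-used-memory metric.
  def watermark: Long = maxBytes.get()
}
```

With this shape, a task that allocates on one thread and frees on another never sees a negative per-task balance; it simply reads the shared watermark observed during its lifetime.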
build
Signed-off-by: Zach Puller <[email protected]>
build
Adds a watermark metric to track the maximum amount of memory allocated on the host per task.
Tested manually and verified that the values look sane in the UI and in the logs.