fix: fetch snapshotter proxy object without holding cache lock #685
Signed-off-by: Austin Vazquez <[email protected]>
snapshotter, ok = cache.snapshotters[key]
cache.mutex.Unlock()
We use a reader lock above. Should we be doing that here too?
Oof, that adds a new level of complexity if we need to account for a second double-checked lock.
I'm confused. If !ok, wouldn't cache.snapshotters[key] always be nil?
@kzys, unless another thread populates the cache after we have a cache miss but before we acquire the writer's lock. The cache entry needs to have .Close() called on it before it is garbage collected, to clean up system resources for the metrics proxy.
But @ginglis13 is correct: this solution won't work for edge cases, because we can now leak. So the fetch has to happen after the reader's-lock cache miss but before the writer's lock is acquired.
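For reference, the ordering described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Cache`, `NewCache`, `Snapshotter`, `fakeSnapshotter`, and the `fetch` callback are hypothetical stand-ins. The fetch runs with no lock held, and the goroutine that loses the race closes its duplicate instead of leaking it.

```go
package main

import "sync"

// Snapshotter is a hypothetical stand-in for the cached proxy object.
// Only Close matters for this sketch.
type Snapshotter interface {
	Close() error
}

// fakeSnapshotter is an illustrative implementation that records Close calls.
type fakeSnapshotter struct{ closed bool }

func (f *fakeSnapshotter) Close() error { f.closed = true; return nil }

// Cache is a hypothetical stand-in for the snapshotter cache.
type Cache struct {
	mutex        sync.RWMutex
	snapshotters map[string]Snapshotter
}

func NewCache() *Cache {
	return &Cache{snapshotters: make(map[string]Snapshotter)}
}

// Get returns the cached entry for key, calling fetch on a miss.
func (c *Cache) Get(key string, fetch func() (Snapshotter, error)) (Snapshotter, error) {
	// First check under the reader's lock.
	c.mutex.RLock()
	s, ok := c.snapshotters[key]
	c.mutex.RUnlock()
	if ok {
		return s, nil
	}

	// Cache miss: fetch with no lock held, so a slow dial blocks nobody else.
	fetched, err := fetch()
	if err != nil {
		return nil, err
	}

	// Second check under the writer's lock: another goroutine may have
	// populated the entry while we were fetching.
	c.mutex.Lock()
	if existing, ok := c.snapshotters[key]; ok {
		c.mutex.Unlock()
		// We lost the race. Close our duplicate so it is not leaked.
		_ = fetched.Close()
		return existing, nil
	}
	c.snapshotters[key] = fetched
	c.mutex.Unlock()
	return fetched, nil
}
```

The trade-off in this sketch is that two goroutines racing on the same key may both dial, with one result thrown away.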
How about moving cache.mutex.RUnlock() into this if block? The snapshotter doesn't do any time-consuming operations between RUnlock and Lock.
That may result in deadlock. We'd block on Lock without having called RUnlock. I don't believe RWMutex is smart enough to grant the writer's lock even when the only reader's lock is held by the same thread.
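A small sketch of that behavior, assuming Go 1.18 or later for `sync.RWMutex.TryLock`: the hypothetical helper `canUpgrade` holds a reader's lock and then tries to take the writer's lock on the same mutex. `TryLock` is used only so the check returns instead of blocking forever the way a plain `Lock` would.

```go
package main

import "sync"

// canUpgrade reports whether the writer's lock can be acquired while the
// caller still holds a reader's lock on the same RWMutex.
func canUpgrade() bool {
	var mu sync.RWMutex

	mu.RLock()
	defer mu.RUnlock()

	// A plain mu.Lock() here would wait for all readers to leave,
	// including this goroutine's own RLock, and so would never return.
	if mu.TryLock() {
		mu.Unlock()
		return true
	}
	return false
}
```

`canUpgrade()` returns false: sync.RWMutex has no lock-upgrade path, so the reader's lock has to be released before Lock is called.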
I think this deserves an issue to discuss options. We need to serialize dictionary writes, but you're right that we don't need to block all remote snapshotters while we dial a single snapshotter. We probably need something smarter than a single cache mutex.
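One possible direction, sketched here only to make the idea concrete and not as a proposal that was agreed on: keep the map mutex for map writes, and give each key its own `sync.Once` so that a dial for one key blocks only callers waiting on that same key. `KeyedCache`, `NewKeyedCache`, `entry`, `Snapshotter`, and `nopSnapshotter` are all hypothetical names.

```go
package main

import "sync"

// Snapshotter is a hypothetical stand-in for the cached proxy object.
type Snapshotter interface {
	Close() error
}

// nopSnapshotter is an illustrative implementation with nothing to release.
type nopSnapshotter struct{}

func (nopSnapshotter) Close() error { return nil }

// entry holds one key's snapshotter and guards its dial with a sync.Once.
type entry struct {
	once sync.Once
	s    Snapshotter
	err  error
}

// KeyedCache serializes map writes with one mutex but dials per key.
type KeyedCache struct {
	mutex   sync.Mutex
	entries map[string]*entry
}

func NewKeyedCache() *KeyedCache {
	return &KeyedCache{entries: make(map[string]*entry)}
}

// Get returns the snapshotter for key, calling fetch at most once per key.
func (c *KeyedCache) Get(key string, fetch func() (Snapshotter, error)) (Snapshotter, error) {
	// The map lock is held only long enough to find or create the entry.
	c.mutex.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &entry{}
		c.entries[key] = e
	}
	c.mutex.Unlock()

	// Only the first caller for this key runs fetch. Later callers for the
	// same key wait here; callers for other keys are unaffected.
	e.once.Do(func() { e.s, e.err = fetch() })
	return e.s, e.err
}
```

In this sketch no duplicate snapshotter is ever created, so nothing needs closing on a lost race. A failed fetch stays cached, though, so a real version would need an eviction or retry path.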
@kzys @Kern-- @ginglis13 I have created #687 to discuss options. Closing this PR for now; I will reopen it based on our conversations there.
Signed-off-by: Austin Vazquez [email protected]
Issue #, if available:
None
Description of changes:
The fetch operation should not occur while the cache lock is held. This unblocks snapshot requests that have already dialed their microVM, letting them continue in the edge case where the lock has been acquired by a slow dialer.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.