[CAY-1251] Introduce worker-side model cache #1252
Conversation
#1253 will resolve the failure.
Thanks for the PR! It looks good overall, but I have a few concerns:
- What if we use more than one trainer thread? Wouldn't that case roughly double the pull requests while the cache is not warmed up (link)?
- What do the indices in the graph mean (e.g., cache2, cache3)?
this.modelTable = tableAccessor.getTable(modelTableId);
this.modelUpdateFunction = modelUpdateFunction;

// TODO #00: introduce a sophisticated cache refresh/eviction policy
Please update the issue number to #1254
modelTable.updateNoReply(key, deltaValue);
pushTracer.recordTime(1);

// update local cache. oldValue always exists
I'm a bit confused by the phrase "oldValue always exists": does it imply that push() always occurs after some parameters have been loaded via pull()? If so, could you please add a comment explaining why oldValue is guaranteed to exist?
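If the pull-before-push contract being asked about does hold, it could be documented with a sketch like the following (a hypothetical illustration: LocalModelCache and every name in it are assumptions, not the PR's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the pull-before-push contract; not the PR's code.
public class LocalModelCache {
    private final Map<String, Double> cache = new HashMap<>();
    private final Map<String, Double> serverTable; // stands in for the remote model table

    public LocalModelCache(Map<String, Double> serverTable) {
        this.serverTable = serverTable;
    }

    /** Loads a parameter into the local cache; trainers call this before push(). */
    public double pull(String key) {
        return cache.computeIfAbsent(key, serverTable::get);
    }

    /**
     * Sends the delta to the server and applies it to the cached copy.
     * oldValue exists here only because pull(key) has already populated the cache.
     */
    public void push(String key, double delta) {
        serverTable.merge(key, delta, Double::sum);
        double oldValue = cache.get(key); // non-null under the pull-before-push contract
        cache.put(key, oldValue + delta);
    }

    public double cached(String key) {
        return cache.get(key);
    }
}
```

A comment like the one on push() above would make the guarantee explicit at the point where it is relied upon.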
}

/**
 * This method does not care about cache.
It'd be great if we clarified the differences between pull(List<K> keys, Table table) and pull(List<K> keys). The distinction is missing in the base interface, but this implementation differentiates them (with cache vs. without cache).
Hmm, actually this part is exactly the same as the no-cache version.
Oh, then the question is more about why this method does not use the cache while pull(final List<K> keys) reads from it. That leads back to my original question above: what are the differences between the two methods?
pull(List<K> keys, Table table) is for accessing other tables that have no cache. This ModelAccessor implementation provides a cache only for the table held in its field.
pull(List<K> keys, Table table) was added to support offline model evaluation, so the method may seem awkward in the ModelAccessor interface.
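The distinction explained above, one table backed by the cache and other tables read directly, can be sketched roughly as follows (all names and types here are illustrative assumptions, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two pull() variants; not the PR's code.
public class CachedModelAccessor {
    /** Minimal stand-in for the remote table interface. */
    public interface Table<K, V> {
        V get(K key);
    }

    private final Table<String, Double> modelTable; // the one table backed by the cache
    private final Map<String, Double> cache = new HashMap<>();

    public CachedModelAccessor(Table<String, Double> modelTable) {
        this.modelTable = modelTable;
    }

    /** Cached variant: serves from the local cache, filling misses from modelTable. */
    public List<Double> pull(List<String> keys) {
        List<Double> result = new ArrayList<>();
        for (String key : keys) {
            result.add(cache.computeIfAbsent(key, modelTable::get));
        }
        return result;
    }

    /**
     * Uncached variant: reads an arbitrary table directly, e.g. for offline
     * model evaluation. The cache only covers modelTable, so it is bypassed.
     */
    public List<Double> pull(List<String> keys, Table<String, Double> table) {
        List<Double> result = new ArrayList<>();
        for (String key : keys) {
            result.add(table.get(key));
        }
        return result;
    }
}
```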
@yunseong I'll update this PR to use a Guava LoadingCache soon.
In the above graph, the integers appended to the labels (e.g., cache2, cache3) denote multiple experiment runs with the cache enabled. Sorry for the confusion.
@yunseong how about merging this PR?
Agreed. The PR looks good and I'm merging it. Thanks! |
Resolves #1251
This PR decouples computation and communication in ML training by introducing a worker-side model cache.
Note that the cache eviction/refresh policy should be improved: the current version simply refreshes the cache at a 10-second interval.
Users can enable this feature with the -model_cache_enabled option.
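The interval-based refresh described above could look roughly like this minimal sketch (class names, the injected clock, and the loader callback are assumptions for illustration; the PR's actual implementation may differ):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of interval-based cache refresh; not the PR's code.
// An entry older than refreshMs is re-pulled from the table on access.
public class IntervalRefreshCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long loadedAtMs;
        Entry(V value, long loadedAtMs) {
            this.value = value;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // pulls from the remote model table
    private final long refreshMs;        // the PR describes a 10s interval
    private final LongSupplier clock;    // injectable for testing

    public IntervalRefreshCache(Function<K, V> loader, long refreshMs, LongSupplier clock) {
        this.loader = loader;
        this.refreshMs = refreshMs;
        this.clock = clock;
    }

    public V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now - e.loadedAtMs >= refreshMs) {
            e = new Entry<>(loader.apply(key), now); // missing or stale: re-pull
            entries.put(key, e);
        }
        return e.value;
    }
}
```

A more sophisticated policy (e.g. Guava's LoadingCache with refreshAfterWrite, as suggested in the review) would also bound the cache size and refresh entries asynchronously.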