
[CAY-1251] Introduce worker-side model cache #1252

Merged · 24 commits · Nov 29, 2017

Conversation

@wynot12 (Contributor) commented Nov 2, 2017

Resolves #1251

This PR decouples computation and communication in ML training by introducing a worker-side model cache.
Note that the cache eviction/refresh policy should be improved; the current version simply refreshes the cache at a 10-second interval.

Users can turn on this feature with the -model_cache_enabled option.
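The refresh behavior described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the class and method names (RefreshingModelCache, pullFromServer, applyLocalUpdate) are hypothetical, and the real implementation lives in the project's ModelAccessor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Minimal sketch of a worker-side model cache with a fixed refresh
 * interval, matching the crude policy described in the PR: drop
 * everything once the interval elapses and re-pull lazily on access.
 */
class RefreshingModelCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final long refreshIntervalMs;
  private final Function<K, V> pullFromServer;  // stand-in for the network pull
  private volatile long lastRefreshMs;

  RefreshingModelCache(long refreshIntervalMs, Function<K, V> pullFromServer) {
    this.refreshIntervalMs = refreshIntervalMs;
    this.pullFromServer = pullFromServer;
    this.lastRefreshMs = System.currentTimeMillis();
  }

  /** Serves reads from the cache, wiping it once the interval has elapsed. */
  V get(K key) {
    long now = System.currentTimeMillis();
    if (now - lastRefreshMs >= refreshIntervalMs) {
      cache.clear();          // simple time-based refresh, as in this PR
      lastRefreshMs = now;
    }
    return cache.computeIfAbsent(key, pullFromServer);
  }

  /** Mirrors a pushed delta into the local copy between refreshes. */
  void applyLocalUpdate(K key, V value) {
    cache.put(key, value);
  }
}
```

With a 10-second interval, reads between refreshes never touch the servers, which is what decouples computation from communication.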

wynot12 requested a review from yunseong on November 2, 2017 16:59

@wynot12 (Contributor, Author) commented Nov 2, 2017

[image: convergence graph]

Here's a graph comparing convergence performance with and without the cache.
(Experiment setup: Optiplex cluster, NMF, Netflix 1x, 5 epochs)

@wynot12 (Contributor, Author) commented Nov 2, 2017

#1253 will resolve the test failure.

@yunseong (Contributor) left a comment

Thanks for the PR! It looks good overall, but I have a few concerns:

  1. What if we use more than one trainer thread? Wouldn't that case issue roughly 2x the pulls if the cache is not warmed up (link)?
  2. What do the indices in the graph (e.g., cache2, cache3) mean?
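Regarding concern 1: one way to keep a cold cache from issuing duplicate pulls when several trainer threads request the same key at once is a per-key atomic load. A sketch under hypothetical names (DedupingCache, serverPulls — not the PR's code), relying on the guarantee that ConcurrentHashMap.computeIfAbsent runs the loader at most once per key:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch: dedupes concurrent pulls for the same key on a cold cache.
 */
class DedupingCache {
  private final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
  final AtomicInteger serverPulls = new AtomicInteger();  // exposed for illustration

  int pull(String key) {
    // ConcurrentHashMap.computeIfAbsent applies the loader atomically and
    // at most once per key, so racing threads trigger a single server pull.
    return cache.computeIfAbsent(key, k -> {
      serverPulls.incrementAndGet();
      return 42;  // stand-in for the real network fetch
    });
  }
}
```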

this.modelTable = tableAccessor.getTable(modelTableId);
this.modelUpdateFunction = modelUpdateFunction;

// TODO #00: introduce a sophisticated cache refresh/eviction policy
Review comment (Contributor):

Please update the issue number to #1254

modelTable.updateNoReply(key, deltaValue);
pushTracer.recordTime(1);

// update local cache. oldValue always exists
Review comment (Contributor):

I'm a bit confused by the phrase oldValue always exists: does it imply that push() always occurs after the parameters have been loaded via pull()? If so, could you please add a comment explaining why oldValue is guaranteed to exist?

}

/**
* This method does not care about cache.
Review comment (Contributor):

It'd be great if we clarified the differences between pull(List<K> keys, Table table) and pull(List<K> keys). The distinction is missing in the base interface, but this implementation differentiates them (with cache vs. without cache).

@wynot12 (Contributor, Author) replied Nov 7, 2017:

Hmm, actually this part is exactly the same as the no-cache version.

Review comment (Contributor):

Oh, then the question is really: why does this method not care about the cache, while pull(final List<K> keys) gets its data from the cache?

This is what prompted my original question above: what are the differences between the two methods?

@wynot12 (Contributor, Author) replied:

pull(List<K> keys, Table table) is for accessing other tables that have no cache.
This ModelAccessor implementation provides a cache only for the table held in its field.

@wynot12 (Contributor, Author) replied:

pull(List<K> keys, Table table) was added to support offline model evaluation, so it may look out of place in the ModelAccessor interface.
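The distinction discussed in this thread can be sketched as below. This is a hypothetical simplification, not the PR's actual ModelAccessor: SimpleTable and CachedModelAccessor are stand-in names, and the real types carry more behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Simplified stand-in for the real Table type (hypothetical). */
interface SimpleTable<K, V> {
  V get(K key);
}

/**
 * pull(keys) serves reads from the cache of the accessor's own model
 * table, while pull(keys, table) reads an arbitrary table directly
 * (e.g. for offline model evaluation) and therefore bypasses the cache.
 */
class CachedModelAccessor<K, V> {
  private final SimpleTable<K, V> modelTable;   // the one table that gets a cache
  private final Map<K, V> cache = new HashMap<>();

  CachedModelAccessor(SimpleTable<K, V> modelTable) {
    this.modelTable = modelTable;
  }

  /** Cached path: only for the accessor's own model table. */
  List<V> pull(List<K> keys) {
    return keys.stream()
        .map(k -> cache.computeIfAbsent(k, modelTable::get))
        .collect(Collectors.toList());
  }

  /** Uncached path: any table, always hits the servers. */
  List<V> pull(List<K> keys, SimpleTable<K, V> table) {
    return keys.stream().map(table::get).collect(Collectors.toList());
  }
}
```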

@wynot12 (Contributor, Author) commented Nov 7, 2017

@yunseong I'll update this PR to use a Guava loading cache soon.
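A Guava loading cache could look something like the sketch below (this is an illustration of Guava's API, not the PR's eventual code; the key/value types and the 10-second and size bounds are assumptions). Unlike the global 10-second wipe, refreshAfterWrite reloads entries lazily and per entry once they are stale.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class GuavaModelCacheSketch {
  static LoadingCache<String, Integer> build(Function<String, Integer> pullFromServer) {
    return CacheBuilder.newBuilder()
        .refreshAfterWrite(10, TimeUnit.SECONDS)  // per-entry refresh, not a global wipe
        .maximumSize(100_000)                     // bounds worker memory via eviction
        .build(new CacheLoader<String, Integer>() {
          @Override
          public Integer load(String key) {
            return pullFromServer.apply(key);     // one pull per cold key
          }
        });
  }
}
```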

@wynot12 (Contributor, Author) commented Nov 7, 2017

In the above graph, the integers appended to the labels (e.g., cache2, cache3) denote repeated experiments with the cache enabled. Sorry for the confusion.

@wynot12 (Contributor, Author) commented Nov 28, 2017

@yunseong how about merging this PR?
Since the cache implementation is separate from the original one and caching is disabled by default, it does not conflict with our main work, which will use the non-cache version.

@yunseong (Contributor) replied:
Agreed. The PR looks good and I'm merging it. Thanks!
