Hi Tim,

First, thank you for your code.

I noticed that you change the default learning rate for ImageNet in multi-GPU runs by multiplying 0.1 by the number of GPUs. Did you actually use this setting to get the performance reported in the paper? Does this scaling improve performance only for sparse training, or for dense training as well?

Many thanks
Good catch, I was not aware of this behavior. I did not change the code further and trained on 4 GPUs as-is. I have not studied in detail how performance differs if this behavior is changed. It might have affected both the sparse and dense results, but since I do not have any data on it, I cannot say for sure what the effect is.
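For anyone reading along, the scaling rule being discussed can be sketched as follows. This is a minimal illustration, not the repository's actual code; the helper name `scaled_lr` is hypothetical, and 0.1 is the ImageNet base learning rate mentioned in the question.

```python
def scaled_lr(base_lr: float, num_gpus: int) -> float:
    """Linear learning-rate scaling: multiply the base
    learning rate by the number of GPUs (hypothetical helper,
    sketching the behavior described in this thread)."""
    return base_lr * num_gpus

# With the base rate of 0.1 and the 4 GPUs mentioned in the reply:
print(scaled_lr(0.1, 4))  # 0.4
```

So the effective learning rate in the 4-GPU run would have been 0.4 rather than 0.1, which is why the question of whether this affects the reported numbers arises.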