Hi,
Thank you for open-sourcing this great idea!
I'm exploring your code and running some experiments with variants.
The first thing I looked at is the activation function.
In BNN_cifar10.py, you use HardTanh as the activation function.
I couldn't find any description of this in your paper (though I'm not 100% sure I didn't miss it), but I found that it significantly affects accuracy.
Keeping ReLU in the BNN, as in the full-precision counterpart, drops top-1 accuracy by about 10%.
Do you have any insight into this?
I ask because I've heard that a hyperbolic tangent activation can be a bad idea when stacking very deep networks.
I'm a bit concerned about vanishing gradients, etc.
If you could share your experience here, such as why you chose this particular hyperbolic tangent function, that would be very nice.
Thanks in advance,
OYH
HardTanh simply clips values to the range [-1, 1]: everything above 1 is set to 1 and everything below -1 is set to -1. This helps the initial training phase. Since I used BN to normalize the input, I know that most of the input data falls in that range. After that I used the sign function, which actually binarizes the input. If you use ReLU, you simply assign 1 to everything above 0 and the rest becomes zero. If you clamp the ReLU output at 1 (the same idea as ReLU6, only with 1 instead of 6) and use a round function instead of sign, you would probably get good results.
All the best, Itay
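To make the three activation paths in the reply above concrete, here is a minimal pure-Python sketch. It is an illustration only, not the code from BNN_cifar10.py: the function names are hypothetical, `sign` is assumed to return 0 at exactly 0, and `round_half_up` is assumed to round 0.5 upward. It covers the forward pass only and says nothing about how gradients are handled during training.

```python
def hard_tanh(x):
    """Clip x to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def relu(x):
    """Standard ReLU: negative values become 0."""
    return max(0.0, x)


def clamped_relu(x):
    """ReLU capped at 1 (same idea as ReLU6, with 1 instead of 6)."""
    return max(0.0, min(1.0, x))


def sign(x):
    """Mathematical sign; assumed here to return 0 at exactly 0."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def round_half_up(x):
    """Round a value in [0, 1] to 0 or 1; 0.5 is assumed to round up."""
    return 1.0 if x >= 0.5 else 0.0


# Hypothetical pre-activation values, e.g. outputs of a batch-norm layer.
xs = [-2.0, -0.3, 0.0, 0.4, 0.7, 1.5]

# Path 1: HardTanh then sign. Negative and positive inputs stay distinct.
hardtanh_sign = [sign(hard_tanh(x)) for x in xs]

# Path 2: ReLU then sign. Every non-positive input collapses to 0.
relu_sign = [sign(relu(x)) for x in xs]

# Path 3: ReLU clamped at 1, then round. The threshold moves to 0.5.
clamped_round = [round_half_up(clamped_relu(x)) for x in xs]

print(hardtanh_sign)  # [-1.0, -1.0, 0.0, 1.0, 1.0, 1.0]
print(relu_sign)      # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(clamped_round)  # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Under these assumptions, the second path sends all non-positive inputs to 0 and everything above 0 to 1, as the reply describes, while the third path produces {0, 1} outputs with the decision point at 0.5.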