Regarding paper and codes #7
Comments
Hi, I also have the second question.
I find that the code uses k-means quantization, while the paper says it finds the optimal clip value to minimize the KL divergence between the non-quantized and quantized weights/activations. That describes linear quantization, which is different from what the code implements.
This confuses me as well. The paper uses linear quantization, but the code provides k-means quantization (similar to "Deep Compression"). After k-means quantization, we cannot guarantee that the weights can be represented as fixed-point numbers.
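For reference, here is a minimal sketch of the linear quantization scheme the paper describes, assuming symmetric uniform quantization and a simple grid search over clip candidates. The function names (`linear_quantize`, `find_clip_kl`), the candidate range, and the histogram settings are illustrative assumptions, not code from this repo.

```python
# Sketch: choose a clip value c that minimizes the KL divergence between the
# distribution of the original tensor and the distribution after clipping and
# uniform (linear) quantization. Not the repo's actual implementation.
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

def linear_quantize(x, bits, clip):
    """Clip to [-clip, clip] and quantize uniformly to 2^bits symmetric levels."""
    scale = clip / (2 ** (bits - 1) - 1)
    xq = np.clip(x, -clip, clip)
    return np.round(xq / scale) * scale

def find_clip_kl(x, bits, n_candidates=100, n_bins=2048):
    """Grid-search clip candidates; keep the one whose quantized histogram
    has the smallest KL divergence from the original histogram."""
    max_abs = np.abs(x).max()
    edges = np.linspace(-max_abs, max_abs, n_bins + 1)
    p, _ = np.histogram(x, bins=edges)
    p = p + 1e-10  # avoid zero bins
    best_clip, best_kl = max_abs, np.inf
    for clip in np.linspace(0.5 * max_abs, max_abs, n_candidates):
        q, _ = np.histogram(linear_quantize(x, bits, clip), bins=edges)
        q = q + 1e-10
        kl = entropy(p, q)
        if kl < best_kl:
            best_kl, best_clip = kl, clip
    return best_clip

# Example usage:
# w = np.random.randn(4096)
# wq = linear_quantize(w, 4, find_clip_kl(w, 4))
```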
It's quite unfortunate that the main novelty claimed by the paper, i.e., the use of direct hardware feedback, is missing from this repo. In fact, even the paper fails to provide a clear explanation of that claim.
We have updated the linear quantization as well as the hardware resource-constrained part in this repo. Please let us know if you have any questions. |
Can you please point to the part where the direct HW feedback is used? Thanks. Without that, the repo is still quite limited in significance. |
Thanks for your feedback! You can find the related code at haq/lib/env/linear_quantize_env.py, line 306, in commit 7141586.
After diving deep into the code and the paper, I have two questions.
I've read in the paper that "If the current policy exceeds our resource budget (on latency, energy or model size), we will sequentially decrease the bitwidth of each layer until the constraint is finally satisfied." Which part of the code corresponds to this step of decreasing the bitwidth when the current policy exceeds the budget? (See the sketch after these questions.)
Why don't you use k-means quantization for the latency/energy-constrained experiments? Will you release the code for linear quantization?
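As context for question 1, here is a minimal sketch of what that constraint-satisfaction step could look like, assuming a list of per-layer bitwidths and a cost estimator for latency, energy, or model size. The names `enforce_budget`, `estimate_cost`, and `MIN_BIT` are hypothetical and not the repo's actual code.

```python
# Sketch of the step quoted from the paper: if the agent's proposed bitwidths
# exceed the resource budget, sweep over the layers and lower bitwidths one
# step at a time until the estimated cost fits the budget.
MIN_BIT = 2  # assumed lower bound on any layer's bitwidth

def enforce_budget(bitwidths, estimate_cost, budget):
    """Sequentially decrease per-layer bitwidths until estimate_cost <= budget."""
    bitwidths = list(bitwidths)
    while estimate_cost(bitwidths) > budget:
        changed = False
        for i in range(len(bitwidths)):      # sweep layers in order
            if bitwidths[i] > MIN_BIT:
                bitwidths[i] -= 1            # drop one bit for this layer
                changed = True
                if estimate_cost(bitwidths) <= budget:
                    return bitwidths
        if not changed:                      # every layer already at MIN_BIT
            break                            # budget cannot be met
    return bitwidths
```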