Default Initialization of Lambda Parameters to Zero #71
Hi @lpyhdzx, sorry for the late reply! Even though lambda_1 and lambda_2 are initialized to 0, so the lagrangian_loss starts at 0, the lambdas still receive gradients during backpropagation: lambda_1 gets a gradient of (expected_sparsity - target_sparsity), and lambda_2 gets a gradient of (expected_sparsity - target_sparsity) ** 2. These parameters are therefore still learnable.
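As a quick sanity check of that claim, here is a minimal, self-contained sketch (the sparsity values are hypothetical, chosen just for illustration): the loss evaluates to zero at initialization, yet both lambdas come out of backward() with nonzero gradients.

```python
import torch

# Both multipliers start at zero, as in the repo's initialization.
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))

expected_sparsity = torch.tensor(0.3)  # hypothetical value, for illustration
target_sparsity = 0.5

lagrangian_loss = (
    lambda_1 * (expected_sparsity - target_sparsity)
    + lambda_2 * (expected_sparsity - target_sparsity) ** 2
)
print(lagrangian_loss.item())  # 0.0 -- no penalty at initialization
lagrangian_loss.backward()

print(lambda_1.grad)  # tensor(-0.2000), i.e. expected - target
print(lambda_2.grad)  # tensor(0.0400), i.e. (expected - target) ** 2
```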
Thanks for the reply!
Hi! I am facing a similar problem: lag_loss is negative, and I am not sure whether it will improve with additional training. Thank you!
Hi! Great work!
I have a question about the default value of the lambda params. I've noticed that they are initialized to zero by default:
```python
lambda_1_layer = torch.nn.Parameter(torch.tensor(0.0, device=self.device))
```
Given that the Lagrangian loss is calculated using these parameters as follows:
```python
lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2
```
initializing lambda_1 and lambda_2 to zero seems to imply that the Lagrangian loss component will initially be zero, so there would be no penalty for deviating from the target sparsity.
So, is it intended for the lambda parameters to be initialized to zero? Or is there another section of the code where these parameters are set or adjusted after initialization? I would appreciate any clarification or insight you can provide on this matter.
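For what it's worth, the adjustment happens through training itself rather than in a separate initialization step: in Lagrangian relaxation schemes of this kind, the multipliers are typically updated by gradient ascent while the model parameters are updated by descent, so the penalty grows whenever expected sparsity deviates from the target (I have not verified this repository's exact optimizer setup, so treat this as the standard min-max treatment rather than a description of its code). Because lambda_1 is unconstrained in sign, a transiently negative lagrangian_loss is also possible during this min-max game. Below is a minimal, hypothetical sketch of the dynamic; z_logits and the sigmoid-based sparsity proxy are illustrative stand-ins, not this repository's actual code.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for illustration (not this repo's actual code):
# z_logits plays the role of the learnable mask parameters, and
# sigmoid(z_logits).mean() is a toy differentiable sparsity proxy.
z_logits = torch.nn.Parameter(torch.randn(16))
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))

param_opt = torch.optim.SGD([z_logits], lr=0.1)
lambda_opt = torch.optim.SGD([lambda_1, lambda_2], lr=0.1)

target_sparsity = 0.5
for step in range(200):
    expected_sparsity = torch.sigmoid(z_logits).mean()
    diff = expected_sparsity - target_sparsity
    lagrangian_loss = lambda_1 * diff + lambda_2 * diff ** 2

    param_opt.zero_grad()
    lambda_opt.zero_grad()
    lagrangian_loss.backward()

    # Descent on the model parameters, ascent on the multipliers:
    # negating the lambdas' gradients turns their SGD step into ascent.
    for lam in (lambda_1, lambda_2):
        lam.grad.neg_()
    param_opt.step()
    lambda_opt.step()

# On the very first step z_logits does not move (its gradient is exactly
# zero while both lambdas are zero), but the lambdas do; after that the
# penalty is active and pushes expected_sparsity toward the target.
print(torch.sigmoid(z_logits).mean().item(), lambda_1.item(), lambda_2.item())
```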