Default Initialization of Lambda Parameters to Zero #71

Open
lpyhdzx opened this issue Jun 6, 2024 · 3 comments
Comments

@lpyhdzx

lpyhdzx commented Jun 6, 2024

Hi! Great work!
I have a question about the default value of the lambda params. I've noticed that they are initialized to zero by default:
lambda_1_layer = torch.nn.Parameter(torch.tensor(0.0, device=self.device))
Given that the Lagrangian loss is calculated using these parameters as follows:
lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2
Initializing lambda_1 and lambda_2 to zero seems to imply that the Lagrangian loss component will be zero, as there would be no penalty for deviating from the target sparsity.

So, is it intended for the lambda parameters to be initialized to zero? Or is there another section of the code where these parameters are set or adjusted after initialization? I would appreciate any clarification or insight you can provide.

@xiamengzhou
Contributor

Hi @lpyhdzx, sorry for the late reply!

Even though lambda_1 and lambda_2 are initialized to 0, and the lagrangian_loss is initially 0, the lambdas will still receive gradients during backpropagation. lambda_1 will get a gradient of (expected_sparsity - target_sparsity), and lambda_2 will get a gradient of (expected_sparsity - target_sparsity) ** 2. Therefore, these variables are still learnable.
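
The point above can be checked with a minimal sketch. It assumes PyTorch is available; the sparsity values 0.30 and 0.50 are made-up stand-ins, since in the real model expected_sparsity would be computed from the masks. With both lambdas at zero the loss is zero, yet each lambda still receives a nonzero gradient.

```python
import torch

# Hypothetical stand-in values; in the real model, expected_sparsity
# would be computed from the learned masks rather than fixed by hand.
expected_sparsity = torch.tensor(0.30)
target_sparsity = torch.tensor(0.50)

# Both multipliers start at zero, as in the snippet quoted in the issue.
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))

diff = expected_sparsity - target_sparsity
lagrangian_loss = lambda_1 * diff + lambda_2 * diff ** 2
lagrangian_loss.backward()

loss_value = lagrangian_loss.item()  # zero, because both lambdas are zero
grad_1 = lambda_1.grad.item()        # equals diff
grad_2 = lambda_2.grad.item()        # equals diff ** 2
print(loss_value, grad_1, grad_2)
```

So a zero initial value does not block learning here: the gradient with respect to each lambda depends only on the sparsity gap, not on the lambda itself.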

@lpyhdzx
Author

lpyhdzx commented Jun 11, 2024

Thanks for the reply!
I borrowed this method but found that the loss gets optimized to a negative value. I suspect this is because the Lagrangian loss has no additional constraint, so the lambda parameters can become negative:
lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2
I'm not sure whether there is a way to avoid this.
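
One possible explanation, sketched below in plain Python, is the direction in which the lambdas are updated. In min-max Lagrangian training the multipliers are commonly updated by gradient ascent while the model parameters are updated by descent; this thread does not confirm how the repository handles it, so treat that as an assumption. The `run` helper, the learning rate, and the fixed sparsity gap of -0.2 are all hypothetical simplifications (in real training the gap changes as the model adapts).

```python
def run(steps, lr, ascent, diff=-0.2):
    """Apply plain gradient steps to the multipliers and return the final loss.

    diff stands for (expected_sparsity - target_sparsity), held fixed here
    purely for illustration.
    """
    lambda_1, lambda_2 = 0.0, 0.0
    sign = 1.0 if ascent else -1.0
    for _ in range(steps):
        # d(loss)/d(lambda_1) = diff and d(loss)/d(lambda_2) = diff ** 2
        lambda_1 += sign * lr * diff
        lambda_2 += sign * lr * diff ** 2
    loss = lambda_1 * diff + lambda_2 * diff ** 2
    return lambda_1, lambda_2, loss


# Descent on the lambdas: lambda_2 turns negative and the loss goes below zero.
descent = run(steps=100, lr=0.1, ascent=False)

# Ascent on the lambdas: lambda_2 stays positive and the loss stays above zero.
ascent = run(steps=100, lr=0.1, ascent=True)

print("descent:", descent)
print("ascent: ", ascent)
```

If the lambdas sit in the same optimizer as the rest of the model and are minimized along with it, the descent case is what one would expect to see. Giving them their own update with the opposite sign is one way to get the ascent behavior.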

@Alloooshe

Hi! I am facing a similar problem: lag_loss is negative, and I am not sure whether this will improve with additional training.
In my case, the lambda_1 and lambda_2 parameters take negative values and keep decreasing.
It would be great if you could share any insights or advice on the matter.

Thank you!
