Default Initialization of Lambda Parameters to Zero #71
Hi @lpyhdzx, sorry for the late reply! Even though lambda_1 and lambda_2 are initialized to 0, so the lagrangian_loss starts at 0, the lambdas still receive gradients during backpropagation: lambda_1 gets a gradient of (expected_sparsity - target_sparsity), and lambda_2 gets a gradient of (expected_sparsity - target_sparsity) ** 2. These parameters are therefore still learnable.
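As a quick sanity check of that claim, here is a minimal, self-contained sketch (the sparsity values are hypothetical, chosen just for illustration): the loss evaluates to zero at initialization, yet both lambdas come out of backward() with nonzero gradients.

```python
import torch

# Both multipliers start at zero, as in the repo's initialization.
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))

expected_sparsity = torch.tensor(0.3)  # hypothetical value, for illustration
target_sparsity = 0.5

lagrangian_loss = (
    lambda_1 * (expected_sparsity - target_sparsity)
    + lambda_2 * (expected_sparsity - target_sparsity) ** 2
)
print(lagrangian_loss.item())  # 0.0 -- no penalty at initialization
lagrangian_loss.backward()

print(lambda_1.grad)  # tensor(-0.2000), i.e. expected - target
print(lambda_2.grad)  # tensor(0.0400), i.e. (expected - target) ** 2
```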
Thanks for the reply!
Hi! I am facing a similar problem: lag_loss is negative, and I am not sure whether it will improve with additional training. Thank you!
Hi! Great work!
I have a question about the default value of the lambda params. I've noticed that they are initialized to zero by default:
```python
lambda_1_layer = torch.nn.Parameter(torch.tensor(0.0, device=self.device))
```
Given that the Lagrangian loss is calculated using these parameters as follows:
```python
lagrangian_loss = lambda_1 * (expected_sparsity - target_sparsity) + lambda_2 * (expected_sparsity - target_sparsity) ** 2
```
initializing lambda_1 and lambda_2 to zero seems to imply that the Lagrangian loss component will initially be zero, so there would be no penalty for deviating from the target sparsity.
So, is it intended for the lambda parameters to be initialized to zero? Or is there another section of the code where these parameters are set or adjusted after initialization? I would appreciate any clarification or insight you can provide on this matter.
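For what it's worth, the adjustment happens through training itself rather than in a separate initialization step: in Lagrangian relaxation schemes of this kind, the multipliers are typically updated by gradient ascent while the model parameters are updated by descent, so the penalty grows whenever expected sparsity deviates from the target (I have not verified this repository's exact optimizer setup, so treat this as the standard min-max treatment rather than a description of its code). Because lambda_1 is unconstrained in sign, a transiently negative lagrangian_loss is also possible during this min-max game. Below is a minimal, hypothetical sketch of the dynamic; z_logits and the sigmoid-based sparsity proxy are illustrative stand-ins, not this repository's actual code.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for illustration (not this repo's actual code):
# z_logits plays the role of the learnable mask parameters, and
# sigmoid(z_logits).mean() is a toy differentiable sparsity proxy.
z_logits = torch.nn.Parameter(torch.randn(16))
lambda_1 = torch.nn.Parameter(torch.tensor(0.0))
lambda_2 = torch.nn.Parameter(torch.tensor(0.0))

param_opt = torch.optim.SGD([z_logits], lr=0.1)
lambda_opt = torch.optim.SGD([lambda_1, lambda_2], lr=0.1)

target_sparsity = 0.5
for step in range(200):
    expected_sparsity = torch.sigmoid(z_logits).mean()
    diff = expected_sparsity - target_sparsity
    lagrangian_loss = lambda_1 * diff + lambda_2 * diff ** 2

    param_opt.zero_grad()
    lambda_opt.zero_grad()
    lagrangian_loss.backward()

    # Descent on the model parameters, ascent on the multipliers:
    # negating the lambdas' gradients turns their SGD step into ascent.
    for lam in (lambda_1, lambda_2):
        lam.grad.neg_()
    param_opt.step()
    lambda_opt.step()

# On the very first step z_logits does not move (its gradient is exactly
# zero while both lambdas are zero), but the lambdas do; after that the
# penalty is active and pushes expected_sparsity toward the target.
print(torch.sigmoid(z_logits).mean().item(), lambda_1.item(), lambda_2.item())
```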