Hi Tim, thanks for making this library. I am trying to test it on speech generation models, and I have some questions about your code template:

The models come with their own schedulers and optimizers. Can I simply wrap them with `decay = CosineDecay(...)` and `mask = Masking(optimizer, ...)`? Should I change the optimizer to `optim.SGD(...)` and ignore the scheduler? It looks like `mask.step()` runs every epoch and replaces the scheduler, but I think I should still keep the optimizer specific to the model I have.
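Here is roughly how I am planning to wire it up, keeping the model's own optimizer and LR scheduler; am I reading the API right? (The import path and the `Masking`/`CosineDecay` argument names are my guesses from the template, and `build_speech_model`, `train_loader`, and `criterion` are placeholders for my own code.)

```python
import torch.optim as optim
# Import path and keyword names below are my assumptions; please correct me
# if the actual API in core.py differs.
from sparselearning.core import Masking, CosineDecay

model = build_speech_model()                         # placeholder for my model
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # keep the model's own optimizer
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30)

decay = CosineDecay(0.5, len(train_loader) * num_epochs)   # decays the prune rate
mask = Masking(optimizer, prune_rate=0.5, prune_rate_decay=decay)  # names assumed
mask.add_module(model, density=0.2)                  # keep ~20% of the weights

for epoch in range(num_epochs):
    for inputs, targets in train_loader:             # placeholder data loader
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)     # placeholder loss
        loss.backward()
        mask.step()      # my reading: wraps optimizer.step() + mask updates
    lr_scheduler.step()  # the model's LR scheduler keeps running as usual
```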
I understand that density/sparsity is the desired percentage of weights to keep, while the prune/death rate is an internal parameter that determines what percentage of weights is redistributed at each iteration. Is this correct?
In your code, `density` looks like it equals `sparsity`, although normally I would think density = 1 - sparsity.
The code fails at `core.py` lines 221-223 when the model contains RNNs, because for RNN modules `bias` is a boolean and the actual bias terms are named `bias_ih` and `bias_hh`. I think this counts the parameters more reliably:
```python
total_size = 0
# named_parameters() also yields RNN biases such as bias_ih_l0 / bias_hh_l0
for name, tensor in self.modules[0].named_parameters():
    total_size += tensor.numel()
```
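To illustrate the problem, a quick check (my own snippet, not from the repo) showing why `named_parameters()` picks up the RNN bias terms that the `module.bias` attribute misses:

```python
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20)
print(rnn.bias)  # True: a boolean flag on RNN modules, not a tensor
for name, p in rnn.named_parameters():
    print(name, p.numel())
# weight_ih_l0 800, weight_hh_l0 1600, bias_ih_l0 80, bias_hh_l0 80
```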
The mask scheduler is different from the learning rate scheduler: the `CosineDecay` passed to `Masking` schedules the prune rate, not the learning rate. Your learning rate scheduler should be unaffected by this code.
That is correct. The sparsity percentage is kept steady, but the prune rate changes over time.
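Concretely, the prune rate follows a cosine annealing schedule over training, roughly like this sketch (simplified; see `CosineDecay` in `core.py` for the actual implementation):

```python
import math

def cosine_prune_rate(step, total_steps, initial_rate=0.5, final_rate=0.0):
    """Standard cosine annealing from initial_rate down to final_rate."""
    cos = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return final_rate + (initial_rate - final_rate) * cos

# The fraction of weights pruned and regrown shrinks as training proceeds,
# while the overall density/sparsity of the network stays fixed.
for step in (0, 250, 500, 750, 1000):
    print(step, round(cosine_prune_rate(step, 1000), 3))
# 0 0.5 | 250 0.427 | 500 0.25 | 750 0.073 | 1000 0.0
```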
I think this is correct. For me, it feels more natural to think in terms of density (27% of weights kept seems more intuitive than 73% sparsity). However, I kept the naming in the code as "sparsity" even though I use it as density conceptually.
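If in doubt about the convention, you can always measure the realized density directly after the mask is applied (quick sketch; `model` stands in for your network):

```python
def measure_density(model):
    """Fraction of weights that are nonzero after masking."""
    nonzero = sum((p != 0).sum().item() for p in model.parameters())
    total = sum(p.numel() for p in model.parameters())
    return nonzero / total

# If you requested 0.27 you should see ~27% here; seeing ~73% would mean
# the argument is being interpreted as sparsity instead of density.
print(f"actual density: {measure_density(model):.2%}")
```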
This is a good catch! Could you create a pull request for this? I did not test the code with RNNs.