Release v0.4.6
Changes:
- Add Distiller for model distillation
- Add support for different padding strategies:
  - bucket: group sequences by length, then pad each sequence to the maximum sequence length in its batch
  - batch: pad each sequence to the maximum sequence length in the current batch
  - fixed: pad each sequence to a fixed maximum sequence length across all examples
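
The three padding strategies can be illustrated with a minimal sketch in plain Python. This is not the library's implementation or API: the helper names (`pad_to`, `pad_batch`, `pad_fixed`, `pad_bucket`), the padding id `0`, right-side padding, and truncation of over-long sequences in the fixed case are all assumptions made for illustration.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token id (assumption, not from the release)


def pad_to(seq: List[int], length: int, pad_id: int = PAD_ID) -> List[int]:
    """Right-pad seq to `length` tokens; longer sequences are truncated (assumption)."""
    return seq[:length] + [pad_id] * max(0, length - len(seq))


def pad_batch(batch: List[List[int]]) -> List[List[int]]:
    """'batch' strategy: pad to the longest sequence in this batch."""
    max_len = max(len(seq) for seq in batch)
    return [pad_to(seq, max_len) for seq in batch]


def pad_fixed(batch: List[List[int]], max_len: int) -> List[List[int]]:
    """'fixed' strategy: pad every sequence to the same fixed length."""
    return [pad_to(seq, max_len) for seq in batch]


def pad_bucket(examples: List[List[int]], batch_size: int) -> List[List[List[int]]]:
    """'bucket' strategy: sort by length so that sequences of similar
    length share a batch, then pad each batch to its own maximum."""
    ordered = sorted(examples, key=len)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    return [pad_batch(batch) for batch in batches]


examples = [[1, 2, 3, 4, 5], [6], [7, 8], [9, 10, 11, 12]]
bucketed = pad_bucket(examples, batch_size=2)  # two batches, padded to lengths 2 and 5
batched = pad_batch(examples)                  # one batch, padded to length 5
fixed = pad_fixed(examples, max_len=6)         # every sequence padded to length 6
```

In this sketch, bucketing adds only 2 padding tokens across the four examples, while padding them as a single batch adds 8 and fixed-length padding to 6 adds 12. That reduction in wasted padding is the usual motivation for grouping sequences by length.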