Here are a few topics:
In Natural Language Processing:
- Transformer Utilization
- RoBERTa Base Implementation
- Masked language modeling with DeBERTaV3
- Training any NLP Transformer with native PyTorch
- RoBERTa distillation classification
- Knowledge distillation from a large language model
- RoBERTa distillation
- Initializing a model with a different embedding size and layer count than the large model through teacher-student pretraining
- RoBERTa (from Facebook)
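Several items above mention knowledge distillation from a large model into a smaller student. As a minimal sketch of the core idea, here is the standard soft-target distillation loss (temperature-scaled softmax plus KL divergence), written in plain Python for illustration; the function names and the temperature value are illustrative assumptions, not taken from the list above.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: a higher temperature softens the
    # distribution, exposing the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between teacher and student soft targets,
    # scaled by T^2 as in the standard distillation objective.
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2

# Identical logits give zero loss; diverging logits give a positive loss.
same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

In practice this term is combined with the usual cross-entropy on the hard labels, and the logits come from a teacher such as a RoBERTa base model and a smaller student; in a real training loop the same formula is expressed with tensor operations rather than Python lists.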