This project offers a framework for applying Knowledge Distillation (KD) to various deep learning models. In KD, a large model (the Teacher) guides a smaller model (the Student) so that the Student remains efficient while maintaining most of the Teacher's performance. This approach is well suited to creating models that work in resource-constrained environments such as mobile devices or embedded systems.
In this project, we demonstrate Knowledge Distillation using a CNN-based Teacher model and a simplified CNN/ResNet model as the Student on the CIFAR-10 image dataset. The framework is designed to be easily extensible to other datasets and models.
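As a rough illustration of the training setup, a single distillation step combines the usual cross-entropy loss on the labels with a term computed from the frozen Teacher's outputs. This is a minimal sketch, not the project's actual API; the function name, the MSE-on-logits distillation term, and the `kd_weight` parameter are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def kd_train_step(teacher, student, images, labels, optimizer, kd_weight=1.0):
    # The Teacher is frozen: no gradients flow through it.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)

    student_logits = student(images)

    # Standard supervised loss on the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Simple "logits" distillation: match the Student's logits to the Teacher's.
    kd_loss = F.mse_loss(student_logits, teacher_logits)

    loss = ce_loss + kd_weight * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```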
You can easily manage dependencies using Poetry. Install all required packages by running:

```bash
poetry shell
poetry install
```
This project uses the CIFAR-10 dataset, which is automatically downloaded via the torchvision library. No separate download process is required.
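For reference, torchvision fetches CIFAR-10 automatically when `download=True` is passed. This is a minimal sketch; the data root, transform, and batch size shown here are illustrative assumptions, not the project's actual settings.

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor(),
    # Commonly used CIFAR-10 channel statistics for normalization.
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# download=True fetches CIFAR-10 into ./data on first use; later runs reuse the files.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                          download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
```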
Training logs can be visualized with TensorBoard:

```bash
tensorboard --logdir=./logs
```
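The command above assumes metrics are written to `./logs`. A typical way to emit such logs is `torch.utils.tensorboard.SummaryWriter`; this is only a sketch of that pattern, and the tag names and values are placeholders rather than the project's actual logging code.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./logs")
# Typically called once per epoch (or step) during training; values here are placeholders.
writer.add_scalar("train/loss", 0.42, global_step=1)
writer.add_scalar("val/accuracy", 0.876, global_step=1)
writer.close()
```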
The Teacher and Student models are simple architectures inspired by CNN and ResNet designs, respectively, with their model sizes differing by approximately a factor of 4.
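The sizes listed in the table below can be estimated by counting parameters. This is a sketch assuming 32-bit float weights; the `teacher` and `student` variables are placeholders for the instantiated models.

```python
import torch.nn as nn

def model_size_mb(model: nn.Module) -> float:
    # Total parameter count times 4 bytes (float32), converted to megabytes.
    num_params = sum(p.numel() for p in model.parameters())
    return num_params * 4 / (1024 ** 2)

# Example usage with placeholder Teacher/Student modules:
# print(f"Teacher: {model_size_mb(teacher):.2f} MB, Student: {model_size_mb(student):.2f} MB")
```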
| Teacher (size) | Student (size) | KD method (parameters) | Accuracy | Epochs |
|---|---|---|---|---|
| ResNet (4.57 MB, baseline) | - | - | 92.84% | 200 |
| - | ResNet (1.19 MB, baseline) | - | 83.61% | 30 |
| ResNet (4.57 MB, pretrained) | ResNet (1.19 MB) | logits (weight=1.0) | 87.51% | 30 |
| ResNet (4.57 MB, pretrained) | ResNet (1.19 MB) | soft_target (T=4.0, weight=1.0) | 86.69% | 30 |
| ResNet (4.57 MB, pretrained) | ResNet (1.19 MB) | hints (weight=1.0) | 87.90% | 30 |
| ResNet (4.57 MB, pretrained) | ResNet (1.19 MB) | attention_transfer (weight=1.0) | 86.21% | 30 |
| ResNet (4.57 MB, pretrained) | ResNet (1.19 MB) | similarity_preserving (weight=1.0) | 87.45% | 30 |
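As one concrete example of the rows above, the `soft_target` variant (T=4.0, weight=1.0) is conventionally computed by softening both the Teacher's and Student's distributions with a temperature and taking a KL divergence. The sketch below shows that standard formulation (Hinton et al., 2015); the function name and exact scaling are assumptions, not necessarily this project's implementation.

```python
import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    # Soften both distributions with the temperature T.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=1)
    # KL divergence between the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (temperature ** 2)
```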