My writeup for this solution can be found on Kaggle.
- Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, 12 CPU cores, 64 GB memory, GPU: 1 x RTX 3090
- Linux Ubuntu 20.04 LTS
- python==3.7.13
- Download BirdCLEF data for 2021, 2022, and 2023
- Download additional datasets here
- Copy the no-call directory of ff1010bird_nocall to the BirdCLEF 2023 train_audio directory.
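For reference, the copy could be done with a short script like the one below (a sketch only: the nocall directory name and paths are assumptions; adjust them to match the extracted ff1010bird_nocall dataset and the directory structure shown below).

```python
import shutil
from pathlib import Path

# Assumed locations; adjust to your local layout.
src = Path("/input/ff1010bird_nocall/nocall")
dst = Path("/input/birdclef-2023/train_audio/nocall")

# Copy the no-call clips into the BirdCLEF 2023 train_audio directory.
shutil.copytree(src, dst)
```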
Directory structure example
/input/
┣ aicrowd2020_noise_30sec/
┣ birdclef-2021/
└ train_short_audio
┣ birdclef-2022/
└ train_audio
┣ birdclef-2023/
├ train_audio <- add no-call
└ train_meta_pseudo.pickle
┣ esc50/
┣ ff1010bird_nocall/
└ ff1010bird_metadata_v1_pseudo.pickle
┣ train_soundscapes/
┣ xeno-canto/
┣ xeno-canto_nd/
┣ zenodo_nocall_30sec/
┣ pretrain_metadata_10fold_pseudo.pickle
┣ xeno-canto_audio_meta_pseudo.pickle
┗ xeno-canto_nd_audio_meta_pseudo.pickle
/src/
┗ ...
- Get predicted values from Kaggle Models, as in this notebook.
- Store the vector of predicted values as a single column (teacher_preds) in the training metadata, saved as a pickle (e.g. the *_pseudo.pickle files in the directory structure above).
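A minimal sketch of that step, assuming the teacher predictions have been exported as a NumPy array aligned row-for-row with the training metadata (the metadata path is taken from the directory layout above, but the prediction file name and exact workflow here are assumptions):

```python
import numpy as np
import pandas as pd

# Assumed inputs: the training metadata and the teacher model's predictions
# (one probability vector per training row, exported from the notebook above).
meta = pd.read_pickle("/input/birdclef-2023/train_meta_pseudo.pickle")
teacher_preds = np.load("teacher_preds.npy")  # shape: (n_rows, n_classes)

assert len(teacher_preds) == len(meta)

# Store each row's prediction vector in a single "teacher_preds" column.
meta["teacher_preds"] = list(teacher_preds)

# Save the metadata back as a pickle for the training scripts to consume.
meta.to_pickle("/input/birdclef-2023/train_meta_pseudo.pickle")
```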
# -C flag is used to specify a config file
# replace NAME_OF_CONFIG with an appropriate config file name such as exp105
python pretrain_net.py -C NAME_OF_CONFIG # for pretraining using BirdCLEF 2021, 2022
python train_net.py -C NAME_OF_CONFIG # for training using BirdCLEF 2023
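The -C flag simply names the config to load. A minimal sketch of how such a flag could be wired up (the configs package, module layout, and CFG object below are illustrative assumptions, not the repository's actual code):

```python
import argparse
import importlib

def load_config():
    """Parse -C/--config and import the matching config module, e.g. configs/exp105.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-C", "--config", required=True,
                        help="config name, e.g. exp105")
    args = parser.parse_args()
    # Hypothetical layout: each experiment lives in configs/<name>.py and exposes CFG.
    return importlib.import_module("configs." + args.config).CFG

if __name__ == "__main__":
    cfg = load_config()
    print(cfg)
```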
Inference is published in a Kaggle kernel here.
Name | Public LB | Private LB |
---|---|---|
BaseModel | 0.80603 | 0.70782 |
BaseModel + Knowledge Distillation | 0.82073 | 0.72752 |
BaseModel + Knowledge Distillation + Adding xeno-canto | 0.82905 | 0.74038 |
BaseModel + Knowledge Distillation + Adding xeno-canto + Pretraining | 0.8312 | 0.74424 |
BaseModel + Knowledge Distillation + Adding xeno-canto + Pretraining + Ensemble (4 models) | 0.84019 | 0.75688 |