#4 initiate directory and file structure
vskode committed Nov 11, 2024
1 parent 99fda00 commit aeba190
Showing 19 changed files with 228 additions and 162 deletions.
136 changes: 1 addition & 135 deletions README.md
@@ -9,7 +9,7 @@ Models currently include:

| Name| ref paper| ref code| sampling rate| input length| embedding dimension |
|---|---|---|---|---|---|
| Animal2vec_XC| [paper](https://arxiv.org/abs/2406.01253) | [code](https://github.com/livingingroups/animal2vec) | 8 kHz (?)| 5 s| 768 |
| Animal2vec_XC| [paper](https://arxiv.org/abs/2406.01253) | [code](https://github.com/livingingroups/animal2vec) | 24 kHz| 5 s| 768 |
| Animal2vec_MK| [paper](https://arxiv.org/abs/2406.01253) | [code](https://github.com/livingingroups/animal2vec) | 8 kHz| 10 s| 1024 |
| AudioMAE | [paper](https://proceedings.neurips.cc/paper_files/paper/2022/hash/b89d5e209990b19e33b418e14f323998-Abstract-Conference.html) | [code](https://github.com/facebookresearch/AudioMAE) | 16 kHz| 10 s| 768 |
| AVES | [paper](https://arxiv.org/abs/2210.14493) | [code](https://github.com/earthspecies/aves) | 16 kHz| 1 s| 768 |
@@ -28,140 +28,6 @@ Models currently include:
| UMAP | [paper](https://arxiv.org/abs/1802.03426) | [code](https://github.com/lmcinnes/umap) | - | - | |
| VGGish | [paper](https://ieeexplore.ieee.org/document/7952132) | [code](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) | 16 kHz| 0.96 s| 128 |

## Brief description of models
All information is extracted from the respective repositories and manuscripts. Please refer to them for more details.

### Animal2vec_XC
- raw waveform input
- self-supervised model
- transformer
- trained on bird song data

The animal2vec_XC weights come from self-supervised pretraining on xeno-canto data. The model is based on data2vec 2.0 with a number of bioacoustic-specific adaptations. See the paper for more details.

### Animal2vec_MK
- raw waveform input
- self-supervised pretrained model, fine-tuned
- transformer
- trained on meerkat vocalizations

The animal2vec_MK weights come from self-supervised pretraining on meerkat data with fine-tuning on a curated meerkat dataset. The model is based on data2vec 2.0 with a number of bioacoustic-specific adaptations. See the paper for more details.

### AudioMAE
- spectrogram input
- self-supervised pretrained model, fine-tuned
- vision transformer
- trained on general audio

AudioMAE, from the Facebook research group, is a vision transformer pretrained on AudioSet-2M data and fine-tuned on AudioSet-20K.

### AVES
- transformer
- self-supervised pretrained model
- trained on general audio

AVES is short for Animal Vocalization Encoder based on Self-Supervision. The model is based on the HuBERT-base architecture and is pretrained on the unannotated audio datasets AudioSet-20K and FSD50K as well as the animal sounds from AudioSet and VGGSound.


### BioLingual
- transformer
- spectrogram input
- contrastive-learning
- self-supervised pretrained model
- trained on animal sound data (primarily bird song)

BioLingual is a language-audio model trained on captioned bioacoustic datasets including xeno-canto and iNaturalist. The architecture is based on the [CLAP](https://arxiv.org/pdf/2211.06687) model.

### BirdAVES
- transformer
- self-supervised pretrained model
- trained on general audio and bird song data

AVES is short for Animal Vocalization Encoder based on Self-Supervision. This variant is based on the HuBERT-large architecture and is pretrained on the unannotated audio datasets AudioSet-20K and FSD50K, the animal sounds from AudioSet and VGGSound, as well as bird vocalizations from xeno-canto.

### BirdNET
- CNN
- supervised training model
- trained on bird song data

BirdNET (v2.4) is based on an EfficientNet-B0 architecture. The model is trained on a large amount of bird vocalizations from the xeno-canto database alongside other bird song databases.

### EchoPaSST
- transformer
- supervised pretrained model, fine-tuned
- pretrained on general audio and bird song data

EchoPaSST is a vision transformer trained on AudioSet and (deep) fine-tuned on xeno-canto. The model is based on the [PaSST](https://github.com/kkoutini/PaSST) framework.


### HumpbackNET
- CNN
- supervised training model
- trained on humpback whale song

HumpbackNET is a binary classifier based on a ResNet-50 model trained on humpback whale data from different parts of the North Atlantic.

### Insect66NET
- CNN
- supervised training model
- trained on insect sounds

Insect66NET is an [EfficientNet v2 s](https://pytorch.org/vision/main/models/generated/torchvision.models.efficientnet_v2_s.html) model trained on the [Insect66 dataset](https://zenodo.org/records/8252141), which includes sounds of grasshoppers, crickets, and cicadas. It was developed by the winning team of the Capgemini Global Data Science Challenge 2023.


### Mix2
- CNN
- supervised training model
- trained on frog sounds

Mix2 is a [MobileNet v3](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py) model trained on the [AnuranSet](https://github.com/soundclim/anuraset), which includes sounds of 42 different frog species from different regions in Brazil. The model was trained using a mixture of Mixup augmentations to handle the class imbalance of the data.
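
As a minimal, generic sketch of the Mixup idea (the function name and `alpha` value are assumptions for illustration, not the Mix2 training code):

```python
# Generic Mixup sketch: blend two examples and their label vectors
# with a Beta-distributed weight. Illustrative only.
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two training examples; alpha controls how strong the mixing is."""
    lam = np.random.beta(alpha, alpha)
    x_mixed = lam * x1 + (1 - lam) * x2
    y_mixed = lam * y1 + (1 - lam) * y2
    return x_mixed, y_mixed

# usage: mix two spectrogram arrays with their one-hot labels
# x_mix, y_mix = mixup(spec_a, onehot_a, spec_b, onehot_b)
```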

### RCL_FS_BSED
- CNN
- supervised contrastive learning
- trained on the DCASE 2023 Task 5 dataset [link](https://zenodo.org/records/6482837)

RCL_FS_BSED stands for Regularized Contrastive Learning for Few-shot Bioacoustic Sound Event Detection and features a ResNet-based model. The model was originally created for the DCASE bioacoustic few-shot challenge (Task 5) and later improved.

### ProtoCLR
- transformer
- supervised contrastive learning
- trained on bird song data

ProtoCLR stands for Prototypical Contrastive Learning for robust representation learning. The architecture is a CvT-13 (Convolutional vision transformer) with 20M parameters. ProtoCLR has been validated on transfer learning tasks for bird sound classification, showing strong domain-invariance in few-shot scenarios. The model was trained on the xeno-canto dataset.
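
As a minimal sketch of the prototype idea only (not ProtoCLR's actual contrastive objective; the helper below is hypothetical):

```python
# Prototype idea behind prototypical approaches: class prototypes are mean
# embeddings, and a query is assigned to the nearest prototype.
import numpy as np

def nearest_prototype(query, support_embeddings, support_labels):
    classes = np.unique(support_labels)
    prototypes = np.stack(
        [support_embeddings[support_labels == c].mean(axis=0) for c in classes]
    )
    distances = np.linalg.norm(prototypes - query, axis=1)
    return classes[np.argmin(distances)]
```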


### Perch
- CNN
- supervised training model
- trained on bird song data

Perch is an EfficientNet-B1 model trained on the entire xeno-canto database.

### SurfPerch
- CNN
- supervised training model
- trained on bird song, fine-tuned on tropical reef data

SurfPerch is an EfficientNet-B1 model trained on the entire xeno-canto database and fine-tuned on coral reef and unrelated sounds.

### WhalePerch
- CNN
- supervised training model
- trained on 7 whale species

WhalePerch (multispecies_whale) is an EfficientNet-B0 model trained on whale sounds.

### UMAP
see [repo](https://github.com/lmcinnes/umap)
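
A minimal usage sketch with the umap-learn package (the array shape and parameter values are illustrative, not the settings used in this repository):

```python
# Reduce a set of embedding vectors to 2D with umap-learn; illustrative values only.
import numpy as np
import umap

features = np.random.rand(500, 768)              # e.g. 500 embeddings of dimension 768
reducer = umap.UMAP(n_neighbors=50, n_components=2)
embedding_2d = reducer.fit_transform(features)   # shape (500, 2), ready for plotting
```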

### VGGish
- CNN
- supervised training model
- trained on general audio

VGGish is a model based on the [VGG](https://arxiv.org/pdf/1409.1556) architecture. The model is trained on audio from YouTube videos (YouTube-8M).

## Installation

Create a virtual environment using python3.11 and virtualenv
11 changes: 2 additions & 9 deletions bacpipe/config.yaml
@@ -1,23 +1,16 @@
## MAIN VARS ##

# PATHS:
# define paths for your data; these are the standard directories if you
# used the file condenser. If you do not have annotated data and all of your
# sound files are in one directory, change accordingly
audio_dir : "bacpipe/test_files/audio/audio_test_files"
# audio_dir : "/media/vincent/Extreme SSD/MA/20221019-Benoit/transfer_1780104_files_cfc5b86f/SABA01_201511_201604_SN275/resampled_2kHz/wav"

# fixed path, embeddings will be stored here, DO NOT CHANGE
embed_parent_dir : "bacpipe/test_files/embeds"
umap_parent_dir : "bacpipe/test_files/umap_embeds"


# embedding model name
embedding_model: 'rcl_fs_bsed'

# supported formats of audio files
audio_suffixes: ['.wav', '.WAV', '.aif', '.mp3']

# specify your device ['cpu', 'cuda', ...]
device: 'cpu'

# UMAP settings
n_neighbors : 50
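
For orientation, this file is read with `yaml.safe_load` in `bacpipe/generate_embeddings.py` (see the diff further down); a minimal sketch of loading and accessing these keys, with illustrative variable names:

```python
# Minimal sketch mirroring the yaml.safe_load call in bacpipe/generate_embeddings.py;
# the key names follow bacpipe/config.yaml, the variable names are illustrative.
import yaml

with open("bacpipe/config.yaml", "r") as f:
    config = yaml.safe_load(f)

audio_dir = config["audio_dir"]          # where the sound files live
model_name = config["embedding_model"]   # e.g. 'rcl_fs_bsed'
device = config["device"]                # 'cpu', 'cuda', ...
```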
Empty file added bacpipe/embeddings/README.md
Empty file.
1 change: 1 addition & 0 deletions bacpipe/evaluation/README.md
@@ -0,0 +1 @@
# Evaluation scripts and data to analyze model performance
Empty file.
Empty file.
1 change: 1 addition & 0 deletions bacpipe/evaluation/datasets/README.md
@@ -0,0 +1 @@
# Audio data used to evaluate the models
Binary file not shown.
1 change: 1 addition & 0 deletions bacpipe/evaluation/datasets/benchmark/test/README.md
@@ -0,0 +1 @@
# Test data to benchmark model performance
1 change: 1 addition & 0 deletions bacpipe/evaluation/datasets/benchmark/train/README.md
@@ -0,0 +1 @@
# Train data to benchmark model performance
1 change: 1 addition & 0 deletions bacpipe/evaluation/results/metrics/README.md
@@ -0,0 +1 @@
# Metrics generated during evaluation will be placed here
1 change: 1 addition & 0 deletions bacpipe/evaluation/results/plots/README.md
@@ -0,0 +1 @@
# Plots generated during evaluation will be placed here
Empty file.
11 changes: 7 additions & 4 deletions bacpipe/generate_embeddings.py
@@ -15,10 +15,12 @@ def __init__(
        self,
        check_if_combination_exists=True,
        model_name="umap",
        audio_dir=None,
        testing=False,
        **kwargs,
    ):
        self.model_name = model_name
        self.audio_dir = audio_dir

        with open("bacpipe/config.yaml", "r") as f:
            self.config = yaml.safe_load(f)
@@ -72,7 +74,7 @@ def check_embeds_already_exist(self):
            if (
                self.model_name in d.stem
                and Path(self.audio_dir).stem in d.stem
                and self.embedding_model in d.stem
                and self.model_name in d.stem
            ):

                num_files = len(
@@ -136,7 +138,7 @@ def get_embeddings(self):
            {"embedding_files": [], "embedding_dimensions": []}
        )
        self.embed_dir = Path(self.umap_parent_dir).joinpath(
            self.get_timestamp_dir() + f"-{self.embedding_model}"
            self.get_timestamp_dir() + f"-{self.model_name}"
        )

    def get_embedding_dir(self):
@@ -157,7 +159,7 @@ def get_embedding_dir(self):
        embed_dirs = [
            d
            for d in self.embed_parent_dir.iterdir()
            if self.audio_dir.stem in d.stem and self.embedding_model in d.stem
            if self.audio_dir.stem in d.stem and self.model_name in d.stem
        ]
        # check if timestamp of umap is after timestamp of model embeddings
        embed_dirs.sort()
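
Read in isolation, the changed filter keeps directories whose names contain both the audio folder stem and the model name, and the timestamp prefix makes a plain sort put the newest run last; a self-contained sketch of that logic (the helper name is hypothetical, not part of bacpipe):

```python
# Self-contained sketch of the filter-and-sort logic above.
from pathlib import Path

def newest_matching_dir(embed_parent_dir: Path, audio_dir: Path, model_name: str) -> Path:
    matches = [
        d
        for d in embed_parent_dir.iterdir()
        if audio_dir.stem in d.stem and model_name in d.stem
    ]
    matches.sort()      # timestamp-prefixed directory names sort chronologically
    return matches[-1]  # most recent matching embedding run
```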
@@ -217,7 +219,8 @@ def get_embeddings_from_model(self, sample):
        samples = self.model.preprocess(frames)
        start = time.time()

        embeds = self.model(samples)
        batched_samples = self.model.init_dataloader(samples)
        embeds = self.model.batch_inference(batched_samples)
        if not isinstance(embeds, np.ndarray):
            embeds = embeds.numpy()
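
A hedged sketch of what an `init_dataloader`/`batch_inference` pair on a model wrapper could look like; the class name, batch size, and PyTorch internals below are assumptions, not the bacpipe implementation:

```python
# Hypothetical wrapper illustrating the new two-step call; not the bacpipe code.
import torch
from torch.utils.data import DataLoader, TensorDataset

class ExampleModelWrapper:
    def __init__(self, net: torch.nn.Module, batch_size: int = 16):
        self.net = net.eval()
        self.batch_size = batch_size

    def init_dataloader(self, samples: torch.Tensor) -> DataLoader:
        # wrap the preprocessed frames so inference runs in fixed-size batches
        return DataLoader(TensorDataset(samples), batch_size=self.batch_size)

    def batch_inference(self, batched_samples: DataLoader) -> torch.Tensor:
        outputs = []
        with torch.no_grad():
            for (batch,) in batched_samples:
                outputs.append(self.net(batch))
        return torch.cat(outputs)
```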
