Creating an AudioSplittingDataset with one Raven file- problem #821

canihaveabravo · 2023-08-29T12:42:25Z

Hello,

I would really appreciate your help. I am attempting to pre-process a 15-minute audio (.wav) file: split it into 60 s clips, build the FFT, apply augmentation, etc., and pair the clips with boxed annotations of an Antarctic blue whale call derived from a single Raven selection table.

I've been using the AudioSplittingDataset function to split the 15-minute files into consecutive 60 s clips and the SpectrogramPreprocessor function to build the FFT and do the augmentation. However, I have so far been unable to pair the annotations with the clips in the dataset. The tutorial example for AudioSplittingDataset doesn't show how to do this. Please could you provide some instruction for this task?

Thank you very much in advance,

Lorenzo

louisfh · 2023-08-29T17:29:04Z

Collaborator

Thanks for posting! Can you share some of your code and let us know what version of OpenSoundscape you're using so we can see where you're hitting a problem?

I might have misunderstood your question, but there isn't really a one-to-one matching between raven annotations and the output of a preprocessor. We handle raven annotations by turning them into labels for discrete audio segments. So for example a raven file of boxed annotations that looks like this (ignoring extra columns, like frequency, channel etc):

audio_file	raven_file	start_time	end_time	annotation
file1.wav	file1.selections.txt	5	45	beep
file1.wav	file1.selections.txt	30	80	boop
file1.wav	file1.selections.txt	190	220	long_boop

Would be read in, and then converted to a labels_dataframe

from opensoundscape.annotations import BoxedAnnotations
annotations = BoxedAnnotations.from_raven_files(["/path/to/file1.selections.txt"],audio_files=["/path/to/file1.wav"])
labels_dataframe = annotations.one_hot_clip_labels(
    full_duration=240, # The duration of the entire audio file
    clip_duration=60,  # the duration of clip you're interested in
    clip_overlap=0,      # overlap between consecutive clips
    min_label_overlap=0.25, # minimum overlap with a boxed call for an audio segment to be considered to contain that call
)

The output, labels_dataframe, would look like this:

file	start_time	end_time	boop	beep	long_boop
file1.wav	0	60	1	1	0
file1.wav	60	120	1	0	0
file1.wav	120	180	0	0	0
file1.wav	180	240	0	0	1

It's this labels_dataframe we use to initialize a preprocessor, or train a model. e.g.:

from opensoundscape import CNN
model = CNN('resnet18',classes=["boop", "beep", "long_boop"],sample_duration=60.0)
model.train(train_df = labels_dataframe)
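
To make the overlap rule concrete, here is a minimal pandas-only sketch that reproduces the labels table above from the three example boxes. It only illustrates the idea (a clip gets a 1 for a class when a box of that class overlaps the clip by at least min_label_overlap, assumed here to be in seconds); it is not OpenSoundscape's implementation, and the helper name clip_labels_sketch is made up for this example.

```python
import pandas as pd


def clip_labels_sketch(boxes, full_duration, clip_duration, min_label_overlap):
    """Return a 0/1 table with one row per clip and one column per class."""
    classes = list(dict.fromkeys(boxes["annotation"]))  # unique, in order of appearance
    rows = []
    start = 0
    while start + clip_duration <= full_duration:
        end = start + clip_duration
        row = {"start_time": start, "end_time": end}
        for cls in classes:
            cls_boxes = boxes[boxes["annotation"] == cls]
            # seconds of overlap between each box and the clip (negative if they don't touch)
            overlap = cls_boxes["end_time"].clip(upper=end) - cls_boxes["start_time"].clip(lower=start)
            row[cls] = int((overlap >= min_label_overlap).any())
        rows.append(row)
        start = end
    return pd.DataFrame(rows).set_index(["start_time", "end_time"])


# the three boxes from the example Raven table above
boxes = pd.DataFrame(
    {
        "start_time": [5, 30, 190],
        "end_time": [45, 80, 220],
        "annotation": ["beep", "boop", "long_boop"],
    }
)
labels = clip_labels_sketch(boxes, full_duration=240, clip_duration=60, min_label_overlap=0.25)
print(labels)
```

The columns come out in the order beep, boop, long_boop rather than boop, beep, long_boop, but the 0/1 values per clip match the table above.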

canihaveabravo · 2023-08-30T11:48:09Z

Author

Hi Louis,

Thanks very much for your reply and suggestions. I have now managed to process a sample audio file, set up a labels_df, preprocess the audio and run a basic test CNN. Some excerpts of code below…

For my next step I would like to import multiple 15-minute audio files, preprocess them to 60 s clips and create a labels dataframe containing multi-label annotations (ABW, Noise, OW). The problem I currently have is that these annotations are saved in just one Raven selection table (attached). Am I able to create a labels dataframe from one Raven file that corresponds to annotations covering multiple 15-min audio files (85 files in this subset)?

Thanks for your help, much appreciated.

Lorenzo

# Audio file
audio_file = './ABW/2019-09-04T09-30-06.wav'
# Raven annotation file
annotation_file = './ABW/2019-09-04T09-30-06.ABW.selections.txt'

# create an object from Raven file
annotations = BoxedAnnotations.from_raven_files([annotation_file], keep_extra_columns=None, audio_files=[audio_file])

# inspect the object's .df attribute, which contains the table of annotations
annotations.df.head()

labels_df = annotations.one_hot_clip_labels(
    full_duration=900, # The duration of the entire audio file
    clip_duration=60,
    clip_overlap=0,
    class_subset=None,
    min_label_overlap=0.25,
    final_clip=None
)
labels_df.head()

# initialise preprocessor and set clip length
pre = SpectrogramPreprocessor(sample_duration=60)

# set sample rate on audio load
pre.pipeline.load_audio.set(sample_rate=1000)

# adjust bandpass parameters
pre.pipeline.bandpass.set(min_f=0, max_f=100)

# adjust spectrogram parameters
pre.pipeline.to_spec.params.window_samples = 1024
pre.pipeline.to_spec.params.overlap_fraction = 0.5
pre.pipeline.to_spec.params.decibel_limits = (-140, -10)
pre.pipeline.to_spec.params.dB_scale = True
pre.pipeline.to_spec.params.scaling = "spectrum"

# enable or bypass augmentations
pre.pipeline.random_affine.bypass = False
pre.pipeline.add_noise.bypass = True

# adjust augmentation parameters
pre.pipeline.time_mask.set(max_width=0.08, max_masks=5)
pre.pipeline.frequency_mask.set(max_width=0.05, max_masks=5)

# Create preprocessed dataset of audio clips from the original audio file
dataset = AudioFileDataset(labels_df, pre)

# Set output tensor shape
dataset.preprocessor.out_shape = [224, 224, 3]

# Get the first 9 samples and plot them
tensors = [dataset[i].data for i in range(9)]
sample_labels = [list(dataset[i].labels[dataset[i].labels > 0].index) for i in range(9)]
_ = show_tensor_grid(tensors, 3, labels=sample_labels)

# Create model object
classes = train_df.columns # in this case, there's just one class: ["ABW"]
model = CNN('resnet18', classes=classes, sample_duration=60)
model.train(
    train_df=train_df,
    validation_df=valid_df,
    save_path='./ABW/binary_train/', # where to save the trained model
    epochs=5,
    batch_size=8,
    save_interval=5, # save model every 5 epochs (the best model is always saved in addition)
    num_workers=0, # e.g. specify 4 if you have 4 CPU processors; 0 means only the root process
)

# Let Wandb know that we finished training successfully
wandb.unwatch(model.network)
wandb.finish()

Selection	View	Channel	Begin Time (s)	End Time (s)	Low Freq (Hz)	High Freq (Hz)	Detector	Occupancy
1	Spectrogram 1	1	706.221645833	711.615395833	10.000	44.000	ABW	0.8000
2	Spectrogram 1	1	772.744562500	781.014979167	10.000	44.000	ABW	0.9130
3	Spectrogram 1	1	1007.552479167	1015.103729167	10.000	44.000	ABW	0.7619
4	Spectrogram 1	1	1080.547895833	1087.739562500	10.000	44.000	ABW	0.8500
5	Spectrogram 1	1	1184.827062500	1195.614562500	10.000	44.000	ABW	0.9000
6	Spectrogram 1	1	1258.901229167	1263.935395833	10.000	44.000	ABW	0.7857
7	Spectrogram 1	1	2045.669562500	2051.782479167	10.000	44.000	ABW	0.9412
8	Spectrogram 1	1	2119.743729167	2124.777895833	10.000	44.000	ABW	0.7143
9	Spectrogram 1	1	2864.440812500	2873.070812500	10.000	44.000	ABW	0.6250
10	Spectrogram 1	1	3049.266645833	3058.615812500	10.000	44.000	ABW	0.5769
11	Spectrogram 1	1	3193.099979167	3200.651229167	10.000	44.000	ABW	0.5238
12	Spectrogram 1	1	3270.050812500	3275.084979167	10.000	44.000	ABW	0.9286
13	Spectrogram 1	1	3495.149979167	3502.701229167	10.000	44.000	ABW	0.9048
14	Spectrogram 1	1	3883.859562500	3900.040812500	10.000	44.000	ABW	0.7556
15	Spectrogram 1	1	4457.394979167	4468.182479167	10.000	44.000	Noise	0.5667
16	Spectrogram 1	1	8254.954562500	8261.067479167	10.000	44.000	Noise	0.5882
17	Spectrogram 1	1	10833.167062500	10839.279979167	10.000	44.000	OW	0.4706
18	Spectrogram 1	1	11001.452062500	11008.643729167	10.000	44.000	OW	0.3500
19	Spectrogram 1	1	11171.534979167	11177.288312500	10.000	44.000	Noise	0.5000
20	Spectrogram 1	1	12216.843729167	12224.754562500	10.000	44.000	OW	0.4091
21	Spectrogram 1	1	12365.711229167	12373.981645833	10.000	44.000	OW	0.3043
22	Spectrogram 1	1	12612.025812500	12627.128312500	10.000	44.000	Noise	0.8095
23	Spectrogram 1	1	12706.596229167	12711.630395833	10.000	44.000	OW	0.6429
24	Spectrogram 1	1	13129.106645833	13136.298312500	10.000	44.000	OW	0.7000
25	Spectrogram 1	1	13332.990395833	13338.384145833	10.000	44.000	OW	0.8667
26	Spectrogram 1	1	13803.325395833	13810.157479167	10.000	44.000	OW	0.4211
27	Spectrogram 1	1	14027.705395833	14033.099145833	10.000	44.000	OW	0.5333
28	Spectrogram 1	1	14304.584562500	14312.854979167	10.000	44.000	Noise	0.7826
29	Spectrogram 1	1	14852.229979167	14859.062062500	10.000	44.000	OW	0.3158
30	Spectrogram 1	1	15253.884562500	15260.357062500	10.000	44.000	Noise	0.3889
31	Spectrogram 1	1	17387.292479167	17393.045812500	10.000	44.000	OW	0.5625
32	Spectrogram 1	1	18650.868312500	18658.419562500	10.000	44.000	Noise	0.7619
33	Spectrogram 1	1	19605.202479167	19616.709145833	10.000	44.000	Noise	0.6250
34	Spectrogram 1	1	19763.059562500	19774.925812500	10.000	44.000	Noise	0.7273
35	Spectrogram 1	1	19981.686229167	19991.754562500	10.000	44.000	Noise	0.4286
36	Spectrogram 1	1	20503.801229167	20523.578312500	10.000	44.000	Noise	0.3636
37	Spectrogram 1	1	23614.197062500	23622.467479167	10.000	44.000	ABW	1.0000
38	Spectrogram 1	1	25120.851229167	25129.481229167	10.000	44.000	ABW	1.0000
39	Spectrogram 1	1	28141.351229167	28159.689979167	10.000	44.000	Noise	0.5686
40	Spectrogram 1	1	30071.594562500	30079.505395833	10.000	44.000	ABW	0.5000
41	Spectrogram 1	1	30201.763729167	30207.876645833	10.000	44.000	ABW	1.0000
42	Spectrogram 1	1	30349.192895833	30359.980395833	10.000	44.000	ABW	1.0000
43	Spectrogram 1	1	30421.469145833	30432.616229167	10.000	44.000	ABW	0.9677
44	Spectrogram 1	1	30495.183729167	30503.813729167	10.000	44.000	ABW	1.0000
45	Spectrogram 1	1	30568.179145833	30578.966645833	10.000	44.000	ABW	1.0000
46	Spectrogram 1	1	30641.893729167	30654.119562500	10.000	44.000	ABW	0.9706
47	Spectrogram 1	1	30765.949979167	30776.737479167	10.000	44.000	ABW	1.0000
48	Spectrogram 1	1	30837.866645833	30849.732895833	10.000	44.000	ABW	1.0000
49	Spectrogram 1	1	31001.117479167	31006.870812500	10.000	44.000	Noise	0.4375
50	Spectrogram 1	1	31028.445812500	31034.918312500	10.000	44.000	ABW	1.0000
51	Spectrogram 1	1	31248.510812500	31256.781229167	10.000	44.000	ABW	0.9565
52	Spectrogram 1	1	31320.787062500	31329.417062500	10.000	44.000	ABW	0.9583
53	Spectrogram 1	1	31392.703729167	31401.333729167	10.000	44.000	ABW	0.9583
54	Spectrogram 1	1	31461.024562500	31470.733312500	10.000	44.000	ABW	0.9630
55	Spectrogram 1	1	31559.550395833	31568.899562500	10.000	44.000	ABW	0.9615
56	Spectrogram 1	1	31706.260395833	31718.126645833	10.000	44.000	ABW	1.0000
57	Spectrogram 1	1	31779.255812500	31791.481645833	10.000	44.000	ABW	1.0000
58	Spectrogram 1	1	31971.273312500	31982.060812500	10.000	44.000	ABW	0.9333
59	Spectrogram 1	1	32045.347479167	32056.494562500	10.000	44.000	ABW	0.9355
60	Spectrogram 1	1	32615.287062500	32621.759562500	10.000	44.000	Noise	0.3889
61	Spectrogram 1	1	33826.004145833	33835.712895833	10.000	44.000	ABW	0.8889
62	Spectrogram 1	1	36420.038312500	36431.544979167	10.000	44.000	Noise	0.3438
63	Spectrogram 1	1	38712.741645833	38725.327062500	10.000	44.000	Noise	0.5143
64	Spectrogram 1	1	39195.662062500	39211.483729167	10.000	44.000	Noise	0.5682
65	Spectrogram 1	1	39997.532895833	40011.916229167	10.000	44.000	Noise	0.4750
66	Spectrogram 1	1	40115.116645833	40120.150812500	10.000	44.000	OW	0.5714
67	Spectrogram 1	1	40503.466645833	40508.500812500	10.000	44.000	OW	0.4286
68	Spectrogram 1	1	41609.185395833	41617.815395833	10.000	44.000	OW	0.9167
69	Spectrogram 1	1	41786.100395833	41796.887895833	10.000	44.000	ABW	0.5000
70	Spectrogram 1	1	41803.719979167	41812.709562500	10.000	44.000	OW	0.8400
71	Spectrogram 1	1	41862.332062500	41871.681229167	10.000	44.000	ABW	0.8846
72	Spectrogram 1	1	41927.776229167	41938.923312500	10.000	44.000	ABW	0.7742
73	Spectrogram 1	1	42002.569562500	42011.559145833	10.000	44.000	ABW	0.9200
74	Spectrogram 1	1	42059.743312500	42066.934979167	10.000	44.000	OW	0.4000
75	Spectrogram 1	1	42078.441645833	42085.633312500	10.000	44.000	ABW	1.0000
76	Spectrogram 1	1	42152.515812500	42164.382062500	10.000	44.000	ABW	0.8182
77	Spectrogram 1	1	42228.028312500	42240.613729167	10.000	44.000	ABW	0.9429
78	Spectrogram 1	1	42375.817062500	42385.166229167	10.000	44.000	ABW	0.8846
79	Spectrogram 1	1	42448.093312500	42456.004145833	10.000	44.000	ABW	1.0000
80	Spectrogram 1	1	42516.054562500	42525.044145833	10.000	44.000	ABW	0.7200
81	Spectrogram 1	1	42587.252062500	42598.039562500	10.000	44.000	ABW	0.9333
82	Spectrogram 1	1	42660.966645833	42671.754145833	10.000	44.000	ABW	0.9000
83	Spectrogram 1	1	42732.523729167	42741.153729167	10.000	44.000	ABW	0.8750
84	Spectrogram 1	1	42901.168312500	42911.236645833	10.000	44.000	ABW	0.7500
85	Spectrogram 1	1	42977.759562500	42985.310812500	10.000	44.000	ABW	0.9524
86	Spectrogram 1	1	43008.683729167	43023.786229167	10.000	44.000	OW	0.7143
87	Spectrogram 1	1	43053.631645833	43060.463729167	10.000	44.000	ABW	0.8947
88	Spectrogram 1	1	43127.346229167	43137.054979167	10.000	44.000	ABW	0.8148
89	Spectrogram 1	1	43196.386229167	43206.814145833	10.000	44.000	OW	1.0000
90	Spectrogram 1	1	43224.433729167	43231.265812500	10.000	44.000	ABW	0.9474
91	Spectrogram 1	1	43257.515395833	43270.100812500	10.000	44.000	Noise	0.6000
92	Spectrogram 1	1	43384.088729167	43395.595395833	10.000	44.000	OW	1.0000
93	Spectrogram 1	1	43442.341229167	43452.049979167	10.000	44.000	ABW	0.9630
94	Spectrogram 1	1	43513.898312500	43523.966645833	10.000	44.000	ABW	0.9643
95	Spectrogram 1	1	43658.091229167	43666.002062500	10.000	44.000	ABW	1.0000
96	Spectrogram 1	1	43710.230812500	43719.220395833	10.000	44.000	ABW	0.9200
97	Spectrogram 1	1	43788.260395833	43802.284145833	10.000	44.000	Noise	0.5128
98	Spectrogram 1	1	43842.917062500	43851.187479167	10.000	44.000	ABW	1.0000
99	Spectrogram 1	1	43918.069979167	43923.463729167	10.000	44.000	ABW	0.7333
100	Spectrogram 1	1	43987.829145833	43996.818729167	10.000	44.000	ABW	0.6000
101	Spectrogram 1	1	44075.927062500	44085.276229167	10.000	44.000	OW	0.8846
102	Spectrogram 1	1	44129.504979167	44134.898729167	10.000	44.000	ABW	0.8000
103	Spectrogram 1	1	44261.831645833	44274.776645833	10.000	44.000	Noise	0.8333
104	Spectrogram 1	1	44456.006645833	44465.715395833	10.000	44.000	OW	0.8148
105	Spectrogram 1	1	44782.508312500	44788.261645833	10.000	44.000	OW	0.7500
106	Spectrogram 1	1	44899.732479167	44914.834979167	10.000	44.000	Noise	0.7619
107	Spectrogram 1	1	45057.949145833	45066.579145833	10.000	44.000	OW	0.9583
108	Spectrogram 1	1	45151.800395833	45158.632479167	10.000	44.000	ABW	0.8947
109	Spectrogram 1	1	45279.452479167	45290.239979167	10.000	44.000	Noise	0.5000
110	Spectrogram 1	1	48189.200812500	48195.313729167	10.000	44.000	ABW	0.5294
111	Spectrogram 1	1	48261.836645833	48270.826229167	10.000	44.000	ABW	0.7600
112	Spectrogram 1	1	48331.955395833	48343.102479167	10.000	44.000	ABW	0.3226
113	Spectrogram 1	1	48410.704145833	48418.614979167	10.000	44.000	ABW	0.7727
114	Spectrogram 1	1	49170.503729167	49177.335812500	10.000	44.000	ABW	0.9474
115	Spectrogram 1	1	49242.779979167	49251.769562500	10.000	44.000	ABW	0.8400
116	Spectrogram 1	1	49319.371229167	49328.720395833	10.000	44.000	ABW	0.4615
117	Spectrogram 1	1	49399.558312500	49407.109562500	10.000	44.000	ABW	1.0000
118	Spectrogram 1	1	49478.666645833	49487.296645833	10.000	44.000	ABW	0.9583
119	Spectrogram 1	1	49554.179145833	49559.572895833	10.000	44.000	ABW	1.0000
120	Spectrogram 1	1	49624.297895833	49629.332062500	10.000	44.000	ABW	0.5000
121	Spectrogram 1	1	49683.629145833	49700.169979167	10.000	44.000	Noise	0.6957
122	Spectrogram 1	1	49766.333312500	49774.603729167	10.000	44.000	ABW	0.3913
123	Spectrogram 1	1	49824.226229167	49829.619979167	10.000	44.000	Noise	0.4667
124	Spectrogram 1	1	49907.649562500	49918.796645833	10.000	44.000	Noise	0.6452
125	Spectrogram 1	1	50440.192479167	50448.462895833	10.000	44.000	Noise	0.3043
126	Spectrogram 1	1	50489.814979167	50502.400395833	10.000	44.000	ABW	0.4571
127	Spectrogram 1	1	51576.475812500	51585.824979167	10.000	44.000	ABW	0.7308
128	Spectrogram 1	1	51903.337062500	51920.237479167	10.000	44.000	Noise	0.5957
129	Spectrogram 1	1	52399.562062500	52408.911229167	10.000	44.000	ABW	0.3077
130	Spectrogram 1	1	52468.961645833	52477.591645833	10.000	44.000	ABW	0.9583
131	Spectrogram 1	1	54615.314562500	54627.540395833	10.000	44.000	Noise	0.3235
132	Spectrogram 1	1	55712.762895833	55723.190812500	10.000	44.000	Noise	0.3103
133	Spectrogram 1	1	55826.031645833	55831.425395833	10.000	44.000	Noise	0.3333
134	Spectrogram 1	1	56225.528729167	56244.586645833	10.000	44.000	Noise	0.8868
135	Spectrogram 1	1	57152.894145833	57166.198729167	10.000	44.000	Noise	0.7027
136	Spectrogram 1	1	59461.419145833	59473.285395833	10.000	44.000	Noise	0.5152
137	Spectrogram 1	1	65877.464562500	65882.858312500	10.000	44.000	Noise	0.3333
138	Spectrogram 1	1	66796.919145833	66805.189562500	10.000	44.000	Noise	0.4348
139	Spectrogram 1	1	67812.742062500	67827.844562500	10.000	44.000	Noise	0.5714
140	Spectrogram 1	1	69002.962895833	69016.627062500	10.000	44.000	Noise	0.3158
141	Spectrogram 1	1	69022.739979167	69040.359562500	10.000	44.000	Noise	0.9592
142	Spectrogram 1	1	69378.367895833	69389.874562500	10.000	44.000	Noise	0.8438
143	Spectrogram 1	1	73287.757895833	73294.589979167	10.000	44.000	OW	0.3684
144	Spectrogram 1	1	73935.007895833	73945.795395833	10.000	44.000	ABW	0.5667
145	Spectrogram 1	1	74579.740812500	74586.932479167	10.000	44.000	OW	0.3500
146	Spectrogram 1	1	75045.041645833	75052.233312500	10.000	44.000	OW	0.6500
147	Spectrogram 1	1	75333.067895833	75339.180812500	10.000	44.000	OW	0.4118
148	Spectrogram 1	1	75423.323312500	75431.593729167	10.000	44.000	ABW	0.5652
149	Spectrogram 1	1	75574.348312500	75584.776229167	10.000	44.000	ABW	0.8276

sammlapp Sep 5, 2023
Maintainer

Hi, yes it will be straightforward to transform this table into the types of tables expected by the BoxedAnnotations.from_raven_files() method. In Python, you could load this table and split it into one annotation file per audio file. Roughly something like this:

Here I'm assuming that "Begin File" is the column containing the audio file name, and that the audio files are in some folder called /path/to/audio.

import pandas as pd
from pathlib import Path

df = pd.read_csv('/path/to/Selection Table.txt', delimiter='\t')

# folder containing the audio files
audio_root = Path('/path/to/audio')

# choose where to save annotation files
annotation_root = Path('/path/to/annotations')

# convert audio file names to full paths
df['Begin File'] = [str(audio_root / f) for f in df['Begin File']]

# convert start time and end time to be relative to audio file start
# adjust the math if this is incorrect
# (the duration is computed first, before 'Begin Time (s)' is overwritten)
duration = df['End Time (s)'] - df['Begin Time (s)']
df['Begin Time (s)'] = df['File Offset (s)']
df['End Time (s)'] = df['File Offset (s)'] + duration

# split into one table per audio file. Give it the same name as the audio file
audio_files = []
annotation_files = []
for audio_file in df['Begin File'].unique():
    # select rows with this file's annotations
    subset_df = df[df['Begin File'] == audio_file]
    annotation_path = annotation_root / (Path(audio_file).stem + '.selections.txt')
    subset_df.to_csv(annotation_path, sep='\t', index=False) # saves tab-delimited annotation table
    audio_files.append(audio_file)
    annotation_files.append(annotation_path)

# save a table listing each audio and corresponding annotation file path
pd.DataFrame({'audio_file': audio_files, 'annotation_file': annotation_files}).to_csv('./audio_and_annotation_files.csv', index=False)

I haven't run this code because I don't have your files, so it may have some bugs
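
The selection table posted earlier in this thread shows no "Begin File" or "File Offset (s)" column, only Begin/End times that appear to run cumulatively across the whole set of recordings (the values go far beyond 900 s). If that is the case, one possible workaround is to derive the file and offset from the cumulative time. The sketch below assumes that the audio files are contiguous, that they were loaded in sorted order, and that every file is exactly 900 s long. The file names and the helper add_file_columns are placeholders made up for this example; if the real files have different lengths, cumulative sums of the true durations would be needed instead.

```python
import pandas as pd

FILE_DURATION = 900  # assumed length of every audio file, in seconds


def add_file_columns(df, audio_files, file_duration=FILE_DURATION):
    """Add 'Begin File' and 'File Offset (s)' columns derived from cumulative times."""
    out = df.copy()
    # index of the file in which each selection begins
    file_index = (out["Begin Time (s)"] // file_duration).astype(int)
    out["Begin File"] = [audio_files[i] for i in file_index]
    # start time of the selection relative to the start of its file
    out["File Offset (s)"] = out["Begin Time (s)"] - file_index * file_duration
    return out


# three rows copied from the selection table in this thread
df = pd.DataFrame(
    {
        "Begin Time (s)": [706.221645833, 1007.552479167, 2045.669562500],
        "End Time (s)": [711.615395833, 1015.103729167, 2051.782479167],
        "Detector": ["ABW", "ABW", "ABW"],
    }
)
# placeholder file names, in recording order
audio_files = ["file_000.wav", "file_001.wav", "file_002.wav"]

with_files = add_file_columns(df, audio_files)
print(with_files[["Begin File", "File Offset (s)"]])
```

The resulting table has the two columns that the splitting code above expects. A selection that straddles a file boundary is assigned to the file in which it begins, so its end time can run past the end of that file.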

canihaveabravo Sep 7, 2023
Author

Thank you very much for providing this code. This has resolved my raven table issue and I'm now able to train my dataset. Fingers crossed for a good outcome. Thanks again.

canihaveabravo Sep 20, 2023
Author

Hello, is there a way to instruct the CNN model to ignore unlabelled clips during training? I have a large dataset of audio files split into 60 s clips with one-hot labels, but many of these clips do not contain a label. There might be only 1 annotation in a 15-minute file of 15 x 60 s clips, so 14 empty clips pass through each training epoch. Am I understanding that right?

Thanks very much for your advice.

louisfh Sep 22, 2023
Collaborator

If your 'unlabelled' clips are truly unlabelled - i.e. they've not been annotated and you don't know what they contain, you will want to drop them from the training set. However, if you mean that the unlabelled clips are negatives, i.e. you've listened to them and you know they don't contain any of the sounds of interest you're training your model for, you will want to keep these for training. If your training set is very imbalanced, with many more negatives than positives, you can balance the dataset. A simple way to do this would be to 'upsample' the positives, so you have the same number of positives as negatives.

So if you already have a labels_dataframe that looks like this:

file	start_time	end_time	boop	beep	long_boop
file1.wav	0	60	1	1	0
file1.wav	60	120	1	0	0
file1.wav	120	180	0	0	0
file1.wav	180	240	0	0	1

This assumes the dataframe's index is a multi-index of file, start_time, end_time (which will be the case if you've used our annotations module to make this dataframe).

Here's a simple way of doing this:

import pandas as pd

# find all the rows with at least one non-zero value in one of the columns 
# These contain at least one of your classes of interest and are positives
positives = labels_dataframe[labels_dataframe.sum(axis=1) > 0]

# find all the rows with all 0s in the columns. These are your negatives
negatives = labels_dataframe[labels_dataframe.sum(axis=1) == 0]

# if there are many more negatives than positives, upsample your positives so you have a balanced training set
positives = positives.sample(len(negatives), replace=True)

# create a new training set of your negatives and positives
training_set = pd.concat([positives, negatives])

You might want to do something different, e.g. if you have tonnes of negatives and your model will take too long to train, you might actually want to downsample your negatives. Or you might want to balance each of the classes if you have multiple classes.
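
As a rough sketch of the downsampling alternative, assuming the same kind of labels_dataframe (one 0/1 column per class, multi-index of file, start_time, end_time): the toy table and its values below are made up purely for illustration, and the class names are borrowed from this thread.

```python
import pandas as pd

# toy labels table standing in for a real labels_dataframe (values are made up)
index = pd.MultiIndex.from_tuples(
    [("file1.wav", 60 * i, 60 * (i + 1)) for i in range(10)],
    names=["file", "start_time", "end_time"],
)
labels_dataframe = pd.DataFrame(
    {
        "ABW": [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
        "OW": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    },
    index=index,
)

# split into clips with at least one class present and clips with none
positives = labels_dataframe[labels_dataframe.sum(axis=1) > 0]
negatives = labels_dataframe[labels_dataframe.sum(axis=1) == 0]

# keep at most as many negatives as positives, chosen at random without replacement
n_keep = min(len(negatives), len(positives))
negatives_down = negatives.sample(n=n_keep, random_state=0)

# combine and shuffle the rows
training_set = pd.concat([positives, negatives_down]).sample(frac=1, random_state=0)
print(training_set.sum())  # number of positive clips per class
print(len(training_set))
```

Unlike upsampling, this discards data, so it mainly makes sense when there are far more negatives than the model needs to see.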

canihaveabravo Sep 26, 2023
Author

Thanks for your help :)
