Creating an AudioSplittingDataset with one Raven file- problem #821
-
Thanks for posting! Can you share some of your code and let us know which version of OpenSoundscape you're using, so we can see where you're hitting a problem? I might have misunderstood your question, but there isn't really a one-to-one matching between Raven annotations and the output of a preprocessor. We handle Raven annotations by turning them into labels for discrete audio segments: a Raven file of boxed annotations is read in and then converted to a labels_dataframe, with one row per audio clip and one column per annotation class.
It's this labels_dataframe that we use to initialize a preprocessor or train a model.
-
Hi Louis,
Thanks very much for your reply and suggestions. I have now managed to process a sample audio file, set up a labels_df, preprocess the audio and run a basic test CNN. Some excerpts of code are below.
For my next step I would like to import multiple 15-minute audio files, preprocess them into 60 s clips and create a labels dataframe containing multi-label annotations (ABW, Noise, OW). The problem I currently have is that these annotations are saved in just one Raven selection table (attached). Am I able to create a labels dataframe from one Raven file whose annotations cover multiple 15-minute audio files (85 files in this subset)?
Thanks for your help, much appreciated.
Lorenzo
from opensoundscape.annotations import BoxedAnnotations
# Audio file
audio_file = './ABW/2019-09-04T09-30-06.wav'
# Raven annotation file
annotation_file = './ABW/2019-09-04T09-30-06.ABW.selections.txt'
# create an object from the Raven file
annotations = BoxedAnnotations.from_raven_files([annotation_file], keep_extra_columns=None, audio_files=[audio_file])
# inspect the object's .df attribute, which contains the table of annotations
annotations.df.head()
# convert the boxed annotations to one-hot labels for consecutive 60 s clips
labels_df = annotations.one_hot_clip_labels(
    full_duration=900, # The duration of the entire audio file
    clip_duration=60,
    clip_overlap=0,
    class_subset=None,
    min_label_overlap=0.25,
    final_clip=None
)
labels_df.head()
#initialise preprocessor and set clip length
pre = SpectrogramPreprocessor(sample_duration=60)
# set sample rate on audio load
pre.pipeline.load_audio.set(sample_rate=1000)
#adjust bandpass parameters
pre.pipeline.bandpass.set(min_f=0,max_f=100)
# adjust spectrogram parameters
pre.pipeline.to_spec.params.window_samples = 1024
pre.pipeline.to_spec.params.overlap_fraction = 0.5
pre.pipeline.to_spec.params.decibel_limits = (-140, -10)
pre.pipeline.to_spec.params.dB_scale = True
pre.pipeline.to_spec.params.scaling = "spectrum"
# configure augmentations: random_affine stays active, add_noise is bypassed
pre.pipeline.random_affine.bypass = False
pre.pipeline.add_noise.bypass = True
# adjust augmentation parameters
pre.pipeline.time_mask.set(max_width = 0.08, max_masks = 5)
pre.pipeline.frequency_mask.set(max_width = 0.05, max_masks = 5)
# Create preprocessed dataset of audio clips from the original audio file
dataset = AudioFileDataset(labels_df, pre)
# Set output tensor shape
dataset.preprocessor.out_shape = [224,224,3]
# Get the first 9 samples and plot them
tensors = [dataset[i].data for i in range(9)]
sample_labels = [list(dataset[i].labels[dataset[i].labels>0].index) for i in range(9)]
_ = show_tensor_grid(tensors,3, labels=sample_labels)
# Create model object
classes = train_df.columns #in this case, there's just one class: ["ABW"]
model = CNN('resnet18',classes=classes,sample_duration=60)
model.train(
    train_df=train_df,
    validation_df=valid_df,
    save_path='./ABW/binary_train/', # where to save the trained model
    epochs=5,
    batch_size=8,
    save_interval=5, # save model every 5 epochs (the best model is always saved in addition)
    num_workers=0, # number of worker processes for data loading, e.g. 4 if you have 4 CPU cores; 0 loads data in the main process
)
#Let Wandb know that we finished training successfully
wandb.unwatch(model.network)
wandb.finish()
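As a rough sanity check on the spectrogram settings used above, the following sketch works out the time and frequency resolution they imply. It is plain arithmetic, not an OpenSoundscape call, and it assumes the spectrogram is computed without padding the ends of the clip; the variable names are chosen here for illustration.

```python
# Values taken from the preprocessor settings in the excerpt above
sample_rate = 1000        # Hz, set on load_audio
window_samples = 1024     # samples per FFT window
overlap_fraction = 0.5    # fraction of overlap between windows
clip_duration = 60        # seconds per clip
max_f = 100               # Hz, upper edge of the bandpass

# Step between consecutive windows, in samples
hop_samples = int(window_samples * (1 - overlap_fraction))

# Duration of one window and spacing between frequency bins
window_duration_s = window_samples / sample_rate
freq_resolution_hz = sample_rate / window_samples

# Number of full windows that fit in one clip (assumption: no padding)
clip_samples = clip_duration * sample_rate
n_frames = 1 + (clip_samples - window_samples) // hop_samples

# Number of frequency bins from 0 Hz up to max_f
bins_in_band = int(max_f / freq_resolution_hz) + 1

print(hop_samples, window_duration_s, freq_resolution_hz, n_frames, bins_in_band)
```

Under these assumptions each 60 s clip yields a spectrogram of roughly 103 frequency bins by 116 time frames before it is resized to the 224 x 224 output shape.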
On Tuesday, August 29, 2023 6:29 PM, Louis Freeland-Haynes wrote:
Thanks for posting! Can you share some of your code and let us know what version of OpenSoundscape you're using so we can see where you're hitting a problem?
I might have misunderstood your question, but there isn't really a one-to-one matching between raven annotations and the output of a preprocessor. We handle raven annotations by turning them into labels for discrete audio segments. So for example a raven file of boxed annotations that looks like this (ignoring extra columns, like frequency, channel etc):
audio_file  raven_file            start_time  end_time  annotation
file1.wav   file1.selections.txt  5           45        beep
file1.wav   file1.selections.txt  30          80        boop
file1.wav   file1.selections.txt  190         220       long_boop
Would be read in, and then converted to a labels_dataframe
from opensoundscape.annotations import BoxedAnnotations
annotations = BoxedAnnotations.from_raven_files(
    ["/path/to/file1.selections.txt"],
    audio_files=["/path/to/file1.wav"],
)
labels_dataframe = annotations.one_hot_clip_labels(
    full_duration=240, # The duration of the entire audio file
    clip_duration=60, # the duration of clip you're interested in
    clip_overlap=0, # overlap between consecutive clips
    min_label_overlap=0.25, # minimum overlap with a boxed call for an audio segment to be considered to contain that call
)
The output, labels_dataframe, would look like this:
file       start_time  end_time  boop  beep  long_boop
file1.wav  0           60        1     1     0
file1.wav  60          120       1     0     0
file1.wav  120         180       0     0     0
file1.wav  180         240       0     0     1
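To make the overlap rule behind that table concrete, here is a minimal plain-Python sketch that reproduces the same labels by hand. It is not OpenSoundscape's implementation: the function name `one_hot_clips` is hypothetical, and it assumes that `min_label_overlap` is a duration in seconds and that clips do not overlap.

```python
def one_hot_clips(annotations, classes, full_duration, clip_duration, min_label_overlap):
    """Label consecutive, non-overlapping clips from (start, end, label) boxes.

    A clip gets a 1 for a class when at least `min_label_overlap` seconds
    of a box with that label fall inside the clip (assumed interpretation).
    """
    rows = []
    start = 0
    while start + clip_duration <= full_duration:
        end = start + clip_duration
        row = {"start_time": start, "end_time": end}
        for cls in classes:
            row[cls] = 0
        for a_start, a_end, label in annotations:
            # Seconds shared by the clip and the annotation (negative if disjoint)
            overlap = min(end, a_end) - max(start, a_start)
            if overlap >= min_label_overlap:
                row[label] = 1
        rows.append(row)
        start += clip_duration
    return rows


# The three boxed annotations from the example Raven table
boxes = [(5, 45, "beep"), (30, 80, "boop"), (190, 220, "long_boop")]
rows = one_hot_clips(
    boxes,
    classes=["boop", "beep", "long_boop"],
    full_duration=240,
    clip_duration=60,
    min_label_overlap=0.25,
)
for row in rows:
    print(row)
```

Note that the "boop" box (30 to 80 s) straddles the first two clips, so both clips are labeled with it.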
It's this labels_dataframe we use to initialize a preprocessor, or train a model. e.g.:
from opensoundscape import CNN
model = CNN('resnet18',classes=["boop", "beep", "long_boop"],sample_duration=60.0)
model.train(train_df = labels_dataframe)
Selection View Channel Begin Time (s) End Time (s) Low Freq (Hz) High Freq (Hz) Detector Occupancy
1 Spectrogram 1 1 706.221645833 711.615395833 10.000 44.000 ABW 0.8000
2 Spectrogram 1 1 772.744562500 781.014979167 10.000 44.000 ABW 0.9130
3 Spectrogram 1 1 1007.552479167 1015.103729167 10.000 44.000 ABW 0.7619
4 Spectrogram 1 1 1080.547895833 1087.739562500 10.000 44.000 ABW 0.8500
5 Spectrogram 1 1 1184.827062500 1195.614562500 10.000 44.000 ABW 0.9000
6 Spectrogram 1 1 1258.901229167 1263.935395833 10.000 44.000 ABW 0.7857
7 Spectrogram 1 1 2045.669562500 2051.782479167 10.000 44.000 ABW 0.9412
8 Spectrogram 1 1 2119.743729167 2124.777895833 10.000 44.000 ABW 0.7143
9 Spectrogram 1 1 2864.440812500 2873.070812500 10.000 44.000 ABW 0.6250
10 Spectrogram 1 1 3049.266645833 3058.615812500 10.000 44.000 ABW 0.5769
11 Spectrogram 1 1 3193.099979167 3200.651229167 10.000 44.000 ABW 0.5238
12 Spectrogram 1 1 3270.050812500 3275.084979167 10.000 44.000 ABW 0.9286
13 Spectrogram 1 1 3495.149979167 3502.701229167 10.000 44.000 ABW 0.9048
14 Spectrogram 1 1 3883.859562500 3900.040812500 10.000 44.000 ABW 0.7556
15 Spectrogram 1 1 4457.394979167 4468.182479167 10.000 44.000 Noise 0.5667
16 Spectrogram 1 1 8254.954562500 8261.067479167 10.000 44.000 Noise 0.5882
17 Spectrogram 1 1 10833.167062500 10839.279979167 10.000 44.000 OW 0.4706
18 Spectrogram 1 1 11001.452062500 11008.643729167 10.000 44.000 OW 0.3500
19 Spectrogram 1 1 11171.534979167 11177.288312500 10.000 44.000 Noise 0.5000
20 Spectrogram 1 1 12216.843729167 12224.754562500 10.000 44.000 OW 0.4091
21 Spectrogram 1 1 12365.711229167 12373.981645833 10.000 44.000 OW 0.3043
22 Spectrogram 1 1 12612.025812500 12627.128312500 10.000 44.000 Noise 0.8095
23 Spectrogram 1 1 12706.596229167 12711.630395833 10.000 44.000 OW 0.6429
24 Spectrogram 1 1 13129.106645833 13136.298312500 10.000 44.000 OW 0.7000
25 Spectrogram 1 1 13332.990395833 13338.384145833 10.000 44.000 OW 0.8667
26 Spectrogram 1 1 13803.325395833 13810.157479167 10.000 44.000 OW 0.4211
27 Spectrogram 1 1 14027.705395833 14033.099145833 10.000 44.000 OW 0.5333
28 Spectrogram 1 1 14304.584562500 14312.854979167 10.000 44.000 Noise 0.7826
29 Spectrogram 1 1 14852.229979167 14859.062062500 10.000 44.000 OW 0.3158
30 Spectrogram 1 1 15253.884562500 15260.357062500 10.000 44.000 Noise 0.3889
31 Spectrogram 1 1 17387.292479167 17393.045812500 10.000 44.000 OW 0.5625
32 Spectrogram 1 1 18650.868312500 18658.419562500 10.000 44.000 Noise 0.7619
33 Spectrogram 1 1 19605.202479167 19616.709145833 10.000 44.000 Noise 0.6250
34 Spectrogram 1 1 19763.059562500 19774.925812500 10.000 44.000 Noise 0.7273
35 Spectrogram 1 1 19981.686229167 19991.754562500 10.000 44.000 Noise 0.4286
36 Spectrogram 1 1 20503.801229167 20523.578312500 10.000 44.000 Noise 0.3636
37 Spectrogram 1 1 23614.197062500 23622.467479167 10.000 44.000 ABW 1.0000
38 Spectrogram 1 1 25120.851229167 25129.481229167 10.000 44.000 ABW 1.0000
39 Spectrogram 1 1 28141.351229167 28159.689979167 10.000 44.000 Noise 0.5686
40 Spectrogram 1 1 30071.594562500 30079.505395833 10.000 44.000 ABW 0.5000
41 Spectrogram 1 1 30201.763729167 30207.876645833 10.000 44.000 ABW 1.0000
42 Spectrogram 1 1 30349.192895833 30359.980395833 10.000 44.000 ABW 1.0000
43 Spectrogram 1 1 30421.469145833 30432.616229167 10.000 44.000 ABW 0.9677
44 Spectrogram 1 1 30495.183729167 30503.813729167 10.000 44.000 ABW 1.0000
45 Spectrogram 1 1 30568.179145833 30578.966645833 10.000 44.000 ABW 1.0000
46 Spectrogram 1 1 30641.893729167 30654.119562500 10.000 44.000 ABW 0.9706
47 Spectrogram 1 1 30765.949979167 30776.737479167 10.000 44.000 ABW 1.0000
48 Spectrogram 1 1 30837.866645833 30849.732895833 10.000 44.000 ABW 1.0000
49 Spectrogram 1 1 31001.117479167 31006.870812500 10.000 44.000 Noise 0.4375
50 Spectrogram 1 1 31028.445812500 31034.918312500 10.000 44.000 ABW 1.0000
51 Spectrogram 1 1 31248.510812500 31256.781229167 10.000 44.000 ABW 0.9565
52 Spectrogram 1 1 31320.787062500 31329.417062500 10.000 44.000 ABW 0.9583
53 Spectrogram 1 1 31392.703729167 31401.333729167 10.000 44.000 ABW 0.9583
54 Spectrogram 1 1 31461.024562500 31470.733312500 10.000 44.000 ABW 0.9630
55 Spectrogram 1 1 31559.550395833 31568.899562500 10.000 44.000 ABW 0.9615
56 Spectrogram 1 1 31706.260395833 31718.126645833 10.000 44.000 ABW 1.0000
57 Spectrogram 1 1 31779.255812500 31791.481645833 10.000 44.000 ABW 1.0000
58 Spectrogram 1 1 31971.273312500 31982.060812500 10.000 44.000 ABW 0.9333
59 Spectrogram 1 1 32045.347479167 32056.494562500 10.000 44.000 ABW 0.9355
60 Spectrogram 1 1 32615.287062500 32621.759562500 10.000 44.000 Noise 0.3889
61 Spectrogram 1 1 33826.004145833 33835.712895833 10.000 44.000 ABW 0.8889
62 Spectrogram 1 1 36420.038312500 36431.544979167 10.000 44.000 Noise 0.3438
63 Spectrogram 1 1 38712.741645833 38725.327062500 10.000 44.000 Noise 0.5143
64 Spectrogram 1 1 39195.662062500 39211.483729167 10.000 44.000 Noise 0.5682
65 Spectrogram 1 1 39997.532895833 40011.916229167 10.000 44.000 Noise 0.4750
66 Spectrogram 1 1 40115.116645833 40120.150812500 10.000 44.000 OW 0.5714
67 Spectrogram 1 1 40503.466645833 40508.500812500 10.000 44.000 OW 0.4286
68 Spectrogram 1 1 41609.185395833 41617.815395833 10.000 44.000 OW 0.9167
69 Spectrogram 1 1 41786.100395833 41796.887895833 10.000 44.000 ABW 0.5000
70 Spectrogram 1 1 41803.719979167 41812.709562500 10.000 44.000 OW 0.8400
71 Spectrogram 1 1 41862.332062500 41871.681229167 10.000 44.000 ABW 0.8846
72 Spectrogram 1 1 41927.776229167 41938.923312500 10.000 44.000 ABW 0.7742
73 Spectrogram 1 1 42002.569562500 42011.559145833 10.000 44.000 ABW 0.9200
74 Spectrogram 1 1 42059.743312500 42066.934979167 10.000 44.000 OW 0.4000
75 Spectrogram 1 1 42078.441645833 42085.633312500 10.000 44.000 ABW 1.0000
76 Spectrogram 1 1 42152.515812500 42164.382062500 10.000 44.000 ABW 0.8182
77 Spectrogram 1 1 42228.028312500 42240.613729167 10.000 44.000 ABW 0.9429
78 Spectrogram 1 1 42375.817062500 42385.166229167 10.000 44.000 ABW 0.8846
79 Spectrogram 1 1 42448.093312500 42456.004145833 10.000 44.000 ABW 1.0000
80 Spectrogram 1 1 42516.054562500 42525.044145833 10.000 44.000 ABW 0.7200
81 Spectrogram 1 1 42587.252062500 42598.039562500 10.000 44.000 ABW 0.9333
82 Spectrogram 1 1 42660.966645833 42671.754145833 10.000 44.000 ABW 0.9000
83 Spectrogram 1 1 42732.523729167 42741.153729167 10.000 44.000 ABW 0.8750
84 Spectrogram 1 1 42901.168312500 42911.236645833 10.000 44.000 ABW 0.7500
85 Spectrogram 1 1 42977.759562500 42985.310812500 10.000 44.000 ABW 0.9524
86 Spectrogram 1 1 43008.683729167 43023.786229167 10.000 44.000 OW 0.7143
87 Spectrogram 1 1 43053.631645833 43060.463729167 10.000 44.000 ABW 0.8947
88 Spectrogram 1 1 43127.346229167 43137.054979167 10.000 44.000 ABW 0.8148
89 Spectrogram 1 1 43196.386229167 43206.814145833 10.000 44.000 OW 1.0000
90 Spectrogram 1 1 43224.433729167 43231.265812500 10.000 44.000 ABW 0.9474
91 Spectrogram 1 1 43257.515395833 43270.100812500 10.000 44.000 Noise 0.6000
92 Spectrogram 1 1 43384.088729167 43395.595395833 10.000 44.000 OW 1.0000
93 Spectrogram 1 1 43442.341229167 43452.049979167 10.000 44.000 ABW 0.9630
94 Spectrogram 1 1 43513.898312500 43523.966645833 10.000 44.000 ABW 0.9643
95 Spectrogram 1 1 43658.091229167 43666.002062500 10.000 44.000 ABW 1.0000
96 Spectrogram 1 1 43710.230812500 43719.220395833 10.000 44.000 ABW 0.9200
97 Spectrogram 1 1 43788.260395833 43802.284145833 10.000 44.000 Noise 0.5128
98 Spectrogram 1 1 43842.917062500 43851.187479167 10.000 44.000 ABW 1.0000
99 Spectrogram 1 1 43918.069979167 43923.463729167 10.000 44.000 ABW 0.7333
100 Spectrogram 1 1 43987.829145833 43996.818729167 10.000 44.000 ABW 0.6000
101 Spectrogram 1 1 44075.927062500 44085.276229167 10.000 44.000 OW 0.8846
102 Spectrogram 1 1 44129.504979167 44134.898729167 10.000 44.000 ABW 0.8000
103 Spectrogram 1 1 44261.831645833 44274.776645833 10.000 44.000 Noise 0.8333
104 Spectrogram 1 1 44456.006645833 44465.715395833 10.000 44.000 OW 0.8148
105 Spectrogram 1 1 44782.508312500 44788.261645833 10.000 44.000 OW 0.7500
106 Spectrogram 1 1 44899.732479167 44914.834979167 10.000 44.000 Noise 0.7619
107 Spectrogram 1 1 45057.949145833 45066.579145833 10.000 44.000 OW 0.9583
108 Spectrogram 1 1 45151.800395833 45158.632479167 10.000 44.000 ABW 0.8947
109 Spectrogram 1 1 45279.452479167 45290.239979167 10.000 44.000 Noise 0.5000
110 Spectrogram 1 1 48189.200812500 48195.313729167 10.000 44.000 ABW 0.5294
111 Spectrogram 1 1 48261.836645833 48270.826229167 10.000 44.000 ABW 0.7600
112 Spectrogram 1 1 48331.955395833 48343.102479167 10.000 44.000 ABW 0.3226
113 Spectrogram 1 1 48410.704145833 48418.614979167 10.000 44.000 ABW 0.7727
114 Spectrogram 1 1 49170.503729167 49177.335812500 10.000 44.000 ABW 0.9474
115 Spectrogram 1 1 49242.779979167 49251.769562500 10.000 44.000 ABW 0.8400
116 Spectrogram 1 1 49319.371229167 49328.720395833 10.000 44.000 ABW 0.4615
117 Spectrogram 1 1 49399.558312500 49407.109562500 10.000 44.000 ABW 1.0000
118 Spectrogram 1 1 49478.666645833 49487.296645833 10.000 44.000 ABW 0.9583
119 Spectrogram 1 1 49554.179145833 49559.572895833 10.000 44.000 ABW 1.0000
120 Spectrogram 1 1 49624.297895833 49629.332062500 10.000 44.000 ABW 0.5000
121 Spectrogram 1 1 49683.629145833 49700.169979167 10.000 44.000 Noise 0.6957
122 Spectrogram 1 1 49766.333312500 49774.603729167 10.000 44.000 ABW 0.3913
123 Spectrogram 1 1 49824.226229167 49829.619979167 10.000 44.000 Noise 0.4667
124 Spectrogram 1 1 49907.649562500 49918.796645833 10.000 44.000 Noise 0.6452
125 Spectrogram 1 1 50440.192479167 50448.462895833 10.000 44.000 Noise 0.3043
126 Spectrogram 1 1 50489.814979167 50502.400395833 10.000 44.000 ABW 0.4571
127 Spectrogram 1 1 51576.475812500 51585.824979167 10.000 44.000 ABW 0.7308
128 Spectrogram 1 1 51903.337062500 51920.237479167 10.000 44.000 Noise 0.5957
129 Spectrogram 1 1 52399.562062500 52408.911229167 10.000 44.000 ABW 0.3077
130 Spectrogram 1 1 52468.961645833 52477.591645833 10.000 44.000 ABW 0.9583
131 Spectrogram 1 1 54615.314562500 54627.540395833 10.000 44.000 Noise 0.3235
132 Spectrogram 1 1 55712.762895833 55723.190812500 10.000 44.000 Noise 0.3103
133 Spectrogram 1 1 55826.031645833 55831.425395833 10.000 44.000 Noise 0.3333
134 Spectrogram 1 1 56225.528729167 56244.586645833 10.000 44.000 Noise 0.8868
135 Spectrogram 1 1 57152.894145833 57166.198729167 10.000 44.000 Noise 0.7027
136 Spectrogram 1 1 59461.419145833 59473.285395833 10.000 44.000 Noise 0.5152
137 Spectrogram 1 1 65877.464562500 65882.858312500 10.000 44.000 Noise 0.3333
138 Spectrogram 1 1 66796.919145833 66805.189562500 10.000 44.000 Noise 0.4348
139 Spectrogram 1 1 67812.742062500 67827.844562500 10.000 44.000 Noise 0.5714
140 Spectrogram 1 1 69002.962895833 69016.627062500 10.000 44.000 Noise 0.3158
141 Spectrogram 1 1 69022.739979167 69040.359562500 10.000 44.000 Noise 0.9592
142 Spectrogram 1 1 69378.367895833 69389.874562500 10.000 44.000 Noise 0.8438
143 Spectrogram 1 1 73287.757895833 73294.589979167 10.000 44.000 OW 0.3684
144 Spectrogram 1 1 73935.007895833 73945.795395833 10.000 44.000 ABW 0.5667
145 Spectrogram 1 1 74579.740812500 74586.932479167 10.000 44.000 OW 0.3500
146 Spectrogram 1 1 75045.041645833 75052.233312500 10.000 44.000 OW 0.6500
147 Spectrogram 1 1 75333.067895833 75339.180812500 10.000 44.000 OW 0.4118
148 Spectrogram 1 1 75423.323312500 75431.593729167 10.000 44.000 ABW 0.5652
149 Spectrogram 1 1 75574.348312500 75584.776229167 10.000 44.000 ABW 0.8276
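One possible way to approach the multi-file question, sketched in plain Python: the begin and end times in the table above run well past 900 s, which suggests they are cumulative across the recordings. Assuming (and this is only an assumption) that the 85 files are each exactly 900 s long and were concatenated in order with no gaps, each selection can be mapped back to a file index and a time within that file. The helper name `to_file_relative` is hypothetical.

```python
def to_file_relative(begin_s, end_s, file_duration_s=900.0):
    """Map cumulative (begin, end) times to (file_index, local_begin, local_end).

    Assumes every file lasts exactly `file_duration_s` seconds and that the
    files were concatenated in order without gaps. A box that crosses a file
    boundary is clipped to the file in which it starts.
    """
    file_index = int(begin_s // file_duration_s)
    offset = file_index * file_duration_s
    local_begin = begin_s - offset
    local_end = min(end_s - offset, file_duration_s)
    return file_index, local_begin, local_end


# First and last selections from the table above
first = to_file_relative(706.221645833, 711.615395833)
last = to_file_relative(75574.348312500, 75584.776229167)
print(first)
print(last)
```

The per-file times could then be written out as one selection table per audio file, which matches the paired lists of Raven files and audio files passed to `from_raven_files` earlier in this thread. If the recordings are not all exactly 900 s long, the offsets would need to come from the real file durations instead.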
-
Hello,
I would really appreciate your help. I am attempting to pre-process a 15-minute audio (.wav) file: split it into 60 s clips, build the FFT, apply augmentation etc., and pair the clips with boxed annotations of an Antarctic blue whale call derived from a single Raven selection table.
I've been using the AudioSplittingDataset class to split the 15-minute files into consecutive 60 s clips and the SpectrogramPreprocessor class to build the FFT and do the augmentation. However, I have so far been unable to pair the annotations with the clips in the dataset. The tutorial example for AudioSplittingDataset doesn't show how to do this. Please could you provide some instruction for this task?
Thank you very much in advance,
Lorenzo