Add training data extraction to segments tab #463

kahst · 2024-10-07T15:47:23Z

We should add the extraction of training samples based on annotation files (e.g., selection tables) to the segments tab, so that people can use that to build their training data sets without having to manually extract snippets through Raven.

This is how it could work:

Create two options in the segments tab: "Create segments for review" and "Create segments for training".
For training segments:
- parse the selection table and convert timestamps to fixed-length segments (e.g., 3 s) based on overlap with the annotation's bounding box
  - 4.2-7.2 becomes 3.0-6.0 and 6.0-9.0
  - 2.9-6.2 becomes only 3.0-6.0, because its overlap with 0.0-3.0 (0.1 s) and with 6.0-9.0 (0.2 s) is below the minimum overlap (0.5 s in this case)
- merge labels for each segment
- save in folder with multi-label format that is compatible with our training data workflow
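
The timestamp conversion and label merging described above could look roughly like the following. This is only a minimal sketch, not an existing implementation: the function names `assign_segments` and `merge_labels`, the `(start, end, label)` tuple layout, and the example labels are all hypothetical, and it assumes segments are aligned to multiples of the segment length starting at 0.0.

```python
import math
from collections import defaultdict


def assign_segments(start, end, seg_length=3.0, min_overlap=0.5):
    """Return the fixed-length (seg_start, seg_end) windows that overlap
    the annotation [start, end] by at least min_overlap seconds.

    Hypothetical helper; windows are assumed to start at multiples of seg_length.
    """
    first = math.floor(start / seg_length)  # index of first window touched
    last = math.ceil(end / seg_length)      # one past the last window touched
    segments = []
    for i in range(first, last):
        seg_start = i * seg_length
        seg_end = seg_start + seg_length
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap >= min_overlap:
            segments.append((seg_start, seg_end))
    return segments


def merge_labels(annotations, seg_length=3.0, min_overlap=0.5):
    """Collect all labels that fall into the same segment.

    annotations: iterable of (start, end, label) tuples (assumed layout).
    Returns a dict mapping (seg_start, seg_end) to a sorted list of labels.
    """
    merged = defaultdict(set)
    for start, end, label in annotations:
        for seg in assign_segments(start, end, seg_length, min_overlap):
            merged[seg].add(label)
    return {seg: sorted(labels) for seg, labels in sorted(merged.items())}


# The two examples from the list above, with made-up labels:
print(assign_segments(4.2, 7.2))  # [(3.0, 6.0), (6.0, 9.0)]
print(assign_segments(2.9, 6.2))  # [(3.0, 6.0)]
print(merge_labels([(4.2, 7.2, "species_a"), (2.9, 6.2, "species_b")]))
```

The merged dictionary would then be the input for writing the multi-label training folders; how those folders are named is left open here.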

max-mauermann · 2024-10-25T14:32:56Z

How do we want to handle annotations that straddle the boundary between two segments but don't have enough overlap with either one?
For example: 2.9-3.3

kahst · 2024-10-26T15:00:42Z

I'd say we assign these to the segment with the most overlap (in your case 3-6). Since the annotation is so short, we can still assume that a significant percentage of the marked vocalization is covered by the segment.
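
A rough sketch of that fallback rule, under the same assumptions as the proposal (3 s segments aligned to multiples of the segment length, 0.5 s minimum overlap). The function name `assign_with_fallback` is hypothetical, and breaking ties in favor of the earlier segment is an arbitrary choice made for this sketch:

```python
import math


def assign_with_fallback(start, end, seg_length=3.0, min_overlap=0.5):
    """Assign an annotation to fixed-length segments.

    Segments with at least min_overlap seconds of overlap are kept. If no
    segment qualifies, fall back to the single segment with the most overlap
    (ties go to the earlier segment; this tie rule is an assumption).
    """
    first = math.floor(start / seg_length)
    last = math.ceil(end / seg_length)
    candidates = []
    for i in range(first, last):
        seg_start = i * seg_length
        seg_end = seg_start + seg_length
        overlap = min(end, seg_end) - max(start, seg_start)
        candidates.append((overlap, (seg_start, seg_end)))

    kept = [seg for overlap, seg in candidates if overlap >= min_overlap]
    if kept or not candidates:
        return kept
    # No segment reached the threshold: pick the one with the largest overlap.
    return [max(candidates, key=lambda c: c[0])[1]]


print(assign_with_fallback(2.9, 3.3))  # [(3.0, 6.0)]
```

For 2.9-3.3 the overlap is about 0.1 s with 0.0-3.0 and about 0.3 s with 3.0-6.0, so neither reaches 0.5 s and the annotation ends up in 3.0-6.0.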

kahst assigned max-mauermann Oct 7, 2024

kahst added the "type: enhancement" (New feature or request) and "area: Analyzer" labels Oct 7, 2024
