Add Word-Level Alignment #215

Open · wants to merge 28 commits into main

Conversation

@ibevers (Collaborator) commented Dec 5, 2024

Description

  • Fixes a bug that was leading to incorrect alignments.
  • Updates the output to include transcription-, sentence-, word-, and character-level timestamps.
  • Adds a recursive evaluation function that checks that the timestamp differences between two alignments stay under a given threshold (see the sketch after this list).
  • Adds a test validating that the generated alignments match a publicly available alignment within a tolerance of 0.1 seconds for every start and end timestamp at every aligned level present in both alignments.
  • Makes greater use of SenseLab native data types (the code can be fully converted to SenseLab types in the future).
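
A minimal sketch of the recursive threshold check described above; the real senselab compare_alignments may have a different signature, and the AlignmentNode stand-in below only mirrors the ScriptLine fields used in this PR (text, start, end, chunks):

```python
# Hedged sketch only, not the PR's exact implementation.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignmentNode:
    text: str
    start: Optional[float] = None
    end: Optional[float] = None
    chunks: Optional[List["AlignmentNode"]] = None


def compare_alignments(a: AlignmentNode, b: AlignmentNode, tolerance: float = 0.1) -> None:
    """Recursively assert that every shared start/end timestamp differs by <= tolerance seconds."""
    if a.start is not None and b.start is not None:
        assert abs(a.start - b.start) <= tolerance, f"start mismatch for {a.text!r}"
    if a.end is not None and b.end is not None:
        assert abs(a.end - b.end) <= tolerance, f"end mismatch for {a.text!r}"
    # Only recurse into levels that are present in both alignments.
    if a.chunks and b.chunks:
        for child_a, child_b in zip(a.chunks, b.chunks):
            compare_alignments(child_a, child_b, tolerance)
```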

Related Issue(s)

#14

Motivation and Context

Corrects buggy forced alignment and adds more granular (word- and character-level) alignment, which is useful for a variety of downstream purposes. This is a character-based approach; I plan to add a phoneme-based approach in the future to compare against it.

How Has This Been Tested?

Added a test that compares the generated alignments with a publicly available alignment. A more comprehensive evaluation on TIMIT is planned as soon as possible.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My code follows the code style of this project.

@ibevers ibevers linked an issue Dec 5, 2024 that may be closed by this pull request
@ibevers (Collaborator Author) commented Dec 5, 2024

  • Identify where the timestamps are calculated for indices
  • Ensure indices for words are present at that point
  • Update the code to calculate timestamps for words (see the sketch after this list)
  • Ensure word timestamps are stored in the final result
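
A rough sketch of the word-timestamp step above, illustrative only: names like CharSpan and frame_duration_s are assumptions, not the PR's actual code. The idea is to group character-level spans from a CTC forced alignment into words and convert frame indices to seconds.

```python
# Illustrative sketch, not the PR's implementation.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CharSpan:
    char: str
    start_frame: int
    end_frame: int


def word_timestamps(chars: List[CharSpan], frame_duration_s: float) -> List[Tuple[str, float, float]]:
    """Return (word, start_s, end_s) tuples by grouping character spans at spaces."""
    words: List[List[CharSpan]] = []
    current: List[CharSpan] = []
    for span in chars:
        if span.char == " ":
            if current:
                words.append(current)
                current = []
        else:
            current.append(span)
    if current:
        words.append(current)
    return [
        ("".join(s.char for s in w), w[0].start_frame * frame_duration_s, w[-1].end_frame * frame_duration_s)
        for w in words
    ]
```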

@ibevers (Collaborator Author) commented Dec 17, 2024

Update 12/09:
Reviewed code and identified what needs to be changed. Still figuring out how exactly it should be changed.

Update 12/17:
The code works and returns an aligned ScriptLine for sentences, words, and characters. I am now fixing discrepancies with the previous tests. I also plan to add a correctness test for at least one labeled example.

@ibevers ibevers marked this pull request as ready for review December 19, 2024 17:49
@ibevers (Collaborator Author) commented Dec 19, 2024

@fabiocat93 Would you kindly allow the workflow to run tests on this? All the forced alignment-relevant tests pass. I haven't run all the tests locally because I still have to address the squim issue.

@fabiocat93 (Collaborator) left a comment

I confirm the forced alignment tests don't pass (the rest of the tests pass with no problem). I have also left some more comments. Moreover, I would suggest uncommenting and fixing transcribe_timestamped and its related tests.

```diff
@@ -182,7 +172,8 @@ def test_align_transcriptions_fixture(resampled_mono_audio_sample: Audio, script
     aligned_transcriptions = align_transcriptions(audios_and_transcriptions_and_language)
     assert len(aligned_transcriptions) == 2
     assert len(aligned_transcriptions[0]) == 1
-    assert aligned_transcriptions[0][0].text == "test"
+    if aligned_transcriptions[0][0]:
+        assert aligned_transcriptions[0][0].text == "test"
```
Collaborator:

@ibevers, how is the expected transcript text "test"?

Collaborator Author:

@fabiocat93 Good point--will change to match the audio file

```python
audios_and_transcriptions_and_language = [
    (resampled_had_that_curiosity_audio_sample, script_line_fixture_curiosity, Language(language_code="en"))
]
aligned_transcriptions = align_transcriptions(audios_and_transcriptions_and_language)
```
Collaborator:

aligned_transcriptions is [[ScriptLine(text='I had that curiosity beside me at this moment', speaker=None, start=0.664, end=3.098, chunks=[ScriptLine(text='I had that curiosity beside me at this moment.', speaker=None, start=0.664, end=3.098, chunks=[ScriptLine(text='I', speaker=None, start=0.664, end=0.684, chunks=[ScriptLine(text='I', speaker=None, start=0.664, end=0.684, chunks=None)]), ScriptLine(text='had', speaker=None, start=0.684, end=0.845, chunks=[ScriptLine(text='', speaker=None, start=0.684, end=0.704, chunks=None), ScriptLine(text='h', speaker=None, start=0.704, end=0.765, chunks=None), ScriptLine(text='a', speaker=None, start=0.765, end=0.805, chunks=None), ScriptLine(text='d', speaker=None, start=0.805, end=0.845, chunks=None)]), ScriptLine(text='that', speaker=None, start=0.845, end=1.026, chunks=[ScriptLine(text='', speaker=None, start=0.845, end=0.885, chunks=None), ...

Something is wrong with this. Chunks should include the words and not the entire sentence again... @ibevers

Collaborator:

Also, why do words have chunks? What's the point? Can you clarify that to me please?

@ibevers (Collaborator Author), Dec 23, 2024:

Something is wrong with this. Chunks should include the words and not the entire sentence again...

@fabiocat93 Nothing is wrong here. Input transcriptions are aligned at the sentence level, so a transcription that is one sentence long will have a chunk for the sentence.

@ibevers (Collaborator Author), Dec 23, 2024:

Also, why do words have chunks? What's the point? Can you clarify that to me please?

@fabiocat93 Words have chunks for transparency into how the alignment was done and because having character-level alignments will allow easy mapping to phoneme-level alignments in the future.
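
For concreteness, a hedged illustration of that nesting using plain dicts that mirror the ScriptLine repr pasted above (not the senselab class itself); the values come from the word "had" in that output:

```python
# Illustration only: the aligned output nests sentence -> word -> character,
# so each word's chunks are its character-level alignments.
word_had = {
    "text": "had",
    "start": 0.684,
    "end": 0.845,
    "chunks": [  # character-level spans, useful for later phoneme mapping
        {"text": "", "start": 0.684, "end": 0.704},   # leading blank from the space token
        {"text": "h", "start": 0.704, "end": 0.765},
        {"text": "a", "start": 0.765, "end": 0.805},
        {"text": "d", "start": 0.805, "end": 0.845},
    ],
}
```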

Collaborator:

We should conceptually and internally agree on what ScriptLine contains. If you imagine a theatre script, you have a collection of sentences by potentially multiple speakers, represented as an ordered list of sentences. Analogously, in our case, a script can be represented as a collection of ScriptLine objects, where each ScriptLine contains one sentence (which in turn contains words). I would keep at most one sentence per ScriptLine, if only for consistency within the group.
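
A hedged illustration of the convention proposed here, using plain dicts rather than the actual ScriptLine API: the script is an ordered list of entries, each holding exactly one sentence whose chunks are its words (timestamps taken from the output pasted above).

```python
# Illustration of the proposed convention, not an agreed-upon API.
script = [
    {
        "text": "I had that curiosity beside me at this moment.",
        "speaker": None,
        "start": 0.664,
        "end": 3.098,
        "chunks": [
            {"text": "I", "start": 0.664, "end": 0.684},
            {"text": "had", "start": 0.684, "end": 0.845},
            # ... one entry per word
        ],
    },
    # ... one ScriptLine-like entry per sentence
]
```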

Collaborator Author:

I see what you're thinking--in a theater script, a particular character might say multiple sentences in a row though, so if we make a list of sentences, it won't exactly correspond to the structure of a theater script. Previously, I was thinking of ScriptLine as a generic data structure for hierarchically representing text--I can change it to be an ordered list of sentence ScriptLine objects. Perhaps we can discuss this at Biometrics?

@ibevers (Collaborator Author) commented Dec 23, 2024

I confirm the forced alignment tests don't pass (the rest of the tests pass with no problem). I have also left some more comments. Moreover, I would suggest uncommenting and fixing transcribe_timestamped and its related tests.

@fabiocat93 Are you referring to these tests or other tests? They pass on my machine.
[Screenshot: local test results, 2024-12-23]

@ibevers (Collaborator Author) commented Dec 23, 2024

Moreover, I would suggest uncommenting and fixing transcribe_timestamped and its related tests.

@fabiocat93 Will do, thanks!

@ibevers (Collaborator Author) commented Jan 2, 2025

Moreover, I would suggest uncommenting and fixing transcribe_timestamped and its related tests.

@fabiocat93 Will do, thanks!

@fabiocat93 This is outside the scope of this PR. I will fix the transcribe_timestamped workflow here: #231

@ibevers (Collaborator Author) commented Jan 2, 2025

@fabiocat93 I'm trying to figure out why the tests passed on my machine but not on the GitHub Actions server. I'm seeing that the forced alignment tests failed because of worker segfaults. According to ChatGPT:

Using pytest-xdist can spawn multiple workers. Wav2Vec2 (and underlying native libraries) occasionally segfaults when run in parallel on certain hardware/OS combos. On a local machine, you may be running tests serially (or on different hardware), so it doesn’t crash.

I'll try to confirm that this is the source of the issue.
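
One hedged way to check that suspicion locally (illustrative only; the test path comes from this PR, but the exact CI invocation may differ): run the forced alignment tests once with xdist disabled and once with parallel workers, and see whether only the parallel run segfaults.

```python
# Illustrative sketch: compare a serial run (xdist disabled) with a parallel run.
import sys

import pytest

TEST_PATH = "src/tests/audio/tasks/forced_alignment_test.py"

if __name__ == "__main__":
    parallel = "--parallel" in sys.argv
    flags = ["-n", "2"] if parallel else ["-p", "no:xdist"]  # two workers vs. fully serial
    raise SystemExit(pytest.main([TEST_PATH, *flags]))
```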

@ibevers ibevers mentioned this pull request Jan 2, 2025
@ibevers (Collaborator Author) commented Jan 2, 2025

@fabiocat93 This may be related, as I believe wav2vec2 uses some C++ modules under the hood:
pytest-dev/pytest-xdist#126

@ibevers (Collaborator Author) commented Jan 2, 2025

@fabiocat93 I realized you gave me permission to run the tests on my own--thanks! I tried running them serially, and got a segfault again.

@ibevers (Collaborator Author) commented Jan 2, 2025

@fabiocat93 The tests all call align_transcriptions. They all fail with a segfault in the same place: "/Users/runner/work/senselab/senselab/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 2229 in forward

@ibevers (Collaborator Author) commented Jan 2, 2025

@fabiocat93 I figured out the issue!

Nested-virtualization and Metal Performance Shaders (MPS) are not supported due to the limitation of Apple's Virtualization Framework.
https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

I changed the device type used to CPU and the macOS tests worked. I believe this is because the _select_device_and_dtype() function selects MPS on the GitHub Actions macOS runner.
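
A hedged sketch of the workaround; this is not senselab's _select_device_and_dtype() (the torch calls below are standard, but the helper name pick_device is made up for illustration):

```python
# Sketch: fall back to CPU unless CUDA is available, and only consider MPS when
# explicitly allowed (e.g. never on GitHub-hosted macOS runners, where MPS is
# unusable under Apple's Virtualization Framework).
import torch


def pick_device(allow_mps: bool = False) -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    if allow_mps and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device(allow_mps=False)  # CI-safe default
```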

@codecov-commenter commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 96.26168% with 4 lines in your changes missing coverage. Please review.

Project coverage is 66.39%. Comparing base (113721a) to head (650e5f5).
Report is 84 commits behind head on main.

Files with missing lines                                  Patch %   Lines
...b/audio/tasks/forced_alignment/forced_alignment.py     94.73%    3 Missing ⚠️
src/tests/audio/tasks/forced_alignment_test.py            97.05%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #215      +/-   ##
==========================================
+ Coverage   60.24%   66.39%   +6.15%     
==========================================
  Files         113      120       +7     
  Lines        4017     4232     +215     
==========================================
+ Hits         2420     2810     +390     
+ Misses       1597     1422     -175     


@ibevers (Collaborator Author) commented Jan 2, 2025

Related: #232

@ibevers ibevers requested a review from fabiocat93 January 2, 2025 22:06
@fabiocat93 (Collaborator):

@fabiocat93 I figured out the issue!

Nested-virtualization and Metal Performance Shaders (MPS) are not supported due to the limitation of Apple's Virtualization Framework.
https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

I changed the device type used to CPU and the macOS tests worked. I believe this is because the _select_device_and_dtype() function selects MPS on the GitHub Actions macOS runner.

_select_device_and_dtype() selects the fastest device among those supported by the model. You can specify the supported devices and exclude MPS; that way, the model will run only on CUDA or CPU, depending on availability.

@fabiocat93 (Collaborator) left a comment


It looks OK to me, but I would ask you to address my comment above related to ScriptLine before merging it. I would maybe also ask some others to see what they think.

Collaborator:

I can see some useful scenarios for using compare_alignments. Thank you for implementing it!

Collaborator Author:

No problem, glad to hear!

@fabiocat93 (Collaborator) left a comment

It looks good overall, but I recommend addressing my earlier comment about ScriptLine before proceeding with the merge. You might also consider gathering feedback from others in the group to ensure alignment

@ibevers (Collaborator Author) commented Jan 10, 2025

_select_device_and_dtype() selects the fastest device among those supported by the model. You can specify the supported devices and exclude MPS; that way, the model will run only on CUDA or CPU, depending on availability.

@fabiocat93 Ah, okay, good to know. I will update accordingly!

@ibevers (Collaborator Author) commented Jan 10, 2025

It looks good overall, but I recommend addressing my earlier comment about ScriptLine before proceeding with the merge. You might also consider gathering feedback from others in the group to ensure alignment

I'll add it to the Biometrics agenda to ensure alignment :)

Successfully merging this pull request may close these issues.

Task: Word-Level Alignment