<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<title>Speaker Diarization scripts README</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
</head>
<body>
<h1 id="speaker-diarization-scripts-readme">Speaker Diarization scripts README</h1>
<p>This README describes the scripts available for manual segmentation of media files (for annotation or other purposes), for speaker diarization, and for converting to and from the file formats of several related tools.</p>
<p>The scripts are written in either <code>python2</code> or <code>perl</code>; interpreters for both should be readily available.</p>
<p>Please send any questions/suggestions to <a href="mailto:[email protected]">[email protected]</a></p>
<h2 id="installation-instructions">Installation instructions</h2>
<p>Most of these scripts depend on the <code>aku</code> tools that are part of the <code>AaltoASR</code> package that you can find <a href="https://github.com/aalto-speech/AaltoASR">here</a>. You should compile that for your platform first, following <a href="https://github.com/aalto-speech/AaltoASR/blob/develop/INSTALLATION.md">these</a> instructions.</p>
<p>In this <code>speaker-diarization</code> directory:</p>
<ul>
<li>Add a symlink to the folder <code>AaltoASR/</code></li>
<li>Add a symlink to the folder <code>AaltoASR/build</code></li>
<li>Add a symlink to <code>AaltoASR/build/aku/feacat</code></li>
<li>Make sure the <code>ffmpeg</code> executable is on the PATH, or add a symlink to it as well.</li>
</ul>
<p>For example, if you have cloned and built <code>AaltoASR</code> into <code>../AaltoASR</code> (relative to <code>speaker-diarization</code>), the following would work:</p>
<div class="sourceCode"><pre class="sourceCode bash"><code class="sourceCode bash"><span class="ex">speaker-diarization</span>$ ln -s ../AaltoASR ./
<span class="ex">speaker-diarization</span>$ ln -s ../AaltoASR/build ./
<span class="ex">speaker-diarization</span>$ ln -s ../AaltoASR/build/aku/feacat ./</code></pre></div>
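<p>To check whether <code>ffmpeg</code> is already on the PATH, and to add a symlink if it is not, something along these lines should work; <code>/path/to/ffmpeg</code> is only a placeholder for wherever your ffmpeg binary actually lives:</p>
<pre><code># prints the ffmpeg location if it is on the PATH
$ command -v ffmpeg
# if it is not, symlink it into this directory instead
$ ln -s /path/to/ffmpeg ./ffmpeg</code></pre>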
<p>You probably want to use <code>spk-diarization2.py</code>, since that one calls the <em>2</em> versions of some scripts, while <code>spk-diarization.py</code> uses an old, <code>matlab</code>-based <code>VAD</code> that is deprecated and hard to configure.</p>
<h2 id="mseg.py">mseg.py</h2>
<p>Script to help perform manual segmentation of a media file; the file can be of any type supported by <code>mplayer</code>. Its only dependency is the <code>Python-mplayer</code> wrapper, which can be installed locally by executing:</p>
<pre><code>$ pip install --user mplayer.py</code></pre>
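<p>Note that the wrapper only drives the system <code>mplayer</code> binary, so <code>mplayer</code> itself also needs to be installed. How you install it depends on your platform; on a Debian-based system, for example, something like the following would do:</p>
<pre><code># assumes a Debian/Ubuntu-style package manager
$ sudo apt-get install mplayer</code></pre>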
<p>After that, running it is as simple as:</p>
<pre><code>$ ./mseg.py /path/to/mediafile -o outputfile</code></pre>
<p>The output file is optional. To continue a previously saved segmentation session, it also supports the invocation:</p>
<pre><code>$ ./mseg.py /path/to/mediafile -o outputfile -i inputfile</code></pre>
<p>Once in the program, the controls are:</p>
<ul>
<li>Quit: <em>esc</em> or <em>q</em></li>
<li>Pause: <em>p</em></li>
<li>Mark position: <em>space</em></li>
<li>Manually edit mark: <em>e</em></li>
<li>Add manual mark: <em>a</em></li>
<li>Remove mark: <em>r</em></li>
<li>Faster speed: <em>Up</em></li>
<li>Slower speed: <em>Down</em></li>
<li>Rewind: <em>Left</em></li>
<li>Fast Forward: <em>Right</em></li>
<li>Scroll down marks: <em>pgDwn</em></li>
<li>Scroll up marks: <em>pgUp</em></li>
</ul>
<p>The media file starts out paused, so to start playback just hit the <em>p</em> key.</p>
<h2 id="mseg2elan.py">mseg2elan.py</h2>
<p>Script to convert from mseg output to Elan file format.</p>
<p>Usage:</p>
<pre><code>$ ./mseg2elan.py msoutputfile -o outputfile</code></pre>
<p>If <code>outputfile</code> is not specified, the output will be written to <code>stdout</code>. Once in Elan, segments can be easily fine-tuned by switching to Segmentation Mode (Options-&gt;Segmentation Mode).</p>
<h2 id="aku2elan.py">aku2elan.py</h2>
<p>Script to convert from AKU recipes to Elan file format.</p>
<p>Usage:</p>
<pre><code>$ ./aku2elan.py recipe -o outputfile</code></pre>
<p>If <code>outputfile</code> is not specified, the output will be written to <code>stdout</code>. Once in Elan, segments can be easily fine-tuned by switching to Segmentation Mode (Options-&gt;Segmentation Mode).</p>
<h2 id="elan2aku.py">elan2aku.py</h2>
<p>Script to convert from Elan file format to AKU recipes.</p>
<p>Usage:</p>
<pre><code>$ ./elan2aku.py elanoutputfile -o akurecipe</code></pre>
<p>If <code>akurecipe</code> is not specified, the output will be written to <code>stdout</code>.</p>
<h2 id="mseg_to_textgrid.pl">mseg_to_textgrid.pl</h2>
<p>Script to convert from mseg output to the Praat TextGrid format.</p>
<p>Usage:</p>
<pre><code>$ perl mseg_to_textgrid.pl msfile > outputfile</code></pre>
<p>If the output is not redirected to <code>outputfile</code>, it will be written to <code>stdout</code>.</p>
<h2 id="voice-detection2.py">voice-detection2.py</h2>
<p>Creates an <code>AKU</code> recipe from the <code>generate_exp.py</code> output (<code>.exp</code> files).</p>
<p>For full help, use:</p>
<pre><code>$ ./voice-detection2.py -h</code></pre>
<h2 id="vad-performance.py">vad-performance.py</h2>
<p>Rates the performance of a Voice Activity Detection recipe in <code>AKU</code> format, such as those created with <code>voice-detection.py</code>. To measure the performance, a second recipe containing the ground truth must be provided.</p>
<p>For full help, use:</p>
<pre><code>$ ./vad-performance.py -h</code></pre>
<h2 id="spk-change-detection.py">spk-change-detection.py</h2>
<p>Performs speaker turn segmentation over audio, using a distance measure such as GLR, KL2 or BIC, and a sliding or growing window. It requires an input recipe file in <code>AKU</code> format pointing to the audio files (preferably with speech/non-speech turns already processed), plus a features file for each wav to process, in the format produced by the <code>feacat</code> program of the <code>AKU</code> suite.</p>
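<p>For reference, a recipe in this format is a plain-text file with one segment per line, written as space-separated <code>key=value</code> fields. The exact fields depend on the processing stage, so the lines below are only an illustration of the general shape (field names and values here are assumptions, not the output of any particular script):</p>
<pre><code>audio=/path/to/recording.wav start-time=0.00 end-time=5.32 speaker=speaker_1
audio=/path/to/recording.wav start-time=5.32 end-time=11.70 speaker=speaker_2</code></pre>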
<p>For full help, use:</p>
<pre><code>$ ./spk-change-detection.py -h</code></pre>
<h2 id="spk-change-performance.py">spk-change-performance.py</h2>
<p>Rates the performance of a speaker turn segmentation recipe in <code>AKU</code> format, such as those created with <code>spk-change-detection.py</code>. To measure the performance, a second recipe containing the ground truth must be provided.</p>
<p>For full help, use:</p>
<pre><code>$ ./spk-change-performance.py -h</code></pre>
<h2 id="spk-clustering.py">spk-clustering.py</h2>
<p>Performs speaker turn clustering over audio. It requires a speaker segmentation recipe in <code>AKU</code> format, such as those created with <code>spk-change-detection.py</code>, and a features file for each wav file to process, in the format produced by the <code>feacat</code> program of the <code>AKU</code> suite.</p>
<p>For full help, use:</p>
<pre><code>$ ./spk-clustering.py -h</code></pre>
<h2 id="spk-time.py">spk-time.py</h2>
<p>Calculates per-speaker speaking time from a speaker-tagged recipe in <code>AKU</code> format.</p>
<p>For full help, use:</p>
<pre><code>$ ./spk-time.py -h</code></pre>
<h2 id="spk-diarization2.py">spk-diarization2.py</h2>
<p>Performs full speaker diarization over a media file. If the media is not a <code>wav</code> file, it first tries to convert it to wav using <code>ffmpeg</code>. It then calls <code>generate_exp.py</code>, <code>voice-detection.py</code>, <code>spk-change-detection.py</code> and <code>spk-clustering.py</code> in succession.</p>
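<p>The conversion step is roughly equivalent to running <code>ffmpeg</code> by hand, along the lines of the command below; the mono/16 kHz settings are illustrative defaults, not necessarily the exact parameters the script uses:</p>
<pre><code># convert any input media to a mono 16 kHz wav (illustrative settings)
$ ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 -acodec pcm_s16le input.wav</code></pre>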
<p>For full help, use:</p>
<pre><code>$ ./spk-diarization2.py -h</code></pre>
<p>Notes:</p>
<ul>
<li>Paths for the other scripts and features must be provided.</li>
<li>Since this script is a convenience wrapper around the other scripts of the family, it doesn't expose options for all of their settings, just some defaults. If you want to tune them, edit this script directly.</li>
<li>Some scripts have a <em>2</em> version; prefer using that one.</li>
</ul>
</body>
</html>