S. Voran, October 28, 2013
Updated by S. Voran on March 2, 2017 to cover ABC-MRT16 as well.
This software implements the ABC-MRT and ABC-MRT16 algorithms for objective estimation of speech intelligibility. The algorithms are discussed in detail in [1] and [2]. ABC-MRT is short for “Articulation Band Correlation Modified Rhyme Test.”
The Modified Rhyme Test (MRT) [3] is a protocol for evaluating speech intelligibility using human subjects. The subjects are presented with the task of identifying one of six different words that take the phonetic form CVC. The six options differ only in the leading or trailing consonant. MRT results take the form of success rates (corrected for guessing) that range from 0 (guessing) to 1 (correct identification in every case). These success rates form a measure of speech intelligibility in this specific (MRT) context.
Articulation Band Correlation-MRT (ABC-MRT) is a signal processing algorithm that processes MRT audio files and produces success rates. The goal of ABC-MRT is to produce success rates that agree with those produced by MRT. Thus ABC-MRT is an automated or objective version of MRT and no human subjects are required.
ABC-MRT performs a narrowband (nominally 4 kHz) analysis. ABC-MRT16 is applicable to narrowband, wideband, superwideband, and fullband speech. ABC-MRT processes the first 17 AI bands while ABC-MRT16 processes all 20 AI bands, as well as an additional "AI Band 21" that covers 7 kHz to 20 kHz. Of equal importance is that ABC-MRT16 incorporates a model for attention that allows it to properly operate across the different bandwidths without any bandwidth detection or switching.
Unless backwards compatibility is required, ABC-MRT16 is the recommended algorithm, even if only narrowband conditions are to be tested. The attention model makes it superior to ABC-MRT.
The software provided here runs in the Matlab® or Octave environments.
Application of ABC-MRT(16) to a speech communication system-under-test (SUT) requires two steps.
- Pass a set of reference recordings through the SUT to produce a set of test recordings.
- Apply ABC-MRT(16) to the test recordings to produce a success rates that describe the intelligibility of the SUT.
Full details for each step follow. In addition, a small demonstration provides an example for getting started.
The reference recordings in this repository are stored on GitHub using
git-lfs
. This means that the computer cloning this repo must have git-lfs
installed in order to properly download the reference recordings.
Before cloning this repo, download, install, and initialize
git-lfs
.
For the time being, GitHub does not correctly include the reference
recordings when downloading the code as a .zip
file.
If you have cloned this repo before installing git-lfs
, issuing the command
git lfs pull
may properly download the reference recordings. As a last
resort, cloning the repo to a new location after installing git-lfs
will
download the reference recordings.
The SourceAudio folder contains 1200 .wav
files. The
naming convention is TT_bnn_wm_orig.wav
. TT={F1,F3,M3,M4}
to indicate the
talker, nn
is an two digit integer 01 to 50 to indicate the MRT block or
list, and m
is a single-digit integer 1 to 6 to indicate the word within
that list.
You now have access to 1200 .wav
files for ABC-MRT(16) testing. Each file
contains the same carrier phase followed by a single keyword. For example
in TT_b01_w1_orig.wav
you will hear “Please select the word went.” The
keyword is “went.” In TT_b01_w2_orig.wav
you will hear “Please select the
word sent.” The .wav
file format is mono, linear PCM, with 16 bits/sample
and 48,000 samples/second.
The most thorough testing will use all 1200 files. If this is
prohibitive, then somewhat less thorough testing can be accomplished using
a subset of these files. The best subsets are balanced with respect to
talker and are enumerated by filelist.m
which produces a Matlab variable,
a .mat
file, and a .txt
file. The first argument to filelist.m
is an
integer N
, which tells how many .wav
files will be used, 16<=N<=1200
. An
input of N=16
will produce a list of 16 files and increasing N
to 20 will
add 4 more files, but will not change the original 16.
We have characterized the deviation in ABC-MRT results caused by using
N<1200
files. This deviation is measured with respect to the ABC-MRT results
with 1200 files. Note that ABC-MRT results nominally range from 0 to 1, so a
deviation of .01 is 1% of the intelligibility scale. We have calculated
these deviations across 139 different conditions:
N | RMS Deviation | Maximum Absolute Deviation | Total Speech File Time |
---|---|---|---|
16 | 0.017 | 0.053 | 28 sec |
32 | 0.009 | 0.033 | 57 sec |
64 | 0.005 | 0.014 | 115 sec |
128 | 0.003 | 0.008 | 231 sec |
256 | 0.002 | 0.004 | 462 sec ( 8 min) |
512 | 0.001 | 0.002 | 922 sec (15 min) |
1200 | 0.000 | 0.000 | 2170 sec (36 min) |
The next task is to pass the desired reference .wav
files through each SUT
of interest. More generally, one is often interested in a particular
operating configuration for an SUT, possibly combined with
environmental factors. To describe this we use the more general term
“condition.” For example, conditions 1, 2, 3, and 4 might be SUT 1 in
quiet, SUT 1 with car noise at 5 dB SNR, SUT 1 with babble noise at 0 dB
SNR, and SUT 2 in quiet, respectively.
The recording of the conditions can be scripted and implemented using one
or more computers with suitable sound I/O, but other implementations are
possible as well. The Matlab code playrec.m
gives one example of how one
might script these processes.
ABC-MRT(16) expects that reference file TT_bnn_wm_orig.wav
will produce the
test file testpath/Cpp/TT_bnn_wm_cpp.wav
, where pp
is a two digit integer
00 to 99 that can be used to identify the condition, and testpath is an
arbitrary path identifier. For example, passing F1_b01_w1_orig.wav
through condition 17 should produce mytestpath/C17/F1_b01_w1_c17.wav
.
In general, a condition will have a non-zero delay so perfect synchronization of playing and recording may truncate the contents of the file. Thus the timing of the recording process must be adjusted so that each test file includes at least the entire keyword. In addition each test file must contain at least 42,000 samples (875 ms duration). Note that ABC-MRT(16) processing time increases as file length increases.
For best results, the playback and recording processes must be virtually transparent relative to the condition. A/D and D/A conversion external to the computer may be advantageous. The playback level must be properly adjusted, and the recording level is especially important. If it is too high, clipping may impair the A/D process. If it is too low quantization noise may impair the A/D process.
A higher level of automation might be achieved by concatenating the desired reference files, playing that single long file while recording a single long test file, then cutting the long test file into properly named individual test files. Or one might modify the ABC-MRT(16) software to accept and properly parse a single long test file. One might attempt to further streamline the process by passing only keywords (and not the carrier phrase) through the condition. For some conditions this change may produce identical results. But if the condition includes any adaptive process (e.g., noise reduction) then this change may alter the results. If this is the case, then consider how the condition is used in practice. In many cases transmitting a sequence of sentences (carrier phrases with keywords) may be more realistic than transmitting a list of unrelated words (keywords alone).
Run ABC_MRT.m
or ABC_MRT16.m
. It will process all .wav
files (up to 1200
of them) for a single condition. A simple script can apply ABC_MRT.m
or ABC_MRT16.m
to multiple conditions. Examples are provided in
ABC_MRTdemo.m
and ABC_MRT16demo.m
.
The call to ABC_MRT.m
is:
[phi_hat, success]=ABC_MRT(speech_path,cond_num,n_files,verbose)
and the call to ABC_MRT16.m
is:
[phi_hat, success]=ABC_MRT16(speech_path,cond_num,n_files,verbose)
-
speech_path
is a string that gives the path to the speech files -
cond_num
is the condition number, 0 to 99. This is used to form the end of the path and the filenames as specified below. -
n_files
is the number of speech files to use, 16 <= N <= 1200. Using more speech files gives a more robust result, but takes longer. One can usefilelist.m
to generate the list of.wav
files required for any value ofn_files
. -
verbose
is set to any nonzero value to force progress reporting -
success
is a column vector with length n_files that gives the success rate for each file involved in the test. -
phi_hat
is a scalar that gives the final intelligibility estimate for the condition.phi_hat
is expected to range from 0 to 1, similar to MRT results. Larger values ofphi_hat
indicate higher levels of speech intelligibility
ABC_MRT.m
and ABC_MRT16.m
expect a specific file naming convention. Here are some
examples:
ABC_MRT(e:/soundfiles/MRT/,1,1200,*), requires 1200 .wav files of the form
e:/soundfiles/MRT/C01/TT_bnn_wm_c01.wav
ABC_MRT(e:/soundfiles/MRT/,17,16,*), requires 16 .wav files of the form
e:/soundfiles/MRT/C17/TT_bnn_wm_c17.wav
ABC_MRT16(e:/soundfiles/MRT/,1,1200,*), requires 1200 .wav files of the form
e:/soundfiles/MRT/C01/TT_bnn_wm_c01.wav
ABC_MRT16(e:/soundfiles/MRT/,17,16,*), requires 16 .wav files of the form
e:/soundfiles/MRT/C17/TT_bnn_wm_c17.wav
In every case the base of the filenames takes the form TT_bnn_wm with:
TT=F1, F3, M3, or M4 (talker specification)
nn=1 to 50 (list specification)
m=1 to 6 (word specification)
(Note that 4 x 50 x 6 =1200, so these ranges allow specification of all 1200 files)
Run ABC_MRTdemo.m
. It applies ABC_MRT to 6 different conditions, using
just 16 .wav files per condition. The required files are provided in the
demo directory. The results (phi_hat values) are plotted against actual
MRT scores. Using 16 files is fast and it shows how the software works,
but it gives only rough results. In practice one should use as many files
as practical.
Or run ABC_MRT16demo.m
. It applies ABC_MRT16 to 6 different conditions, using
just 16 .wav files per condition. The required files are provided in the
demo directory. The results (phi_hat values) are plotted against actual
MRT scores. Using 16 files is fast and it shows how the software works,
but it gives only rough results. In practice one should use as many files
as practical.
[1] S. Voran "Using articulation index band correlations to objectively estimate speech intelligibility consistent with the modified rhyme test," Proc. 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 20- 23, 2013. Available at www.its.bldrdoc.gov/audio after October 20, 2013.
[2] S. Voran "A multiple bandwidth objective speech intelligibility estimator based on articulation index band correlations and attention," Proc. 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, New Orleans, March 5-9, 2017. Available at www.its.bldrdoc.gov/audio.
[3] ANSI S3.2, "American national standard method for measuring the intelligibility of speech over communication systems," 1989.