Configuration
ODAS
The raw module defines the format of the RAW data coming from the microphones and the interface that provides it. Here are the elements to define:
Parameter | Type | Description |
---|---|---|
fS | uint | Sample rate in samples/sec |
hopSize | uint | Number of samples acquired on each channel at each frame |
nBits | uint | Number of bits per sample (must be either 8, 16, 24 or 32) |
nChannels | uint | Number of audio channels |
Here is an example with a sample rate of 44100 samples/sec, a hop size of 512 samples, 16-bit signed samples and 8 channels:
raw:
{
    fS = 44100;
    hopSize = 512;
    nBits = 16;
    nChannels = 8;
    interface: {
        type = "file";
        path = "mics.raw";
    };
};
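To make the relationship between these parameters concrete, the short sketch below computes how many bytes one frame occupies and how much audio it covers. The function names are illustrative only and are not part of ODAS.

```python
def frame_size_bytes(hop_size: int, n_bits: int, n_channels: int) -> int:
    """Bytes in one frame: hop_size samples on each channel, n_bits per sample."""
    if n_bits not in (8, 16, 24, 32):
        raise ValueError("nBits must be 8, 16, 24 or 32")
    return hop_size * n_channels * (n_bits // 8)


def frame_duration_s(hop_size: int, sample_rate: int) -> float:
    """Seconds of audio covered by one frame."""
    return hop_size / sample_rate


# Values taken from the example configuration above.
print(frame_size_bytes(hop_size=512, n_bits=16, n_channels=8))  # 8192
print(round(frame_duration_s(512, 44100) * 1000, 2))            # 11.61 (milliseconds)
```

With the example values, each frame is 8192 bytes long and spans about 11.6 ms.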
Mapping selects which channels of the RAW signal ODAS uses as microphones, so that you can ignore any channels you do not need.
Parameter | Type | Description |
---|---|---|
map | uint | List of the channels used, with index starting at 1 |
Here is an example where the RAW signals contain 8 channels, and you wish to use only channels 1, 2, 5, 6 and 7. Once mapping is done, these channels are referred to as microphones 1, 2, 3, 4 and 5.
mapping:
{
    map: (1,2,5,6,7);
};
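The renumbering described above can be sketched in a few lines of Python. This is only an illustration of the 1-based selection; `apply_mapping` and the `ch1`..`ch8` labels are hypothetical names, not ODAS code.

```python
def apply_mapping(raw_channels, channel_map):
    """Select channels by 1-based index; the order of the result defines
    the microphone numbering (first selected channel is microphone 1)."""
    n = len(raw_channels)
    for index in channel_map:
        if not 1 <= index <= n:
            raise ValueError(f"channel {index} is outside 1..{n}")
    return [raw_channels[index - 1] for index in channel_map]


# Eight RAW channels, labelled here only for illustration.
raw_channels = [f"ch{i}" for i in range(1, 9)]
microphones = apply_mapping(raw_channels, (1, 2, 5, 6, 7))
print(microphones)  # ['ch1', 'ch2', 'ch5', 'ch6', 'ch7']
```

Channel 5 of the RAW signal becomes microphone 3, which is the index to use for that microphone in the rest of the configuration.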
The general module defines parameters that are used by most modules. Here are the elements to define:
Parameter | Type | Description |
---|---|---|
epsilon | float | You should leave this parameter at 1E-20 |
size.hopSize | uint | You should leave this parameter at 128 |
size.frameSize | uint | You should leave this parameter at 256 |
samplerate.mu | uint | You should leave this parameter at 16000 |
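samplerate.sigma2 | float | You should leave this parameter at 0.01 |
speedofsound.mu | float | Mean of the speed of sound in m/s (343.0 in the example below) |
speedofsound.sigma2 | float | Variance of the speed of sound (25.0 in the example below) |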
mics.[m].mu.[0] | float | Position mean in x of microphone m |
mics.[m].mu.[1] | float | Position mean in y of microphone m |
mics.[m].mu.[2] | float | Position mean in z of microphone m |
mics.[m].sigma2.[0] | float | Position variance in xx of microphone m |
mics.[m].sigma2.[1] | float | Position variance in xy of microphone m |
mics.[m].sigma2.[2] | float | Position variance in xz of microphone m |
mics.[m].sigma2.[3] | float | Position variance in yx of microphone m |
mics.[m].sigma2.[4] | float | Position variance in yy of microphone m |
mics.[m].sigma2.[5] | float | Position variance in yz of microphone m |
mics.[m].sigma2.[6] | float | Position variance in zx of microphone m |
mics.[m].sigma2.[7] | float | Position variance in zy of microphone m |
mics.[m].sigma2.[8] | float | Position variance in zz of microphone m |
mics.[m].direction.[0] | float | Direction in x of microphone m |
mics.[m].direction.[1] | float | Direction in y of microphone m |
mics.[m].direction.[2] | float | Direction in z of microphone m |
mics.[m].angle.[0] | float | Maximum angle (in degrees) at which gain is 1 for microphone m |
mics.[m].angle.[1] | float | Minimum angle (in degrees) at which gain is 0 for microphone m |
spatialfilter.direction.[0] | float | Direction in x for space search |
spatialfilter.direction.[1] | float | Direction in y for space search |
spatialfilter.direction.[2] | float | Direction in z for space search |
spatialfilter.angle.[0] | float | Maximum angle (in degrees) at which gain is 1 for space search |
spatialfilter.angle.[1] | float | Minimum angle (in degrees) at which gain is 0 for space search |
nThetas | uint | You should leave this parameter at 181 |
gainMin | float | You should leave this parameter at 0.25 |
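The nine sigma2 entries in the table above are listed in the order xx, xy, xz, yx, yy, yz, zx, zy, zz, which is the row-major layout of a 3x3 covariance matrix. The sketch below rebuilds that matrix and checks that it is symmetric, which can catch typing mistakes in a hand-written configuration. The helper names and the sample values are illustrative only.

```python
def covariance_matrix(sigma2):
    """Reshape the 9 sigma2 values (row-major: xx, xy, xz, yx, ...) into 3 rows."""
    if len(sigma2) != 9:
        raise ValueError("sigma2 must contain exactly 9 values")
    return [list(sigma2[row * 3:row * 3 + 3]) for row in range(3)]


def is_symmetric(matrix, tolerance=1e-12):
    """A covariance matrix must satisfy m[i][j] == m[j][i]."""
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tolerance
        for i in range(3)
        for j in range(3)
    )


# Illustrative values: no uncertainty along x, 1E-6 along y and z.
matrix = covariance_matrix((0.0, 0.0, 0.0, 0.0, 1e-6, 0.0, 0.0, 0.0, 1e-6))
for row in matrix:
    print(row)
print(is_symmetric(matrix))  # True
```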
Here is an example with a 16-microphone array with a cubic shape. Microphone directivity is used because the array is closed, and each microphone's position variance is diagonal and non-zero only along the axes that span its surface plane:
general:
{
    epsilon = 1E-20;
    size:
    {
        hopSize = 128;
        frameSize = 256;
    };
    samplerate:
    {
        mu = 16000;
        sigma2 = 0.01;
    };
    speedofsound:
    {
        mu = 343.0;
        sigma2 = 25.0;
    };
    mics = (
        # Microphone 1
        {
            mu = ( +0.1250, -0.0725, +0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 2
        {
            mu = ( +0.1250, +0.0725, +0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 3
        {
            mu = ( +0.1250, -0.0725, -0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 4
        {
            mu = ( +0.1250, +0.0725, -0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 5
        {
            mu = ( +0.0725, +0.1250, +0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, +1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 6
        {
            mu = ( -0.0725, +0.1250, +0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, +1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 7
        {
            mu = ( +0.0725, +0.1250, -0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, +1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 8
        {
            mu = ( -0.0725, +0.1250, -0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, +1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 9
        {
            mu = ( -0.1250, +0.0725, +0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( -1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 10
        {
            mu = ( -0.1250, -0.0725, +0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( -1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 11
        {
            mu = ( -0.1250, +0.0725, -0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( -1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 12
        {
            mu = ( -0.1250, -0.0725, -0.0725 );
            sigma2 = ( 0.0, 0.0, 0.0, 0.0, +1E-6, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( -1.000, +0.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 13
        {
            mu = ( -0.0725, -0.1250, +0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, -1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 14
        {
            mu = ( +0.0725, -0.1250, +0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, -1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 15
        {
            mu = ( -0.0725, -0.1250, -0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, -1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        },
        # Microphone 16
        {
            mu = ( +0.0725, -0.1250, -0.0725 );
            sigma2 = ( +1E-6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, +1E-6 );
            direction = ( +0.000, -1.000, +0.000 );
            angle = ( 80.0, 100.0 );
        }
    );
    # Spatial filter to include only a range of direction if required
    # (may be useful to remove false detections from the floor)
    spatialfilter: {
        direction = ( +0.000, +0.000, +1.000 );
        angle = (80.0, 90.0);
    };
    nThetas = 181;
    gainMin = 0.25;
};
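The direction and angle pair of each microphone can be read as a directivity window: full gain up to the first angle, no gain beyond the second. The sketch below evaluates such a window for microphone 1 of the example. It assumes a linear transition between the two angles, which this page does not specify (ODAS may use a different curve), and the function names are hypothetical.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cosine))


def directivity_gain(direction, source, angle_full, angle_zero):
    """Gain is 1 up to angle_full and 0 from angle_zero onwards.
    ASSUMPTION: the transition in between is linear."""
    theta = angle_between_deg(direction, source)
    if theta <= angle_full:
        return 1.0
    if theta >= angle_zero:
        return 0.0
    return (angle_zero - theta) / (angle_zero - angle_full)


# Microphone 1 of the example: direction (+1, 0, 0), angle (80.0, 100.0).
mic_direction = (1.0, 0.0, 0.0)
print(directivity_gain(mic_direction, (1.0, 0.0, 0.0), 80.0, 100.0))            # 1.0
print(round(directivity_gain(mic_direction, (0.0, 1.0, 0.0), 80.0, 100.0), 3))  # 0.5
print(directivity_gain(mic_direction, (-1.0, 0.0, 0.0), 80.0, 100.0))           # 0.0
```

A source in front of the microphone is kept at full gain, a source behind the closed face is ignored, and a source at 90 degrees falls in the transition band.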
The sne module performs stationary noise estimation using Minima Controlled Recursive Averaging (MCRA):
Parameter | Type | Description |
---|---|---|
b | uint | You should leave this parameter at 3 |
alphaS | float | You should leave this parameter at 0.1 |
L | uint | You should leave this parameter at 150 |
delta | float | You should leave this parameter at 3.0 |
alphaD | float | You should leave this parameter at 0.1 |
Here is an example of what the stationary noise estimation module should look like in the configuration:
# Stationary noise estimation
sne:
{
    b = 3;
    alphaS = 0.1;
    L = 150;
    delta = 3.0;
    alphaD = 0.1;
};
The ssl module (sound source localization) generates potential sources, each with a candidate direction of arrival for sound:
Parameter | Type | Description |
---|---|---|
nPots | uint | You should leave this parameter at 4 |
nMatches | uint | You should leave this parameter at 10 |
probMin | float | You should leave this parameter at 0.3 |
nRefinedLevels | uint | You should leave this parameter at 1 |
interpRate | uint | You should leave this parameter at 1 |
scans.[0].level | uint | You should leave this parameter at 2 |
scans.[0].delta | int | You should leave this parameter at -1 |
scans.[1].level | uint | You should leave this parameter at 4 |
scans.[1].delta | int | You should leave this parameter at -1 |
Here is an example of the Sound Source Localization configuration. In this case, the potential sources are displayed in the terminal:
# Sound Source Localization
ssl:
{
    nPots = 4;
    nMatches = 10;
    probMin = 0.3;
    nRefinedLevels = 1;
    interpRate = 1;
    # Number of scans: level is the resolution of the sphere
    # and delta is the size of the maximum sliding window
    # (delta = -1 means the size is automatically computed)
    scans = (
        { level = 2; delta = -1; },
        { level = 4; delta = -1; }
    );
    # Output to export potential sources
    potential: {
        format = "json";
        interface: {
            type = "terminal";
        };
    };
};
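To get a feel for what the two scan levels mean, the sketch below counts the grid points on the search sphere for each level. It assumes the sphere is built by recursively subdividing an icosahedron; this page only states that level is the resolution of the sphere, so treat the formula as an illustration rather than a description of ODAS internals.

```python
def icosphere_points(level: int) -> int:
    """Vertices of an icosahedron subdivided `level` times.
    ASSUMPTION: the search grid is an icosphere (12 vertices at level 0)."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return 10 * 4 ** level + 2


# The two scans of the example configuration: a coarse pass, then a finer one.
for level in (2, 4):
    print(level, icosphere_points(level))
# 2 162
# 4 2562
```

Under this assumption, the first scan searches a coarse grid of 162 directions and the second refines the result on a grid of 2562 directions.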
Sound source tracking can be performed with particle filters or Kalman filters. Kalman filters are recommended, as they improve accuracy and significantly reduce the computational load.
Parameter | Type | Description |
---|---|---|
mode | string | Either "particle" or "kalman", according to the method chosen |
active.[0].weight | float | You should leave this parameter at 1.0 |
active.[0].mu | float | You should leave this parameter at 0.3 |
active.[0].sigma2 | float | You should leave this parameter at 0.0025 |
inactive.[0].weight | float | You should leave this parameter at 1.0 |
inactive.[0].mu | float | You should leave this parameter at 0.15 |
inactive.[0].sigma2 | float | You should leave this parameter at 0.0025 |
sigmaR2_prob | float | You should leave this parameter at 0.0025 |
sigmaR2_active | float | You should leave this parameter at 0.0225 |
Pfalse | float | You should leave this parameter at 0.1 |
Pnew | float | You should leave this parameter at 0.1 |
Ptrack | float | You should leave this parameter at 0.8 |
theta_new | float | You should leave this parameter at 0.9 |
N_prob | uint | You should leave this parameter at 5 |
theta_prob | float | You should leave this parameter at 0.8 |
N_inactive.[0] | uint | You should leave this parameter at 150 |
N_inactive.[1] | uint | You should leave this parameter at 200 |
N_inactive.[2] | uint | You should leave this parameter at 250 |
N_inactive.[3] | uint | You should leave this parameter at 250 |
theta_inactive | float | You should leave this parameter at 0.9 |
kalman.sigmaQ | float | You should leave this parameter at 0.001 |
particle.nParticles | uint | You should leave this parameter at 1000 |
particle.st_alpha | float | You should leave this parameter at 2.0 |
particle.st_beta | float | You should leave this parameter at 0.04 |
particle.st_ratio | float | You should leave this parameter at 0.5 |
particle.ve_alpha | float | You should leave this parameter at 0.05 |
particle.ve_beta | float | You should leave this parameter at 0.2 |
particle.ve_ratio | float | You should leave this parameter at 0.3 |
particle.ac_alpha | float | You should leave this parameter at 0.5 |
particle.ac_beta | float | You should leave this parameter at 0.2 |
particle.ac_ratio | float | You should leave this parameter at 0.2 |
particle.Nmin | float | You should leave this parameter at 0.7 |
Here is an example of the Sound Source Tracking configuration. In this case, the system uses Kalman filters and returns up to four tracked sources in the terminal:
sst:
{
    # Mode is either "kalman" or "particle"
    mode = "kalman";
    # Parameters used by both the Kalman and particle filter
    active = (
        { weight = 1.0; mu = 0.3; sigma2 = 0.0025; }
    );
    inactive = (
        { weight = 1.0; mu = 0.15; sigma2 = 0.0025; }
    );
    sigmaR2_prob = 0.0025;
    sigmaR2_active = 0.0225;
    Pfalse = 0.1;
    Pnew = 0.1;
    Ptrack = 0.8;
    theta_new = 0.9;
    N_prob = 5;
    theta_prob = 0.8;
    N_inactive = ( 150, 200, 250, 250 );
    theta_inactive = 0.9;
    # Parameters used by the Kalman filter only
    kalman: {
        sigmaQ = 0.001;
    };
    # Parameters used by the particle filter only
    particle: {
        nParticles = 1000;
        st_alpha = 2.0;
        st_beta = 0.04;
        st_ratio = 0.5;
        ve_alpha = 0.05;
        ve_beta = 0.2;
        ve_ratio = 0.3;
        ac_alpha = 0.5;
        ac_beta = 0.2;
        ac_ratio = 0.2;
        Nmin = 0.7;
    };
    # Output to export tracked sources
    tracked: {
        format = "json";
        interface: {
            type = "terminal";
        };
    };
};
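A client that consumes the tracked sources will often want angles rather than a direction vector. The following is a minimal sketch, assuming each tracked source carries Cartesian components named x, y and z inside a src list. These field names and the sample message are hypothetical; the JSON layout is not documented on this page, so check the actual output before relying on them.

```python
import json
import math


def azimuth_elevation_deg(x, y, z):
    """Convert a direction vector to (azimuth, elevation) in degrees.
    Azimuth is measured in the xy-plane from +x, elevation from the xy-plane."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


# HYPOTHETICAL message: the "src", "x", "y" and "z" names are assumptions.
message = '{"src": [{"x": 0.0, "y": 1.0, "z": 0.0}]}'

for source in json.loads(message)["src"]:
    azimuth, elevation = azimuth_elevation_deg(source["x"], source["y"], source["z"])
    print(round(azimuth, 1), round(elevation, 1))  # 90.0 0.0
```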
Sound source separation enhances the sound source of interest:
Parameter | Type | Description |
---|---|---|
mode_sep | string | You should leave this parameter at "dds" |
mode_pf | string | You should leave this parameter at "ms" |
gain_sep | float | Gain to change the volume of the separated stream |
gain_pf | float | Gain to change the volume of the post-filtered stream |
dgss.mu | float | You should leave this parameter at 0.01 |
dgss.lambda | float | You should leave this parameter at 0.5 |
ms.alphaPmin | float | You should leave this parameter at 0.07 |
ms.eta | float | You should leave this parameter at 0.5 |
ms.alphaZ | float | You should leave this parameter at 0.8 |
ms.thetaWin | float | You should leave this parameter at 0.3 |
ms.alphaWin | float | You should leave this parameter at 0.3 |
ms.maxAbsenceProb | float | You should leave this parameter at 0.9 |
ms.Gmin | float | You should leave this parameter at 0.01 |
ms.winSizeLocal | uint | You should leave this parameter at 3 |
ms.winSizeGlobal | uint | You should leave this parameter at 23 |
ms.winSizeFrame | uint | You should leave this parameter at 256 |
Here is an example where the system outputs the separated and post-filtered signals in files separated.raw and postfiltered.raw. The number of channels corresponds to the maximum number of simultaneously tracked sources.
sss:
{
    # Mode is either "dds", "dgss" or "dmvdr"
    mode_sep = "dds";
    mode_pf = "ms";
    gain_sep = 1.0;
    gain_pf = 10.0;
    dds: {
    };
    dgss: {
        mu = 0.01;
        lambda = 0.5;
    };
    dmvdr: {
    };
    ms: {
        alphaPmin = 0.07;
        eta = 0.5;
        alphaZ = 0.8;
        thetaWin = 0.3;
        alphaWin = 0.3;
        maxAbsenceProb = 0.9;
        Gmin = 0.01;
        winSizeLocal = 3;
        winSizeGlobal = 23;
        winSizeFrame = 256;
    };
    separated: {
        fS = 16000;
        hopSize = 128;
        nBits = 16;
        interface: {
            type = "file";
            path = "separated.raw";
        };
    };
    postfiltered: {
        fS = 16000;
        hopSize = 128;
        nBits = 16;
        gain = 10.0;
        interface: {
            type = "file";
            path = "postfiltered.raw";
        };
    };
};
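Because the output files hold one channel per tracked source, a consumer has to split them back into individual streams. The sketch below does this for 16-bit data. It assumes the samples are interleaved, signed and little-endian, none of which is stated on this page; the function name is illustrative.

```python
import array
import struct
import sys


def deinterleave_int16(raw_bytes, n_channels):
    """Split interleaved 16-bit samples into one list per channel.
    ASSUMPTION: samples are signed, little-endian and interleaved."""
    samples = array.array("h")
    samples.frombytes(raw_bytes)
    if sys.byteorder != "little":
        samples.byteswap()  # convert little-endian data on big-endian hosts
    if len(samples) % n_channels:
        raise ValueError("data does not contain a whole number of sample frames")
    return [samples[channel::n_channels].tolist() for channel in range(n_channels)]


# Two sample frames of four channels, built here only for illustration.
demo = struct.pack("<8h", 1, 2, 3, 4, 5, 6, 7, 8)
print(deinterleave_int16(demo, 4))  # [[1, 5], [2, 6], [3, 7], [4, 8]]
```

With the tracking example above, which returns up to four sources, the contents of separated.raw would be passed as `raw_bytes` (read in binary mode) with `n_channels=4`.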
Sound classification still needs to be improved. For now, leave the following configuration as is:
classify:
{
    frameSize = 4096;
    winSize = 3;
    tauMin = 88;
    tauMax = 551;
    deltaTauMax = 20;
    alpha = 0.3;
    gamma = 0.05;
    phiMin = 0.5;
    r0 = 0.2;
    category: {
        format = "undefined";
        interface: {
            type = "blackhole";
        };
    };
};
Provided by IntRoLab, Université de Sherbrooke, Québec, Canada.