bytesToDoubleArray() sizing & FFT #100

Open
nalbion opened this issue Jul 13, 2017 · 2 comments

Comments

nalbion commented Jul 13, 2017

As per the recommendations of Moattar and Homayounpour, I'm trying to detect voice activity using a 10 ms sliding window.

For 10 ms of 16 kHz, 16-bit mono audio, getNumOfBytes(.01) returns 320 (it would be 320.5, but the result is cast to an int).

...why add the .5?

    public int getNumOfBytes(double seconds) {
        AudioFormat format = getAudioFormat();
        return (int)(seconds * format.getSampleRate() * format.getFrameSize() + .5);
    }
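A plausible reading of the `+ .5` (an inference, not something the code documents): a Java `(int)` cast truncates toward zero, so adding 0.5 first turns the cast into round-to-nearest for non-negative values, which protects against a product like 319.9999 being truncated to 319. What it does not guarantee is a whole number of frames. The sketch below compares the original expression with a hypothetical frame-aligned variant that rounds the frame count first and then multiplies by the frame size; both helper names are illustrative, and it assumes PCM where bytes = frames × frameSize.

```java
public final class ByteSizing {

    // Mirrors the original expression: adding 0.5 before the (int) cast
    // rounds a non-negative value to the nearest integer instead of truncating.
    static int numOfBytesRounded(double seconds, float sampleRate, int frameSize) {
        return (int) (seconds * sampleRate * frameSize + .5);
    }

    // Hypothetical alternative: round to a whole number of frames first, so the
    // result is always a multiple of frameSize (never half a 16-bit sample).
    static int numOfBytesFrameAligned(double seconds, float sampleRate, int frameSize) {
        long frames = Math.round(seconds * sampleRate);
        return (int) (frames * frameSize);
    }

    public static void main(String[] args) {
        // 10 ms of 16 kHz, 16-bit mono audio (frame size = 2 bytes)
        System.out.println(numOfBytesRounded(0.01, 16000f, 2));         // 320
        System.out.println(numOfBytesFrameAligned(0.01, 16000f, 2));    // 320
        // 10.03 ms: rounding the byte count can land mid-sample
        System.out.println(numOfBytesRounded(0.01003, 16000f, 2));      // 321
        System.out.println(numOfBytesFrameAligned(0.01003, 16000f, 2)); // 320
    }
}
```

For the 10 ms case in this issue both give 320, so the `+ .5` is not the cause of the sizing problem below.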

Then getFrequency() calls bytesToDoubleArray(), passing the 320 bytes. Another point of confusion is how the size of micBufferData is calculated:

    double[] micBufferData = new double[bytesRecorded - bytesPerSample +1];
    for (int index = 0, floatIndex = 0; index < bytesRecorded - bytesPerSample + 1; index += bytesPerSample, floatIndex++) {
        ...
        micBufferData[floatIndex] = sample32;
    }

With 2 bytesPerSample, the code allocates space for 319 doubles (320 - 2 + 1), but the loop only runs 160 times, so when it's done everything after micBufferData[159] is 0.0.

Back in getFrequency() I end up with an array of 319 Complex values, but again everything after index 159 is (0.0, 0.0).
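With `bytesRecorded = 320` and `bytesPerSample = 2`, the loop visits `index = 0, 2, ..., 318`, so only 160 slots are ever written. Sizing the array as `bytesRecorded / bytesPerSample` gives exactly one slot per sample. The sketch below is a hypothetical stand-in for `bytesToDoubleArray()`, assuming 16-bit signed little-endian PCM scaled to [-1, 1); the library's actual per-sample decoding (the elided `...` above) may differ.

```java
public final class PcmDecode {

    // Hypothetical replacement for the sizing in bytesToDoubleArray():
    // one double per complete sample; a trailing partial sample is ignored.
    static double[] bytesToDoubles16(byte[] buffer, int bytesRecorded) {
        final int bytesPerSample = 2;
        int sampleCount = bytesRecorded / bytesPerSample;
        double[] samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            int lo = buffer[2 * i] & 0xFF;  // low byte, treated as unsigned
            int hi = buffer[2 * i + 1];     // high byte keeps its sign
            short sample = (short) ((hi << 8) | lo);
            samples[i] = sample / 32768.0;  // scale to [-1.0, 1.0)
        }
        return samples;
    }

    public static void main(String[] args) {
        byte[] tenMs = new byte[320];  // 10 ms of 16 kHz, 16-bit mono
        System.out.println(bytesToDoubles16(tenMs, 320).length); // 160
    }
}
```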

In FFT() you check:

        // radix 2 Cooley-Tukey FFT
        if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2"); }

...At first I thought "that's not checking whether it is a power of 2", but since FFT() calls itself recursively on each half, the check eventually amounts to a valid test. As it happens, the exception is thrown on the very first call: the array has length 319 (160 real values followed by zeros), and 319 is odd.
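Because each recursive call halves N, a length that is not a power of two fails at some depth even if it is even at the top: assuming the usual base case at N = 1, 160 = 2^5 × 5 passes five levels (160, 80, 40, 20, 10) and throws at N = 5. So a 160-sample window (10 ms at 16 kHz) can never pass, while 128 samples (8 ms) can. One common workaround, sketched here with hypothetical helper names, is to check up front with a bit test and zero-pad to the next power of two. Note that padding changes the bin spacing to sampleRate / paddedLength.

```java
import java.util.Arrays;

public final class PowerOfTwo {

    // True when n is a positive power of two (exactly one bit set).
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Smallest power of two >= n (returns 1 for n <= 1).
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // Copies the window into a zero-filled array whose length is a power of two.
    static double[] zeroPad(double[] window) {
        return Arrays.copyOf(window, nextPowerOfTwo(window.length));
    }

    public static void main(String[] args) {
        System.out.println(isPowerOfTwo(160));               // false (10 ms at 16 kHz)
        System.out.println(isPowerOfTwo(128));               // true  (8 ms at 16 kHz)
        System.out.println(zeroPad(new double[160]).length); // 256
    }
}
```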

nalbion changed the title from "bytesToDoubleArray() sizing" to "bytesToDoubleArray() sizing & FFT" on Jul 13, 2017

nalbion commented Jul 13, 2017

I've changed my window size to 8 ms and removed the "+1" mentioned above, but now the first element FFT returns always has a 0.0 imaginary component, and as a result findMaxMagnitude() finds a huge value at index 0 and picks it as the top result, so the frequency is always 0 and my VAD never detects any speech.
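For real-valued input, bin 0 of the DFT is simply the sum of the samples (the DC component), so a 0.0 imaginary part there is expected. It dominates when the audio carries a DC offset, since bin 0's magnitude is then N times the window mean. A common remedy is to subtract the mean before transforming and to search only bins 1 through N/2, because the upper half mirrors the lower half for real input. This is a sketch under those assumptions: it uses a naive O(N²) DFT in place of the project's FFT() only to stay self-contained, and `peakFrequency` is a hypothetical name, not the actual behaviour of findMaxMagnitude().

```java
public final class PeakFrequency {

    // Naive DFT magnitudes for bins 0..N/2 (stand-in for the project's FFT()).
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    // Removes the DC offset, then returns the frequency of the strongest bin,
    // skipping bin 0 and the mirrored upper half. Assumes window.length >= 2.
    static double peakFrequency(double[] window, double sampleRate) {
        double mean = 0;
        for (double v : window) mean += v;
        mean /= window.length;
        double[] centered = new double[window.length];
        for (int i = 0; i < window.length; i++) centered[i] = window[i] - mean;

        double[] mag = magnitudes(centered);
        int best = 1;
        for (int k = 2; k < mag.length; k++) {
            if (mag[k] > mag[best]) best = k;
        }
        return best * sampleRate / window.length; // bin spacing = sampleRate / N
    }

    public static void main(String[] args) {
        int n = 128;                // 8 ms at 16 kHz
        double sampleRate = 16000;
        double[] window = new double[n];
        for (int t = 0; t < n; t++) {
            // 1 kHz tone riding on a 0.5 DC offset
            window[t] = 0.5 + 0.3 * Math.sin(2 * Math.PI * 1000 * t / sampleRate);
        }
        System.out.println(peakFrequency(window, sampleRate)); // 1000.0
    }
}
```

Without the mean subtraction, bin 0 of this test window has magnitude 64 against roughly 19.2 for the 1 kHz bin, which reproduces the "frequency is always 0" symptom.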

@goxr3plus