wakeNet is a wake word engine built upon neural networks and specially designed for low-power embedded MCUs. Currently, the wakeNet model supports up to 5 wake words.
Please see the flow diagram of wakeNet below:
- Speech Feature:
  wakeNet uses MFCC to obtain the features of the input audio clip (16 KHz, 16 bit, mono). The window width and step width of each frame are both 30 ms.
- Neural Network:
  The neural network structure has been updated to the ninth edition, among which:
  - wakeNet1, wakeNet2, wakeNet3, wakeNet4, wakeNet6 and wakeNet7 are out of use.
  - wakeNet5 only supports the ESP32 chip.
  - wakeNet8 and wakeNet9 only support the ESP32S3 chip and are built upon the Dilated Convolution structure.

  Note that wakeNet5, wakeNet5X2 and wakeNet5X3 share the same network structure, but wakeNet5X2 and wakeNet5X3 have more parameters than wakeNet5. Please refer to Resource Occupancy for details.
- Keyword Triggering Method:
  For a continuous audio stream, we calculate the average recognition result (M) over several frames to produce a smoothed prediction, which improves the accuracy of keyword triggering. A trigger command is sent only when M exceeds the configured threshold (see the sketch below).
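The smoothing-and-threshold idea can be illustrated with a short sketch. This is not the wakeNet implementation; it is a minimal illustration, assuming a hypothetical per-frame keyword probability as input, an assumed smoothing window size, and an assumed threshold value.

```c
#include <stdbool.h>
#include <stddef.h>

#define SMOOTH_WINDOW   7      /* number of frames averaged (assumed value)   */
#define TRIGGER_THRESH  0.9f   /* detection threshold on the smoothed score M */

/* Returns true when the average M of the last SMOOTH_WINDOW per-frame
 * keyword probabilities exceeds TRIGGER_THRESH.
 * `frame_prob` is the network output for the current 30 ms frame. */
static bool keyword_triggered(float frame_prob)
{
    static float history[SMOOTH_WINDOW];
    static size_t idx = 0;
    static size_t filled = 0;

    history[idx] = frame_prob;
    idx = (idx + 1) % SMOOTH_WINDOW;
    if (filled < SMOOTH_WINDOW) {
        filled++;
    }

    float sum = 0.0f;
    for (size_t i = 0; i < filled; i++) {
        sum += history[i];
    }
    float m = sum / (float)filled;   /* smoothed prediction M */

    /* Trigger only once the window is full and M is above the threshold. */
    return (filled == SMOOTH_WINDOW) && (m > TRIGGER_THRESH);
}
```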
The following table shows the models supported by Espressif SoCs:
| SoCs    | wakeNet5 | wakeNet8 | wakeNet9 |
|---------|----------|----------|----------|
| ESP32   | Yes      | No       | No       |
| ESP32S3 | No       | Yes      | Yes      |
How to select the wakeNet model
Please refer to the Flash model introduction.
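As a companion to the Flash model introduction, the sketch below shows one way to pick a wakeNet model name from the models stored in flash and pass it to the AFE configuration. It assumes the esp_srmodel_init()/esp_srmodel_filter() helpers, the ESP_WN_PREFIX constant, and the wakenet_model_name configuration field found in recent ESP-SR releases; exact header and field names may differ between versions.

```c
#include "model_path.h"          /* esp_srmodel_init(), esp_srmodel_filter(), ESP_WN_PREFIX */
#include "esp_afe_sr_iface.h"
#include "esp_afe_sr_models.h"

void select_wakenet_model(void)
{
    /* Load the list of speech models stored in the "model" flash partition. */
    srmodel_list_t *models = esp_srmodel_init("model");

    /* Pick the first wakeNet model found (prefix "wn"). */
    char *wn_name = esp_srmodel_filter(models, ESP_WN_PREFIX, NULL);

    afe_config_t afe_config = AFE_CONFIG_DEFAULT();
    afe_config.wakenet_model_name = wn_name;   /* tell the AFE which wakeNet model to load */
    /* ... create the AFE instance from afe_config as shown in the next section ... */
}
```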
How to run wakeNet
wakeNet is currently included in the AFE, which runs by default and returns the detection results through the AFE fetch interface.
If users want to disable wakeNet, please use:
afe_config.wakenet_init = false;
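For reference, here is a minimal sketch of creating the AFE with wakeNet enabled and polling the fetch interface for a detection. It follows the esp_afe_sr interface of recent ESP-SR releases (AFE_CONFIG_DEFAULT(), ESP_AFE_SR_HANDLE, feed()/fetch(), wakeup_state); exact names and fields may differ between versions, and audio feeding is assumed to happen in a separate task.

```c
#include "esp_afe_sr_iface.h"
#include "esp_afe_sr_models.h"

void wakenet_task(void)
{
    afe_config_t afe_config = AFE_CONFIG_DEFAULT();
    afe_config.wakenet_init = true;            /* set to false to disable wakeNet */

    esp_afe_sr_iface_t *afe_handle = &ESP_AFE_SR_HANDLE;
    esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(&afe_config);

    while (1) {
        /* Audio must be supplied elsewhere with afe_handle->feed(afe_data, audio_chunk). */
        afe_fetch_result_t *res = afe_handle->fetch(afe_data);
        if (res && res->wakeup_state == WAKENET_DETECTED) {
            /* Wake word detected: start speech command recognition, light an LED, etc. */
        }
    }
}
```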
For the performance of wakeNet, please refer to Performance Test.
For details on how to customize your wake words, please see Espressif Speech Wake Word Customization Process.