wakeNet

wakeNet is a neural-network-based wake word engine designed for low-power embedded MCUs. The wakeNet model currently supports up to 5 wake words.

Overview

Please see the flow diagram of wakeNet below:

  • Speech Feature:
    wakeNet uses MFCC to extract features from the input audio clip (16 kHz, 16-bit, mono). Both the window width and the step width of each frame are 30 ms.

  • Neural Network:
    The neural network structure has been revised several times, up to wakeNet9. Among these versions:

    • wakeNet1, wakeNet2, wakeNet3, wakeNet4, wakeNet6, and wakeNet7 are no longer in use.
    • wakeNet5 only supports the ESP32 chip.
    • wakeNet8 and wakeNet9 only support the ESP32S3 chip and are built upon the Dilated Convolution structure.

    Note that wakeNet5, wakeNet5X2, and wakeNet5X3 share the same network structure, but wakeNet5X2 and wakeNet5X3 have more parameters than wakeNet5. Please refer to Resource Occupancy for details.
  • Keyword Triggering Method:
    For a continuous audio stream, wakeNet averages the recognition results over several frames to obtain a smoothed prediction (M), which improves the accuracy of keyword triggering. A trigger command is sent only when M exceeds the set threshold.
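The framing arithmetic and the smoothed trigger described above can be sketched in plain C. This is only an illustration, not wakeNet's actual implementation: the names `samples_per_frame`, `smoother_t`, `smoother_init`, and `smoother_push` are hypothetical, and the window of 4 frames is an assumed value, since the text does not say how many frames are averaged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_RATE_HZ 16000  /* from the text: 16 kHz input */
#define FRAME_MS       30     /* from the text: 30 ms window and step */
#define SMOOTH_FRAMES  4      /* assumption: number of frames averaged */

/* Samples in one frame: 16000 * 30 / 1000 = 480 (960 bytes at 16 bit). */
size_t samples_per_frame(void)
{
    return (size_t)SAMPLE_RATE_HZ * FRAME_MS / 1000;
}

/* Ring buffer holding the most recent per-frame recognition scores. */
typedef struct {
    float scores[SMOOTH_FRAMES];
    size_t count;     /* number of valid entries, at most SMOOTH_FRAMES */
    size_t next;      /* slot that the next score overwrites */
    float threshold;  /* trigger threshold for the averaged score M */
} smoother_t;

void smoother_init(smoother_t *s, float threshold)
{
    assert(s != NULL);
    for (size_t i = 0; i < SMOOTH_FRAMES; i++) {
        s->scores[i] = 0.0f;
    }
    s->count = 0;
    s->next = 0;
    s->threshold = threshold;
}

/* Store one frame score. Returns true only when the window is full and
 * the mean M of the stored scores is strictly larger than the threshold. */
bool smoother_push(smoother_t *s, float score)
{
    assert(s != NULL);
    s->scores[s->next] = score;
    s->next = (s->next + 1) % SMOOTH_FRAMES;
    if (s->count < SMOOTH_FRAMES) {
        s->count++;
    }
    if (s->count < SMOOTH_FRAMES) {
        return false;  /* not enough frames yet to form an average */
    }

    float sum = 0.0f;
    for (size_t i = 0; i < SMOOTH_FRAMES; i++) {
        sum += s->scores[i];
    }
    float m = sum / SMOOTH_FRAMES;
    return m > s->threshold;
}
```

Averaging over a window means that a single high-scoring frame cannot trigger a wake-up on its own, while a sustained run of high scores can. A longer window would reduce false triggers at the cost of added latency.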

The following table shows the models supported by Espressif SoCs:

| SoCs    | wakeNet5 | wakeNet8 | wakeNet9 |
|---------|----------|----------|----------|
| ESP32   | Yes      | No       | No       |
| ESP32S3 | No       | Yes      | Yes      |
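As a rough sketch, the support table can be encoded as a lookup so that application code can check a model choice against the target SoC before loading it. The names `soc_models_t`, `SUPPORT_TABLE`, and `wakenet_supported` are hypothetical and are not part of any Espressif API. The data simply mirrors the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of the support table (hypothetical helper type). */
typedef struct {
    const char *soc;
    int models[2];  /* supported wakeNet versions, 0 marks an unused slot */
} soc_models_t;

static const soc_models_t SUPPORT_TABLE[] = {
    { "ESP32",   { 5, 0 } },  /* wakeNet5 only */
    { "ESP32S3", { 8, 9 } },  /* wakeNet8 and wakeNet9 */
};

/* Returns true if the given wakeNet version is listed for the given SoC.
 * Unknown SoC names and non-positive versions return false. */
bool wakenet_supported(const char *soc, int version)
{
    assert(soc != NULL);
    if (version <= 0) {
        return false;
    }
    size_t rows = sizeof(SUPPORT_TABLE) / sizeof(SUPPORT_TABLE[0]);
    for (size_t i = 0; i < rows; i++) {
        if (strcmp(SUPPORT_TABLE[i].soc, soc) != 0) {
            continue;
        }
        for (size_t j = 0; j < 2; j++) {
            if (SUPPORT_TABLE[i].models[j] == version) {
                return true;
            }
        }
        return false;
    }
    return false;
}
```

For example, `wakenet_supported("ESP32S3", 9)` returns true, while `wakenet_supported("ESP32", 8)` returns false.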

Use wakeNet

  • How to select the wakeNet model

    Please refer to the Flash model introduction.

  • How to run wakeNet

    wakeNet is currently included in the AFE and runs by default. Detection results are returned through the AFE fetch interface.

    To disable wakeNet, set:

    afe_config.wakenet_init = false;
    

Performance Test

Please refer to Performance_test.

Wake Word Customization

For details on how to customize your wake words, please see Espressif Speech Wake Word Customization Process.