Get ready state of recognizer #80

Open
DanielUsselmann opened this issue Mar 17, 2024 · 8 comments

Comments

@DanielUsselmann

DanielUsselmann commented Mar 17, 2024

Hi,
is there a way to get something like a ready state from the recognizer?
It takes a few seconds before the recognizer actually starts recognizing the user's speech.
While it is not yet ready, I want to show a loading spinner until the recognizer can be used.

I am using the getUserMedia() function for audio.

@erikh2000

I think I agree it would be nice to have, and maybe that feature is already there and I missed it. But...

If you maintain your own ready state and set it to true once all the setup work is done, you should have the same thing. The main asynchronous delay is loading and creating the model which is then passed to the KaldiRecognizer constructor. I believe that after the KaldiRecognizer instance is constructed, it's ready to receive data from the microphone via acceptWaveform(). (But that's going from memory, and I can't test it right now.)
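
One way to picture the "maintain your own ready state" idea is a small tracker that flips to ready only after every setup step has reported in. This is only a sketch: `createReadyTracker` and the step names are hypothetical and are not part of the recognizer library.

```javascript
// Hypothetical helper: tracks named setup steps and calls onReady
// exactly once, after every step has been marked as done.
function createReadyTracker(stepNames, onReady) {
  const pending = new Set(stepNames);
  let ready = pending.size === 0;
  if (ready) onReady();

  return {
    // Mark one setup step as finished.
    markDone(name) {
      if (ready) return;
      pending.delete(name);
      if (pending.size === 0) {
        ready = true;
        onReady();
      }
    },
    isReady() {
      return ready;
    },
    // Steps that have not reported in yet (useful while debugging).
    pendingSteps() {
      return [...pending];
    },
  };
}

// Example wiring; the step names are assumptions about the setup sequence.
const tracker = createReadyTracker(
  ["modelLoaded", "recognizerCreated", "micGranted", "workletConnected"],
  () => console.log("recognizer pipeline ready")
);
tracker.markDone("modelLoaded"); // still waiting on the other three steps
```

In a React component, `onReady` could be the state setter that hides the spinner. Listing the worklet connection as its own step matters if the spinner should stay visible until audio is actually flowing, not just until the microphone permission is granted.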

@DanielUsselmann
Author

I have the following React app:

main.jsx:
<Recognizer language={language} onModelLoad={handleLoadComplete} onResult={handleResult}/>

recognizer.jsx:
useEffect(() => {
  if (ready) {
    onModelLoad(); // Invoke onModelLoad callback to notify the parent component
  }
}, [ready]);

`useEffect(() => {
  const loadModel = async () => {
    const defaultModelPath = language;
    setLoading(true); // Set loading state to true initially
    const newChannel = new MessageChannel();
    setChannel(newChannel); // Set channel state
    const model = await createModel(defaultModelPath);
    model.registerPort(newChannel.port1);
    setLoadedModel({ model, path: defaultModelPath });
    const newRecognizer = new model.KaldiRecognizer(48000);
    newRecognizer.setWords(true);
    newRecognizer.on("result", (message) => {
      const result = message.result;
      if (result.text !== "") {
        setUtterances((utt) => [...utt, result]);
        onResultRef.current?.(result);
      }
    });
    newRecognizer.on("partialresult", (message) => {
      setPartial(message.result.partial);
    });
    setRecognizer(newRecognizer);
    setLoading(false); // Set loading state to false once loading is complete
  };

  loadModel();
}, []); // Load the default model when language changes`

mic.jsx:
`const startRecognitionStream = useCallback(async () => {
  if (recognizer) {
    if (!mediaStream) {
      try {
        mediaStream = await navigator.mediaDevices.getUserMedia({
          video: false,
          audio: {
            echoCancellation: true,
            noiseSuppression: true,
          },
        });
        if (mediaStream) {
          ready(true);
        }
        const audioContext = new AudioContext();
        await audioContext.audioWorklet.addModule('js/recognizer-processor.js');

        const recognizerProcessor = new AudioWorkletNode(audioContext, 'recognizer-processor', { channelCount: 1, numberOfInputs: 1, numberOfOutputs: 1 });
        recognizerProcessor.port.postMessage({ action: 'init', recognizerId: recognizer.id }, [channel.port2]);
        recognizerProcessor.connect(audioContext.destination);

        const source = audioContext.createMediaStreamSource(mediaStream);
        source.connect(recognizerProcessor);
      } catch (e) {
        f7.dialog.alert(e.name, e.message);
      }
    }
  }
}, [recognizer]);

useEffect(() => {
  startRecognitionStream();
}, [recognizer]);`

Sorry for posting my code so badly formatted; the formatting is somehow not supported.

To explain: once the model has loaded, the loading flag is cleared, and once microphone access is granted, the ready flag is set.
But somehow recognition does not start immediately.

@erikh2000

It's a little too much for me to find the issue in that code. I'll say that I would suspect that the setter functions (e.g. setRecognizer(), setLoading()) might not be updating values when you want them to inside of the function passed to useEffect().

I recommend setting breakpoints and stepping through the code in a browser debugger, like Chrome's or Firefox's. You can narrow it to the exact point of execution where something is happening outside of your expectations.

@DanielUsselmann
Author

I mean that the current code shows the spinner until the user allows microphone access, but it still takes a few seconds before speech is actually recognized.

@erikh2000

I don't trust my eyes and mind to sort out the React-based state logic above. That's why I say it might be useful for you to narrow down the issue with debugging.

So for example, one thing I would verify is that the series of events is really like this:

  1. recognizer is constructed with the loaded model.
  2. microphone is captured and audioWorklet is running.
  3. user experiences delay of some seconds before words are recognized.

Because maybe you're actually seeing something more like:

  2. microphone is captured and audioWorklet is running.
  3. user experiences delay of some seconds before words are recognized.
  1. recognizer is constructed with the loaded model.
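
To check which of the two orderings is actually happening, a small timestamped log can help alongside the debugger. The sketch below is only an illustration: `createTimeline` and the labels are hypothetical names, and the `mark()` calls would need to be placed by hand at the relevant points in the app.

```javascript
// Hypothetical helper: records labelled events with timestamps so the
// real order of setup steps can be inspected afterwards.
// The clock is injectable to keep the helper easy to test.
function createTimeline(now = () => Date.now()) {
  const events = [];
  return {
    // Record that something happened right now.
    mark(label) {
      events.push({ label, at: now() });
    },
    // Labels in the order they were recorded.
    order() {
      return events.map((e) => e.label);
    },
    // Milliseconds between the first occurrences of two labels,
    // or undefined if either label was never recorded.
    elapsed(from, to) {
      const a = events.find((e) => e.label === from);
      const b = events.find((e) => e.label === to);
      if (!a || !b) return undefined;
      return b.at - a.at;
    },
  };
}

// Example: call mark() right after each step finishes.
const timeline = createTimeline();
timeline.mark("recognizer constructed");
timeline.mark("worklet connected");
console.log(timeline.order());
```

A third `mark()` in the "partialresult" handler would show how long after the worklet connection the first words arrive.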

I often find unexpected behavior around useEffect() and useState(). I'm sure it's based on my ignorance of how React works. But I'm just saying that I've been surprised a thousand times.

If it helps in any way, here is my code handling initialization: https://github.com/erikh2000/sl-web-speech/blob/main/src/speech/Recognizer.ts

@DanielUsselmann
Author

Thanks!
Do you know how to stop the recognizer from "recognizing"?
In my opinion, when a component is unmounted it should normally stop, right? Or do I have to do something manually?

@erikh2000

erikh2000 commented Mar 20, 2024

Do you know how to stop the recognizer from "recognizing"?

One way is to stop sending samples to the recognizer via .acceptWaveform(). So in your audioworklet, you can check a "muted" flag and just not send samples if the flag is set. That should cut way down on CPU. And it also has a nice guarantee that the recognizer isn't continuing to listen in some unexpected way that will make your users upset about privacy.
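
The "muted" flag idea can be sketched as a small gate that sits in front of whatever call forwards samples to the recognizer. `SampleGate` and its `send` callback are hypothetical names used for illustration; in a real audio worklet the gate would be consulted inside the processor's `process()` method and toggled through a message on the node's port.

```javascript
// Hypothetical gate: forwards audio samples only while not muted.
// `send` stands in for whatever forwards samples toward the recognizer.
class SampleGate {
  constructor(send) {
    this.send = send;
    this.muted = false;
  }

  setMuted(muted) {
    this.muted = Boolean(muted);
  }

  // Returns true if the samples were forwarded, false if they were dropped.
  push(samples) {
    if (this.muted) return false;
    this.send(samples);
    return true;
  }
}

// Rough shape inside an AudioWorkletProcessor (not runnable outside a worklet):
//   this.port.onmessage = (e) => {
//     if (e.data.action === "mute") gate.setMuted(e.data.muted);
//   };
//   process(inputs) { gate.push(inputs[0][0]); return true; }
```

Dropping samples in the worklet means nothing reaches the recognizer while muted, which matches the privacy guarantee described above.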

In my opinion, when a component is unmounted it should normally stop, right? Or do I have to do something manually?

This is a really good question. I started to type an answer, and realized I was guessing beyond my knowledge.

With the combination of web workers, WASM, and React component lifecycle, I'm just not 100% sure. A hypothesis is that 1. all execution in the recognizer stops when you stop calling .acceptWaveform() and 2. memory of the recognizer instance is freed by garbage collection some time after your component unmounts, if your recognizer instance is stored in a variable scoped to the component and nowhere else.

On point #2, I prefer to keep the recognizer instance in a module-scoped variable that isn't bound to a React component. In this way, I can reuse the same recognizer instance even if the user exits a screen and returns. (My app has multiple screens, each rendered by a separate component.) By module-scoped variable, I mean a declaration of the recognizer instance like:

`let recognizer = null;

export function initRecognizer() {
  //...setup omitted
  recognizer = new KaldiRecognizer(...etc...);
}`
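
Building on the module-scoped idea, one possible refinement is to cache the initialization promise, so that components which mount, unmount, and mount again all share a single instance. This is a sketch under assumptions: `getRecognizer` and the `createRecognizer` factory passed to it are hypothetical names, and the factory stands in for the omitted setup.

```javascript
// Module-scoped cache: one shared initialization promise for the whole app.
let recognizerPromise = null;

// Hypothetical accessor. `createRecognizer` is a caller-supplied async
// factory that performs the real setup (model loading, construction).
function getRecognizer(createRecognizer) {
  if (recognizerPromise === null) {
    recognizerPromise = Promise.resolve()
      .then(createRecognizer)
      .catch((err) => {
        // Forget the failed attempt so a later call can retry.
        recognizerPromise = null;
        throw err;
      });
  }
  return recognizerPromise;
}
```

Because the promise is cached before the factory finishes, two components calling `getRecognizer()` at the same time still trigger only one setup run.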

@DanielUsselmann
Author

In my case I need the recognizer to be React state, so a plain let variable is not an option.
I can call some functions on the recognizer from const [recognizer, setRecognizer] = useState(), but I can't find anything to stop the recognizer.
How would you handle that?
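
For what it's worth, a value held in React state can still be torn down from the cleanup function that useEffect() returns. The sketch below assumes the component holds on to the media stream, audio context, and worklet node it created. `getTracks()`, `stop()`, `disconnect()`, and `close()` are standard Media Capture and Web Audio calls; the `remove()` call on the recognizer is an assumption about the library, so it is guarded and skipped if no such method exists. `createCleanup` itself is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a cleanup function for the audio pipeline.
// Any resource that was never created can simply be left undefined.
function createCleanup({ mediaStream, audioContext, workletNode, recognizer }) {
  return function cleanup() {
    // Stop microphone capture so no further samples are produced.
    if (mediaStream) {
      mediaStream.getTracks().forEach((track) => track.stop());
    }
    // Detach the worklet node from the audio graph.
    if (workletNode) {
      workletNode.disconnect();
    }
    // Release the audio context unless it is already closed.
    if (audioContext && audioContext.state !== "closed") {
      audioContext.close();
    }
    // ASSUMPTION: the recognizer may expose a remove() teardown method.
    if (recognizer && typeof recognizer.remove === "function") {
      recognizer.remove();
    }
  };
}
```

Inside a component, an effect could return the result of `createCleanup({...})`, so that React runs it when the component unmounts or when the effect's dependencies change.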
