run the vosk on worker task in nodejs #565

Open
ilovebioz opened this issue May 30, 2021 · 5 comments

Comments

@ilovebioz

Hi,

I succeeded in running the simple Node.js example that uses FFmpeg. Now I would like to use the Bree scheduler to run that example as a worker task. However, whenever execution reaches the part below, the process is killed automatically. Could you please help me find out what the problem is and how I can fix it? Thank you very much.

const libvosk = ffi.Library(soname, {
  'vosk_set_log_level': ['void', ['int']],
  'vosk_model_new': [vosk_model_ptr, ['string']],
  'vosk_model_free': ['void', [vosk_model_ptr]],
  'vosk_spk_model_new': [vosk_spk_model_ptr, ['string']],
  'vosk_spk_model_free': ['void', [vosk_spk_model_ptr]],
  'vosk_recognizer_new': [vosk_recognizer_ptr, [vosk_model_ptr, 'float']],
  'vosk_recognizer_new_spk': [vosk_recognizer_ptr, [vosk_model_ptr, vosk_spk_model_ptr, 'float']],
  'vosk_recognizer_new_grm': [vosk_recognizer_ptr, [vosk_model_ptr, 'float', 'string']],
  'vosk_recognizer_free': ['void', [vosk_recognizer_ptr]],
  'vosk_recognizer_accept_waveform': ['bool', [vosk_recognizer_ptr, 'pointer', 'int']],
  'vosk_recognizer_result': ['string', [vosk_recognizer_ptr]],
  'vosk_recognizer_final_result': ['string', [vosk_recognizer_ptr]],
  'vosk_recognizer_partial_result': ['string', [vosk_recognizer_ptr]],
});

@solyarisoftware

solyarisoftware commented May 30, 2021

Your problem is not well described.
Could you explain more clearly what you want to do?
What do you mean by "worker task"? An external process, or a worker thread?

@ilovebioz
Author

Hi,

I used Bree, a job scheduler module for Node.js. One notable feature of this scheduler is that it creates a worker thread (worker_threads) for each task.
const path = require('path');
const Bree = require('bree');

const bree = new Bree({
  // logger: new Cabin(),
  root: false,
  // outputWorkerMetadata: true,
  jobs: [
    {
      name: 'taskProcess',
      path: path.join(global.STTSERVICE.jobPath, 'taskProcess.js'),
      interval: '6s',
      worker: {
        workerData: {
          info: global.STTSERVICE,
        },
      },
    },
  ],
});

bree.start('taskProcess');

The taskProcess job in the code above is similar to the Vosk FFmpeg sample: it loads the DLL, calls FFmpeg, and performs the STT.
With this structure, each job runs on a worker thread (not in the main process, as in the original example). Whenever execution reaches the library declaration:

const libvosk = ffi.Library(soname, {
  'vosk_set_log_level': ['void', ['int']],
  'vosk_model_new': [vosk_model_ptr, ['string']],
  'vosk_model_free': ['void', [vosk_model_ptr]],
  'vosk_spk_model_new': [vosk_spk_model_ptr, ['string']],
  'vosk_spk_model_free': ['void', [vosk_spk_model_ptr]],
  'vosk_recognizer_new': [vosk_recognizer_ptr, [vosk_model_ptr, 'float']],
  'vosk_recognizer_new_spk': [vosk_recognizer_ptr, [vosk_model_ptr, vosk_spk_model_ptr, 'float']],
  'vosk_recognizer_new_grm': [vosk_recognizer_ptr, [vosk_model_ptr, 'float', 'string']],
  'vosk_recognizer_free': ['void', [vosk_recognizer_ptr]],
  'vosk_recognizer_accept_waveform': ['bool', [vosk_recognizer_ptr, 'pointer', 'int']],
  'vosk_recognizer_result': ['string', [vosk_recognizer_ptr]],
  'vosk_recognizer_final_result': ['string', [vosk_recognizer_ptr]],
  'vosk_recognizer_partial_result': ['string', [vosk_recognizer_ptr]],
});

the main process exits.

I hope everything is clearer now.

Thank you very much!

@solyarisoftware

solyarisoftware commented May 31, 2021

Well,

You didn't specify the exit error of your main process. But now it is clearer what you want to do: you want to transcode files with ffmpeg and transcribe them with Vosk, using worker_threads with a scheduler on top (maybe you want to build a server architecture). What is not clear is WHY you want to proceed this way.
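
On the missing exit error: a minimal sketch of how the way a worker thread ends could be captured with plain worker_threads, so that it can be included in a report. The helper name runAndReport is hypothetical, and the worker source is an inline string only to keep the example self-contained. Note that this only observes JavaScript-level failures; a crash inside native code may take down the whole process without firing these events.

```javascript
const { Worker } = require('worker_threads');

// Runs `source` (JavaScript code as a string) in a worker thread and
// resolves with the exit code and the uncaught error message, if any.
// `runAndReport` is a hypothetical helper, not part of Bree or Vosk.
function runAndReport(source) {
  return new Promise((resolve) => {
    const worker = new Worker(source, { eval: true });
    let error = null;
    // 'error' fires when the worker throws an uncaught exception.
    worker.on('error', (err) => { error = err; });
    // 'exit' fires once the worker has stopped, with its exit code.
    worker.on('exit', (code) => {
      resolve({ code, error: error ? error.message : null });
    });
  });
}

// Example: a worker that fails immediately.
runAndReport("throw new Error('boom')").then((report) => {
  console.log('worker ended:', report);
});
```

Attaching the same two listeners to the worker that runs the Vosk job would at least show whether the failure is visible from JavaScript at all.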


Unfortunately, Node.js worker_threads conflict with the Vosk (memory) architecture.
I explain why below.

Please remember that in Vosk the loaded language model can occupy a lot of RAM.
For example, the large English model takes ~3.3 GB (let's call this size M)!
So you want to load the model ONCE; otherwise, if you load the model IN each of T threads (or processes), you will allocate T * M of memory!
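
To make the T * M arithmetic concrete, here is a small sketch. It only multiplies numbers; the 3.3 GB value is the figure quoted above, and the function name is made up for illustration.

```javascript
// Rough memory estimate for workers that each load their own model copy.
// MODEL_GB is the ~3.3 GB figure quoted above; the helper is hypothetical.
const MODEL_GB = 3.3;

function estimateModelMemoryGB(workers, modelGB = MODEL_GB, shared = false) {
  // A shared model is loaded once; otherwise every worker pays the full cost.
  return shared ? modelGB : workers * modelGB;
}

for (const workers of [1, 2, 4, 8]) {
  const perWorker = estimateModelMemoryGB(workers).toFixed(1);
  const once = estimateModelMemoryGB(workers, MODEL_GB, true).toFixed(1);
  console.log(
    `${workers} worker(s): ${perWorker} GB if loaded per worker, ${once} GB if loaded once`
  );
}
```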

So if you want to delegate the ffmpeg transcoding and speech-to-text tasks, I propose different approaches:

See also VoskJs, my nodejs Vosk wrapper, with server examples: https://github.com/solyarisoftware/voskJs/

@ilovebioz
Author

Hi,

First, I would like to thank you for your kind explanation. I would like to build a RESTful server that does the same job as VoskJs but with slightly different logic. In my server, the API just receives requests and stores them in a queue; a batch job handles the STT processing. That is why I would like to run the FFmpeg STT in a worker thread in Node.js.
If I run the batch job in the main process, it will block the API server, and clients cannot send requests while it is working.
Thank you again for your support. I will study solution 2 to solve my problem.

@solyarisoftware

> In my server, the API just receive the request and store them to a queue, a batch job will handle the STT processing.

If you want to build an ASR decoder server architecture, you probably want to keep latency as low as possible. Right? :-)

But if you use a job queue manager, you are just serializing requests and delegating them to a backend system to fulfill. That way you do not block the Node.js main thread (of the server), OK, but you do not solve the whole problem (minimizing latency).
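
A minimal sketch of what "serializing requests" means in practice, assuming an in-memory queue. The JobQueue class and the fake handler are hypothetical; the handler stands in for the real ffmpeg + Vosk work and just waits 100 ms.

```javascript
// In-memory job queue that processes one job at a time, in arrival order.
// `handler` is any async function; here it stands in for the STT work.
class JobQueue {
  constructor(handler) {
    this.handler = handler;
    this.pending = [];
    this.running = false;
  }

  // Stores the job and returns a promise for its result.
  enqueue(job) {
    return new Promise((resolve, reject) => {
      this.pending.push({ job, resolve, reject });
      this.drain();
    });
  }

  // Processes pending jobs strictly one after another.
  async drain() {
    if (this.running) return;
    this.running = true;
    while (this.pending.length > 0) {
      const { job, resolve, reject } = this.pending.shift();
      try {
        resolve(await this.handler(job));
      } catch (err) {
        reject(err);
      }
    }
    this.running = false;
  }
}

// Fake handler: pretends each file takes 100 ms to transcribe.
const queue = new JobQueue(async (file) => {
  await new Promise((done) => setTimeout(done, 100));
  return `transcript of ${file}`;
});

const started = Date.now();
for (const file of ['a.wav', 'b.wav', 'c.wav']) {
  queue.enqueue(file).then((text) => {
    console.log(`${text} after ${Date.now() - started} ms`);
  });
}
```

The enqueue calls return immediately, so the caller is not blocked, but the third request still waits for the first two to finish. That waiting time is the latency cost described above.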

Of course, it all depends on the CPU cores available on your host:

  • If you have 1 core: no way. Your server serializes request processing, one request at a time.
  • If you have 2 cores: maybe you can use a job queue, but it's pointless; just fork a Node.js process from your main Node.js thread!
  • If you have >2 cores: go for Vosk threads (sol. 2) or a process pool (sol. 1).
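
The rule of thumb in the list above could be checked at startup with something like the following sketch. The function name and the returned labels are made up for illustration; only os.cpus() is a real Node.js call.

```javascript
const os = require('os');

// Maps the number of CPU cores to the approach suggested in the list above.
// The labels are illustrative strings, not options of any library.
function suggestStrategy(cores) {
  if (cores <= 1) return 'serialize';        // requests are handled one by one
  if (cores === 2) return 'fork-one-process'; // one extra Node.js process
  return 'vosk-threads-or-process-pool';      // sol. 2 or sol. 1
}

const cores = os.cpus().length;
console.log(`${cores} core(s) detected: ${suggestStrategy(cores)}`);
```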

> That is why I would like to make the FFmpeg STT a worker thread in Nodejs.

Warning: you can't pass the Vosk Model through worker_threads, because the Model object contains functions! See: #502.
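
The underlying restriction can be reproduced without Vosk: data sent to a worker thread is copied with the structured clone algorithm, which rejects functions. A small sketch follows; fakeModel is a hypothetical stand-in, not the real Vosk Model class, and canBePosted is a helper invented for this example.

```javascript
const { MessageChannel } = require('worker_threads');

// Returns true if `value` survives the structured clone used by postMessage.
function canBePosted(value) {
  const { port1, port2 } = new MessageChannel();
  try {
    // postMessage throws a DataCloneError synchronously on uncloneable data.
    port1.postMessage(value);
    return true;
  } catch (err) {
    return false;
  } finally {
    port1.close();
    port2.close();
  }
}

// Hypothetical stand-in for a model wrapper that carries a function.
const fakeModel = { path: 'model-en', free() {} };

console.log('plain data can be posted:', canBePosted({ path: 'model-en' }));
console.log('object with a function can be posted:', canBePosted(fakeModel));
```

Passing plain data such as the model path and loading the model inside the receiving thread avoids the clone error, at the memory cost discussed earlier in the thread.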

All in all, you got some information, but you did not detail the Vosk issue, so I suggest that you close this issue and maybe open another one with a well-detailed problem related to Vosk.

BTW, if you find VoskJs useful, I would appreciate a star there :)
