How to set-up a Vosk multi-threads server architecture in NodeJs #502
Comments
I believe you can dump model address as a number, pass it to the worker and then reinitialize model object there from existing memory address. You'll need to add another model constructor to the Model class then. Not sure about details in javascript though. |
I believe it can't work. The problem is that nodejs worker threads allow |
You must be passing a value (long int address of the model in memory), not reference. |
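As a rough illustration of passing a value rather than a reference, here is a minimal sketch that hands a plain number to a worker thread through `workerData`. The address is made up and `sendAddressToWorker` is a hypothetical helper; turning the number back into a model object would still need the extra constructor or C helper discussed in this thread, which this sketch does not provide.

```javascript
const { Worker } = require('worker_threads')

// Source code of the worker, evaluated in the new thread.
// It only reports back what it received; it does NOT rebuild a Vosk model.
const workerSource = [
  "const { parentPort, workerData } = require('worker_threads')",
  "parentPort.postMessage({",
  "  type: typeof workerData.modelAddress,",
  "  value: workerData.modelAddress",
  "})"
].join('\n')

// Hypothetical helper: send a numeric address to a worker and resolve
// with what the worker saw on its side.
function sendAddressToWorker(modelAddress) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { modelAddress }
    })
    worker.once('message', resolve)
    worker.once('error', reject)
  })
}
```

Calling `sendAddressToWorker(0x7f3a5c001000n).then(console.log)` should show that the BigInt arrives unchanged in the worker, since numbers and BigInts are copied by value.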
I’m afraid that’s not possible in NodeJs; you can't convert a |
it is easy to get address from the model:
not straightforward to init model back from that address. |
we might add a dummy C method that returns model object from int:
|
An alternative to workers would be libuv async calls: https://github.com/node-ffi/node-ffi/wiki/Node-FFI-Tutorial#async-library-calls |
|
I have pushed version 0.3.25 with async demo: https://github.com/alphacep/vosk-api/blob/master/nodejs/demo/demo_async.js |
Thanks for sharing:
In your demo, you are running multiple (4) async runs of the My notes:
|
No, see acceptWaveformAsync, it runs in a thread. |
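To make "it runs in a thread" concrete, here is a small sketch of the general mechanism, assuming (as the node-ffi tutorial linked above describes) that async library calls are executed on the libuv thread pool. `crypto.pbkdf2` is only a stand-in for a CPU-heavy native call; it is not part of Vosk, and `runTasksWhileCountingTicks` is a hypothetical name.

```javascript
const crypto = require('crypto')
const { promisify } = require('util')

// Stand-in for a CPU-heavy native call: Node runs pbkdf2 on the libuv
// thread pool, so the JavaScript main thread is not blocked meanwhile.
const pbkdf2 = promisify(crypto.pbkdf2)

// Start numTasks native jobs at once and count how many timer callbacks
// the main thread manages to run before all of them have finished.
async function runTasksWhileCountingTicks(numTasks) {
  let ticks = 0
  const timer = setInterval(() => { ticks++ }, 1)

  const keys = await Promise.all(
    Array.from({ length: numTasks }, () =>
      pbkdf2('secret', 'salt', 200000, 64, 'sha512'))
  )

  clearInterval(timer)
  return { completed: keys.length, ticks }
}
```

A non-zero `ticks` value from `runTasksWhileCountingTicks(4)` indicates that the event loop kept running while the native work was in progress. Note that the libuv pool has 4 threads by default; the `UV_THREADPOOL_SIZE` environment variable changes that.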
Ah! I had not noticed. This changes/solves the point ! Let me do some tests :) |
Hi Nicolay,
Doing some tests, at first glance it seems to me that
I made, as part of my VoskJs project (I'll publish a new release soon), a brainless stress test program that "spawns" N transcript requests in "parallel". It's a "worst case" / theoretical test, just to stress my 8-core laptop and see what happens.
The results seem encouraging to me! If I run a single request I get 439 msec of elapsed time (that's good!)
Apart from the increased elapsed time (which could also depend on my laptop's CPU power-saving behavior), I believe that the overall behavior of Vosk multithreading works as expected!
The test is of course not a real server case. Soon I'll set up (and publish in the next voskJs release) a simple HTTP server architecture to run some more realistic stress tests. More tests are to be done (on a server host / virtual machine).
I'd rename the title of the issue
Thanks for now!
stressTest.js
const os = require('os')
const { initModel, transcript, freeModel } = require('../voskjs')

const DEBUG_REQUESTS = false
const DEBUG_RESULTS = false

let activeRequests = 0

/**
 * concurrentTranscriptRequests
 * start a number of transcript requests in parallel (for a given model and audio file)
 *
 * @param {Number} numRequests
 * @param {String} audioFile
 * @param {VoskModelObject} model
 * @return {Promise[]} one pending Promise per request
 *
 */
function concurrentTranscriptRequests(numRequests, audioFile, model) {
  const promises = []

  for (let i = 0; i < numRequests; i++) {
    if (DEBUG_REQUESTS) {
      // new thread started, increment the global counter of active running threads
      activeRequests++
      console.log(`DEBUG. active requests : ${activeRequests}`)
    }

    // speech recognition from an audio file
    try {
      // call an async function (returning a Promise) without waiting for the transcript to complete
      const result = transcript(audioFile, model)

      // add the Promise to the array
      promises.push(result)
    }
    catch (error) {
      // only synchronous errors land here; a rejected Promise surfaces when it is awaited in main()
      console.error(error)
    }
  }

  // return an array of promises
  return promises
}

/**
 * stressTest
 * unit test
 */
async function main() {
  const numRequests = +process.argv[2]

  if (!numRequests || numRequests < 1) {
    console.error(`usage: ${process.argv[1]} number_parallel_requests`)
    process.exit(1)
  }

  // take the number of virtual cores (vCPU)
  const cpuCount = os.cpus().length

  console.log()
  console.log(`CPU cores in this host : ${cpuCount}`)
  if (numRequests > cpuCount)
    console.log(`warning: number of requested tasks (${numRequests}) is higher than number of available cores (${cpuCount})`)
  console.log(`requests to be spawned : ${numRequests}`)
  console.log()

  const modelDirectory = '../models/vosk-model-en-us-aspire-0.2'
  const audioFile = '../audio/2830-3980-0043.wav'

  console.log(`model directory : ${modelDirectory}`)
  console.log(`speech file name : ${audioFile}`)
  console.log()

  // create a runtime model
  const model = await initModel(modelDirectory)

  // run numRequests transcript requests in parallel
  const promises = concurrentTranscriptRequests(numRequests, audioFile, model)
  // await singleTranscriptRequests(numRequests, audioFile, model)

  // wait for all promises to settle
  for (let i = 0; i < promises.length; i++) {
    const result = await promises[i]

    if (DEBUG_REQUESTS) {
      // thread finished, decrement the global counter of active running threads
      activeRequests--
      console.log(`DEBUG. active requests : ${activeRequests}`)
    }

    if (DEBUG_RESULTS)
      console.log(result)
  }

  // free the runtime model
  freeModel(model)
  //console.log('done.')
}
main()
RESULTS
The host:
$ inxi -C -M
Machine: Type: Laptop System: HP product: HP Laptop 17-by1xxx v: Type1ProductConfigId serial: <superuser/root required>
Mobo: HP model: 8531 v: 17.16 serial: <superuser/root required> UEFI: Insyde v: F.32 date: 12/14/2018
CPU: Topology: Quad Core model: Intel Core i7-8565U bits: 64 type: MT MCP L2 cache: 8192 KiB
Speed: 600 MHz min/max: 400/4600 MHz Core speeds (MHz): 1: 600 2: 600 3: 600 4: 600 5: 600 6: 600 7: 600 8: 600
Single request (1 thread)
$ /usr/bin/time -f "%e" pidstat 1 -u -e node stressTest 1
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 28/04/2021 _x86_64_ (8 CPU)
CPU cores in this host : 8
requests to be spawned : 1
model directory : ../models/vosk-model-en-us-aspire-0.2
speech file name : ../audio/2830-3980-0043.wav
log level : 0
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00668192 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from ../models/vosk-model-en-us-aspire-0.2/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:246) Loading HCLG from ../models/vosk-model-en-us-aspire-0.2/graph/HCLG.fst
08:54:25 UID PID %usr %system %guest %wait %CPU CPU Command
08:54:26 1000 253795 51,00 82,00 0,00 0,00 133,00 2 node
LOG (VoskAPI:ReadDataFiles():model.cc:265) Loading words from ../models/vosk-model-en-us-aspire-0.2/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo ../models/vosk-model-en-us-aspire-0.2/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading CARPA model from ../models/vosk-model-en-us-aspire-0.2/rescore/G.carpa
08:54:27 1000 253795 79,00 21,00 0,00 0,00 100,00 2 node
init model elapsed : 2195ms
transcript elapsed : 439ms
Average: 1000 253795 65,00 51,50 0,00 0,00 116,50 - node
2.95
10 requests in parallel
$ /usr/bin/time -f "%e" pidstat 1 -u -e node stressTest 10
Linux 5.8.0-50-generic (giorgio-HP-Laptop-17-by1xxx) 28/04/2021 _x86_64_ (8 CPU)
CPU cores in this host : 8
warning: number of requested tasks (10) is higher than number of available cores (8)
requests to be spawned : 10
model directory : ../models/vosk-model-en-us-aspire-0.2
speech file name : ../audio/2830-3980-0043.wav
log level : 0
LOG (VoskAPI:ReadDataFiles():model.cc:194) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:197) Silence phones 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 2 orphan components.
LOG (VoskAPI:Collapse():nnet-utils.cc:1488) Added 1 components, removed 2
LOG (VoskAPI:CompileLooped():nnet-compile-looped.cc:345) Spent 0.00680518 seconds in looped compilation.
LOG (VoskAPI:ReadDataFiles():model.cc:221) Loading i-vector extractor from ../models/vosk-model-en-us-aspire-0.2/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:246) Loading HCLG from ../models/vosk-model-en-us-aspire-0.2/graph/HCLG.fst
08:45:20 UID PID %usr %system %guest %wait %CPU CPU Command
08:45:21 1000 252999 56,44 75,25 0,00 0,00 131,68 0 node
LOG (VoskAPI:ReadDataFiles():model.cc:265) Loading words from ../models/vosk-model-en-us-aspire-0.2/graph/words.txt
LOG (VoskAPI:ReadDataFiles():model.cc:273) Loading winfo ../models/vosk-model-en-us-aspire-0.2/graph/phones/word_boundary.int
LOG (VoskAPI:ReadDataFiles():model.cc:281) Loading CARPA model from ../models/vosk-model-en-us-aspire-0.2/rescore/G.carpa
08:45:22 1000 252999 79,00 21,00 0,00 0,00 100,00 0 node
init model elapsed : 2218ms
08:45:23 1000 252999 233,00 32,00 0,00 0,00 265,00 0 node
08:45:24 1000 252999 383,00 3,00 0,00 0,00 386,00 3 node
transcript elapsed : 1623ms
transcript elapsed : 1734ms
transcript elapsed : 1837ms
transcript elapsed : 1934ms
transcript elapsed : 2040ms
transcript elapsed : 2128ms
transcript elapsed : 2227ms
transcript elapsed : 2369ms
transcript elapsed : 2471ms
08:45:25 1000 252999 99,00 2,00 0,00 0,00 101,00 0 node
transcript elapsed : 2566ms
Average: 1000 252999 169,86 26,75 0,00 0,00 196,61 - node
5.15 |
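For reference, the figures above can be turned into a rough speedup estimate. This is only back-of-the-envelope arithmetic on the numbers printed in the two runs; it assumes that each "transcript elapsed" value is measured from a common start (all requests are spawned in the same loop) and that a sequential run would cost 439 ms per request.

```javascript
// Elapsed times copied from the two runs above (milliseconds).
const singleRequestMs = 439
const parallelElapsedMs = [1623, 1734, 1837, 1934, 2040, 2128, 2227, 2369, 2471, 2566]

const numRequests = parallelElapsedMs.length

// Wall-clock time until the last of the 10 parallel requests completed.
const wallClockMs = Math.max(...parallelElapsedMs)

// Estimated cost of running the same 10 requests one after the other.
const sequentialEstimateMs = singleRequestMs * numRequests

// Ratio > 1 means the parallel run finished sooner than the sequential estimate.
const speedup = sequentialEstimateMs / wallClockMs

// Average wall-clock cost per request in the parallel run.
const effectiveMsPerRequest = wallClockMs / numRequests

console.log(`wall clock (10 parallel) : ${wallClockMs} ms`)
console.log(`sequential estimate      : ${sequentialEstimateMs} ms`)
console.log(`speedup                  : ${speedup.toFixed(2)}x`)
console.log(`effective per request    : ${effectiveMsPerRequest.toFixed(1)} ms`)
```

Under those assumptions the parallel run is about 1.71x faster than the sequential estimate (2566 ms versus 4390 ms), well below the 8 logical CPUs. The peak of about 386 %CPU in the pidstat output may be related to libuv's default thread pool size of 4 (`UV_THREADPOOL_SIZE`), but that is a guess that would need a dedicated test.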
Related to #516 |
I stumbled upon a similar issue while working with a nodejs
Both of these minimal cases fail in their own way:
const { Worker } = require('worker_threads');
new Worker(`require("ffi-napi")`, { eval: true });

require('ffi-napi');
const { Worker } = require('worker_threads');
new Worker(`require("ffi-napi")`, { eval: true });
See node-ffi-napi/node-ffi-napi#125. I added some feedback there on which versions it occurs with. I haven't tried with @nshmyrev custom |
Yeah, not yet. I need to find the time, publish the fork and update the dependency. Hopefully this year ;) |
Alright, I'll see what I can do to help today... I'll propose a PR soon. |
|
I added the
One odd thing though: I was still getting the
node -e 'const {Worker}=require("worker_threads"); new Worker(`require(".")`, {eval:true})'
Whereas in the test I couldn't tell whether it was failing or not. It doesn't throw for sure, but maybe I'm just failing to detect this kind of error.
Anyway, while waiting for the PRs to get reviewed, you might want to give this fork a try |
Thank you Johan. I'll probably try to take a look over the weekend then; it is great that things are working! |
Alternative is bun:FFI https://twitter.com/jarredsumner/status/1521527222514774017 |
Hi Nicolay,
This is not a real issue, just two questions and a request for brainstorming/suggestions about a server architecture in nodejs.
I'm trying to extend my project voskJs by implementing a nodejs server-side architecture to manage multiple concurrent Vosk transcript requests.
Here #498 you told me that the transcript function runs on a single core, and you rightly suggested implementing a multithreaded server. So I'm trying to understand how I can use nodejs worker threads.
For a server that, for example, has to manage a single language (and consequently, say, a single model), my idea was
But I have a problem: in nodejs, worker threads in theory can NOT share an object containing functions. See:
whereas the Vosk Model Object contains functions:
So I fear I can't pass the Model to each thread. I'll verify asap in practice.
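The limitation can be checked without Vosk at all. The sketch below uses a `MessageChannel` from `worker_threads`, whose `postMessage` applies the same structured-clone rules as messages sent to a worker. `canBePosted` is a hypothetical helper, and the two sample objects are stand-ins, not the real Vosk Model.

```javascript
const { MessageChannel } = require('worker_threads')

// Hypothetical helper: returns true if `value` survives the structured
// clone performed by postMessage, false if cloning is refused.
function canBePosted(value) {
  const { port1, port2 } = new MessageChannel()
  try {
    port1.postMessage(value)
    return true
  } catch (error) {
    // Node throws a DataCloneError for values such as functions
    return false
  } finally {
    port1.close()
    port2.close()
  }
}

// Plain data (here a made-up numeric address) can be posted...
console.log(canBePosted({ modelAddress: 140000000000000n }))

// ...but an object carrying a function cannot.
console.log(canBePosted({ acceptWaveform() {} }))
```

So an object that exposes methods can't be handed to a worker as-is, while plain values (numbers, strings, buffers) can.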
Now I have a serious problem because the Vosk Model requires a huge amount of RAM.
For example, using the large English language model vosk-model-en-us-aspire-0.2, it seems to me that Vosk occupies something like ~3 GB of RAM (see below the Maximum resident set size (kbytes): 3253024 line when running /usr/bin/time --verbose node voskjs --audio=audio/2830-3980-0043.wav --model=models/vosk-model-en-us-aspire-0.2).
See stdout when running Vosk transcript in a single process/request (using the voskJs wrapper):
Questions:
Can you confirm that the Vosk model RAM usage is ~3 GB (for the mentioned language model)?
Using processes instead of threads:
If I can't use worker threads, reusing shared memory for the huge Model object, the alternative could be to implement a multi-process architecture of workers, but in this case every worker process must load the model separately (e.g. > ~3 GB each). So if I have an 8-core host and I foresee, say, 7 child/worker processes, the total amount of RAM in the host must be something like > ~3 GB * 7 = > ~21 GB! That's insane. Any suggestion for an alternative solution (in nodejs)?
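The estimate above can be written out as a small calculation. It takes the maximum resident set size reported earlier (3253024 kbytes, assumed here to be 1024-byte units and to be dominated by the model) and compares one copy per worker process against a single shared copy. The variable names are only illustrative.

```javascript
const os = require('os')

const KIB_PER_GIB = 1024 * 1024

// Maximum resident set size from the /usr/bin/time --verbose run (kbytes).
const maxResidentSetKiB = 3253024

// 7 worker processes on an 8-core host, as in the scenario described above.
const workerProcesses = 7

// Memory if every worker process loads its own copy of the model.
const perProcessGiB = maxResidentSetKiB / KIB_PER_GIB
const processPoolGiB = perProcessGiB * workerProcesses

// Memory if a single in-process model is shared by all threads.
const sharedModelGiB = perProcessGiB

// RAM installed on the machine running this script, for comparison.
const hostGiB = os.totalmem() / 1024 ** 3

console.log(`one model copy       : ${perProcessGiB.toFixed(1)} GiB`)
console.log(`7 worker processes   : ${processPoolGiB.toFixed(1)} GiB`)
console.log(`1 shared model       : ${sharedModelGiB.toFixed(1)} GiB`)
console.log(`RAM on this host     : ${hostGiB.toFixed(1)} GiB`)
```

With these inputs the process pool needs roughly 21.7 GiB against about 3.1 GiB for a shared model, which is why sharing the model inside one process matters so much here.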
Using vosk-server
I guess at the end of the day a nodejs server could just do some IPC with the vosk-server you implemented. How much RAM and how many CPU cores does vosk-server require?
Thanks for your patience
Giorgio