Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Vosk offline speech recognition service #186917

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from
Draft

Conversation

sbruder
Copy link
Contributor

@sbruder sbruder commented Aug 16, 2022

Description of changes

This adds Vosk, an offline speech recognition service. It is part of Summer of Nix to allow live transcriptions with Jitsi Meet, for which Vosk is required.

It consists of several parts:

  • The main vosk library, including alpha cephei’s fork of kaldi, which is incompatible with the current kaldi derivation in nixpkgs
  • The python bindings/library to vosk (includes vosk-transcriber command line tool)
    • There are also other bindings available, but the python ones are required for vosk-server, so I only packaged them
    • I’m not sure if this is the correct way to handle a library that also includes (multiple) language bindings. I wanted to separate them and not put the library itself in python-modules, because it provides bindings for a lot of languages.
  • The language models for vosk. I packaged the main model directly provided by Vosk, when this is not available the small model and/or a third-party one
  • The vosk-server WebSocket server
  • A NixOS module for vosk-server
  • A NixOS integration test for vosk-server
    • It does not use any of the pre-packaged models because their memory consumption is too high and updates to them could make the test fail
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: python 8.has: changelog 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` labels Aug 16, 2022
@ofborg ofborg bot added 8.has: package (new) This PR adds a new package 11.by: package-maintainer This PR was created by the maintainer of the package it changes 10.rebuild-darwin: 1-10 10.rebuild-linux: 11-100 labels Aug 16, 2022
@sbruder
Copy link
Contributor Author

sbruder commented Aug 16, 2022

I just saw #185255, which also packages vosk(-api), but uses upstream Kaldi (with a minimal amount of patches) instead of using alpha cephei’s fork. Is that approach preferable?

pkgs/top-level/all-packages.nix Outdated Show resolved Hide resolved
homepage = "https://github.com/alphacep/kaldi/tree/vosk";
description = "Kaldi fork for the Vosk offline speech recognition API";
license = licenses.asl20;
platforms = platforms.unix;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta.maintainer must be set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also need to be set for derivations that are not called in all-packages.nix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so if I do nix repl and try to get the maintainer, I can find you :P

pkgs/tools/audio/vosk/kaldi.nix Show resolved Hide resolved
Comment on lines +46 to +52
# When updating, always update the git revision (rev) and the version base.
# The version base can be obtained by running src/base/get_version.sh in the kaldi source.
# Only the first three components (major.minor.patch) should be put in here,
# the last one (short git commit hash) will automatically be appended
version_base = "5.5.1046";
rev = "76cd51d44c0a61e3905c35cadb2ec5f54f3e42d1";
in
stdenv.mkDerivation rec {
pname = "kaldi";
version = "${version_base}-${lib.substring 0 5 rev}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# When updating, always update the git revision (rev) and the version base.
# The version base can be obtained by running src/base/get_version.sh in the kaldi source.
# Only the first three components (major.minor.patch) should be put in here,
# the last one (short git commit hash) will automatically be appended
version_base = "5.5.1046";
rev = "76cd51d44c0a61e3905c35cadb2ec5f54f3e42d1";
in
stdenv.mkDerivation rec {
pname = "kaldi";
version = "${version_base}-${lib.substring 0 5 rev}";
in
stdenv.mkDerivation rec {
pname = "kaldi";
version = "5.5.1046";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That separates the version base from the git revision which makes it more likely that an update forgets to update the version base. Also, I think the comment should stay there because it’s not very intuitive to see what exactly 1046 is and how to get that for different (newer) versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not do this for versions in nixpkgs. Please refer to the contributing guide for details

];
});

# pinned in kaldi/tools/Makefile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to pin this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is for upstream to decide. How could this be not pinned in nixpkgs? Every fetcher/FOD needs an output hash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same number as in line 14, is it?

pkgs/tools/audio/vosk/kaldi.nix Outdated Show resolved Hide resolved
pkgs/tools/audio/vosk/kaldi.nix Show resolved Hide resolved
pkgs/tools/audio/vosk-server/default.nix Outdated Show resolved Hide resolved
};
interface = mkOption {
type = types.str;
default = "0.0.0.0";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default = "0.0.0.0";
default = "127.0.0.1";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.0.0.0 is the upstream default, which is why I copied it. Because there is a firewall active by default on NixOS, there should be no security issues with keeping that default. Also, many other services are listening on all interfaces by default. Is there a reason why this service in particular should only listen on localhost?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a secure default even if firewall is disabled. Otherwise, if we really want this: if config.networking.firewall.enable then "0.0.0.0" else "127.0.0.1" should be the good option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do such a confusing conditional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, we should have 127.0.0.1 by default IMHO.

Copy link
Contributor

@cleeyv cleeyv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I successfully did a test deployment of this module as a transcription service for the Jigasi component of Jitsi Meet! Everything went smoothly on the Vosk side of this deployment so I don't have any changes to suggest here based on that test. The resulting transcription output of a Jitsi Meet conference is kind of buggy, but this is consistent with an existing upstream issue so doesn't seem related to this package/module: alphacep/vosk-server#89

One thing to note for anyone else trying to deploy this is that it uses a lot of memory. I had problem from hitting memory limits on an 8GB VM and had to redeploy on a 16GB one. Looking at performance graphs, it seems to be using max of nearly 10GB for the first ~5mins after the initial deployment (the starting of the vosk service) and then it drops down to a persistent usage of closer to 9GB. If we add higher level documentation of how to use this service with Jigasi and Jitsi Meet then I think it will be worth mentioning this memory usage, but I don't see a requirement for it in this PR.

@bobby285271 bobby285271 added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Aug 24, 2022
@sbruder
Copy link
Contributor Author

sbruder commented Aug 24, 2022

@cleeyv The high memory usage is definitely something that should be noted, maybe in the description of the model option. As an alternative for people with less memory, we could also additionally package the small models for languages for which Vosk provides high-quality (but large in size and memory consumption) models. I currently only packaged the large models for those languages. The NixOS integration test currently uses a small model for (among others) that reason, because I could not get it to work with less than 8 GiB of memory, which is more than any other currently existing test in nixpkgs uses.

@dit7ya
Copy link
Member

dit7ya commented Sep 2, 2022

Looking forward to this! A working offline ASR opens up so many possibilities.

@sbruder sbruder force-pushed the vosk branch 2 times, most recently from 237c6a2 to fde0714 Compare September 16, 2022 11:47
@sbruder sbruder mentioned this pull request Sep 16, 2022
13 tasks
@bobby285271 bobby285271 removed the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Sep 16, 2022
@SuperSandro2000 SuperSandro2000 added the 2.status: merge conflict This PR has merge conflicts with the target branch label Oct 3, 2022
Copy link
Member

@RaitoBezarius RaitoBezarius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the amazing packaging.
I would need you to move the release notes to 23.05 and some minor things left to be addressed. Other than this, I feel like we can merge it.

version_base = "5.5.1046";
rev = "76cd51d44c0a61e3905c35cadb2ec5f54f3e42d1";
in
stdenv.mkDerivation rec {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternately, we might want to override the already existing kaldi package (either src or patches) like in https://github.com/NixOS/nixpkgs/pull/185148/files#diff-a4997bb1585454092e3e7a29bb551970a156f4c7fb1135984c17303027f022e5R10-R32

That way we will not have unnecessary duplication of code that will inevitably get out of sync.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-03-20-nixpkgs-architecture-team-meeting-33/26547/2

Comment on lines +259 to +265
<listitem>
<para>
<link xlink:href="https://alphacephei.com/vosk/">vosk</link>,
an offline speech recognition service. Available as
<link linkend="opt-services.vosk.enable">services.vosk</link>.
</para>
</listitem>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase and delete this change because the file got deleted


installPhase = ''
install -D libvosk${stdenv.hostPlatform.extensions.sharedLibrary} $out/lib/libvosk${stdenv.hostPlatform.extensions.sharedLibrary}
'';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'';
'';

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Sep 7, 2023
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 19, 2024
@wegank wegank marked this pull request as draft March 20, 2024 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: python 8.has: changelog 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10 11.by: package-maintainer This PR was created by the maintainer of the package it changes 12.approvals: 1 This PR was reviewed and approved by one reputable person
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants