Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to be run on Windows platform #25

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

papoteur-mga
Copy link

I have clients who run on Windows.
Thus I propose this, which consists of:

  • using fmedia instead parec, by detecting that the platform is not posix;
  • using thread to get sound data instead of non blocking stream, which is a posix specific feature;
  • using pynput keyboard simulation - the option has to be passed.

Thus I got good results for my limited tests on a Windows 8 machine. And it still works on my Linux.
I used black and mypy passed.
The value of size block seems important for the reactivity. A big value like 1M give too long interval between results. A lower value like 1k seems to give a too high load. I put 16k which seems a good balance.
https://github.com/papoteur-mga/nerd-dictation/blob/1fb54dc4bd8fc389dbd18001c0185257d60c9fae/nerd-dictation#L99
If this is accepted, some documentation has to be edited.

@ideasman42
Copy link
Owner

Thanks for the PR, please squish noisy changes (for example, making sample rate a hard-coded value, then using the configured value again).

Papoteur added 2 commits October 24, 2021 09:35
A new option --simulate-input-method is added.
This replaces xdotool and needs pynput as new module.
Use 'auto' default input method.
Use thread instead of a non blocking stream which is specific to Linux
@papoteur-mga
Copy link
Author

Hello,
Should be OK to review now.

@ideasman42
Copy link
Owner

Ran some more tests on this and responsiveness seems to have dropped a little, from looking into this - it seems time.sleep(idle_time_test) is sleeping even when the queue has data to process.

Changing the read size in enqueue_output from 16384 to 1024 for example, stops input entirely on my system (dictation isn't working at all).

Besides this seeming error prone in general, higher sample rates could lock up recording.

The test to idle: idle_time > 0.0: (after calling exit_fn(...)) should only run when the queue is empty.

@papoteur-mga
Copy link
Author

I have moved the test to idle in the case where the queue of data is empty. This is indeed more responsive. Thus I use smaller chunks from the queue.

@ideasman42
Copy link
Owner

I have moved the test to idle in the case where the queue of data is empty. This is indeed more responsive. Thus I use smaller chunks from the queue.

This change makes the idle-delay not fulfill it's original purpose.

  • Whenever "read" is called on the stdout there will likely be some data.
  • Continually reading a few bytes from the stdout in a loop causes high CPU usage (on one core at least).
  • First reading, then idling means idle may never happen or ...
  • Depend on the buffering behavior of the program generating the output.

So I believe idle should only be changed to check the queue is empty, to allow time to build up the record buffer (defaulting to 1/10th of a second, enough to feel responsive without high CPU usage).

@ideasman42
Copy link
Owner

BTW, did you check on non-blocking reads for WIN32?
https://stackoverflow.com/questions/2408560/non-blocking-console-input

@papoteur-mga
Copy link
Author

I have tested it on Windows 8 only, I don't have Win32 here.

… recorder to 1024.

This improves responsiveness  and allows to have smaller chunks from recorder
@papoteur-mga
Copy link
Author

I displaced the test.
This still works with responsiveness.

@ideasman42
Copy link
Owner

ideasman42 commented Oct 30, 2021

I've been considering this PR and something that doesn't sit well with me is that it complicates the code by adding threading which is only needed for Windows, which I don't use (and can't easily test).

Using threading (in my opinion) makes the code more complex and less hackable (which is a goal of this project I would like to keep prioritized).

So I would rather the windows support instead, replace the pipe read with a file-like object that manages reads that don't block on windows.
(this class can use asyncio or threading internally, whatever is needed, but it should be isolated to the class and only used on windows).

This would allow windows support without complicating logic for Linux.

@asamwow
Copy link

asamwow commented Nov 19, 2021

I kinda agree with @ideasman42 here, but I still wanted to say thank you very much for the code @papoteur-mga
Also, with this snippet, I can use almost the same config file as I do with #27

diff --git a/nerd-dictation b/nerd-dictation
index a3cbad8..7f23be6 100755
--- a/nerd-dictation
+++ b/nerd-dictation
@@ -613,6 +613,40 @@ def text_from_vosk_pipe(
     # Track this to prevent excessive load when the "partial" result doesn't change.
     json_text_partial_prev = ""
 
+    def windows_parse_key(key):
+        from pynput.keyboard import Key
+        if (key == "alt"):
+            return Key.alt
+        if (key == "control"):
+            return Key.ctrl
+        if key == "Backspace":
+            return Key.backspace
+        if key == "Tab":
+            return Key.tab
+        if key == "enter":
+            return Key.enter
+        return None
+
+    def windows_command_hack(command, kbd):
+        from pynput.keyboard import Key
+        args = command.split('+')
+        leader_key = windows_parse_key(args[0])
+        if len(args) > 1 and leader_key != None:
+            with kbd.pressed(leader_key):
+                follower_key = windows_parse_key(args[1]);
+                if follower_key == None:
+                    kbd.type(args[1])
+                else:
+                    kbd.press(follower_key)
+                    kbd.release(follower_key)
+            return
+        else:
+            if leader_key != None:
+                kbd.press(leader_key)
+                kbd.release(leader_key)
+                return
+        kbd.type(command)
+
     def handle_fn_wrapper(text: str, is_partial_arg: bool) -> None:
         nonlocal handled_any
         # Simple deferred text input, just accumulate values in a list (finish entering text on exit).
@@ -653,8 +687,10 @@ def text_from_vosk_pipe(
 
         if not is_partial_arg:
             if macro is not None:
+                from pynput.keyboard import Controller  # type: ignore
+                kbd = Controller();
                 for command in macro:
-                    subprocess.check_output(command).decode("utf-8")
+                    windows_command_hack(command, kbd)
             if progressive_continuous:
                 text_prev = ""
             else:

so I can work from my windows work machine, and my home linux machine! and only one config file. sweeeeet.

@ideasman42 ideasman42 force-pushed the master branch 8 times, most recently from 569fdae to 4a922ad Compare January 9, 2022 09:28
@ideasman42 ideasman42 force-pushed the master branch 5 times, most recently from 37a2db6 to d56e61e Compare May 16, 2022 12:50
@ideasman42 ideasman42 force-pushed the master branch 3 times, most recently from f72c0d1 to 5f3b3e9 Compare May 16, 2022 23:45
@ideasman42 ideasman42 force-pushed the master branch 3 times, most recently from 2436242 to 72eef57 Compare November 3, 2022 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants