When working with voice-driven applications, a robust and flexible architecture for handling both speech recognition and synthesis is vital. The Stark framework provides these features via interfaces (protocols) that can be easily extended and customized. This page dives deeper into the Stark framework's speech interface protocols and provides details on their implementation.
```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SpeechRecognizerDelegate(Protocol):
    async def speech_recognizer_did_receive_final_result(self, result: str): pass
    async def speech_recognizer_did_receive_partial_result(self, result: str): pass
    async def speech_recognizer_did_receive_empty_result(self): pass

@runtime_checkable
class SpeechRecognizer(Protocol):
    is_recognizing: bool
    delegate: SpeechRecognizerDelegate | None

    async def start_listening(self): pass
    def stop_listening(self): pass
```
The `SpeechRecognizerDelegate` protocol provides callback methods for reporting the results of the various states of speech recognition:

- `speech_recognizer_did_receive_final_result`: Triggered when a final transcript is available.
- `speech_recognizer_did_receive_partial_result`: Fired upon receiving an interim transcript.
- `speech_recognizer_did_receive_empty_result`: Called when no speech was detected.

The `SpeechRecognizer` protocol defines the primary input interface for any speech recognition implementation. It consists of:

- `is_recognizing`: A flag indicating whether the recognizer is currently active.
- `delegate`: An instance responsible for handling the recognition results.
- `start_listening`: A method to initiate the listening process.
- `stop_listening`: A method to halt the listening process.
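To make this concrete, here is a minimal sketch of a conforming recognizer. The class and its `push_transcript` helper are hypothetical, standing in for a real engine that would read from a microphone:

```python
import asyncio

from stark.interfaces.protocols import SpeechRecognizerDelegate

class QueueSpeechRecognizer:
    """Hypothetical recognizer: forwards transcripts pushed onto an asyncio.Queue."""

    def __init__(self):
        self.is_recognizing = False
        self.delegate: SpeechRecognizerDelegate | None = None
        self._transcripts: asyncio.Queue[str] = asyncio.Queue()

    def push_transcript(self, text: str):
        # Hypothetical entry point that a real speech engine would call
        self._transcripts.put_nowait(text)

    async def start_listening(self):
        self.is_recognizing = True
        while self.is_recognizing:
            text = await self._transcripts.get()
            if not self.delegate:
                continue
            if text:
                await self.delegate.speech_recognizer_did_receive_final_result(text)
            else:
                await self.delegate.speech_recognizer_did_receive_empty_result()

    def stop_listening(self):
        self.is_recognizing = False
```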
To illustrate a custom implementation, we can reference the `VoskSpeechRecognizer`. This implementation leverages the Vosk offline speech recognition library. It downloads and initializes the Vosk model, sets up an audio queue, and provides methods to start and stop the recognition process.
For a deeper understanding, review the source code of the `VoskSpeechRecognizer` implementation.
```python
@runtime_checkable
class SpeechSynthesizerResult(Protocol):
    async def play(self): pass

@runtime_checkable
class SpeechSynthesizer(Protocol):
    async def synthesize(self, text: str) -> SpeechSynthesizerResult: pass
```
`SpeechSynthesizerResult`: This protocol defines a structure for the output of the speech synthesis process. It provides a method, `play`, to audibly present the synthesized speech.

`SpeechSynthesizer`: This protocol represents the primary interface for any speech synthesis implementation. It contains:

- `synthesize`: An asynchronous method that takes text input and returns a `SpeechSynthesizerResult` instance.

For a hands-on example, the `SileroSpeechSynthesizer` and `GCloudSpeechSynthesizer` classes illustrate how one might implement the synthesizer protocol using the Silero models and Google Cloud Text-to-Speech services, respectively.

To gain more insights, you can check the source code of the `SileroSpeechSynthesizer` implementation.
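As another sketch, the pair below wraps the third-party `pyttsx3` engine (an assumption — any TTS library would do). Since `pyttsx3` is blocking, playback is pushed to a worker thread via `asyncer.asyncify`:

```python
import asyncer
import pyttsx3  # third-party offline TTS, used here as an assumption

class Pyttsx3Result:
    def __init__(self, engine, text: str):
        self._engine = engine
        self._text = text

    async def play(self):
        def speak():  # pyttsx3 is blocking, so keep it off the event loop
            self._engine.say(self._text)
            self._engine.runAndWait()
        await asyncer.asyncify(speak)()

class Pyttsx3SpeechSynthesizer:
    def __init__(self):
        self._engine = pyttsx3.init()

    async def synthesize(self, text: str) -> Pyttsx3Result:
        return Pyttsx3Result(self._engine, text)
```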
One approach is to leverage the terminal or command line of a computer as the interface for both speech recognition and synthesis. Instead of speaking into a microphone and receiving audio feedback:

- **Recognition**: Users type their queries or commands into the terminal. The system then processes these textual inputs as if they were transcribed from spoken words.
- **Synthesis**: Instead of "speaking" or playing a synthesized voice, the system displays the response as text in the terminal. This creates a chat-like experience directly within the terminal.

This is an excellent method for debugging, quick testing, or for environments where audio interfaces aren't feasible.
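A minimal sketch of such a pair (the class names are our own, not part of Stark); the blocking `input` call is moved to a worker thread so it doesn't stall the event loop:

```python
import asyncer

from stark.interfaces.protocols import SpeechRecognizerDelegate

class TerminalSpeechRecognizer:
    def __init__(self):
        self.is_recognizing = False
        self.delegate: SpeechRecognizerDelegate | None = None

    async def start_listening(self):
        self.is_recognizing = True
        while self.is_recognizing:
            text = await asyncer.asyncify(input)('> ')  # blocking read in a worker thread
            if self.delegate and text.strip():
                await self.delegate.speech_recognizer_did_receive_final_result(text)

    def stop_listening(self):
        self.is_recognizing = False

class TerminalResult:
    def __init__(self, text: str):
        self.text = text

    async def play(self):
        print(self.text)  # "speaking" is just printing

class TerminalSpeechSynthesizer:
    async def synthesize(self, text: str) -> TerminalResult:
        return TerminalResult(text)
```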
The GUI (Graphical User Interface) provides an intuitive and interactive way to implement custom speech interfaces for voice assistants. It offers a multifaceted experience, allowing you to:

- **Text Outputs**: Display text-based responses, enabling clear communication with users through written messages.
- **Context Visualization**: Visualize context and relevant information using graphics, charts, or interactive elements to enhance user understanding.
- **Text and Speech Input**: Accept input through both text and speech, allowing users to interact in the manner most convenient for them.
- **Trigger with Buttons**: Incorporate buttons or interactive elements that users can click or tap to initiate voice assistant interactions, providing a user-friendly interface.

The GUI serves as a versatile canvas for crafting engaging voice assistant experiences, making it an excellent choice for applications where graphical interaction enhances user engagement and comprehension.
Telegram, a popular messaging platform, provides a powerful bot API that developers can use to create custom bots. By leveraging this API, you can emulate speech interfaces in two distinct ways.

**Using voice messages:**

- **Recognition**: Users send voice messages to the Telegram bot. These voice messages can be transcribed into text using a speech recognition system. The recognized text can then be processed further by the bot for commands or queries.
- **Synthesis**: Instead of sending back text responses, the bot can use a text-to-speech system to generate voice messages, which it then sends back to the users. This method provides a more authentic "voice assistant" experience within the messaging environment.

By utilizing voice messages, you can create a more immersive experience for users, closely resembling interactions with traditional voice assistants.

**Using text messages:**

- **Recognition**: Users send text messages to the Telegram bot. The bot then treats these messages as if they were the transcribed text of spoken words.
- **Synthesis**: Rather than synthesizing spoken responses, the bot sends back text messages as its replies. Users read these messages as if they were listening to the synthesized voice of the system.

This approach offers a chat-like experience directly within the Telegram app, providing a seamless interaction that many users find intuitive.

In both methods, a Telegram bot lets developers introduce voice command functionality in messaging environments, reaching users on various devices and platforms.
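As a sketch of the text-message variant, the synthesizer below conforms to the protocol while delivering its "speech" through a caller-supplied `send_message` coroutine. How you obtain that coroutine depends on your bot library, and the names here are hypothetical:

```python
from typing import Awaitable, Callable

class TelegramTextResult:
    def __init__(self, send_message: Callable[[str], Awaitable[None]], text: str):
        self._send_message = send_message
        self._text = text

    async def play(self):
        await self._send_message(self._text)  # "playing" means sending a chat message

class TelegramTextSpeechSynthesizer:
    def __init__(self, send_message: Callable[[str], Awaitable[None]]):
        self._send_message = send_message

    async def synthesize(self, text: str) -> TelegramTextResult:
        return TelegramTextResult(self._send_message, text)
```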
Bear in mind that these are mere illustrations of potential implementations. The canvas of possibilities is vast, bounded only by your creativity.
STARK's flexibility and extensibility come from its ability to cater to various use cases and environments. An essential feature of the framework is the capacity to customize the run function. This allows developers to personalize the core functionality, integrate custom setups, or extend the capabilities of the framework.

Below is a quick guide on how to understand and make use of the custom run function.

The `run` function in STARK serves as the primary entry point that sets up and starts the voice assistant.
```python
import asyncer

from stark.interfaces.protocols import SpeechRecognizer, SpeechSynthesizer
from stark.core import CommandsContext, CommandsManager
from stark.voice_assistant import VoiceAssistant
from stark.general.blockage_detector import BlockageDetector


async def run(
    manager: CommandsManager,
    speech_recognizer: SpeechRecognizer,
    speech_synthesizer: SpeechSynthesizer
):
    async with asyncer.create_task_group() as main_task_group:
        context = CommandsContext(
            task_group = main_task_group,
            commands_manager = manager
        )
        voice_assistant = VoiceAssistant(
            speech_recognizer = speech_recognizer,
            speech_synthesizer = speech_synthesizer,
            commands_context = context
        )
        speech_recognizer.delegate = voice_assistant
        context.delegate = voice_assistant

        main_task_group.soonify(speech_recognizer.start_listening)()
        main_task_group.soonify(context.handle_responses)()

        detector = BlockageDetector()
        main_task_group.soonify(detector.monitor)()
```
Let's dissect it:
```python
async def run(
    manager: CommandsManager,
    speech_recognizer: SpeechRecognizer,
    speech_synthesizer: SpeechSynthesizer
):
```
Parameters:

- `manager`: An instance of `CommandsManager` which holds all the commands that the voice assistant can recognize and process.
- `speech_recognizer`: The implementation you've selected for speech recognition.
- `speech_synthesizer`: The implementation you've chosen for speech synthesis.
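Inside the body, a task group is opened first (repeated here from the full listing above):

```python
async with asyncer.create_task_group() as main_task_group:
```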
Here, a task group is created using `asyncer`. Task groups allow you to manage several tasks concurrently.
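The commands context comes next (again from the listing):

```python
context = CommandsContext(
    task_group = main_task_group,
    commands_manager = manager
)
```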
A `CommandsContext` is initialized. This holds the context in which commands are executed, including the associated task group and the commands manager.
```python
voice_assistant = VoiceAssistant(
    speech_recognizer = speech_recognizer,
    speech_synthesizer = speech_synthesizer,
    commands_context = context
)
```
The `VoiceAssistant` is then created and initialized with the recognizer, synthesizer, and context.
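Then the delegates are wired up:

```python
speech_recognizer.delegate = voice_assistant
context.delegate = voice_assistant
```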
Both the speech recognizer and the commands context are associated with the voice assistant as their delegates. This setup ensures that when the recognizer captures any speech or when there's a command response to handle, the voice assistant processes them.
```python
main_task_group.soonify(speech_recognizer.start_listening)()
main_task_group.soonify(context.handle_responses)()
```
Tasks are added to the main task group: one to start the speech recognizer's listening process, and the other to handle responses from executed commands.
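Finally, the last two lines of the listing set up the blockage detector:

```python
detector = BlockageDetector()
main_task_group.soonify(detector.monitor)()
```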
A blockage detector is introduced and initialized. This mechanism ensures that any potential deadlocks or blocking calls within the async code are detected, allowing for smooth operation.
Customizing the `run` function provides a pathway to inject additional functionality or to adapt the framework to specific needs.

For instance, you could start extra concurrent tasks (such as an external-trigger listener), swap in different speech interface implementations, or add your own monitoring alongside the blockage detector.

When customizing, ensure that you maintain the core structure, especially the initialization of the main components and the task group management. The ordering can be crucial, especially when setting delegates.
To kickstart your customization, replicate the default run function as your foundation, and weave in your specific adjustments or additions as needed. For example, a "Hello, World" implementation with a custom run would appear as:
```python
import asyncer
from stark import CommandsContext, CommandsManager, Response
from stark.interfaces.protocols import SpeechRecognizer, SpeechSynthesizer
from stark.interfaces.vosk import VoskSpeechRecognizer
from stark.interfaces.silero import SileroSpeechSynthesizer
from stark.voice_assistant import VoiceAssistant
from stark.general.blockage_detector import BlockageDetector


VOSK_MODEL_URL = "YOUR_CHOSEN_VOSK_MODEL_URL"
SILERO_MODEL_URL = "YOUR_CHOSEN_SILERO_MODEL_URL"

recognizer = VoskSpeechRecognizer(model_url=VOSK_MODEL_URL)
synthesizer = SileroSpeechSynthesizer(model_url=SILERO_MODEL_URL)

manager = CommandsManager()

@manager.new('hello')
async def hello_command() -> Response:
    text = voice = 'Hello, world!'
    return Response(text=text, voice=voice)

async def run(
    manager: CommandsManager,
    speech_recognizer: SpeechRecognizer,
    speech_synthesizer: SpeechSynthesizer
):
    async with asyncer.create_task_group() as main_task_group:
        context = CommandsContext(
            task_group = main_task_group,
            commands_manager = manager
        )
        voice_assistant = VoiceAssistant(
            speech_recognizer = speech_recognizer,
            speech_synthesizer = speech_synthesizer,
            commands_context = context
        )
        speech_recognizer.delegate = voice_assistant
        context.delegate = voice_assistant

        main_task_group.soonify(speech_recognizer.start_listening)()
        main_task_group.soonify(context.handle_responses)()

        detector = BlockageDetector()
        main_task_group.soonify(detector.monitor)()

async def main():
    await run(manager, recognizer, synthesizer)

if __name__ == '__main__':
    asyncer.runnify(main)()  # or anyio.run(main), same thing
```
Thanks to Stark's adaptability, the voice assistant can be integrated with various external triggers to provide a flexible and dynamic user experience. In the STARK framework, integrating external triggers is seamless and can greatly enhance the interactivity of the assistant.

In this guide, we will walk through how to set up and use external triggers to activate the STARK Voice Assistant.

The STARK framework provides a dedicated mode for external triggers: the "external" mode. When you set the VA mode to "external", it waits for an explicit trigger to activate the `SpeechRecognizer` component.

Additionally, you can utilize the `stop_after_interaction` property in custom modes: when set to `True`, it ensures that after the VA finishes its current interaction, it stops the `SpeechRecognizer`, allowing the next interaction to be initiated by an external trigger.

Details are on the Voice Assistant page.
Once the VA has stopped listening after an interaction, you can restart the `SpeechRecognizer` using the `start_listening()` method. This method serves as the entry point when you want to reactivate voice recognition after an external trigger.

Note that you will probably need to implement a custom run function to add a concurrent process or create a separate thread.

The beauty of external triggers lies in their versatility. Here are some ways to integrate them.

A simple approach is to use a specific keyboard combination to activate Stark. Tools like Python's `keyboard` library can help detect specific keypresses, enabling you to then call `start_listening()`.
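A minimal sketch, assuming the `keyboard` package is installed and `recognizer` is your `SpeechRecognizer` instance; the hotkey combination is our own choice. Since `keyboard` fires callbacks on a background thread, the coroutine must be handed back to the event loop:

```python
import asyncio

import keyboard  # third-party: pip install keyboard

async def install_hotkey(recognizer):
    loop = asyncio.get_running_loop()

    def on_hotkey():
        # Called from keyboard's thread, so schedule onto the loop thread-safely
        asyncio.run_coroutine_threadsafe(recognizer.start_listening(), loop)

    keyboard.add_hotkey('ctrl+alt+s', on_hotkey)
```

You could run `install_hotkey` as an additional task inside a custom run function.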
For those looking for a hands-free approach, integrating hardware can be a fascinating option. For instance, using an Arduino microphone module, you can set up a system where Stark activates upon a distinct sound pattern, like a double or triple clap.
Wake word detection is a popular approach in modern VAs. Using fast, lightweight wake word detectors like Picovoice's Porcupine, you can have your VA spring into action upon hearing a specific keyword or phrase.
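A sketch using Picovoice's `pvporcupine` with its companion `pvrecorder` package — both are assumptions, so check their current APIs, and an access key is required. The blocking detection loop runs in a worker thread and wakes the recognizer whenever the keyword fires:

```python
import asyncio

import pvporcupine  # assumption: Picovoice wake word engine
from pvrecorder import PvRecorder  # assumption: Picovoice audio recorder

async def wake_word_trigger(recognizer, access_key: str):
    porcupine = pvporcupine.create(access_key=access_key, keywords=['porcupine'])
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    recorder.start()

    def wait_for_wake_word():
        # Blocks until Porcupine reports the keyword in an audio frame
        while porcupine.process(recorder.read()) < 0:
            pass

    while True:
        await asyncio.to_thread(wait_for_wake_word)  # keep the event loop free
        await recognizer.start_listening()  # returns once the VA stops listening
```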
By embracing external triggers, you can elevate the adaptability and user experience of your voice assistant. Whether it's a simple keyboard shortcut or an intricate hardware setup, STARK's flexibility ensures that your VA is always ready and responsive, aligned with the needs of your user base.
In the dynamic world of voice assistants and speech recognition, it's essential to account for the unpredictability of user input. Despite the comprehensive list of commands you may have configured, there will inevitably be instances where user utterances don't align with any predefined command. This is where the fallback command comes in.
The fallback command in the STARK framework serves as a safety net, ensuring that when a user's voice input doesn't match any set command, there's still an appropriate and meaningful response.

In the STARK framework, integrating a fallback command is streamlined: you assign the `fallback_command` to the `CommandsContext` directly. Here's a practical example:
```python
from stark.core.types import String
...

@manager.new('$string:String', hidden=True)
async def fallback(string: String):
    # Your fallback logic here
    ...

commands_context.fallback_command = fallback
```
In this example, any unrecognized string is directed to the `fallback` function, allowing you to define how the system should respond.
With the rise of advanced language models like ChatGPT, it's now feasible to provide intelligent and contextually relevant responses even for unexpected user inputs. Integrating an LLM can elevate the user experience, making your voice assistant appear more intuitive and responsive.
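A sketch of such a fallback, assuming the official `openai` Python client (1.x) with an `OPENAI_API_KEY` in the environment; the model name is a placeholder, and accessing the recognized text via `string.value` is an assumption about Stark's `String` type:

```python
from openai import AsyncOpenAI
from stark import Response
from stark.core.types import String

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

@manager.new('$string:String', hidden=True)
async def fallback(string: String) -> Response:
    completion = await client.chat.completions.create(
        model='gpt-4o-mini',  # placeholder model name
        messages=[{'role': 'user', 'content': string.value}],  # assumption: .value
    )
    answer = completion.choices[0].message.content
    return Response(text=answer, voice=answer)
```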
Fallbacks aren't limited to LLMs. You can get creative with your approach: for example, ask the user a clarifying question, run a web search, or simply log unrecognized phrases for later review.

Fallback commands are invaluable, ensuring your voice assistant remains responsive, intelligent, and user-friendly, even in the face of unexpected inputs. With the flexibility of STARK and the power of modern Large Language Models, creating a robust voice assistant has never been easier.
Coming soon...

In the meantime, if you're eager to contribute and expedite the growth of STARK, consider checking out the STARK PLACE. By contributing to the library, you can play a pivotal role in implementing features more rapidly and enhancing the framework for everyone.

🔗 Interested in contributing? Dive into our Contribution and Shared Usage Guidelines for all the details!
When it comes to Stark, or any software platform, optimization is pivotal to ensuring smooth and efficient operation. Here are some key guidelines and best practices to ensure that Stark runs at its best:
THE MOST IMPORTANT: Always ensure that you DO NOT place blocking code inside `async def` functions. Blocking code can drastically reduce the performance of asynchronous applications by halting the execution of other parts of the application.
If you have commands that run blocking code, always define them using a simple `def` (see Sync-vs-Async). This ensures that Stark creates a separate worker thread to handle the execution of that command. By doing so, Stark remains responsive, even when processing resource-intensive commands.
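For illustration, here is a blocking command defined with plain `def` next to a non-blocking `async def` one; the command names and bodies are our own:

```python
import time

from stark import Response

@manager.new('backup')
def backup_files() -> Response:  # sync: Stark runs it in a worker thread
    time.sleep(5)  # stands in for slow, blocking work
    return Response(text='Backup finished', voice='backup finished')

@manager.new('ping')
async def ping() -> Response:  # async: must never block the event loop
    return Response(text='Pong', voice='pong')
```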
Understanding the difference between synchronous and asynchronous code is crucial. Asynchronous code allows your application to perform other tasks while waiting for a particular task to complete, thus improving efficiency. The Sync-vs-Async page provides a comprehensive comparison and guidance on how to effectively leverage both.

The asyncer documentation is a valuable resource. It provides an array of tools and methods to help convert synchronous code to asynchronous and vice versa, aiding in the optimization process.
If you need to call blocking synchronous code within an `async def` function, utilize `asyncer.asyncify`. It allows you to run synchronous code inside an asynchronous function without blocking the entire event loop.
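A minimal sketch; `fetch_report` is a hypothetical blocking function:

```python
import time

import asyncer

def fetch_report(day: str) -> str:
    time.sleep(2)  # stands in for a blocking HTTP call or disk read
    return f'report for {day}'

async def show_report():
    # Runs the blocking function in a worker thread and awaits the result
    report = await asyncer.asyncify(fetch_report)('monday')
    print(report)
```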
If you have multiple asynchronous tasks that can be executed concurrently, group them together and await them as one unit. This approach allows tasks to run simultaneously, improving overall speed.

```python
async def task_one():
    ...

async def task_two():
    ...

# anyio
import anyio
async with anyio.create_task_group() as task_group:
    task_group.start_soon(task_one)
    task_group.start_soon(task_two)
# or asyncer
import asyncer
async with asyncer.create_task_group() as task_group:
    task_group.soonify(task_one)()
    task_group.soonify(task_two)()
# or asyncio
import asyncio
await asyncio.gather(task_one(), task_two())
```
Caching is the practice of storing frequently used data or results for quicker access in the future. By implementing caching, you can significantly reduce repetitive computations and database lookups, leading to faster response times. Python libraries like `cachetools` or `functools.lru_cache` are popular tools for caching.
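For example, with `functools.lru_cache` (the function below is hypothetical):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def normalize_city_name(raw: str) -> str:
    # The expensive work runs once per distinct input; repeats hit the cache
    return raw.strip().title()

normalize_city_name('  new york ')  # computed
normalize_city_name('  new york ')  # served from the cache
```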
Optimization is a continuous process. As Stark grows and evolves, always look out for opportunities to refine and streamline its operations. Remember, the key is to ensure Stark remains responsive and efficient, offering users a seamless voice assistant experience.
Congratulations on navigating through the entirety of the STARK documentation! We've aimed to cover a comprehensive range of topics and scenarios to help you get the most out of the framework. However, the vast expanse of technology means there might always be nuances or specific use cases we haven't touched upon.

If you've scoured the documentation and still haven't found the precise information you're looking for, consider the following resources:
- **Source Code**: Often, the code itself and its tests can be the best documentation. Delve into the inner workings and intricacies of the STARK framework by perusing the source code.
- **Issues & Discussions**: Engage with the community and the developers. The Issues and Discussions sections can offer insights into known challenges, proposed enhancements, and community-contributed solutions.
- **STARK PLACE Repository**: Apart from the main repository, the STARK PLACE repo houses a plethora of shared modules, extensions, and utilities. It's an excellent place to find (or contribute) additional tools or modules that might be relevant to your needs.
STARK is designed with flexibility at its core. If you come across a situation where the existing methods don't align perfectly with your requirements, remember that the framework is meant to be customized and extended.

Should you choose to delve into the source, customize components, or even contribute to the repository, we're excited to have you on board, pushing the boundaries of what STARK can achieve. Here's to building, innovating, and advancing together!