Merge pull request #4 from IlgarLunin/update-5.2

IlgarLunin · Jan 22, 2023 · 7f883c4
2 parents fbe627b + 2ce7fd4

Showing 12 changed files with 142 additions and 56 deletions.
diff --git a/README.md b/README.md
@@ -6,7 +6,74 @@ This is UE4 wrapper for Google's [Cloud Text-to-Speech](https://cloud.google.com
 
 Plugin was battle tested in several commercial simulator projects. It is small, lean and simple to use.
 
-# Preparation
+# Table of contents
+- [**UE4 Google Speech Kit**](#ue4-google-speech-kit)
+- [Table of contents](#table-of-contents)
+- [Engine preparation](#engine-preparation)
+- [Cloud preparation](#cloud-preparation)
+- [Speech synthesis](#speech-synthesis)
+- [Speech recognition](#speech-recognition)
+  - [Grant permissions](#grant-permissions)
+    - [Windows](#windows)
+    - [Mac](#mac)
+    - [Android](#android)
+  - [Voice capture and speech recognition](#voice-capture-and-speech-recognition)
+- [Utilities](#utilities)
+  - [Percentage based string comparison (Fuzzy matching)](#percentage-based-string-comparison-fuzzy-matching)
+  - [Listing available capture devices](#listing-available-capture-devices)
+- [Supported platforms](#supported-platforms)
+- [Migration guide](#migration-guide)
+  - [Version 3.0](#version-30)
+- [Links](#links)
+
+# Engine preparation
+
+To make the microphone work, add the following lines to the project's `DefaultEngine.ini`.
+```
+[Voice]
+bEnabled=true
+```
+
+To avoid losing the pauses between words, check the silence detection threshold `voice.SilenceDetectionThreshold`; a value of `0.01` works well.
+This setting also goes in `DefaultEngine.ini`.
+
+```
+[SystemSettings]
+voice.SilenceDetectionThreshold=0.01
+```
+Starting with engine version 4.25, also add
+```
+voice.MicNoiseGateThreshold=0.01
+```
+
+Other voice-related variables worth experimenting with
+```bash
+voice.MicNoiseGateThreshold
+voice.MicInputGain
+voice.MicStereoBias
+voice.MicNoiseAttackTime
+voice.MicNoiseReleaseTime
+voice.MicStereoBias
+voice.SilenceDetectionAttackTime
+voice.SilenceDetectionReleaseTime
+```
+
+To find the available settings, type `voice.` in the editor console and the autocompletion widget will pop up.
+
+![](pics/voicesettings.png)
+
+Console variables can be modified at runtime like this
+
+![](pics/silencenode.png)
+
+To debug your microphone input, you can convert the output sound buffer to
+an Unreal sound wave and play it.
+
+![](pics/buffertosound.png)
+
+The values above may differ depending on the actual microphone characteristics.
+
+# Cloud preparation
 1) Go to [google cloud](https://console.cloud.google.com) and create a payment account.
 2) Enable [Cloud Speech-to-Text API](https://console.cloud.google.com/apis/library/speech.googleapis.com) and [Cloud Text-to-Speech API](https://console.cloud.google.com/apis/library/texttospeech.googleapis.com).
 3) Create credentials to access your enabled APIs. See instructions [here](https://cloud.google.com/docs/authentication).
@@ -34,7 +101,7 @@ sound wave object which can be played by engine.
 
 ![](pics/googletts.png)
 
-## Bonus!
+<!-- ## Bonus!
 
 Output raw samples can be used with Oculus OVR LipSync at runtime.
 
@@ -44,82 +111,106 @@ Get node [here](https://github.com/IlgarLunin/UE4OVRLipSyncCookFrameSequence).
 
 Demo:
 
-[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/B78aQly2wrI/0.jpg)](https://www.youtube.com/watch?v=B78aQly2wrI)
+[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/B78aQly2wrI/0.jpg)](https://www.youtube.com/watch?v=B78aQly2wrI) -->
 
 # Speech recognition
 
-Consists of two parts. First, we need to record voice from microphone. To do that, use provided **MicrophoneCapture**
-actor component as shown below. Next, construct recognition parameters and pass them to **Google STT** async node.
+Speech recognition consists of two parts: capturing voice and sending the request. There are two ways to capture voice, depending on your needs.
+
+## Grant permissions
+
+### Windows
+No action needed.
+### Mac
+1. In Xcode, select your project
+1. Go to the `Info` tab
+1. Expand the `Custom macOS Application Target Properties` section
+1. Hit `+` and add the `Privacy - Microphone Usage Description` string key; set any value you want, for example "GoogleSpeechKitMicAccess"
+![](pics/microphone_access_xcode.png)
+### Android
+Request these permissions somewhere on Begin Play
+1. Give [microphone access](https://blueprintue.com/blueprint/v-3i68vw/) (**android.permission.RECORD_AUDIO**)
+ ![](pics/mic_access_android.png)
+1. Give [disk read access](https://blueprintue.com/blueprint/myo1kxkf/) (**android.permission.READ_EXTERNAL_STORAGE**)
+ ![](pics/disk_read_access_android.png)
+1. Give [disk write access](https://blueprintue.com/blueprint/32f-40w8/)  (**android.permission.WRITE_EXTERNAL_STORAGE**)
+ ![](pics/disk_write_access_android.png)
+
+## Voice capture and speech recognition
+
+<!-- WINDOWS -->
+<details>
+  <summary>Windows only method (deprecated)</summary>
+
+
+Use the provided **MicrophoneCapture** actor component as shown below. Next, construct the recognition parameters and pass them to the **Google STT** async node.
 
 ![](pics/googlestt.png)
 
-Note that you can pass microphone name to microphone capture component. To get list of available microphones, use following setup
+</details>
 
-![](pics/enumerate_microphones.png)
+---
 
-Another way to perform recognition is to use **Google STT Variants** node. Which, instead of returning result with highest confidence, returns an array of variants.
+<!-- MAC -->
+<details>
+  <summary>Cross platform method (use this instead)</summary>   
 
-![](pics/googlesttvariants.png)
+1. Create a SoundMix (referred to as the submix in the steps below).
+    1. Right-click in the content browser - `Sounds > Mix > Sound Soundmix`
+    2. Open it and set its output value to -96.0
+    ![](pics/sound_mix.png)
 
-Probably you will need to send voice commands to you app, to increase recognition chances use `CompareStrings` node. Below call will return 0.666 value,
-so we can treat those strings equal since they are simmilar on 66%.
+2. Create a sound class
+    1. Right-click in the content browser - `Sounds > Classes > Sound Class`
+    2. Open it and set the submix created in the previous step as the sound class default submix
 
-![](pics/compare.png)
+3. Make sure the Audio Capture plugin is enabled
+    ![](pics/audio_capture_plugin.png)
+4. Go to your actor and add an AudioCapture component in the Components tab
+5. Disable the "Auto Activate" option on AudioCapture
+6. Assign our sound class to AudioCapture
+    ![](pics/audio_capture_sound_class.png)
 
-# Important steps
+7. Now we can place some nodes. To start and stop recording, use the `Activate` and `Deactivate` nodes with the previously added AudioCapture component as the target. When audio capture is activated, we can start recording output to our submix
+8. When audio capture is deactivated, we finish recording output to `Wav File`. **This is important!** Give your wav file a name (e.g. "stt_sample"); `Path` can be absolute or relative (to the /Saved/BouncedWavFiles folder)
+![](pics/start_stop_recording_set_submix.png)
+9. Then, after a small delay, we can read the saved file back as byte samples, ready to be fed to the `Google STT` node. The delay is needed because the "Finish Recording Output" node writes the sound to disk and the file write takes some time; if we proceed immediately, the ReadWaveFile node will fail
+![](pics/read_back.png)
 
-To make microphone work, you need to add following lines to `DefaultEngine.ini` of the project.
-```
-[Voice]
-bEnabled=true
-```
+Here is the whole setup
 
-To not loose pauses in between words, you probably want to check silence detection treshold `voice.SilenceDetectionThreshold`, value `0.01` is good.
-This also goes to `DefaultEngine.ini`.
+![](pics/setup_all.png)
 
-```
-[SystemSettings]
-voice.SilenceDetectionThreshold=0.01
-```
-Starting from Engine version 4.25 also put
-```
-voice.MicNoiseGateThreshold=0.01
-```
+</details>
 
-Another voice related variables worth playing with
-```bash
-voice.MicNoiseGateThreshold
-voice.MicInputGain
-voice.MicStereoBias
-voice.MicNoiseAttackTime
-voice.MicNoiseReleaseTime
-voice.MicStereoBias
-voice.SilenceDetectionAttackTime
-voice.SilenceDetectionReleaseTime
-```
+---
 
-To find available settings type `voice.` in editor console, and autocompletion widget will pop up.
 
-![](pics/voicesettings.png)
+There is another STT node, **Google STT Variants**, which returns an array of variants instead of only the result with the highest confidence.
 
-Console variables can be modified in runtime like this
+![](pics/googlesttvariants.png)
 
-![](pics/silencenode.png)
+# Utilities
+## Percentage based string comparison (Fuzzy matching)
 
-To debug your microphone input you can convert output sound buffer to
-unreal sound wave and play it.
+You will probably need to process recognised voice in your app; to increase recognition chances, use the `CompareStrings` node. The call below returns 0.666,
+so we can treat those strings as equal since they are 66% similar. It uses the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) algorithm.
 
-![](pics/buffertosound.png)
+![](pics/compare.png)
 
-Above values may differ depending on actual microphone characteristics.
+## Listing available capture devices
+
+You can pass a microphone name to the microphone capture component. To get the list of available microphones, use the following setup
+
+![](pics/enumerate_microphones.png)
 
-# Platforms supported
+# Supported platforms
 
-**Windows** and **Mac**.
+**Windows**, **Mac** and **Android**.
 
 # Migration guide
-<details>
-<summary>Version 3.0</summary>
+
+## Version 3.0
 
 `EGoogleTTSLanguage` was removed. You need to pass [voice name](https://cloud.google.com/text-to-speech/docs/voices) as string (**Voice name** column).
 
@@ -133,12 +224,7 @@ Above values may differ depending on actual microphone characteristics.
 
 The reason for this is that the number of languages has exceeded 256, and we can't put this amount into 8 bit enums (This is Unreal's limitation)
 
-
-
-</details>
-
 # Links
-Find out more in documentation for corresponding sections.
 * [Supported TTS voices](https://cloud.google.com/text-to-speech/docs/voices) ([WaveNet](https://en.wikipedia.org/wiki/WaveNet) are the best)
 * [Speech synthesis config](https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig)
 * [Supported STT languages](https://cloud.google.com/speech-to-text/docs/languages)

diff --git a/pics/audio_capture_plugin.png b/pics/audio_capture_plugin.png
diff --git a/pics/audio_capture_sound_class.png b/pics/audio_capture_sound_class.png
diff --git a/pics/disk_read_access_android.png b/pics/disk_read_access_android.png
diff --git a/pics/disk_write_access_android.png b/pics/disk_write_access_android.png
diff --git a/pics/mic_access_android.png b/pics/mic_access_android.png
diff --git a/pics/microphone_access_xcode.png b/pics/microphone_access_xcode.png
diff --git a/pics/read_back.png b/pics/read_back.png
diff --git a/pics/setup_all.png b/pics/setup_all.png
diff --git a/pics/sound_class.png b/pics/sound_class.png
diff --git a/pics/sound_mix.png b/pics/sound_mix.png
diff --git a/pics/start_stop_recording_set_submix.png b/pics/start_stop_recording_set_submix.png