Merge pull request #4 from IlgarLunin/update-5.2
IlgarLunin authored Jan 22, 2023
2 parents fbe627b + 2ce7fd4 commit 7f883c4
Showing 12 changed files with 142 additions and 56 deletions.
198 changes: 142 additions & 56 deletions README.md
@@ -6,7 +6,74 @@ This is UE4 wrapper for Google's [Cloud Text-to-Speech](https://cloud.google.com

The plugin was battle-tested in several commercial simulator projects. It is small, lean, and simple to use.

# Preparation
# Table of contents
- [**UE4 Google Speech Kit**](#ue4-google-speech-kit)
- [Table of contents](#table-of-contents)
- [Engine preparation](#engine-preparation)
- [Cloud preparation](#cloud-preparation)
- [Speech synthesis](#speech-synthesis)
- [Speech recognition](#speech-recognition)
  - [Grant permissions](#grant-permissions)
    - [Windows](#windows)
    - [Mac](#mac)
    - [Android](#android)
  - [Voice capture and speech recognition](#voice-capture-and-speech-recognition)
- [Utilities](#utilities)
  - [Percentage based string comparison (Fuzzy matching)](#percentage-based-string-comparison-fuzzy-matching)
  - [Listing available capture devices](#listing-available-capture-devices)
- [Supported platforms](#supported-platforms)
- [Migration guide](#migration-guide)
  - [Version 3.0](#version-30)
- [Links](#links)

# Engine preparation

To make the microphone work, add the following lines to the project's `DefaultEngine.ini`.
```
[Voice]
bEnabled=true
```

To avoid losing the pauses between words, you probably want to check the silence detection threshold `voice.SilenceDetectionThreshold`; a value of `0.01` works well.
This also goes into `DefaultEngine.ini`.

```
[SystemSettings]
voice.SilenceDetectionThreshold=0.01
```
Starting from engine version 4.25, also add
```
voice.MicNoiseGateThreshold=0.01
```
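
As a quick sanity check outside the engine, the settings above can be verified with a small script. This is only a sketch in plain Python: the helper name `missing_voice_settings` is hypothetical, placing `voice.MicNoiseGateThreshold` under `[SystemSettings]` is an assumption, and `configparser` handles only reasonably simple ini files, so a large real-world `DefaultEngine.ini` may need a more tolerant parser.

```python
import configparser

# Settings named in this README. Putting MicNoiseGateThreshold under
# [SystemSettings] is an assumption, not something the README states.
REQUIRED = {
    "Voice": ["bEnabled"],
    "SystemSettings": [
        "voice.SilenceDetectionThreshold",
        "voice.MicNoiseGateThreshold",
    ],
}


def missing_voice_settings(ini_text):
    """Return the (section, key) pairs from REQUIRED that ini_text lacks."""
    parser = configparser.ConfigParser(
        strict=False,         # tolerate repeated sections and keys
        allow_no_value=True,  # tolerate keys written without a value
        interpolation=None,   # leave '%' characters untouched
    )
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(ini_text)

    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            if not parser.has_option(section, key):
                missing.append((section, key))
    return missing


sample = """
[Voice]
bEnabled=true

[SystemSettings]
voice.SilenceDetectionThreshold=0.01
"""

# Lists whatever is still missing from the sample text.
print(missing_voice_settings(sample))
```

To check a real project, read the contents of its `Config/DefaultEngine.ini` into a string and pass that string to the function.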

Other voice-related variables worth experimenting with:
```bash
voice.MicNoiseGateThreshold
voice.MicInputGain
voice.MicStereoBias
voice.MicNoiseAttackTime
voice.MicNoiseReleaseTime
voice.SilenceDetectionAttackTime
voice.SilenceDetectionReleaseTime
```

To find the available settings, type `voice.` in the editor console and the autocompletion widget will pop up.

![](pics/voicesettings.png)

Console variables can be modified at runtime like this:

![](pics/silencenode.png)

To debug your microphone input, you can convert the output sound buffer to
an Unreal sound wave and play it.

![](pics/buffertosound.png)

The values above may differ depending on the characteristics of the actual microphone.

# Cloud preparation
1) Go to [Google Cloud](https://console.cloud.google.com) and create a billing account.
2) Enable [Cloud Speech-to-Text API](https://console.cloud.google.com/apis/library/speech.googleapis.com) and [Cloud Text-to-Speech API](https://console.cloud.google.com/apis/library/texttospeech.googleapis.com).
3) Create credentials to access your enabled APIs. See instructions [here](https://cloud.google.com/docs/authentication).
@@ -34,7 +101,7 @@ sound wave object which can be played by engine.

![](pics/googletts.png)

## Bonus!
<!-- ## Bonus!
Output raw samples can be used with Oculus OVR LipSync at runtime.
@@ -44,82 +111,106 @@ Get node [here](https://github.com/IlgarLunin/UE4OVRLipSyncCookFrameSequence).
Demo:
[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/B78aQly2wrI/0.jpg)](https://www.youtube.com/watch?v=B78aQly2wrI)
[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/B78aQly2wrI/0.jpg)](https://www.youtube.com/watch?v=B78aQly2wrI) -->

# Speech recognition

Consists of two parts. First, we need to record voice from microphone. To do that, use provided **MicrophoneCapture**
actor component as shown below. Next, construct recognition parameters and pass them to **Google STT** async node.
Speech recognition consists of two parts: capturing the voice and sending the request. There are two ways to capture your voice, depending on your needs.

## Grant permissions

### Windows
No action needed.
### Mac
1. In Xcode, select your project
1. Go to the `Info` tab
1. Expand the `Custom macOS Application Target Properties` section
1. Hit `+` and add the `Privacy - Microphone Usage Description` string key. Set any value you want, for example "GoogleSpeechKitMicAccess"
![](pics/microphone_access_xcode.png)
### Android
Request the following permissions somewhere on Begin Play:
1. Give [microphone access](https://blueprintue.com/blueprint/v-3i68vw/) (**android.permission.RECORD_AUDIO**)
![](pics/mic_access_android.png)
1. Give [disk read access](https://blueprintue.com/blueprint/myo1kxkf/) (**android.permission.READ_EXTERNAL_STORAGE**)
![](pics/disk_read_access_android.png)
1. Give [disk write access](https://blueprintue.com/blueprint/32f-40w8/) (**android.permission.WRITE_EXTERNAL_STORAGE**)
![](pics/disk_write_access_android.png)

## Voice capture and speech recognition

<!-- WINDOWS -->
<details>
<summary>Windows only method (deprecated)</summary>


Use the provided **MicrophoneCapture** actor component as shown below. Next, construct the recognition parameters and pass them to the **Google STT** async node.

![](pics/googlestt.png)

Note that you can pass a microphone name to the microphone capture component. To get the list of available microphones, use the following setup
</details>

![](pics/enumerate_microphones.png)
---

Another way to perform recognition is to use the **Google STT Variants** node, which returns an array of variants instead of only the result with the highest confidence.
<!-- MAC -->
<details>
<summary>Cross platform method (use this instead)</summary>

![](pics/googlesttvariants.png)
1. Create a sound submix.
   1. Right-click in the content browser and choose `Sounds > Mix > Sound Submix`
   2. Open it and set its output volume to -96.0
   ![](pics/sound_mix.png)

You will probably need to send voice commands to your app. To increase the chance of a match, use the `CompareStrings` node. The call below returns 0.666,
so we can treat those strings as equal since they are about 66% similar.
2. Create a sound class
   1. Right-click in the content browser and choose `Sounds > Classes > Sound Class`
   2. Open it and set the submix created in the previous step as the sound class's default submix

![](pics/compare.png)
3. Make sure the Audio Capture plugin is enabled
   ![](pics/audio_capture_plugin.png)
4. Go to your actor and add an AudioCapture component in the Components tab
5. Disable the "Auto Activate" option on the AudioCapture component
6. Assign the sound class to the AudioCapture component
   ![](pics/audio_capture_sound_class.png)

# Important steps
7. Now we can drop some nodes. To start and stop recording, use the `Activate` and `Deactivate` nodes with the previously added AudioCapture component as the target. Once audio capture is activated, start recording output to the submix
8. When audio capture is deactivated, finish recording output to a `Wav File`. **This is important!** Give the wav file a name (e.g. "stt_sample"). `Path` can be absolute or relative (to the /Saved/BouncedWavFiles folder)
   ![](pics/start_stop_recording_set_submix.png)
9. Then, after a small delay, read the saved file back as byte samples, ready to be fed to the `Google STT` node. The delay is needed because the "Finish Recording Output" node writes the sound to disk, which takes some time; if we proceed immediately, the ReadWaveFile node will fail
   ![](pics/read_back.png)
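
The reason for the delay in the last step can be illustrated outside the engine. The sketch below is plain Python, not the plugin's or Unreal's API, and the helper names `wait_for_stable_file` and `read_wav_samples` are hypothetical. Instead of sleeping for a fixed time, it waits until the wav file exists and its size has stopped changing, then reads the samples back.

```python
import os
import time
import wave


def wait_for_stable_file(path, timeout=5.0, poll=0.1):
    """Block until `path` exists and its size stops changing.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            # Two consecutive polls with the same non-zero size are taken
            # as a sign that the writer has finished.
            if size > 0 and size == last_size:
                return
            last_size = size
        time.sleep(poll)
    raise TimeoutError(f"{path} was not fully written within {timeout} s")


def read_wav_samples(path):
    """Return (sample_rate, channels, raw PCM bytes) of a finished wav file."""
    wait_for_stable_file(path)
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
        return wav.getframerate(), wav.getnchannels(), frames
```

A fixed delay, as in the Blueprint setup, is simpler; polling like this trades a little code for not having to guess how long the write takes.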

Here is the whole setup

![](pics/setup_all.png)

</details>

---

There is another STT node, **Google STT Variants**, which returns an array of variants instead of only the result with the highest confidence.

![](pics/googlesttvariants.png)
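
To show what "variants" means at the API level, here is a minimal sketch in plain Python of a request body for the Cloud Speech-to-Text v1 `speech:recognize` method and of reading the alternatives from a response. The helper names are hypothetical, and how the plugin's nodes map onto these fields internally is an assumption; only the JSON field names come from the public REST API.

```python
import base64
import json


def build_recognize_request(pcm_bytes, language_code="en-US",
                            sample_rate_hz=16000, max_alternatives=5):
    """Build a JSON body for the Speech-to-Text v1 `speech:recognize` method."""
    return json.dumps({
        "config": {
            "encoding": "LINEAR16",  # 16-bit PCM, as stored in a wav file
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
            "maxAlternatives": max_alternatives,
        },
        # Audio bytes are sent base64-encoded inside the JSON body.
        "audio": {"content": base64.b64encode(pcm_bytes).decode("ascii")},
    })


def variants(response_json):
    """Return the alternatives of the first result, most probable first.

    The API lists alternatives in order of accuracy, so no sorting is done.
    """
    results = json.loads(response_json).get("results", [])
    if not results:
        return []
    return list(results[0].get("alternatives", []))


def best_transcript(response_json):
    """Return the top transcript, or an empty string if nothing was recognised."""
    found = variants(response_json)
    return found[0]["transcript"] if found else ""
```

In these terms, a node that returns a single result corresponds to `best_transcript`, and a node that returns all variants corresponds to `variants`.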

# Utilities
## Percentage based string comparison (Fuzzy matching)

You will probably need to process the recognised voice in your app. To increase the chance of a match, use the `CompareStrings` node. The call below returns 0.666,
so we can treat those strings as equal since they are about 66% similar. The node uses the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) algorithm.

![](pics/compare.png)
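
For readers who want to see how a Levenshtein-based similarity score can be computed, here is a minimal sketch in plain Python. It is not the plugin's implementation: normalising the distance by the length of the longer string is an assumption, chosen because it gives scores between 0 and 1.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest


# One substitution across three characters gives a score of about 0.667.
print(round(similarity("abc", "abd"), 3))
```

A voice command can then be accepted when its score against an expected phrase exceeds a chosen threshold, for example 0.6.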

## Listing available capture devices

You can pass a microphone name to the microphone capture component. To get the list of available microphones, use the following setup

![](pics/enumerate_microphones.png)

# Platforms supported
# Supported platforms

**Windows** and **Mac**.
**Windows**, **Mac** and **Android**.

# Migration guide
<details>
<summary>Version 3.0</summary>

## Version 3.0

`EGoogleTTSLanguage` was removed. Pass the [voice name](https://cloud.google.com/text-to-speech/docs/voices) as a string instead (see the **Voice name** column).

@@ -133,12 +224,7 @@ Above values may differ depending on actual microphone characteristics.

The reason is that the number of languages has exceeded 256, which does not fit into an 8-bit enum (this is Unreal's limitation).



</details>

# Links
Find out more in the documentation for the corresponding sections.
* [Supported TTS voices](https://cloud.google.com/text-to-speech/docs/voices) ([WaveNet](https://en.wikipedia.org/wiki/WaveNet) voices are the best)
* [Speech synthesis config](https://cloud.google.com/text-to-speech/docs/reference/rest/v1/text/synthesize#audioconfig)
* [Supported STT languages](https://cloud.google.com/speech-to-text/docs/languages)
Binary file added pics/audio_capture_plugin.png
Binary file added pics/audio_capture_sound_class.png
Binary file added pics/disk_read_access_android.png
Binary file added pics/disk_write_access_android.png
Binary file added pics/mic_access_android.png
Binary file added pics/microphone_access_xcode.png
Binary file added pics/read_back.png
Binary file added pics/setup_all.png
Binary file added pics/sound_class.png
Binary file added pics/sound_mix.png
Binary file added pics/start_stop_recording_set_submix.png
