Releases: KoljaB/TurnVoice
v0.0.7
v0.0.65
- added --faster parameter to select faster_whisper for timestamp transcription instead of stable_whisper (stable_whisper takes a lot of resources, especially on longer videos)
- added --model parameter to select the transcription model; can be 'tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en', 'medium', 'medium.en', 'large-v1', 'large-v2', 'large-v3', or 'large' (see the example below)
- updated to Coqui TTS v0.22.0 which enables access to 58 free predefined speaker voices
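A combined usage example for the new flags (an illustration only; the flag names and model values are the ones listed above, and the URL is reused from the examples further down):
turnvoice https://www.youtube.com/watch?v=2N3PsXPdkmM --faster --model medium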
v0.0.60
- switched from Deezer's Spleeter to Facebook's Demucs, for two reasons:
  - better vocal splitting quality
  - more reliable handling of files longer than 10 minutes
- crossfade algorithm to switch between the original and the vocal-stripped audio more seamlessly (see the sketch after this list)
- use of the stable_whisper timestamp refinement technique for higher timestamp detection precision (see the sketch after this list)
- new JavaScript Renderscript Editor to fine-tune speaking timings, text and speaker assignment
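A minimal sketch of the kind of crossfade mentioned above, assuming both tracks are time-aligned mono float numpy arrays of equal length at the same sample rate (the function and parameter names are illustrative, not TurnVoice's actual code):

```python
import numpy as np

def crossfade_switch(original: np.ndarray, stripped: np.ndarray,
                     switch_sample: int, sample_rate: int,
                     fade_seconds: float = 0.05) -> np.ndarray:
    """Play `original` up to `switch_sample`, then `stripped`, blending the two
    tracks linearly over `fade_seconds` around the switch point."""
    n = max(int(sample_rate * fade_seconds), 1)
    start = max(switch_sample - n // 2, 0)
    end = min(start + n, len(original))
    out = original.copy()
    ramp = np.linspace(0.0, 1.0, end - start)  # 0 -> 1 across the fade window
    out[start:end] = original[start:end] * (1.0 - ramp) + stripped[start:end] * ramp
    out[end:] = stripped[end:]                 # after the fade, use the stripped track
    return out
```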
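And a sketch of the timestamp refinement step as exposed by the stable-ts API (treat the exact calls as an assumption about that library, not about TurnVoice's internals; model size and file name are placeholders):

```python
import stable_whisper

# Load a Whisper model through stable-ts (model size is just an example).
model = stable_whisper.load_model("base")

# Transcribe with word-level timestamps, then refine those timestamps.
result = model.transcribe("audio.wav")
model.refine("audio.wav", result)  # adjusts the word timestamps in place

# The refined word timings can then be read from the result segments.
for segment in result.segments:
    for word in segment.words:
        print(word.word, word.start, word.end)
```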
v0.0.50
- added --prepare to write a full script including text, speakers and timestamps:
  turnvoice https://www.youtube.com/watch?v=2N3PsXPdkmM --prepare
- added --render to read back such a script and generate the final video from it:
  turnvoice https://www.youtube.com/watch?v=2N3PsXPdkmM --render "downloads\my_video_name\full_script.txt"
- improved audio output quality
v0.0.45
v0.0.41
- now using deep-translator instead of NLLB-200-600M, so we no longer need the CC-BY-NC license and no longer have to download, load and unload a heavyweight translation model
(Deep-translator seems good to use for free. I have a better and more general solution roughly in mind; there are still some problems to solve, but I expect to make quite a significant upgrade here in the coming days.)
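For reference, basic deep-translator usage looks roughly like this (a sketch of the library's documented API, not of TurnVoice's exact integration; the language codes and sample text are examples):

```python
from deep_translator import GoogleTranslator

# 'auto' lets the backend detect the source language; target is the language to dub into.
translated = GoogleTranslator(source="auto", target="en").translate("Hallo Welt")
print(translated)  # e.g. "Hello World"
```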
v0.0.40
- added ElevenLabs, Azure, OpenAI TTS and System TTS as selectable synthesis engines
- added the possibility to feed a local video instead of a YouTube video
- added the possibility to replace multiple speaker voices at once (submit more than one voice)
- added the possibility to submit your own speaker timefiles (in the format of the generated speaker1.txt, speaker2.txt etc. timefiles) to fine-tune multi-speaker rendering
v0.0.30
- added lots of stuff to the algorithm:
  - we unload the transcription model completely from the GPU after the first main transcription
  - we then load the synthesis model into freshly cleaned VRAM and let it take as much VRAM as it wants, because this is our bottleneck
  - after the first synthesis we lazy-load the transcription model again
  - we can then transcribe the synthesis and verify it by measuring text distance (with Levenshtein and Jaro-Winkler)
  - and we can detect whether the model generated hallucinations using the transcription's word timestamps (sketches below, after this release's notes)
So with this we have:
=> a massive speed gain (5x)
=> way lower VRAM usage (because the huge transcription model gets removed from VRAM, and we also unload the translation model if used)
=> way more solid synthesis via verification (reducing hallucinations and strange artifact generation by retrying the synthesis)
We can now voiceturn a 20 min video on 8 GB of VRAM in ~33 min.
- added fades at the start and end of the synthesis since it gets trimmed, so we don't clip (see the sketch below)
- autostart the finished video after rendering
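A rough sketch of the VRAM juggling described in the list above, assuming PyTorch-backed models; the model loading calls in the comments are placeholders for whatever loads the transcription and synthesis engines in the real pipeline:

```python
import gc
import torch

def clear_gpu_memory() -> None:
    """Run garbage collection and release cached CUDA memory.
    Call this after dropping the last reference to a model."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

# Order of operations from the notes above (model loading itself is project-specific
# and omitted here):
#
#   transcript = transcription_model.transcribe(source_audio)  # 1. main transcription
#   del transcription_model                                     #    then unload it completely
#   clear_gpu_memory()
#
#   synthesis_model = ...                                       # 2. loads into clean VRAM and
#   audio = synthesis_model.synthesize(transcript)              #    may use as much as it wants
#
#   transcription_model = ...                                   # 3. lazily reloaded afterwards
#   check = transcription_model.transcribe(audio)               #    to verify the synthesis
```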
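The verification and hallucination checks could look roughly like this; the jellyfish library is an assumed stand-in for the Levenshtein and Jaro-Winkler measures mentioned above, and all thresholds are made up for illustration:

```python
import jellyfish

def synthesis_matches(target_text: str, transcribed_text: str,
                      max_lev_ratio: float = 0.2,
                      min_jaro_winkler: float = 0.85) -> bool:
    """Compare the text the TTS was asked to speak with what a re-transcription
    of the synthesized audio actually contains."""
    a = target_text.strip().lower()
    b = transcribed_text.strip().lower()
    lev_ratio = jellyfish.levenshtein_distance(a, b) / max(len(a), 1)
    jw = jellyfish.jaro_winkler_similarity(a, b)
    return lev_ratio <= max_lev_ratio and jw >= min_jaro_winkler

def looks_like_hallucination(words, expected_end: float, slack: float = 1.0) -> bool:
    """Flag a synthesis whose word timestamps run well past the expected duration
    or contain suspiciously long gaps. `words` holds objects with .start/.end in seconds."""
    if not words:
        return True
    if words[-1].end > expected_end + slack:
        return True
    gaps = (nxt.start - cur.end for cur, nxt in zip(words, words[1:]))
    return any(gap > 2.0 for gap in gaps)

# If the texts don't match or a hallucination is detected, the synthesis is retried.
```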
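And a tiny sketch of the edge fades on a trimmed synthesis, assuming a mono float numpy array (the fade length is an example value):

```python
import numpy as np

def apply_edge_fades(audio: np.ndarray, sample_rate: int, fade_ms: float = 15.0) -> np.ndarray:
    """Apply short linear fade-in and fade-out so a trimmed clip doesn't click."""
    n = min(int(sample_rate * fade_ms / 1000.0), len(audio) // 2)
    faded = audio.copy()
    if n > 0:
        faded[:n] *= np.linspace(0.0, 1.0, n)
        faded[-n:] *= np.linspace(1.0, 0.0, n)
    return faded
```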
v0.0.22
v0.0.20
- improved sync
  We now trim silence out of the synthesized audio before starting the voice speed matching algorithm. The Coqui engine inserts ~0.3 s (varies with speed) of silent audio at the end of the synthesis. That messed a bit with the transcription timestamps before, and this upgrade is a good step towards better-synced results (see the sketch below).
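A minimal sketch of that kind of trailing-silence trim, assuming a mono float numpy array and a simple amplitude threshold (the threshold and tail length are illustrative):

```python
import numpy as np

def trim_trailing_silence(audio: np.ndarray, sample_rate: int,
                          threshold: float = 1e-3, keep_ms: float = 50.0) -> np.ndarray:
    """Cut near-silent samples off the end of a synthesized clip, keeping a short
    tail, before running the voice speed matching step."""
    loud = np.flatnonzero(np.abs(audio) > threshold)
    if loud.size == 0:
        return audio
    end = min(loud[-1] + 1 + int(sample_rate * keep_ms / 1000.0), len(audio))
    return audio[:end]
```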