diff --git a/src/User_Manual/transcribe.html b/src/User_Manual/transcribe.html index 8719d42f..f2364bfd 100644 --- a/src/User_Manual/transcribe.html +++ b/src/User_Manual/transcribe.html @@ -1,125 +1,575 @@ + - - Settings - + + + Whisper Transcription + - -
-

Transcribe Audio

-
+ -
-

Transcribe Audio

-

My program now includes the ability to transcribe various types of audio files and the resulting .txt file will - be saved to the Docs_for_DB folder to be included when you create your database - enabling you to search audio! - To add multiple audio files to the vector database, simply choose multiple files and process them separately.

+
+

Whisper Transcription

+ +
+ +
+ +

Overview

+ +

My program uses powerful Whisper models for transcription in two ways:

+
    +
  1. Transcribe your question to the clipboard, which you can then paste into the question box; and
  2. Within the "Tools" tab, transcribe entire audio files to be put into the vector database.
-

Setting

-

The "translate" checkbox and the "language" pulldown menu are placeholders until I add that functionality in the next release. - However, the timestamps checkbox works fine. All of the other settings are identical to the settings for the voice transcriber - when asking the LLM a question. However, you can choose different settings here than those in the "Settings" tab for the voice - transcriber; they update the config.yaml file differently.

-

I highly recommend using gpu-acceleration if available (it'll be displayed as an option if available). Unlike the voice - transcription functionality, which typically transcribes a short question, processing audio files are usually a lot longer.

+

Transcribe Question

+ +

The start and stop recording buttons transcribe your voice to the clipboard, which you can then simply paste into the question box for the LLM. The quality of the transcription can be controlled from the "Settings" tab. The available Whisper model sizes and quantizations are automatically populated based on your system's capabilities. You can read below exactly what these settings mean.

+ +

Feel free to use either GPU or CPU acceleration, since the Whisper model is unloaded from memory immediately after the transcription is complete in order to conserve system resources. Just remember to click "Update Settings" whenever you change the settings.
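To give a rough sense of what happens under the hood, here is a minimal sketch assuming a CTranslate2 backend such as faster-whisper; the model name, file name, and settings are illustrative, not the program's actual code:

    from faster_whisper import WhisperModel

    # Hypothetical values mirroring the "Settings" tab
    model = WhisperModel("small.en", device="cpu", compute_type="float32")

    # "question.wav" stands in for the recorded question
    segments, info = model.transcribe("question.wav")
    question_text = " ".join(segment.text.strip() for segment in segments)

    del model  # the model is released right away to conserve memory
    print(question_text)  # this is the text that would be copied to the clipboard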

+ +

Transcribe Audio Files for Database

+ +

The "Tools" tab includes a new feature to transcribe audio files of any length and place the resulting .txt file in the folder holding the files destined for the vector database. Once the transcription is complete, the file is placed there automatically. Remember, you must re-create the database any time you want to add or remove a file from it.

+ +

I highly recommend using GPU acceleration if available, since transcribing an audio file takes a lot longer than a simple question. The settings for transcribing an audio file are separate from the voice-transcription settings; changing one does not change the other. Read more below about the Whisper models and quantization to ensure you're getting the most out of this powerful new feature. Batch processing is coming in a future release.
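If you're curious what this workflow looks like programmatically, here is a minimal sketch, again assuming a CTranslate2 backend such as faster-whisper; the file and folder names are hypothetical placeholders:

    from pathlib import Path
    from faster_whisper import WhisperModel

    audio_file = Path("lecture.mp3")       # hypothetical audio file
    docs_folder = Path("Docs_for_DB")      # hypothetical folder feeding the vector database

    # GPU with float16 keeps quality high while roughly halving memory use
    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    segments, info = model.transcribe(str(audio_file))

    lines = [f"[{seg.start:.2f} - {seg.end:.2f}] {seg.text.strip()}" for seg in segments]
    (docs_folder / f"{audio_file.stem}.txt").write_text("\n".join(lines), encoding="utf-8")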

+ +

Whisper Models and Quants

+ +

English versus Non-English Models

-

Currently, whenever you change the quant, model size, or compute "Device," you must click "Update Settings" before - starting the transcription.

+

Use models ending in .en if you speak English. The large-v2 model doesn't come in an English-specific variant because it's simply good at everything.

-

Upcoming Improvements

-

Add functionality to process multiple files in batch by selecting a single directory

+

My Recommendations

-

Add functionality to process multiple files at once (multiple "workers") while batch processing (vram intensive).

+

The size of the model is the most important factor, followed by quantization. For transcribing your questions, I recommend small.en with float32 for everyday usage (using CPU). For transcribing an audio file, always use GPU acceleration if available. I also recommend using as large a Whisper model as possible together with a quantization of float32, float16, or bfloat16 so you don't have to re-transcribe the file later. Also, don't forget to check the timestamps option if you want timestamps in the output.

-

Support the new Whisper large-v3 model released a few days ago.

+

If you're trying to transcribe a file using your CPU, the speed depends heavily on the CPU itself, but be aware that the CPU is roughly 500x slower than even a mediocre GPU. Even then, I wouldn't recommend going below small/small.en because there is a significant jump in quality between base and small.

Support the new Distil-Whisper models that use half VRAM and compute requirements, but only do English.

- -
+

Additional Info

+ +

Below is a table of all the Whisper models I've quantized and, below that, a primer on floating point formats and quantization!

+ +

Available Whisper Models and Quants

Quantization | Size on Disk
whisper-tiny.en-ct2-int8_bfloat16 | 42.7 MB
whisper-tiny.en-ct2-int8_float16 | 42.7 MB
whisper-tiny-ct2-int8_bfloat16 | 43.1 MB
whisper-tiny-ct2-int8_float16 | 43.1 MB
whisper-tiny.en-ct2-int8 | 45.4 MB
whisper-tiny.en-ct2-int8_float32 | 45.4 MB
whisper-tiny-ct2-int8 | 45.7 MB
whisper-tiny-ct2-int8_float32 | 45.7 MB
whisper-base.en-ct2-int8_bfloat16 | 78.4 MB
whisper-base.en-ct2-int8_float16 | 78.4 MB
whisper-base-ct2-int8_bfloat16 | 78.7 MB
whisper-base-ct2-int8_float16 | 78.7 MB
whisper-tiny.en-ct2-bfloat16 | 78.8 MB
whisper-tiny.en-ct2-float16 | 78.8 MB
whisper-tiny-ct2-bfloat16 | 79.1 MB
whisper-tiny-ct2-float16 | 79.1 MB
whisper-base.en-ct2-int8 | 82.4 MB
whisper-base.en-ct2-int8_float32 | 82.4 MB
whisper-base-ct2-int8 | 82.7 MB
whisper-base-ct2-int8_float32 | 82.7 MB
whisper-base.en-ct2-bfloat16 | 148.5 MB
whisper-base.en-ct2-float16 | 148.5 MB
whisper-base-ct2-bfloat16 | 148.8 MB
whisper-base-ct2-float16 | 148.8 MB
whisper-tiny.en-ct2-float32 | 154.4 MB
whisper-tiny-ct2-float32 | 154.7 MB
whisper-small.en-ct2-int8_bfloat16 | 249.8 MB
whisper-small.en-ct2-int8_float16 | 249.8 MB
whisper-small-ct2-int8_bfloat16 | 250.2 MB
whisper-small-ct2-int8_float16 | 250.2 MB
whisper-small.en-ct2-int8 | 257.3 MB
whisper-small.en-ct2-int8_float32 | 257.3 MB
whisper-small-ct2-int8 | 257.7 MB
whisper-small-ct2-int8_float32 | 257.7 MB
whisper-base.en-ct2-float32 | 293.7 MB
whisper-base-ct2-float32 | 294.0 MB
whisper-small.en-ct2-bfloat16 | 486.8 MB
whisper-small.en-ct2-float16 | 486.8 MB
whisper-small-ct2-bfloat16 | 487.1 MB
whisper-small-ct2-float16 | 487.1 MB
whisper-medium.en-ct2-int8_bfloat16 | 775.8 MB
whisper-medium.en-ct2-int8_float16 | 775.8 MB
whisper-medium-ct2-int8_bfloat16 | 776.1 MB
whisper-medium-ct2-int8_float16 | 776.1 MB
whisper-medium.en-ct2-int8 | 788.2 MB
whisper-medium.en-ct2-int8_float32 | 788.2 MB
whisper-medium-ct2-int8 | 788.5 MB
whisper-medium-ct2-int8_float32 | 788.5 MB
whisper-small.en-ct2-float32 | 970.4 MB
whisper-small-ct2-float32 | 970.7 MB
whisper-medium.en-ct2-bfloat16 | 1.5 GB
whisper-medium.en-ct2-float16 | 1.5 GB
whisper-medium-ct2-bfloat16 | 1.5 GB
whisper-medium-ct2-float16 | 1.5 GB
whisper-large-v2-ct2-int8_bfloat16 | 1.6 GB
whisper-large-v2-ct2-int8_float16 | 1.6 GB
whisper-large-v2-ct2-int8 | 1.6 GB
whisper-large-v2-ct2-int8_float32 | 1.6 GB
whisper-medium.en-ct2-float32 | 3.1 GB
whisper-medium-ct2-float32 | 3.1 GB
whisper-large-v2-ct2-bfloat16 | 3.1 GB
whisper-large-v2-ct2-float16 | 3.1 GB
whisper-large-v2-ct2-float32 | 6.2 GB
+ + +

Introduction to Floating Point Formats

+ +
+ Floating Point +
+ +

Running an embedding model or a large language model requires a lot of math, and computers don't understand ordinary decimal numbers the way you and I do. Rather, they represent numbers with a series of ones and zeros called "bits." In general, using more bits means higher quality, but also more VRAM/RAM and compute power. With that being said, the quality also depends on how many of the bits are "exponent" bits versus "fraction" bits.

+ +

The phrase "Floating point format" refers to the total number of bits used and how many are "exponent" versus "fraction." + The three most common floating point formats are shown above. Notice that both Float16 and Bfloat16 use 16 bits but + a different number of "exponent" versus "fraction" bits.

+

"Exponent" bits essentially determine the "range" of numbers that a neural network can use when doing math. For example, Float32 has 8 "exponent" bits, so hypothetically this allows the neural network to use any integer between one and one hundred; its "range" is 1-100. Bfloat16 would have the same "range" because it also has 8 "exponent" bits. However, since Float16 only has 5 "exponent" bits, its "range" might only be 1-50.

+ +

+ +

"Fraction" bits essentially determine the number of unique values that can be used within that "range." + For example, Float32 has 23 "fraction" bits so hypothetically it can use every whole number between 1-100 when doing math. + Since Bfloat16 only has 7 "fraction" bits, it might only have 25 unique values within 1-100. + This is also referred to as the "precision" of a neural network.

+ +

These are just hypotheticals; the actual ranges and precisions are summarized in this table:

+ + + + + + + + + + + + + + + + + + + + + + +
Floating Point Format | Range (Based on Exponent) | Discrete Values (Based on Fraction)
float32 | ~3.4×10^38 | 8,388,608
float16 | ±65,504 | 1,024
bfloat16 | ~3.4×10^38 | 128
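If you'd like to verify these numbers yourself, a quick check with PyTorch (assuming it is installed) reproduces the ranges above and shows the precision difference between float16 and bfloat16:

    import torch

    # Maximum representable value ("range") for each format
    for dtype in (torch.float32, torch.float16, torch.bfloat16):
        print(dtype, torch.finfo(dtype).max)   # ~3.40e38, 65504.0, ~3.39e38

    # Number of distinct fraction values: 2 to the power of the fraction bits
    print(2 ** 23, 2 ** 10, 2 ** 7)            # 8388608 1024 128

    # Precision in practice: 1000.5 survives float16 but rounds away in bfloat16
    print(torch.tensor(1000.5, dtype=torch.float16))   # 1000.5
    print(torch.tensor(1000.5, dtype=torch.bfloat16))  # 1000.0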
+ +

The "range" and "precision" both determine the "quality" of an output, but in different ways. + In general, different floating point formats are good for different purposes. For example, Google, which created + Bfloat16, found that it was better for neural networks while Float16 was better for scientific calculations.

+ +

You can see the floating point format of the various embedding models used in my program by looking at the + "config.json" file for each model.
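For example, Hugging Face-style config.json files typically report the format in a torch_dtype field (the path below is a hypothetical placeholder, and the exact keys can vary by model):

    import json
    from pathlib import Path

    config = json.loads(Path("Embedding_Models/example-model/config.json").read_text())
    print(config.get("torch_dtype"))  # e.g. "float32" or "float16"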

+ +
+

What is Quantization?

+ +

"Quantization" refers to converting the original floating point format to one with a smaller "range" and "precision." + Projects like LLAMA.CPP and AutoGPTQ do this with slightly different algorithms. The overall goal is to reduce + the memory and computational power needed while only suffering a "reasonable" loss in quality. + Specific "quantizations" like "Q8_0" or "8-bit" refer to the "floating point format" of "int8." + (Technically, "int8" is no longer "floating" but you don't need to delve into the nuances of this to understand + the basic concepts I'm trying to communicate.)

+ +

Here is the range and precision for "int8," which are clearly smaller:

+ + + + + + + + + + + +
Floating Point Format | Range (Based on Exponent) | Discrete Values (Based on Fraction)
int8 | -128 to 127 | ±127 (within integer range)
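To make the idea concrete, here is a toy sketch of symmetric int8 quantization using NumPy. It is illustrative only and not the exact algorithm used by llama.cpp, AutoGPTQ, or CTranslate2:

    import numpy as np

    weights = np.array([0.03, -1.27, 0.55, 2.01], dtype=np.float32)

    scale = np.abs(weights).max() / 127                 # map the largest weight to 127
    quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    dequantized = quantized.astype(np.float32) * scale

    print(quantized)              # [  2 -80  35 127]
    print(dequantized - weights)  # small rounding errors = the "reasonable" loss in quality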
- +
+