What's changing
Add support for loading both the `text-to-text` and the `text-to-speech` models on the GPU.

How to test it
Steps to test the changes:
Check with `nvidia-smi` that your GPU has loaded the models.

Additional notes for reviewers
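For reviewers who prefer to script the `nvidia-smi` check from the test steps, here is a minimal sketch using only the Python standard library. The helper names `parse_gpu_memory` and `gpu_memory` are hypothetical and not part of this PR. It assumes `nvidia-smi` is on the `PATH` and simply returns an empty list when it is not.

```python
import shutil
import subprocess

# Standard nvidia-smi query flags: one CSV row per GPU, memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into (name, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        try:
            rows.append((name, int(used), int(total)))
        except ValueError:
            # Some devices report "[N/A]" for memory; skip those rows.
            continue
    return rows


def gpu_memory():
    """Return per-GPU memory usage, or [] if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for name, used, total in gpu_memory():
        print(f"{name}: {used} MiB / {total} MiB in use")
```

Running it once before and once after the models are loaded should show the used memory rising if the models actually landed on the GPU.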
This turned out to be more difficult than I expected, because the two models use different frameworks and both frameworks need to support the same CUDA toolkit. Support for `text-to-speech` is complete and was easy. Support for `text-to-text` has proven quite difficult.

Some rough benchmarks:
Setup: time is counted from the moment the document is uploaded until the first audio sample of speaker 1 is generated (this includes loading both models and running inference once with each). The `.html` file in `example_data` was used. GPU: RTX 2060.

CPU: `text-to-text` & `text-to-speech` -> 2 min 41 s
CPU: `text-to-text`, GPU: `text-to-speech` -> 1 min 27 s (!)

I already...
/docs
)
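To put the two rough benchmark timings above side by side, the arithmetic can be sketched as follows. The variable names are illustrative only, and the figures are the single-run timings quoted in this description, so treat the result as indicative rather than a careful measurement.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) timing into total seconds."""
    return minutes * 60 + seconds


cpu_only = to_seconds(2, 41)  # both models on CPU
mixed = to_seconds(1, 27)     # text-to-text on CPU, text-to-speech on GPU

speedup = cpu_only / mixed
saved = cpu_only - mixed

print(
    f"{speedup:.2f}x faster, {saved} s saved "
    f"({saved / cpu_only:.0%} less wall-clock time)"
)
# prints "1.85x faster, 74 s saved (46% less wall-clock time)"
```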