-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cache mechanism to sherpa tts #1732
base: master
Are you sure you want to change the base?
Conversation
Thank you for your contribution! Will review it. |
android/SherpaOnnxTtsEngine/app/src/main/java/com/k2fsa/sherpa/onnx/tts/engine/MainActivity.kt
Show resolved
Hide resolved
@@ -32,6 +33,9 @@ struct OfflineTtsConfig { | |||
// If you set it to -1, then we process all sentences in a single batch. | |||
int32_t max_num_sentences = 1; | |||
|
|||
// Path to cache_directory | |||
std::string cache_dir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create a new config for cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove cache_dir by calling the constructor from within GetOfflineTtsConfig. Is it acceptable?
If not, please give more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, OfflineTts has a config called OfflineTtsConfig.
You can create a struct called CacheMechanismConfig and put cache_dir in it. In this way, you don't need to add a new field to OfflineTtsConfig. You also don't need to add new methods to OfflineTts. (Note you need to wrap CacheMechanism and its config to Kotlin via jni)
Remember that sherpa-onnx currently supports 12 programming languages. If we change the config in C++, we also need to update APIs of other programming languages.
You can create a smart pointer of CacheMechanism and pass it to the constructor of OfflineTts.
If later we need to extend the CacheMechanism, we can leave OfflineTts and OfflineTtsConfig untouched. We only need to change CacheMechanism or/and CacheMechanismConfig.
By the way, since the CacheMechanism is related to tts, can you rename it to OfflineTtsCacheMechanism?
In general, the filename of a class contains the class name, though there are dashes in the filename but not in the class name
Methods of CacheMechanism don't need to be repeated in OfflineTts.
sherpa-onnx/csrc/offline-tts.h
Outdated
// Return the maximum number of cached audio files size | ||
int32_t CacheSize() const; | ||
|
||
// Set the maximum number of cached audio files size | ||
void SetCacheSize(const int32_t cache_size); | ||
|
||
// Remove all cache data | ||
void ClearCache(); | ||
|
||
// To get total used cache size(for wav files) in bytes | ||
int64_t GetTotalUsedCacheSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove these methods from OfflineTtts.
You can pass a smart pointer of CacheMechanism
to the constructor of OffineTts.
(You can add an overload constructor for OfflineTts)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not get how.
You mean I directly call functions in CacheMechanism from jni functions themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. CacheMechanism is constructed outside of OfflineTts.
OfflineTts is only for converting text to speech. Anything related to cache should be put in a separate class, i.e., please don't repeat the methods of CacheMechanism in OfflineTts.
OfflineTtsConfig config_; | ||
std::unique_ptr<OfflineTtsImpl> impl_; | ||
std::unique_ptr<CacheMechanism> cache_mechanism_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OfflineTtsConfig config_; | |
std::unique_ptr<OfflineTtsImpl> impl_; | |
std::unique_ptr<CacheMechanism> cache_mechanism_; | |
std::unique_ptr<OfflineTtsImpl> impl_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not get it. If CacheMechanism is to be added to the constructor of OffineTts, aren't these lines still needed?
Please use You can refer to sherpa-onnx/scripts/check_style_cpplint.sh Lines 19 to 26 in 66e02d8
You can use
to install clang-format and use it to auto-format your code. |
Thank you, I will correct tomorrow. |
Yes, I agree. By the way, you can even set a default cache size. Users don't need to set the cache size through the UI. |
Another proposal: whatever we do in the model, there still remains some words that are not spelled correctly(like keyboard letters). Good to add them permanently in the cache so that they be read fron the cache instead of the model? |
About the cache slider, I think it is good to keep it for a while. Then rethink based on user feedback. I know it is good between 20 and 200 MB but need feedback to fix it(maybe not a single optimal value based on phone). I also think the user may be happier when knowing about the cach size and would accept the extra memory usage easier. Also better for privacy concerns to see the slider. |
Ok, it is fine with me. |
Please make the cache do only 1 thing. You can create a separate PR to add it. |
done. |
Delay is a major drawback of sherpa tts on android phones, which makes it unusable for average blind people using average phones. By caching the most frequent tts requests, sherpa is made as fast as lightning on most of the time, using much less cpu and battery. Some blind individials have tested this upgrade and have been happy :)
Considered guide-lines: