split decoder into spectrogram and vocoder without changing API #851

Yosshi999 · 2024-10-09T16:10:26Z

Description

With streaming processing in mind, separate the vocoder from the decoder.
cf. Hiroshiba/vv_core_inference#28

Some tests may not have been updated yet.

Other

Toward VOICEVOX#737. This should also be useful when adding support for VVMs that contain decode.onnx after VOICEVOX#851.

qryxip · 2024-10-09T16:30:49Z

crates/voicevox_core/src/synthesizer.rs

The rust-lint failure in CI should be fixed by running cargo fmt.

❯ cargo fmt
❯ git diff --stat
 crates/voicevox_core/src/infer/domains.rs | 5 +++--
 crates/voicevox_core/src/synthesizer.rs | 15 ++++++---------
 crates/voicevox_core/src/voice_model.rs | 11 ++++++++---
 crates/voicevox_core_macros/src/lib.rs | 2 +-
 4 files changed, 18 insertions(+), 15 deletions(-)

qryxip · 2024-10-09T16:36:53Z

crates/voicevox_core/src/synthesizer.rs

+            let RunVocoderOutput { wave: output } = self.status.run_session(
+                model_id,
+                RunVocoderInput {
+                    spec: interm,


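I am not familiar with this field, so I cannot quite tell what "interm" means ("intermediate", perhaps?), but it seems to be tripping typos. You can exclude it by adding it to _typos.toml.

 [default.extend-identifiers]
 NdArray="NdArray" # onnxruntime::session::NdArray
+interm="interm"

[Addendum] By the way, if the name spec is fine, spec: spec can be written as just spec.

[Addendum] It looks like there is already an issue open on typos about this.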
crate-ci/typos#1033
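
As a small illustration of the shorthand mentioned in the first addendum above: when a local variable has the same name as a struct field, Rust's field init shorthand lets `spec: spec` be written as `spec`. The struct below is a hypothetical stand-in for the PR's `RunVocoderInput`; the `Vec<f32>` field type is an assumption, since the real type is not shown in this thread.

```rust
// Hypothetical stand-in for the PR's `RunVocoderInput`.
// The field type is assumed for illustration only.
#[derive(Debug, PartialEq)]
struct RunVocoderInput {
    spec: Vec<f32>,
}

/// Explicit form: field name and variable name are both written out.
fn explicit(spec: Vec<f32>) -> RunVocoderInput {
    RunVocoderInput { spec: spec }
}

/// Field init shorthand: the variable name doubles as the field name.
fn shorthand(spec: Vec<f32>) -> RunVocoderInput {
    RunVocoderInput { spec }
}

fn main() {
    // Both forms construct the same value.
    assert_eq!(explicit(vec![0.5, 1.5]), shorthand(vec![0.5, 1.5]));
}
```

The shorthand only applies when the variable is literally named like the field, so it would work here only after renaming `interm` to `spec`.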

Hiroshiba · 2024-10-09T20:06:03Z

model/sample.vvm/manifest.json

+    "predict_spectrogram_filename": "predict_spectrogram.onnx",
+    "vocoder_filename": "vocoder.onnx",


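I am really torn on the naming here!!!

The thing is, every other file name here corresponds to "something we want to produce".
But the purpose of this two-way split is not to separate what gets produced. It is to separate "the part that produces the entire intermediate representation from the entire input" from "the part that produces a portion of the audio from a portion of the intermediate representation".

As a result, the intermediate representation just happens to be a spectrogram right now, so if we put that in the name, the name and the reality may stop matching in the future. I would rather name these after their purpose.
To put it more narrowly, I want to avoid "spectrogram".

But I cannot think of a name...
For example predict_global_feature and partial_decode, but I am not sure whether "feature" is right...

Or stepwise_decode and... what, polish_decode...?
Well, the latter could also be vocode... maybe.
(For speech alone there is no difference in meaning from decode at all, so having two kinds alongside the song side could get confusing.)

Or even just decode_step1 and decode_step2...

But for now the name can be anything!!! I am happy to rename it to something nicer later!!!
If nothing comes to mind, then maybe decode_global_step1 and decode_partial_step2...!!!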
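
The split described above can be sketched with toy functions. Everything here is an assumption for illustration: the real stages are ONNX models, the function names are just the placeholder names floated in this comment, and the toy second stage is purely frame-local (a real vocoder may need context frames around each segment). The point is only the shape: stage 1 needs the whole input, while stage 2 can run on any slice of the intermediate representation.

```rust
// Assumed upsampling factor (waveform samples per intermediate frame), illustration only.
const SAMPLES_PER_FRAME: usize = 4;

/// Stage 1 (toy): consumes the whole input and produces the whole intermediate
/// representation. Subtracting the mean stands in for a computation that needs
/// global context.
fn decode_global_step1(input: &[f32]) -> Vec<f32> {
    let mean = input.iter().sum::<f32>() / input.len() as f32;
    input.iter().map(|x| x - mean).collect()
}

/// Stage 2 (toy): turns any slice of the intermediate representation into the
/// matching slice of the waveform by repeating each frame.
fn decode_partial_step2(frames: &[f32]) -> Vec<f32> {
    frames
        .iter()
        .flat_map(|&f| std::iter::repeat(f).take(SAMPLES_PER_FRAME))
        .collect()
}

fn main() {
    let input = [1.0, 2.0, 3.0, 6.0];
    let intermediate = decode_global_step1(&input);

    // Non-streaming: render everything at once.
    let whole = decode_partial_step2(&intermediate);

    // Streaming: render chunk by chunk and concatenate.
    let streamed: Vec<f32> = intermediate
        .chunks(3)
        .flat_map(|chunk| decode_partial_step2(chunk))
        .collect();

    // With this frame-local toy stage, both paths yield the same waveform.
    assert_eq!(whole, streamed);
}
```

Because stage 2 never looks at the original input, it does not matter to the caller whether the intermediate representation is a spectrogram or something else, which is the reason given above for keeping "spectrogram" out of the names.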

Yosshi999 · 2024-10-10

How about generate_full_intermediate and render_audio_segment?

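Hiroshiba · 2024-10-10

I think those names are wonderful!!!
Those two feel right to me!!

(What follows is various things I thought about.)

I considered whether we should use a new verb instead of predict.
I feel that predict is used when inferring a meaningful value, so for producing an intermediate feature that has no meaning on its own, using a different verb seems fine.
render has the feel of being able to generate starting from whichever part you care about, so it seems like a very good verb. predict would also work, but a convention like "generating one part of the whole is called render" seems good too!
(So segment may not be needed. Keeping it seems fine as well.)

I think @qryxip's precompute is also good, but with that verb "intermediate" becomes unnecessary, leaving something like precompute_full or precompute_global, and then it might be unclear whether it is "pre" or "full". That is what I was thinking!!!

But honestly, these terms are not exposed to users, so pretty much anything is fine!!!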

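qryxip · 2024-10-11

> But honestly, these terms are not exposed to users, so pretty much anything is fine!!!

I am not sure I follow, but does that mean the names will be different when an API is eventually exposed from the ENGINE layer?

I think "render" is a good name. It seems usable even at the naming level of the Web API (ENGINE) and the programming-language APIs (CORE).

audio = await synth.generate_full_intermediate(...)
wav_seg = await audio.render_segment(...)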

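Hiroshiba · 2024-10-12

> I am not sure I follow, but does that mean the names will be different when an API is eventually exposed from the ENGINE layer?

Yes!
I feel that both the web API in the ENGINE repository and the synthesizer method names may end up with different names!

The onnx file names are best when they are easy for the committers to understand, and the API names are best when they are easy for API users to understand, and I expect those two to differ!

Much like how the existing decode.onnx became synthesis in the API.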

qryxip

LGTM!

Hiroshiba

LGTM！！！！

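For render_audio_segment, wave might be better than audio.
In the narrow sense, audio means roughly "sound within the range humans can hear", but what we actually produce is a waveform, so wave is closer to the facts.
Incidentally, voice is the sound a person produces.

That said, this is only something I might change before release if I feel like it!!!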

@qryxip

This body text was written by @qryxip.

Using `generate_full_intermediate` and `render_audio_segment`, which were introduced in #851, create the following public APIs. `precompute_render` generates an `AudioFeature`, and `render`, which takes the `AudioFeature` and a range as arguments, generates the PCM for the specified range.

- `voicevox_core::blocking::Synthesizer::precompute_render`
- `voicevox_core::blocking::Synthesizer::render`
- `voicevox_core::blocking::AudioFeature`

To assemble the PCM generated by `render` into a WAV, also create the following public API.

- `voicevox_core::wav_from_s16le`

However, this PR implements only the Rust API and the Python API. The asynchronous API, the C API, and the Java API will be implemented later. The type stubs for the Python API will also be prepared later, and tests will be written later as well.

Refs: #853

Co-authored-by: Ryo Yamashita <[email protected]>
Co-authored-by: Hiroshiba <[email protected]>
Co-authored-by: Nanashi. <[email protected]>
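
To make the "assemble PCM into a WAV" step above concrete, here is a minimal sketch of wrapping signed 16-bit little-endian PCM in a canonical 44-byte RIFF/WAVE header. This is not the implementation or signature of `voicevox_core::wav_from_s16le`, which is not shown in this thread; the function name `wav_from_s16le_sketch`, its parameters, and the 24000 Hz mono values in `main` are all assumptions for illustration.

```rust
/// Hypothetical sketch: wraps s16le PCM bytes in a canonical 44-byte WAV header.
/// Not the real `voicevox_core::wav_from_s16le`.
fn wav_from_s16le_sketch(pcm: &[u8], sampling_rate: u32, num_channels: u16) -> Vec<u8> {
    const BITS_PER_SAMPLE: u16 = 16;
    let block_align = num_channels * BITS_PER_SAMPLE / 8; // bytes per sample frame
    let byte_rate = sampling_rate * u32::from(block_align);
    let data_len = pcm.len() as u32; // assumes the PCM fits in a u32 length

    let mut wav = Vec::with_capacity(44 + pcm.len());
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // size of everything after this field
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk body
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = linear PCM
    wav.extend_from_slice(&num_channels.to_le_bytes());
    wav.extend_from_slice(&sampling_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&block_align.to_le_bytes());
    wav.extend_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    wav.extend_from_slice(pcm);
    wav
}

fn main() {
    // Two PCM segments, as if rendered separately, concatenated before wrapping.
    let segment_a: Vec<u8> = [0i16, 1000].iter().flat_map(|s| s.to_le_bytes()).collect();
    let segment_b: Vec<u8> = [-1000i16, 0].iter().flat_map(|s| s.to_le_bytes()).collect();
    let pcm = [segment_a, segment_b].concat();

    // 24000 Hz mono is an assumed example value.
    let wav = wav_from_s16le_sketch(&pcm, 24000, 1);
    assert_eq!(wav.len(), 44 + pcm.len());
}
```

Since the header only depends on the total data length, segments rendered one at a time can be concatenated as raw PCM first and wrapped once at the end.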

split decoder into spectrogram and vocoder without changing API

ff3d552

qryxip mentioned this pull request Oct 9, 2024

refactor: create InferenceDomainMapValues instances with a macro #852

Merged

qryxip reviewed Oct 9, 2024

Hiroshiba reviewed Oct 9, 2024

cargo fmt

c916c29

qryxip added a commit that referenced this pull request Oct 10, 2024

refactor: create InferenceDomainMapValues instances with a macro (#852)

991fbc8

Toward #737. This should also be useful when adding support for VVMs that contain decode.onnx after #851.

rename functions, formatting

4ec25c1

qryxip approved these changes Oct 11, 2024

sevenc-nanashi approved these changes Oct 12, 2024

Yosshi999 mentioned this pull request Oct 12, 2024

Support for streaming processing #853

Closed

Hiroshiba approved these changes Oct 12, 2024

qryxip merged commit 4547925 into VOICEVOX:main Oct 12, 2024
30 checks passed

qryxip mentioned this pull request Oct 19, 2024

Implement streaming-mode decode (precompute_render and render) #854

Merged
