Idea - NNSVS Support #69
Comments
It would allow UTSU to become kind of the "AI UTAU" for the community, where anyone could create AI vocals and make music with them. Before working on it, though, I first wanted to know whether the creator of UTSU and the other contributors want to see voicebank types other than UTAU supported, since it's possible you only want UTAU voicebanks.
I think this is a great idea! Would it generate a voicebank for UTSU's resampler, or would it also work as a whole resampler?
This might be helpful: https://note.com/crazy_utau/n/n45db22b33d2c
ENUNU is separate because it relies on the UTAU plugin API. From what I remember, it doesn't actually use the same system as UTAU at all and only works with the data it can access through plugins. That's why its repository lists the things it couldn't do because of the plugin API's limitations.
It works as its own thing because AI synthesis is so different.
Reading about the software, it looks like the task would then be to make UTSU able to export to MusicXML so NNSVS can interpret it, right? And maybe bundle it in if the license permits.
It technically could, but I think some small edits would be enough for it to get the data directly from UST files. All it needs is the data; the files the data comes from aren't as important as long as there are deserializers. And yes, this is one of the goals.
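To make the "deserializer" idea concrete, here is a minimal sketch in Python of reading notes out of UST text. It assumes the usual UST layout of numbered sections (`[#0000]`, `[#0001]`, ...) holding `Length`, `Lyric` and `NoteNum` keys; the `UstNote` class and `parse_ust_notes` function are hypothetical names for illustration, not part of UTSU or NNSVS.

```python
from dataclasses import dataclass


@dataclass
class UstNote:
    length: int    # duration in ticks, as written in the UST file
    lyric: str
    note_num: int  # MIDI note number


def parse_ust_notes(text: str) -> list:
    """Collect notes from the numbered sections of a UST file (assumed layout)."""
    sections = []
    in_note = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[#") and line.endswith("]"):
            # Only all-digit section names such as [#0000] are treated as notes;
            # [#VERSION], [#SETTING] and [#TRACKEND] are skipped.
            in_note = line[2:-1].isdigit()
            if in_note:
                sections.append({})
        elif in_note and "=" in line:
            key, value = line.split("=", 1)
            sections[-1][key] = value
    return [
        UstNote(
            length=int(s.get("Length", 0)),
            lyric=s.get("Lyric", ""),
            note_num=int(s.get("NoteNum", 60)),
        )
        for s in sections
    ]
```

Anything beyond these three keys (tempo, pitch bends, flags) is ignored here, so this is only the starting point for a real deserializer.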
License-wise, NNSVS is MIT, so there should be no problem with bundling it: from what I can see, UTSU is under an MIT-compatible license.
Times like this remind me how much UTSU needs its own plugin framework. If I understand correctly, integrating NNSVS with UTSU would have three parts to it.
1. Rendering songs in NNSVS: The easiest way is to write code converting the internal Song object into an NNSVS-readable file, then run NNSVS on that in the background.
2. Using NNSVS voicebanks: I could see UTSU's song editor being tweaked so that it pretends that NNSVS voicebanks are regular UTAU voicebanks on the frontend, but in the backend only renders them with NNSVS.
3. Creating NNSVS voicebanks: Since the format is completely different from UTAU's voicebanks, you'd have to write an entirely new voicebank editor UI.
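The "same frontend, different backend" idea from the second part could look roughly like the sketch below. It is written in Python only for illustration; every name in it (`Voicebank`, `Editor`, the `kind` tag, the two placeholder renderers) is hypothetical and does not reflect how UTSU is actually structured. The renderers just return a description string instead of producing audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Voicebank:
    name: str
    kind: str  # hypothetical tag, e.g. "utau" or "nnsvs"


# A renderer receives the lyrics of a song and the voicebank to sing them with.
Renderer = Callable[[List[str], Voicebank], str]


def render_with_resampler(lyrics: List[str], vb: Voicebank) -> str:
    # Placeholder for the existing resampler/wavtool pipeline.
    return f"resampler:{vb.name}:{len(lyrics)} notes"


def render_with_nnsvs(lyrics: List[str], vb: Voicebank) -> str:
    # Placeholder for "convert the song to an NNSVS-readable file, then run NNSVS".
    return f"nnsvs:{vb.name}:{len(lyrics)} notes"


class Editor:
    """The frontend sees one Voicebank type; the backend is chosen by its kind."""

    def __init__(self) -> None:
        self._renderers: Dict[str, Renderer] = {}

    def register(self, kind: str, renderer: Renderer) -> None:
        self._renderers[kind] = renderer

    def render(self, lyrics: List[str], vb: Voicebank) -> str:
        if vb.kind not in self._renderers:
            raise ValueError(f"no engine registered for voicebank kind {vb.kind!r}")
        return self._renderers[vb.kind](lyrics, vb)


editor = Editor()
editor.register("utau", render_with_resampler)
editor.register("nnsvs", render_with_nnsvs)
```

A registry like this is also close to what a plugin framework would need, since a new engine only has to register itself under a new kind.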
Yes, I imagine the voicebank creation side would come last since it's not the priority: usage would come first and voicebank creation second (creating AI voicebanks is much more complex to begin with because of the AI training phase).
I want to try making a serializer from the Song object to NNSVS. It looks like the song needs to be converted to HTS full-context label files (using Sinsy), and NNSVS uses a MusicXML-to-label-file step to get them. If I'm correct, does anyone know of a document specifying the data in HTS label files? (From the little research I did, it looks easier to use MusicXML as a middleman and use pysinsy to do the final conversion to HTS.)
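As a rough illustration of the MusicXML "middleman" step, the sketch below turns a list of `(lyric, MIDI note number, length in ticks)` tuples into a bare-bones MusicXML string using only Python's standard library. The tuple layout, the function names and the default of 480 divisions per quarter note are assumptions made for this example. It puts every note into a single measure and leaves out tempo, time signature and rests, so it has not been checked against what Sinsy or NNSVS actually accept.

```python
import xml.etree.ElementTree as ET

# Pitch class -> (step, alter); sharps are used for the black keys.
STEPS = [("C", 0), ("C", 1), ("D", 0), ("D", 1), ("E", 0), ("F", 0),
         ("F", 1), ("G", 0), ("G", 1), ("A", 0), ("A", 1), ("B", 0)]


def midi_to_pitch(note_num: int):
    """Map a MIDI note number to (step, alter, octave), with 60 as C4."""
    step, alter = STEPS[note_num % 12]
    return step, alter, note_num // 12 - 1


def notes_to_musicxml(notes, divisions: int = 480) -> str:
    """Serialize (lyric, note_num, length) tuples into one MusicXML measure."""
    score = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Vocal"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = str(divisions)

    for lyric, note_num, length in notes:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        step, alter, octave = midi_to_pitch(note_num)
        ET.SubElement(pitch, "step").text = step
        if alter:
            ET.SubElement(pitch, "alter").text = str(alter)
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(length)
        lyric_el = ET.SubElement(note, "lyric")
        ET.SubElement(lyric_el, "text").text = lyric

    return ET.tostring(score, encoding="unicode")
```

For example, `notes_to_musicxml([("la", 60, 480)])` yields a document with one C4 quarter note carrying the lyric "la". A real serializer would still need to split notes into measures and carry over tempo before handing the file to the label conversion.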
It's an idea for something I'm thinking about working on, but I'm not sure about the interest.
My idea is to integrate a secondary engine into UTSU: NNSVS, an open-source engine for AI-based vocal synthesis (https://github.com/r9y9/nnsvs). It doesn't support the same type of voicebanks, since its voicebanks have to be AI-trained, but I feel that having one editor for multiple types of voicebanks would be great, even more so for higher-quality AI-trained voicebanks.
NNSVS currently doesn't have a compatible editor, so I felt this would also help and bring more people to both UTSU and NNSVS, helping the open-source community.