Input format of SherpaOnnxCreateOnlineStreamWithHotwords #1647

Open
EvanParallax opened this issue Dec 25, 2024 · 6 comments


@EvanParallax

Hi there. I'm trying to use SherpaOnnxCreateOnlineStreamWithHotwords from C# code like this:

		public OnlineStream CreateStreamWithHotwords(string hotwords)
		{
			return new OnlineStream(SherpaOnnxCreateOnlineStreamWithHotwords(_handle.Handle, hotwords));
		}

		[DllImport("sherpa-onnx-c-api")]
		private static extern IntPtr SherpaOnnxCreateOnlineStreamWithHotwords(IntPtr handle, string hotwords);

As the hotwords argument I tried passing both the contents of a hotwords file ("word0\nword1\n...") and the path to a hotwords file, but neither works: the hotwords are ignored. Could this be a marshaling error? What is the expected input format of the hotwords parameter?
Thank you.

@csukuangfj
Collaborator

Please see the doc comment of OnlineRecognizer::CreateStream():

/** Create a stream for decoding.
*
* @param The hotwords for this string, it might contain several hotwords,
* the hotwords are separated by "/". In each of the hotwords, there
* are cjkchars or bpes, the bpe/cjkchar are separated by space (" ").
* For example, hotwords I LOVE YOU and HELLO WORLD, looks like:
*
* "▁I ▁LOVE ▁YOU/▁HE LL O ▁WORLD"
*/
std::unique_ptr<OnlineStream> CreateStream(const std::string &hotwords) const;

@EvanParallax
Author

It's still not working. Can you check whether my model config is correct?

		private const string Provider = "cpu";
		private const string DecodingMethod = "modified_beam_search";
		private const int EnableEndpoint = 1;
		private const float Rule1MinTrailingSilence = 2.4f;
		private const float Rule2MinTrailingSilence = 0.8f;
		private const float Rule3MinUtteranceLength = 20f;
		private const string ModelingUnit = "bpe";
		private const float HotwordsScore = 4.0f;
		private const string ModelType = "zipformer2";

@csukuangfj
Collaborator

Sorry, the information you have given is too limited.

Could you post the complete config?

@csukuangfj
Collaborator

By the way, I suggest that you first reproduce the example in our documentation and then try to add your own hotwords.

@EvanParallax
Author

OK, I'll try. Thank you.

@EvanParallax
Author

Hello again. I tried to reproduce the example with these parameters:

First question: how can I tell whether hotwords are actually being used? It recognizes the phrases "I LOVE YOU" and "HELLO WORLD" with or without hotwords set. I don't know; maybe it's a matter of accent or pronunciation, since English is not my native language.

Second question: what am I doing wrong if I want to pass an abbreviation such as "▁K F C" when creating an online stream? The token for each separate character is in the tokens file.

Third question: are you planning to simplify the API of SherpaOnnxCreateOnlineStreamWithHotwords? I assume OnlineRecognizer applies the same transformation to the hotwords file, bringing it into the form "▁I ▁LOVE ▁YOU/▁HE LL O ▁WORLD" using the provided vocab or tokens file. Would it be possible to encapsulate this logic inside SherpaOnnxCreateOnlineStreamWithHotwords?

Thank you.


2 participants