This repository has been archived by the owner on Aug 28, 2023. It is now read-only.

Multilingual model with spanish #19

Open
AlejandroLanaspa opened this issue Feb 9, 2023 · 10 comments

Comments

@AlejandroLanaspa

I have been trying to follow https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/generate_tflite_from_whisper.ipynb
to generate a multilingual model that I can use in the Android app with Spanish detection.

However, I kept getting the error 'TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'', which I could solve by adding forced_decoder_ids to the model. This works in the notebook; however, when trying to use the model in the Android app, I constantly get the following error:

E/tflite: gather index out of bounds
E/tflite: Node number 35 (GATHER) failed to invoke.
E/tflite: Node number 694 (WHILE) failed to invoke.
E/ANR_LOG: >>> msg's executing time is too long
E/ANR_LOG: Blocked msg = { when=-2s941ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.view.View$PerformClick } , cost = 2832 ms
E/ANR_LOG: >>>Current msg List is:
E/ANR_LOG: Current msg <1> = { when=-2s940ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.view.View$UnsetPressedState }
E/ANR_LOG: Current msg <2> = { when=-2s830ms what=3 target=android.media.AudioRecord$NativeEventHandler }
E/ANR_LOG: Current msg <3> = { when=-2s727ms barrier=9 }
E/ANR_LOG: Current msg <4> = { when=-2s645ms what=3 target=android.view.GestureDetector$GestureHandler }
E/ANR_LOG: >>>CURRENT MSG DUMP OVER<<<
I/Quality: Blocked msg = Package name: com.whisper.android.tflitecpp [ schedGroup: 5 schedPolicy: 0 ] process the message: { when=-2s942ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.view.View$PerformClick } took 2833 ms
E/com.whisper.android.tflitecpp.MainActivity$WavAudioRecorder: Error occured in updateListener, recording is aborted
W/System.err: java.io.IOException: write failed: EBADF (Bad file descriptor)
W/System.err: at libcore.io.IoBridge.write(IoBridge.java:654)
W/System.err: at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:546)
W/System.err: at java.io.RandomAccessFile.write(RandomAccessFile.java:559)
W/System.err: at com.whisper.android.tflitecpp.MainActivity$WavAudioRecorder$1.onPeriodicNotification(MainActivity.java:250)
W/System.err: at android.media.AudioRecord$NativeEventHandler.handleMessage(AudioRecord.java:2216)
W/System.err: at android.os.Handler.dispatchMessage(Handler.java:106)
W/System.err: at android.os.Looper.loopOnce(Looper.java:233)
W/System.err: at android.os.Looper.loop(Looper.java:344)
W/System.err: at android.app.ActivityThread.main(ActivityThread.java:8205)
W/System.err: at java.lang.reflect.Method.invoke(Native Method)
W/System.err: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:589)
W/System.err: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1071)
W/System.err: Caused by: android.system.ErrnoException: write failed: EBADF (Bad file descriptor)
W/System.err: at libcore.io.Linux.writeBytes(Native Method)
W/System.err: at libcore.io.Linux.write(Linux.java:296)
W/System.err: at libcore.io.ForwardingOs.write(ForwardingOs.java:951)
W/System.err: at libcore.io.BlockGuardOs.write(BlockGuardOs.java:447)
W/System.err: at libcore.io.ForwardingOs.write(ForwardingOs.java:951)
W/System.err: at libcore.io.IoBridge.write(IoBridge.java:649)
W/System.err: ... 11 more
I/Choreographer: Skipped 163 frames! The application may be doing too much work on its main thread.

I generated the tflite model by changing the following code in the notebook:

class GenerateModel(tf.Module):
  def __init__(self, model):
    super(GenerateModel, self).__init__()
    self.model = model

  @tf.function(
    # shouldn't need static batch size, but throws exception without it (needs to be fixed)
    input_signature=[
      tf.TensorSpec((1, 80, 3000), tf.float32, name="input_features"), 
    ],
  )
  def serving(self, input_features):
    outputs = self.model.generate(
      input_features,
      max_new_tokens = 223,
      return_dict_in_generate=True,
      forced_decoder_ids = [(1, 50262), (2, 50359), (3, 50363)] # ids resulting from processor.get_decoder_prompt_ids(language="spanish", task="transcribe")
    )
    return {"sequences": outputs["sequences"]}
  

What am I doing wrong?
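For reference, the hard-coded forced_decoder_ids above can be reconstructed from the token IDs rather than typed by hand. A minimal sketch, assuming the standard multilingual Whisper vocabulary (these are the same (position, token_id) pairs that processor.get_decoder_prompt_ids(language="spanish", task="transcribe") returns in transformers; the constants and helper below are illustrative, not part of the notebook):

```python
# Hypothetical sketch of where the forced_decoder_ids values come from.
# IDs assume OpenAI's multilingual Whisper vocabulary.
SOT = 50258             # <|startoftranscript|>, forced at position 0 by the model
LANG_ES = 50262         # <|es|> language token
TASK_TRANSCRIBE = 50359 # <|transcribe|> task token
NO_TIMESTAMPS = 50363   # <|notimestamps|>

def spanish_prompt_ids():
    # Positions 1..3 of the decoder prompt; position 0 (SOT) is implicit.
    return [(1, LANG_ES), (2, TASK_TRANSCRIBE), (3, NO_TIMESTAMPS)]

print(spanish_prompt_ids())
```

Swapping LANG_ES for another language token (e.g. 50261 for German) retargets the same exported graph at a different language.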

@nyadla-sys
Contributor

nyadla-sys commented Feb 9, 2023

Below are the results with the above models, using the minimal example:
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 7 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][BEG] Für mich sind alle Menschen gleich unabhängig von Geschlecht, sexuelle Orientierung, Religion, Hautfarbe oder Geo-Kordinaten der Geburt.[SOT]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50358][BEG] For me, all people are equally independent of gender, sex, orientation, religion, hate, or gender coordinates of birth.[SOT]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 43 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][BEG] Für mich sind alle Menschen gleich, unabhängig von Geschlecht, sexueller Orientierung, Religion, Hautfarbe oder Geo-Koordinaten der Geburt.[SOT]
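The [_extra_token_*] markers in the outputs above are Whisper's special prompt tokens, which also explains why the base model produced English text: it emitted the translate token (50358) where the tiny and small models emitted transcribe (50359). A sketch of the mapping, assuming the standard multilingual vocabulary (the lookup table and helper are illustrative):

```python
# Sketch: decoding the leading special tokens printed by the minimal example.
# IDs assume OpenAI's multilingual Whisper vocabulary.
SPECIAL = {
    50258: "<|startoftranscript|>",
    50259: "<|en|>",
    50260: "<|zh|>",
    50261: "<|de|>",
    50262: "<|es|>",
    50358: "<|translate|>",
    50359: "<|transcribe|>",
}

def explain(tokens):
    return [SPECIAL.get(t, f"token_{t}") for t in tokens]

print(explain([50258, 50261, 50359]))  # tiny model's prefix on the German clip
```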

@nyadla-sys
Contributor

Please make sure to use https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.h instead of the English vocab binary.

@AlejandroLanaspa
Author

Thanks for the quick response.
When using https://github.com/usefulsensors/openai-whisper/blob/main/models/whisper-tiny.tflite and https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.bin in the Android app, it does not crash, but the transcription does not work properly for Spanish. That was the reason I tried to "enforce" it (and also to get rid of the printed [_extra_token_50258][_extra_token_50261][_extra_token_50359][BEG]).

Any ideas?

@nyadla-sys
Contributor

nyadla-sys commented Feb 9, 2023

Add something like the following in native_lib.cpp of the Android app:

    if ((output_int[i] != 50258) && (output_int[i] != 50261) && (output_int[i] != 50359))
        text += whisper_token_to_str(output_int[i]);

Also, please replace filters_vocab_gen.bin with https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.bin — the minimal example loads the vocab under that hard-coded name:

int main(int argc, char* argv[]) {
  if ((argc != 2) && (argc != 3)) {
    fprintf(stderr, "'minimal <tflite model>' or 'minimal <tflite model> <pcm_file name>'\n");
    return 1;
  }
  const char* filename = argv[1];
  whisper_filters filters;
  whisper_mel mel;
  struct timeval start_time,end_time;
  std::string word;
  int32_t n_vocab = 0;
  std::string fname = "./filters_vocab_gen.bin";
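The token filtering suggested above can be sketched in Python as well (the vocab dict here is a hypothetical stand-in for whisper_token_to_str and the real vocabulary; only the skip set matches the C++ snippet):

```python
# Sketch of the native_lib.cpp filtering: drop the special prompt tokens
# before concatenating the decoded text.
SKIP = {50258, 50261, 50359}  # <|startoftranscript|>, <|de|>, <|transcribe|>

def decode(output_ids, token_to_str):
    # Keep only ordinary text tokens, joining their string pieces.
    return "".join(token_to_str[t] for t in output_ids if t not in SKIP)

# Toy vocabulary standing in for the real filters_vocab_multilingual.bin table.
vocab = {50258: "[SOT]", 50261: "[DE]", 50359: "[TRANSCRIBE]", 100: "Hal", 101: "lo"}
print(decode([50258, 50261, 50359, 100, 101], vocab))  # -> "Hallo"
```

Note that the skip set is language-specific (50261 is the German token); for Spanish output you would skip 50262 instead.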

@nyadla-sys
Contributor

We tested with German and it is working. I will try other languages and let you know.

@AlejandroLanaspa
Author

Thanks for the trick

    if ((output_int[i] != 50258) && (output_int[i] != 50261) && (output_int[i] != 50359))
        text += whisper_token_to_str(output_int[i]);

As for filters_vocab_gen.bin, I was already replacing it with filters_vocab_multilingual.bin (renaming it to filters_vocab_gen.bin).

It seems it does not recognize me speaking in Spanish :/

@nyadla-sys
Contributor

@AlejandroLanaspa Could you please share a Spanish sample? I will test it and upload a new tflite model that supports Spanish.

@AlejandroLanaspa
Author

Here is a sample https://datasets-server.huggingface.co/assets/common_voice/--/es/train/99/audio/audio.mp3

Others accessible here https://huggingface.co/datasets/common_voice/viewer/es/train

Thank you very much, and also for the rest of your work, awesome materials!!!

@nyadla-sys
Contributor

I created two tflite models, one for the encoder and one for the decoder, and together they support multiple languages.
You may have to extend the Android app to run both tflite models to perform ASR.
https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb
