This repository has been archived by the owner on Aug 28, 2023. It is now read-only.

Multilingual model with spanish #19

Open
AlejandroLanaspa opened this issue Feb 9, 2023 · 10 comments

Comments

@AlejandroLanaspa

I have been trying to follow https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/generate_tflite_from_whisper.ipynb
to generate a multilingual model that I can use in the Android app with Spanish detection.

However, I kept getting the error 'TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'', which I could solve by adding forced_decoder_ids to the model. This works in the notebook; however, when trying to use the model in the Android app, I constantly get the following error:

E/tflite: gather index out of bounds
E/tflite: Node number 35 (GATHER) failed to invoke.
E/tflite: Node number 694 (WHILE) failed to invoke.
E/ANR_LOG: >>> msg's executing time is too long
E/ANR_LOG: Blocked msg = { when=-2s941ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.view.View$PerformClick } , cost = 2832 ms
E/ANR_LOG: >>>Current msg List is:
E/ANR_LOG: Current msg <1> = { when=-2s940ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.view.View$UnsetPressedState }
E/ANR_LOG: Current msg <2> = { when=-2s830ms what=3 target=android.media.AudioRecord$NativeEventHandler }
E/ANR_LOG: Current msg <3> = { when=-2s727ms barrier=9 }
E/ANR_LOG: Current msg <4> = { when=-2s645ms what=3 target=android.view.GestureDetector$GestureHandler }
E/ANR_LOG: >>>CURRENT MSG DUMP OVER<<<
I/Quality: Blocked msg = Package name: com.whisper.android.tflitecpp [ schedGroup: 5 schedPolicy: 0 ] process the message: { when=-2s942ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.view.View$PerformClick } took 2833 ms
E/com.whisper.android.tflitecpp.MainActivity$WavAudioRecorder: Error occured in updateListener, recording is aborted
W/System.err: java.io.IOException: write failed: EBADF (Bad file descriptor)
W/System.err: at libcore.io.IoBridge.write(IoBridge.java:654)
W/System.err: at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:546)
W/System.err: at java.io.RandomAccessFile.write(RandomAccessFile.java:559)
W/System.err: at com.whisper.android.tflitecpp.MainActivity$WavAudioRecorder$1.onPeriodicNotification(MainActivity.java:250)
W/System.err: at android.media.AudioRecord$NativeEventHandler.handleMessage(AudioRecord.java:2216)
W/System.err: at android.os.Handler.dispatchMessage(Handler.java:106)
W/System.err: at android.os.Looper.loopOnce(Looper.java:233)
W/System.err: at android.os.Looper.loop(Looper.java:344)
W/System.err: at android.app.ActivityThread.main(ActivityThread.java:8205)
W/System.err: at java.lang.reflect.Method.invoke(Native Method)
W/System.err: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:589)
W/System.err: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1071)
W/System.err: Caused by: android.system.ErrnoException: write failed: EBADF (Bad file descriptor)
W/System.err: at libcore.io.Linux.writeBytes(Native Method)
W/System.err: at libcore.io.Linux.write(Linux.java:296)
W/System.err: at libcore.io.ForwardingOs.write(ForwardingOs.java:951)
W/System.err: at libcore.io.BlockGuardOs.write(BlockGuardOs.java:447)
W/System.err: at libcore.io.ForwardingOs.write(ForwardingOs.java:951)
W/System.err: at libcore.io.IoBridge.write(IoBridge.java:649)
W/System.err: ... 11 more
I/Choreographer: Skipped 163 frames! The application may be doing too much work on its main thread.

I generated the tflite model by changing the following code in the notebook:

class GenerateModel(tf.Module):
  def __init__(self, model):
    super(GenerateModel, self).__init__()
    self.model = model

  @tf.function(
    # shouldn't need static batch size, but throws exception without it (needs to be fixed)
    input_signature=[
      tf.TensorSpec((1, 80, 3000), tf.float32, name="input_features"), 
    ],
  )
  def serving(self, input_features):
    outputs = self.model.generate(
      input_features,
      max_new_tokens = 223,
      return_dict_in_generate=True,
      forced_decoder_ids = [(1, 50262), (2, 50359), (3, 50363)] # ids resulting from processor.get_decoder_prompt_ids(language="spanish", task="transcribe")
    )
    return {"sequences": outputs["sequences"]}
  

What am I doing wrong?
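For reference, the hard-coded forced_decoder_ids above can be reconstructed from the token IDs rather than typed by hand. A minimal sketch, assuming the standard multilingual Whisper vocabulary (these are the same (position, token_id) pairs that processor.get_decoder_prompt_ids(language="spanish", task="transcribe") returns in transformers; the constants and helper below are illustrative, not part of the notebook):

```python
# Hypothetical sketch of where the forced_decoder_ids values come from.
# IDs assume OpenAI's multilingual Whisper vocabulary.
SOT = 50258             # <|startoftranscript|>, forced at position 0 by the model
LANG_ES = 50262         # <|es|> language token
TASK_TRANSCRIBE = 50359 # <|transcribe|> task token
NO_TIMESTAMPS = 50363   # <|notimestamps|>

def spanish_prompt_ids():
    # Positions 1..3 of the decoder prompt; position 0 (SOT) is implicit.
    return [(1, LANG_ES), (2, TASK_TRANSCRIBE), (3, NO_TIMESTAMPS)]

print(spanish_prompt_ids())
```

Swapping LANG_ES for another language token (e.g. 50261 for German) retargets the same exported graph at a different language.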

@nyadla-sys
Contributor

nyadla-sys commented Feb 9, 2023

Below are the results with the above models, using the minimal example:
mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-tiny.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 7 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][BEG] Für mich sind alle Menschen gleich unabhängig von Geschlecht, sexuelle Orientierung, Religion, Hautfarbe oder Geo-Kordinaten der Geburt.[SOT]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-base.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 12 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50358][BEG] For me, all people are equally independent of gender, sex, orientation, religion, hate, or gender coordinates of birth.[SOT]

mycroft@OpenVoiceOS-e3830c:~/whisper $ minimal models/whisper-small.tflite de_speech_thorsten_sample03_8s.wav

n_vocab:50257

mel.n_len3000

mel.n_mel:80
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference time 43 seconds

[_extra_token_50258][_extra_token_50261][_extra_token_50359][BEG] Für mich sind alle Menschen gleich, unabhängig von Geschlecht, sexueller Orientierung, Religion, Hautfarbe oder Geo-Koordinaten der Geburt.[SOT]
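The [_extra_token_*] markers in the outputs above are Whisper's special prompt tokens, which also explains why the base model produced English text: it emitted the translate token (50358) where the tiny and small models emitted transcribe (50359). A sketch of the mapping, assuming the standard multilingual vocabulary (the lookup table and helper are illustrative):

```python
# Sketch: decoding the leading special tokens printed by the minimal example.
# IDs assume OpenAI's multilingual Whisper vocabulary.
SPECIAL = {
    50258: "<|startoftranscript|>",
    50259: "<|en|>",
    50260: "<|zh|>",
    50261: "<|de|>",
    50262: "<|es|>",
    50358: "<|translate|>",
    50359: "<|transcribe|>",
}

def explain(tokens):
    return [SPECIAL.get(t, f"token_{t}") for t in tokens]

print(explain([50258, 50261, 50359]))  # tiny model's prefix on the German clip
```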

@nyadla-sys
Contributor

Please make sure to use https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.h instead of the English vocab binary.

@AlejandroLanaspa
Author

Thanks for the quick response.
When using https://github.com/usefulsensors/openai-whisper/blob/main/models/whisper-tiny.tflite and https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.bin in the Android app, it does not crash, but the transcription does not work properly for Spanish. That was the reason I tried to "enforce" it (and also to get rid of the printed [_extra_token_50258][_extra_token_50261][_extra_token_50359][BEG]).

Any ideas?

@nyadla-sys
Contributor

nyadla-sys commented Feb 9, 2023

Add something like the following in native_lib.cpp of the Android app:

    if ((output_int[i] != 50258) && (output_int[i] != 50261) && (output_int[i] != 50359))
        text += whisper_token_to_str(output_int[i]);

Also, please replace filters_vocab_gen.bin with https://github.com/usefulsensors/openai-whisper/blob/main/models/filters_vocab_multilingual.bin — the minimal example loads the vocab under that hard-coded name:

int main(int argc, char* argv[]) {
  if ((argc != 2) && (argc != 3)) {
    fprintf(stderr, "'minimal <tflite model>' or 'minimal <tflite model> <pcm_file name>'\n");
    return 1;
  }
  const char* filename = argv[1];
  whisper_filters filters;
  whisper_mel mel;
  struct timeval start_time,end_time;
  std::string word;
  int32_t n_vocab = 0;
  std::string fname = "./filters_vocab_gen.bin";
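The token filtering suggested above can be sketched in Python as well (the vocab dict here is a hypothetical stand-in for whisper_token_to_str and the real vocabulary; only the skip set matches the C++ snippet):

```python
# Sketch of the native_lib.cpp filtering: drop the special prompt tokens
# before concatenating the decoded text.
SKIP = {50258, 50261, 50359}  # <|startoftranscript|>, <|de|>, <|transcribe|>

def decode(output_ids, token_to_str):
    # Keep only ordinary text tokens, joining their string pieces.
    return "".join(token_to_str[t] for t in output_ids if t not in SKIP)

# Toy vocabulary standing in for the real filters_vocab_multilingual.bin table.
vocab = {50258: "[SOT]", 50261: "[DE]", 50359: "[TRANSCRIBE]", 100: "Hal", 101: "lo"}
print(decode([50258, 50261, 50359, 100, 101], vocab))  # -> "Hallo"
```

Note that the skip set is language-specific (50261 is the German token); for Spanish output you would skip 50262 instead.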

@nyadla-sys
Contributor

We tested with German and it is working. I will try other languages and let you know.

@AlejandroLanaspa
Author

Thanks for the trick

    if ((output_int[i] != 50258) && (output_int[i] != 50261) && (output_int[i] != 50359))
        text += whisper_token_to_str(output_int[i]);

As for filters_vocab_gen.bin, I was already replacing it with filters_vocab_multilingual.bin (renaming it to filters_vocab_gen.bin).

It seems it does not recognize me speaking in Spanish :/

@nyadla-sys
Contributor

@AlejandroLanaspa Could you please share a Spanish sample? I will test it and upload a new tflite model that supports Spanish.

@AlejandroLanaspa
Author

Here is a sample https://datasets-server.huggingface.co/assets/common_voice/--/es/train/99/audio/audio.mp3

Others accessible here https://huggingface.co/datasets/common_voice/viewer/es/train

Thank you very much, and also for the rest of your work, awesome materials!!!

@nyadla-sys
Contributor

I created two tflite models, one for the encoder and one for the decoder, and together they support multiple languages.
You may have to extend the Android app to run both tflite models to perform ASR.
https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb
