how to convert a custom whisper in openai format or HF whisper model to TensorRT based backend ? #58

StephennFernandes · 2024-04-08T12:49:56Z

Hey @shashikg great repo and cheers to the insane efforts in building this repo.

I have a finetuned whisper model (both in original openai and HF formats ) which I want to use in TensorRT backend using WhisperS2T. While I figured out how to load official whisper models, i was wondering how could i convert whisper models to TensorRT and load them using WhisperS2T

StephennFernandes · 2024-04-08T18:48:23Z

for some reference i did refer the TensorRT-LLM repo's whisper example but upon loading the model from the path like this:

model = whisper_s2t.load_model(model_identifier="/app/TRT_whisper/whisper_large_v3", backend='TensorRT-LLM')

i get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/__init__.py", line 44, in load_model
    return WhisperModel(model_identifier, **model_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 81, in __init__
    trt_build_args = load_trt_build_config(self.model_path)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/engine_builder/__init__.py", line 77, in load_trt_build_config
    with open(f'{output_dir}/trt_build_args.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/app/TRT_whisper/whisper_large_v3/trt_build_args.json'

aleksandr-smechov · 2024-04-09T23:12:36Z

The WhisperS2T code generates a trt_build_args JSON here, so you'll need to generate that if you're using the official example. Also note that while you can use TensorRT-LLM's example, you need to change the tensorrt_llm version in the requirements.txt file here to whatever WhisperS2T's using.

StephennFernandes · 2024-04-09T23:29:53Z

@aleksandr-smechov hey thanks for replying, could you please let me know how do i generate the trt_build_args JSON. i had a finetuned whisper model from HF converted back to openai format. which then i use the tensorRT_llm converters built script.

python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin  --use_bert_attention_plugin --enable_context_fmha

which just gives me the whisper model in TensorRT_LLM format.

is there a way in WhisperS2T code that i can generate the trt_build_args Json file.

aleksandr-smechov · 2024-04-09T23:57:07Z

I completely removed that requirement from WhisperS2T code personally, but you can "fake" it by running WhisperS2T normally, finding the cached directory where these files are stored, and adjusting the JSON to your needs. Also remember to rename the encoder and decoder engines from the official example to encoder.engine and decoder.engine.

StephennFernandes · 2024-04-10T00:02:33Z

@aleksandr-smechov , thanks a ton for your help. i would try this out you said.
just one clarification, when you say, "I completely removed that requirement from WhisperS2T code personally" do you mean you have a personal repo/fork of WhisperS2T that doesn't have these requirements. if yes, could you please link it.

aleksandr-smechov · 2024-04-10T00:18:29Z

Sure, you can compare the WhisperModelTRT implementation in WhisperS2T here to the implementation here.

StephennFernandes · 2024-04-10T00:56:56Z

I completely removed that requirement from WhisperS2T code personally, but you can "fake" it by running WhisperS2T normally, finding the cached directory where these files are stored, and adjusting the JSON to your needs. Also remember to rename the encoder and decoder engines from the official example to encoder.engine and decoder.engine.

hey i did as you said, i used the trt_model_args.json file and placed it into my TRT_model dir, as well as replaced the .engine files to encoder.engine and decoder.engine.

the problem is i get the following error:

TypeError: ModelConfig.__init__() missing 2 required positional arguments: 'max_batch_size' and 'max_beam_width'

despite the max_batch_size and max_beam_width existing in the trt_model_args.json even with me explicitly setting this args the issue still persists.

the following is the code of explicitly setting the args:

model = whisper_s2t.load_model(model_identifier="/app/TRT_whisper/whisper_large_v3", backend='TensorRT-LLM', max_batch_size=24,max_beam_width=1)

the following is the trt_model_args.json file:

{"max_batch_size": 24, "max_beam_width": 1, "max_input_len": 4, "max_output_len": 448, "world_size": 1, "dtype": "float16", "quantize_dir": "quantize/1-gpu", "use_gpt_attention_plugin": "float16", "use_bert_attention_plugin": null, "use_context_fmha_enc": false, "use_context_fmha_dec": false, "use_gemm_plugin": "float16", "use_layernorm_plugin": false, "remove_input_padding": false, "use_weight_only_enc": false, "use_weight_only_dec": false, "weight_only_precision": "int8", "int8_kv_cache": false, "debug_mode": false, "cuda_compute_capability": [8, 6], "output_dir": "/root/.cache/whisper_s2t/models/trt/large-v3/c55664fdf5b447062c4cd7a0b64b72fc", "model_path": "/root/.cache/whisper_s2t/models/trt/large-v3/pt_ckpt.pt"}

StephennFernandes · 2024-04-12T05:30:54Z

I completely removed that requirement from WhisperS2T code personally, but you can "fake" it by running WhisperS2T normally, finding the cached directory where these files are stored, and adjusting the JSON to your needs. Also remember to rename the encoder and decoder engines from the official example to encoder.engine and decoder.engine.

hey i did as you said, i used the trt_model_args.json file and placed it into my TRT_model dir, as well as replaced the .engine files to encoder.engine and decoder.engine.

the problem is i get the following error:
TypeError: ModelConfig.__init__() missing 2 required positional arguments: 'max_batch_size' and 'max_beam_width'
despite the max_batch_size and max_beam_width existing in the trt_model_args.json even with me explicitly setting this args the issue still persists.

the following is the code of explicitly setting the args:
model = whisper_s2t.load_model(model_identifier="/app/TRT_whisper/whisper_large_v3", backend='TensorRT-LLM', max_batch_size=24,max_beam_width=1)
the following is the trt_model_args.json file:
{"max_batch_size": 24, "max_beam_width": 1, "max_input_len": 4, "max_output_len": 448, "world_size": 1, "dtype": "float16", "quantize_dir": "quantize/1-gpu", "use_gpt_attention_plugin": "float16", "use_bert_attention_plugin": null, "use_context_fmha_enc": false, "use_context_fmha_dec": false, "use_gemm_plugin": "float16", "use_layernorm_plugin": false, "remove_input_padding": false, "use_weight_only_enc": false, "use_weight_only_dec": false, "weight_only_precision": "int8", "int8_kv_cache": false, "debug_mode": false, "cuda_compute_capability": [8, 6], "output_dir": "/root/.cache/whisper_s2t/models/trt/large-v3/c55664fdf5b447062c4cd7a0b64b72fc", "model_path": "/root/.cache/whisper_s2t/models/trt/large-v3/pt_ckpt.pt"}

@aleksandr-smechov what seems possibly wrong that the following error is triggered ? despite me explicitly even adding the args and the args being present in the json file.

aleksandr-smechov · 2024-04-12T19:21:35Z

@StephennFernandes I believe I encountered the same issue before and overcame it by adding these args here.

StephennFernandes · 2024-04-13T19:53:10Z

@aleksandr-smechov thanks for the heads up,it really means a lot, i was able to fix this issue. by refactoring in 2 places.
(let me know in case i could submit a PR for this, on how to port a custom OAI model )

but the model is stuck and hang up with a new issue.

1. editing the decoder_model_config in the model.py and explicitly adding the 2 args: max_batch_size and max_beam_width

            num_heads=self.decoder_config['num_heads'],
            num_kv_heads=self.decoder_config['num_heads'],
            hidden_size=self.decoder_config['hidden_size'],
            vocab_size=self.decoder_config['vocab_size'],
            num_layers=self.decoder_config['num_layers'],
            gpt_attention_plugin=self.decoder_config['gpt_attention_plugin'],
            remove_input_padding=self.decoder_config['remove_input_padding'],
            cross_attention=self.decoder_config['cross_attention'],
            has_position_embedding=self.decoder_config['has_position_embedding'],
            has_token_type_embedding=self.decoder_config['has_token_type_embedding'],
            max_batch_size=self.decoder_config["max_batch_size"],
            max_beam_width=self.decoder_config["max_beam_width"],
        )

2. i had to pull the tokenizer.json file from HF transformers into the dir where my tensorrt_llm model file was saved.

post editing all this now the model is stuck / hangup and following are the terminal logs.

Transcribing:   0%|                                                                                                              | 0/100 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/app/TRT_whisper/inference.py", line 8, in <module>
    out = model.transcribe_with_vad(files,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/__init__.py", line 171, in transcribe_with_vad
    res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 235, in generate_segment_batched
    result = self.model.generate(features,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 185, in generate
    output_ids = self.decoder.generate(decoder_input_ids,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 146, in generate
    output_ids = self.decoder_generation_session.decode(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 789, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2993, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2642, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2334, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!
double free or corruption (out)
[user-DSA7TGX-424R:46643] *** Process received signal ***
[user-DSA7TGX-424R:46643] Signal: Aborted (6)
[user-DSA7TGX-424R:46643] Signal code:  (-6)
[user-DSA7TGX-424R:46643] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x70d5e560d520]
[user-DSA7TGX-424R:46643] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x70d5e56619fc]
[user-DSA7TGX-424R:46643] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x70d5e560d476]
[user-DSA7TGX-424R:46643] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x70d5e55f37f3]
[user-DSA7TGX-424R:46643] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x89676)[0x70d5e5654676]
[user-DSA7TGX-424R:46643] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0xa0cfc)[0x70d5e566bcfc]
[user-DSA7TGX-424R:46643] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0xa2e70)[0x70d5e566de70]
[user-DSA7TGX-424R:46643] [ 7] /lib/x86_64-linux-gnu/libc.so.6(free+0x73)[0x70d5e5670453]
[user-DSA7TGX-424R:46643] [ 8] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x1a43d22)[0x70d5ccf96d22]
[user-DSA7TGX-424R:46643] [ 9] /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch_cpu.so(+0x1a4dd54)[0x70d5ccfa0d54]
[user-DSA7TGX-424R:46643] [10] /lib/x86_64-linux-gnu/libc.so.6(+0x45495)[0x70d5e5610495]
[user-DSA7TGX-424R:46643] [11] /lib/x86_64-linux-gnu/libc.so.6(on_exit+0x0)[0x70d5e5610610]
[user-DSA7TGX-424R:46643] [12] /lib/x86_64-linux-gnu/libc.so.6(+0x29d97)[0x70d5e55f4d97]
[user-DSA7TGX-424R:46643] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x70d5e55f4e40]
[user-DSA7TGX-424R:46643] [14] python(_start+0x25)[0x559e80281f25]
[user-DSA7TGX-424R:46643] *** End of error message ***

for some additional context i am running all of this on NVIDIA A6000, and i am using the TensorRT version 9.2.0.5.

StephennFernandes · 2024-04-14T23:45:45Z

@aleksandr-smechov @shashikg could this be a version mismatch?

as i have built the whisper model to TensorRT using the TensorRT version 9.2.0.5 and whisperS2T expects its own TRT version.

i tried building my whisper model on the WhisperS2T official docker image, but i get the following error when builing whisper to TRT format.

python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin  --use_bert_attentio
n_plugin --enable_context_fmha
[TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301Traceback (most recent call last):
 File "/app/TensorRT-LLM/examples/whisper/build.py", line 27, in <module>
   from tensorrt_llm.models.modeling_utils import QuantConfig
ImportError: cannot import name 'QuantConfig' from 'tensorrt_llm.models.modeling_utils' (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py)

aleksandr-smechov · 2024-04-14T23:49:11Z

@StephennFernandes that's correct, you'd need to build the TRT model using the same version of TensorRT-LLM as WhisperS2T uses.

StephennFernandes · 2024-04-14T23:52:14Z

@aleksandr-smechov

I tried, but unable to build.

i am facing the following error.

python3 build.py --output_dir whisper_large_v3 --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attentio n_plugin --enable_context_fmha [TensorRT-LLM] TensorRT-LLM version: 0.8.0.dev2024012301Traceback (most recent call last): File "/app/TensorRT-LLM/examples/whisper/build.py", line 27, in <module> from tensorrt_llm.models.modeling_utils import QuantConfig ImportError: cannot import name 'QuantConfig' from 'tensorrt_llm.models.modeling_utils' (/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py)

does WhisperS2T have a different build structure/format ?

I mean to say a totally different build script ? The official build script for whisper, from tensorRT_LLM's repo doesn't work. the following error is from that build script.

StephennFernandes · 2024-04-15T02:42:41Z

@aleksandr-smechov
UPDATE:
i found the following conversion script inside the engine_builder's init function link
python3 -m whisper_s2t.backends.tensorrt.engine_builder.builder --output_dir=./model_export_path --log_level=error

so i made a dir, placed the .pt model file, hf tokenizer and the trt_build_args.json file into the dir (edited the trt_build_args.json files, output_dir and model_path paths to the current output dir) and launched the script as above.

but now the inference code still crashes with a new error.

  File "/workspace/whispers2t_inference.py", line 10, in <module>
    out = model.transcribe_with_vad(files,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/__init__.py", line 171, in transcribe_with_vad
    res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 235, in generate_segment_batched
    result = self.model.generate(features,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 185, in generate
    output_ids = self.decoder.generate(decoder_input_ids,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 146, in generate
    output_ids = self.decoder_generation_session.decode(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 755, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2891, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2548, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2231, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

now i have even built the model on the same official whisperS2T docker image. so it doesn't seem like a TRT versioning issue.

the following is the entire stack track of the error:

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 70.100 / 56. 70.100
libavcodec     58.134.100 / 58.134.100
libavformat    58. 76.100 / 58. 76.100
libavdevice    58. 13.100 / 58. 13.100
libavfilter     7.110.100 /  7.110.100
libswscale      5.  9.100 /  5.  9.100
libswresample   3.  9.100 /  3.  9.100
libpostproc    55.  9.100 / 55.  9.100
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Transcribing:   0%|                                                                                                                                | 0/100 [00:00<?, ?it/s][04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::setInputShape::2278] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2278, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:09] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
[04/15/2024-04:09:10] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2991] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2991, condition: allInputDimensionsSpecified(routine) )
Transcribing:   0%|                                                                                                                                | 0/100 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/whispers2t_inference.py", line 12, in <module>
    out = model.transcribe_with_vad(files,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/__init__.py", line 171, in transcribe_with_vad
    res = self.generate_segment_batched(mels.to(self.device), prompts, seq_len, seg_metadata)
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/model.py", line 235, in generate_segment_batched
    result = self.model.generate(features,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 185, in generate
    output_ids = self.decoder.generate(decoder_input_ids,
  File "/usr/local/lib/python3.10/dist-packages/whisper_s2t/backends/tensorrt/trt_model.py", line 146, in generate
    output_ids = self.decoder_generation_session.decode(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 755, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2891, in decode
    return self.decode_regular(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2548, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, logits, encoder_input_lengths = self.handle_per_step(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/runtime/generation.py", line 2231, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

i tried to move the TRT model files to the cache dir where "whisper-v3" is saved internally. upon replacing and running the code as if i would be running the regular model, the inference works. but adding the path doesn't

nicolas-docto · 2024-04-15T13:16:46Z

Hi, also very interested on how to integrate a custom finetuned Whisper to whisper_s2t in TensorRT-LLM. Thanks a lot.
I have been updating a forked of this amazing repo, but still struggling to integrate a fine-tuned whisper (in HF format)

StephennFernandes · 2024-04-17T17:46:27Z

@aleksandr-smechov @shashikg

i tried to move the TRT model files to the cache dir where "whisper-v3" is saved internally.

upon replacing and running the code as if i would be running the regular model, the inference works. but adding the path doesn't

i cannot clearly get it, what could be the issue here ... seems like the issue only gets triggered when the model is called from a path

aleksandr-smechov · 2024-04-17T18:38:33Z

Hi @StephennFernandes awesome to hear that it's working for you. As you mentioned, it might be a path issue. I did some major refactoring for my library so it didn't come up as an issue.

eschmidbauer · 2024-09-24T19:12:06Z

is it possible to update the TensorRT version to support newer models?
example https://huggingface.co/yuekai/whisper_large_v3_trtllm_triton

eschmidbauer · 2024-10-01T13:38:44Z

running into this same issue trying to convert whisper-v3-turbo

StephennFernandes · 2024-10-12T20:17:57Z

@eschmidbauer any luck trying to convert v3-turbo ?

eschmidbauer · 2024-10-18T12:52:04Z

no. the TensorRT-LLM support in WhisperS2T is a bit out of date. Last i checked the latest version was 0.14
This is pretty useful example of using latest TensorRT-LLM

StephennFernandes changed the title ~~how to convert the official whisper or HF whisper model to TensorRT based backend ?~~ how to convert a custom whisper in openai format or HF whisper model to TensorRT based backend ? Apr 8, 2024

shashikg self-assigned this Apr 16, 2024

shashikg added the enhancement New feature or request label Apr 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

how to convert a custom whisper in openai format or HF whisper model to TensorRT based backend ? #58

how to convert a custom whisper in openai format or HF whisper model to TensorRT based backend ? #58

StephennFernandes commented Apr 8, 2024

StephennFernandes commented Apr 8, 2024

aleksandr-smechov commented Apr 9, 2024 •

edited

Loading

StephennFernandes commented Apr 9, 2024

aleksandr-smechov commented Apr 9, 2024 •

edited

Loading

StephennFernandes commented Apr 10, 2024

aleksandr-smechov commented Apr 10, 2024

StephennFernandes commented Apr 10, 2024

StephennFernandes commented Apr 12, 2024

aleksandr-smechov commented Apr 12, 2024 •

edited

Loading

StephennFernandes commented Apr 13, 2024

StephennFernandes commented Apr 14, 2024

aleksandr-smechov commented Apr 14, 2024

StephennFernandes commented Apr 14, 2024 •

edited

Loading

StephennFernandes commented Apr 15, 2024 •

edited

Loading

nicolas-docto commented Apr 15, 2024

StephennFernandes commented Apr 17, 2024

aleksandr-smechov commented Apr 17, 2024

eschmidbauer commented Sep 24, 2024

eschmidbauer commented Oct 1, 2024

StephennFernandes commented Oct 12, 2024

eschmidbauer commented Oct 18, 2024

how to convert a custom whisper in openai format or HF whisper model to TensorRT based backend ? #58

how to convert a custom whisper in openai format or HF whisper model to TensorRT based backend ? #58

Comments

StephennFernandes commented Apr 8, 2024

StephennFernandes commented Apr 8, 2024

aleksandr-smechov commented Apr 9, 2024 • edited Loading

StephennFernandes commented Apr 9, 2024

aleksandr-smechov commented Apr 9, 2024 • edited Loading

StephennFernandes commented Apr 10, 2024

aleksandr-smechov commented Apr 10, 2024

StephennFernandes commented Apr 10, 2024

StephennFernandes commented Apr 12, 2024

aleksandr-smechov commented Apr 12, 2024 • edited Loading

StephennFernandes commented Apr 13, 2024

StephennFernandes commented Apr 14, 2024

aleksandr-smechov commented Apr 14, 2024

StephennFernandes commented Apr 14, 2024 • edited Loading

StephennFernandes commented Apr 15, 2024 • edited Loading

nicolas-docto commented Apr 15, 2024

StephennFernandes commented Apr 17, 2024

aleksandr-smechov commented Apr 17, 2024

eschmidbauer commented Sep 24, 2024

eschmidbauer commented Oct 1, 2024

StephennFernandes commented Oct 12, 2024

eschmidbauer commented Oct 18, 2024

aleksandr-smechov commented Apr 9, 2024 •

edited

Loading

aleksandr-smechov commented Apr 9, 2024 •

edited

Loading

aleksandr-smechov commented Apr 12, 2024 •

edited

Loading

StephennFernandes commented Apr 14, 2024 •

edited

Loading

StephennFernandes commented Apr 15, 2024 •

edited

Loading