
DJL-TRTLLM: Error while detokenizing output response of teknium/OpenHermes-2.5-Mistral-7B on Sagemaker #1792

Open
omarelshehy opened this issue Apr 20, 2024 · 1 comment

@omarelshehy

Description

I followed the recipe given here to manually convert teknium/OpenHermes-2.5-Mistral-7B to a TensorRT-LLM engine on a SageMaker ml.g5.4xlarge instance, saved the compiled model to S3, and deployed it to a SageMaker endpoint on an ml.g5.2xlarge (the two instance types differ only in CPU and RAM). When I invoke the endpoint simply using:

import boto3
import json

runtime = boto3.client("sagemaker-runtime")

endpoint_name = "djl-trtllm-endpoint"
content_type = "application/json"
# Minimal payload: a short prompt with default generation parameters.
payload = json.dumps({"inputs": "hey", "parameters": {}})

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload)

I receive the following error log:

Error Message

[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:Rolling batch inference error
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:Traceback (most recent call last):
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.26.0/djl_python/rolling_batch/rolling_batch.py", line 189, in try_catch_handling
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    return func(self, input_data, parameters)
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.26.0/djl_python/rolling_batch/trtllm_rolling_batch.py", line 80, in inference
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    generation = trt_resp.fetch()
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/detoknized_triton_repsonse.py", line 69, in fetch
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    self.decode_token(), len(self.all_input_ids), complete)
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/detoknized_triton_repsonse.py", line 45, in decode_token
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    new_text = self.tokenizer.decode(
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3750, in decode
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    return self._decode(
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 625, in _decode
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:TypeError: argument 'ids': 'list' object cannot be interpreted as an integer

I assume the error comes from passing a list of lists to the _tokenizer.decode function instead of a flat list of input IDs. Can someone help me understand why this happens?
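
For reference, the same TypeError can be reproduced with a plain Hugging Face fast tokenizer outside of DJL. This is a minimal sketch (it only assumes transformers is installed); it shows that decode expects a flat sequence of ints, while batch_decode is the variant that accepts a list of sequences:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

flat_ids = tokenizer("hey")["input_ids"]  # flat list of ints
print(tokenizer.decode(flat_ids))         # decodes fine

nested_ids = [flat_ids]                   # list of lists, like an un-unwrapped batch
try:
    tokenizer.decode(nested_ids)          # raises the same error as the traceback above
except TypeError as err:
    print(err)  # argument 'ids': 'list' object cannot be interpreted as an integer

print(tokenizer.batch_decode(nested_ids))  # the call that handles nested lists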

@omarelshehy added the bug label on Apr 20, 2024
@lanking520 (Contributor)

Could you share which DJLServing or LMI version you are using?
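
For what it's worth, the traceback above shows /tmp/.djl.ai/python/0.26.0/, which points at DJLServing 0.26.0. One way to confirm the exact LMI container version is to read the image tag from the deployed SageMaker model; a sketch, where the model name is a placeholder:

import boto3

sm = boto3.client("sagemaker")

# The DJLServing/LMI version is encoded in the ECR image tag, e.g.
# ...dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.26.0-tensorrtllm...
model = sm.describe_model(ModelName="djl-trtllm-model")  # placeholder model name
print(model["PrimaryContainer"]["Image"])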
