
DJL-TRTLLM: Error while detokenizing output response of teknium/OpenHermes-2.5-Mistral-7B on Sagemaker #1792

Open
omarelshehy opened this issue Apr 20, 2024 · 1 comment

@omarelshehy

Description

I followed the recipe given here to manually convert teknium/OpenHermes-2.5-Mistral-7B to a TensorRT-LLM engine on a SageMaker ml.g5.4xlarge instance, saved the compiled model to S3, and deployed it to a SageMaker endpoint on an ml.g5.2xlarge (the two instance types differ only in CPU and RAM). When I invoke the endpoint simply using:

import boto3
import json

runtime = boto3.client("sagemaker-runtime")

endpoint_name = "djl-trtllm-endpoint"
content_type = "application/json"
# Minimal payload: a short prompt with default generation parameters.
payload = json.dumps({"inputs": "hey", "parameters": {}})

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload)

I receive the following error log:

Error Message

[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:Rolling batch inference error
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:Traceback (most recent call last):
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.26.0/djl_python/rolling_batch/rolling_batch.py", line 189, in try_catch_handling
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    return func(self, input_data, parameters)
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.26.0/djl_python/rolling_batch/trtllm_rolling_batch.py", line 80, in inference
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    generation = trt_resp.fetch()
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/detoknized_triton_repsonse.py", line 69, in fetch
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    self.decode_token(), len(self.all_input_ids), complete)
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/detoknized_triton_repsonse.py", line 45, in decode_token
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    new_text = self.tokenizer.decode(
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3750, in decode
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    return self._decode(
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 625, in _decode
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:    text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
[INFO ] PyProcess - W-350-model-stdout: [1,0]<stdout>:TypeError: argument 'ids': 'list' object cannot be interpreted as an integer

I assume the error comes from passing a list of lists to the _tokenizer.decode function instead of a flat list of input IDs. Can someone help me understand why this happens?
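
For reference, the same TypeError can be reproduced with a plain Hugging Face fast tokenizer outside of DJL. This is a minimal sketch (it only assumes transformers is installed); it shows that decode expects a flat sequence of ints, while batch_decode is the variant that accepts a list of sequences:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

flat_ids = tokenizer("hey")["input_ids"]  # flat list of ints
print(tokenizer.decode(flat_ids))         # decodes fine

nested_ids = [flat_ids]                   # list of lists, like an un-unwrapped batch
try:
    tokenizer.decode(nested_ids)          # raises the same error as the traceback above
except TypeError as err:
    print(err)  # argument 'ids': 'list' object cannot be interpreted as an integer

print(tokenizer.batch_decode(nested_ids))  # the call that handles nested lists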

@omarelshehy added the bug label on Apr 20, 2024
@lanking520 (Contributor)

Could you share which DJLServing or LMI version you are using?
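
For what it's worth, the traceback above shows /tmp/.djl.ai/python/0.26.0/, which points at DJLServing 0.26.0. One way to confirm the exact LMI container version is to read the image tag from the deployed SageMaker model; a sketch, where the model name is a placeholder:

import boto3

sm = boto3.client("sagemaker")

# The DJLServing/LMI version is encoded in the ECR image tag, e.g.
# ...dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.26.0-tensorrtllm...
model = sm.describe_model(ModelName="djl-trtllm-model")  # placeholder model name
print(model["PrimaryContainer"]["Image"])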
