Add setup instructions for TensorRT-LLM #789

Closed
wants to merge 8 commits into from


linden-li

No description provided.

Collaborator

@dakinggg left a comment

LGTM, will leave approval for Daya or Megha

scripts/inference/README.md (outdated)
1. Convert an MPT HuggingFace checkpoint into the FasterTransformer format.
2. Build a TensorRT engine with the FasterTransformer weights.

Using this engine, you can utilize TensorRT-LLM for fast inference. If you would like to use TensorRT-LLM as an end-to-end solution for an inference service, you can utilize the built engine with an NVIDIA Triton server backend: an example server can be found in [this repository](https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.6.1) accompanying the most recent release.
Collaborator

Was "built engine" supposed to be "built-in engine"?

Contributor

I'd rephrase it as "built TRT engine". Also, here again we should drop "most recent release" as suggested by Daniel above.

Contributor

@linden-li, can you please make the suggested changes here? Also, can you update the TRT-LLM link to v0.7.1?

scripts/inference/README.md
Collaborator

dakinggg commented Feb 2, 2024

@megha95, can we merge this?

@dakinggg closed this on Jul 30, 2024
3 participants