triton-sentence-transformer-tutorial

Deploying a Sentence Transformer with Triton Inference Server

This repository contains all the code and scripts needed to deploy a Hugging Face retrieval model, such as multilingual-e5-large, with NVIDIA's Triton Inference Server. The guide covers every step, from model export, configuration, and optimization to serving the model on Triton for high-performance inference.
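
As a rough illustration of the export step, the sketch below traces the encoder and writes an ONNX graph into a Triton-style model repository. It is a minimal sketch, not the repository's own conversion script: the model id, output path, tensor names, and opset version are all assumptions.

import os

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "intfloat/multilingual-e5-large"                         # assumed model id
EXPORT_PATH = "model_repository/multilingual-e5-large/1/model.onnx"   # assumed repository layout

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

# A dummy batch is used only to trace the graph during export.
dummy = tokenizer(["query: hello world"], return_tensors="pt")

os.makedirs(os.path.dirname(EXPORT_PATH), exist_ok=True)
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    EXPORT_PATH,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=17,
)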

Contents

  • Model conversion scripts (e.g., to ONNX or TensorFlow formats that Triton can serve)
  • Configuration files (config.pbtxt)
  • Docker setup for Triton Server
  • Load testing code for sending inference requests (a minimal client sketch follows this list)
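
The client sketch below shows roughly what sending a request to the deployed model looks like with the tritonclient package. It is a sketch under assumptions, not the repository's load testing code: the server URL, model name, and tensor names mirror the export example above and may differ from the actual setup.

import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

client = httpclient.InferenceServerClient(url="localhost:8000")   # assumed HTTP port
tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large")

# E5-style models expect a "query: " / "passage: " prefix on the input text.
enc = tokenizer(["query: how do I deploy a model on Triton?"], return_tensors="np")

inputs = []
for name in ("input_ids", "attention_mask"):
    arr = enc[name].astype(np.int64)
    tensor = httpclient.InferInput(name, list(arr.shape), "INT64")
    tensor.set_data_from_numpy(arr)
    inputs.append(tensor)

result = client.infer(model_name="multilingual-e5-large", inputs=inputs)
hidden = result.as_numpy("last_hidden_state")                     # (batch, seq_len, hidden)

# Mean-pool over the sequence dimension with the attention mask to get
# one embedding per sentence, as E5-style retrieval models expect.
mask = enc["attention_mask"][..., None].astype(np.float32)
embeddings = (hidden * mask).sum(axis=1) / mask.sum(axis=1)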

A complete step-by-step guide is available in my blog post, Deploying a Sentence Transformer with Triton Inference Server, which explains the deployment process and how to use the files provided in this repository.

How to Use

Clone this repository:

git clone git@github.com:rproskuryakov/triton-sentence-transformer-tutorial.git
cd triton-sentence-transformer-tutorial

Follow the instructions in the guide to set up Triton and deploy your model.
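
Once the server is up, a quick way to confirm the deployment is to ask Triton whether the model is loaded. Again, a minimal sketch with an assumed URL and model name:

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")   # assumed HTTP port
assert client.is_server_ready()
assert client.is_model_ready("multilingual-e5-large")             # assumed model name
print("Model is loaded and ready to serve requests.")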

License

This project is licensed under the MIT License.
