Oh_My_TensorRT_ToolKit is a wrapper toolkit for GPU performance testing with various neural network models, built on top of the official TensorRT source code.
I developed this toolkit to assist with GPU performance testing when using the official TensorRT tools. The toolkit can also be customized for different TensorRT versions; please refer to the Contributing part.
Setting up Oh_My_TensorRT_ToolKit takes several steps, which are described in the Usage part. Our scripts are primarily built around ./trtexec, the TensorRT command-line wrapper provided in the official NVIDIA TensorRT samples.
The workflow of Oh_My_TensorRT_ToolKit is:
- Generate ONNX models from our PyTorch / TensorFlow models.
- Generate dynamic engines from the ONNX models using the ./trtexec tool.
- Perform inference tests with the engines on different MIG devices and batch sizes.
- Format and output the throughput / latency statistics.
For ONNX model generation, we provide both PyTorch and TensorFlow examples in the Examples part.
Oh_My_TensorRT_ToolKit should be used in a GPU environment. Our reference GPU device and driver/software versions are:
- GPU Device: A100-PCIE-40GB with MIG enabled (currently 3 MIG devices)

  ```
  GPU 0: A100-PCIE-40GB (UUID: GPU-9de3d0e8-33f5-10dc-0c79-2c88a7ab0a23)
    MIG 4g.20gb Device 0: (UUID: MIG-GPU-9de3d0e8-33f5-10dc-0c79-2c88a7ab0a23/2/0)
    MIG 2g.10gb Device 1: (UUID: MIG-GPU-9de3d0e8-33f5-10dc-0c79-2c88a7ab0a23/3/0)
    MIG 1g.5gb Device 2: (UUID: MIG-GPU-9de3d0e8-33f5-10dc-0c79-2c88a7ab0a23/9/0)
  ```

  More background on the MIG mechanism can be found in the Basic on MIG Tutorial; a minimal device-selection check is sketched after this environment list.
- Drivers and software versions:
  - NVIDIA-SMI 460.80, Driver Version: 460.80
  - CUDA Version: 11.2
  - PyCUDA Version: 2021.1
  - TensorRT Version: 8.0.1.6
  - Torch Version: 1.9.0+cu111 (custom configurations can be found on the PyTorch official site)
  - Torchvision Version: 0.10.0+cu111
  - Pytorch-pretrained-bert Version: 0.6.2 (required for the BERT model)
  - ONNX Version: 1.9.0
  - Netron Version: 5.0.0
  - ONNX Runtime Version: 1.2.0
  - TensorFlow Version: 2.5.0
  - h5py Version: 3.1.0
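Since the inference stage iterates over these MIG instances, it can help to confirm that they are visible and that a single instance can be targeted. A minimal check, reusing the UUIDs listed above (the toolkit's own device selection is handled inside its scripts):

```bash
# List the GPU and its MIG instances
nvidia-smi -L

# Point a process at one specific MIG instance by exporting its UUID
# (here the 4g.20gb instance from the list above)
export CUDA_VISIBLE_DEVICES=MIG-GPU-9de3d0e8-33f5-10dc-0c79-2c88a7ab0a23/2/0
python -c "import pycuda.autoinit, pycuda.driver as cuda; print(cuda.Device(0).name())"
```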
Before we start, note that the code in our repository supplements the official TensorRT source code. Therefore, we should first download and install TensorRT by following the Tutorial.
- Install pycuda:

  ```bash
  pip install pycuda==2021.1
  ```
- Download the official TensorRT package from the Download link; choose the correct version for your system and CUDA version. (In our experiments, we use the TensorRT 8.0.1 GA for Linux x86_64 and CUDA 11.3 TAR package.)
- Install:

  ```bash
  # Unzip
  tar xvf TensorRT-8.0.1.6.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar
  # Install TensorRT
  cd TensorRT-8.0.1.6/python
  pip install tensorrt-8.0.1.6-cp38-none-linux_x86_64.whl   # Match your Python version; cp38 means Python 3.8
  # Install UFF
  cd ../uff
  pip install uff-0.6.9-py2.py3-none-any.whl
  # Install graphsurgeon
  cd ../graphsurgeon
  pip install graphsurgeon-0.4.5-py2.py3-none-any.whl
  ```
- Add the environment path:

  ```bash
  touch .tensorrt_bashrc
  nano .tensorrt_bashrc
  # Write into it
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cyxue/TensorRT-8.0.1.6/lib
  # Every time we establish a session, source it
  source .tensorrt_bashrc
  ```
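Before moving on, a quick sanity check of the installation can save time later. A minimal sketch, assuming the TAR package was extracted in the current directory (so trtexec lives under TensorRT-8.0.1.6/bin):

```bash
source .tensorrt_bashrc                                     # make the TensorRT libraries visible
python -c "import tensorrt; print(tensorrt.__version__)"    # should print 8.0.1.6
./TensorRT-8.0.1.6/bin/trtexec --help                       # the command-line wrapper used by this toolkit
```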
After finishing the installation of the official TensorRT package, we can install Oh_My_TensorRT_ToolKit on top of it.
- Clone our repository to get the code:

  ```bash
  git clone https://github.com/DicardoX/Oh_My_TensorRT_ToolKit.git
  ```
- Copy the contents of our repository into the main directory of the official TensorRT source code. Since `./data` already exists there, simply merge the two directories.
- The main directory of Oh_My_TensorRT_ToolKit should then look like this (except for README.md and its related files):
The `run.sh` script is the main worker of this toolkit. Its option configuration is:
```bash
bash ./run.sh -m [MODEL_NAME] -b [MAX_BATCH_SIZE] -t [TYPE_MODE] -p -h
```
- `MODEL_NAME`: the name of the model to be tested (one of [bert_16, bert_64, bert_128, resnet_50, resnet_101, resnet_152], default = bert_16).
- `MAX_BATCH_SIZE`: max batch size of the chosen model for inference (a power of 2, default = 128).
- `TYPE_MODE`: 0 / 1 (0 for is_training, 1 for is_inference).
- `[-p]`: plot results for the inference stage (path: './output_figs/', 0 for is_plot, 1 for not_plot).
- `[-h]`: help message.
Note that the paths of the ONNX and engine files are set inside the shell and Python scripts; feel free to modify them if needed.
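As a quick usage sketch (resnet_50 and the flag values simply follow the option list above), building and then benchmarking resnet_50 with a maximum batch size of 64 and plotting enabled would be:

```bash
# Build the dynamic engine for resnet_50 (TYPE_MODE = 0)
bash ./run.sh -m resnet_50 -b 64 -t 0
# Run the inference tests across MIG devices and batch sizes, with plots (TYPE_MODE = 1)
bash ./run.sh -m resnet_50 -b 64 -t 1 -p
```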
We take bert_16 as our example to walk through the workflow of Oh_My_TensorRT_ToolKit. Note that bert_16 has a sequence length of 16.
- Generate the ONNX model:

  ```bash
  cd TensorRT-8.0.1.6/custom_python_samples/tensorrt_bert
  python bert_to_onnx_dynamic_seq.py --seq_len 16
  # For resnet, the command should be:
  # cd TensorRT-8.0.1.6/custom_python_samples/tensorrt_resnet
  # python main.py --layer_num [LAYER_NUM for 50, 101 or 152]
  ```
  The `.onnx` model should be saved as `TensorRT-8.0.1.6/custom_python_samples/tensorrt_bert/onnx/bert_16.onnx`.

- Generate the engine file (training process):

  ```bash
  cd TensorRT-8.0.1.6
  # Training process for bert_16, dynamic range of batch size from 1 to 128
  bash ./run.sh -m bert_16 -b 128 -t 0
  ```
  The `.trt` engine file should be saved as `TensorRT-8.0.1.6/engines/bert_16_dynamic.trt`.

- Operate multiple inference processes (inference process):

  ```bash
  cd TensorRT-8.0.1.6
  # Inference process for bert_16; this runs inference on each MIG device and each batch size
  bash ./run.sh -m bert_16 -b 128 -t 1 -p
  ```
  The performance summary should be saved as `TensorRT-8.0.1.6/performance_summary.txt`, and the plotted results should be saved under `TensorRT-8.0.1.6/output_figs`. A sketch of the trtexec calls underlying these steps is given below.
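Under the hood, the engine-build and inference steps boil down to trtexec invocations. The following is only a rough sketch of what such calls could look like for bert_16, assuming the paths from the example above; the input tensor name (`input_ids`) and the exact flags are placeholders, since the real values are set inside run.sh and the exported ONNX model (which you can inspect with Netron).

```bash
# Rough sketch only -- run from the TensorRT-8.0.1.6 directory with LD_LIBRARY_PATH set.
# Build a dynamic-batch engine from the exported ONNX model. "input_ids" is a placeholder
# input name; a real BERT export usually has several inputs (e.g. token types, attention
# mask), each of which needs its own shape bounds.
./bin/trtexec \
    --onnx=custom_python_samples/tensorrt_bert/onnx/bert_16.onnx \
    --saveEngine=engines/bert_16_dynamic.trt \
    --minShapes=input_ids:1x16 \
    --optShapes=input_ids:64x16 \
    --maxShapes=input_ids:128x16

# Benchmark the engine at one batch size (here 32) on one MIG instance,
# selected via the MIG UUID from the environment section above.
CUDA_VISIBLE_DEVICES=MIG-GPU-9de3d0e8-33f5-10dc-0c79-2c88a7ab0a23/2/0 \
    ./bin/trtexec \
    --loadEngine=engines/bert_16_dynamic.trt \
    --shapes=input_ids:32x16
```

run.sh repeats the inference call for every MIG instance and batch size, then aggregates the throughput / latency numbers into performance_summary.txt.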