- 🎉 What's New
- 📚 Introduction
- ⚙️ Installation
- ✏️ Tutorial
- 🪄 Applications
- 💻 Overview of Benchmark and Model Zoo
- 📖 Document
- ❓ FAQ
- 🧾 License
- 🎯 Reference
✨v0.1.0 First release on March 10, 2024:
- JetYOLO is born!
This project leverages the DeepStream toolkit along with NVIDIA's CUDA and TensorRT to effortlessly create real-time streaming analytics applications for a broad range of scenarios. Aimed at lowering the barrier to entry for developers, it offers a lightweight, intuitive platform that simplifies the entire development cycle of DeepStream applications, encompassing model deployment, inference, and TensorRT optimization.
This framework is designed to streamline the application development process so that developers can concentrate on building applications rather than on complex coding tasks. Because most application scenarios share similar technical requirements (boundary checking, for example, applies across many contexts and domains), the framework also supports quick functionality migration: an application can be adapted to a different domain with only minor modifications.
Because most TensorRT builds involve extensive coding and complex debugging and optimization, we have developed XTRT, an easy-to-use, one-stop TensorRT application framework. It enables rapid construction of TensorRT engines and supports multi-level optimization, including improvements to model precision and speed, plugin functionality, and ONNX model modification and quantization. It also offers a comprehensive suite of optimization tools, detailed documentation, and abundant example code to help developers get started and work effectively.
Key features include:
- Ease of Use: Reduces project complexity, enabling quick startup and rapid development. Each component follows a modular design for further decoupling, allowing every tool and application to be independently utilized, meeting specific project requirements.
- Comprehensive Toolkit: Includes all necessary tools for developing high-performance inference applications in edge computing, offering a suite from model export to modification, quantization, deployment, and optimization, ensuring a smooth development process and efficient model operation.
- High-Performance Inference: A high-efficiency inference framework, xtrt, built on NVIDIA TensorRT and CUDA and integrated with NVIDIA Polygraphy, ONNX GraphSurgeon, the PPQ quantization tool, and others. It provides comprehensive model modification, quantization, and performance analysis tools for quick debugging and optimization.
- Practical Case Studies: Provides multiple real-world examples demonstrating the framework's applicability and effectiveness, with minor adjustments needed to fit a wide range of application scenarios.
The JetYOLO project workflow is streamlined and consists of just three steps:
- Start with a Pre-trained Model: Obtain a pre-trained model from common training frameworks and export it as an ONNX file.
- Build the TensorRT Engine with X-TRT: Import the ONNX model into X-TRT, our lightweight inference tool, to construct a TensorRT engine. Within X-TRT, you can customize and modify the ONNX model file, quantize the model with the provided quantization scripts, and use the performance analysis tools to test accuracy and optimize the model.
- Integrate with DeepStream for Application Development: Configure the exported Engine file from X-TRT into the DeepStream configuration files for further application development, such as people flow detection, to create tailored applications.
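Under the default layout of this repository, the three steps map roughly to the commands below; the file names (for example `yolov5s_trt8.onnx`) are the ones used in the examples later in this README and should be replaced with your own model where appropriate.

```bash
# 1. Start with a pre-trained model exported to ONNX
#    (the repository already ships a YOLOv5s export in xtrt/weights/).
ls xtrt/weights/yolov5s_trt8.onnx

# 2. Build the TensorRT engine with X-TRT
#    (see "Building the Engine" below for the full argument list).
./scripts/build_engine.sh

# 3. Point the DeepStream configuration at the generated engine and run the app.
deepstream-app -c deepstream_app_config.txt
```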
Click to expand to read the detailed Docker environment configuration.
We recommend deploying with Docker for the quickest project startup. Docker images for both X86 architecture and NVIDIA Jetson ARM architecture are provided.
docker build -f docker/[dockerfile] -t jetyolo .
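Once the image is built, you can start a container along these lines; the image tag (`jetyolo`) matches the `-t` flag above, and on Jetson devices the NVIDIA container runtime is usually selected with `--runtime nvidia` (on x86 hosts with the NVIDIA Container Toolkit, `--gpus all` is the usual equivalent):

```bash
# Start an interactive container with GPU access and the repository mounted.
docker run --rm -it \
    --runtime nvidia \
    --network host \
    -v "$(pwd)":/workspace/JetYOLO \
    jetyolo
```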
If you prefer to manually configure the environment, please continue reading the section below.
- JetPack SDK >= v5.0.2
- DeepStream >= v6.1
- gstreamer1.0
- nlohmann/json.hpp
For more details, please see the FAQ.
Click to expand to read the detailed environment configuration.
To build the JetYOLO components, you will first need the following software packages.

TensorRT
- TensorRT >= v8.5

DeepStream
- DeepStream >= v6.2

GStreamer
- gstreamer1.0

System Packages
- Recommended versions:
  - cuda-12.2.0 + cuDNN-8.8
  - cuda-11.8.0 + cuDNN-8.8
- GNU make >= v4.1
- cmake >= v3.11
- python >= v3.8, <= v3.10.x
- pip >= v19.0
- Essential utilities

PyTorch (Optional)
- You need the CUDA version of PyTorch. If your device is a Jetson, please refer to the Jetson Models Zoo for installation.
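As a quick sanity check of the versions listed above, each component can be queried from the shell. The commands below assume a Debian-based system such as JetPack; exact package names may differ between releases.

```bash
nvcc --version                                 # CUDA toolkit
dpkg -l | grep -E "tensorrt|nvinfer" | head    # TensorRT packages
deepstream-app --version-all                   # DeepStream
gst-launch-1.0 --version                       # GStreamer
cmake --version | head -n 1
make --version | head -n 1
python3 --version && python3 -m pip --version
```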
If you have completed the environment setup above, you can proceed with the following steps. To build the basic inference framework, run the following code, which is located in `scripts/run.sh`:
git clone --recurse-submodules https://github.com/gitctrlx/JetYOLO.git
cmake -S . -B build \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CUDA_ARCHITECTURES=72 \
-DBUILD_XRT=ON \
-DBUILD_NVDSINFER_CUSTOM_IMPL=ON \
-DBUILD_TOOLS_POLYGON_DRAW=ON \
-DBUILD_APPS_DS_YOLO_DETECT=ON \
-DBUILD_APPS_DS_YOLO_LPR=ON \
-DBUILD_APPS_DS_YOLO_TRACKER=ON
cmake --build build
Configure your build with the following options to tailor the setup to your needs:
- `-DCMAKE_BUILD_TYPE=Release`: Sets the build type to Release for optimized performance.
- `-DCMAKE_CUDA_ARCHITECTURES=72`: Specifies the CUDA compute capability (SM) of your device (Jetson Xavier NX: 72).
- `-DBUILD_XRT=ON`: Enables the build of xtrt, our lightweight, high-performance inference tool.
- `-DBUILD_NVDSINFER_CUSTOM_IMPL=ON`: Determines whether to compile the DeepStream custom inference plugin used by the apps.
- `-DBUILD_TOOLS_POLYGON_DRAW=ON`: Controls the inclusion of the bounding-box drawing tool used by the `app/ds_yolo_tracker` application.
- `-DBUILD_APPS_DS_YOLO_DETECT=ON`: Determines whether to build the `app/ds_yolo_detect` application.
- `-DBUILD_APPS_DS_YOLO_LPR=ON`: Determines whether to build the `app/ds_yolo_lpr` application.
- `-DBUILD_APPS_DS_YOLO_TRACKER=ON`: Determines whether to build the `app/ds_yolo_tracker` application.

If you are unsure about your CUDA SM version, you can run `xtrt/tools/cudasm.sh` to check. For more details, please see the FAQ.

We recommend enabling all options for the build. If you encounter errors during compilation, you can selectively disable some options to troubleshoot, or feel free to submit an issue to us; we are more than happy to help resolve it.
(Optional) If you would like to use the complete set of tools developed in Python, please install the following:
python3 -m pip install -r xtrt/requirements.txt
Data is used for calibration during quantization; we use the COCO val2017 dataset for quantization calibration. Place the downloaded val2017 dataset in the `xtrt/data/coco` directory.
xtrt/
└── data
└── coco
├── annotations
└── val2017
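If you do not already have the dataset locally, the val2017 images and annotations can be downloaded from the official COCO site; the URLs below are the standard cocodataset.org download links.

```bash
mkdir -p xtrt/data/coco && cd xtrt/data/coco

# val2017 images (about 1 GB) and the matching annotations
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip

unzip -q val2017.zip                      # creates val2017/
unzip -q annotations_trainval2017.zip     # creates annotations/
cd - > /dev/null
```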
Please read the 🔖 Model Zoo section for downloading. If you want to quickly start with the examples below, you can skip this step, as the `xtrt/weights` folder in the cloned repository already contains a `yolov5s` ONNX model with the `EfficientNMS` plugin.
Once the dataset is ready, the next step is to construct the engine. Below is an example of building a YOLOv5s TensorRT engine; the corresponding code is located in `scripts/build_engine.sh`:
./build/xtrt/build \
"./xtrt/weights/yolov5s_trt8.onnx" \ # ONNX Model File Path
"./xtrt/engine/yolo.plan" \ # TensorRT Engine Save Path
"int8" \ # Quantization Precision
3 \ # TRT Optimization Level
1 1 1 \ # Dynamic Shape Parameters
3 3 3 \
640 640 640 \
640 640 640 \
550 \ # Calibration Iterations
"./xtrt/data/coco/val2017" \ # Calibration Dataset Path
"./xtrt/data/coco/filelist.txt" \ # Calibration Image List
"./xtrt/engine/int8Cache/int8.cache" \ # Calibration File Save Path
true \ # Timing Cache Usage
false \ # Ignore Timing Cache Mismatch
"./xtrt/engine/timingCache/timing.cache"# Timing Cache Save Path
For a detailed analysis of the code's parameters, please see the detailed documentation.
Verify the engine: Executing Inference (xtrt's inference demo)
Note: Run the demo to test if the engine was built successfully.
- demo-1: Run inference on a single image using the built YOLO TensorRT engine. The following code is located in `scripts/demo_yolo_det_img.sh`:
./build/xtrt/yolo_det_img \
"./xtrt/engine/yolo_trt8.plan" \ # TensorRT Engine Save Path
"./xtrt/media/demo.jpg" \ # Input Image Path
"./xtrt/output/output.jpg"\ # Output Image Path
2 \ # Pre-processing Pipeline
1 3 640 640 # Input Model Tensor Values
- demo-2: Run inference on a video using the built YOLO TensorRT engine. The following code is located in `scripts/demo_yolo_det_video.sh`:
./build/xtrt/yolo_det \
"./xtrt/engine/yolo_trt8.plan" \ # TensorRT Engine Save Path
"./xtrt/media/c3.mp4" \ # Input Video Path
"./xtrt/output/output.mp4"\ # Output Video Path
2 \ # Pre-processing Pipeline
1 3 640 640 # Input Model Tensor Values
Then you can find the output results in the `xtrt/output` folder.
Note: It is recommended to run the script directly, or to copy the code from within the script, rather than copying and running the commented version shown above:
chmod 777 ./scripts/demo_yolo_det_img.sh   # Grant execution permission to the script
./scripts/demo_yolo_det_img.sh
For a detailed analysis of the code's parameters, please see the detailed documentation.
Next, you can use DeepStream to build end-to-end, AI-driven applications for analyzing video and sensor data.
Quick Start
- You can quickly launch a DeepStream application using deepstream-app:
Before running the code below, please make sure that you have built the engine file using xtrt, meaning you have completed section 3, Building the Engine.
deepstream-app -c deepstream_app_config.txt
Note: If you wish to start directly from this step, please ensure that you have completed the following preparations:
First, you need to modify the `deepstream_app_config.txt` configuration file, updating the engine file path to your actual engine file path. Since the engine is built with xtrt, you will find the engine file in the `xtrt/engine` directory. In addition, it is crucial to verify that your plugin has been properly compiled: by default the plugin code resides in the `nvdsinfer_custom_impl` folder, while the compiled plugin `.so` files can be found in the `build/nvdsinfer_custom_impl` directory.
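Before launching, a quick way to confirm that the configuration points at files that actually exist is to grep for the relevant keys. `model-engine-file` and `custom-lib-path` are the standard DeepStream nvinfer settings; the secondary inference config file name pattern used below is only an assumption about your setup.

```bash
# Show where the configuration points for the engine and the custom parser plugin.
grep -rnE "model-engine-file|custom-lib-path" deepstream_app_config.txt config_infer*.txt 2>/dev/null

# Confirm the engine and the compiled plugin exist where the config expects them.
ls xtrt/engine/
ls build/nvdsinfer_custom_impl/
```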
- Alternatively, you can run the following code to view an example of the detection inference:
./build/apps/ds_yolo_detect/ds_detect file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
Note:
The command to run is:
./build/apps/ds_yolo_detect/ds_detect [Your video file path or RTSP stream URL]
Display Contents:
- The top left corner shows the current frame's pedestrian and vehicle count.
- Detected individuals and vehicles within the frame will be marked with bounding boxes.
This example is based on the `app/ds_yolo_detect` directory, with its processing pipeline illustrated below:
Upon running the application, you can view the output stream on players like VLC by entering: `rtsp://[IP address of the device running the application]:8554/ds-test`. This allows you to see:

**Note:** The streamed video output can be viewed on any device within the same local network.
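Besides VLC, the RTSP output can also be opened from the command line on any machine in the same local network; `ffplay` (from FFmpeg) and a GStreamer `playbin` pipeline are two common options, and the IP address below is a placeholder for the device running the application.

```bash
# Replace 192.168.1.100 with the IP address of the device running the DeepStream app.
ffplay rtsp://192.168.1.100:8554/ds-test

# Or, with GStreamer installed:
gst-launch-1.0 playbin uri=rtsp://192.168.1.100:8554/ds-test
```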
We also provide some example applications created with DeepStream, located in the `app` folder.
This feature enables real-time tracking and boundary detection for individuals and vehicles using a single video stream. The application utilizes DeepStream for efficient processing.
This example is based on the `app/ds_yolo_tracker` directory, with its processing pipeline illustrated below:
To view an inference example, execute the following command:
./build/apps/ds_yolo_tracker/ds_tracker_app file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
Usage:
./build/apps/ds_yolo_tracker/ds_tracker_app [Your video file path or RTSP stream URL]
Display Features:
- The top-left corner shows the total count of pedestrians and vehicles that have passed.
- At the center is a boundary detection box; vehicles crossing this area are highlighted with a red bounding box.
Upon running the application, you can view the output stream on players like VLC by entering: `rtsp://[IP address of the device running the application]:8554/ds-test`. This allows you to see:

**Note:** The streamed video output can be viewed on any device within the same local network.
This application extends the capabilities of the single-stream inference application to support simultaneous processing and analysis of multiple video streams. It enables efficient monitoring and boundary detection for individuals and vehicles across several feeds, leveraging NVIDIA DeepStream for optimized performance.
This example is based on the `app/ds_yolo_tracker` directory, with its processing pipeline illustrated below:
To run the application with multiple video feeds, use the following command syntax:
./build/apps/ds_yolo_tracker/ds_tracker_app_multi file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
Usage:
./build/apps/ds_yolo_tracker/ds_tracker_app_multi [Video file path or RTSP stream URL 1] [Video file path or RTSP stream URL 2] [...]
- Note: After compilation, the current program only supports two input stream addresses. If you wish to use more input streams, you will need to modify the corresponding code. For details, please refer to the detailed documentation.
Display Features: The application provides a unified display that incorporates elements from all the processed streams.
- Overall Counts: The top-left corner of each video feed display shows the total count of pedestrians and vehicles that have passed within that specific stream.
- Boundary Detection Box: A boundary detection box is presented at the center of each video feed. Vehicles crossing this predefined area in any of the streams are immediately highlighted with a red bounding box to signify a boundary violation.
Upon running the application, you can view the output stream on players like VLC by entering: `rtsp://[IP address of the device running the application]:8554/ds-test`. This allows you to see:

**Note:** The streamed video output can be viewed on any device within the same local network.
The DeepStream application offers a comprehensive solution for detecting and recognizing license plates in real-time.
This example is based on the `app/ds_yolo_lpr` directory, with its processing pipeline illustrated below:
Note: Before you run the command below, you need to prepare the TensorRT engine files for both the first-stage and second-stage detectors; the corresponding code is located in `scripts/build_lpr_engine.sh`:
./scripts/build_lpr_engine.sh
To launch the license plate detection and recognition feature, use the following command:
./build/apps/ds_yolo_lpr/ds_lpr [file or rtsp]
Usage:
./build/apps/ds_yolo_lpr/ds_lpr [Your video file path or RTSP stream URL]
Display Features:
- The number displayed in the top-left corner of the screen indicates the total count of license plates detected in the current frame.
- License plates within the frame are enclosed by detection boxes, and when the plate content is fully recognized, the plate number will be displayed above the detection box. The confidence level of the recognition result is shown on the right side of the detection box.
Upon running the application, you can view the output stream on players like VLC by entering: `rtsp://[IP address of the device running the application]:8554/ds-test`. The application displays the detected license plates and their recognized characters.

**Note:** The streamed video output can be viewed on any device within the same local network.
PS: The video is sourced from the internet. Should there be any copyright infringement, please notify us and it will be removed.
This functionality is based on the NVIDIA-AI-IOT lab's three-stage license plate detection project at deepstream_lpr_app, with modifications for enhanced performance.
We have also provided a flowchart for the three-stage license plate detection and recognition process as follows. For a detailed analysis, please refer to the detailed documentation.
We are committed to continuously enriching and expanding our content library by introducing more practical and engaging cases. Each new case aims to provide fresh insights, skills, or solutions that help users better understand and apply the relevant knowledge. Through this ongoing effort we hope to build a resource library that is both rich and practical, meeting the growing needs of our user base.
- Face Detection and Pose Recognition: Facial detection and behavior recognition, for example detecting when a person falls.
- Food Safety Inspection: Monitoring kitchen and food preparation to ensure the accuracy of product preparation and assembly steps.
- Livestock Management: Helping herders manage their livestock by combining drones with detection and tracking technology.
- Forest Monitoring: Determining the location, diameter, and volume of each tree, suitable for drones and smart agriculture.
Note:
We are setting out to develop practical applications for face detection and pose recognition by building upon the foundation laid by exemplary works, namely DeepStream-Yolo-Face and DeepStream-Yolo-Pose. Our objective includes devising compelling applications such as detecting human falls.
Additionally, we plan to integrate these solutions with our XTRT inference engine. The integration aims at enhancing the performance of the Yolo-Face and Yolo-Pose TensorRT engines through plugin-based optimizations for smoother and more efficient inference. We are open to new ideas and invite contributions and suggestions to further enrich our project.
Leveraging MMYOLO's comprehensive suite of pre-trained models, we converted them into TensorRT engines at `fp16` precision, incorporating the `TensorRT8-EfficientNMS` plugin, in order to evaluate inference accuracy and speed on the COCO val2017 dataset under these conditions.
The following graph displays the benchmarks achieved using MMYOLO on an NVIDIA Tesla T4 platform:
The evaluation results above are from the MMYOLO model under FP16 precision. The "TRT-FP16-GPU-Latency(ms)" refers to the GPU compute time for model forwarding only on the NVIDIA Tesla T4 device using TensorRT 8.4, with a batch size of 1, testing shape of 640x640 (for YOLOX-tiny, the testing shape is 416x416).
**Note:** In practical tests, we found that on the Jetson platform, differences in available memory may have some impact on model accuracy, because TensorRT requires sufficient memory during the engine construction phase to test certain strategies. Across different platforms, there could be an accuracy loss of about 0.2%-0.4%.
For convenience, you can use the YOLO series ONNX models we have uploaded to Hugging Face; please refer to the `doc/model_zoo.md` document.
You can download the ONNX model of your choice from the following link: https://huggingface.co/CtrlX/JetYOLO/tree/main
Place the downloaded ONNX model files into the following folder:
xtrt/
└── weights
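For example, a model can be fetched directly with `wget` using Hugging Face's `resolve` URL scheme; the file name below (`yolov5s_trt8.onnx`) is the one referenced elsewhere in this README and is only an assumption about what the repository hosts, so adjust it to the model you actually pick.

```bash
mkdir -p xtrt/weights
wget -P xtrt/weights \
    https://huggingface.co/CtrlX/JetYOLO/resolve/main/yolov5s_trt8.onnx
```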
If you wish to convert PyTorch models to ONNX format yourself, please refer to the `doc/model_convert.md` document.
**Note:** The models we have uploaded to Hugging Face are exported to ONNX from MMYOLO's pre-trained models and are available in two formats: one is an end-to-end ONNX model with the `EfficientNMS` node from `TensorRT8` added, and the other is the pure model part with the decode stage removed (including the three output results). For details, please see the `doc/model_convert.md` document. You can use the ONNX model with `EfficientNMS` already added, or use the model with the decode part removed and manually add plugins for acceleration; the related code can be found in `xtrt/tools/modify_onnx`.
For more detailed tutorials about the project, please refer to the detailed documentation.
Please refer to the FAQ for frequently asked questions.
This project is released under the GPL 3.0 license.
This project references many excellent works from predecessors, and some useful repository links are provided at the end.