Docker Image OpenADK with CUDA support has an issue with CUDA installation #4765
Closed
Labels
type:bug
Checklist
Description
Inside the ghcr.io/autowarefoundation/autoware-openadk:latest-devel-cuda container,
I'm trying to use the tensorrt_yolox package. The package includes some CUDA kernels, which fail to build, and the build shows the following warning:
--- stderr: tensorrt_yolox
CMake Warning at CMakeLists.txt:19 (message):
CUDA is not found. preprocess acceleration using CUDA will not be
available.
It seems that the CMake variable CMAKE_CUDA_COMPILER is not set.
Then, when using tensorrt_yolox for object detection, the system crashes with the following error:
[tensorrt_yolox_node_exe-2] /home/os/elm/autoware/install/tensorrt_yolox/lib/tensorrt_yolox/tensorrt_yolox_node_exe: symbol lookup error: /home/os/elm/autoware/install/tensorrt_yolox/lib/libtensorrt_yolox.so: undefined symbol: _ZN14tensorrt_yolox50resize_bilinear_letterbox_nhwc_to_nchw32_batch_gpuEPfPhiiiiiiifP11CUstream_st
[ERROR] [tensorrt_yolox_node_exe-2]: process has died [pid 977, exit code 127, cmd '/home/os/elm/autoware/install/tensorrt_yolox/lib/tensorrt_yolox/tensorrt_yolox_node_exe --ros-args -r __node:=tensorrt_yolox --params-file /tmp/launch_params_d1ll7q3z --params-file /tmp/launch_params_cq_ya7ic -r ~/in/image:=/fr_camera/image_rect -r ~/out/objects:=roi0'].
The missing symbol is a CUDA kernel function that was skipped in the earlier build.
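To make the link between the loader error and the skipped CUDA code easier to see, here is a minimal sketch that decodes the namespace and function name from the mangled symbol in the error above. The helper `parse_nested_name` is hypothetical and only handles the simple `_ZN<len><name>...E` form of the Itanium C++ ABI, which is enough for this symbol; it ignores the parameter types that follow the `E`.

```python
def parse_nested_name(mangled: str) -> list[str]:
    """Return the name components of an Itanium-ABI nested name (_ZN ... E)."""
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("not an Itanium nested name")
    pos = len(prefix)
    parts = []
    while pos < len(mangled) and mangled[pos] != "E":
        # Each component is encoded as <decimal length><identifier>.
        start = pos
        while pos < len(mangled) and mangled[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError(f"unsupported component at offset {start}")
        length = int(mangled[start:pos])
        parts.append(mangled[pos:pos + length])
        pos += length
    return parts


# Symbol copied from the runtime error above.
symbol = (
    "_ZN14tensorrt_yolox50resize_bilinear_letterbox_nhwc_to_nchw32_batch_gpu"
    "EPfPhiiiiiiifP11CUstream_st"
)
print("::".join(parse_nested_name(symbol)))
# prints tensorrt_yolox::resize_bilinear_letterbox_nhwc_to_nchw32_batch_gpu
```

The trailing `P11CUstream_st` in the mangled name is a pointer to `CUstream_st`, i.e. a CUDA stream argument, which is consistent with this being GPU preprocessing code. For a full demangling including parameter types, a tool such as `c++filt` is the usual choice.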
Expected behavior
Actual behavior
tensorrt_yolox builds with a warning and skips building the CUDA kernels, which leads to a runtime crash later.
Steps to reproduce
Inside ghcr.io/autowarefoundation/autoware-openadk:latest-devel-cuda container
Versions
No response
Possible causes
After some investigation, including building the official CUDA Samples to track down the issue, it appeared that some CUDA libraries were missing:
/usr/bin/ld: cannot find -lcudadevrt
/usr/bin/ld: cannot find -lcudart_static
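A quick way to confirm the two linker errors above from inside the container is to look for the corresponding library files directly. The following is a minimal sketch, not part of the original report: the search directories in `CANDIDATE_DIRS` are assumptions about where a CUDA toolkit is commonly installed, and the actual layout in the image may differ.

```python
from pathlib import Path

# Assumed search locations; adjust to the actual layout of the image.
CANDIDATE_DIRS = ["/usr/local/cuda/lib64", "/usr/lib/x86_64-linux-gnu"]
# Library names taken from the "cannot find -l..." linker errors above.
REQUIRED = ["cudadevrt", "cudart_static"]


def find_library(name, dirs):
    """Return the first libNAME.so or libNAME.a found in dirs, else None."""
    for d in dirs:
        for suffix in (".so", ".a"):
            candidate = Path(d) / f"lib{name}{suffix}"
            if candidate.exists():
                return candidate
    return None


def missing_libraries(names, dirs):
    """Return the names for which no library file was found in dirs."""
    return [n for n in names if find_library(n, dirs) is None]


if __name__ == "__main__":
    for name in missing_libraries(REQUIRED, CANDIDATE_DIRS):
        print(f"missing: -l{name}")
```

If either name is reported as missing, the CUDA development libraries are probably not installed in the image, which would match the linker errors seen when building the CUDA Samples.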
After applying the following patch and rebuilding the Docker image, the CUDA kernels were built and the object detection model ran correctly.
Additional context
No response