Roman Belshevitz
Accelerating OpenCV with CUDA on Jetson Orin NX: A Complete Build Guide

What do we have right out of the box?

The NVIDIA Jetson Orin NX is a powerful, community-recognized edge AI platform designed for real-time computer vision and deep learning applications.

While JetPack 5.1.x provides an optimized environment with CUDA, cuDNN, and TensorRT, the default OpenCV package in Ubuntu’s repositories does not take full advantage of the GPU. This means that tasks such as object detection, video processing, and feature extraction run primarily on the CPU, significantly limiting performance.
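Before changing anything, you can check what the stock package gives you. The short snippet below (plain Python, no extra packages) asks the currently installed cv2 how many CUDA devices it can use; with the default Ubuntu build the answer is 0, or the cv2.cuda module is missing entirely.

# Check whether the currently installed cv2 was built with CUDA support
import cv2

print("OpenCV version:", cv2.__version__)
try:
    # 0 means this build cannot use the GPU at all
    print("CUDA devices visible to OpenCV:", cv2.cuda.getCudaEnabledDeviceCount())
except AttributeError:
    print("This cv2 build exposes no CUDA module.")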

To unlock the full potential of OpenCV on the Orin NX, we need to build it from source with CUDA and cuDNN enabled. This ensures that image processing and deep learning workloads benefit from GPU acceleration, leading to significant speed improvements. In this guide, we will walk through the entire build process, from installing dependencies to verifying a successful installation.

Choosing the Right OpenCV Version

For JetPack 5.1.x on Ubuntu 20.04, the most robust OpenCV versions are 4.5.5 and 4.6.0. These versions have been tested extensively with CUDA 11 and cuDNN, ensuring compatibility and stability. While newer versions, such as 4.7.x and 4.8.x, are available, they may require additional patches and modifications to work seamlessly on Jetson hardware. I recommend sticking with 4.5.5 unless specific features from newer releases are needed.

Building OpenCV with CUDA

Before we start, it's important to remove any pre-installed OpenCV versions that might interfere with our custom build.

The default python3-opencv package from Ubuntu repositories is CPU-only and does not support CUDA acceleration. To avoid conflicts, remove it along with other OpenCV-related packages:

sudo apt remove --purge -y libopencv-dev libopencv-core-dev libopencv-imgproc-dev python3-opencv
sudo apt autoremove -y

After building OpenCV, we must ensure Python correctly loads the CUDA-enabled version. We will set up the PYTHONPATH accordingly.
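Once the build is installed, a quick sanity check like the one below shows which cv2 Python actually imports. The exact site-packages path can vary with the Python version and CMake settings, so treat the expected location as a guideline rather than a guarantee.

# Confirm that Python imports the freshly built OpenCV, not a leftover system copy
import cv2

print("OpenCV version:", cv2.__version__)  # should match the version you built, e.g. 4.5.5
print("Loaded from:", cv2.__file__)        # should point somewhere under /usr/local/lib/python3.x/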

For the Jetson Orin NX, which uses the Ampere architecture, you should adjust the CUDA_ARCH_BIN to 8.7.

The following Bash script automates the entire process, ensuring a seamless installation of OpenCV with CUDA and Python bindings.

#!/bin/bash

set -e  # Exit on error
set -x  # Debug mode (prints each command)

# Define OpenCV version
OPENCV_VERSION="4.5.5"

# Update system packages
sudo apt update && sudo apt upgrade -y

# Install required dependencies
sudo apt install -y build-essential pv cmake ccache git unzip pkg-config \
    libjpeg-dev libpng-dev libtiff-dev libavcodec-dev libavformat-dev \
    libswscale-dev libv4l-dev v4l-utils libxvidcore-dev libx264-dev \
    libgtk-3-dev libcanberra-gtk3-dev libtbb2 libtbb-dev libdc1394-22-dev \
    python3-dev python3-numpy python3-pip libopenblas-dev liblapack-dev gfortran \
    libhdf5-dev

# Clone OpenCV and contrib modules
cd ~
git clone --branch ${OPENCV_VERSION} https://github.com/opencv/opencv.git
git clone --branch ${OPENCV_VERSION} https://github.com/opencv/opencv_contrib.git

# Create build directory
cd ~/opencv
mkdir -p build && cd build

# Configure CMake with CUDA and cuDNN enabled
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
      -D WITH_CUDA=ON \
      -D CUDA_ARCH_BIN=8.7 \
      -D CUDA_ARCH_PTX="" \
      -D WITH_CUDNN=ON \
      -D OPENCV_DNN_CUDA=ON \
      -D ENABLE_FAST_MATH=ON \
      -D CUDA_FAST_MATH=ON \
      -D WITH_CUBLAS=ON \
      -D WITH_V4L=ON \
      -D WITH_LIBV4L=ON \
      -D WITH_OPENGL=ON \
      -D BUILD_opencv_python3=ON \
      -D BUILD_EXAMPLES=OFF \
      -D BUILD_TESTS=OFF \
      -D BUILD_DOCS=OFF \
      -D BUILD_PERF_TESTS=OFF \
      -D CMAKE_C_COMPILER_LAUNCHER=ccache \
      -D CMAKE_CXX_COMPILER_LAUNCHER=ccache ..

# Compile OpenCV using all CPU cores; pv turns make's line output into a rough progress bar
TOTAL_CPP=$(find ~/opencv ~/opencv_contrib/modules -name "*.cpp" | wc -l)
make -j$(nproc) | pv -lep -s ${TOTAL_CPP}

# The pipeline above exits with pv's status, so run make once more without the pipe:
# a quick no-op if everything built, but it stops the script (set -e) if anything failed
make -j$(nproc)

# Install OpenCV
sudo make install
sudo ldconfig

# Ensure Python picks up the new OpenCV installation
PYTHON_VERSION=$(python3 -c "import sys; print('python{}.{}'.format(sys.version_info.major, sys.version_info.minor))")
echo "export PYTHONPATH=/usr/local/lib/${PYTHON_VERSION}/site-packages:\$PYTHONPATH" >> ~/.bashrc
# Export for this shell too; sourcing ~/.bashrc inside a script does not affect the parent shell
export PYTHONPATH=/usr/local/lib/${PYTHON_VERSION}/site-packages:${PYTHONPATH}

# Verify installation
python3 -c "import cv2; print(cv2.getBuildInformation())"

echo "✅ OpenCV ${OPENCV_VERSION} built and installed with CUDA!"

In the script, ccache (a compiler cache) speeds up compilation by storing previously compiled object files and reusing them when the corresponding source has not changed. This is especially useful when you tweak CMake settings and rebuild OpenCV repeatedly.
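Beyond dumping the build information, it is worth proving that the CUDA path actually works. The sketch below assumes the cudafilters module from opencv_contrib was built (the script above includes the contrib modules): it counts CUDA devices, prints the device's compute capability (8.7 on the Orin NX), and runs a small Gaussian blur on the GPU.

# Post-build sanity check for the CUDA path
import cv2
import numpy as np

print("OpenCV:", cv2.__version__)
count = cv2.cuda.getCudaEnabledDeviceCount()
print("CUDA devices:", count)

if count > 0:
    # Prints the GPU name and compute capability (expected: 8.7 on Orin NX)
    cv2.cuda.printCudaDeviceInfo(0)

    # Upload a test image to the GPU, blur it there, download the result
    img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
    gpu = cv2.cuda_GpuMat()
    gpu.upload(img)
    gauss = cv2.cuda.createGaussianFilter(cv2.CV_8UC3, cv2.CV_8UC3, (5, 5), 1.5)
    out = gauss.apply(gpu).download()
    print("GPU Gaussian blur OK, output shape:", out.shape)
else:
    print("No CUDA devices found: the build is not using the GPU.")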

What is this all for?

Building OpenCV from source with CUDA support on the Jetson Orin NX is an essential optimization for developers working with real-time image processing and AI applications. By leveraging the GPU for operations such as object detection, background subtraction, and feature extraction, performance can improve by 5-10x compared to CPU-only execution.
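As a rough illustration, here is a minimal sketch that runs the same MOG2 background subtraction on the CPU and on the GPU and reports frames per second. The synthetic frames, the 1080p resolution, and whatever speedup you measure are all placeholders; real numbers depend on the camera feed, the power mode, and clock settings. The GPU path assumes the cudabgsegm module from opencv_contrib was built.

# CPU vs GPU background subtraction (MOG2), timing only, synthetic frames
import time
import cv2
import numpy as np

frames = [np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8) for _ in range(30)]

# CPU path
cpu_bg = cv2.createBackgroundSubtractorMOG2()
t0 = time.time()
for f in frames:
    cpu_bg.apply(f)
print("CPU MOG2: %.1f FPS" % (len(frames) / (time.time() - t0)))

# GPU path (cv2.cuda module, default CUDA stream)
gpu_bg = cv2.cuda.createBackgroundSubtractorMOG2()
stream = cv2.cuda.Stream_Null()
gpu_frame = cv2.cuda_GpuMat()
t0 = time.time()
for f in frames:
    gpu_frame.upload(f)
    gpu_bg.apply(gpu_frame, -1.0, stream)
print("GPU MOG2: %.1f FPS" % (len(frames) / (time.time() - t0)))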

This custom-built OpenCV integrates with Python through the familiar cv2 API: the DNN module can be switched to the CUDA backend with a couple of lines, and most image-processing routines have GPU counterparts under cv2.cuda. Whether you are deploying deep learning models, processing high-resolution video streams, or performing complex computer vision tasks, this optimized installation lets the Orin NX operate closer to its full potential.

With this approach and using the provided script, developers can streamline the build process and focus on developing CUDA-powered applications that take full advantage of NVIDIA's cutting-edge hardware.

Performance gain

Motion tracking and optical flow run roughly 7–10x faster with CUDA. Moving-object detection gets about a 9x boost, e.g. from ~12 FPS up to ~100 FPS. Even for relatively simple tasks, CUDA provides a 5–10x speedup for basic image processing.
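For example, dense Farneback optical flow has a CUDA counterpart in the cudaoptflow module (built by the script above as part of opencv_contrib). The sketch below uses two synthetic grayscale frames in place of consecutive video frames; the parameters passed to create() are just the documented defaults.

# Dense Farneback optical flow on the GPU
import cv2
import numpy as np

prev_frame = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
next_frame = np.roll(prev_frame, 5, axis=1)   # fake motion: 5 px shift to the right

g_prev, g_next = cv2.cuda_GpuMat(), cv2.cuda_GpuMat()
g_prev.upload(prev_frame)
g_next.upload(next_frame)

# numLevels, pyrScale, fastPyramids, winSize, numIters, polyN, polySigma, flags
flow_engine = cv2.cuda_FarnebackOpticalFlow.create(5, 0.5, False, 13, 10, 5, 1.1, 0)
flow = flow_engine.calc(g_prev, g_next, None).download()
print("Flow field shape:", flow.shape)        # (720, 1280, 2): per-pixel dx, dy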

Wait, does NVIDIA’s TensorRT or DeepStream need OpenCV?

TensorRT is a library for optimized deep learning inference on GPUs. It does not need OpenCV, as it processes models (YOLO, ResNet, etc.) directly. You interact with TensorRT using Python (tensorrt package) or C++. If you need preprocessing (e.g., image resizing, normalization), OpenCV can help but isn’t required.
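As an illustration, this is the kind of preprocessing OpenCV handles well before handing a tensor to a TensorRT engine. The 640x640 input size, the RGB channel order, the 0..1 scaling, and the input.jpg filename are assumptions for a generic detector; use whatever your particular model expects. The TensorRT inference call itself (via the tensorrt Python package) is left out.

# Typical OpenCV preprocessing ahead of TensorRT inference (illustrative values)
import cv2
import numpy as np

frame = cv2.imread("input.jpg")                        # BGR, HxWx3, uint8
blob = cv2.resize(frame, (640, 640))                   # model input size (assumed)
blob = cv2.cvtColor(blob, cv2.COLOR_BGR2RGB)           # channel order (assumed)
blob = blob.astype(np.float32) / 255.0                 # scale to 0..1 (assumed)
blob = np.transpose(blob, (2, 0, 1))[np.newaxis, ...]  # HWC -> NCHW
print(blob.shape)                                      # (1, 3, 640, 640)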

DeepStream is a full pipeline for video analytics using TensorRT + GStreamer. If you use DeepStream’s GStreamer-based pipeline, OpenCV is optional. However, if you’re post-processing model output (e.g., drawing bounding boxes), OpenCV can be useful.
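A small sketch of that post-processing role: drawing detections on a frame with plain OpenCV. The detections list here is made up; in a real pipeline the boxes would come from DeepStream or TensorRT output converted to pixel coordinates.

# Draw hypothetical detections on a frame
import cv2

frame = cv2.imread("frame.jpg")
detections = [                     # (x1, y1, x2, y2, label, confidence), placeholder values
    (100, 80, 300, 260, "person", 0.91),
    (350, 120, 520, 300, "car", 0.84),
]

for x1, y1, x2, y2, label, conf in detections:
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, "%s %.2f" % (label, conf), (x1, y1 - 8),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("frame_annotated.jpg", frame)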

🤖🔎👀 Wishing you keen machine vision!

Technical specs sources:
a. https://developer.nvidia.com/cuda-gpus
b. https://developer.download.nvidia.com/assets/embedded/secure/jetson/orin_nx/docs/Jetson_Orin_NX_DS-10712-001_v0.5.pdf
c. https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
d. https://opencv.org/platforms/cuda/
