Hello World Tutorial

This tutorial guides you through creating a simple CV-CUDA application that performs basic image processing operations. The application is a Python script that demonstrates how to:

Load a batch of images into CV-CUDA
Resize the images
Apply a Gaussian blur
Save the results
Visualize the results

Prerequisites

NVIDIA GPU with compute capability 5.2 or newer.
Ubuntu 20.04, 22.04 or 24.04
CUDA 12 runtime with compatible NVIDIA driver.
Python 3.10
Python packages from samples/hello_world/python/requirements.txt

To run this tutorial, install the required Python packages, preferably in a virtual environment. This tutorial was written for Python 3.10, but newer versions of Python 3 may work.

Install the required pip packages listed in samples/hello_world/python/requirements.txt by running the following command from that directory:

pip3 install -r requirements.txt

Writing the Hello World App

The complete source code for this tutorial is in samples/hello_world/python/hello_world.py.

First, let’s import the necessary modules:

import argparse
import cvcuda
import cupy as cp
from nvidia import nvimgcodec
from matplotlib import pyplot as plt

Module argparse is used to implement the command line argument parsing.

Module cvcuda provides the CV-CUDA API.

Module cupy provides NumPy-compatible interfaces backed by CUDA.

Module nvimgcodec is used to load (decode) images from files and to encode (store) images to files.

Module pyplot is used to display images.

The main() function contains the logic for the app.

We start by loading all the images from files and stack them as a batch into a CV-CUDA tensor.

# Create the nvimgcodec decoder to load images.
decoder = nvimgcodec.Decoder()

print("Loading images...")

cv_tensors: cvcuda.Tensor = None
img_shape = None
for input_filename in inputs:

    # Open the input file and decode it into an image.
    # nvimgcodec supports jpeg, jpeg2000, tiff, bmp, png, pnm, webp image file formats.
    print(f"Loading image from {input_filename}")
    with open(input_filename, "rb") as in_file:
        data = in_file.read()
    # Decode the loaded image and store in the default CUDA device.
    # nvimgcodec decodes images into RGB uint8 HWC format.
    nv_gpu_img: nvimgcodec.Image = decoder.decode(data).cuda()

    # Wrap an existing CUDA buffer in a CVCUDA tensor.
    # CVCUDA supports (N)HWC image layout only.
    cv_tensor = cvcuda.as_tensor(nv_gpu_img, "HWC")

    # Add loaded image to batch:

    # Check that image sizes are the same.
    if img_shape:
        if img_shape != cv_tensor.shape:
            raise RuntimeError(
                f"All images in input must be of the same size: {img_shape} != {cv_tensor.shape}"
            )
    else:
        img_shape = cv_tensor.shape
    # Pack the loaded tensor into a batch (NHWC).
    cv_tensor = cv_tensor.reshape((1, *cv_tensor.shape), "NHWC")
    cv_tensors = (
        cvcuda.stack([cv_tensors, cv_tensor])
        if cv_tensors
        else cvcuda.stack([cv_tensor])
    )

Here, we use nvimgcodec.Decoder.decode() to decode each image file in the list of inputs into RGB uint8 HWC format, and then call cuda() to place the decoded image on the default CUDA device.

We convert the loaded nvimgcodec.Image into a cvcuda.Tensor, bringing the data into CV-CUDA.

For each input image, we stack it into a batch in a cvcuda.Tensor, converting it from HWC to NHWC where N is the batch size. In this tutorial, we require the images to be all of the same size (width and height) to fit them into a single cvcuda.Tensor with the NHWC layout.
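The same-size requirement can be sketched without a GPU by working on shape tuples alone. The helper below, batch_shape, is a hypothetical name introduced for illustration and is not part of CV-CUDA. It mirrors the size check in the loading loop and returns the NHWC shape that the stacked batch would have.

```python
def batch_shape(image_shapes):
    """Return the (N, H, W, C) shape for a list of equal (H, W, C) shapes.

    Hypothetical helper for illustration only; not a CV-CUDA function.
    """
    if not image_shapes:
        raise ValueError("At least one image shape is required")
    first = tuple(image_shapes[0])
    for shape in image_shapes[1:]:
        # Every image must match the first one in height, width and channels.
        if tuple(shape) != first:
            raise RuntimeError(
                f"All images in input must be of the same size: {first} != {tuple(shape)}"
            )
    # Prepend the batch dimension N to the shared HWC shape.
    return (len(image_shapes), *first)


# Two 480 x 640 RGB images fit into one NHWC batch.
print(batch_shape([(480, 640, 3), (480, 640, 3)]))
```

Passing shapes that differ, for example (480, 640, 3) and (224, 224, 3), raises the same RuntimeError as the loading loop.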

Note that we can perform the CV-CUDA operations directly on each cvcuda.Tensor as we obtain it without having to batch them. Batching here is used to illustrate how to operate more efficiently on batches of images.

Next we perform the image processing.

# The resulting cv_tensors has the NHWC layout with N = len(inputs).
assert cv_tensors.shape[0] == len(inputs)
print(cv_tensors.shape)

# Manipulate the tensor data in CVCUDA.

# Resize the tensors.
cv_tensors_result = cvcuda.resize(
    cv_tensors,
    (cv_tensors.shape[0], 224, 224, cv_tensors.shape[-1]),  # N, H, W, C
    interp=cvcuda.Interp.LINEAR,
)

# Apply a gaussian blur.
kernel_size = (3, 3)
gaussian_sigma = (1, 1)
cv_tensors_result = cvcuda.gaussian(
    cv_tensors_result, kernel_size, gaussian_sigma, cvcuda.Border.CONSTANT
)

Once the data is in a cvcuda.Tensor, we perform a resize to 224 x 224, followed by a Gaussian blur with a 3 x 3 kernel and a sigma of 1.
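To get a feel for what a 3 x 3 kernel with a sigma of 1 means, the weights can be worked out on the CPU. The sketch below uses the textbook normalized Gaussian and assumes the 2D kernel is the outer product of two 1D kernels; the function names are hypothetical, and the kernel that CV-CUDA computes internally may differ in detail.

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Normalized 1D Gaussian weights centered on the middle tap."""
    half = size // 2
    weights = [
        math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_kernel_2d(kernel_size, sigma):
    """Separable 2D kernel built as the outer product of two 1D kernels.

    Assumption for this sketch: kernel_size and sigma are (x, y) pairs.
    """
    kx = gaussian_kernel_1d(kernel_size[0], sigma[0])
    ky = gaussian_kernel_1d(kernel_size[1], sigma[1])
    return [[wy * wx for wx in kx] for wy in ky]


# Same parameters as in the tutorial: 3 x 3 kernel, sigma of 1.
for row in gaussian_kernel_2d((3, 3), (1, 1)):
    print([round(w, 4) for w in row])
```

With these parameters the 1D weights are roughly 0.274, 0.452 and 0.274, so the center pixel contributes about 20% of each blurred value and the weights sum to 1.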

Then, we retrieve the results from CV-CUDA and store them to the specified output files.

print("Storing images...")

# Create the nvimgcodec encoder to store images.
encoder = nvimgcodec.Encoder()

# Use cupy to separate the tensor batch.
# cvcuda.Tensor.cuda() returns the buffer with __cuda_array_interface__.
cp_array_result = cp.asarray(cv_tensors_result.cuda())
# Write each image to storage.
encoder.write(outputs, [cp_arr for cp_arr in cp_array_result])

We start by wrapping the cvcuda.Tensor in a cupy array. The cvcuda.Tensor object is opaque for performance purposes, so this step gives us access to the data of each resulting image, which we need in order to store it.

Then, we save the images to the specified files using the nvimgcodec.Encoder.write() method.

Finally, once we have the resulting images wrapped in a cupy.array, we can use pyplot to display them. We display the first image in the batch as an example.

# Use pyplot to display the first result.
print("Displaying the first result...")
plt.imshow(cp_array_result[0].get())
plt.show()

Running the Sample

To run the hello world example, first make sure the prerequisites are satisfied, then run:

python3 hello_world.py -i /path/to/image1.jpg /path/to/image2.jpg -o output1.jpg output2.jpg

This will:

Load your input images.
Apply image processing (resize and Gaussian blur).
Save results to output files (existing files will be overwritten).
Display the result of the first image.

Command Line Interface

--inputs, -i is used to input a list of image files to load into the app. These must all be of the same size (width and height). Only images in these formats are supported: jpeg, jpeg2000, tiff, bmp, png, pnm, or webp.
--outputs, -o is used to specify the names of the files where the resulting images will be stored. The number of output files must be the same as the number of input files.
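A command line interface with these two options could be built with argparse roughly as follows. This is a minimal sketch, not the parser from hello_world.py: the parse_args function, the help strings and the error message are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse --inputs/-i and --outputs/-o and check that their counts match."""
    parser = argparse.ArgumentParser(description="CV-CUDA hello world (CLI sketch)")
    parser.add_argument(
        "--inputs", "-i", nargs="+", required=True,
        help="Input image files, all of the same width and height",
    )
    parser.add_argument(
        "--outputs", "-o", nargs="+", required=True,
        help="Output image files, one per input file",
    )
    args = parser.parse_args(argv)
    # The tutorial requires exactly one output file per input file.
    if len(args.inputs) != len(args.outputs):
        parser.error("The number of output files must match the number of input files")
    return args


# An explicit argument list is passed here so the sketch does not read sys.argv.
args = parse_args(["-i", "image1.jpg", "image2.jpg", "-o", "output1.jpg", "output2.jpg"])
print(args.inputs, args.outputs)
```

Using nargs="+" lets each option accept one or more file names, and parser.error() prints the usage message and exits with a non-zero status when the counts differ.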

Next Steps

Now that you’ve completed the hello world tutorial, you can:

Try modifying the size values or the Gaussian parameters.
Add more image processing operations.
Explore other CV-CUDA operators.
Check out the more advanced samples in the Samples section.