Hello World

Overview

The Hello World sample demonstrates the fundamental workflow of CV-CUDA for GPU-accelerated computer vision processing. This sample shows how to:

  • Load images from disk into CV-CUDA tensors

  • Resize images to a target resolution

  • Batch multiple images into a single tensor

  • Apply GPU-accelerated Gaussian blur

  • Split the batch and save results back to disk

All operations are performed entirely on the GPU without copying data back to the host, demonstrating CV-CUDA’s zero-copy interoperability and efficient batch processing capabilities.

Usage

Run with default parameters (processes tabby_tiger_cat.jpg):

python3 hello_world.py

Process a single image with custom dimensions:

python3 hello_world.py -i input.jpg -o output.jpg --width 512 --height 512

Process multiple images in a single batch:

python3 hello_world.py -i img1.jpg img2.jpg img3.jpg -o out1.jpg out2.jpg out3.jpg

Command-Line Arguments

Argument   Short Form  Default                   Description
--inputs   -i          tabby_tiger_cat.jpg       Input image file path(s). Multiple images can be specified.
--outputs  -o          cvcuda/.cache/cat_hw.jpg  Output image file path(s). Must match number of inputs.
--width    (none)      224                       Target width for resized images
--height   (none)      224                       Target height for resized images
--kernel   -k          5                         Kernel size for Gaussian blur (must be odd)
--sigma    -s          1.0                       Sigma value for Gaussian blur
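
The arguments listed above could be declared with Python's standard argparse module roughly as follows. This is a minimal sketch, not the sample's actual parser: the names build_parser and validate are hypothetical, and the two checks only mirror the constraints stated in the descriptions (matching input/output counts, odd kernel size).

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="CV-CUDA Hello World (sketch)")
    parser.add_argument("-i", "--inputs", nargs="+", type=Path,
                        default=[Path("tabby_tiger_cat.jpg")],
                        help="Input image file path(s).")
    parser.add_argument("-o", "--outputs", nargs="+", type=Path,
                        default=[Path("cvcuda/.cache/cat_hw.jpg")],
                        help="Output image file path(s).")
    parser.add_argument("--width", type=int, default=224,
                        help="Target width for resized images.")
    parser.add_argument("--height", type=int, default=224,
                        help="Target height for resized images.")
    parser.add_argument("-k", "--kernel", type=int, default=5,
                        help="Kernel size for Gaussian blur (must be odd).")
    parser.add_argument("-s", "--sigma", type=float, default=1.0,
                        help="Sigma value for Gaussian blur.")
    return parser


def validate(args: argparse.Namespace) -> None:
    """Assumed checks, derived only from the argument descriptions."""
    if len(args.inputs) != len(args.outputs):
        raise ValueError("number of outputs must match number of inputs")
    if args.kernel < 1 or args.kernel % 2 == 0:
        raise ValueError("kernel size must be a positive odd integer")
```

Calling build_parser().parse_args([]) yields the defaults from the table, and validate() rejects, for example, two inputs paired with a single output.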

Implementation Details

The Hello World sample follows this processing pipeline:

  1. Image loading (decode from disk using nvImageCodec to CV-CUDA tensors)

  2. Resizing (to target dimensions)

  3. Batching (stack into single batched tensor)

  4. Gaussian blur

  5. Splitting (back to individual tensors)

  6. Saving (encode to disk)

All operations execute entirely on the GPU.

Code Walkthrough

Loading Images

import cvcuda
from nvidia import nvimgcodec

# 1. Load the images into CV-CUDA
decoder = nvimgcodec.Decoder()
images: list[nvimgcodec.Image] = [
    decoder.decode(str(i_path)) for i_path in input_paths
]
tensors: list[cvcuda.Tensor] = [
    cvcuda.as_tensor(image, "HWC") for image in images
]

Images are:

  • Decoded using nvImageCodec’s GPU decoder

  • Converted to CV-CUDA tensors with HWC (Height-Width-Channels) layout

  • Kept in GPU memory throughout

Resizing Images

# 2. Resize the images
resized_tensors: list[cvcuda.Tensor] = [
    cvcuda.resize(
        tensor,
        (args.height, args.width, 3),
        interp=cvcuda.Interp.LINEAR,
    )
    for tensor in tensors
]

Each image is resized to the target dimensions using linear (bilinear) interpolation for smooth results.
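
To illustrate what bilinear interpolation computes, the following CPU-only sketch samples a single-channel image at a fractional coordinate by blending the four surrounding pixels. It is an illustration of the general technique only: the function name bilinear_sample is hypothetical, and CV-CUDA's coordinate mapping and edge handling may differ from the simple clamping assumed here.

```python
def bilinear_sample(img: list[list[float]], y: float, x: float) -> float:
    """Sample a single-channel image at fractional (y, x) coordinates."""
    h, w = len(img), len(img[0])
    # Clamp the coordinates to the valid pixel range (assumed edge policy).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    # Indices of the four neighbouring pixels.
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    # Fractional offsets act as blend weights.
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

Sampling exactly halfway between four pixels returns their average, which is why linear interpolation gives smoother results than picking the nearest pixel.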

Batching Images

# 3. Batch all the images into a single CV-CUDA tensor
batch_tensor: cvcuda.Tensor = cvcuda.stack(resized_tensors)

The cvcuda.stack() operation combines individual HWC tensors into a single NHWC tensor, enabling efficient batched processing.
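
The shape bookkeeping behind this step can be sketched in plain Python. The helper stacked_shape is hypothetical and touches no GPU memory; it assumes, as the resize-before-batch order of the pipeline suggests, that every tensor must share the same HWC shape before it can be stacked.

```python
def stacked_shape(shapes: list[tuple[int, int, int]]) -> tuple[int, ...]:
    """Return the NHWC shape produced by stacking equally shaped HWC tensors."""
    if not shapes:
        raise ValueError("at least one tensor shape is required")
    first = shapes[0]
    for shape in shapes[1:]:
        if shape != first:
            # Assumed requirement: all inputs share one HWC shape.
            raise ValueError(f"shape mismatch: {shape} != {first}")
    # The new leading dimension N is the number of stacked tensors.
    return (len(shapes), *first)
```

For three images resized to the default 224×224 with 3 channels, stacked_shape([(224, 224, 3)] * 3) returns (3, 224, 224, 3).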

Applying Gaussian Blur

# 4. Apply a Gaussian blur on the batch
blurred_tensor_batch: cvcuda.Tensor = cvcuda.gaussian(
    batch_tensor,
    (args.kernel, args.kernel),
    (args.sigma, args.sigma),
    cvcuda.Border.CONSTANT,
)

The Gaussian blur operation is applied to the entire batch at once, with constant border handling for edge pixels.
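
To show how the kernel size and sigma arguments relate, the sketch below builds a normalized Gaussian kernel in pure Python using the textbook formula. The function names are hypothetical, and this is not necessarily how CV-CUDA computes its filter weights internally.

```python
import math


def gaussian_kernel_1d(size: int, sigma: float) -> list[float]:
    """Return normalized 1D Gaussian weights for an odd kernel size."""
    if size < 1 or size % 2 == 0:
        raise ValueError("kernel size must be a positive odd integer")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    half = size // 2
    # Unnormalized weights, centered on the middle tap.
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-half, half + 1)]
    total = sum(weights)
    # Normalize so the blur preserves overall brightness.
    return [w / total for w in weights]


def gaussian_kernel_2d(size: int, sigma: float) -> list[list[float]]:
    """Build the 2D kernel as the outer product of two 1D kernels."""
    k = gaussian_kernel_1d(size, sigma)
    return [[a * b for b in k] for a in k]
```

An odd size is required so that the kernel has a single center tap. With the sample's defaults, gaussian_kernel_2d(5, 1.0) gives a 5×5 grid of weights that sum to 1 and peak at the center.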

Splitting and Saving

# 5. Save the images back to disk
# 5.1 Split the batch into individual images
tensors: list[cvcuda.Tensor] = zero_copy_split(blurred_tensor_batch)

with timer("Write images to disk"):
    # 5.2 Encode the images back to disk
    encoder = nvimgcodec.Encoder()
    for tensor, output_path in zip(tensors, output_paths):
        nvc_img = nvimgcodec.as_image(tensor.cuda())
        encoder.write(str(output_path), nvc_img)

# 6. Verify output files exist
for output_path in output_paths:
    assert output_path.exists()
    print(f"Wrote image to {output_path}")

The zero_copy_split() helper function splits the batched tensor back to individual images without memory copying.
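
The implementation of zero_copy_split() is not shown here, but the idea of splitting without copying can be illustrated with a CPU analogy based on Python's memoryview. In this sketch (the name split_views is hypothetical), each per-image slice is a view into the same underlying buffer rather than a new allocation.

```python
def split_views(batch: bytearray, n_images: int) -> list[memoryview]:
    """Split a flat batch buffer into per-image views without copying."""
    if n_images <= 0 or len(batch) % n_images != 0:
        raise ValueError("batch size must be a positive multiple of n_images")
    image_size = len(batch) // n_images
    view = memoryview(batch)
    # Slicing a memoryview creates a new view object, not a new buffer.
    return [view[i * image_size:(i + 1) * image_size]
            for i in range(n_images)]
```

Because the views alias the batch buffer, a write through one of them is visible in the original buffer and vice versa. That aliasing is what "zero-copy" means in this context.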

Expected Output

With the default parameters, the output is a 224×224 image with a 5×5 Gaussian blur applied. The result is smoothed but preserves the major features and colors of the original image.

Figure: Original Input Image (tabby_tiger_cat.jpg)

Figure: Output: Resized and Blurred (cat_hw.jpg)

CV-CUDA Operators Used

Operator           Purpose
cvcuda.resize()    Resize images to target dimensions using bilinear interpolation
cvcuda.stack()     Combine multiple tensors into a batched tensor along a new dimension
cvcuda.gaussian()  Apply Gaussian blur filter for smoothing

Common Utilities Used

Recap

The Hello World sample introduced you to CV-CUDA’s core concepts:

GPU-Only Processing

All operations ran on the GPU. Each image was decoded from disk directly into GPU memory, processed entirely on the GPU, and encoded from GPU memory back to disk. No CPU-GPU memory copies were needed for the actual image data.

Batching for Efficiency

Images are converted to batch format (NHWC) to enable efficient parallel processing of multiple images.

Zero-Copy Interoperability

CV-CUDA integrates seamlessly with nvImageCodec for image I/O, using the __cuda_array_interface__ protocol for zero-copy data sharing between libraries.

Operator Chaining

The sample chained multiple operations (load → resize → stack → blur → split → save) in a pipeline, showing how CV-CUDA operators work together.

See Also

Next Steps

After mastering the Hello World sample, explore: