Hello World
Overview
The Hello World sample demonstrates the fundamental workflow of CV-CUDA for GPU-accelerated computer vision processing. The sample shows how to:
Load images from disk into CV-CUDA tensors
Resize images to a target resolution
Batch multiple images into a single tensor
Apply GPU-accelerated Gaussian blur
Split the batch and save results back to disk
All operations are performed entirely on the GPU without copying data back to the host, demonstrating CV-CUDA’s zero-copy interoperability and efficient batch processing capabilities.
Usage
Run with default parameters (processes tabby_tiger_cat.jpg):
python3 hello_world.py
Process a single image with custom dimensions:
python3 hello_world.py -i input.jpg -o output.jpg --width 512 --height 512
Process multiple images in a single batch:
python3 hello_world.py -i img1.jpg img2.jpg img3.jpg -o out1.jpg out2.jpg out3.jpg
Command-Line Arguments
| Argument | Short Form | Default | Description |
|---|---|---|---|
| --input | -i | tabby_tiger_cat.jpg | Input image file path(s). Multiple images can be specified. |
| --output | -o | cvcuda/.cache/cat_hw.jpg | Output image file path(s). Must match the number of inputs. |
| --width | | 224 | Target width for resized images |
| --height | | 224 | Target height for resized images |
| --kernel | | 5 | Kernel size for Gaussian blur (must be odd) |
| --sigma | | 1.0 | Sigma value for Gaussian blur |
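The argument parsing code itself is not shown in this walkthrough. The sketch below is one way the flags above could be declared with argparse; the option spellings for --width, --height, --kernel, and --sigma are inferred from the attribute names used later (args.width, args.kernel, and so on), so treat the details as illustrative rather than the sample's exact parser.

```python
# Hypothetical parser matching the table above; the sample's real parser may differ.
import argparse
from pathlib import Path

parser = argparse.ArgumentParser(description="CV-CUDA Hello World sample")
parser.add_argument("-i", "--input", nargs="+", type=Path,
                    default=[Path("tabby_tiger_cat.jpg")],
                    help="Input image file path(s)")
parser.add_argument("-o", "--output", nargs="+", type=Path,
                    default=[Path("cvcuda/.cache/cat_hw.jpg")],
                    help="Output image file path(s); must match number of inputs")
parser.add_argument("--width", type=int, default=224, help="Target width")
parser.add_argument("--height", type=int, default=224, help="Target height")
parser.add_argument("--kernel", type=int, default=5, help="Gaussian kernel size (odd)")
parser.add_argument("--sigma", type=float, default=1.0, help="Gaussian sigma")
args = parser.parse_args()
assert len(args.input) == len(args.output), "inputs and outputs must pair up"
```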
Implementation Details
The Hello World sample follows this processing pipeline:
Image loading (decode from disk using nvImageCodec to CV-CUDA tensors)
Resizing (to target dimensions)
Batching (stack into single batched tensor)
Gaussian blur
Splitting (back to individual tensors)
Saving (encode to disk)
All operations execute entirely on the GPU.
Code Walkthrough
Loading Images
# 1. Load the images into CV-CUDA
decoder = nvimgcodec.Decoder()
images: list[nvimgcodec.Image] = [
decoder.decode(str(i_path)) for i_path in input_paths
]
tensors: list[cvcuda.Tensor] = [
cvcuda.as_tensor(image, "HWC") for image in images
]
Images are:
Decoded using nvImageCodec’s GPU decoder
Converted to CV-CUDA tensors with HWC (Height-Width-Channels) layout
Kept in GPU memory throughout
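If you want to confirm what the decode step produced, inspecting the first tensor's metadata is enough. A minimal check, assuming at least one input image and a standard 8-bit RGB file:

```python
# The decoded tensor should report an HWC layout, a (height, width, 3) shape,
# and uint8 dtype for a typical 8-bit RGB image.
first = tensors[0]
print(first.layout, first.shape, first.dtype)
```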
Resizing Images
# 2. Resize the images
resized_tensors: list[cvcuda.Tensor] = [
cvcuda.resize(
tensor,
(args.height, args.width, 3),
interp=cvcuda.Interp.LINEAR,
)
for tensor in tensors
]
Each image is resized to the target dimensions using linear (bilinear) interpolation for smooth results.
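Linear (bilinear) interpolation is a reasonable default for downscaling to the target size; other interpolation modes can be swapped into the same call. A sketch using cubic interpolation instead, with the same target shape:

```python
# Same resize, but with cubic interpolation, a different quality/speed trade-off.
resized_cubic = [
    cvcuda.resize(
        tensor,
        (args.height, args.width, 3),
        interp=cvcuda.Interp.CUBIC,
    )
    for tensor in tensors
]
```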
Batching Images
# 3. Batch all the images into a single CV-CUDA tensor
batch_tensor: cvcuda.Tensor = cvcuda.stack(resized_tensors)
The cvcuda.stack() operation combines individual HWC tensors into a single NHWC tensor, enabling efficient batched processing.
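A quick way to see the effect of stacking is to print the batched tensor's layout and shape; the leading dimension is the number of images. A minimal check, with example output for three inputs:

```python
# The stacked tensor gains a leading batch dimension.
print(batch_tensor.layout, batch_tensor.shape)
# For three 224x224 RGB inputs, this prints roughly: NHWC (3, 224, 224, 3)
```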
Applying Gaussian Blur
# 4. Apply a Gaussian blur on the batch
blurred_tensor_batch: cvcuda.Tensor = cvcuda.gaussian(
batch_tensor,
(args.kernel, args.kernel),
(args.sigma, args.sigma),
cvcuda.Border.CONSTANT,
)
The Gaussian blur is applied to the entire batch simultaneously, with constant border handling for edge pixels.
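The border mode controls how pixels just outside the image are treated when the kernel overlaps an edge; other modes can be selected in the same call. A small sketch using replicate borders instead, with the same batch and parameters:

```python
# Same blur, but edge pixels are replicated outward instead of padding with a
# constant value, so no padding value is mixed into the border pixels.
blurred_replicate = cvcuda.gaussian(
    batch_tensor,
    (args.kernel, args.kernel),
    (args.sigma, args.sigma),
    cvcuda.Border.REPLICATE,
)
```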
Splitting and Saving
# 5. Save the images back to disk
# 5.1 Split the batch into individual images
tensors: list[cvcuda.Tensor] = zero_copy_split(blurred_tensor_batch)
with timer("Write images to disk"):
# 5.2 Encode the images back to disk
encoder = nvimgcodec.Encoder()
for tensor, output_path in zip(tensors, output_paths):
nvc_img = nvimgcodec.as_image(tensor.cuda())
encoder.write(str(output_path), nvc_img)
# 6. Verify output files exist
for output_path in output_paths:
assert output_path.exists()
print(f"Wrote image to {output_path}")
The zero_copy_split() helper function splits the batched tensor back to individual images without memory copying.
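zero_copy_split() itself lives in the sample's common utilities and is not reproduced here. The sketch below shows one way such a helper could be implemented, assuming CuPy is available: wrap the batch's GPU view, slice along the batch dimension (views, not copies), and rewrap each slice as a CV-CUDA tensor. The actual helper may be implemented differently.

```python
import cupy as cp
import cvcuda

def zero_copy_split_sketch(batch: cvcuda.Tensor) -> list[cvcuda.Tensor]:
    """Split an NHWC batch into per-image HWC tensors without copying."""
    # cp.asarray() consumes __cuda_array_interface__, so both the wrapper and
    # the per-image slices below are views of the original GPU buffer.
    view = cp.asarray(batch.cuda())
    return [cvcuda.as_tensor(view[i], "HWC") for i in range(view.shape[0])]
```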
Expected Output
With the default parameters, the output is a 224×224 image with a 5×5 Gaussian blur applied: smoothed, but with the major features and colors of the original image preserved.
Side-by-side comparison: original input image vs. resized and blurred output.
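If you want to verify the result programmatically, you can decode one of the written files again and check its spatial size, reusing the same decoder pattern as in step 1. A small sketch, assuming args and output_paths are still in scope:

```python
# Re-decode the first output file and confirm it has the target resolution.
check_img = nvimgcodec.Decoder().decode(str(output_paths[0]))
height, width = check_img.shape[0], check_img.shape[1]
assert (height, width) == (args.height, args.width)
```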
CV-CUDA Operators Used
| Operator | Purpose |
|---|---|
| cvcuda.resize | Resize images to target dimensions using bilinear interpolation |
| cvcuda.stack | Combine multiple tensors into a batched tensor along a new dimension |
| cvcuda.gaussian | Apply Gaussian blur filter for smoothing |
Common Utilities Used
zero_copy_split() - Split batched tensors efficiently
Recap
The Hello World sample introduced you to CV-CUDA’s core concepts:
- GPU-Only Processing
All operations happened on the GPU. The image was loaded from disk directly to GPU memory, processed entirely on the GPU, and saved from GPU memory back to disk. No CPU-GPU memory copies were needed for the actual image data.
- Batching for Efficiency
Images are converted to batch format (NHWC) to enable efficient parallel processing of multiple images.
- Zero-Copy Interoperability
CV-CUDA integrates seamlessly with nvImageCodec for image I/O, using the __cuda_array_interface__ protocol for zero-copy data sharing between libraries (a short sketch follows this list).
- Operator Chaining
The sample chained multiple operations (load → resize → stack → blur → split → save) in a pipeline, showing how CV-CUDA operators work together.
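To make the zero-copy point concrete: the GPU view returned by a tensor's cuda() method exposes __cuda_array_interface__ directly, which is the same mechanism nvimgcodec.as_image() relies on in the saving step. A minimal sketch, assuming the blurred batch from step 4:

```python
# The interface dictionary describes the GPU buffer (device pointer, shape,
# typestr, strides) that other libraries can wrap without copying.
gpu_view = blurred_tensor_batch.cuda()
print(gpu_view.__cuda_array_interface__)
```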
See Also
Gaussian Blur Operator Sample - Detailed Gaussian blur documentation
Resize Operator Sample - Detailed resize documentation
Stack Operator Sample - Detailed stack documentation
Common Utilities - Shared helper functions
Next Steps
After mastering the Hello World sample, explore:
Image Classification - End-to-end deep learning inference
Object Detection - Detection with bounding boxes
Semantic Segmentation - Pixel-level classification