Hello World
Overview
The Hello World sample demonstrates the fundamental workflow of CV-CUDA for GPU-accelerated computer vision processing. The sample shows how to:
Load images from disk into CV-CUDA tensors
Resize images to a target resolution
Batch multiple images into a single tensor
Apply GPU-accelerated Gaussian blur
Split the batch and save results back to disk
All operations are performed entirely on the GPU without copying data back to the host, demonstrating CV-CUDA’s zero-copy interoperability and efficient batch processing capabilities.
Usage
Run with default parameters (processes tabby_tiger_cat.jpg):
python3 hello_world.py
Process a single image with custom dimensions:
python3 hello_world.py -i input.jpg -o output.jpg --width 512 --height 512
Process multiple images in a single batch:
python3 hello_world.py -i img1.jpg img2.jpg img3.jpg -o out1.jpg out2.jpg out3.jpg
Command-Line Arguments
| Argument | Short Form | Default | Description |
|---|---|---|---|
| --input | -i | tabby_tiger_cat.jpg | Input image file path(s). Multiple images can be specified. |
| --output | -o | cvcuda/.cache/cat_hw.jpg | Output image file path(s). Must match the number of inputs. |
| --width | | 224 | Target width for resized images |
| --height | | 224 | Target height for resized images |
| --kernel | | 5 | Kernel size for Gaussian blur (must be odd) |
| --sigma | | 1.0 | Sigma value for Gaussian blur |
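The argument parsing code itself is not shown in this walkthrough. The sketch below is one way the flags above could be declared with argparse; the option spellings for --width, --height, --kernel, and --sigma are inferred from the attribute names used later (args.width, args.kernel, and so on), so treat the details as illustrative rather than the sample's exact parser.

```python
# Hypothetical parser matching the table above; the sample's real parser may differ.
import argparse
from pathlib import Path

parser = argparse.ArgumentParser(description="CV-CUDA Hello World sample")
parser.add_argument("-i", "--input", nargs="+", type=Path,
                    default=[Path("tabby_tiger_cat.jpg")],
                    help="Input image file path(s)")
parser.add_argument("-o", "--output", nargs="+", type=Path,
                    default=[Path("cvcuda/.cache/cat_hw.jpg")],
                    help="Output image file path(s); must match number of inputs")
parser.add_argument("--width", type=int, default=224, help="Target width")
parser.add_argument("--height", type=int, default=224, help="Target height")
parser.add_argument("--kernel", type=int, default=5, help="Gaussian kernel size (odd)")
parser.add_argument("--sigma", type=float, default=1.0, help="Gaussian sigma")
args = parser.parse_args()
assert len(args.input) == len(args.output), "inputs and outputs must pair up"
```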
Implementation Details
The Hello World sample follows this processing pipeline:
Image loading (decode from disk using nvImageCodec to CV-CUDA tensors)
Resizing (to target dimensions)
Batching (stack into single batched tensor)
Gaussian blur
Splitting (back to individual tensors)
Saving (encode to disk)
All operations execute entirely on the GPU.
Code Walkthrough
Loading Images
# 1. Load the images into CV-CUDA
decoder = nvimgcodec.Decoder()
images: list[nvimgcodec.Image] = [
decoder.decode(str(i_path)) for i_path in input_paths
]
tensors: list[cvcuda.Tensor] = [
cvcuda.as_tensor(image, "HWC") for image in images
]
Images are:
Decoded using nvImageCodec’s GPU decoder
Converted to CV-CUDA tensors with HWC (Height-Width-Channels) layout
Kept in GPU memory throughout
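If you want to confirm what the decode step produced, inspecting the first tensor's metadata is enough. A minimal check, assuming at least one input image and a standard 8-bit RGB file:

```python
# The decoded tensor should report an HWC layout, a (height, width, 3) shape,
# and uint8 dtype for a typical 8-bit RGB image.
first = tensors[0]
print(first.layout, first.shape, first.dtype)
```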
Resizing Images
# 2. Resize the images
resized_tensors: list[cvcuda.Tensor] = [
cvcuda.resize(
tensor,
(args.height, args.width, 3),
interp=cvcuda.Interp.LINEAR,
)
for tensor in tensors
]
Each image is resized to the target dimensions using linear (bilinear) interpolation for smooth results.
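Linear (bilinear) interpolation is a reasonable default for downscaling to the target size; other interpolation modes can be swapped into the same call. A sketch using cubic interpolation instead, with the same target shape:

```python
# Same resize, but with cubic interpolation, a different quality/speed trade-off.
resized_cubic = [
    cvcuda.resize(
        tensor,
        (args.height, args.width, 3),
        interp=cvcuda.Interp.CUBIC,
    )
    for tensor in tensors
]
```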
Batching Images
# 3. Batch all the images into a single CV-CUDA tensor
batch_tensor: cvcuda.Tensor = cvcuda.stack(resized_tensors)
The cvcuda.stack() operation combines individual HWC tensors into a single NHWC tensor, enabling efficient batched processing.
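A quick way to see the effect of stacking is to print the batched tensor's layout and shape; the leading dimension is the number of images. A minimal check, with example output for three inputs:

```python
# The stacked tensor gains a leading batch dimension.
print(batch_tensor.layout, batch_tensor.shape)
# For three 224x224 RGB inputs, this prints roughly: NHWC (3, 224, 224, 3)
```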
Applying Gaussian Blur
# 4. Apply a Gaussian blur on the batch
blurred_tensor_batch: cvcuda.Tensor = cvcuda.gaussian(
batch_tensor,
(args.kernel, args.kernel),
(args.sigma, args.sigma),
cvcuda.Border.CONSTANT,
)
The Gaussian blur is applied to the entire batch simultaneously, with constant border handling for edge pixels.
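The border mode controls how pixels just outside the image are treated when the kernel overlaps an edge; other modes can be selected in the same call. A small sketch using replicate borders instead, with the same batch and parameters:

```python
# Same blur, but edge pixels are replicated outward instead of padding with a
# constant value, so no padding value is mixed into the border pixels.
blurred_replicate = cvcuda.gaussian(
    batch_tensor,
    (args.kernel, args.kernel),
    (args.sigma, args.sigma),
    cvcuda.Border.REPLICATE,
)
```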
Splitting and Saving
# 5. Save the images back to disk
# 5.1 Split the batch into individual images
tensors: list[cvcuda.Tensor] = zero_copy_split(blurred_tensor_batch)
with timer("Write images to disk"):
# 5.2 Encode the images back to disk
encoder = nvimgcodec.Encoder()
for tensor, output_path in zip(tensors, output_paths):
nvc_img = nvimgcodec.as_image(tensor.cuda())
encoder.write(str(output_path), nvc_img)
# 6. Verify output files exist
for output_path in output_paths:
assert output_path.exists()
print(f"Wrote image to {output_path}")
The zero_copy_split() helper function splits the batched tensor back to individual images without memory copying.
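zero_copy_split() itself lives in the sample's common utilities and is not reproduced here. The sketch below shows one way such a helper could be implemented, assuming CuPy is available: wrap the batch's GPU view, slice along the batch dimension (views, not copies), and rewrap each slice as a CV-CUDA tensor. The actual helper may be implemented differently.

```python
import cupy as cp
import cvcuda

def zero_copy_split_sketch(batch: cvcuda.Tensor) -> list[cvcuda.Tensor]:
    """Split an NHWC batch into per-image HWC tensors without copying."""
    # cp.asarray() consumes __cuda_array_interface__, so both the wrapper and
    # the per-image slices below are views of the original GPU buffer.
    view = cp.asarray(batch.cuda())
    return [cvcuda.as_tensor(view[i], "HWC") for i in range(view.shape[0])]
```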
Expected Output
With the default parameters, the output is a 224×224 image with a 5×5 Gaussian blur applied: smoothed, but with the major features and colors of the original image preserved.
Side-by-side comparison: original input image vs. resized and blurred output.
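If you want to verify the result programmatically, you can decode one of the written files again and check its spatial size, reusing the same decoder pattern as in step 1. A small sketch, assuming args and output_paths are still in scope:

```python
# Re-decode the first output file and confirm it has the target resolution.
check_img = nvimgcodec.Decoder().decode(str(output_paths[0]))
height, width = check_img.shape[0], check_img.shape[1]
assert (height, width) == (args.height, args.width)
```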
CV-CUDA Operators Used
| Operator | Purpose |
|---|---|
| cvcuda.resize | Resize images to target dimensions using bilinear interpolation |
| cvcuda.stack | Combine multiple tensors into a batched tensor along a new dimension |
| cvcuda.gaussian | Apply Gaussian blur filter for smoothing |
Common Utilities Used
zero_copy_split() - Split batched tensors efficiently
Recap
The Hello World sample introduced you to CV-CUDA’s core concepts:
- GPU-Only Processing
All operations happened on the GPU. The image was loaded from disk directly to GPU memory, processed entirely on the GPU, and saved from GPU memory back to disk. No CPU-GPU memory copies were needed for the actual image data.
- Batching for Efficiency
Images are converted to batch format (NHWC) to enable efficient parallel processing of multiple images.
- Zero-Copy Interoperability
CV-CUDA integrates seamlessly with nvImageCodec for image I/O, using the __cuda_array_interface__ protocol for zero-copy data sharing between libraries (a short sketch follows this list).
- Operator Chaining
The sample chained multiple operations (load → resize → stack → blur → split → save) in a pipeline, showing how CV-CUDA operators work together.
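To make the zero-copy point concrete: the GPU view returned by a tensor's cuda() method exposes __cuda_array_interface__ directly, which is the same mechanism nvimgcodec.as_image() relies on in the saving step. A minimal sketch, assuming the blurred batch from step 4:

```python
# The interface dictionary describes the GPU buffer (device pointer, shape,
# typestr, strides) that other libraries can wrap without copying.
gpu_view = blurred_tensor_batch.cuda()
print(gpu_view.__cuda_array_interface__)
```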
See Also
Gaussian Blur Operator Sample - Detailed Gaussian blur documentation
Resize Operator Sample - Detailed resize documentation
Stack Operator Sample - Detailed stack documentation
Common Utilities - Shared helper functions
Next Steps
After mastering the Hello World sample, explore:
Image Classification - End-to-end deep learning inference
Object Detection - Detection with bounding boxes
Semantic Segmentation - Pixel-level classification