Utilities
The common.py module provides utilities for:

- Image I/O - GPU-accelerated reading and writing
- CUDA Memory - host-device memory transfers
- TensorRT - model inference wrapper
- Model Export - PyTorch to ONNX to TensorRT
All samples import from this module to avoid code duplication.
Module Location
File: samples/common.py
from common import (
    read_image,
    write_image,
    TRT,
    cuda_memcpy_h2d,
    cuda_memcpy_d2h,
    zero_copy_split,
    parse_image_args,
    get_cache_dir,
    engine_from_onnx,
    export_classifier_onnx,
    export_retinanet_onnx,
    export_segmentation_onnx,
)
Image I/O Functions
read_image()
def read_image(path: Path) -> cvcuda.Tensor
# path: Path to input image file (JPG, PNG, etc.)
# Returns: CV-CUDA tensor in HWC layout, uint8 data type
Load an image from disk directly into GPU memory using nvImageCodec for GPU-accelerated decoding.
Example:
from pathlib import Path
from common import read_image

image = read_image(Path("input.jpg"))
print(image.shape)  # (H, W, 3)
print(image.dtype)  # uint8
write_image()
def write_image(tensor: cvcuda.Tensor, path: Path) -> None
# tensor: CV-CUDA tensor in HWC layout
# path: Output file path (format determined by extension: .jpg, .png, etc.)
Save a CV-CUDA tensor as an image file using nvImageCodec.
Example:
from pathlib import Path
from common import write_image

# processed_image: a CV-CUDA tensor in HWC layout
write_image(processed_image, Path("output.jpg"))
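A typical round trip keeps the image on the GPU from decode to encode. In the sketch below, cvcuda.flip is a stand-in for any CV-CUDA operator:

from pathlib import Path

import cvcuda
from common import read_image, write_image

image = read_image(Path("input.jpg"))  # HWC uint8 tensor in GPU memory
flipped = cvcuda.flip(image, 1)  # 1 mirrors horizontally; result stays on the GPU
write_image(flipped, Path("flipped.jpg"))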
CUDA Memory Operations
cuda_memcpy_h2d()
def cuda_memcpy_h2d(
    host_array: np.ndarray,  # NumPy array on CPU
    device_array: int | dict | object,  # GPU pointer or CV-CUDA tensor
) -> None
Copy data from CPU (host) memory to GPU (device) memory.
Example:
import numpy as np
import cvcuda
from common import cuda_memcpy_h2d

# Upload normalization parameters
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
mean_tensor = cvcuda.Tensor((3,), np.float32)
cuda_memcpy_h2d(mean, mean_tensor.cuda())
cuda_memcpy_d2h()
def cuda_memcpy_d2h(
    device_array: int | dict | object,  # GPU pointer or CV-CUDA tensor
    host_array: np.ndarray,  # NumPy array on CPU (pre-allocated)
) -> None
Copy data from GPU (device) memory to CPU (host) memory.
Example:
import numpy as np
from common import cuda_memcpy_d2h

# Download inference results into a pre-allocated buffer
output = np.zeros((1, 1000), dtype=np.float32)
cuda_memcpy_d2h(output_tensor.cuda(), output)

# Now process on CPU
top_classes = np.argsort(output[0])[::-1][:5]
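Taken together, the two copy helpers form an upload-compute-download round trip. A minimal sketch, assuming cvcuda.Tensor allocates device memory as in the snippets above:

import numpy as np
import cvcuda
from common import cuda_memcpy_h2d, cuda_memcpy_d2h

# Upload host data into a freshly allocated GPU tensor
host_in = np.arange(6, dtype=np.float32)
gpu_tensor = cvcuda.Tensor((6,), np.float32)
cuda_memcpy_h2d(host_in, gpu_tensor.cuda())

# ... run CV-CUDA operators or TensorRT inference on gpu_tensor here ...

# Download into a pre-allocated host buffer and verify the round trip
host_out = np.zeros((6,), dtype=np.float32)
cuda_memcpy_d2h(gpu_tensor.cuda(), host_out)
assert np.array_equal(host_in, host_out)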
Tensor Utilities
zero_copy_split()
def zero_copy_split(batch: cvcuda.Tensor) -> list[cvcuda.Tensor]
# batch: Batched tensor with shape (N, ...) where N is batch size
# Returns: List of N tensors, each representing one item from the batch
Split a batched tensor into individual tensors without copying data (creates views into original memory).
Example:
import cvcuda
from common import zero_copy_split

# Stack images into a batch
batch = cvcuda.stack([img1, img2, img3])  # Shape: (3, H, W, C)

# Process the whole batch at once
processed = cvcuda.gaussian(batch, (5, 5), (1.0, 1.0))

# Split back into individual images (views, not copies)
images = zero_copy_split(processed)  # List of 3 tensors
for img in images:
    print(img.shape)  # (H, W, C)
Argument Parsing
parse_image_args()
def parse_image_args(default_output: str = "output.jpg") -> argparse.Namespace
# default_output: Default output filename
# Returns: Namespace with input, output, width, height attributes
Parse command-line arguments for image processing samples (--input, --output, --width, --height).
Example:
from common import parse_image_args, read_image, write_image

args = parse_image_args("processed.jpg")
input_image = read_image(args.input)
# ... process ...
write_image(result, args.output)
TensorRT Integration
TRT Class
class TRT:
    def __init__(self, engine_path: Path)
    # engine_path: Path to serialized TensorRT engine file (.trtmodel)

    def __call__(self, inputs: list[cvcuda.Tensor]) -> list[cvcuda.Tensor]
    # inputs: List of CV-CUDA tensors matching the engine's expected inputs
    # Returns: List of CV-CUDA tensors containing inference results
Wrapper class for TensorRT engine inference with CV-CUDA tensor support via __cuda_array_interface__.
Example:
from common import TRT, get_cache_dir

# Load a serialized TensorRT engine
model = TRT(get_cache_dir() / "resnet50.trtmodel")

# Run inference
input_tensors = [preprocessed_image]
output_tensors = model(input_tensors)

# Access results
logits = output_tensors[0]
engine_from_onnx()
def engine_from_onnx(
    onnx_path: Path,  # Path to ONNX model file
    engine_path: Path,  # Path where the TensorRT engine will be saved
    use_fp16: bool = True,  # Enable FP16 precision
    max_batch_size: int = 1,  # Maximum batch size to support
) -> None
Build a TensorRT engine from an ONNX model with optimizations (FP16, layer fusion, etc.).
Example:
from pathlib import Path
from common import engine_from_onnx

engine_from_onnx(
    Path("model.onnx"),
    Path("model.trtmodel"),
    use_fp16=True,
)
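Engine builds can take a while, so a common pattern is to build once into the cache directory and reuse the serialized engine on later runs. A sketch using only the helpers from this module:

from pathlib import Path

from common import TRT, engine_from_onnx, get_cache_dir

engine_path = get_cache_dir() / "model.trtmodel"
if not engine_path.exists():
    # First run: build and serialize the engine
    engine_from_onnx(Path("model.onnx"), engine_path, use_fp16=True)
model = TRT(engine_path)  # later runs load the cached engine directly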
Model Export Functions
export_classifier_onnx()
def export_classifier_onnx(
    model: torch.nn.Module,  # PyTorch model
    output_path: Path,  # Where to save the ONNX file
    input_shape: tuple[int, int, int],  # Model input shape (C, H, W)
    verbose: bool = False,  # Print export details
) -> None
Export a PyTorch classification model to ONNX format.
Example:
from pathlib import Path

import torchvision
from common import export_classifier_onnx

model = torchvision.models.resnet50(weights="DEFAULT")
export_classifier_onnx(
    model,
    Path("resnet50.onnx"),
    (3, 224, 224),
)
export_retinanet_onnx()
def export_retinanet_onnx(
    model: torch.nn.Module,  # PyTorch RetinaNet model
    output_path: Path,  # Output ONNX path
    input_shape: tuple[int, int, int],  # Input shape (C, H, W)
    score_threshold: float = 0.5,  # Confidence threshold for detections
    iou_threshold: float = 0.5,  # IoU threshold for NMS
    max_detections: int = 100,  # Maximum boxes to return
    verbose: bool = False,  # Print export details
) -> None
Export RetinaNet detection model with TensorRT EfficientNMS plugin to ONNX (includes GPU-accelerated NMS).
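Example (a sketch by analogy with the other exporters; torchvision's retinanet_resnet50_fpn and the 800x800 input size are illustrative choices, not the sample's required configuration):

from pathlib import Path

import torchvision
from common import export_retinanet_onnx

model = torchvision.models.detection.retinanet_resnet50_fpn(weights="DEFAULT")
export_retinanet_onnx(
    model,
    Path("retinanet.onnx"),
    (3, 800, 800),  # illustrative input size
    score_threshold=0.5,
    iou_threshold=0.5,
    max_detections=100,
)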
export_segmentation_onnx()
def export_segmentation_onnx(
    model: torch.nn.Module,  # PyTorch segmentation model
    output_path: Path,  # Output ONNX path
    input_shape: tuple[int, int, int],  # Input shape (C, H, W)
    verbose: bool = False,  # Print export details
) -> None
Export segmentation model (FCN, DeepLab, etc.) to ONNX.
Example:
from pathlib import Path

import torchvision
from common import export_segmentation_onnx

fcn = torchvision.models.segmentation.fcn_resnet101(weights="DEFAULT")
export_segmentation_onnx(
    fcn,
    Path("fcn.onnx"),
    (3, 224, 224),
)
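The export helpers, engine_from_onnx(), and the TRT class chain together into the PyTorch to ONNX to TensorRT flow described at the top of this page. A minimal end-to-end sketch, assuming the resulting classifier engine expects a single (1, 3, 224, 224) float32 input:

from pathlib import Path

import numpy as np
import torchvision
import cvcuda
from common import TRT, engine_from_onnx, export_classifier_onnx, get_cache_dir

cache = get_cache_dir()

# 1. PyTorch -> ONNX
model = torchvision.models.resnet50(weights="DEFAULT")
export_classifier_onnx(model, cache / "resnet50.onnx", (3, 224, 224))

# 2. ONNX -> TensorRT engine
engine_from_onnx(cache / "resnet50.onnx", cache / "resnet50.trtmodel")

# 3. Inference on a placeholder tensor (stands in for a preprocessed image)
trt_model = TRT(cache / "resnet50.trtmodel")
dummy = cvcuda.Tensor((1, 3, 224, 224), np.float32)
(logits,) = trt_model([dummy])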
Dependencies
The common module requires:

- cvcuda - CV-CUDA operators and tensors
- numpy - array operations on the host
- tensorrt - TensorRT inference
- torch - PyTorch, for model export
- nvimgcodec - GPU-accelerated image I/O
- cuda-python - CUDA runtime bindings
See Also
- Hello World Sample - uses the image I/O functions
- Classification Sample - uses the TensorRT utilities
- Applications - end-to-end pipelines
- Operators - individual operators
- Python API - core API reference