Utilities

The common.py module provides utilities for:

  • Image I/O - GPU-accelerated reading and writing

  • CUDA Memory - Host-device memory transfers

  • TensorRT - Model inference wrapper

  • Model Export - PyTorch to ONNX to TensorRT

All samples import from this module to avoid code duplication.

Module Location

File: samples/common.py

from common import (
     read_image,
     write_image,
     TRT,
     cuda_memcpy_h2d,
     cuda_memcpy_d2h,
     zero_copy_split,
     parse_image_args,
     get_cache_dir,
     engine_from_onnx,
     export_classifier_onnx,
     export_retinanet_onnx,
     export_segmentation_onnx,
)
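
These utilities compose into short GPU-resident pipelines. A minimal sketch of the read-process-write pattern (the gaussian call stands in for arbitrary processing, and the file names are illustrative):

from pathlib import Path

import cvcuda

from common import read_image, write_image

# Decode on the GPU, process, and encode back to disk
image = read_image(Path("input.jpg"))                 # HWC uint8 tensor on the GPU
blurred = cvcuda.gaussian(image, (5, 5), (1.0, 1.0))  # data never leaves the GPU
write_image(blurred, Path("output.jpg"))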

Image I/O Functions

read_image()

def read_image(path: Path) -> cvcuda.Tensor
    # path: Path to input image file (JPG, PNG, etc.)
    # Returns: CV-CUDA tensor in HWC layout, uint8 data type

Load an image from disk directly into GPU memory using nvImageCodec for GPU-accelerated decoding.

Example:

from pathlib import Path

from common import read_image

image = read_image(Path("input.jpg"))
print(image.shape)  # (H, W, 3)
print(image.dtype)  # uint8

write_image()

def write_image(tensor: cvcuda.Tensor, path: Path) -> None
    # tensor: CV-CUDA tensor in HWC layout
    # path: Output file path (format determined by extension: .jpg, .png, etc.)

Save a CV-CUDA tensor as an image file using nvImageCodec.

Example:

from pathlib import Path

from common import write_image

# processed_image: any CV-CUDA tensor in HWC layout
write_image(processed_image, Path("output.jpg"))

CUDA Memory Operations

cuda_memcpy_h2d()

def cuda_memcpy_h2d(
    host_array: np.ndarray,              # NumPy array on CPU
    device_array: int | dict | object    # Device pointer, __cuda_array_interface__ dict, or object exposing one (e.g. tensor.cuda())
) -> None

Copy data from CPU (host) memory to GPU (device) memory.

Example:

import cvcuda
import numpy as np

from common import cuda_memcpy_h2d

# Upload normalization parameters to the GPU
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
mean_tensor = cvcuda.Tensor((3,), np.float32)
cuda_memcpy_h2d(mean, mean_tensor.cuda())

cuda_memcpy_d2h()

def cuda_memcpy_d2h(
    device_array: int | dict | object,   # Device pointer, __cuda_array_interface__ dict, or object exposing one (e.g. tensor.cuda())
    host_array: np.ndarray               # Pre-allocated NumPy array on CPU
) -> None

Copy data from GPU (device) memory to CPU (host) memory.

Example:

import numpy as np

from common import cuda_memcpy_d2h

# Download inference results into a pre-allocated host buffer
# (output_tensor: GPU tensor produced by inference)
output = np.zeros((1, 1000), dtype=np.float32)
cuda_memcpy_d2h(output_tensor.cuda(), output)

# Now process on CPU
top_classes = np.argsort(output[0])[::-1][:5]

Tensor Utilities

zero_copy_split()

def zero_copy_split(batch: cvcuda.Tensor) -> list[cvcuda.Tensor]
    # batch: Batched tensor with shape (N, ...) where N is batch size
    # Returns: List of N tensors, each representing one item from the batch

Split a batched tensor into individual tensors without copying data; each returned tensor is a view into the original batch memory.

Example:

import cvcuda

from common import zero_copy_split

# Stack images (img1, img2, img3: HWC tensors of identical shape)
batch = cvcuda.stack([img1, img2, img3])  # Shape: (3, H, W, C)

# Process the whole batch at once
processed = cvcuda.gaussian(batch, (5, 5), (1.0, 1.0))

# Split back to individual images without copying
images = zero_copy_split(processed)  # List of 3 tensors
for img in images:
    print(img.shape)  # (H, W, C)

Argument Parsing

parse_image_args()

def parse_image_args(default_output: str = "output.jpg") -> argparse.Namespace
    # default_output: Default output filename
    # Returns: Namespace with input, output, width, height attributes

Parse command-line arguments for image processing samples (--input, --output, --width, --height).

Example:

from common import parse_image_args, read_image, write_image

args = parse_image_args("processed.jpg")
input_image = read_image(args.input)
# ... process ...
write_image(result, args.output)
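
A script built this way is then invoked as, for example, python sample.py --input photo.jpg --output processed.jpg --width 640 (sample.py is a stand-in for any sample script).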

TensorRT Integration

TRT Class

class TRT:
    def __init__(self, engine_path: Path)
        # engine_path: Path to serialized TensorRT engine file (.trtmodel)

    def __call__(self, inputs: list[cvcuda.Tensor]) -> list[cvcuda.Tensor]
        # inputs: List of CV-CUDA tensors matching engine's expected inputs
        # Returns: List of CV-CUDA tensors containing inference results

Wrapper class for TensorRT engine inference that consumes and produces CV-CUDA tensors via __cuda_array_interface__.

Example:

from common import TRT, get_cache_dir

# Load TensorRT engine
model = TRT(get_cache_dir() / "resnet50.trtmodel")

# Run inference (preprocessed_image: CV-CUDA tensor matching the engine's input)
input_tensors = [preprocessed_image]
output_tensors = model(input_tensors)

# Access results
logits = output_tensors[0]

engine_from_onnx()

def engine_from_onnx(
    onnx_path: Path,           # Path to ONNX model file
    engine_path: Path,         # Path where TensorRT engine will be saved
    use_fp16: bool = True,     # Enable FP16 precision
    max_batch_size: int = 1    # Maximum batch size to support
) -> None

Build a TensorRT engine from an ONNX model with optimizations (FP16, layer fusion, etc.).

Example:

from pathlib import Path

from common import engine_from_onnx

engine_from_onnx(
    Path("model.onnx"),
    Path("model.trtmodel"),
    use_fp16=True
)
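
Together with the export helpers in the next section, this covers the full PyTorch to ONNX to TensorRT path. A minimal sketch (the exists-check for caching and the resnet50.* file names are assumptions of this sketch, not documented behavior):

from pathlib import Path

import torchvision

from common import TRT, engine_from_onnx, export_classifier_onnx, get_cache_dir

cache = get_cache_dir()
onnx_path = cache / "resnet50.onnx"
engine_path = cache / "resnet50.trtmodel"

# Export and build only once; reuse the cached engine on later runs
if not engine_path.exists():
    model = torchvision.models.resnet50(weights='DEFAULT')
    export_classifier_onnx(model, onnx_path, (3, 224, 224))
    engine_from_onnx(onnx_path, engine_path, use_fp16=True)

model = TRT(engine_path)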

Model Export Functions

export_classifier_onnx()

def export_classifier_onnx(
    model: torch.nn.Module,           # PyTorch model
    output_path: Path,                # Where to save ONNX file
    input_shape: tuple[int, int, int], # Model input shape (C, H, W)
    verbose: bool = False             # Print export details
) -> None

Export a PyTorch classification model to ONNX format.

Example:

from pathlib import Path

import torchvision

from common import export_classifier_onnx

model = torchvision.models.resnet50(weights='DEFAULT')
export_classifier_onnx(
    model,
    Path("resnet50.onnx"),
    (3, 224, 224)
)

export_retinanet_onnx()

def export_retinanet_onnx(
    model: torch.nn.Module,           # PyTorch RetinaNet model
    output_path: Path,                # Output ONNX path
    input_shape: tuple[int, int, int], # Input shape (C, H, W)
    score_threshold: float = 0.5,     # Confidence threshold for detections
    iou_threshold: float = 0.5,       # IoU threshold for NMS
    max_detections: int = 100,        # Maximum boxes to return
    verbose: bool = False             # Print export details
) -> None

Export a RetinaNet detection model to ONNX with the TensorRT EfficientNMS plugin, so non-maximum suppression runs GPU-accelerated inside the engine.
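
Example (a sketch paralleling the classifier export; the torchvision constructor and the 800x800 input shape are illustrative assumptions, not values required by the function):

from pathlib import Path

import torchvision

from common import export_retinanet_onnx

model = torchvision.models.detection.retinanet_resnet50_fpn(weights='DEFAULT')
export_retinanet_onnx(
    model,
    Path("retinanet.onnx"),
    (3, 800, 800),        # assumed input size; match your deployment
    score_threshold=0.5,
    iou_threshold=0.5,
    max_detections=100,
)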

export_segmentation_onnx()

def export_segmentation_onnx(
    model: torch.nn.Module,           # PyTorch segmentation model
    output_path: Path,                # Output ONNX path
    input_shape: tuple[int, int, int], # Input shape (C, H, W)
    verbose: bool = False             # Print export details
) -> None

Export a segmentation model (FCN, DeepLab, etc.) to ONNX.

Example:

from pathlib import Path

import torchvision

from common import export_segmentation_onnx

fcn = torchvision.models.segmentation.fcn_resnet101(weights='DEFAULT')
export_segmentation_onnx(
    fcn,
    Path("fcn.onnx"),
    (3, 224, 224)
)

Dependencies

The common module requires:

  • cvcuda - CV-CUDA tensors and image processing operators

  • numpy - Array operations

  • tensorrt - TensorRT inference

  • torch - PyTorch for model export

  • nvimgcodec - Image I/O

  • cuda-python - CUDA runtime bindings

See Also