Data Types

CV-CUDA provides high-performance data type abstractions for computer vision operations.

This guide covers the core data types including Tensors, Images, and their batched variants.

Tensor vs. Image

CV-CUDA provides two primary data types:

Tensor - N-dimensional arrays for generic planar data
Image - 2D data type with color information, support for multi-planar formats, and data layouts

When to Use Each

Tensor

Use a Tensor when:

Generic N-dimensional data - Not specifically image/color data
Non-color information - Feature maps, masks, depth maps
Uniform data layout - All dimensions follow regular striding
ML pipeline integration - Tensors map naturally to ML frameworks like PyTorch

Image

Use an Image when:

Color-correct processing is required - Image carries Format metadata including color space, encoding, and transfer functions
Multi-planar formats - Formats like NV12 have planes with different dimensions (Y plane is full resolution, UV plane is half resolution)
Complex color formats - YUV420, YUV422, raw Bayer patterns, etc.

Note

Why Images exist separately from Tensors:

The Image class provides a more complete abstraction for image data with methods and properties directly related to image operations. Even when an image could technically be represented as a Tensor, using Image is preferred because it better maps to the underlying domain concept.

The Image class carries the Format along with its data, which allows operators to correctly interpret the image’s color content. Additionally, some image formats cannot be directly represented as Tensors at all, such as NV12, which has two planes with different dimensions - the Y plane at full resolution and the UV plane at half resolution. Such multi-planar formats with different plane dimensions are not possible within a single Tensor.
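To make the plane mismatch concrete, the following sketch computes the shape of each NV12 plane for a given image size. It is plain Python and does not call CV-CUDA; the helper name `nv12_plane_shapes` is hypothetical, and the sketch assumes even width and height.

```python
from __future__ import annotations


def nv12_plane_shapes(width: int, height: int) -> list[tuple[int, int, int]]:
    """Return (rows, cols, channels) for the two planes of an NV12 image."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    luma = (height, width, 1)  # Y plane: full resolution, one channel
    chroma = (height // 2, width // 2, 2)  # UV plane: half resolution, U and V interleaved
    return [luma, chroma]


planes = nv12_plane_shapes(1920, 1080)
same_shape = len(set(planes)) == 1
print(planes, same_shape)
```

Because the two shapes differ, one shape-and-strides description cannot cover both planes, which is why such data stays an Image.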

Key Differences

Aspect	Tensor	Image
Primary use	Generic N-D arrays	Color-correct image processing
Format metadata	Just DataType	Full ImageFormat with color info
Planes	N/A (no plane concept)	Supports multi-planar (different dims per plane)
Layouts	N-dimensional strided	Pitch-linear and block-linear

Tensor

Overview

A Tensor is an N-dimensional array with a uniform data type and layout. Tensor is flexible and can represent a wide variety of data, including uniform planar images, segmentation masks, feature maps, depth maps, and general numerical data.

Properties:

Shape (size of each dimension)
DataType (e.g., U8, F32)
TensorLayout (e.g., NHWC, NCHW, HWC)
Strides (byte offset between elements in each dimension)
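As an illustration of how shape, data type, and strides relate, the sketch below derives byte strides for a tightly packed tensor. It is plain Python rather than CV-CUDA code, the helper name `packed_strides` is hypothetical, and it assumes no row padding; a tensor allocated by CV-CUDA may use larger strides.

```python
from __future__ import annotations


def packed_strides(shape: tuple[int, ...], itemsize: int) -> tuple[int, ...]:
    """Byte strides of a densely packed array, innermost dimension last."""
    strides = []
    step = itemsize  # the innermost dimension advances by one element
    for extent in reversed(shape):
        strides.append(step)
        step *= extent  # each outer dimension skips a whole inner block
    return tuple(reversed(strides))


hwc_u8 = packed_strides((224, 224, 3), itemsize=1)
nhwc_f32 = packed_strides((10, 224, 224, 3), itemsize=4)
print(hwc_u8, nhwc_f32)
```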

Common `Tensor` Layouts

A TensorLayout describes the semantic meaning of each dimension in a Tensor using dimension labels. Each label indicates what kind of information that dimension represents (e.g., batch size, height, width, channels).

Standard layouts:

cvcuda.TensorLayout.NHWC - Number of samples, Height, Width, Channels - Batch dimension followed by spatial dimensions with interleaved channels
cvcuda.TensorLayout.NCHW - Number of samples, Channels, Height, Width - Batch dimension followed by channels, then spatial dimensions (common in ML frameworks)
cvcuda.TensorLayout.HWC - Height, Width, Channels - Single 2D image with packed channels in one plane
cvcuda.TensorLayout.CHW - Channels, Height, Width - Single 2D multi-planar image where each channel is in its own plane
cvcuda.TensorLayout.NW - Number of samples, Width - Batch of 1D rows with no height dimension

Standard dimension labels:

A label gives the semantic meaning of one tensor dimension. The standard labels are:

N - Batch/samples
C - Channels
H - Height
W - Width
D - Depth (3D spatial dimension)
F - Frames (temporal depth for video)
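The labels can be read as names for the positions in a shape. The following plain-Python sketch pairs each label with its extent and derives the axis permutation between two layouts; `describe` and `permutation` are hypothetical helpers, not part of the CV-CUDA API.

```python
from __future__ import annotations


def describe(layout: str, shape: tuple[int, ...]) -> dict[str, int]:
    """Map each dimension label to the extent of that dimension."""
    if len(layout) != len(shape):
        raise ValueError("layout and shape must have the same rank")
    return dict(zip(layout, shape))


def permutation(src: str, dst: str) -> tuple[int, ...]:
    """Axis order that rearranges a src-layout shape into dst layout."""
    if sorted(src) != sorted(dst):
        raise ValueError("layouts must use the same labels")
    return tuple(src.index(label) for label in dst)


dims = describe("NHWC", (10, 224, 224, 3))
order = permutation("NHWC", "NCHW")
print(dims, order)
```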

Creating Tensors

import cvcuda
import torch
import numpy as np


def main() -> None:
    # Basic tensor with explicit layout
    tensor1 = cvcuda.Tensor((224, 224, 3), np.uint8, layout="HWC")  # noqa: F841

    # Batch of images
    tensor2 = cvcuda.Tensor((10, 224, 224, 3), np.float32, layout="NHWC")  # noqa: F841

    # For image batch (infers NHWC layout from format)
    tensor3 = cvcuda.Tensor(  # noqa: F841
        nimages=5, imgsize=(640, 480), format=cvcuda.Format.RGB8
    )

    # With row alignment for optimized memory access
    tensor4 = cvcuda.Tensor(  # noqa: F841
        (224, 224, 3), np.uint8, layout="HWC", rowalign=32
    )  # Align rows to 32-byte boundaries

    # Generic N-D tensor
    tensor5 = cvcuda.Tensor((100, 50, 25), np.float32, layout="DHW")  # noqa: F841

    # Wrap existing torch tensor (zero-copy, NHWC)
    torch_tensor = torch.zeros((10, 224, 224, 3), dtype=torch.float32, device="cuda")
    cvcuda_tensor = cvcuda.as_tensor(torch_tensor, layout="NHWC")

    # Common ML layout: NCHW
    torch_nchw = torch.randn((4, 3, 256, 256), dtype=torch.float32, device="cuda")
    cvcuda_nchw = cvcuda.as_tensor(torch_nchw, layout="NCHW")  # noqa: F841

    # Bidirectional: CV-CUDA back to torch (also zero-copy)
    torch_output = torch.as_tensor(cvcuda_tensor.cuda(), device="cuda")  # noqa: F841

    # Video-like 5-D tensor (Batch, Depth, Height, Width, Channels); the D axis holds the 30 frames here
    video_tensor = cvcuda.Tensor(  # noqa: F841
        (2, 30, 720, 1280, 3), np.uint8, layout="NDHWC"
    )
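The `rowalign` argument used above requests rows that start on aligned byte boundaries. As a rough illustration of what that means for the row stride, the sketch below rounds the row size up to the next multiple of the alignment. It is plain Python, `aligned_row_stride` is a hypothetical helper, and the stride CV-CUDA actually chooses may be larger.

```python
def aligned_row_stride(width: int, channels: int, itemsize: int, rowalign: int) -> int:
    """Smallest multiple of rowalign that fits one packed row (in bytes)."""
    row_bytes = width * channels * itemsize
    return (row_bytes + rowalign - 1) // rowalign * rowalign


already_aligned = aligned_row_stride(224, 3, 1, 32)  # 672 bytes is a multiple of 32
needs_padding = aligned_row_stride(225, 3, 1, 32)  # 675 bytes is rounded up
print(already_aligned, needs_padding)
```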

Tensor Documentation

Refer to the following for more information:

Tensor documentation
TensorLayout documentation
Type documentation

Image

Overview

An Image in CV-CUDA is a 2D array of pixels, where each pixel is a unit of visual data composed of one or more color channels that may be stored across one or more planes. Image includes metadata about how the pixel data should be interpreted.

Properties:

Width and height dimensions
Format - encodes color model, color space, chroma subsampling, memory layout, data type, channel swizzle, and packing
Can have multiple planes (e.g., NV12 has 2 planes: Y and UV)

Common Image Formats

Simple RGB/BGR formats:

cvcuda.Format.RGB8 - 8-bit RGB interleaved - Red, Green, Blue channels packed together (RGBRGBRGB…) in a single plane, 8 bits per channel
cvcuda.Format.RGBA8 - 8-bit RGBA interleaved - Red, Green, Blue, Alpha channels packed together (RGBARGBA…) in a single plane, 8 bits per channel
cvcuda.Format.BGR8 - 8-bit BGR interleaved - Blue, Green, Red channels packed together (BGRBGRBGR…) in a single plane, 8 bits per channel (common in OpenCV)
cvcuda.Format.RGB8p - 8-bit RGB planar - Red, Green, Blue channels in separate planes (RRR…GGG…BBB…), 8 bits per channel

Single channel formats:

cvcuda.Format.U8 - 8-bit grayscale/single channel - Unsigned 8-bit integer, single plane
cvcuda.Format.F32 - 32-bit float single channel - 32-bit floating point, single plane (useful for depth maps, feature maps)

YUV formats (video/camera):

cvcuda.Format.NV12 - YUV420 semi-planar - 2 planes: full-resolution Y (luma) + half-resolution interleaved UV (chroma). Common in video codecs and cameras
cvcuda.Format.NV21 - YUV420 semi-planar (VU order) - 2 planes: full-resolution Y (luma) + half-resolution interleaved VU (chroma, opposite order from NV12)
cvcuda.Format.YUV8p - YUV444 planar - 3 separate full-resolution planes: Y, U, V. No chroma subsampling
cvcuda.Format.YUYV - YUV422 packed - Single plane with horizontally subsampled chroma (YUYV YUYV…)
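The difference between interleaved and planar storage can be seen in where a single channel value lands in memory. The sketch below computes byte offsets for 8-bit data in plain Python; both helpers are hypothetical, and they assume tightly packed rows and planes stored back to back, which a real allocation need not satisfy.

```python
def interleaved_offset(x: int, y: int, c: int, width: int, channels: int = 3) -> int:
    """Byte offset in an interleaved layout such as RGBRGBRGB..."""
    return (y * width + x) * channels + c


def planar_offset(x: int, y: int, c: int, width: int, height: int) -> int:
    """Byte offset in a planar layout such as RRR...GGG...BBB..."""
    return c * width * height + y * width + x


# Blue channel (c=2) of the pixel at x=1, y=0 in a 640x480 image
rgb8 = interleaved_offset(1, 0, 2, width=640)
rgb8p = planar_offset(1, 0, 2, width=640, height=480)
print(rgb8, rgb8p)
```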

Creating Images

You can create an Image by allocating it directly, by wrapping existing GPU buffers, or by loading it from disk:

from pathlib import Path

import cvcuda
import torch
from nvidia import nvimgcodec


def main() -> None:
    # Direct allocation (managed by CV-CUDA)
    img1 = cvcuda.Image((640, 480), cvcuda.Format.RGB8)  # noqa: F841

    # Zero-initialized
    img2 = cvcuda.Image.zeros((640, 480), cvcuda.Format.RGB8)  # noqa: F841

    # Wrapping GPU buffer (zero-copy)
    gpu_buffer = torch.zeros((480, 640, 3), device="cuda", dtype=torch.uint8)
    img3 = cvcuda.as_image(gpu_buffer, format=cvcuda.Format.RGB8)  # noqa: F841

    # Wrapping multiple GPU buffers (for planar formats)
    gpu_channels = [
        torch.zeros((480, 640), device="cuda", dtype=torch.uint8) for _ in range(3)
    ]
    img4 = cvcuda.as_image(gpu_channels, format=cvcuda.Format.RGB8p)  # noqa: F841

    # With row alignment for optimized memory access
    img5 = cvcuda.Image((1920, 1080), cvcuda.Format.RGB8, rowalign=32)  # noqa: F841

    # Loading images from disk with nvimgcodec - Create decoder
    decoder = nvimgcodec.Decoder()

    # Load image from disk
    image_path = (
        Path(__file__).parent / ".." / "assets" / "images" / "tabby_tiger_cat.jpg"
    )
    nic_img = decoder.read(str(image_path))

    # Convert to CV-CUDA Image
    cvcuda_img = cvcuda.as_image(nic_img, format=cvcuda.Format.RGB8)  # noqa: F841

Image Documentation

Refer to the following for more information:

Image documentation
Format documentation

Converting Between Image and Tensor

Image to Tensor

Image objects can be wrapped as Tensor objects when they meet certain requirements.

See as_tensor for more details.

Requirements for Image → Tensor conversion:

Must be pitch-linear
No chroma subsampling (must be 4:4:4)
All planes must have the same data type and size
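The three requirements can be summarized as a simple predicate. The sketch below is plain Python with hypothetical names (`PlaneInfo`, `can_wrap_as_tensor`); it only mirrors the rules listed above and does not reflect how CV-CUDA performs the check internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PlaneInfo:
    width: int
    height: int
    dtype: str


def can_wrap_as_tensor(planes: list[PlaneInfo], pitch_linear: bool, subsampling: str) -> bool:
    """Apply the three listed requirements to a description of an image."""
    if not pitch_linear:
        return False
    if subsampling != "4:4:4":
        return False
    # All planes must share one (width, height, dtype) combination
    return len(set(planes)) == 1


rgb8 = [PlaneInfo(640, 480, "u8")]
nv12 = [PlaneInfo(1920, 1080, "u8"), PlaneInfo(960, 540, "u8")]
print(can_wrap_as_tensor(rgb8, True, "4:4:4"), can_wrap_as_tensor(nv12, True, "4:2:0"))
```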

Note

Image planes can be discontiguous (independent buffers in memory). Tensor strides can represent this by encoding the address difference between planes as the stride of the outermost dimension. This allows wrapping discontiguous multi-planar images (e.g., planar formats such as planar RGBA) as tensors, provided all planes have identical dimensions and data types.
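A small sketch of the stride idea: given the base address of each plane, the outermost stride is the distance between consecutive planes. The code is plain Python with made-up addresses, and `plane_stride` is a hypothetical helper. It relies on the general property that one stride can only describe planes that are evenly spaced; how CV-CUDA handles other cases is not covered here.

```python
from __future__ import annotations


def plane_stride(plane_addresses: list[int]) -> int | None:
    """Return the common gap between consecutive planes, or None if there is none."""
    if len(plane_addresses) < 2:
        return None
    gaps = {b - a for a, b in zip(plane_addresses, plane_addresses[1:])}
    return gaps.pop() if len(gaps) == 1 else None


evenly_spaced = plane_stride([0x1000, 0x5000, 0x9000])
uneven = plane_stride([0x1000, 0x5000, 0x7000])
print(evenly_spaced, uneven)
```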

Example:

import cvcuda


def image_to_tensor() -> None:
    """Convert Image to Tensor with zero-copy wrapping."""
    # Valid conversion: pitch-linear format with no chroma subsampling
    image = cvcuda.Image((640, 480), cvcuda.Format.RGB8)
    tensor = cvcuda.as_tensor(image)  # noqa: F841

    # Valid conversion: single channel format
    grayscale = cvcuda.Image((640, 480), cvcuda.Format.U8)
    tensor_gray = cvcuda.as_tensor(grayscale)  # noqa: F841

    # Invalid conversion: NV12 has chroma subsampling
    try:
        nv12 = cvcuda.Image((1920, 1080), cvcuda.Format.NV12)
        tensor_nv12 = cvcuda.as_tensor(nv12)  # noqa: F841
        raise AssertionError("Expected RuntimeError for NV12 format")
    except RuntimeError as e:
        assert "sub-sampled" in str(e)

Tensor to Image

Tensor objects can be loaded into Image objects, but not directly. To do so, pass the foreign interface returned by the .cuda() method to the as_image function.

Note

While it is possible to load a Tensor into an Image, it is not recommended.

Example:

import cvcuda


def tensor_to_image() -> None:
    """Convert Tensor to Image using the foreign interface."""
    # Valid conversion: using the foreign interface (.cuda())
    tensor = cvcuda.Tensor((640, 480, 3), cvcuda.Type.U8, layout="HWC")
    image = cvcuda.as_image(tensor.cuda(), format=cvcuda.Format.RGB8)  # noqa: F841

    # Valid conversion: single channel tensor
    tensor_gray = cvcuda.Tensor((640, 480), cvcuda.Type.U8, layout="HW")
    image_gray = cvcuda.as_image(  # noqa: F841
        tensor_gray.cuda(), format=cvcuda.Format.U8
    )

    # Invalid conversion: passing tensor directly without .cuda()
    try:
        tensor_invalid = cvcuda.Tensor((640, 480, 3), cvcuda.Type.U8, layout="HWC")
        image_invalid = cvcuda.as_image(  # noqa: F841
            tensor_invalid, format=cvcuda.Format.RGB8
        )
        raise AssertionError("Expected TypeError when not using foreign interface")
    except TypeError as e:
        assert "ExternalBuffer" in str(e) or "buffer" in str(e).lower()

Batched Data Types

CV-CUDA provides three container types for grouping multiple Image or Tensor objects for batch processing.

TensorBatch

TensorBatch is a container that holds multiple Tensors with varying shapes.

Requirements:

All Tensors must have the same data type
All Tensors must have the same rank (number of dimensions)
All Tensors must have the same layout
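Before filling a batch, these requirements can be checked on plain descriptions of the tensors. The sketch below is plain Python; `tensor_batch_problems` is a hypothetical helper that works on (shape, dtype, layout) tuples and does not touch CV-CUDA objects.

```python
from __future__ import annotations


def tensor_batch_problems(specs: list[tuple[tuple[int, ...], str, str]]) -> list[str]:
    """List which TensorBatch requirements a set of tensor descriptions violates."""
    problems = []
    if len({dtype for _, dtype, _ in specs}) > 1:
        problems.append("dtype")
    if len({len(shape) for shape, _, _ in specs}) > 1:
        problems.append("rank")
    if len({layout for _, _, layout in specs}) > 1:
        problems.append("layout")
    return problems


ok = tensor_batch_problems([
    ((100, 100, 3), "uint8", "HWC"),
    ((150, 200, 3), "uint8", "HWC"),  # shapes may differ, rank may not
])
mixed = tensor_batch_problems([
    ((100, 100, 3), "uint8", "HWC"),
    ((100, 100, 3), "float32", "HWC"),
])
print(ok, mixed)
```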

Creating TensorBatch

import cvcuda
import numpy as np


def main() -> None:
    # TensorBatch - all tensors must have same rank, dtype, and layout
    batch = cvcuda.TensorBatch(capacity=10)
    tensor1 = cvcuda.Tensor((100, 100, 3), np.uint8, "HWC")
    tensor2 = cvcuda.Tensor((150, 200, 3), np.uint8, "HWC")
    tensor3 = cvcuda.Tensor((200, 150, 3), np.uint8, "HWC")
    batch.pushback([tensor1, tensor2, tensor3])

    # Different datatypes (each batch needs same dtype)
    batch_float = cvcuda.TensorBatch(capacity=5)
    t_float1 = cvcuda.Tensor((100, 100, 3), np.float32, "HWC")
    t_float2 = cvcuda.Tensor((120, 80, 3), np.float32, "HWC")
    batch_float.pushback([t_float1, t_float2])

    batch_int16 = cvcuda.TensorBatch(capacity=5)
    t_int1 = cvcuda.Tensor((50, 50, 1), np.int16, "HWC")
    batch_int16.pushback([t_int1])

ImageBatch

ImageBatch is a container for Images with uniform dimensions and formats.

Requirements:

All Images must have the same size (width and height)
All Images must have the same format

Creating ImageBatch

import cvcuda

# The Python API only exports the VarShape variant, so make an alias
ImageBatch = cvcuda.ImageBatchVarShape


def main() -> None:
    # ImageBatch requires all images to have the same dimensions and format
    batch = ImageBatch(capacity=10)
    img1 = cvcuda.Image((640, 480), cvcuda.Format.RGB8)
    img2 = cvcuda.Image((640, 480), cvcuda.Format.RGB8)
    batch.pushback([img1, img2])

    # Other formats go in separate batches (one format per batch, same dimensions)
    batch_rgba = ImageBatch(capacity=5)
    img_rgba = cvcuda.Image((640, 480), cvcuda.Format.RGBA8)  # 4 channels
    batch_rgba.pushback([img_rgba])

    batch_float = ImageBatch(capacity=5)
    img_float = cvcuda.Image((640, 480), cvcuda.Format.RGBf32)  # float32 type
    batch_float.pushback([img_float])

Note

ImageBatch in Python: There is no separate ImageBatch class in the Python API. For uniform image batches, use either a Tensor with cvcuda.TensorLayout.NHWC layout or ImageBatchVarShape. The ImageBatchVarShape class handles both uniform image batches (all same size/format) and variable-shape batches (mixed sizes/formats).

ImageBatchVarShape

ImageBatchVarShape is a container for Images with variable shapes and formats, providing maximum flexibility.

Requirements:

No requirements - each Image can have different dimensions and formats

Creating ImageBatchVarShape

import cvcuda


def main() -> None:
    # ImageBatchVarShape - can mix different sizes AND formats
    batch = cvcuda.ImageBatchVarShape(capacity=10)
    img1 = cvcuda.Image((640, 480), cvcuda.Format.RGB8)  # uint8, 3 channels
    img2 = cvcuda.Image((1280, 720), cvcuda.Format.RGBA8)  # uint8, 4 channels
    img3 = cvcuda.Image((800, 600), cvcuda.Format.BGR8)  # uint8, 3 channels
    batch.pushback([img1, img2, img3])

    # Can even mix datatypes in same batch
    batch_mixed = cvcuda.ImageBatchVarShape(capacity=5)
    img_uint8 = cvcuda.Image((640, 480), cvcuda.Format.RGB8)  # uint8
    img_float32 = cvcuda.Image((320, 240), cvcuda.Format.RGBf32)  # float32
    img_gray = cvcuda.Image((800, 600), cvcuda.Format.U8)  # grayscale
    batch_mixed.pushback([img_uint8, img_float32, img_gray])

Batch Types Comparison

Feature	`TensorBatch`	`ImageBatch`	`ImageBatchVarShape`
Data Type	`Tensor`	`Image`	`Image`
Shape Flexibility	Different shapes (same rank)	All same size	Different sizes
Format Flexibility	Same dtype/layout	Same format	Different formats
Restrictions	Same rank, dtype, layout	Same size, format	None
Use Case	Variable-size feature maps	Uniform image batches	Mixed image sizes/formats
Python API	`TensorBatch`	Use `Tensor` (NHWC) or `ImageBatchVarShape`	`ImageBatchVarShape`
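The "Python API" row of the table can be read as a small decision rule for image data. The sketch below expresses it in plain Python; `suggest_container` is a hypothetical helper operating on (size, format name) tuples, and it assumes that a uniform batch uses a format that can be represented as a Tensor.

```python
from __future__ import annotations


def suggest_container(images: list[tuple[tuple[int, int], str]]) -> str:
    """Suggest a Python container for a list of ((width, height), format) entries."""
    if not images:
        raise ValueError("at least one image is required")
    uniform = len(set(images)) == 1  # same size and same format everywhere
    if uniform:
        return "Tensor (NHWC) or ImageBatchVarShape"
    return "ImageBatchVarShape"


same = suggest_container([((640, 480), "RGB8"), ((640, 480), "RGB8")])
varied = suggest_container([((640, 480), "RGB8"), ((1280, 720), "RGBA8")])
print(same, varied)
```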

Note

The TensorBatch, ImageBatch and ImageBatchVarShape classes are not interchangeable.
