PyNvVideoCodec

PyNvVideoCodec provides Python bindings to NVIDIA’s hardware-accelerated video codec APIs (NVDEC/NVENC). It decodes and encodes video on the GPU, which makes it a good fit for video processing pipelines built with CV-CUDA.

Key Points:

Hardware-accelerated video decoding (H.264, HEVC, etc.)
Decodes directly to GPU memory
Batch frame decoding for better throughput
Hardware-accelerated encoding for output
Minimal CPU involvement in the entire pipeline

Required Imports:

from __future__ import annotations

from pathlib import Path

import cvcuda

import PyNvVideoCodec as nvc

Set up PyNvVideoCodec:

# setup paths
video_path = (
    Path(__file__).parent.parent
    / "assets"
    / "videos"
    / "pexels-chiel-slotman-4423925-1920x1080-25fps.mp4"
)
cvcuda_root = Path(__file__).parent.parent.parent
output_dir = cvcuda_root / ".cache"
output_dir.mkdir(parents=True, exist_ok=True)
output_path = output_dir / "pexels-chiel-slotman-640x480.h264"
device_id = 0

# create decoder and encoder
decoder = nvc.SimpleDecoder(
    str(video_path),
    gpu_id=device_id,
    output_color_type=nvc.OutputColorType.RGB,
    need_scanned_stream_metadata=False,
    max_width=1920,
    max_height=1080,
    use_device_memory=True,
)
encoder = nvc.CreateEncoder(
    width=640,
    height=480,
    fmt="NV12",
    codec="h264",
    usecpuinputbuffer=False,
)

The decoder is configured to output RGB frames directly to device memory (use_device_memory=True). The encoder accepts NV12 frames at the target resolution (640×480) and, with usecpuinputbuffer=False, reads its input from GPU memory rather than from a host buffer.

Read video frames into CV-CUDA tensors and process:

processed_frames: list[cvcuda.Tensor] = []
frame_idx = 0
num_frames = decoder.get_stream_metadata().num_frames
while frame_idx < num_frames:
    batch_size = min(10, num_frames - frame_idx)
    frame_chunk = decoder.get_batch_frames(batch_size)
    for frame in frame_chunk:
        # get the CV-CUDA tensor from the PyNvVideoCodec DecodedFrame
        cvcuda_tensor = cvcuda.as_tensor(frame, "HWC")
        # process the frame with CV-CUDA
        # importantly since CV-CUDA did not copy from the PyNvVideoCodec buffer
        # we must use the buffer before we read the next batch of frames
        # here we process one frame at a time for simplicity
        resized = cvcuda.resize(cvcuda_tensor, (480, 640, 3), cvcuda.Interp.LINEAR)
        nv12_frame = cvcuda.cvtcolor(resized, cvcuda.ColorConversion.RGB2YUV_NV12)
        # store the new CV-CUDA tensor (resize and cvtcolor make new buffers)
        processed_frames.append(nv12_frame)
        frame_idx += 1

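Frames are decoded in batches of up to 10 for efficiency. Each frame is wrapped as a CV-CUDA tensor with HWC (Height × Width × Channels) layout. The wrapper does not copy the decoder's buffer, so each frame must be consumed before the next batch is decoded.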
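The batching arithmetic in the loop above can be checked without a GPU. The following is a minimal sketch in plain Python; plan_batches is a hypothetical helper introduced only for illustration and is not part of PyNvVideoCodec or CV-CUDA. It mirrors the min(10, num_frames - frame_idx) logic and returns the sequence of batch sizes that would be requested.

```python
from __future__ import annotations


def plan_batches(num_frames: int, max_batch: int = 10) -> list[int]:
    """Return the batch sizes requested when reading num_frames frames.

    Hypothetical helper: it only mirrors the loop arithmetic of the
    decoding example and does not touch the decoder itself.
    """
    if num_frames < 0 or max_batch <= 0:
        raise ValueError("num_frames must be >= 0 and max_batch must be > 0")

    sizes: list[int] = []
    frame_idx = 0
    while frame_idx < num_frames:
        # The final batch is smaller when num_frames is not a multiple of max_batch.
        batch_size = min(max_batch, num_frames - frame_idx)
        sizes.append(batch_size)
        frame_idx += batch_size
    return sizes


if __name__ == "__main__":
    print(plan_batches(25))
```

Capping the last request at the number of remaining frames avoids asking the decoder for more frames than the stream metadata reports.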

CV-CUDA operations can be applied to each frame. Here we resize from 1920×1080 to 640×480 (the resize shape is given in HWC order, so it reads (480, 640, 3)) and convert from RGB to NV12 (YUV 4:2:0 semi-planar) to match the encoder’s expected format. Alternatively, you could use cvcuda.stack() to combine the frames into a single batched tensor and apply the operations to it. For clarity, we process each frame individually here.
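To make the NV12 target format concrete, the sketch below computes the size of one 8-bit NV12 frame: a full-resolution luma (Y) plane followed by a plane of interleaved U and V samples subsampled by two in both dimensions. nv12_layout is a hypothetical helper, not a CV-CUDA or PyNvVideoCodec function, and it assumes even dimensions and no row padding (pitch equal to width), which real GPU buffers may not satisfy.

```python
from __future__ import annotations


def nv12_layout(width: int, height: int) -> dict[str, int]:
    """Describe an unpadded 8-bit NV12 frame of the given size.

    Hypothetical helper for illustration; assumes pitch == width.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even width and height")

    # Y plane: one byte per pixel.
    y_bytes = width * height
    # UV plane: (width / 2) * (height / 2) chroma positions, two bytes (U, V) each.
    uv_bytes = (width // 2) * (height // 2) * 2
    return {
        "y_bytes": y_bytes,
        "uv_bytes": uv_bytes,
        "total_bytes": y_bytes + uv_bytes,
        # Viewed as a single-channel image, the frame has height * 3 / 2 rows.
        "rows": height * 3 // 2,
    }


if __name__ == "__main__":
    print(nv12_layout(640, 480))
```

For the 640×480 target used here, the frame occupies 1.5 bytes per pixel, half of what the 3-channel RGB input needs.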

Encode processed frames:

with output_path.open("wb") as f:
    for frame in processed_frames:
        bitstream = encoder.Encode(frame)
        f.write(bytearray(bitstream))
    bitstream = encoder.EndEncode()
    f.write(bytearray(bitstream))

The processed frames are encoded back to video. The encoder produces compressed bitstreams that are written to the output file. Note that EndEncode() must be called to flush any remaining frames.
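The need for the final flush can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not the NVENC implementation: FakeEncoder is a hypothetical class that holds back a fixed number of frames before emitting output, which is one simple way an encoder can end up with data still buffered after the last Encode() call. Only the method names Encode and EndEncode are taken from the example above.

```python
from __future__ import annotations

from typing import Iterable


class FakeEncoder:
    """Hypothetical stand-in for an encoder with internal latency."""

    def __init__(self, delay: int = 2) -> None:
        self._delay = delay
        self._pending: list[bytes] = []

    def Encode(self, frame: bytes) -> bytes:
        # Queue the frame; emit the oldest one only once the queue exceeds the delay.
        self._pending.append(frame)
        if len(self._pending) > self._delay:
            return self._pending.pop(0)
        return b""

    def EndEncode(self) -> bytes:
        # Flush everything that is still queued.
        out = b"".join(self._pending)
        self._pending.clear()
        return out


def encode_all(encoder: FakeEncoder, frames: Iterable[bytes], flush: bool = True) -> bytes:
    """Run the same encode loop as the example, optionally skipping the flush."""
    out = bytearray()
    for frame in frames:
        out += encoder.Encode(frame)
    if flush:
        out += encoder.EndEncode()
    return bytes(out)


if __name__ == "__main__":
    frames = [b"a", b"b", b"c", b"d"]
    print(encode_all(FakeEncoder(), frames))
    print(encode_all(FakeEncoder(), frames, flush=False))
```

In this toy model, skipping the flush silently drops the last two frames, which is the failure mode the EndEncode() call guards against.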

Typical Use Cases:

Video analytics preprocessing
Real-time video transformation
Video transcoding pipelines
Video quality enhancement
Object detection/segmentation on video

Complete Example: See samples/interoperability/pynvvideocodec_interop.py