Interoperability

CV-CUDA provides interoperability with various GPU-accelerated Python frameworks through the CUDA Array Interface protocol. This standard protocol enables efficient zero-copy data exchange between CV-CUDA and other libraries, allowing you to:

  • Convert tensors between frameworks without copying data

  • Build end-to-end GPU pipelines that combine multiple libraries

  • Leverage CV-CUDA’s optimized computer vision operations within your existing workflows

  • Move data between CPU and GPU seamlessly

The key to this interoperability is the __cuda_array_interface__ property, which CV-CUDA tensors expose via the .cuda() method. This property provides metadata about the GPU buffer (pointer, shape, dtype, strides) that other frameworks can use to create their own tensor views of the same memory.
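
The following sketch shows that round trip in both directions, using CuPy as the counterpart framework (the shapes and dtype are illustrative; cvcuda.as_tensor and .cuda() are the CV-CUDA calls described above, and cp.asarray is standard CuPy):

import cupy as cp
import cvcuda

# Any object exposing __cuda_array_interface__ (here, a CuPy array)
# can be wrapped by CV-CUDA without a copy.
cp_image = cp.zeros((480, 640, 3), dtype=cp.uint8)
cv_tensor = cvcuda.as_tensor(cp_image, "HWC")

# .cuda() exposes the same GPU buffer back through
# __cuda_array_interface__, so CuPy can view it without a copy.
cp_view = cp.asarray(cv_tensor.cuda())

# Zero-copy: both views reference the same device pointer.
assert (cp_view.__cuda_array_interface__["data"][0]
        == cp_image.__cuda_array_interface__["data"][0])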

Setting Up the Environment

Use the provided installation script to automatically detect your CUDA version and install all required dependencies:

cd samples
./install_interop_dependencies.sh

This script will:

  • Detect your CUDA version (12 or 13)

  • Create a virtual environment at venv_samples

  • Install all required dependencies for interoperability samples (PyTorch, CuPy, PyCUDA, PyNvVideoCodec, CV-CUDA, etc.)

After installation, activate the virtual environment:

source venv_samples/bin/activate
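
As a quick sanity check that the environment is usable (the package list mirrors the dependencies named above):

python -c "import torch, cupy, cvcuda; print('interop dependencies OK')"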

Frameworks

CV-CUDA interoperates with the following frameworks through the CUDA Array Interface protocol:

  • PyTorch

  • CuPy

  • PyCUDA

  • CUDA Python

  • PyNvVideoCodec

  • NvImgCodec

Best Practices

  1. Memory Management:

    • Be aware of whether tensors share memory (zero-copy) or are copied

    • When converting from CV-CUDA to PyTorch, call .clone() if you need a copy that does not share the underlying buffer (see the first sketch after this list)

    • Ensure CUDA buffers are not freed while other frameworks still reference them

  2. Data Layout:

    • CV-CUDA uses HWC (Height × Width × Channels) layout by default for images

    • PyTorch typically uses CHW (Channels × Height × Width) layout; use .permute() to convert (see the layout sketch after this list)

    • Be explicit about layout when converting with cvcuda.as_tensor() (e.g., cvcuda.as_tensor(obj, "HWC"))

  3. Device Management:

    • Ensure all operations occur on the same GPU device

    • Use appropriate CUDA streams for concurrent operations

    • PyTorch, CuPy, and PyCUDA each have their own stream management; coordinate streams when frameworks share buffers (see the stream sketch after this list)

  4. Performance:

    • Use batch operations when possible (e.g., PyNvVideoCodec batch decoding)

    • Minimize CPU-GPU transfers

    • Decode/encode directly to/from GPU memory with NvImgCodec/PyNvVideoCodec

    • Consider memory alignment for optimal performance

  5. Error Handling:

    • Check return codes when using CUDA Python directly

    • Validate tensor shapes and dtypes after conversion

    • Handle codec errors appropriately in pipelines
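
The first sketch below illustrates the shared-buffer versus copy distinction from item 1. torch.as_tensor accepts any object exposing __cuda_array_interface__; the tensor contents are illustrative:

import torch
import cvcuda

# Wrap a PyTorch GPU tensor with CV-CUDA: zero-copy, shared buffer.
torch_img = torch.zeros((480, 640, 3), dtype=torch.uint8, device="cuda")
cv_tensor = cvcuda.as_tensor(torch_img, "HWC")

# Viewing the CV-CUDA tensor from PyTorch still shares the buffer:
# writes through either view are visible in the other.
shared_view = torch.as_tensor(cv_tensor.cuda(), device="cuda")

# .clone() makes an independent copy, so the original buffer can be
# released without invalidating this tensor.
independent = shared_view.clone()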
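
The layout sketch for item 2 converts a CV-CUDA HWC tensor to the CHW layout most PyTorch models expect (the .contiguous() call is an assumption about downstream ops needing packed memory, not a CV-CUDA requirement):

import torch
import cvcuda

hwc = torch.zeros((480, 640, 3), dtype=torch.uint8, device="cuda")
cv_tensor = cvcuda.as_tensor(hwc, "HWC")  # be explicit about layout

# Back to PyTorch, then HWC -> CHW.
chw = torch.as_tensor(cv_tensor.cuda(), device="cuda").permute(2, 0, 1)

# permute() returns a strided view; call .contiguous() if a packed
# CHW buffer is required downstream.
chw = chw.contiguous()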
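
Finally, the stream sketch for item 3 keeps PyTorch and CV-CUDA work on one CUDA stream. This assumes cvcuda.Stream exposes a raw handle and works as a context manager, and uses PyTorch's torch.cuda.ExternalStream; cvcuda.flip stands in as a representative operator. Verify these against your installed versions:

import torch
import cvcuda

# Expose a CV-CUDA stream to PyTorch so both frameworks enqueue work
# on the same underlying CUDA stream.
cv_stream = cvcuda.Stream()
torch_stream = torch.cuda.ExternalStream(cv_stream.handle)

with cv_stream, torch.cuda.stream(torch_stream):
    img = torch.zeros((480, 640, 3), dtype=torch.uint8, device="cuda")
    cv_tensor = cvcuda.as_tensor(img, "HWC")
    # CV-CUDA operators also accept an explicit stream argument.
    flipped = cvcuda.flip(cv_tensor, flipCode=0, stream=cv_stream)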