Semantic Segmentation Pre-processing Pipeline using CVCUDA

CVCUDA accelerates the pre-processing pipeline of the semantic segmentation sample by running each step on the GPU. Its interoperability with PyTorch tensors also makes it easy to integrate with PyTorch and with other data loaders that support the same tensor layout.

The exact pre-processing operations are:

Tensor Conversion -> Resize -> Convert Datatype (float32, scaled to the 0-1 range) -> Normalize (mean and stddev) -> Convert to NCHW
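As a rough CPU reference for what these stages compute, the sketch below reproduces them with NumPy only. It is not the CVCUDA implementation: the function name `preprocess_reference` is hypothetical, nearest-neighbour sampling stands in for the linear interpolation used by the sample, and the mean/stddev values are illustrative ImageNet-style numbers that are not taken from this sample. `out_size` is treated as (width, height), matching how the resize call in this section indexes it.

```python
import numpy as np


def preprocess_reference(frame_nhwc, out_size, mean, stddev):
    """NumPy stand-in for the stages above; out_size is (width, height)."""
    n, h, w, c = frame_nhwc.shape
    out_w, out_h = out_size

    # Resize: nearest-neighbour sampling, used only to keep the sketch short.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    resized = frame_nhwc[:, rows][:, :, cols]

    # Convert datatype: uint8 -> float32 in the 0-1 range.
    scaled = resized.astype(np.float32) / 255.0

    # Normalize: subtract the per-channel mean, divide by the per-channel stddev.
    normalized = (scaled - mean) / stddev

    # Convert layout: NHWC -> NCHW.
    return resized, np.ascontiguousarray(normalized.transpose(0, 3, 1, 2))


# Illustrative inputs (assumed values, not taken from the sample).
frame = np.full((2, 4, 6, 3), 255, dtype=np.uint8)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 1, 3)
stddev = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 1, 1, 3)

resized, normalized = preprocess_reference(frame, (3, 2), mean, stddev)
print(resized.shape, normalized.shape, normalized.dtype)
# (2, 2, 3, 3) (2, 3, 2, 3) float32
```

A reference like this can be handy for sanity-checking shapes and value ranges before moving the same steps to the GPU.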

The tensor conversion operation converts non-CVCUDA tensors/data to CVCUDA tensors.

# Need to check what type of input we have received:
# 1) CVCUDA tensor --> Nothing needs to be done.
# 2) Numpy Array --> Convert to torch tensor first and then CVCUDA tensor
# 3) Torch Tensor --> Convert to CVCUDA tensor
if isinstance(frame_nhwc, torch.Tensor):
    frame_nhwc = cvcuda.as_tensor(frame_nhwc, "NHWC")
elif isinstance(frame_nhwc, np.ndarray):
    frame_nhwc = cvcuda.as_tensor(
        torch.as_tensor(frame_nhwc).to(
            device="cuda:%d" % self.device_id, non_blocking=True
        ),
        "NHWC",
    )
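The conversion above labels its input as "NHWC", which suggests a 4-dimensional array with a leading batch axis. If the caller holds a single HWC image instead, a batch axis would presumably have to be added before the conversion. The helper below is a minimal NumPy sketch of that idea; `ensure_nhwc` is a hypothetical name and is not part of the sample or of CVCUDA.

```python
import numpy as np


def ensure_nhwc(array):
    """Hypothetical helper: return a 4-D NHWC view of an HWC or NHWC array."""
    if array.ndim == 3:
        # Single HWC image: add a batch axis of size 1 at the front.
        return array[np.newaxis, ...]
    if array.ndim == 4:
        # Already batched: assume the layout is NHWC and pass it through.
        return array
    raise ValueError(f"expected 3 or 4 dimensions, got {array.ndim}")


single_image = np.zeros((480, 640, 3), dtype=np.uint8)
batch = ensure_nhwc(single_image)
print(batch.shape)
# (1, 480, 640, 3)
```

Indexing with `np.newaxis` returns a view, so no pixel data is copied at this step.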

The rest of the pipeline code is easy to follow, since it uses only basic operations such as resize and normalize.

# Resize the tensor to a different size.
# NOTE: This resize runs after the data has been converted to an NHWC tensor,
#       so every frame/image in the batch already has the same height and width,
#       unlike a Python list of HWC tensors.
#       The resize therefore only downscales the whole batch to a fixed size; it
#       cannot bring images of different sizes to a common size. If you have a
#       folder of images with differing sizes, run this sample with a batch size
#       of 1 so that each image is resized individually.
resized = cvcuda.resize(
    frame_nhwc,
    (
        frame_nhwc.shape[0],
        out_size[1],
        out_size[0],
        frame_nhwc.shape[3],
    ),
    cvcuda.Interp.LINEAR,
)

# Convert to floating point range 0-1.
normalized = cvcuda.convertto(resized, np.float32, scale=1 / 255)

# Normalize with mean and std-dev.
normalized = cvcuda.normalize(
    normalized,
    base=self.mean_tensor,
    scale=self.stddev_tensor,
    flags=cvcuda.NormalizeFlags.SCALE_IS_STDDEV,
)

# Convert it to NCHW layout and return it.
normalized = cvcuda.reformat(normalized, "NCHW")

self.cvcuda_perf.pop_range()

# Return 3 pieces of information:
#   1. The original nhwc frame
#   2. The resized frame
#   3. The normalized frame.
return (
    frame_nhwc,
    resized,
    normalized,
)
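The note on the resize step says that a single NHWC batch cannot hold images of different sizes, and recommends a batch size of 1 in that case. The NumPy sketch below illustrates why: stacking mixed-size HWC images fails, while wrapping each image in its own batch of one works. The helper name `as_single_image_batches` is hypothetical and only serves this illustration.

```python
import numpy as np


def as_single_image_batches(images):
    """Wrap each HWC image in its own NHWC batch of size 1."""
    return [image[np.newaxis, ...] for image in images]


images = [
    np.zeros((4, 6, 3), dtype=np.uint8),
    np.zeros((5, 7, 3), dtype=np.uint8),
]

# Stacking into one NHWC batch requires identical heights and widths.
try:
    np.stack(images)
    stack_failed = False
except ValueError:
    stack_failed = True

batches = as_single_image_batches(images)
print(stack_failed, [batch.shape for batch in batches])
# True [(1, 4, 6, 3), (1, 5, 7, 3)]
```

Each batch of one can then go through the resize step on its own and come out at the common target size.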