Object Detection Pre-processing Pipeline using CVCUDA

CVCUDA accelerates the pre-processing pipeline of the object detection sample by running it on the GPU. Its interoperability with PyTorch tensors also makes it easy to integrate with PyTorch and with other data loaders that support the tensor layout.

The exact pre-processing operations are:

Tensor Conversion -> Resize -> Convert Datatype(Float) -> Normalize (to 0-1 range) -> Convert to NCHW
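
For intuition about the last three stages, here is a minimal CPU-only sketch in NumPy of what Convert Datatype, Normalize, and Convert to NCHW do to a batch. It is not CVCUDA code: the function name `reference_preprocess` is hypothetical, the resize stage is left out, and the sketch assumes a uint8 NHWC input.

```python
import numpy as np


def reference_preprocess(batch_nhwc: np.ndarray) -> np.ndarray:
    """CPU reference for: Convert Datatype(Float) -> Normalize -> NCHW.

    Hypothetical helper for illustration only; resize is omitted.
    """
    if batch_nhwc.ndim != 4:
        raise ValueError("expected a 4-D NHWC batch")
    # Convert to float32 and scale 0-255 pixel values into the 0-1 range.
    normalized = batch_nhwc.astype(np.float32) * (1.0 / 255.0)
    # Move the channel axis forward: (N, H, W, C) -> (N, C, H, W).
    return np.ascontiguousarray(normalized.transpose(0, 3, 1, 2))


batch = np.full((2, 4, 6, 3), 255, dtype=np.uint8)  # 2 white 4x6 RGB frames
out = reference_preprocess(batch)
print(out.shape, out.dtype)
```

A reference like this can be handy for checking the shape and value range of the GPU pipeline's output on a small batch.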

The tensor conversion operation converts non-CVCUDA tensors or data to CVCUDA tensors.

# Need to check what type of input we have received:
# 1) CVCUDA tensor --> Nothing needs to be done.
# 2) Numpy Array --> Convert to torch tensor first and then CVCUDA tensor
# 3) Torch Tensor --> Convert to CVCUDA tensor
if isinstance(frame_nhwc, torch.Tensor):
    frame_nhwc = cvcuda.as_tensor(frame_nhwc, "NHWC")
    has_copy = False
elif isinstance(frame_nhwc, np.ndarray):
    has_copy = True  # noqa: F841
    frame_nhwc = cvcuda.as_tensor(
        torch.as_tensor(frame_nhwc).to(
            device="cuda:%d" % self.device_id, non_blocking=True
        ),
        "NHWC",
    )
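
Because the conversion above labels the data as "NHWC", a single HWC image held in a NumPy array would presumably need a leading batch dimension first. A minimal sketch, assuming the input is either HWC or already NHWC; `ensure_nhwc` is a hypothetical helper, not part of CVCUDA or the sample:

```python
import numpy as np


def ensure_nhwc(frame: np.ndarray) -> np.ndarray:
    """Return a 4-D NHWC view of an HWC image or an NHWC batch (hypothetical helper)."""
    if frame.ndim == 3:
        # Single HWC image: add a batch axis of size 1 in front.
        return frame[np.newaxis, ...]
    if frame.ndim == 4:
        # Already a batch; assumed to be in NHWC order.
        return frame
    raise ValueError(f"expected 3 or 4 dimensions, got {frame.ndim}")


image = np.zeros((480, 640, 3), dtype=np.uint8)
print(ensure_nhwc(image).shape)  # (1, 480, 640, 3)
```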

The rest of the pipeline code is easy to follow, since it uses only basic operations such as resize and normalize.

# Resize the tensor to a different size.
# NOTE: This resize is done after the data has been converted to an NHWC tensor format.
#       That means the height and width of the frames/images are already the same, unlike
#       a Python list of HWC tensors.
#       This resize only helps downscale the batch to a fixed size; it does not
#       help resize images with different sizes to a fixed size. If you have a folder
#       full of images with all different sizes, it would be best to run this sample with
#       a batch size of 1. That way, this resize operation will be able to resize all the images.
resized = cvcuda.resize(
    frame_nhwc,
    (
        frame_nhwc.shape[0],
        out_size[1],
        out_size[0],
        frame_nhwc.shape[3],
    ),
    cvcuda.Interp.LINEAR,
)

# Convert to floating point range 0-1.
normalized = cvcuda.convertto(resized, np.float32, scale=1 / 255)

# Convert it to NCHW layout and return it.
normalized = cvcuda.reformat(normalized, "NCHW")

self.cvcuda_perf.pop_range()

# Return 3 pieces of information:
#   1. The original NHWC frame
#   2. The resized frame
#   3. The normalized frame.
return (
    frame_nhwc,
    resized,
    normalized,
)
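
The shape tuple passed to the resize call above keeps the batch and channel counts and swaps in the target height and width. A minimal sketch of that arithmetic in plain Python, assuming `out_size` is ordered (width, height) as the indexing in the sample suggests; `resize_output_shape` is a hypothetical name:

```python
from typing import Sequence, Tuple


def resize_output_shape(
    input_shape: Sequence[int], out_size: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Build the NHWC output shape for a batched resize (hypothetical helper).

    Assumes input_shape is (N, H, W, C) and out_size is (width, height).
    """
    if len(input_shape) != 4:
        raise ValueError("expected an NHWC shape with 4 entries")
    batch, _, _, channels = input_shape
    out_width, out_height = out_size
    # Batch size and channel count are preserved; only H and W change.
    return (batch, out_height, out_width, channels)


print(resize_output_shape((4, 1080, 1920, 3), (960, 544)))  # (4, 544, 960, 3)
```

After the final layout conversion, the same batch would have the shape (N, C, out_height, out_width).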