Semantic Segmentation Post-processing Pipeline using CVCUDA

CVCUDA accelerates the post-processing pipeline of the semantic segmentation sample by running its image operations on the GPU. Because CVCUDA tensors interoperate with PyTorch tensors, the pipeline also integrates easily with PyTorch and with other data loaders that support the same tensor layout.

The exact post-processing operations are:

Create the class mask -> Upscale the mask -> Blur the input frames -> Smooth the mask with a joint bilateral filter -> Composite the original and blurred frames using the mask
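The final step can be pictured as a per-pixel blend between the sharp frame and its blurred copy, weighted by the mask. The pure-Python sketch below assumes that cvcuda.composite performs a standard mask-weighted blend in which a mask value of 255 keeps the foreground (the original frame) and 0 keeps the background (the blurred frame). The helper names composite_pixel and composite are hypothetical, and the real operator works on GPU tensors rather than nested lists.

```python
def composite_pixel(fg, bg, mask):
    """Blend one RGB pixel: mask 255 keeps fg, mask 0 keeps bg.

    Assumption: this mirrors a standard mask-weighted blend; it is not
    CVCUDA's actual kernel.
    """
    alpha = mask / 255.0  # map the uint8 mask value to a weight in [0, 1]
    return tuple(round(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg, bg))


def composite(fg_img, bg_img, mask_img):
    """Apply composite_pixel to every pixel of H x W images stored as nested lists.

    fg_img, bg_img: H x W lists of (R, G, B) tuples.
    mask_img: H x W list of mask values in 0-255.
    """
    return [
        [composite_pixel(f, b, m) for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg_img, bg_img, mask_img)
    ]
```

An intermediate mask value such as 128 yields a pixel roughly halfway between the two inputs, which is what gives a soft transition at the class boundary instead of a hard cut.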

Since the network outputs class probabilities (0-1) for every class it supports, we must first extract the channel for the class of interest and scale its values by 255 to bring them into the uint8 range (0-255). These operations are done with PyTorch math, and the resulting tensor is then converted to a CVCUDA tensor.

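```python
import torch
import cvcuda

# We assume that everything other than `probabilities` is a CVCUDA tensor.
# `probabilities` has to be a torch tensor because we need to perform a few
# math operations on it. Even if the TensorRT backend was used to run inference,
# it would have generated its output as a torch.Tensor.

actual_batch_size = resized_tensor.shape[0]

# Select the class of interest: (N, C, H, W) -> (N, H, W).
class_probs = probabilities[:actual_batch_size, class_index, :, :]
# Add a trailing channel dimension: (N, H, W) -> (N, H, W, 1), i.e. NHWC.
class_probs = torch.unsqueeze(class_probs, dim=-1)
# Scale [0, 1] probabilities to [0, 255]. The out-of-place multiply avoids
# overwriting `probabilities`, because the slice above is a view into it.
class_probs = class_probs * 255
class_probs = class_probs.type(torch.uint8)

cvcuda_class_masks = cvcuda.as_tensor(class_probs.cuda(), "NHWC")
```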
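To check the indexing and arithmetic of the block above without a GPU, the same steps can be mirrored on nested Python lists. This is a minimal sketch: the function name class_mask_uint8 is hypothetical, the input is an N x C x H x W nested list standing in for the probabilities tensor, and we assume the .type(torch.uint8) cast truncates in-range values toward zero, as Python's int() does.

```python
def class_mask_uint8(probabilities, class_index, batch_size):
    """Mirror of the PyTorch steps above, on plain nested lists.

    probabilities: N x C x H x W nested list of floats in [0, 1].
    Returns a batch_size x H x W x 1 nested list of ints in [0, 255] (NHWC).
    """
    masks = []
    for sample in probabilities[:batch_size]:  # like [:actual_batch_size]
        channel = sample[class_index]          # like [..., class_index, :, :]
        masks.append(
            # [value] adds the trailing channel axis (like unsqueeze(dim=-1));
            # int() truncates toward zero (assumed to match the uint8 cast).
            [[[int(p * 255)] for p in row] for row in channel]
        )
    return masks
```

Because of the truncation, a probability of 0.999 maps to 254 rather than 255.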

The rest of the pipeline code is straightforward to follow.
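The listing below upscales the mask with cvcuda.Interp.NEAREST but the blurred frames with cvcuda.Interp.LINEAR. The difference is easiest to see on a single row of pixels. This sketch uses one common sampling convention (floor-based sampling for nearest, pixel-center alignment with clamped borders for linear); the function names are hypothetical, and CVCUDA's exact sampling positions may differ.

```python
import math


def resize_nearest_1d(row, out_len):
    """Nearest neighbor: each output sample copies exactly one input sample."""
    scale = len(row) / out_len
    return [row[min(int(i * scale), len(row) - 1)] for i in range(out_len)]


def resize_linear_1d(row, out_len):
    """Linear: each output sample mixes its two closest input samples."""
    scale = len(row) / out_len
    out = []
    for i in range(out_len):
        x = (i + 0.5) * scale - 0.5         # output pixel center in input coordinates
        x = min(max(x, 0.0), len(row) - 1)  # clamp at the borders
        x0 = int(math.floor(x))
        x1 = min(x0 + 1, len(row) - 1)
        t = x - x0                          # fractional distance from x0
        out.append((1 - t) * row[x0] + t * row[x1])
    return out
```

Nearest-neighbor upscaling keeps only values that already exist in the mask, which produces blocky edges that the joint bilateral filter then smooths, while linear interpolation produces the intermediate values that suit the blurred background. This is a plausible reading of the design choice rather than one the sample states.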

```python
# Upscale the resulting masks to the full resolution of the input image.
self.cvcuda_perf.push_range("resize")
cvcuda_class_masks_upscaled = cvcuda.resize(
    cvcuda_class_masks,
    (frame_nhwc.shape[0], frame_nhwc.shape[1], frame_nhwc.shape[2], 1),
    cvcuda.Interp.NEAREST,
)
self.cvcuda_perf.pop_range()

# Blur the down-scaled input images and upscale them back to their original
# resolution. Part of this will be used to create a background blur effect
# later, when the overlay happens.
# Note: We apply the blur to the low-resolution version of the images to save
# computation time.
self.cvcuda_perf.push_range("gaussian")
cvcuda_blurred_input_imgs = cvcuda.gaussian(
    resized_tensor, kernel_size=(15, 15), sigma=(5, 5)
)
self.cvcuda_perf.pop_range()

self.cvcuda_perf.push_range("resize")
cvcuda_blurred_input_imgs = cvcuda.resize(
    cvcuda_blurred_input_imgs,
    (frame_nhwc.shape[0], frame_nhwc.shape[1], frame_nhwc.shape[2], 3),
    cvcuda.Interp.LINEAR,
)
self.cvcuda_perf.pop_range()

# Next, we apply a joint bilateral filter to the upscaled masks, using the
# grayscale version of the input image as guidance, to smooth out the edges of
# the masks. This is needed because the masks were generated at a lower
# resolution and then upscaled. The joint bilateral filter smooths their edges
# while keeping them aligned with edges in the image.
cvcuda_frame_nhwc = cvcuda.as_tensor(frame_nhwc.cuda(), "NHWC")

self.cvcuda_perf.push_range("cvtcolor")
cvcuda_image_tensor_nhwc_gray = cvcuda.cvtcolor(
    cvcuda_frame_nhwc, cvcuda.ColorConversion.RGB2GRAY
)
self.cvcuda_perf.pop_range()

self.cvcuda_perf.push_range("joint_bilateral_filter")
cvcuda_jb_masks = cvcuda.joint_bilateral_filter(
    cvcuda_class_masks_upscaled,
    cvcuda_image_tensor_nhwc_gray,
    diameter=5,
    sigma_color=50,
    sigma_space=1,
)
self.cvcuda_perf.pop_range()

# Create an overlay image. We do this by selectively blurring out pixels in
# the input image where the class mask prediction was absent (i.e. False).
# We already have everything this requires: the input images, the blurred
# versions of the input images, and the smoothed, upscaled masks.
self.cvcuda_perf.push_range("composite")
cvcuda_composite_imgs_nhwc = cvcuda.composite(
    cvcuda_frame_nhwc,          # foreground: the original frames
    cvcuda_blurred_input_imgs,  # background: the blurred frames
    cvcuda_jb_masks,            # per-pixel mask selecting the foreground
    3,                          # number of output channels
)
self.cvcuda_perf.pop_range()

# Based on the output requirements, return the appropriate tensors.
if self.output_layout == "NCHW":
    cvcuda_composite_imgs_out = cvcuda.reformat(
        cvcuda_composite_imgs_nhwc, "NCHW"
    )
else:
    assert self.output_layout == "NHWC"
    cvcuda_composite_imgs_out = cvcuda_composite_imgs_nhwc

if self.gpu_output:
    if self.torch_output:
        cvcuda_composite_imgs_out = torch.as_tensor(
            cvcuda_composite_imgs_out.cuda(), device="cuda:%d" % self.device_id
        )
else:
    cvcuda_composite_imgs_out = (
        torch.as_tensor(cvcuda_composite_imgs_out.cuda()).cpu().numpy()
    )

self.cvcuda_perf.pop_range()  # closes the "postprocess" range opened earlier
```
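The joint bilateral filter is the least obvious step in the pipeline. Below is a CPU reference sketch in plain Python for small single-channel images, assuming the OpenCV-style definition: a circular window of radius diameter // 2, a Gaussian spatial weight controlled by sigma_space, a Gaussian range weight computed on the guidance image and controlled by sigma_color, and replicated borders. The function name joint_bilateral is hypothetical, and CVCUDA's GPU kernel may differ in border handling and output rounding.

```python
import math


def joint_bilateral(src, guide, diameter, sigma_color, sigma_space):
    """Filter src (H x W numbers) using edge information from guide (H x W numbers).

    Each output pixel is a weighted mean of its neighbors in src. A neighbor's
    weight shrinks with its spatial distance and with how different its value
    in guide is, so the result follows edges present in the guidance image.
    """
    h, w = len(src), len(src[0])
    radius = diameter // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dx * dx + dy * dy > radius * radius:
                        continue  # circular window (assumed OpenCV-style)
                    yy = min(max(y + dy, 0), h - 1)  # replicate the border
                    xx = min(max(x + dx, 0), w - 1)
                    spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                    diff = guide[yy][xx] - guide[y][x]  # range term uses the GUIDE
                    rng = math.exp(-(diff * diff) / (2 * sigma_color ** 2))
                    weight = spatial * rng
                    total += weight * src[yy][xx]
                    norm += weight  # center pixel contributes weight 1, so norm > 0
            out[y][x] = total / norm
    return out
```

With the listing's parameters (diameter=5, sigma_color=50, sigma_space=1), a mask edge that lines up with a strong edge in the guidance image survives almost unchanged, whereas over a flat guidance region the filter behaves like a small Gaussian blur and softens the edge.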