Object Detection Post-processing Pipeline using CVCUDA

CVCUDA significantly accelerates the post-processing pipeline of the object detection sample. Its easy interoperability with PyTorch tensors also makes it straightforward to integrate with PyTorch and other data loaders that support the tensor layout.

The exact post-processing operations are:

Bounding box and score detections from the network -> Interpolate bounding boxes to the image size -> Filter the bounding boxes using NMS -> Render the bounding boxes -> Blur the ROIs
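The stage ordering above can be sketched as a plain-Python skeleton. The stage names and helper functions here are illustrative only, not the sample's actual API; each stand-in records its name so the ordering is visible:

```python
def postprocess(detections, frame, trace):
    # Stand-in stages for the real pipeline; the actual sample runs
    # these steps on the GPU with CV-CUDA operators.
    def interpolate_to_image(d):
        trace.append("interpolate")
        return d

    def filter_with_nms(d):
        trace.append("nms")
        return d

    def render_boxes(f, d):
        trace.append("render")
        return f

    def blur_rois(f, d):
        trace.append("blur")
        return f

    boxes = interpolate_to_image(detections)
    boxes = filter_with_nms(boxes)
    frame = render_boxes(frame, boxes)
    frame = blur_rois(frame, boxes)
    return frame
```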

The post-processing parameters are initialized based on the model architecture:

self.logger = logging.getLogger(__name__)
self.confidence_threshold = confidence_threshold
self.iou_threshold = iou_threshold
self.device_id = device_id
self.output_layout = output_layout
self.gpu_output = gpu_output
self.batch_size = batch_size
self.cvcuda_perf = cvcuda_perf

# The PeopleNet model uses the Gridbox system, which divides an input image into
# a grid and predicts four normalized bounding-box parameters for each grid cell.
# The number of grid cells is determined by the model architecture.
# For the PeopleNet model, the 960x544 input image is divided into a 60x34 grid.
self.stride = 16
self.bbox_norm = 35
self.offset = 0.5
self.network_width = 960
self.network_height = 544
self.num_rows = int(self.network_height / self.stride)
self.num_cols = int(self.network_width / self.stride)
self.num_classes = 3  # Number of classes the model is trained on
self.bboxutil = BoundingBoxUtilsCvcuda(
    self.cvcuda_perf
)  # Initializes the bounding box utils
# Centers of the grid cells
self.center_x = None
self.center_y = None
self.x_values = None
self.y_values = None

self.logger.info("Using CVCUDA as post-processor.")

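To make the Gridbox geometry concrete, here is a minimal NumPy sketch of how the grid-cell centers could be precomputed from the constants above, and how one cell's normalized prediction maps back to pixel coordinates. This follows the common DetectNet_v2/Gridbox decoding convention; the sample's exact decode may differ in detail:

```python
import numpy as np

# Constants mirrored from the initializer above.
stride, offset, bbox_norm = 16, 0.5, 35.0
num_cols, num_rows = 960 // stride, 544 // stride  # 60 x 34 grid

# Grid-cell centers in pixel space, normalized by bbox_norm.
center_x = (np.arange(num_cols) * stride + offset) / bbox_norm  # shape (60,)
center_y = (np.arange(num_rows) * stride + offset) / bbox_norm  # shape (34,)

def decode(pred, row, col):
    # pred holds the four normalized bounding-box parameters predicted
    # for one grid cell; the result is (x1, y1, x2, y2) in pixels.
    cx, cy = center_x[col], center_y[row]
    x1 = (cx - pred[0]) * bbox_norm
    y1 = (cy - pred[1]) * bbox_norm
    x2 = (cx + pred[2]) * bbox_norm
    y2 = (cy + pred[3]) * bbox_norm
    return x1, y1, x2, y2
```

A zero prediction decodes to the grid cell's own center, which is a quick sanity check on the arithmetic.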

The output bounding boxes are rendered using the cuOSD-based bounding box operators. We will use the BndBox and BoxBlur operators to render the bounding boxes and blur the regions inside them. The bounding box display settings and blur parameters are initialized in the BoundingBoxUtilsCvcuda class.

# Settings for the bounding boxes to be rendered
self.border_color = (0, 255, 0, 255)
self.fill_color = (0, 0, 255, 0)
self.thickness = 5
self.kernel_size = 7  # Kernel size for the blur ROI
self.cvcuda_perf = cvcuda_perf

The model divides an input image into a grid and predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class. These values then need to be interpolated to the original resolution. The interpolated bounding boxes are then filtered using a clustering algorithm such as NMS.
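As an illustration of the NMS filtering step, here is a minimal NumPy sketch that produces a boolean keep-mask per image, analogous to the masks used to filter boxes below. This is a simple greedy NMS; the function names are illustrative and the sample's actual implementation may differ:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_mask(boxes, scores, iou_threshold=0.5):
    # Greedy NMS that returns a boolean keep-mask instead of indices,
    # matching how the sample filters with `current_boxes[current_masks]`.
    order = np.argsort(scores)[::-1]  # highest score first
    keep = np.zeros(len(boxes), dtype=bool)
    suppressed = np.zeros(len(boxes), dtype=bool)
    for i in order:
        if suppressed[i]:
            continue
        keep[i] = True
        for j in order:
            if j != i and not suppressed[j] and iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed[j] = True
    return keep
```

Returning a mask rather than indices keeps the per-image box tensors aligned, which is convenient when building per-batch lists as in the code below.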

We will then invoke the BndBox and BoxBlur operators as follows:

# We will use CV-CUDA's box_blur and bndbox operators to blur out
# the contents of the bounding boxes and draw them with color on the
# input frame. For that to work, we must first filter out the boxes
# which are all zeros. After doing that, we will create two lists:
#   1) A list of all bounding box objects.
#   2) A list of all bounding boxes stored as blur box objects.
#
# Once this is done, we can convert these lists to the two CV-CUDA
# structures consumed by the blur and bndbox operators:
#   1) cvcuda.Elements : To store the bounding boxes for the batch.
#   2) cvcuda.BlurBoxesI : To store the bounding boxes as blur boxes for the batch.
#
self.cvcuda_perf.push_range("forloop")
bounding_boxes_list = []
blur_boxes_list = []

# Create an array of bounding boxes with render settings.
for current_boxes, current_masks in zip(batch_bboxes_pyt, nms_masks_pyt):
    # Keep only the boxes that survived NMS for this image.
    filtered_boxes = current_boxes[current_masks]
    bounding_boxes = []
    blur_boxes = []

    for box in filtered_boxes:
        bounding_boxes.append(
            cvcuda.BndBoxI(
                box=tuple(box),
                thickness=self.thickness,
                borderColor=self.border_color,
                fillColor=self.fill_color,
            )
        )
        blur_boxes.append(
            cvcuda.BlurBoxI(box=tuple(box), kernelSize=self.kernel_size)
        )
    bounding_boxes_list.append(bounding_boxes)
    blur_boxes_list.append(blur_boxes)

batch_bounding_boxes = cvcuda.Elements(elements=bounding_boxes_list)
batch_blur_boxes = cvcuda.BlurBoxesI(boxes=blur_boxes_list)
self.cvcuda_perf.pop_range()  # for loop

# Apply the blur first so the box outlines are drawn on top of the blurred ROIs.
self.cvcuda_perf.push_range("boxblur_into")
cvcuda.boxblur_into(frame_nhwc, frame_nhwc, batch_blur_boxes)
self.cvcuda_perf.pop_range()

# Render the bounding boxes.
self.cvcuda_perf.push_range("osd_into")
cvcuda.osd_into(frame_nhwc, frame_nhwc, batch_bounding_boxes)
self.cvcuda_perf.pop_range()


The output buffer is converted to the layout required by the encoder and returned:

self.cvcuda_perf.push_range("bboxutil")
frame_nhwc = self.bboxutil(batch_bboxes_pyt, nms_masks_pyt, frame_nhwc)
self.cvcuda_perf.pop_range()
if self.output_layout == "NCHW":
    render_output = cvcuda.reformat(frame_nhwc, "NCHW")
else:
    assert self.output_layout == "NHWC"
    render_output = frame_nhwc

if self.gpu_output:
    render_output = torch.as_tensor(
        render_output.cuda(), device="cuda:%d" % self.device_id
    )
else:
    render_output = torch.as_tensor(render_output.cuda()).cpu().numpy()

self.cvcuda_perf.pop_range()  # postprocess

# Return the original NHWC frame with bboxes rendered and ROIs blurred
return render_output