Object Detection Post-processing Pipeline using CVCUDA

CVCUDA significantly accelerates the post-processing pipeline of the object detection sample. Its easy interoperability with PyTorch tensors also makes it straightforward to integrate with PyTorch and other data loaders that support the tensor layout.

The exact post-processing operations are:

Bounding box and score detections from the network -> Interpolate bounding boxes to the image size -> Filter the bounding boxes using NMS -> Render the bounding boxes -> Blur the ROIs
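The stage ordering above can be sketched as a plain-Python skeleton. The stage names and helper functions here are illustrative only, not the sample's actual API; each stand-in records its name so the ordering is visible:

```python
def postprocess(detections, frame, trace):
    # Stand-in stages for the real pipeline; the actual sample runs
    # these steps on the GPU with CV-CUDA operators.
    def interpolate_to_image(d):
        trace.append("interpolate")
        return d

    def filter_with_nms(d):
        trace.append("nms")
        return d

    def render_boxes(f, d):
        trace.append("render")
        return f

    def blur_rois(f, d):
        trace.append("blur")
        return f

    boxes = interpolate_to_image(detections)
    boxes = filter_with_nms(boxes)
    frame = render_boxes(frame, boxes)
    frame = blur_rois(frame, boxes)
    return frame
```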

The post-processing parameters are initialized based on the model architecture:

self.logger = logging.getLogger(__name__)
self.confidence_threshold = confidence_threshold
self.iou_threshold = iou_threshold
self.device_id = device_id
self.output_layout = output_layout
self.gpu_output = gpu_output
self.batch_size = batch_size
self.cvcuda_perf = cvcuda_perf

# The PeopleNet model uses the Gridbox system, which divides an input image into
# a grid and predicts four normalized bounding-box parameters for each grid cell.
# The number of grid cells is determined by the model architecture.
# For the PeopleNet model, the 960x544 input image is divided into a 60x34 grid.
self.stride = 16
self.bbox_norm = 35
self.offset = 0.5
self.network_width = 960
self.network_height = 544
self.num_rows = int(self.network_height / self.stride)
self.num_cols = int(self.network_width / self.stride)
self.num_classes = 3  # Number of classes the model is trained on
self.bboxutil = BoundingBoxUtilsCvcuda(
    self.cvcuda_perf
)  # Initializes the bounding box utils
# Centers of the grid cells
self.center_x = None
self.center_y = None
self.x_values = None
self.y_values = None

self.logger.info("Using CVCUDA as post-processor.")

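To make the Gridbox geometry concrete, here is a minimal NumPy sketch of how the grid-cell centers could be precomputed from the constants above, and how one cell's normalized prediction maps back to pixel coordinates. This follows the common DetectNet_v2/Gridbox decoding convention; the sample's exact decode may differ in detail:

```python
import numpy as np

# Constants mirrored from the initializer above.
stride, offset, bbox_norm = 16, 0.5, 35.0
num_cols, num_rows = 960 // stride, 544 // stride  # 60 x 34 grid

# Grid-cell centers in pixel space, normalized by bbox_norm.
center_x = (np.arange(num_cols) * stride + offset) / bbox_norm  # shape (60,)
center_y = (np.arange(num_rows) * stride + offset) / bbox_norm  # shape (34,)

def decode(pred, row, col):
    # pred holds the four normalized bounding-box parameters predicted
    # for one grid cell; the result is (x1, y1, x2, y2) in pixels.
    cx, cy = center_x[col], center_y[row]
    x1 = (cx - pred[0]) * bbox_norm
    y1 = (cy - pred[1]) * bbox_norm
    x2 = (cx + pred[2]) * bbox_norm
    y2 = (cy + pred[3]) * bbox_norm
    return x1, y1, x2, y2
```

A zero prediction decodes to the grid cell's own center, which is a quick sanity check on the arithmetic.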

The output bounding boxes are rendered using the cuOSD-based bounding box operators. We will use the BndBox and BoxBlur operators to render the bounding boxes and blur the regions inside them. The bounding box display settings and blur parameters are initialized in the BoundingBoxUtilsCvcuda class.

# Settings for the bounding boxes to be rendered
self.border_color = (0, 255, 0, 255)
self.fill_color = (0, 0, 255, 0)
self.thickness = 5
self.kernel_size = 7  # Kernel size for the blur ROI
self.cvcuda_perf = cvcuda_perf

The model divides an input image into a grid and predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class. These values then need to be interpolated to the original resolution. The interpolated bounding boxes are then filtered using a clustering algorithm such as NMS.
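As an illustration of the NMS filtering step, here is a minimal NumPy sketch that produces a boolean keep-mask per image, analogous to the masks used to filter boxes below. This is a simple greedy NMS; the function names are illustrative and the sample's actual implementation may differ:

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_mask(boxes, scores, iou_threshold=0.5):
    # Greedy NMS that returns a boolean keep-mask instead of indices,
    # matching how the sample filters with `current_boxes[current_masks]`.
    order = np.argsort(scores)[::-1]  # highest score first
    keep = np.zeros(len(boxes), dtype=bool)
    suppressed = np.zeros(len(boxes), dtype=bool)
    for i in order:
        if suppressed[i]:
            continue
        keep[i] = True
        for j in order:
            if j != i and not suppressed[j] and iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed[j] = True
    return keep
```

Returning a mask rather than indices keeps the per-image box tensors aligned, which is convenient when building per-batch lists as in the code below.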

We will then invoke the BndBox and BoxBlur operators as follows:

# We will use CV-CUDA's box_blur and bndbox operators to blur out
# the contents of the bounding boxes and draw them with color on the
# input frame. For that to work, we must first filter out the boxes
# which are all zeros. After doing that, we will create two lists:
#   1) A list of all bounding box objects.
#   2) A list of all bounding boxes stored as blur box objects.
#
# Once this is done, we can convert these lists to the two CV-CUDA
# structures consumed by the blur and bndbox operators:
#   1) cvcuda.Elements : To store the bounding boxes for the batch.
#   2) cvcuda.BlurBoxesI : To store the bounding boxes as blur boxes for the batch.
#
self.cvcuda_perf.push_range("forloop")
bounding_boxes_list = []
blur_boxes_list = []

# Create an array of bounding boxes with render settings.
for current_boxes, current_masks in zip(batch_bboxes_pyt, nms_masks_pyt):
    # Keep only the boxes that survived NMS for this image.
    filtered_boxes = current_boxes[current_masks]
    bounding_boxes = []
    blur_boxes = []

    for box in filtered_boxes:
        bounding_boxes.append(
            cvcuda.BndBoxI(
                box=tuple(box),
                thickness=self.thickness,
                borderColor=self.border_color,
                fillColor=self.fill_color,
            )
        )
        blur_boxes.append(
            cvcuda.BlurBoxI(box=tuple(box), kernelSize=self.kernel_size)
        )
    bounding_boxes_list.append(bounding_boxes)
    blur_boxes_list.append(blur_boxes)

batch_bounding_boxes = cvcuda.Elements(elements=bounding_boxes_list)
batch_blur_boxes = cvcuda.BlurBoxesI(boxes=blur_boxes_list)
self.cvcuda_perf.pop_range()  # for loop

# Apply the blur first so the box outlines are drawn on top of the blurred ROIs.
self.cvcuda_perf.push_range("boxblur_into")
cvcuda.boxblur_into(frame_nhwc, frame_nhwc, batch_blur_boxes)
self.cvcuda_perf.pop_range()

# Render the bounding boxes.
self.cvcuda_perf.push_range("osd_into")
cvcuda.osd_into(frame_nhwc, frame_nhwc, batch_bounding_boxes)
self.cvcuda_perf.pop_range()


The output buffer is converted to the layout required by the encoder and returned:

self.cvcuda_perf.push_range("bboxutil")
frame_nhwc = self.bboxutil(batch_bboxes_pyt, nms_masks_pyt, frame_nhwc)
self.cvcuda_perf.pop_range()
if self.output_layout == "NCHW":
    render_output = cvcuda.reformat(frame_nhwc, "NCHW")
else:
    assert self.output_layout == "NHWC"
    render_output = frame_nhwc

if self.gpu_output:
    render_output = torch.as_tensor(
        render_output.cuda(), device="cuda:%d" % self.device_id
    )
else:
    render_output = torch.as_tensor(render_output.cuda()).cpu().numpy()

self.cvcuda_perf.pop_range()  # postprocess

# Return the original NHWC frame with bboxes rendered and ROIs blurred
return render_output