Image Decoding using nvImageCodec

The image batch decoder parses the input path, reads the image files, and decodes them. Decoding is done in batches using the nvImageCodec library. Although it is used in the semantic segmentation sample, this decoder is generic enough to be reused in other applications. The code for this class is in the samples/common/python/nvcodec_utils.py file.

Before any data can be read or decoded, we must parse the input (i.e., figure out what kind of data it is). Depending on the value of input_path, we either read a single JPEG image and repeat it batch_size times to simulate a batch, or collect all the JPEG images found in a directory. Any other input raises a ValueError.

self.logger = logging.getLogger(__name__)
self.batch_size = batch_size
self.input_path = input_path
self.device_id = device_id
self.total_decoded = 0
self.batch_idx = 0
self.cuda_ctx = cuda_ctx
self.cuda_stream = cuda_stream
self.cvcuda_perf = cvcuda_perf
self.decoder = nvimgcodec.Decoder(device_id=device_id)

# docs_tag: begin_parse_imagebatchdecoder_nvimagecodec
if os.path.isfile(self.input_path):
    if os.path.splitext(self.input_path)[1] == ".jpg":
        # Read the input image file.
        self.file_names = [self.input_path] * self.batch_size
        # We will use the nvImageCodec based decoder on the GPU in case of images.
        # This will be allocated once during the first run or whenever a batch
        # size change happens.
    else:
        raise ValueError("Unable to read file %s as image." % self.input_path)

elif os.path.isdir(self.input_path):
    # It is a directory. Grab file names of all JPG images.
    self.file_names = glob.glob(os.path.join(self.input_path, "*.jpg"))
    self.logger.info("Found a total of %d JPEG images." % len(self.file_names))

else:
    raise ValueError(
        "Unknown expression given as input_path: %s." % self.input_path
    )
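
The branching above can be tried outside the class with a small standalone function. This is only a sketch: the function name resolve_file_names is hypothetical, and the sorted() call is an addition that is not in the sample (it simply makes the result order predictable).

```python
import glob
import os


def resolve_file_names(input_path, batch_size):
    """Return the JPEG file names implied by input_path.

    Hypothetical helper that mirrors the branching shown above.
    """
    if os.path.isfile(input_path):
        if os.path.splitext(input_path)[1] == ".jpg":
            # A single image is repeated to simulate a full batch.
            return [input_path] * batch_size
        raise ValueError("Unable to read file %s as image." % input_path)

    if os.path.isdir(input_path):
        # sorted() is an addition: glob makes no promise about ordering.
        return sorted(glob.glob(os.path.join(input_path, "*.jpg")))

    raise ValueError("Unknown expression given as input_path: %s." % input_path)
```

Note that the extension comparison is case-sensitive and exact, so a single file named photo.JPG or photo.jpeg would be rejected by this check.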

Once we have the list of image file names, we split it into batches of batch_size files each. The last batch may be smaller if the number of files is not a multiple of the batch size.

self.file_name_batches = [
    self.file_names[i : i + self.batch_size]  # noqa: E203
    for i in range(0, len(self.file_names), self.batch_size)
]
# docs_tag: end_batch_imagebatchdecoder_nvimagecodec
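
To see what the slicing produces, here is a minimal standalone version of the same list comprehension. The function name split_into_batches and the file names are made up for illustration.

```python
def split_into_batches(file_names, batch_size):
    """Split file_names into consecutive chunks of at most batch_size items."""
    return [
        file_names[i : i + batch_size]
        for i in range(0, len(file_names), batch_size)
    ]


file_names = ["img_%d.jpg" % i for i in range(5)]  # Made-up names.
batches = split_into_batches(file_names, batch_size=2)
print(batches)
# [['img_0.jpg', 'img_1.jpg'], ['img_2.jpg', 'img_3.jpg'], ['img_4.jpg']]
```

With five files and a batch size of two, the final batch holds a single file, which is the batch size change mentioned in the code comments.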

self.max_image_size = 1024 * 1024 * 3  # Maximum possible image size.

self.logger.info(
    "Using nvImageCodec decoder version: %s" % nvimgcodec.__version__
)

That is all the initialization requires. As soon as the decoder is called, it starts reading and decoding data. It first reads the raw bytes of the files one batch at a time, and returns None when there is no data left to read.
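
The reading step is not shown in this section, so the following is only a sketch of the behavior described above, not the sample's code. The class name ByteBatchReader and its return value (a tuple of file names and byte strings) are assumptions made for illustration.

```python
class ByteBatchReader:
    """Hypothetical stand-in that returns one batch of encoded bytes per call."""

    def __init__(self, file_name_batches):
        self.file_name_batches = file_name_batches
        self.batch_idx = 0

    def __call__(self):
        # Nothing left to read: signal the end of the data with None.
        if self.batch_idx >= len(self.file_name_batches):
            return None

        file_name_batch = self.file_name_batches[self.batch_idx]

        # Read the still-encoded bytes of every file in this batch.
        data_batch = []
        for file_name in file_name_batch:
            with open(file_name, "rb") as f:
                data_batch.append(f.read())

        self.batch_idx += 1
        return file_name_batch, data_batch
```

A caller would invoke the reader in a loop and stop as soon as it returns None.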

Once the data has been read, we use nvImageCodec to decode it into a list of images. The nvImageCodec instance is allocated either on its first use or whenever the batch size changes (e.g., on the last batch). Since the result at this point is a Python list of images (each a 3D HWC tensor), we convert it to a single 4D NHWC tensor by stacking the images along a new first (batch) dimension.

tensor_list = []
image_list = self.decoder.decode(data_batch, cuda_stream=self.cuda_stream)

# Convert the decoded images to nvcv tensors in a list.
for i in range(len(image_list)):
    tensor_list.append(cvcuda.as_tensor(image_list[i], "HWC"))

# Stack the list of tensors to a single NHWC tensor.
cvcuda_decoded_tensor = cvcuda.stack(tensor_list)
self.total_decoded += len(tensor_list)
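
The layout change from a list of HWC tensors to one NHWC tensor can be illustrated on the CPU with NumPy. This is an analogy only: it uses np.stack rather than cvcuda.stack, and the image sizes are made up.

```python
import numpy as np

# Three made-up "decoded images", each in HWC layout (height, width, channels).
images = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(3)]

# Stacking adds a new leading batch dimension: N x H x W x C.
batch = np.stack(images, axis=0)
print(batch.shape)  # (3, 4, 6, 3)
```

np.stack requires every input to have the same shape, which is why the example uses identically sized images.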

The final step is to pack all of this data into a Batch, a helper object from the CV-CUDA samples. The Batch object keeps track of the data associated with the batch, the index of the batch, and, optionally, any file name information one wants to attach (i.e., which files the data came from).

batch = Batch(
    batch_idx=self.batch_idx,
    data=cvcuda_decoded_tensor,
    fileinfo=file_name_batch,
)
self.batch_idx += 1
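
The definition of Batch is not shown in this section. As a rough mental model, it can be thought of as a small record with the three fields used above. The dataclass below is a hypothetical stand-in under that assumption, not the class from the samples.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Batch:
    """Hypothetical stand-in for the samples' Batch object."""

    batch_idx: int  # Index of this batch.
    data: Any  # Decoded data, e.g. a stacked NHWC tensor.
    fileinfo: Optional[List[str]] = None  # Optional source file names.


batch = Batch(batch_idx=0, data=[[0, 0, 0]], fileinfo=["a.jpg"])
print(batch.batch_idx, batch.fileinfo)  # 0 ['a.jpg']
```

Keeping the file names next to the data makes it possible for later pipeline stages to name their outputs after the inputs they came from.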