Video Decoding using pyNvVideoCodec

The video batch decoder reads an MP4 video and returns its frames as tensors. The actual decoding is done per frame using NVIDIA’s PyNvVideoCodec API. The video decoder is generic enough to be used across the sample applications. The code for these classes can be found in the samples/common/python/nvcodec_utils.py file.

There are two classes responsible for the decoding work:

  1. VideoBatchDecoder and

  2. nvVideoDecoder

The first class wraps the second, which allows us to:

  1. Stay consistent with the API of other decoders used throughout CVCUDA.

  2. Support batch decoding.

  3. Use accelerated ops in CVCUDA to perform the necessary color conversion from NV12 to RGB after decoding the video.

VideoBatchDecoder

Let’s start with how this class is initialized in its __init__ method. We use PyNvDemuxer to read a few properties of the video. The decoder instance and the CVCUDA color conversion tensors are not created here; both are allocated lazily, on first use.

Note: Due to the nature of NV12, representing it directly as a CVCUDA tensor is a bit challenging. Be sure to read through the explanation in the comments of the code shown below to understand more.

self.logger = logging.getLogger(__name__)
self.input_path = input_path
self.batch_size = batch_size
self.device_id = device_id
self.cuda_ctx = cuda_ctx
self.cuda_stream = cuda_stream
self.cvcuda_perf = cvcuda_perf
self.total_decoded = 0
self.batch_idx = 0
self.decoder = None
self.cvcuda_RGBtensor_batch = None
nvDemux = nvvc.PyNvDemuxer(self.input_path)
self.fps = nvDemux.FrameRate()
self.logger.info("Using PyNvVideoCodec decoder version: %s" % nvvc.__version__)
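
To make the NV12 remark in the note above concrete, here is a minimal sketch of the buffer arithmetic for an 8-bit NV12 frame: a full-resolution Y plane followed by a half-height plane of interleaved U and V samples. The helper name nv12_layout is hypothetical and is not part of CVCUDA or PyNvVideoCodec; the sketch assumes even frame dimensions and no row padding (pitch).

```python
def nv12_layout(width, height):
    """Return plane sizes in bytes for an unpadded 8-bit NV12 frame."""
    if width % 2 or height % 2:
        raise ValueError("NV12 requires even width and height")
    y_bytes = width * height           # one luma byte per pixel
    uv_bytes = width * (height // 2)   # interleaved U,V pairs, subsampled 2x in both directions
    return {
        "y_bytes": y_bytes,
        "uv_bytes": uv_bytes,
        "total_bytes": y_bytes + uv_bytes,
        # Viewed as one single-channel 2D image, the frame is height * 3 / 2 rows tall.
        "rows": height + height // 2,
    }


layout = nv12_layout(1920, 1080)
print(layout)
```

Because the chroma plane has a different shape than the luma plane, an NV12 frame does not map onto a regular H x W x C tensor, which is the difficulty the note refers to.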

Once everything is initialized, decoding starts when the object is called, that is, when its __call__ method is invoked.

def __call__(self):
    self.cvcuda_perf.push_range("decoder.pyVideoCodec")

    # Check if we need to allocate the decoder for its first use.
    if self.decoder is None:
        self.decoder = nvVideoDecoder(
            self.input_path, self.device_id, self.cuda_ctx, self.cuda_stream
        )

Next, we call the nvVideoDecoder instance to do the actual decoding and stack the resulting image tensors into a single 4D tensor.

# Get the NHWC YUV tensor from the decoder
cvcuda_YUVtensor = self.decoder.get_next_frames(self.batch_size)

# Check if we are done decoding
if cvcuda_YUVtensor is None:
    self.cvcuda_perf.pop_range()
    return None

# Look up the color conversion code based on the pixel format
cvcuda_code = pixel_format_to_cvcuda_code.get(self.decoder.pixelFormat)
if cvcuda_code is None:
    raise ValueError(f"Unsupported pixel format: {self.decoder.pixelFormat}")

# Check the layout to make sure it is what we expect
if cvcuda_YUVtensor.layout != "NHWC":
    raise ValueError("Unexpected tensor layout, NHWC expected.")

# This may differ from batch_size, since the total number of frames
# may not be a multiple of the batch size
actual_batch_size = cvcuda_YUVtensor.shape[0]
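
The lookup-and-validate step above can be tried without a GPU. In this sketch the enum and the string codes are hypothetical stand-ins for the decoder's pixel formats and CVCUDA's color conversion codes; only the pattern of dict.get followed by an explicit error mirrors the sample.

```python
from enum import Enum


class PixelFormat(Enum):  # hypothetical stand-in for the decoder's pixel formats
    NV12 = "nv12"
    YUV444 = "yuv444"
    P016 = "p016"


# Hypothetical mapping; the string values stand in for color conversion codes.
PIXEL_FORMAT_TO_CODE = {
    PixelFormat.NV12: "YUV2RGB_NV12",
    PixelFormat.YUV444: "YUV2RGB",
}


def conversion_code_for(pixel_format):
    """Return the conversion code, failing loudly for unmapped formats."""
    code = PIXEL_FORMAT_TO_CODE.get(pixel_format)
    if code is None:
        raise ValueError(f"Unsupported pixel format: {pixel_format}")
    return code


print(conversion_code_for(PixelFormat.NV12))
```

Using dict.get with an explicit check keeps the error message specific to the decoder, instead of surfacing a bare KeyError from the lookup.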

Once the video batch is ready, we use CVCUDA’s cvtcolor_into function to convert its data from NV12 format to RGB format. The output tensor is pre-allocated and reused, so that the same tensor is not allocated again for every batch.

# Create a CVCUDA tensor for color conversion YUV->RGB
# Allocate only for the first time or for the last batch.
if not self.cvcuda_RGBtensor_batch or actual_batch_size != self.batch_size:
    self.cvcuda_RGBtensor_batch = cvcuda.Tensor(
        (actual_batch_size, self.decoder.h, self.decoder.w, 3),
        nvcv.Type.U8,
        nvcv.TensorLayout.NHWC,
    )

# Convert from YUV to RGB. Conversion code is based on the pixel format.
cvcuda.cvtcolor_into(self.cvcuda_RGBtensor_batch, cvcuda_YUVtensor, cvcuda_code)

self.total_decoded += actual_batch_size
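
The reuse policy above can be illustrated in plain Python. In this sketch a bytearray stands in for the RGB tensor and the class name RgbBufferCache is hypothetical; like the condition in the sample, it reallocates only on first use or when the incoming batch is not the configured size.

```python
class RgbBufferCache:
    """Keeps one output buffer and reallocates it only when necessary."""

    def __init__(self, batch_size, height, width):
        self.batch_size = batch_size
        self.height = height
        self.width = width
        self.buffer = None
        self.allocations = 0  # counts how often an allocation was needed

    def get(self, actual_batch_size):
        # Allocate on first use, or when the batch is not the configured size
        # (typically the final, shorter batch).
        if self.buffer is None or actual_batch_size != self.batch_size:
            self.buffer = bytearray(actual_batch_size * self.height * self.width * 3)
            self.allocations += 1
        return self.buffer


cache = RgbBufferCache(batch_size=4, height=2, width=2)
for n in (4, 4, 4, 2):  # three full batches, then a short final one
    buf = cache.get(n)
print(cache.allocations, len(buf))
```

For a stream of full batches followed by one short batch, this policy allocates twice in total, regardless of how many batches are processed.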

The final step is to pack all of this data into a CVCUDA samples object called Batch. The Batch object keeps track of the data associated with the batch, the index of the batch, and, optionally, any file name information one wants to attach (i.e. which files the data came from).
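
As a rough illustration of what such a container carries, here is a minimal sketch of a batch record. The class name BatchRecord and the field names batch_idx, data and fileinfo are assumptions made for this example and may not match the actual Batch class in the samples.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class BatchRecord:  # hypothetical stand-in, not the samples' actual Batch class
    batch_idx: int                  # position of this batch in the stream
    data: Any                       # the decoded payload, e.g. an NHWC RGB tensor
    fileinfo: Optional[str] = None  # optional source information, e.g. the input path


record = BatchRecord(batch_idx=0, data=[[0, 0, 0]], fileinfo="video.mp4")
print(record.batch_idx, record.fileinfo)
```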

nvVideoDecoder

This class offers hardware-accelerated video decoding using PyNvVideoCodec. It reads an MP4 video file, decodes it, and returns one CUDA-accessible tensor per frame. Consult the PyNvVideoCodec documentation to learn more about its capabilities and APIs.

For use in CVCUDA, this class defines the following functions which decode data to a tensor in a given CUDA stream.

import itertools

import cvcuda
import nvcv
import PyNvVideoCodec as nvvc


class nvVideoDecoder:
    def __init__(self, enc_file, device_id, cuda_ctx, stream):
        """
        Create instance of HW-accelerated video decoder.
        :param enc_file: Full path to the MP4 file that needs to be decoded.
        :param device_id: id of video card which will be used for decoding & processing.
        :param cuda_ctx: A cuda context object.
        :param stream: A cuda stream object in which the decoding is performed.
        """
        self.device_id = device_id
        self.cuda_ctx = cuda_ctx
        self.input_path = enc_file
        self.stream = stream
        # Demuxer is instantiated only to collect required information about
        # certain video file properties.
        self.nvDemux = nvvc.PyNvDemuxer(self.input_path)
        self.nvDec = nvvc.CreateDecoder(
            gpuid=0,
            codec=self.nvDemux.GetNvCodecId(),
            cudacontext=self.cuda_ctx.handle,
            cudastream=self.stream.handle,
            enableasyncallocations=False,
        )

        self.w, self.h = self.nvDemux.Width(), self.nvDemux.Height()
        self.pixelFormat = self.nvDec.GetPixelFormat()
        # In case sample aspect ratio isn't 1:1 we will re-scale the decoded
        # frame to maintain uniform 1:1 ratio across the pipeline.
        sar = 8.0 / 9.0
        self.fixed_h = self.h
        self.fixed_w = int(self.w * sar)

    # Frame iterator: yields one NHWC tensor per decoded frame.
    def generate_decoded_frames(self):
        for packet in self.nvDemux:
            for decodedFrame in self.nvDec.Decode(packet):
                nvcvTensor = nvcv.as_tensor(
                    nvcv.as_image(decodedFrame.nvcv_image(), nvcv.Format.U8)
                )
                if nvcvTensor.layout == "NCHW":
                    # Reformatting the NCHW tensor to NHWC creates a copy on the
                    # CUDA device. The decoded frame can then go out of scope and
                    # its backing memory becomes available to the decoder again.
                    yield cvcuda.reformat(nvcvTensor, "NHWC")
                else:
                    raise ValueError("Unexpected tensor layout, NCHW expected.")

    def get_next_frames(self, N):
        # Take up to N frames; fewer are returned at the end of the video.
        decoded_frames = list(itertools.islice(self.generate_decoded_frames(), N))
        if len(decoded_frames) == 0:
            return None
        elif len(decoded_frames) == 1:  # a single frame needs no stacking
            return decoded_frames[0]
        else:
            # Convert from a list of tensors to a single tensor (NHWC)
            tensorNHWC = cvcuda.stack(decoded_frames)
            return tensorNHWC
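
The batching behaviour of get_next_frames can be tried without a GPU. In this sketch integers stand in for decoded frames and the helper next_batch is hypothetical; it shows how itertools.islice takes up to N items from an iterator, so the final batch may be shorter and an exhausted stream yields None.

```python
import itertools


def next_batch(frame_iter, n):
    """Take up to n items from frame_iter; return None once it is exhausted."""
    frames = list(itertools.islice(frame_iter, n))
    return frames if frames else None


frames = iter(range(10))  # ten fake "frames"
batch_sizes = []
while (batch := next_batch(frames, 4)) is not None:
    batch_sizes.append(len(batch))
print(batch_sizes)  # [4, 4, 2]
```

The sketch passes a single iterator explicitly across calls, which keeps the position in the stream easy to follow; the caller only needs to handle the shorter last batch and the None sentinel.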