Video Encoding using pyNvVideoCodec

The video batch encoder is responsible for writing tensors to an MP4 video file. The actual encoding is done in batches using NVIDIA's pyNvVideoCodec. The encoder is generic enough to be used across the sample applications. The code associated with these classes can be found in the samples/common/python/nvcodec_utils.py file.
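
The snippets below assume imports along the following lines; the exact module aliases are assumptions inferred from how the names are used in that file:

import logging
import os
from fractions import Fraction

import av
import numpy as np
import torch

import cvcuda
import nvcv
import PyNvVideoCodec as nvvc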

There are two classes responsible for the encoding work:

  1. VideoBatchEncoder and

  2. nvVideoEncoder

The first class acts as a wrapper around the second class and allows us to:

  1. Stay consistent with the API of other encoders used throughout CVCUDA.

  2. Support batch encoding.

  3. Use accelerated ops in CVCUDA to perform the necessary color conversion from RGB to NV12 before encoding the video (a brief usage sketch follows this list).
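
As a rough illustration, the wrapper is driven once per batch by the sample pipelines. The sketch below is hypothetical glue code; the batch object, its fields, and the surrounding setup are assumptions, not the actual driver:

# Hypothetical driver loop: encode every processed batch of frames.
encoder = VideoBatchEncoder(
    output_path="/tmp/out",     # assumed output directory
    fps=30,
    device_id=0,
    cuda_ctx=cuda_ctx,          # pre-created CUDA context (assumed)
    cuda_stream=cuda_stream,    # pre-created CUDA stream (assumed)
    cvcuda_perf=cvcuda_perf,    # perf-logging helper from the samples (assumed)
)

for batch in batches:           # each batch.data is an NCHW, uint8, RGB torch tensor
    encoder(batch)              # __call__ color-converts to NV12 and encodes the frames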

VideoBatchEncoder

To get started, here is how the class is initialized in its __init__ method. The encoder instance and the CVCUDA color conversion tensors are both allocated lazily upon first use.

Note: Due to the nature of NV12, representing it directly as a CVCUDA tensor is a bit challenging. The samples store an NV12 frame as a (height * 3 / 2, width, 1) tensor; for example, a 1920x1080 frame becomes (1620, 1920, 1): 1080 rows of luma followed by 540 rows of interleaved CbCr. Be sure to read through the comments of the code shown below to understand more.

class VideoBatchEncoder:
    def __init__(
        self,
        output_path,
        fps,
        device_id,
        cuda_ctx,
        cuda_stream,
        cvcuda_perf,
    ):
        self.logger = logging.getLogger(__name__)
        self.output_path = output_path
        self.fps = fps
        self.device_id = device_id
        self.cuda_ctx = cuda_ctx
        self.cuda_stream = cuda_stream
        self.cvcuda_perf = cvcuda_perf

        self.encoder = None
        self.cvcuda_HWCtensor_batch = None
        self.cvcuda_YUVtensor_batch = None
        self.input_layout = "NCHW"
        self.gpu_input = True
        self.output_file_name = None

        self.logger.info("Using PyNvVideoCodec encoder version: %s" % nvvc.__version__)

Once things are defined and initialized, encoding starts when the __call__ method is invoked. We first need to allocate the encoder instance if it has not been created already.
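
A minimal sketch of that lazy allocation inside __call__ is shown below. The output file naming is an assumption; the constructor arguments match the nvVideoEncoder class described later in this section:

# Lazily create the underlying encoder on the first batch, once the frame
# size is known from the incoming NCHW tensor.
if self.encoder is None:
    height, width = batch.data.shape[2], batch.data.shape[3]
    self.output_file_name = os.path.join(self.output_path, "out.mp4")  # assumed name
    self.encoder = nvVideoEncoder(
        self.device_id,
        width,
        height,
        self.fps,
        self.output_file_name,
        self.cuda_ctx,
        self.cuda_stream,
        "NV12",  # we hand the encoder NV12 data after the color conversion below
    )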

Next, we use CVCUDA's cvtcolor_into function to convert the batch data from RGB to NV12. We allocate the color conversion tensors once and reuse them, rather than allocating the same tensors on every batch.

Once the tensors are allocated, we use CVCUDA ops to perform the color conversion.

# Create 2 CVCUDA tensors: reformat NCHW->NHWC and color conversion RGB->YUV
current_batch_size = batch.data.shape[0]
height, width = batch.data.shape[2], batch.data.shape[3]

# Allocate only the first time or when the batch size changes (e.g. the last batch).
if (
    not self.cvcuda_HWCtensor_batch
    or current_batch_size != self.cvcuda_HWCtensor_batch.shape[0]
):
    self.cvcuda_HWCtensor_batch = cvcuda.Tensor(
        (current_batch_size, height, width, 3),
        nvcv.Type.U8,
        nvcv.TensorLayout.NHWC,
    )
    self.cvcuda_YUVtensor_batch = cvcuda.Tensor(
        (current_batch_size, (height // 2) * 3, width, 1),
        nvcv.Type.U8,
        nvcv.TensorLayout.NHWC,
    )

# Convert RGB to NV12, in batch, before sending it over to pyNvVideoCodec.
# Convert to CVCUDA tensor
cvcuda_tensor = cvcuda.as_tensor(batch.data, nvcv.TensorLayout.NCHW)

# Reformat NCHW to NHWC
cvcuda.reformat_into(self.cvcuda_HWCtensor_batch, cvcuda_tensor)

# Color convert from RGB to YUV_NV12
cvcuda.cvtcolor_into(
    self.cvcuda_YUVtensor_batch,
    self.cvcuda_HWCtensor_batch,
    cvcuda.ColorConversion.RGB2YUV_NV12,
)

# Convert back to a torch tensor; the data is now in NV12 format.
tensor = torch.as_tensor(self.cvcuda_YUVtensor_batch.cuda(), device="cuda")

Finally, we call the nvVideoEncoder instance to actually do the encoding.
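
Since nvVideoEncoder consumes one frame at a time, the wrapper iterates over the NV12 batch tensor and feeds each frame to it. The per-frame slicing below is a sketch of that step, not the exact sample code:

# The NV12 batch tensor has shape (N, H * 3 // 2, W, 1); encode frame by frame.
for img_idx in range(tensor.shape[0]):
    self.encoder.encode_from_tensor(tensor[img_idx])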

nvVideoEncoder

This class offers hardware-accelerated video encoding using pyNvVideoCodec. It encodes tensors and writes them out as an MP4 file. Please consult the pyNvVideoCodec documentation to learn more about its capabilities and APIs.

For use in CVCUDA, this class defines the encode_from_tensor method shown below, which encodes a Torch tensor.

class nvVideoEncoder:
    def __init__(
        self,
        device_id,
        width,
        height,
        fps,
        enc_file,
        cuda_ctx,
        cuda_stream,
        format,
    ):
        """
        Create instance of HW-accelerated video encoder.
        :param device_id: id of video card which will be used for encoding & processing.
        :param width: encoded frame width.
        :param height: encoded frame height.
        :param fps: The FPS at which the encoding should happen.
        :param enc_file: path to encoded video file.
        :param cuda_ctx: A cuda context object.
        :param cuda_stream: A cuda stream object.
        :param format: The format of the encoded video file.
                (e.g. "NV12", "YUV444"; see NvPyVideoEncoder docs for more info)
        """
        self.device_id = device_id
        self.fps = round(Fraction(fps), 6)
        self.enc_file = enc_file
        self.cuda_ctx = cuda_ctx
        self.cuda_stream = cuda_stream

        self.pts_time = 0
        self.delta_t = 1  # Increment the packets' timestamp by this much.
        self.encoded_frame = np.ndarray(shape=(0), dtype=np.uint8)
        self.container = av.open(enc_file, "w")
        self.avstream = self.container.add_stream("h264", rate=self.fps)

        # Align the frame width up to the next multiple of 16.
        aligned_value = 0
        if width % 16 != 0:
            aligned_value = 16 - (width % 16)
        aligned_width = width + aligned_value
        width = aligned_width

        self.avstream.width = width
        self.avstream.height = height

        self.avstream.time_base = 1 / Fraction(self.fps)
        self.surface = None
        self.surf_plane = None

        self.tmpTensor = None

        self.nvEnc = nvvc.CreateEncoder(
            self.avstream.width,
            self.avstream.height,
            format,
            codec="h264",
            preset="P4",
            cudastream=cuda_stream.handle,
        )

    def width(self):
        """
        Gets the actual video frame width from the encoder.
        """
        return self.nvEnc.Width()

    def height(self):
        """
        Gets the actual video frame height from the encoder.
        """
        return self.nvEnc.Height()

    def encode_from_tensor(self, tensor):

        # Create CUDA array interface objects with 2 planes: one for the luma (Y)
        # plane and one for the interleaved chroma (CbCr) plane of NV12.
        objCAI = []
        # Need to compute the address of the Y plane and the interleaved chroma plane.
        data = (
            tensor.storage().data_ptr()
            + tensor.storage_offset() * tensor.element_size()
        )
        objCAI.append(
            AppCAI(
                (self.avstream.height, self.avstream.width, 1),
                (self.avstream.width, 1, 1),
                "|u1",
                data,
            )
        )
        chromaAlloc = int(data) + self.avstream.width * self.avstream.height
        objCAI.append(
            AppCAI(
                (int(self.avstream.height / 2), int(self.avstream.width / 2), 2),
                (self.avstream.width, 2, 1),
                "|u1",
                chromaAlloc,
            )
        )
        # Encode the frame; takes the CUDA array interface objects as input.
        self.encoded_frame = self.nvEnc.Encode(objCAI)
        self.write_frame(
            self.encoded_frame,
            self.pts_time,
            self.fps,
            self.avstream,
            self.container,
        )
        self.pts_time += self.delta_t

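The AppCAI helper used above wraps a raw device pointer, a shape, and strides into an object exposing the __cuda_array_interface__ protocol, which is what nvEnc.Encode() consumes. The real helper lives in nvcodec_utils.py; the sketch below is an assumption of what such a wrapper looks like:

class AppCAI:
    """Minimal __cuda_array_interface__ wrapper around an existing device allocation."""

    def __init__(self, shape, stride, typestr, gpualloc):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "strides": stride,
            "data": (int(gpualloc), False),  # (device pointer, read-only flag)
            "typestr": typestr,
            "version": 3,
        }
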
Finally, we use the av library to write packets to an MP4 container. We must properly flush (i.e. write any pending packets) at the end.

def write_frame(self, encoded_frame, pts_time, fps, stream, container):
    encoded_bytes = bytearray(encoded_frame)
    pkt = av.packet.Packet(encoded_bytes)
    pkt.pts = pts_time
    pkt.dts = pts_time
    pkt.stream = stream
    pkt.time_base = 1 / Fraction(fps)
    container.mux(pkt)
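
The flush itself is not part of the snippet above. A sketch of what it could look like follows; it assumes pyNvVideoCodec exposes an EndEncode() call that drains the hardware encoder, so treat the method name and teardown order as assumptions:

def flush(self):
    # Drain any frames still queued inside the hardware encoder (assumed API).
    encoded_bytes = self.nvEnc.EndEncode()
    if encoded_bytes:
        self.write_frame(
            encoded_bytes,
            self.pts_time,
            self.fps,
            self.avstream,
            self.container,
        )
        self.pts_time += self.delta_t
    # Close the MP4 container so the trailer/index gets written out.
    self.container.close()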