Video Encoding using pyNvVideoCodec

The video batch encoder is responsible for writing tensors to an MP4 video file. The actual encoding is done in batches using NVIDIA's pyNvVideoCodec. The encoder is generic enough to be used across the sample applications. The code associated with these classes can be found in the samples/common/python/nvcodec_utils.py file.
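
The snippets below assume imports along the following lines; the exact module aliases are assumptions inferred from how the names are used in that file:

import logging
import os
from fractions import Fraction

import av
import numpy as np
import torch

import cvcuda
import nvcv
import PyNvVideoCodec as nvvc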

There are two classes responsible for the encoding work:

  1. VideoBatchEncoder and

  2. nvVideoEncoder

The first class acts as a wrapper around the second class and allows us to:

  1. Stay consistent with the API of other encoders used throughout CVCUDA.

  2. Support batch encoding.

  3. Use accelerated ops in CVCUDA to perform the necessary color conversion from RGB to NV12 before encoding the video (a brief usage sketch follows this list).
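
As a rough illustration, the wrapper is driven once per batch by the sample pipelines. The sketch below is hypothetical glue code; the batch object, its fields, and the surrounding setup are assumptions, not the actual driver:

# Hypothetical driver loop: encode every processed batch of frames.
encoder = VideoBatchEncoder(
    output_path="/tmp/out",     # assumed output directory
    fps=30,
    device_id=0,
    cuda_ctx=cuda_ctx,          # pre-created CUDA context (assumed)
    cuda_stream=cuda_stream,    # pre-created CUDA stream (assumed)
    cvcuda_perf=cvcuda_perf,    # perf-logging helper from the samples (assumed)
)

for batch in batches:           # each batch.data is an NCHW, uint8, RGB torch tensor
    encoder(batch)              # __call__ color-converts to NV12 and encodes the frames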

VideoBatchEncoder

To get started, here is how the class is initialized in its __init__ method. The encoder instance and the CVCUDA color conversion tensors are both allocated lazily upon first use.

Note: Due to the nature of NV12, representing it directly as a CVCUDA tensor is a bit challenging. The samples store an NV12 frame as a (height * 3 / 2, width, 1) tensor; for example, a 1920x1080 frame becomes (1620, 1920, 1): 1080 rows of luma followed by 540 rows of interleaved CbCr. Be sure to read through the comments of the code shown below to understand more.

class VideoBatchEncoder:
    def __init__(
        self,
        output_path,
        fps,
        device_id,
        cuda_ctx,
        cuda_stream,
        cvcuda_perf,
    ):
        self.logger = logging.getLogger(__name__)
        self.output_path = output_path
        self.fps = fps
        self.device_id = device_id
        self.cuda_ctx = cuda_ctx
        self.cuda_stream = cuda_stream
        self.cvcuda_perf = cvcuda_perf

        self.encoder = None
        self.cvcuda_HWCtensor_batch = None
        self.cvcuda_YUVtensor_batch = None
        self.input_layout = "NCHW"
        self.gpu_input = True
        self.output_file_name = None

        self.logger.info("Using PyNvVideoCodec encoder version: %s" % nvvc.__version__)

Once things are defined and initialized, encoding starts when the __call__ method is invoked. We first need to allocate the encoder instance if it has not been created already.
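
A minimal sketch of that lazy allocation inside __call__ is shown below. The output file naming is an assumption; the constructor arguments match the nvVideoEncoder class described later in this section:

# Lazily create the underlying encoder on the first batch, once the frame
# size is known from the incoming NCHW tensor.
if self.encoder is None:
    height, width = batch.data.shape[2], batch.data.shape[3]
    self.output_file_name = os.path.join(self.output_path, "out.mp4")  # assumed name
    self.encoder = nvVideoEncoder(
        self.device_id,
        width,
        height,
        self.fps,
        self.output_file_name,
        self.cuda_ctx,
        self.cuda_stream,
        "NV12",  # we hand the encoder NV12 data after the color conversion below
    )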

Next, we use CVCUDA's cvtcolor_into function to convert the batch data from RGB to NV12. We allocate the color conversion tensors once and reuse them, rather than allocating the same tensors on every batch.

Once the tensors are allocated, we use CVCUDA ops to perform the color conversion.

# Create 2 CVCUDA tensors: reformat NCHW->NHWC and color conversion RGB->YUV
current_batch_size = batch.data.shape[0]
height, width = batch.data.shape[2], batch.data.shape[3]

# Allocate only the first time or when the batch size changes (e.g. the last batch).
if (
    not self.cvcuda_HWCtensor_batch
    or current_batch_size != self.cvcuda_HWCtensor_batch.shape[0]
):
    self.cvcuda_HWCtensor_batch = cvcuda.Tensor(
        (current_batch_size, height, width, 3),
        nvcv.Type.U8,
        nvcv.TensorLayout.NHWC,
    )
    self.cvcuda_YUVtensor_batch = cvcuda.Tensor(
        (current_batch_size, (height // 2) * 3, width, 1),
        nvcv.Type.U8,
        nvcv.TensorLayout.NHWC,
    )

# Convert RGB to NV12, in batch, before sending it over to pyNvVideoCodec.
# Convert to CVCUDA tensor
cvcuda_tensor = cvcuda.as_tensor(batch.data, nvcv.TensorLayout.NCHW)

# Reformat NCHW to NHWC
cvcuda.reformat_into(self.cvcuda_HWCtensor_batch, cvcuda_tensor)

# Color convert from RGB to YUV_NV12
cvcuda.cvtcolor_into(
    self.cvcuda_YUVtensor_batch,
    self.cvcuda_HWCtensor_batch,
    cvcuda.ColorConversion.RGB2YUV_NV12,
)

# Convert back to a torch tensor; the data is now in NV12 format.
tensor = torch.as_tensor(self.cvcuda_YUVtensor_batch.cuda(), device="cuda")

Finally, we call the nvVideoEncoder instance to actually do the encoding.
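
Since nvVideoEncoder consumes one frame at a time, the wrapper iterates over the NV12 batch tensor and feeds each frame to it. The per-frame slicing below is a sketch of that step, not the exact sample code:

# The NV12 batch tensor has shape (N, H * 3 // 2, W, 1); encode frame by frame.
for img_idx in range(tensor.shape[0]):
    self.encoder.encode_from_tensor(tensor[img_idx])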

nvVideoEncoder

This class offers hardware-accelerated video encoding using pyNvVideoCodec. It encodes tensors and writes them out as an MP4 file. Please consult the pyNvVideoCodec documentation to learn more about its capabilities and APIs.

For use in CVCUDA, this class defines the encode_from_tensor method shown below, which encodes a Torch tensor.

class nvVideoEncoder:
    def __init__(
        self,
        device_id,
        width,
        height,
        fps,
        enc_file,
        cuda_ctx,
        cuda_stream,
        format,
    ):
        """
        Create instance of HW-accelerated video encoder.
        :param device_id: id of video card which will be used for encoding & processing.
        :param width: encoded frame width.
        :param height: encoded frame height.
        :param fps: The FPS at which the encoding should happen.
        :param enc_file: path to encoded video file.
        :param cuda_ctx: A cuda context object.
        :param cuda_stream: A cuda stream object.
        :param format: The format of the encoded video file.
                (e.g. "NV12", "YUV444"; see NvPyVideoEncoder docs for more info)
        """
        self.device_id = device_id
        self.fps = round(Fraction(fps), 6)
        self.enc_file = enc_file
        self.cuda_ctx = cuda_ctx
        self.cuda_stream = cuda_stream

        self.pts_time = 0
        self.delta_t = 1  # Increment the packets' timestamp by this much.
        self.encoded_frame = np.ndarray(shape=(0), dtype=np.uint8)
        self.container = av.open(enc_file, "w")
        self.avstream = self.container.add_stream("h264", rate=self.fps)

        # Align the frame width up to the next multiple of 16.
        aligned_value = 0
        if width % 16 != 0:
            aligned_value = 16 - (width % 16)
        aligned_width = width + aligned_value
        width = aligned_width

        self.avstream.width = width
        self.avstream.height = height

        self.avstream.time_base = 1 / Fraction(self.fps)
        self.surface = None
        self.surf_plane = None

        self.tmpTensor = None

        self.nvEnc = nvvc.CreateEncoder(
            self.avstream.width,
            self.avstream.height,
            format,
            codec="h264",
            preset="P4",
            cudastream=cuda_stream.handle,
        )

    def width(self):
        """
        Gets the actual video frame width from the encoder.
        """
        return self.nvEnc.Width()

    def height(self):
        """
        Gets the actual video frame height from the encoder.
        """
        return self.nvEnc.Height()

    def encode_from_tensor(self, tensor):

        # Create CUDA array interface objects with 2 planes: one for the luma (Y)
        # plane and one for the interleaved chroma (CbCr) plane of NV12.
        objCAI = []
        # Need to compute the address of the Y plane and the interleaved chroma plane.
        data = (
            tensor.storage().data_ptr()
            + tensor.storage_offset() * tensor.element_size()
        )
        objCAI.append(
            AppCAI(
                (self.avstream.height, self.avstream.width, 1),
                (self.avstream.width, 1, 1),
                "|u1",
                data,
            )
        )
        chromaAlloc = int(data) + self.avstream.width * self.avstream.height
        objCAI.append(
            AppCAI(
                (int(self.avstream.height / 2), int(self.avstream.width / 2), 2),
                (self.avstream.width, 2, 1),
                "|u1",
                chromaAlloc,
            )
        )
        # Encode the frame; takes the CUDA array interface objects as input.
        self.encoded_frame = self.nvEnc.Encode(objCAI)
        self.write_frame(
            self.encoded_frame,
            self.pts_time,
            self.fps,
            self.avstream,
            self.container,
        )
        self.pts_time += self.delta_t

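The AppCAI helper used above wraps a raw device pointer, a shape, and strides into an object exposing the __cuda_array_interface__ protocol, which is what nvEnc.Encode() consumes. The real helper lives in nvcodec_utils.py; the sketch below is an assumption of what such a wrapper looks like:

class AppCAI:
    """Minimal __cuda_array_interface__ wrapper around an existing device allocation."""

    def __init__(self, shape, stride, typestr, gpualloc):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "strides": stride,
            "data": (int(gpualloc), False),  # (device pointer, read-only flag)
            "typestr": typestr,
            "version": 3,
        }
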
Finally, we use the av library to write packets to an MP4 container. We must properly flush (i.e. write any pending packets) at the end.

def write_frame(self, encoded_frame, pts_time, fps, stream, container):
    encoded_bytes = bytearray(encoded_frame)
    pkt = av.packet.Packet(encoded_bytes)
    pkt.pts = pts_time
    pkt.dts = pts_time
    pkt.stream = stream
    pkt.time_base = 1 / Fraction(fps)
    container.mux(pkt)
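
The flush itself is not part of the snippet above. A sketch of what it could look like follows; it assumes pyNvVideoCodec exposes an EndEncode() call that drains the hardware encoder, so treat the method name and teardown order as assumptions:

def flush(self):
    # Drain any frames still queued inside the hardware encoder (assumed API).
    encoded_bytes = self.nvEnc.EndEncode()
    if encoded_bytes:
        self.write_frame(
            encoded_bytes,
            self.pts_time,
            self.fps,
            self.avstream,
            self.container,
        )
        self.pts_time += self.delta_t
    # Close the MP4 container so the trailer/index gets written out.
    self.container.close()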