Video Encoding using pyNvVideoCodec
The video batch encoder is responsible for writing tensors as an MP4 video. The actual encoding is done in batches using NVIDIA’s pyNvVideoCodec. The video encoder is generic enough to be used across the sample applications. The code associated with this class can be found in the samples/common/python/nvcodec_utils.py file.
There are two classes responsible for the encoding work: VideoBatchEncoder and nvVideoEncoder.
The first class wraps the second, which allows us to:
Stay consistent with the API of other encoders used throughout CVCUDA.
Support batch encoding.
Use accelerated ops in CVCUDA to perform the necessary color conversion from RGB to NV12 before encoding the video.
VideoBatchEncoder
To get started, here is how the class is initialized in its __init__ method. Neither the encoder instance nor the CVCUDA color conversion tensors are created here; both are allocated lazily on first use.
Note: Due to the nature of NV12, representing it directly as a CVCUDA tensor is a bit challenging. Be sure to read through the explanation in the comments of the code shown below to understand more.
class VideoBatchEncoder:
    def __init__(
        self,
        output_path,
        fps,
        device_id,
        cuda_ctx,
        cuda_stream,
        cvcuda_perf,
    ):
        self.logger = logging.getLogger(__name__)
        self.output_path = output_path
        self.fps = fps
        self.device_id = device_id
        self.cuda_ctx = cuda_ctx
        self.cuda_stream = cuda_stream
        self.cvcuda_perf = cvcuda_perf

        # Allocated lazily on the first call.
        self.encoder = None
        self.cvcuda_HWCtensor_batch = None
        self.cvcuda_YUVtensor_batch = None
        self.input_layout = "NCHW"
        self.gpu_input = True
        self.output_file_name = None

        self.logger.info("Using PyNvVideoCodec encoder version: %s" % nvvc.__version__)
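To make the NV12 note above more concrete, here is a minimal sketch in plain Python. It is not code from nvcodec_utils.py. It assumes 8-bit NV12 with even width and height, and the helper name nv12_layout is hypothetical. NV12 stores a full-resolution Y (luma) plane followed by a single plane of interleaved U/V samples at half resolution in both directions, so a frame occupies height * 3 / 2 rows of width bytes.

```python
def nv12_layout(width, height):
    """Return the byte layout of one 8-bit NV12 frame (hypothetical helper)."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    luma_bytes = width * height            # full-resolution Y plane
    chroma_bytes = width * (height // 2)   # interleaved U/V pairs, half the rows
    return {
        "rows": height + height // 2,      # Y rows followed by UV rows
        "luma_bytes": luma_bytes,
        "chroma_offset": luma_bytes,       # UV plane starts right after the Y plane
        "total_bytes": luma_bytes + chroma_bytes,
    }


layout = nv12_layout(1920, 1080)
```

For a 1920x1080 frame this gives 1620 rows, which is why a single-channel tensor of shape (batch, (height // 2) * 3, width, 1) can hold a batch of NV12 frames.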
Once the encoder object is defined and initialized, encoding starts when the __call__ function is invoked. We first allocate the encoder instance if that has not been done already.
Next, we use CVCUDA’s cvtcolor_into function to convert the batch data from RGB format to NV12 format. The tensors needed for the color conversion are allocated once and reused, rather than being allocated again for every batch; they are only reallocated when the batch size changes, which typically happens on the last batch.
Once the tensors are allocated, we use CVCUDA ops to perform the color conversion.
# Create 2 CVCUDA tensors: reformat NCHW->NHWC and color conversion RGB->YUV
current_batch_size = batch.data.shape[0]
height, width = batch.data.shape[2], batch.data.shape[3]

# Allocate only the first time, or when the batch size changes
# (typically on the last batch).
if (
    not self.cvcuda_HWCtensor_batch
    or current_batch_size != self.cvcuda_HWCtensor_batch.shape[0]
):
    self.cvcuda_HWCtensor_batch = cvcuda.Tensor(
        (current_batch_size, height, width, 3),
        nvcv.Type.U8,
        nvcv.TensorLayout.NHWC,
    )
    # NV12 needs height * 3 / 2 rows: Y plane followed by interleaved UV plane.
    self.cvcuda_YUVtensor_batch = cvcuda.Tensor(
        (current_batch_size, (height // 2) * 3, width, 1),
        nvcv.Type.U8,
        nvcv.TensorLayout.NHWC,
    )

# Convert RGB to NV12, in batch, before sending it over to pyNvVideoCodec.
# Convert to CVCUDA tensor
cvcuda_tensor = cvcuda.as_tensor(batch.data, nvcv.TensorLayout.NCHW)

# Reformat NCHW to NHWC
cvcuda.reformat_into(self.cvcuda_HWCtensor_batch, cvcuda_tensor)

# Color convert from RGB to YUV_NV12
cvcuda.cvtcolor_into(
    self.cvcuda_YUVtensor_batch,
    self.cvcuda_HWCtensor_batch,
    cvcuda.ColorConversion.RGB2YUV_NV12,
)

# Convert back to a torch tensor; the data is now in NV12 format.
tensor = torch.as_tensor(self.cvcuda_YUVtensor_batch.cuda(), device="cuda")
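The allocate-once logic above can be illustrated in isolation. The following is a minimal sketch, assuming a plain bytearray stands in for the GPU tensors; the class BatchBufferCache and its allocations counter are hypothetical and exist only to show the pattern, they are not part of the sample code.

```python
class BatchBufferCache:
    """Hypothetical helper: reallocates only when the batch size changes."""

    def __init__(self, frame_bytes):
        self.frame_bytes = frame_bytes
        self.batch_size = None
        self.buffer = None
        self.allocations = 0  # counts how often a new buffer was created

    def get(self, batch_size):
        # Same condition as in the encoder: first use, or a different batch size.
        if self.buffer is None or batch_size != self.batch_size:
            self.buffer = bytearray(batch_size * self.frame_bytes)
            self.batch_size = batch_size
            self.allocations += 1
        return self.buffer


cache = BatchBufferCache(frame_bytes=16)
for size in (4, 4, 4, 2):  # three full batches, then a shorter final batch
    cache.get(size)
```

With three full batches and one shorter final batch, the buffer is created twice rather than four times. The trade-off is that the cached buffers stay alive between calls, which is usually acceptable for a long-running encode loop.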
Finally, we call the nvVideoEncoder instance to actually do the encoding.
nvVideoEncoder
This class offers hardware-accelerated video encoding using pyNvVideoCodec. It encodes tensors and writes them to an MP4 file. Please consult the pyNvVideoCodec documentation to learn more about its capabilities and APIs.
For use in CVCUDA, this class defines the following encode_from_tensor function, which encodes a Torch tensor.
class nvVideoEncoder:
    def __init__(
        self,
        device_id,
        width,
        height,
        fps,
        enc_file,
        cuda_ctx,
        cuda_stream,
        format,
    ):
        """
        Create instance of HW-accelerated video encoder.
        :param device_id: id of video card which will be used for encoding & processing.
        :param width: encoded frame width.
        :param height: encoded frame height.
        :param fps: The FPS at which the encoding should happen.
        :param enc_file: path to encoded video file.
        :param cuda_ctx: A cuda context object
        :param cuda_stream: A cuda stream object
        :param format: The format of the encoded video file.
            (e.g. "NV12", "YUV444" see NvPyVideoEncoder docs for more info)
        """
        self.device_id = device_id
        self.fps = round(Fraction(fps), 6)
        self.enc_file = enc_file
        self.cuda_ctx = cuda_ctx
        self.cuda_stream = cuda_stream

        self.pts_time = 0
        self.delta_t = 1  # Increment the packets' timestamp by this much.
        self.encoded_frame = np.ndarray(shape=(0), dtype=np.uint8)
        self.container = av.open(enc_file, "w")
        self.avstream = self.container.add_stream("h264", rate=self.fps)

        # Round the width up to the next multiple of 16.
        aligned_value = 0
        if width % 16 != 0:
            aligned_value = 16 - (width % 16)
        aligned_width = width + aligned_value
        width = aligned_width

        self.avstream.width = width
        self.avstream.height = height

        self.avstream.time_base = 1 / Fraction(self.fps)
        self.surface = None
        self.surf_plane = None

        self.tmpTensor = None

        self.nvEnc = nvvc.CreateEncoder(
            self.avstream.width,
            self.avstream.height,
            format,
            codec="h264",
            preset="P4",
            cudastream=cuda_stream.handle,
        )

    def width(self):
        """
        Gets the actual video frame width from the encoder.
        """
        return self.nvEnc.Width()

    def height(self):
        """
        Gets the actual video frame height from the encoder.
        """
        return self.nvEnc.Height()

    def encode_from_tensor(self, tensor):

        # Create a CUDA array interface object with 2 planes for NV12:
        # one for luma (Y) and one for the interleaved chroma (CbCr).
        objCAI = []
        # Need to compute the address of the Y plane and the interleaved chroma plane
        data = (
            tensor.storage().data_ptr()
            + tensor.storage_offset() * tensor.element_size()
        )
        objCAI.append(
            AppCAI(
                (self.avstream.height, self.avstream.width, 1),
                (self.avstream.width, 1, 1),
                "|u1",
                data,
            )
        )
        # The chroma plane starts right after the width * height luma bytes.
        chromaAlloc = int(data) + self.avstream.width * self.avstream.height
        objCAI.append(
            AppCAI(
                (int(self.avstream.height / 2), int(self.avstream.width / 2), 2),
                (self.avstream.width, 2, 1),
                "|u1",
                chromaAlloc,
            )
        )
        # The encoder takes the CUDA array interface objects as input
        self.encoded_frame = self.nvEnc.Encode(objCAI)
        self.write_frame(
            self.encoded_frame,
            self.pts_time,
            self.fps,
            self.avstream,
            self.container,
        )
        self.pts_time += self.delta_t
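Two pieces of arithmetic in the class above are easy to check on their own: rounding the width up to a multiple of 16, and locating the chroma plane relative to the luma plane. The sketch below is a standalone illustration in plain Python, assuming a contiguous 8-bit NV12 frame whose rows are exactly width bytes long; the helper names align_up and nv12_plane_addresses are hypothetical and do not appear in nvcodec_utils.py.

```python
def align_up(value, alignment=16):
    """Round value up to the next multiple of alignment (hypothetical helper)."""
    return (value + alignment - 1) // alignment * alignment


def nv12_plane_addresses(base_address, width, height):
    """Return (luma_address, chroma_address) for a contiguous NV12 frame.

    Assumes rows are exactly `width` bytes long, so the interleaved
    chroma plane begins width * height bytes after the luma plane.
    """
    return base_address, base_address + width * height


aligned_width = align_up(1000)
luma_addr, chroma_addr = nv12_plane_addresses(4096, 64, 32)
```

Here align_up(1000) yields 1008, while a width that is already a multiple of 16, such as 1920, is left unchanged. The address arithmetic only holds if the incoming tensor really has rows as wide as the width given to the encoder, so it is worth checking that assumption when the input width is not a multiple of 16.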
Finally, we use the av library to write packets to an MP4 container. We must properly flush (i.e. write any pending packets) at the end.
def write_frame(self, encoded_frame, pts_time, fps, stream, container):
    # Wrap the encoded bytes in a packet and stamp it with the frame index.
    encoded_bytes = bytearray(encoded_frame)
    pkt = av.packet.Packet(encoded_bytes)
    pkt.pts = pts_time
    pkt.dts = pts_time
    pkt.stream = stream
    pkt.time_base = 1 / Fraction(fps)
    container.mux(pkt)
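The timestamps written above are frame indices expressed in units of time_base, which is 1 / fps. As a small illustration of that relationship, here is a sketch that uses only the standard library; the function name presentation_time is hypothetical and is not part of the sample code or of the av library.

```python
from fractions import Fraction


def presentation_time(frame_index, fps):
    """Return the presentation time in seconds as an exact Fraction.

    Mirrors the encoder's convention: pts advances by 1 per frame and
    time_base is 1 / fps, so seconds = pts * time_base.
    """
    time_base = 1 / Fraction(fps)
    return frame_index * time_base


half_second = presentation_time(15, 30)
ntsc_time = presentation_time(30000, Fraction(30000, 1001))
```

Using Fraction rather than float keeps the time base exact for rates such as 30000/1001, so timestamps do not drift over a long video.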