v0.10.0-beta
Release Highlights
CV-CUDA v0.10.0 includes a critical bug fix (cache growth management) alongside the following changes:
New Features:
Added mechanism to limit and manage cache memory consumption (includes new “Best Practices” documentation) [1].
Performance improvements of color conversion operators (e.g., 2x faster RGB2YUV).
Refactored codebase to allow independent build of NVCV library (data structures).
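To illustrate the general idea behind limiting cache memory consumption, here is a minimal sketch of a byte-bounded cache. It is not CV-CUDA's implementation or API: the class name ByteLimitedCache, its methods, and the least-recently-used eviction policy are all assumptions chosen for illustration. See the "Best Practices" documentation for the actual mechanism.

```python
from collections import OrderedDict


class ByteLimitedCache:
    """Hypothetical byte-bounded cache with least-recently-used eviction.

    Illustrative only; the eviction policy is an assumption, not the
    policy used by CV-CUDA.
    """

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.current_bytes = 0
        self._entries = OrderedDict()  # key -> (value, size_bytes)

    def put(self, key, value, size_bytes):
        """Cache value under key; return False if it cannot be cached."""
        # Drop any existing entry so its size is not counted twice.
        if key in self._entries:
            self.current_bytes -= self._entries.pop(key)[1]
        # An object larger than the whole budget is never cached.
        if size_bytes > self.limit_bytes:
            return False
        # Evict least recently used entries until the new one fits.
        while self.current_bytes + size_bytes > self.limit_bytes:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.current_bytes -= evicted_size
        self._entries[key] = (value, size_bytes)
        self.current_bytes += size_bytes
        return True

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]
```

With a fixed budget, the cache's memory use stays bounded no matter how many distinct objects pass through it. That is the property the unbounded-growth fix is meant to restore.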
Bug Fixes:
Fixed unbounded cache memory consumption issue [1].
Improved management of Python-created object lifetimes, decoupled from cache management [1].
Fixed potential crash in Resize operator’s linear and nearest neighbor interpolation caused by non-aligned vectorized writes.
Fixed Python CvtColor operator to correctly handle NV12 and NV21 outputs.
Fixed Resize and RandomResizedCrop linear interpolation weight for border rows and columns.
Fixed missing parameter in C API for fused ResizeCropConvertReformat.
Fixed several minor documentation and error output issues.
Fixed minor compiler warning while building Resize operator.
Compatibility and Known Limitations
New limitations:
Cache/resource management introduced in v0.10 adds microsecond-level overhead to Python operator calls. Based on the performance analysis of our Python samples, we expect the production- and pipeline-level impact to be negligible. CUDA kernel and C++ call performance is not affected. We aim to investigate and reduce this overhead further in a future release.
Sporadic Pybind11 deallocation crashes have been reported in long-running multi-threaded Python pipelines with externally allocated memory (e.g., wrapped PyTorch buffers). We are evaluating an upgrade of Pybind11 (currently using 2.10) as a potential fix in an upcoming release.
For the full list, see the main README in the CV-CUDA GitHub repository.
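To check whether the Python call overhead mentioned in the new limitations matters for a given pipeline, host-side per-call time can be estimated with the standard timeit module. The sketch below is a generic measurement harness, not part of CV-CUDA: per_call_microseconds and placeholder_op are hypothetical names, and placeholder_op stands in for whatever operator call is being measured.

```python
import timeit


def per_call_microseconds(fn, calls=10_000, repeats=5):
    """Return the best-case mean wall-clock time per call, in microseconds."""
    timer = timeit.Timer(fn)
    # Taking the minimum over several repeats reduces noise from the OS.
    best_total = min(timer.repeat(repeat=repeats, number=calls))
    return best_total / calls * 1e6


def placeholder_op():
    # Hypothetical stand-in for a Python operator call.
    return sum(range(10))


if __name__ == "__main__":
    print(f"{per_call_microseconds(placeholder_op):.2f} us per call")
```

Because GPU work is typically launched asynchronously, a host-side timer like this mostly captures call overhead rather than kernel execution time. Comparing the result between two library versions gives a rough estimate of the added per-call cost.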
License
CV-CUDA is licensed under the Apache 2.0 license.
Resources
Acknowledgements
CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.