Image Processing Transform Coding Using the Residual Quadtree (RQT)
In HEVC, each picture is divided into coding tree blocks (CTBs). A CTB is a square block and represents the root of a quadtree, i.e., the coding tree. The CTB size may range from 16×16 to 64×64 luma samples, but typically 64×64 is used. Each CTB can be further split into smaller square blocks called coding blocks (CBs). After the CTB is split recursively into CBs, each CB is further divided into prediction blocks (PBs) and transform blocks (TBs).
The partitioning of the CBs into TBs is carried out recursively based on a quadtree approach. The corresponding structure, i.e., the residual quadtree (RQT), allows TB sizes from 4×4 up to 32×32 luma samples. The figure below shows an example where a CB includes 10 TBs, labeled with the letters a to j, and the corresponding block partitioning. The individual TBs are processed in alphabetical order, which follows a recursive Z-scan with depth-first traversal.
The quadtree approach enables the adaptation of the transform to the varying space-frequency characteristics of the residual signal. Larger transform block sizes, which have larger spatial support, provide better frequency resolution. Smaller transform block sizes, which have smaller spatial support, provide better spatial resolution. The trade-off between spatial and frequency resolution is chosen by the encoder control, for example based on Lagrangian optimization techniques.
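The depth-first Z-scan order described above can be illustrated with a short sketch. The tree representation (a leaf is `None`, a split node is a list of four children) and the function name `z_scan` are assumptions made for this example only. The 32×32 block used here is hypothetical and does not reproduce the layout of the figure.

```python
def z_scan(node, x=0, y=0, size=32):
    """Yield (x, y, size) for each leaf TB in depth-first Z-scan order.

    A node is either None (leaf, no further split) or a list of four
    child nodes ordered top-left, top-right, bottom-left, bottom-right.
    """
    if node is None:
        yield (x, y, size)
        return
    half = size // 2
    # Z-scan visits the quadrants row by row: TL, TR, BL, BR.
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    for child, (dx, dy) in zip(node, offsets):
        # Fully traverse one quadrant before moving to the next one.
        yield from z_scan(child, x + dx, y + dy, half)


# Hypothetical 32x32 CB whose top-right quadrant is split once more.
tree = [None, [None, None, None, None], None, None]
leaves = list(z_scan(tree))
for index, (x, y, size) in enumerate(leaves):
    print(f"TB {chr(ord('a') + index)}: origin=({x},{y}) size={size}x{size}")
```

Because each quadrant is fully traversed before the next one is entered, the four 8×8 blocks of the top-right quadrant are processed before the two 16×16 blocks in the bottom half.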
Parameter Signaling
The RQT is defined by three parameters: the maximum depth of the tree, the minimum allowed transform size and the maximum allowed transform size. The minimum and maximum transform sizes can vary within the range from 4×4 to 32×32 samples, which corresponds to the supported block transforms mentioned in the previous section. The maximum allowed depth of the RQT restricts the number of subdivisions. A maximum depth equal to zero means that a CB cannot be split any further, so the CB contains exactly one TB.
All these parameters interact and influence the subdivision of the RQT. Consider a case in which the root CB size is 64×64, the maximum depth is equal to zero and the maximum transform size is equal to 32×32. In this case, the CB has to be subdivided at least once, even though the maximum depth of zero would otherwise forbid any split, because leaving it unsplit would result in a 64×64 TB, which is not allowed. The RQT parameters, i.e., the maximum RQT depth and the minimum and maximum transform sizes, are transmitted in the bitstream at the sequence parameter set level. For the RQT depth, different values can be specified and signaled for intra and inter coded CUs.
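The interaction of the three parameters can be sketched as a small decision function. The function name, its arguments and the three return labels are hypothetical. The order of the checks, with the maximum transform size taking priority over the depth limit, is an assumption derived from the 64×64 example above rather than a transcription of the standard's syntax.

```python
def split_decision(tb_size, depth, max_depth, min_tb_size, max_tb_size):
    """Classify the split of an RQT node (illustrative sketch only).

    Returns:
      'forced'    - the block exceeds the maximum transform size, so it
                    must be split regardless of the depth limit
      'forbidden' - the block is at the minimum transform size or the
                    maximum depth is reached, so it cannot be split
      'signaled'  - the encoder is free to choose and signals its choice
    """
    if tb_size > max_tb_size:
        return "forced"
    if tb_size <= min_tb_size or depth >= max_depth:
        return "forbidden"
    return "signaled"


# The case from the text: 64x64 CB, maximum depth 0, maximum TB size 32x32.
print(split_decision(64, depth=0, max_depth=0, min_tb_size=4, max_tb_size=32))
# The four resulting 32x32 blocks sit at depth 1 and cannot be split again.
print(split_decision(32, depth=1, max_depth=0, min_tb_size=4, max_tb_size=32))
```

Under these assumptions the first call reports a forced split and the second a forbidden one, so the 64×64 CB ends up with exactly four 32×32 TBs.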
Fast Encoder Control
To determine the optimal partitioning of a CU into TUs, the encoder has to exhaustively evaluate all possible RQT structures, corresponding to all possible TU partitionings for the given CU. Since the number of possible RQT structures grows exponentially with the maximum allowed tree depth, the encoder complexity (e.g., runtime) required to obtain the optimal TU partitioning in the rate-distortion (RD) sense would also increase exponentially with the RQT depth. This would limit the application of the RQT approach in transform coding. Therefore, in addition to the exhaustive search as performed by the HM reference encoder software, we developed a fast RQT encoder control that limits the number of candidates. It reduces the encoder runtime at the cost of slightly inferior coding performance and is designed as follows.
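To get a feeling for this growth, the number of distinct RQT structures can be counted with a simple recurrence: a node is either left unsplit (one structure) or split into four children, each of which is an independent subtree with one level less. The sketch below is a counting aid only; the function name is hypothetical, and it ignores the minimum and maximum transform size constraints, which would reduce the count in practice.

```python
def count_rqt_structures(max_depth):
    """Count the quadtree structures with at most max_depth split levels.

    Recurrence: N(0) = 1 and N(d) = 1 + N(d - 1) ** 4, i.e. either no
    split, or a split into four independent subtrees of depth d - 1.
    """
    count = 1
    for _ in range(max_depth):
        count = 1 + count ** 4
    return count


for depth in range(4):
    print(depth, count_rqt_structures(depth))
```

For maximum depths 0 to 3 this gives 1, 2, 17 and 83522 structures, which illustrates why limiting the set of candidates matters for encoder runtime.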
The encoder starts at the RQT root, corresponding to the maximum possible TB size, and continues evaluation at the next RQT level, corresponding to the next smaller TB size, until either an early-termination criterion is fulfilled or the maximum allowed RQT depth is reached. The early-termination criterion checks whether the absolute values of all unquantized transform coefficients are below a certain threshold. If this is the case, the evaluation stops at the current level, and smaller TB sizes are not taken into consideration. The threshold is QP-dependent: relative to the quantizer step size, it is higher for smaller QP values and lower for larger QP values, such that the percentage reduction of encoder runtime is approximately the same over the whole QP range. For QP values below 24, the threshold is equal to 125% of the quantizer step size; for QP values above 48, it is equal to 50%; and for QP values between 24 and 48, there is a linear transition from 125% down to 50% of the quantizer step size.
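The threshold rule and the early-termination check can be sketched as follows. All function names are hypothetical. The step-size formula `2 ** ((QP - 4) / 6)` is the commonly used H.264/HEVC approximation and is an assumption here, since the text does not state how the quantizer step size is computed.

```python
def quantizer_step_size(qp):
    """Approximate quantizer step size (assumed: doubles every 6 QP steps)."""
    return 2.0 ** ((qp - 4) / 6.0)


def threshold_factor(qp):
    """Threshold as a fraction of the step size, as described in the text."""
    if qp < 24:
        return 1.25
    if qp > 48:
        return 0.50
    # Linear transition from 125% at QP 24 down to 50% at QP 48.
    return 1.25 + (0.50 - 1.25) * (qp - 24) / (48 - 24)


def stop_splitting(coefficients, qp):
    """Return True if all unquantized coefficients are below the threshold."""
    threshold = threshold_factor(qp) * quantizer_step_size(qp)
    return all(abs(c) < threshold for c in coefficients)


# Hypothetical coefficient values; at QP 4 the assumed step size is 1.0,
# so the threshold is 1.25.
print(stop_splitting([1.0, -0.5, 0.25], qp=4))   # all magnitudes below 1.25
print(stop_splitting([2.0, -0.5, 0.25], qp=4))   # 2.0 exceeds the threshold
```

When `stop_splitting` returns `True` for the current TB, the encoder in this sketch would keep the current transform size and skip the evaluation of all smaller TB sizes below that node.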
References
- D. Marpe, H. Schwarz, S. Bosse, B. Bross, P. Helle, T. Hinz, H. Kirchhoffer, H. Lakshman, T. Nguyen, S. Oudin, M. Siekmann, K. Sühring, M. Winken, and T. Wiegand, "Video Compression Using Nested Quadtree Structures, Leaf Merging and Improved Techniques for Motion Representation and Entropy Coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 20, No. 12, pp. 1676-1687, Dec. 2010.
- M. Winken, P. Helle, D. Marpe, H. Schwarz, and T. Wiegand, "Transform Coding in the HEVC Test Model," 18th IEEE International Conference on Image Processing (ICIP), 2011, pp. 3693-3696.
- M. Siekmann, H. Schwarz, B. Bross, D. Marpe, and T. Wiegand, "Fast encoder control for RQT," JCTVC-E425, Mar. 2011.