A qualitative overview of x264's ratecontrol methods
By Loren Merritt

Historical note:
This document is outdated, but a significant part of it is still accurate. 
Here are some important ways ratecontrol has changed since the authoring
of this document:
- By default, MB-tree is used instead of qcomp for weighting frame quality
based on complexity.  MB-tree is effectively a generalization of qcomp to
the macroblock level.  MB-tree also replaces the constant offsets for B-frame
quantizers.  The legacy algorithm is still available for low-latency
applications.
- Adaptive quantization is now used to distribute quality within each frame;
frames no longer use a constant quantizer, even if MB-tree is off.
- VBV runs per-row rather than per-frame to improve accuracy.

x264's ratecontrol is based on libavcodec's, and is mostly empirical. But I
can retroactively propose the following theoretical points which underlie most
of the algorithms:

- You want the movie to be somewhere approaching constant quality. However,
constant quality does not mean constant PSNR nor constant QP. Details are less
noticeable in high-complexity or high-motion scenes, so you can get away with
somewhat higher QP for the same perceived quality.
- On the other hand, you get more quality per bit if you spend those bits in
scenes where motion compensation works well: A given artifact may stick around
several seconds in a low-motion scene, and you only have to fix it in one frame
to improve the quality of the whole scene.
- Both of the above are correlated with the number of bits it takes to encode
a frame at a given QP.
- Given one encoding of a frame, we can predict the number of bits needed to
encode it at a different QP. This prediction gets less accurate if the QPs are
far apart.
- The importance of a frame depends on the number of other frames that are
predicted from it. Hence I-frames get reduced QP depending on the number and
complexity of following inter-frames, disposable B-frames get higher QP than
P-frames, and referenced B-frames are between P-frames and disposable B-frames.
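The bit-prediction point above rests on a simple frame-size model: size is
roughly inversely proportional to qscale, and the H.264 quantizer step doubles
every 6 QP. A minimal sketch in Python (the 0.85 factor and QP-12 pivot follow
x264's internal convention; the function names here are illustrative, not
x264's actual API):

```python
def qp2qscale(qp):
    # H.264's quantizer step doubles every 6 QP; the 0.85 factor and the
    # pivot at QP 12 follow x264's internal convention (an assumption;
    # current sources may differ in detail).
    return 0.85 * 2.0 ** ((qp - 12.0) / 6.0)

def predict_bits(known_bits, known_qp, target_qp):
    # First-order model: frame size is roughly inversely proportional to
    # qscale.  As the text notes, accuracy drops as the QPs get far apart.
    return known_bits * qp2qscale(known_qp) / qp2qscale(target_qp)
```

For example, re-encoding a frame 6 QP higher roughly halves its size under
this model.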


The modes:

    2pass:
Given some data about each frame of a 1st pass (e.g. generated by 1pass ABR,
below), we try to choose QPs to maximize quality while matching a specified
total size. This is separated into 3 parts:
(1) Before starting the 2nd pass, select the relative number of bits to
allocate between frames. This pays no attention to the total size of the
encode. The default formula, empirically selected to balance between the
1st 2 theoretical points, is "complexity ** 0.6", where complexity is
defined to be the bit size of the frame at a constant QP (estimated from
the 1st pass).
(2) Scale the results of (1) to fill the requested total size. Optional:
Impose VBV limitations. Due to nonlinearities in the frame size predictor
and in VBV, this is an iterative process.
(3) Now start encoding. After each frame, update future QPs to compensate
for mispredictions in size. If the 2nd pass is consistently off from the
predicted size (usually because we use slower compression options than the
1st pass), then we multiply all future frames' qscales by the reciprocal of
the error. Additionally, there is a short-term compensation to prevent us
from deviating too far from the desired size near the beginning (when we
don't have much data for the global compensation) and near the end (when
global doesn't have time to react).
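The three steps above can be sketched as follows (hypothetical helper names;
the real step (2) iterates because the size predictor and VBV are nonlinear,
and this sketch omits VBV entirely):

```python
def allocate_2pass(complexities, target_total_bits, qcomp=0.6):
    # (1) Relative allocation between frames, blind to the total size:
    # each frame's share is complexity ** 0.6.
    weights = [c ** qcomp for c in complexities]
    # (2) Scale to fill the requested total.  Without VBV a single
    # scaling step suffices; with VBV this becomes iterative.
    scale = target_total_bits / sum(weights)
    return [w * scale for w in weights]

def global_compensation(qscale, predicted_bits_so_far, actual_bits_so_far):
    # (3) If the 2nd pass consistently overshoots the prediction, multiply
    # future qscales by the overshoot ratio (higher qscale -> fewer bits),
    # i.e. by the reciprocal of the prediction error.
    return qscale * actual_bits_so_far / predicted_bits_so_far
```

Two equally complex frames thus split the budget evenly, and a pass running
20% over prediction pushes future qscales up by 20%.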

    1pass, average bitrate:
The goal is the same as in 2pass, but here we don't have the benefit of a
previous encode, so all ratecontrol must be done during the encode.
(1) This is the same as in 2pass, except that instead of estimating
complexity from a previous encode, we run a fast motion estimation algo
over a half-resolution version of the frame, and use the SATD residuals
(these are also used in the decision between P- and B-frames). Also, we
don't know the size or complexity of the following GOP, so I-frame bonus
is based on the past.
(2) We don't know the complexities of future frames, so we can only scale
based on the past. The scaling factor is chosen to be the one that would
have resulted in the desired bitrate if it had been applied to all frames
so far.
(3) Overflow compensation is the same as in 2pass. By tuning the strength
of compensation, you can get anywhere from near the quality of 2pass (but
unpredictable size, like +- 10%) to reasonably strict filesize but lower
quality.
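Step (2)'s retroactive scaling factor can be sketched like this (an
illustrative function, not x264's code; the real encoder also blurs the
complexities over time and folds in the overflow compensation of step (3)):

```python
def abr_allocate(complexities_so_far, bitrate, fps, qcomp=0.6):
    # Pick the scaling factor that would have hit the target bitrate had
    # it been applied to every frame encoded so far, then apply it to the
    # current (last) frame.
    weights = [c ** qcomp for c in complexities_so_far]
    wanted_bits = bitrate / fps * len(weights)
    scale = wanted_bits / sum(weights)
    return weights[-1] * scale  # bit budget for the current frame
```

With uniform complexity this degenerates to an even per-frame budget; a frame
more complex than the running history gets proportionally more bits.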

    1pass, constant bitrate (VBV compliant):
(1) Same as ABR.
(2) Scaling factor is based on a local average (dependent on VBV buffer size)
instead of all past frames.
(3) Overflow compensation is stricter, and has an additional term to hard
limit the QPs if the VBV is near empty. Note that no hard limit is done for
a full VBV, so CBR may use somewhat less than the requested bitrate. Note also
that if a frame violates VBV constraints despite the best efforts of
prediction, it is not re-encoded.
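The VBV constraint is a leaky-bucket model. A minimal per-frame sketch
(hypothetical function; as the historical note says, modern x264 actually
runs VBV per-row):

```python
def vbv_update(buffer_fill, frame_bits, bitrate, fps, buffer_size):
    # The buffer gains bits at the channel rate and loses each frame's
    # size when the frame is removed for decoding.
    fill = buffer_fill + bitrate / fps - frame_bits
    underflow = fill < 0  # frame too big: a CBR violation
    # A full buffer is simply clamped rather than hard-limited, which is
    # why CBR may spend somewhat less than the requested bitrate.
    return min(fill, buffer_size), underflow
```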

    1pass, constant ratefactor:
(1) Same as ABR.
(2) The scaling factor is a constant based on the --crf argument.
(3) No overflow compensation is done.

    constant quantizer:
QPs are simply based on frame type.

