Reusing the H.264/AVC Deblocking Filter for Efficient Spatio-Temporal Prediction in Video Coding

Abstract

The prediction step is a very important part of hybrid video codecs for effectively compressing video sequences. While existing video codecs predict either in temporal or in spatial direction only, the compression efficiency can be increased by a combined spatio-temporal prediction. In this paper we propose an algorithm for reusing the H.264/AVC deblocking filter for spatio-temporal prediction. Reusing this highly optimized filter allows for a very low computational complexity of this prediction mode and an average rate reduction of up to 7.2% can be achieved.

Index Terms— Video coding, Prediction, Deblocking

1 Introduction

Hybrid video codecs like H.264/AVC [1] are able to effectively compress video sequences and provide a good viewing experience at manageable data rates. This is achieved by irrelevance reduction on the one hand side and redundancy reduction on the other hand side. Whilst irrelevance reduction results from quantization, the redundancy in a video sequence is removed by prediction, transformations and entropy coding. In this context, the area actually being encoded is always predicted from already transmitted and decoded areas. Afterwards, the prediction residual is transformed and transmitted to the decoder. Since prediction takes place at a very early stage of the encoding, it has a very high influence on the coding efficiency.

Current video codecs use two strategies for taking advantage of the correlations within a video sequence for prediction. To cope with temporal similarities, motion compensated prediction [2] is used and spatial similarities are exploited by skillfully continuing the signal from already transmitted areas into the area currently encoded. Although the codecs can switch between spatial and temporal prediction for selecting a rate-distortion optimal mode, the fact that a video sequence possesses temporal and spatial correlations at the same time is ignored. Up to now, only few prediction algorithms exist that can make use of spatial as well as temporal redundancies. The algorithms from Jiang [3] and Matsuda [4] are two of them to mention.

In [5] we introduced a different spatio-temporal prediction algorithm, the spatially refined motion compensation. This algorithm operates in two stages and uses motion compensated prediction for exploiting temporal correlations and a spatial refinement step for including spatial correlations into the prediction. Even though we already proposed modifications of this algorithms with a reduced complexity [6], the computational load produced by spatial refinement is still far too large for real-time implementations. In order to cope with this, we now propose a method for reusing the H.264/AVC deblocking filter [7] for spatio-temporal prediction.

2 Spatially Refined Motion Compensated Prediction Principles

\psfrag{m}[t][t][0.9]{$m$}\psfrag{n}[t][t][0.9]{$n$}\psfrag{x}[t][t][0.9]{$x$}\psfrag{y}[t][t][0.9]{$y$}\psfrag{t}[t][t][0.9]{$t$}\psfrag{B}[l][l][0.9]{$\mathcal{B}$}\psfrag{R}[l][l][0.9]{$\mathcal{R}$}\psfrag{x0}[t][t][0.9]{$x_{0}$}\psfrag{y0}[t][t][0.9]{$y_{0}$}\psfrag{tau}[t][t][0.9]{$\tau$}\psfrag{tau1}[t][t][0.9]{$\tau-1$}\includegraphics[width=177.26375pt]{graphics/refinement_area.eps}

Fig. 1: Relationship between currently predicted macroblock at position $\left(x_{0},y_{0}\right)$ and already decoded neighboring blocks. The predicted block is divided into sub-blocks for motion compensation.

Before spatio-temporal prediction with the utilization of the H.264/AVC deblocking filter is introduced, we will briefly review the basic ideas of spatially refined motion compensated prediction. In order to exploit spatial as well as temporal redundancies for prediction, spatially refined motion compensation operates in two stages. During the first stage, pure motion compensated prediction is carried out. Here, all features of motion compensated prediction that have been developed in the recent years like fractional-pel accuracy, multiple reference frames or sub-block partitioning can be used. Using this, the temporal similarities between the individual frames can be exploited well. But, motion compensated prediction takes place without considering spatial redundancies. In order to account for this, the motion compensated signal is spatially refined in the second step. For this, the spatially adjacent, already decoded macroblocks are regarded. If the macroblocks are processed in line-scan order, the macroblocks to the left and above are available. Fig. 1 shows this relationship between the motion compensated block and the already decoded neighboring blocks for the example that the macroblock at position $\left(x_{0},y_{0}\right)$ in frame $t=\tau$ is predicted. In this example, the feature of H.264/AVC to divide a macroblock into sub-blocks for motion compensation is included.

The originally proposed algorithm for spatial refinement [5] obtains the spatio-temporal prediction by iteratively generating a joint model for the motion compensated block and the already decoded neighboring blocks by Frequency Selective Approximation (FSA). The model results from a weighted superposition of two-dimensional basis functions. Although this algorithm yields a significantly increased coding efficiency, it possesses the drawback of a very high computational complexity. In [6] an alternative model generation is proposed for an accelerated refinement, but the complexity is still too high for real-time implementations.

In order to cope with this, in the following we introduce an algorithm that reuses a slightly modified version of the H.264/AVC deblocking filter [7] for spatial refinement. Due to the reuse of this highly optimized filter, spatial refinement can be carried out at a very high speed.

3 Using the H.264/AVC Deblocking Filter for Spatial Refinement

Refer to caption — Fig. 2: *Simplified block diagram of a hybrid video encoder with spatial refinement.*

The original intention of the H.264/AVC deblocking filter [7] is to reduce the artifacts which result from quantization. Fig. 2 shows the original position of the deblocking filter in a simplified block diagram of a hybrid video encoder with spatial refinement of the motion compensated predicted signal. The deblocking filter aims at reducing blocking artifacts before the decoded image is copied to the reference buffer from which the motion compensated prediction is carried out. Due to this, using the deblocking filter has three advantages. First, the decoded output image contains less artifacts that might be visible to the viewer. Second, the motion compensated predicted signal fits the input macroblock more, since the motion compensation is performed on the deblocked data. Hence, the coding efficiency is increased. Third, compared to a post-processing deblocking filter, one frame buffer less is required.

Blocking artifacts arise as the individual macroblocks, or even the sub-blocks are quantized independently of each other. Due to this, the pixel values in different blocks differ unnaturally strong and the block boundaries become visible. The deblocking filter aims at detecting unnatural edges at block boundaries and at softening them, if required.

But, by taking a close look at motion compensated prediction, one can discover that the predicted signal contains similar artifacts. In the process of motion compensated prediction, in the reference frame the best fitting block, or respectively sub-block if macroblock partitioning is used, is determined for the currently regarded block. However, for selecting a block in the reference frame, only the current block is considered and the neighboring blocks are ignored. Hence, the block that is selected for motion compensated prediction may not fit the neighboring, already decoded macroblocks. In this case, the transition between the prediction signal and the neighboring blocks is similar to a blocking artifact. Obviously, such a motion compensated block cannot be a good predictor since the spatial correlations to the neighboring blocks are not considered. For resolving this problem, we propose to use a deblocking filter for spatial refinement of the motion compensated block and for incorporating spatial redundancies into the prediction process.

\psfrag{p3}[t][t][0.85]{$p_{3}$}\psfrag{p2}[t][t][0.85]{$p_{2}$}\psfrag{p1}[t][t][0.85]{$p_{1}$}\psfrag{p0}[t][t][0.85]{$p_{0}$}\psfrag{q0}[t][t][0.85]{$q_{0}$}\psfrag{q1}[t][t][0.85]{$q_{1}$}\psfrag{q2}[t][t][0.85]{$q_{2}$}\psfrag{q3}[t][t][0.85]{$q_{3}$}\psfrag{Sub-block}[t][t][0.8]{Sub-block}\psfrag{boundary}[t][t][0.8]{boundary}\includegraphics[width=151.93925pt]{graphics/deblocking.eps}

Fig. 3: Samples covered by the deblocking filter.

The filter used for spatial refinement is similar to the strong deblocking filter from H.264/AVC [7] and covers always eight samples, four on the one side of the sub-block boundary and four on the other. As illustrated in Fig. 3, these samples are denoted by $p_{0},\ldots,p_{3}$ and $q_{0},\ldots,q_{3}$ . Independently of the actual sub-block partitioning, the filtering is carried out for all sub-block boundaries, first in vertical direction, then in horizontal direction. In this process, the filtering starts with the boundary to the already reconstructed neighboring macroblocks. Using this, spatial similarities can be drawn from the already decoded blocks into the predictor.

According to [7], two thresholds are required for controlling the amount of filtering. Originally, the thresholds

	$\displaystyle\alpha$	$\displaystyle=$	$\displaystyle 0.8\left(2^{h/6}-1\right)$		(1)
	$\displaystyle\beta$	$\displaystyle=$	$\displaystyle 0.5h-7$		(2)

depend on the quantization parameter. Since the amount of deblocking that is required for spatial refinement does not depend on the quantization, the parameter $h$ can be chosen freely. The choice of $h$ will be discussed in Section 4. Alg. 1 lists the steps that are necessary for deriving the filtered values $p^{\prime}_{0},p^{\prime}_{1},p^{\prime}_{2},p^{\prime}_{3},q^{\prime}_{0},q^{\prime}_{1},q^{\prime}_{2},q^{\prime}_{3}$ from the original values $p_{0},p_{1},p_{2},p_{3},q_{0},q_{1},q_{2},q_{3}$ . Comparing Alg. 1 with the original deblocking filter from [7] one can discover that the calculations for deriving the filtered values are identical with the only exception that the clipping of the filtered values is not used. As motion compensated samples can differ strongly from the original samples, limiting the range of the filtered values would not be advisable. Due to the similarities, the proposed refinement can run as efficiently as the H.264/AVC deblocking filter and the optimized implementations can be reused. After filtering, the original values of the motion compensated block are replaced by the filtered values.

Algorithm 1 Deblocking filter for spatial refinement

p_{0},p_{1},p_{2},p_{3},q_{0},q_{1},q_{2},q_{3}

p^{\prime}_{i}=p_{i},\ \ q^{\prime}_{i}=q_{i},\ \ \forall i

\left|p_{0}-q_{0}\right|<\alpha\ \&\ \left|p_{0}-p_{1}\right|<\beta\ \&\ \left|q_{0}-q_{1}\right|<\beta

then

p_{0}^{\prime}=p_{0}+\left(4\left(q_{0}-p_{0}\right)+\left(p_{1}-q_{1}\right)+4\right)>\!\!>3

q_{0}^{\prime}=q_{0}-\left(4\left(q_{0}-p_{0}\right)+\left(p_{1}-q_{1}\right)+4\right)>\!\!>3

\left|p_{0}-p_{2}\right|<\beta

then

p_{1}^{\prime}=p_{1}+\left(p_{2}+\left(\left(p_{0}+q_{0}+1\right)>\!\!>1\right)-2p_{1}\right)>\!\!>1

end if

\left|q_{0}-q_{2}\right|<\beta

then

q_{1}^{\prime}=q_{1}+\left(q_{2}+\left(\left(q_{0}+p_{0}+1\right)>\!\!>1\right)-2q_{1}\right)>\!\!>1

end if

\left|p_{0}-q_{0}\right|<\left(\alpha>\!\!>2\right)+2

then

\left|p_{0}-p_{2}\right|<\beta

then

p_{0}^{\prime}=\left(p_{2}+2p_{1}+2p_{0}+2q_{0}+q_{1}+4\right)>\!\!>3

p_{1}^{\prime}=\left(p_{2}+p_{1}+p_{0}+q_{0}+2\right)>\!\!>2

p_{2}^{\prime}=\left(2p_{3}+3p_{2}+p_{1}+p_{0}+q_{0}+4\right)>\!\!>3

else

p_{0}^{\prime}=\left(2p_{1}+p_{0}+q_{1}+2\right)>\!\!>2

end if

\left|q_{0}-q_{2}\right|<\beta

then

q_{0}^{\prime}=\left(q_{2}+2q_{1}+2q_{0}+2p_{0}+p_{1}+4\right)>\!\!>3

q_{1}^{\prime}=\left(q_{2}+q_{1}+q_{0}+p_{0}+2\right)>\!\!>2

q_{2}^{\prime}=\left(2q_{3}+3q_{2}+q_{1}+q_{0}+p_{0}+4\right)>\!\!>3

else

q_{0}^{\prime}=\left(2q_{1}+q_{0}+p_{1}+2\right)>\!\!>2

end if

p^{\prime}_{0},p^{\prime}_{1},p^{\prime}_{2},p^{\prime}_{3},q^{\prime}_{0},q^{\prime}_{1},q^{\prime}_{2},q^{\prime}_{3}

These steps are repeated for all vertical and horizontal sub-block boundaries until all motion compensated samples are replaced by the filtered ones. As the already filtered samples are included in the filtering of subsequent samples, the spatial information propagates into the motion compensated block. Finally, the refined block is used for prediction. Since the refined block exploits temporal as well as spatial correlations, it is a better predictor for the block to be encoded and thus increases the coding efficiency. It has to be noted that the proposed usage of the deblocking filter is not intended for replacing the original H.264/AVC deblocking filter. The aim of the original filter is to allay artifacts resulting from quantization whereas the objective of spatial refinement is to exploit spatial as well as temporal redundancies for prediction. As we will show in the next section, the highest coding efficiency can be achieved if spatial refinement and the original deblocking filter are used together.

4 Simulations and Results

\providelength

\AxesLineWidth

\providelength

\GridLineWidth

\providelength

\GridLineDotSep

\providelength

\MinorGridLineWidth

\providelength

\MinorGridLineDotSep

\providelength

\plotwidth

\providelength

\LineWidth

\providelength

\MarkerSize

Fig. 4: Rate-distortion curves for first $399$ P-frames of the 720p-sequence “Crew”.

For evaluating the coding efficiency and the acceleration of the spatial refinement step that can be achieved by the proposed algorithm, we implemented it into H.264/AVC reference encoder JM10.2 running in Main Profile. In addition to the novel refinement, refinement by FSA [5] is considered for comparison. The tests are carried out on four test sequences: two CIF-sequences (“Discovery City”, “Vimto”) of $100$ frames length and two 720p-sequences (“Crew”, “Jets”) of $400$ frames length. All sequences are coded in IPPP order and with different quantization parameters (QP). In this process, a wide quality range from QP $16$ to QP $43$ with spacing $3$ is regarded for the CIF-sequences whereas only high qualities with QPs in the range between $16$ and $28$ , also with spacing $3$ , are regarded for the high resolution sequences. Since not all macroblocks benefit from spatial refinement, one bit per macroblock has to be added as side information to signalize the decoder if spatial refinement should be used or not. Thus, as shown in Fig. 2, the encoder now can switch between three modes: spatial prediction, temporal prediction, and spatio-temporal prediction. We have tested two different scenarios: first, the original H.264/AVC deblocking filter is switched off and the artifacts resulting from quantization are not removed within the prediction loop. Second, the original H.264/AVC deblocking filter is turned on. In this case the deblocking filter is used twice, on the one hand side for reducing quantization artifacts, on the other hand side for spatial refinement of the motion compensated block.

The parameters for spatial refinement by FSA are chosen according to [5]. For the proposed deblocked refinement, only the parameter $h$ can be varied. This parameter controls the amount of filtering, i. e. at which point a transition is regarded as blocking artifact or as natural edge. For large values of $h$ the thresholds $\alpha$ and $\beta$ also become large and the filtering of the pixels takes place more often than for small values. We have tested different values of $h$ from the range between $10$ and $43$ and discovered that the overall improvement only varies marginally if values from the range between $19$ and $40$ are selected. Hence, we decided to select $h=28$ .

Fig. 4 shows the rate-distortion curves of the 720p-sequence “Crew” for the cases that pure motion compensated prediction is used and that spatial refinement by FSA and the proposed algorithm are included, each time with the original H.264/AVC deblocking filter turned on and off. Taking a look at the pure motion compensated prediction, one can recognize the effect of the original deblocking filter since the coding efficiency is higher for lower qualities. But, with increasing quality the quantization artifacts reduce and the influence of the original deblocking filter becomes void. By regarding the curves for the cases that spatial refinement is used, it becomes apparent that the coding efficiency can be significantly increased by spatial refinement, independently of the actual refinement algorithm.

In order to condense the results, Table 1 lists the average rate reduction or respectively the average $\mathrm{PSNR}$ gain that can be achieved by spatial refinement, compared to pure motion compensated prediction. For this, the Bjøntegaard metric [8] is used. For the case that the original H.264/AVC deblocking filter is turned off, spatial refinement yields very large gains over pure motion compensated prediction. In this case, the blocking artifacts resulting from quantization propagate to the motion compensated prediction, causing a lower coding efficiency. Although spatial refinement can also deal with this problem, it is advisable to switch the original deblocking filter on. In this case, an average rate reduction of up to $15.7\%$ can be achieved by spatial refinement using FSA and up to $7.2\%$ by the proposed deblocked refinement.

Avg. Rate Reduction
	FSA [5]		Proposed
H.264/AVC
deblocking	off	on	off	on
“D. City”	$-10.34\%$	$-5.37\%$	$-13.65\%$	$-5.51\%$
“Vimto”	$-13.40\%$	$-5.77\%$	$-12.15\%$	$-2.72\%$
“Crew”	$-16.89\%$	$-10.95\%$	$-15.69\%$	$-7.22\%$
“Jets”	$-22.20\%$	$-9.97\%$	$-19.03\%$	$-6.00\%$
Avg. $\mathrm{PSNR}$ gain
“D. City”	$0.74\,\mathrm{dB}$	$0.35\,\mathrm{dB}$	$0.99\,\mathrm{dB}$	$0.36\,\mathrm{dB}$
“Vimto”	$0.66\,\mathrm{dB}$	$0.26\,\mathrm{dB}$	$0.59\,\mathrm{dB}$	$0.12\,\mathrm{dB}$
“Crew”	$0.81\,\mathrm{dB}$	$0.45\,\mathrm{dB}$	$0.75\,\mathrm{dB}$	$0.29\,\mathrm{dB}$
“Jets”	$0.48\,\mathrm{dB}$	$0.18\,\mathrm{dB}$	$0.41\,\mathrm{dB}$	$0.11\,\mathrm{dB}$

Table 1: Average relative rate reduction and average $\,\mathrm{PSNR}$ gain, calculated according to [8] for spatial refinement by FSA [5] and the proposed algorithm.

Comparing the results for spatial refinement by FSA and the proposed deblocked refinement, it becomes obvious that FSA always can achieve a considerably higher coding efficiency. But, as mentioned before, FSA is computationally very expensive. Due to all the optimization that has been carried out for developing the H.264/AVC deblocking filter, the proposed refinement algorithm operates very fast. To quantify the speed-up, Table 2 shows the time per macroblock that is required for spatial refinement. The simulations have been carried out on one core of an Intel Core2 Quad, running at $2.4\,\mathrm{GHz}$ and equipped with $8\,\mathrm{GB}$ RAM. Since the proposed refinement algorithm only requires $0.075\,\mathrm{msec}$ per macroblock it is approximately $900$ times faster than the original FSA refinement and a real-time implementation becomes realistic. This huge acceleration also can justify the lower quality of the spatial refinement

	Processing time
FSA [5]	$67.1\,\mathrm{msec}$
Proposed	$0.075\,\mathrm{msec}$
Acceleration factor	$895$

Table 2: Processing time per macroblock for spatial refinement.

5 Conclusion

Within the scope of this paper we proposed a method for reusing the H.264/AVC deblocking filter for spatio-temporal prediction in video coding. For this, the motion compensated prediction is spatially refined. By exploiting spatial as well as temporal correlations for prediction, a mean rate reduction of up to $7.2\%$ can be achieved by the proposed algorithm. Since the novel algorithm reuses the highly optimized H.264/AVC deblocking filter, the processing time for the refinement is very small and a real-time implementation is possible.

Although the proposed algorithm already yields an increased coding efficiency, future research has to focus on adapting the deblocking filter to the needs of spatial refinement and to take the actural partitioning of the macroblocks into into account for refinement.

References

[1] Iain Richardson, H.264 & MPEG-4 Video Compression, Wiley, West Sussex, England, Aug. 2003.
[2] Frédéric Dufaux and Fabrice Moscheni, “Motion estimation techniques for digital TV: A review and a new contribution,” Proceedings of the IEEE, vol. 83, no. 6, pp. 858–876, June 1995.
[3] Wenfei Jiang, Longin Jan Latecki, Wenyu Liu, Hui Liang, and Ken Gorman, “A video coding scheme based on joint spatiotemporal and adaptive prediction,” in IEEE Trans. Image Process., 2009, vol. 18, pp. 1025–1036.
[4] Ichiro Matsuda, Kyohei Unno, Hisashi Aomori, and Susumu Itoh, “Block-based spatio-temporal prediction for video coding,” in Proc. European Signal Processing Conference, Aalborg, Denmark, Aug. 2010, pp. 2052–2056.
[5] Jürgen Seiler and André Kaup, “Spatio-temporal prediction in video coding by spatially refined motion compensation,” in Proc. Int. Conf. on Image Processing (ICIP), San Diego, USA, 12.-15. Oct. 2008, pp. 2788–2791.
[6] Jürgen Seiler and André Kaup, “Multiple selection approximation for improved spatio-temporal prediction in video coding,” in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Dallas, USA, March 2010, pp. 886–889.
[7] Peter List, Anthony Joch, Jani Lainema, Gisle Bjøntegaard, and Marta Karczewicz, “Adaptive deblocking filter,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 614–619, July 2003.
[8] Gisle Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” Tech. Rep., ITU-T VCEG Meeting, Austin, Texas, USA, document VCEG-M33, April 2001.