
A QP-adaptive Mechanism for CNN-based Filter in Video Coding

Abstract

Convolutional neural network (CNN)-based filters have achieved great success in video coding. However, in most previous works, individual models are needed for each quantization parameter (QP) band. This paper presents a generic method that helps an arbitrary CNN-filter handle different levels of quantization noise. We model the quantization noise problem and implement a feasible solution on CNN, which introduces the quantization step (Qstep) into the convolution. When the quantization noise increases, the ability of the CNN-filter to suppress noise improves accordingly. This method can directly replace the (vanilla) convolution layer in any existing CNN-filter. Using only 25% of the parameters, the proposed method achieves better performance than using multiple models under the VTM-6.3 anchor. In addition, it achieves an extra BD-rate reduction of 0.2% on the chroma components.

Index Terms—  Convolutional Neural Network, In-loop filter, Video Coding, H.266/VVC.

1 Introduction

Quantization [1] in hybrid coding frameworks such as H.265/HEVC [2] and H.266/VVC [3] is a crucial part of lossy compression. However, it also causes severe distortion and artifacts such as ringing and Gibbs effects. Filters such as deblocking (DB), sample adaptive offset (SAO), and the adaptive loop filter (ALF) have been proposed to alleviate these artifacts. In addition, learning-based filters, especially CNN-based filters [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], have shown great potential and aroused widespread interest. Previous studies show that different structures and designs, such as serial networks [5, 7, 8, 9, 12, 15] and parallel networks [4, 6, 11, 13, 14], can significantly improve both subjective and objective quality. Liu et al. [5] proposed using depthwise separable convolution (DSC [16]) in a CNN to reduce complexity. Dai et al. proposed a parallel network structure, VRCNN [6], which uses convolution kernels of different sizes in the same layer to extract features from different receptive fields. In contrast, Wang et al. proposed DCAD [7], a serial structure that stacks ten convolutional layers and also achieves good performance. Apart from the above, Zhou et al. proposed Tucodec [8], which uses ResNet [17] as its backbone and leaky ReLU [18] instead of ReLU [19] as the activation function.

Although CNN-based in-loop filters have achieved great success, few previous studies have explored their generalization capability across different quantization parameters (QPs), and most works require training a specific model for each QP band. Training a large number of models is impractical in real deployments due to limited storage resources. Moreover, these CNN filters do not make full use of coding side information, such as the QP of the reconstructed image.

In this paper, we propose a novel method to solve this important but easily neglected problem. Specifically, we model the problem in the frequency domain and obtain a feasible solution that makes a simple filtering model adapt to different quantization noises. By further decomposition, the simplified solution is applied to every convolution layer instead of only the first one, which brings greater robustness and better performance. With the VTM-6.3 anchor, we conduct extensive experiments on four models of different complexity to demonstrate the versatility of the proposed method. Compared with using multiple models, a single model with the proposed method reduces the number of parameters by about three quarters and achieves an extra 0.2% BD-rate reduction on the chroma components.

2 Literature Review and Analysis

To the best of our knowledge, only one approach, which uses a QP map [20] as an extra input, has been proposed to solve this problem, and similar methods [21, 22] have been built on top of it. The network in [20] controls the filtering strength by exploiting the relationship between the QP map and the distortion. To analyze its working principle, we consider the first convolution layer of this model:

\hat{\bm{y}} = \bm{w}\ast\{\bm{y};\,QP\} + \bm{b} \qquad (1)

The notations $\bm{y}$ and $\hat{\bm{y}}$ are the input and the output, $\bm{w}$ and $\bm{b}$ are the weights and the biases, $\ast$ and $\cdot$ denote convolution and multiplication, and $\{\cdot;\cdot\}$ is the concatenation operation. Expanding (1):

\begin{aligned}
\hat{\bm{y}} &= \bm{w}_{1}\ast\bm{y}+\bm{w}_{2}\ast QP+\bm{b}\\
&= \bm{w}_{1}\ast\bm{y}+\left(\Sigma\bm{w}_{2}\cdot QP+\bm{b}\right)\\
&= \bm{w}_{1}\ast\bm{y}+\bm{b}^{\prime}(QP)
\end{aligned} \qquad (2)

where

\bm{b}^{\prime}(QP)=\Sigma\bm{w}_{2}\cdot QP+\bm{b} \qquad (3)

From (3), it can be seen that the adaptiveness of [20] is actually achieved by adding a linear function of QP to the bias term. This method has several drawbacks. 1) A linear model for the bias may not capture the internal relationship between QP and filtering strength. 2) It may be less effective because it acts only on the bias; it is the weights rather than the biases that dominate a CNN, so building the QP-adaptive mechanism on the weights may be more effective. 3) It lacks robustness and does not fully exploit the QP, since the QP is introduced only at the input layer. Considering these shortcomings, a better adaptive filtering strategy is designed in this paper.
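The equivalence in (2) and (3) is easy to verify numerically: convolving a concatenated constant QP plane is the same, away from the padded border, as adding a QP-dependent bias. The following minimal numpy/scipy sketch uses random kernels purely for illustration; it is not the network of [20].

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
y = rng.standard_normal((16, 16))          # reconstructed block (illustrative)
QP = 37.0
w1 = rng.standard_normal((3, 3))           # kernel slice applied to the image
w2 = rng.standard_normal((3, 3))           # kernel slice applied to the QP plane
b = 0.1

# Path 1: convolve the concatenated input {y; QP} as in (1).
out_concat = (convolve2d(y, w1, mode="same", boundary="fill")
              + convolve2d(np.full_like(y, QP), w2, mode="same", boundary="fill")
              + b)

# Path 2: fold the QP plane into a QP-dependent bias b'(QP) as in (3).
out_bias = convolve2d(y, w1, mode="same", boundary="fill") + (w2.sum() * QP + b)

# Interior pixels match up to float rounding; borders differ only due to zero padding.
print(np.allclose(out_concat[2:-2, 2:-2], out_bias[2:-2, 2:-2]))  # True
```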

3 Proposed Method

In this section, the proposed QP-adaptive mechanism is introduced. We first present the modeling of the problem and then provide the solution implemented for CNNs.

3.1 Proposed QP-adaptive Mechanism

Given a simple filtering model (we focus on the weights, so the bias is dropped here):

\bm{w}\ast\bm{y}=\hat{\bm{x}} \qquad (4)

where $\bm{w}$ is the trained convolution kernel, $\bm{y}$ is the distorted image, and $\hat{\bm{x}}$ is the filtered image. It is known that spatial-domain convolution is essentially equivalent to frequency-domain multiplication:

\mathcal{F}(\bm{w})\mathcal{F}(\bm{y})=\mathcal{F}(\hat{\bm{x}}) \qquad (5)

where the notation $\mathcal{F}$ denotes the Fourier transform. We assume that this simple model can effectively remove the specific noise in $\bm{y}$ but cannot handle varying quantization noise, so the filtered image is approximated by the original image $\bm{x}$:

\mathcal{F}(\hat{\bm{x}})\approx\mathcal{F}(\bm{x}) \qquad (6)

As is known, an increase of the coding parameter QP corresponds to additional noise in the frequency domain that is related to the Qstep. Let $\bm{\epsilon}$ denote the noise caused by the change of QP:

\bm{w}^{\prime}\ast(\bm{y}+\bm{\epsilon})=\hat{\bm{x}}^{\prime} \qquad (7)

Therefore, a feasible way to achieve adaptiveness to various quantization noises is to find a changeable convolution kernel $\bm{w}^{\prime}$ that minimizes the loss between the filtered image $\hat{\bm{x}}^{\prime}$ and the original image $\bm{x}$. Similarly, (7) can be expressed in the frequency domain as:

\mathcal{F}(\bm{w}^{\prime})\left(\mathcal{F}(\bm{y})+\mathcal{F}(\bm{\epsilon})\right)=\mathcal{F}(\hat{\bm{x}}^{\prime}) \qquad (8)

Here we choose classical mean square error (MSE) as the loss function.

\mathcal{L}=\mathbb{E}\left|\bm{x}-\hat{\bm{x}}^{\prime}\right|^{2} \qquad (9)

where $\mathbb{E}$ denotes expectation. Considering (6) and Parseval's theorem for the Fourier transform, the loss can be rewritten as:

\begin{aligned}
\mathcal{L} &= \mathbb{E}\left|\mathcal{F}(\bm{x})-\mathcal{F}(\hat{\bm{x}}^{\prime})\right|^{2}\\
&= \mathbb{E}\left|\mathcal{F}(\bm{x})-\mathcal{F}(\bm{w}^{\prime})\left(\mathcal{F}(\bm{y})+\mathcal{F}(\bm{\epsilon})\right)\right|^{2}\\
&\approx \mathbb{E}\left|\mathcal{F}(\bm{x})-\mathcal{F}(\bm{w}^{\prime})\left[\mathcal{F}(\bm{x})/\mathcal{F}(\bm{w})+\mathcal{F}(\bm{\epsilon})\right]\right|^{2}
\end{aligned} \qquad (10)

By taking the derivative with respect to $\mathcal{F}(\bm{w}^{\prime})$ and setting it to zero, we obtain the solution:

\mathcal{F}(\bm{w}^{\prime})=\underbrace{\mathcal{F}(\bm{w})}_{\text{org. filter}}\underbrace{\left[\frac{1}{1+|\mathcal{F}(\bm{w})|^{2}\mathcal{F}(\bm{n})/\mathcal{F}(\bm{s})}\right]}_{\text{influence factor}} \qquad (11)

where $\mathcal{F}(\bm{n})=\mathbb{E}|\mathcal{F}(\bm{\epsilon})|^{2}$ and $\mathcal{F}(\bm{s})=\mathbb{E}|\mathcal{F}(\bm{x})|^{2}$. Here, the first term $\mathcal{F}(\bm{w})$ is the original filter in (5), and the second term is the influence factor that compensates for the increased quantization noise. This solution is similar in form to Wiener deconvolution [23], but the motivation and form differ: Wiener deconvolution recovers the original signal from a distorted one by using priors on the input signal, the noise, and the degradation function, whereas our solution has no degradation function and aims at making a specific filter adaptive to changing quantization noise.
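As an illustration of how the influence factor in (11) behaves, the numpy sketch below evaluates it for a fixed filter spectrum while the noise power grows. The spectra here are synthetic placeholders, not statistics measured from coded video.

```python
import numpy as np

# Synthetic 1-D spectra (placeholders): a trained filter, a signal prior,
# and three noise levels standing in for increasing quantization noise.
freqs = np.linspace(0.0, np.pi, 64)
F_w = np.exp(-freqs)                  # |F(w)|: original filter response
S = 1.0 / (1.0 + freqs ** 2)          # F(s): signal power spectrum
for noise_power in (0.01, 0.1, 1.0):  # F(n): grows with Qstep^2
    factor = 1.0 / (1.0 + (np.abs(F_w) ** 2) * noise_power / S)
    F_w_adapted = F_w * factor        # adapted filter of (11)
    print(noise_power, float(factor.mean()))  # mean factor drops as noise rises
```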

Table 1: Comparison of Different Multi-QP Strategies on Overall BD-rate Reductions and the Number of Model Parameters

Models         | Global (single model ×1)           | Separate (single model ×4)           | Proposed (single model ×1)
               | Param.    Y       U       V        | Param.      Y       U       V        | Param.    Y       U       V
Liu et al. [5] | 12,266    -0.97%  -1.55%  -2.61%   | 12,266×4    -2.28%  -1.64%  -2.68%   | 12,555    -2.28%  -1.70%  -2.90%
VRCNN [6]      | 54,512    -0.48%  -1.57%  -2.23%   | 54,512×4    -1.88%  -1.58%  -2.34%   | 54,673    -1.85%  -1.62%  -2.46%
DCAD [7]       | 296,641   -2.21%  -2.00%  -3.07%   | 296,641×4   -3.83%  -2.46%  -3.84%   | 297,218   -3.74%  -2.78%  -3.93%
Tucodec [8]    | 447,681   -3.72%  -2.73%  -3.63%   | 447,681×4   -4.49%  -2.72%  -3.88%   | 448,514   -4.54%  -2.95%  -4.21%
Fig. 1: Schematic diagram of the proposed method. The influence factors $1/(1+\bm{\theta}Q_{step}^{2})$ are broadcast to $c\times h\times w$ during the element-wise multiplication, where $c$, $h$, and $w$ are the channel, height, and width, respectively.
Fig. 2: Relative PSNR gain curves over the VTM-6.3 anchor. Tucodec is used as the backbone in this test.

3.2 Applying QP-adaptive Mechanism to CNN

From the perspective of the frequency domain, the features extracted by a CNN are equivalent to selections of the input image at different frequencies, which establishes the link between the CNN and the frequency-domain solution. For example, the Gaussian kernel is a low-pass filter and the Laplacian kernel is a high-pass one. The filter $\bm{w}^{\prime}$ over the entire frequency band can be decomposed into different sub-filters, with each sub-filter $w^{\prime}_{i}$ working in a selected frequency sub-band. With (11), $\mathcal{F}(\bm{w}^{\prime})$ can be written as:

\begin{aligned}
\mathcal{F}(\bm{w}^{\prime}) &= \sum_{i}\mathcal{F}(w^{\prime}_{i})\\
&= \sum_{i}\mathcal{F}(w_{i})\left[\frac{1}{1+|\mathcal{F}(w_{i})|^{2}\mathcal{F}(n_{i})/\mathcal{F}(s_{i})}\right]
\end{aligned} \qquad (12)

The first term $\mathcal{F}(w_{i})$ in (12) is equivalent to a convolution kernel in the CNN. Due to the decomposition, the second term $1/[1+|\mathcal{F}(w_{i})|^{2}\mathcal{F}(n_{i})/\mathcal{F}(s_{i})]$, i.e., the influence factor of each kernel, can be regarded as acting only in a frequency sub-band. Within this sub-band, we approximate $|\mathcal{F}(w_{i})|^{2}$ in the influence factor by a constant. The strength of the original signal $\mathcal{F}(s_{i})$ is also invariant in the task of adapting to different quantization noises, so $|\mathcal{F}(w_{i})|^{2}/\mathcal{F}(s_{i})$ can be approximated by a constant $k_{i}$. As for the intensity of the quantization noise $\mathcal{F}(\bm{n})$, it is proportional to the square of the Qstep at all frequencies under the default coding setting, and the decomposed noise $\mathcal{F}(n_{i})$ follows the same pattern at the selected frequency:

\frac{|\mathcal{F}(w_{i})|^{2}}{\mathcal{F}(s_{i})}\mathcal{F}(n_{i})\approx k_{i}\mathcal{F}(n_{i})\propto Q_{step}^{2} \qquad (13)

We use trainable parameters $\bm{\theta}$ to represent this proportional relationship. Therefore, (12) can be rewritten as follows:

\mathcal{F}(\bm{w}^{\prime})\approx\sum_{i}\mathcal{F}(w_{i})\left[\frac{1}{1+\theta_{i}Q_{step}^{2}}\right] \qquad (14)

where $\theta_{i}$ denotes a specific parameter in the set $\bm{\theta}$. Let $N$ be the number of feature maps in the CNN; the number of parameters introduced by this method would then be $O(N^{2})$, the same order as the number of kernels. Inspired by DSC [16], which uses depthwise convolution instead of standard convolution, we instead apply the influence factor to the feature maps rather than to the convolution kernels. Fig. 1 shows the schematic diagram of the proposed method, and a minimal implementation sketch is given below. The parameter count thus becomes $O(N)$, the same order as the number of biases (as in the QP-map method [20]).
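A minimal Keras sketch of this per-feature-map construction is shown below. The layer name and interface are our own illustration, not the authors' released code: a vanilla convolution is followed by scaling each output channel with $1/(1+\theta_{i}Q_{step}^{2})$, using one trainable, non-negative $\theta_{i}$ per channel.

```python
import tensorflow as tf
from tensorflow.keras import layers, constraints

class QPAdaptiveConv2D(layers.Layer):
    """Convolution whose output channels are scaled by 1 / (1 + theta_i * Qstep^2)."""

    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.conv = layers.Conv2D(filters, kernel_size, padding="same")

    def build(self, input_shape):
        # One theta per output channel (O(N) extra parameters), truncated to stay >= 0.
        self.theta = self.add_weight(
            name="theta", shape=(self.filters,),
            initializer="zeros", constraint=constraints.NonNeg())

    def call(self, inputs):
        x, qstep_sq = inputs                        # x: (b, h, w, c_in), qstep_sq: (b, 1)
        y = self.conv(x)                            # vanilla convolution
        theta = tf.reshape(self.theta, (1, self.filters))
        scale = 1.0 / (1.0 + theta * qstep_sq)      # influence factors, shape (b, filters)
        return y * tf.reshape(scale, (-1, 1, 1, self.filters))  # broadcast over h and w
```

Replacing every vanilla convolution in a backbone with such a layer and feeding the normalized Qstep squared as a second input mirrors the behaviour of Fig. 1; when $\bm{\theta}$ is zero, the layer reduces to the original convolution.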

3.3 Implementation Detail

As in HEVC [1], the relationship between QP and Qstep in VVC can be written as [24]:

Q_{step}=2^{(QP-4)/6} \qquad (15)

so the square of Qstep is:

Q_{step}^{2}=2^{(QP-4)/3} \qquad (16)

Due to the trainable multiplier $\theta_{i}$ in (14), multiplying all $Q_{step}^{2}$ values by the same constant does not affect the performance of the model. We therefore adopt the normalization $2^{(QP-32)/3}$ in place of $Q_{step}^{2}$, which avoids the vanishing-gradient problem caused by large $Q_{step}^{2}$ values. Besides, the parameter $\theta_{i}$ should be greater than 0 because both $|\mathcal{F}(w_{i})|^{2}$ and $\mathcal{F}(\bm{s})$ are greater than 0; when $\bm{\theta}=\bm{0}$, the proposed model degenerates to the original CNN filter. There are two common ways to enforce this constraint: 1) reparametrize $\bm{\theta}=\exp(\bm{\eta})$, where $\bm{\eta}$ are unconstrained trainable parameters, or 2) directly truncate $\bm{\theta}$ at zero. We adopt the second one in this paper.
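The sketch below, with an arbitrary illustrative value of theta, shows the normalized Qstep squared described above and the resulting influence factor across the four CTC QPs; a higher QP yields a smaller factor, i.e., stronger suppression.

```python
def normalized_qstep_sq(qp: int) -> float:
    # 2^((QP-32)/3) replaces Qstep^2 = 2^((QP-4)/3); the constant shift is
    # absorbed by the trainable theta and keeps the values close to 1.
    return 2.0 ** ((qp - 32) / 3.0)

theta = 0.5  # illustrative value; in practice theta is learned per channel
for qp in (22, 27, 32, 37):
    q2 = normalized_qstep_sq(qp)
    factor = 1.0 / (1.0 + theta * q2)
    print(qp, round(q2, 3), round(factor, 3))  # higher QP -> smaller influence factor
```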

With the CNN filter inserted, the filtering order of H.266/VVC is luma mapping with chroma scaling (LMCS), DB, CNN filter, SAO, and ALF. Both SAO and ALF need extra bits to signal their offsets and coefficients. By placing the CNN-based filter before SAO and ALF, a higher-quality filtered image is passed to them, thereby reducing the number of coded bits. DB, however, relies on preset thresholds to perform filtering, and placing the CNN filter before it might require modifying those thresholds accordingly. Therefore, we place the CNN filter after DB instead.

4 Experiment

In this section, the experimental setting is introduced first. Then we provide the experimental results on coding efficiency and complexity. Finally, comparisons with previous work are given.

4.1 Experimental Setting

By integrating our method into different models, the BD-rate [25] reduction of the proposed method can be tested in various situations in terms of complexity, activation function, and serial or parallel structure. We chose four models, namely Liu et al. [5], VRCNN [6], DCAD [7], and Tucodec [8], as the backbones for our experiments. The CNN filter was integrated into the Versatile Video Coding Test Model (VTM)-6.3 [26] and placed between DB and SAO. The DIV2K dataset [27] was used to train and validate all of the mentioned CNN filters: the 900 pictures in DIV2K were split into 800 for training and 100 for validation. Four QPs (22, 27, 32, and 37) from the common test conditions (CTC [28]) were used to encode these pictures. By cutting the pictures into 64×64 blocks, we obtained 522,877 training samples and 66,712 validation samples for each QP. Using the Keras framework [29] and the Adam optimizer [30], about 40,000 iterations were trained for each QP with a batch size of 128. In the test phase, the first frames of the HEVC test sequences were used to evaluate the performance of the aforementioned filters; these test sequences do not overlap with the training or validation data.
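For reference, a hedged sketch of the training loop described above; build_backbone() and the dataset objects are placeholders for any of the four backbones and the DIV2K patch pipeline, and the learning rate is an assumption not stated in the paper.

```python
import tensorflow as tf

def train_filter(build_backbone, train_ds, val_ds):
    """Sketch: MSE training on 64x64 patches with Adam and batch size 128.

    train_ds / val_ds are assumed to yield ((patch, qstep_sq), target) pairs;
    the DIV2K encoding and patch-extraction steps are not shown here.
    """
    model = build_backbone()                      # e.g. Tucodec with QP-adaptive layers
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # learning rate assumed
                  loss="mse")
    model.fit(train_ds.batch(128),                # ~4k steps per epoch at 522,877 samples
              validation_data=val_ds.batch(128),
              epochs=10)                          # on the order of the reported 40k iterations
    return model
```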

4.2 Performance Evaluation

From the results in Table 1, the "Global" column, which uses a single model, shows the lowest BD-rate reduction. Using multiple models, shown in the "Separate" column, improves the overall BD-rate reduction of the CNN filter, but the number of required parameters increases fourfold. The "Proposed" column shows that our method enables a single model to perform well for all four backbones while adding only a small number of parameters, which demonstrates its versatility and flexibility. Its BD-rate reduction on the luma component is almost the same as that of the separate method, and it achieves an extra 0.2% BD-rate reduction on the chroma components. This fully demonstrates that our method improves the generalization ability of the model, because only the luma component is used for training and the performance on the chroma components therefore depends entirely on the model's generalization ability. Fig. 2 plots the PSNR gains of the methods relative to the VTM baseline. The dotted lines represent the separate method, where the QP in the legend indicates the dataset used for training. Each line peaks at the QP of its training dataset but performs poorly at the other QPs; at lower QPs it may even degrade performance. The blue solid line represents the global method, which obtains a moderate gain at higher QPs but likewise hurts the reconstructed image at lower QPs. In contrast, the proposed method delivers significant filtering gains over a wide range of QPs and almost reaches the optimal performance of using multiple separate models, which further demonstrates its effectiveness and versatility.

Table 2: Comparison of Relative Decoding Complexity

Class   | Liu et al. [5]        | DCAD [7]               | Tucodec [8]
        | Global     Proposed   | Global      Proposed   | Global      Proposed
A       | 345.3%     353.4%     | 602.1%      612.5%     | 683.2%      694.6%
B       | 453.4%     461.0%     | 693.1%      704.9%     | 794.6%      814.5%
C       | 432.2%     442.7%     | 775.4%      770.1%     | 817.3%      834.0%
D       | 585.3%     627.4%     | 1443.2%     1461.3%    | 1463.7%     1487.9%
E       | 555.3%     563.8%     | 1047.2%     1059.8%    | 1148.0%     1177.0%
Average | 474.3%     489.6%     | 912.2%      921.7%     | 981.4%      1001.6%

4.3 Complexity Evaluation

Table 1 compares the parameter counts of the different methods in the "Param." columns. In addition, Table 2 compares the decoding complexity relative to the VTM anchor under the same test setting as Section 4.2. Although the separate method uses more models than the global method, the filtering for a given QP uses a single model with the same structure in both cases, so their complexity is the same. From Table 2, the decoding complexity of our proposed method increases by only about 2% compared with the global method, so the impact of our method on complexity is minimal. This lays a good foundation for its practical application.

Fig. 3: Comparison of the overall relative PSNR gain of Song et al. [20] and the proposed method.

4.4 Comparison with Previous Work

The performance comparison between the proposed method and the QP-map method (Song et al. [20]) is shown in Fig. 3. The two methods achieve similar relative PSNR gains with the backbones of Liu et al. [5] and DCAD [7]. For VRCNN [6], Song et al. [20] has a negative impact at lower QPs, whereas our proposed method still achieves a minor gain at $QP=22$. Our method is more robust, probably because it provides the quantization information to every convolution layer, while Song et al. [20] provides it only to the input layer. This comparison demonstrates the robustness and versatility of our method.

5 Conclusion

In this paper, we present a novel method to improve the adaptability of CNN filters to different QPs. By adding influence factors related to the Qstep to the CNN filter, the network can suppress the quantization noise as the noise changes. The proposed method achieves excellent performance on previous CNN filters and yields a BD-rate reduction similar to that of using multiple models. Moreover, the complexity evaluation of different trained models shows that it brings only a slight increase in complexity and has a promising future for practical applications. Finally, the comparison with previous work shows that our proposed method is more robust and stable. We believe that more efficient methods will emerge in the future based on further design and modeling.

6 Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant 61674041, in part by Alibaba Group through Alibaba Innovative Research (AIR) Program, in part by the STCSM under Grant 16XD1400300, in part by the pioneering project of academy for engineering and technology and Fudan-CIOMP joint fund, in part by the National Natural Science Foundation of China under Grant 61525401, in part by the Program of Shanghai Academic/Technology Research Leader under Grant 16XD1400300, in part by the Innovation Program of Shanghai Municipal Education Commission, in part by JST, PRESTO Grant Number JPMJPR19M5, Japan.

References

  • [1] Madhukar Budagavi, Arild Fuldseth, and Gisle Bjøntegaard, “Hevc transform and quantization,” in High Efficiency Video Coding (HEVC), pp. 141–169. Springer, 2014.
  • [2] Gary J Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand, “Overview of the high efficiency video coding (hevc) standard,” IEEE Transactions on circuits and systems for video technology, vol. 22, no. 12, pp. 1649–1668, 2012.
  • [3] “H.266,” https://en.wikipedia.org/wiki/Versatile_Video_Coding.
  • [4] C. Liu, H. Sun, J. Chen, Z. Cheng, M. Takeuchi, J. Katto, X. Zeng, and Y. Fan, “Dual learning-based video coding with inception dense blocks,” in 2019 Picture Coding Symposium (PCS), Nov 2019, pp. 1–5.
  • [5] Chao Liu, Heming Sun, Jiro Katto, Xiaoyang Zeng, and Yibo Fan, “A convolutional neural network-based low complexity filter,” arXiv preprint arXiv:2009.02733, 2020.
  • [6] Yuanying Dai, Dong Liu, and Feng Wu, “A convolutional neural network approach for post-processing in hevc intra coding,” in International Conference on Multimedia Modeling. Springer, 2017, pp. 28–39.
  • [7] Tingting Wang, Mingjin Chen, and Hongyang Chao, “A novel deep learning-based method of improving coding efficiency from the decoder-end for hevc,” in 2017 Data Compression Conference (DCC). IEEE, 2017, pp. 410–419.
  • [8] Lei Zhou, Chunlei Cai, Yue Gao, Sanbao Su, and Junmin Wu, “Variational autoencoder for low bit-rate image compression.,” in CVPR Workshops, 2018, pp. 2617–2620.
  • [9] Heming Sun, Chao Liu, Jiro Katto, and Yibo Fan, “An image compression framework with learning-based filter,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 152–153.
  • [10] Jan P Klopp, Liang-Gee Chen, and Shao-Yi Chien, “Utilising low complexity cnns to lift non-local redundancies in video coding,” IEEE Transactions on Image Processing, 2020.
  • [11] Daowen Li and Lu Yu, “An in-loop filter based on low-complexity cnn using residuals in intra video coding,” in 2019 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2019, pp. 1–5.
  • [12] Xiaoyi He, Qiang Hu, Xiaoyun Zhang, Chongyang Zhang, Weiyao Lin, and Xintong Han, “Enhancing hevc compressed videos with a partition-masked convolutional neural network,” in 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 216–220.
  • [13] Yongbing Zhang, Tao Shen, Xiangyang Ji, Yun Zhang, Ruiqin Xiong, and Qionghai Dai, “Residual highway convolutional neural networks for in-loop filtering in hevc,” IEEE Transactions on image processing, vol. 27, no. 8, pp. 3827–3841, 2018.
  • [14] Chuanmin Jia, Shiqi Wang, Xinfeng Zhang, Shanshe Wang, Jiaying Liu, Shiliang Pu, and Siwei Ma, “Content-aware convolutional neural network for in-loop filtering in high efficiency video coding,” IEEE Transactions on Image Processing, vol. 28, no. 7, pp. 3343–3356, 2019.
  • [15] Chen Li, Li Song, Rong Xie, and Wenjun Zhang, “Cnn based post-processing to improve hevc,” in 2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017, pp. 4577–4580.
  • [16] Laurent Sifre and Stéphane Mallat, “Rigid-motion scattering for image classification,” Ph. D. thesis, 2014.
  • [17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [18] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. icml, 2013, vol. 30, p. 3.
  • [19] Vinod Nair and Geoffrey E Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807–814.
  • [20] Xiaodan Song, Jiabao Yao, Lulu Zhou, Li Wang, Xiaoyang Wu, Di Xie, and Shiliang Pu, “A practical convolutional neural network as loop filter for intra frame,” in 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 1133–1137.
  • [21] Shufang Zhang, Zenghui Fan, Nam Ling, and Minqiang Jiang, “Recursive residual convolutional neural network-based in-loop filtering for intra frames,” IEEE Transactions on Circuits and Systems for Video Technology, 2019.
  • [22] Han Zhu, Xiaozhong Xu, and Shan Liu, “Residual convolutional neural network based in-loop filter with intra and inter frames processed respectively for avs3,” in 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE, 2020, pp. 1–6.
  • [23] “Wiener deconvolution,” https://en.wikipedia.org/wiki/Wiener_deconvolution.
  • [24] Jianle Chen, Yan Ye, and Seung Hwan Kim, “Algorithm description for versatile video coding and test model 6 (vtm 6),” document JVET-O2001, 15th JVET meeting: Gothenburg, SE, July 2019.
  • [25] G. Bjøntegaard, “Calculation of average psnr differences between rd-curves,” VCEG-M33, 2001.
  • [26] “Video coding test model,” https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/.
  • [27] Eirikur Agustsson and Radu Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.
  • [28] Frank Bossen, Jill Boyce, X Li, V Seregin, and K Sühring, “Jvet common test conditions and software reference configurations for sdr video,” Joint Video Experts Team (JVET) of ITU-T SG, vol. 16, 2018.
  • [29] François Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.
  • [30] Diederik P Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.