
DCT Approximations Based on Chen’s Factorization

C. J. Tablada Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco, Recife, PE, Brazil    T. L. T. da Silveira Programa de Pós-Graduação em Computação, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil    R. J. Cintra Signal Processing Group, Departamento de Estatística, Universidade Federal de Pernambuco, Recife, PE, Brazil E-mail: [email protected]    F. M. Bayer Departamento de Estatística and LACESM, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil
{onecolabstract}

In this paper, two 8-point multiplication-free DCT approximations based on Chen's factorization are proposed, and their fast algorithms are also derived. Both transformations are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and results are compared with competing methods. The proposed low-complexity transforms are scaled according to the Jridi-Alfalou-Meher algorithm, furnishing 16- and 32-point approximations. The new sets of transformations are embedded into an HEVC reference software to provide a fully HEVC-compliant video coding scheme. We show that the approximate transforms can outperform traditional transforms and state-of-the-art methods at a very low complexity cost.

Keywords
Approximate DCT, Chen’s factorization, Fast algorithms, Image and video compression, Low-complexity transforms

1 Introduction

Discrete transforms are very useful tools in digital signal processing and compression technologies [10, 23]. In this context, the discrete cosine transform (DCT) plays a key role [2], since it is a practical approximation to the Karhunen-Loève transform (KLT) [15]. The KLT is optimal in terms of energy compaction when the input signals are modeled as a highly correlated first-order Markov process [10], a widely accepted assumption for natural images [17].

In particular, the 8-point DCT type II (DCT-II) is widely employed [10], being adopted in several industry standards for image and video compression, such as JPEG [17], MPEG-1 [40], MPEG-2 [25], H.261 [26], H.263 [27], H.264 [39], and the state-of-the-art HEVC [38]. Aiming at the efficient computation of the DCT, many fast algorithms have been reported in the literature [12, 33, 43, 53]. Nevertheless, these methods usually require expensive arithmetic operations, such as multiplications, and floating-point arithmetic, imposing significant hardware requirements [32].

An alternative to the exact DCT computation is the use of DCT approximations that employ integer arithmetic only and do not require multiplications [10, 23]. In this context, several approximations for the DCT-II have been proposed in the literature. Often the elements of the approximate transform matrices are defined over the set $\mathcal{P}=\{0,\pm\frac{1}{2},\pm 1,\pm 2\}$ [20, 14]. Relevant methods include the signed DCT (SDCT) [20] and the Bouguezel-Ahmad-Swamy (BAS) series of approximations [7, 8, 9, 3].

Transform matrices with elements in $\mathcal{P}$ have null multiplicative complexity [5]; thus, their associated hardware implementations require only additions and bit-shifting operations [3]. This fact renders such multiplierless approximations suitable for hardware and software implementations on devices and sensors operating under low computational power and severe energy consumption constraints [11, 39, 40].

In this work, we aim at the following goals:

  • the proposition of new multiplication-free approximations for the 8-point DCT-II, based on Chen’s algorithm [12];

  • the derivation of fast algorithms for the introduced transforms;

  • a comprehensive assessment of the new approximations in terms of coding and image compression performance compared to popular alternatives;

  • the extension of the proposed 8-point transforms to 16- and 32-point DCT approximations by means of the scalable recursive method proposed in [29]; and

  • the embedment of the obtained approximations into an HEVC-compliant reference software [28].

This paper unfolds as follows. Section 2 presents Chen’s factorization for the 8-point DCT-II. In Section 3, we present two novel 8-point multiplierless transforms and their fast algorithms. The proposed approximations are assessed and mathematically compared with competing methods in Section 4. Section 5 provides comprehensive image compression analysis based on a JPEG-like compression scheme. Several images are compressed and assessed for quality according to the approximate transforms. Section 6 extends the 8-point transforms to 16- and 32-point DCT approximations and considers a real-world video encoding scheme based on these particular DCT approximations. Final conclusions and remarks are drawn in Section 7.

2 Chen’s factorization for the DCT

Chen et al. [12] proposed a fast algorithm for the DCT-II based on a factorization of the DCT type IV (DCT-IV). These two versions of the DCT differ in the sample points of the cosine function used in their transformation kernels [48, 10]. The $(m,n)$-th elements of the $N$-point DCT-II and DCT-IV transform matrices, respectively denoted $\mathbf{C}_N^{\text{II}}$ and $\mathbf{C}_N^{\text{IV}}$, are given by:

$$\begin{aligned}\left[\mathbf{C}_N^{\text{II}}\right]_{m,n}&=\sqrt{\frac{2}{N}}\,c_m\cos\!\left(\frac{m(2n+1)\pi}{2N}\right),\\ \left[\mathbf{C}_N^{\text{IV}}\right]_{m,n}&=\sqrt{\frac{2}{N}}\,\cos\!\left(\frac{(2m+1)(2n+1)\pi}{4N}\right),\end{aligned}$$

where $m,n=0,1,\ldots,N-1$ and

$$c_m=\begin{cases}1/\sqrt{2},&\text{if }m=0,\\ 1,&\text{if }m\neq 0.\end{cases}$$
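As a sanity check, both kernels can be generated directly from the definitions above; a minimal Python sketch (the function names are ours), which also confirms that the resulting matrices are orthogonal:

```python
import numpy as np

def dct_ii(N):
    # [C_N^II]_{m,n} = sqrt(2/N) * c_m * cos(m (2n+1) pi / (2N))
    m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    c = np.where(m == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / N) * c * np.cos(m * (2 * n + 1) * np.pi / (2 * N))

def dct_iv(N):
    # [C_N^IV]_{m,n} = sqrt(2/N) * cos((2m+1)(2n+1) pi / (4N))
    m, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(2 / N) * np.cos((2 * m + 1) * (2 * n + 1) * np.pi / (4 * N))
```

Both matrices satisfy $\mathbf{C}\,\mathbf{C}^\top=\mathbf{I}_N$, a property used implicitly throughout the factorizations below.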

In the following, let $\mathbf{0}_N$ denote the zero matrix of order $N$, $\mathbf{I}_N$ the $N\times N$ identity matrix, and $\bar{\mathbf{I}}_N$ the counter-identity matrix, which is given by:

$$\bar{\mathbf{I}}_N=\left[\begin{smallmatrix}0&\cdots&0&1\\ 0&\cdots&1&0\\ \vdots&\iddots&\vdots&\vdots\\ 1&\cdots&0&0\end{smallmatrix}\right].$$

In [47], Wang demonstrated that the 8-point DCT-II matrix possesses the following factorization:

$$\mathbf{C}_8^{\text{II}}=\frac{1}{2}\,\mathbf{P}_8\left[\begin{array}{cc}\mathbf{C}_4^{\text{II}}&\mathbf{0}_4\\ \mathbf{0}_4&\bar{\mathbf{I}}_4\,\mathbf{C}_4^{\text{IV}}\,\bar{\mathbf{I}}_4\end{array}\right]\mathbf{B}_8,\qquad(1)$$

where 𝐏8\mathbf{P}_{8} and 𝐁8\mathbf{B}_{8} are permutation and pre-addition matrices given by, respectively:

$$\mathbf{P}_8=\left[\begin{smallmatrix}1&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&1\\ 0&1&0&0&0&0&0&0\\ 0&0&0&0&0&0&1&0\\ 0&0&1&0&0&0&0&0\\ 0&0&0&0&0&1&0&0\\ 0&0&0&1&0&0&0&0\\ 0&0&0&0&1&0&0&0\end{smallmatrix}\right],\qquad \mathbf{B}_8=\left[\begin{array}{rr}\mathbf{I}_4&\bar{\mathbf{I}}_4\\ \bar{\mathbf{I}}_4&-\mathbf{I}_4\end{array}\right].$$
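The structure of these two matrices is easy to check numerically; a short sketch under the definitions above (names follow the text). Note that $\mathbf{B}_8$ is symmetric and satisfies $\mathbf{B}_8^2=2\,\mathbf{I}_8$, so it is orthogonal up to a factor of $\sqrt{2}$:

```python
import numpy as np

I4 = np.eye(4)
J4 = np.fliplr(np.eye(4))        # counter-identity: reverses coordinate order

# P8: a permutation with ones in columns 0, 7, 1, 6, 2, 5, 3, 4 (row by row)
P8 = np.zeros((8, 8))
P8[np.arange(8), [0, 7, 1, 6, 2, 5, 3, 4]] = 1.0

# B8: butterfly producing the sums x_n + x_{7-n} (top half)
# and the differences x_{3-n} - x_{4+n} (bottom half)
B8 = np.block([[I4, J4], [J4, -I4]])
```

Applied to the vector $(0,1,\ldots,7)$, the top half of $\mathbf{B}_8\,\mathbf{x}$ collects the sums and the bottom half the differences of mirrored samples.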

Additionally, Chen et al. [12] suggested that the matrix $\mathbf{C}_4^{\text{IV}}$ admits the following factorization:

$$\mathbf{C}_4^{\text{IV}}=\mathbf{Q}\,\mathbf{A}_1\,\mathbf{A}_2\,\mathbf{A}_3,\qquad(2)$$

where

$$\mathbf{Q}=\left[\begin{smallmatrix}1&0&0&0\\ 0&0&1&0\\ 0&1&0&0\\ 0&0&0&1\end{smallmatrix}\right],\quad \mathbf{A}_1=\left[\begin{smallmatrix}\beta_0&0&0&\beta_3\\ 0&\beta_2&\beta_1&0\\ 0&\beta_1&-\beta_2&0\\ \beta_3&0&0&-\beta_0\end{smallmatrix}\right],\quad \mathbf{A}_2=\left[\begin{smallmatrix}1&1&0&0\\ 1&-1&0&0\\ 0&0&-1&1\\ 0&0&1&1\end{smallmatrix}\right],\quad \mathbf{A}_3=\left[\begin{smallmatrix}0&0&0&1\\ 0&\alpha&\alpha&0\\ 0&-\alpha&\alpha&0\\ 1&0&0&0\end{smallmatrix}\right],$$

with $\alpha=\cos\!\left(\frac{\pi}{4}\right)$ and $\beta_n=\cos\!\left(\frac{(2n+1)\pi}{16}\right)$.
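With these exact parameter values, each factor is orthogonal up to scale: $\mathbf{Q}\mathbf{Q}^\top=\mathbf{A}_1\mathbf{A}_1^\top=\mathbf{A}_3\mathbf{A}_3^\top=\mathbf{I}_4$ (since $\beta_0^2+\beta_3^2=\beta_1^2+\beta_2^2=2\alpha^2=1$) and $\mathbf{A}_2\mathbf{A}_2^\top=2\,\mathbf{I}_4$, which is why the factorization preserves energy. A quick numerical confirmation (a sketch; variable names are ours):

```python
import numpy as np

alpha = np.cos(np.pi / 4)
b0, b1, b2, b3 = np.cos((2 * np.arange(4) + 1) * np.pi / 16)   # beta_0..beta_3

Q  = np.array([[1., 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
A1 = np.array([[b0, 0, 0, b3], [0, b2, b1, 0], [0, b1, -b2, 0], [b3, 0, 0, -b0]])
A2 = np.array([[1., 1, 0, 0], [1, -1, 0, 0], [0, 0, -1, 1], [0, 0, 1, 1]])
A3 = np.array([[0, 0, 0, 1.], [0, alpha, alpha, 0], [0, -alpha, alpha, 0], [1, 0, 0, 0]])
```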

Substituting (2) into (1) and expanding the factorization, we obtain:

$$\mathbf{C}_8^{\text{II}}=\frac{1}{2}\,\mathbf{P}_8\,\mathbf{M}_1\,\mathbf{M}_2\,\mathbf{M}_3\,\mathbf{M}_4\,\mathbf{B}_8,\qquad(3)$$

where

$$\mathbf{M}_1=\left[\begin{array}{cc}\mathbf{I}_4&\mathbf{0}_4\\ \mathbf{0}_4&\bar{\mathbf{I}}_4\,\mathbf{Q}\end{array}\right],\quad \mathbf{M}_2=\left[\begin{array}{cc}\mathbf{P}_4&\mathbf{0}_4\\ \mathbf{0}_4&\mathbf{A}_1\end{array}\right],\quad \mathbf{M}_3=\left[\begin{array}{cc}\widetilde{\mathbf{C}}&\mathbf{0}_4\\ \mathbf{0}_4&\mathbf{A}_2\end{array}\right],\quad \mathbf{M}_4=\left[\begin{array}{cc}\mathbf{B}_4&\mathbf{0}_4\\ \mathbf{0}_4&\mathbf{A}_3\end{array}\right],$$

$$\mathbf{P}_4=\left[\begin{smallmatrix}1&0&0&0\\ 0&0&0&1\\ 0&1&0&0\\ 0&0&1&0\end{smallmatrix}\right],\quad \mathbf{B}_4=\left[\begin{array}{rr}\mathbf{I}_2&\bar{\mathbf{I}}_2\\ \bar{\mathbf{I}}_2&-\mathbf{I}_2\end{array}\right],\quad \widetilde{\mathbf{C}}=\left[\begin{array}{cc}\mathbf{C}_2^{\text{II}}&\mathbf{0}_2\\ \mathbf{0}_2&\bar{\mathbf{I}}_2\,\mathbf{C}_2^{\text{IV}}\,\bar{\mathbf{I}}_2\end{array}\right]=\left[\begin{smallmatrix}\alpha&\alpha&0&0\\ \alpha&-\alpha&0&0\\ 0&0&-\gamma_0&\gamma_1\\ 0&0&\gamma_1&\gamma_0\end{smallmatrix}\right],$$

with $\gamma_n=\cos\!\left(\frac{(2n+1)\pi}{8}\right)$. The expression in (3) is referred to as Chen's factorization for the 8-point DCT-II.

Without any fast algorithm, the computation of the 8-point DCT-II requires 64 multiplications and 56 additions. Using Chen's factorization in (3), the arithmetic complexity is reduced to 16 multiplications and 26 additions. The quantities $\alpha$, $\beta_n$, and $\gamma_n$ appearing in $\mathbf{M}_2$, $\mathbf{M}_3$, and $\mathbf{M}_4$ are irrational numbers and therefore demand nontrivial multiplications.
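The factorization (3) can be verified numerically by assembling all of the matrices above and comparing the product against the DCT-II generated directly from its definition; a self-contained sketch (names follow the text):

```python
import numpy as np

# Exact 8-point DCT-II from its definition
m, n = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
C8 = (np.sqrt(2 / 8) * np.where(m == 0, 1 / np.sqrt(2), 1.0)
      * np.cos(m * (2 * n + 1) * np.pi / 16))

# Constants of the factorization
alpha = np.cos(np.pi / 4)
b0, b1, b2, b3 = np.cos((2 * np.arange(4) + 1) * np.pi / 16)
g0, g1 = np.cos((2 * np.arange(2) + 1) * np.pi / 8)

I2, J2 = np.eye(2), np.fliplr(np.eye(2))
I4, J4, Z4 = np.eye(4), np.fliplr(np.eye(4)), np.zeros((4, 4))

Q  = np.array([[1., 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
A1 = np.array([[b0, 0, 0, b3], [0, b2, b1, 0], [0, b1, -b2, 0], [b3, 0, 0, -b0]])
A2 = np.array([[1., 1, 0, 0], [1, -1, 0, 0], [0, 0, -1, 1], [0, 0, 1, 1]])
A3 = np.array([[0, 0, 0, 1.], [0, alpha, alpha, 0], [0, -alpha, alpha, 0], [1, 0, 0, 0]])
P4 = np.array([[1., 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]])
B4 = np.block([[I2, J2], [J2, -I2]])
Ct = np.array([[alpha, alpha, 0, 0], [alpha, -alpha, 0, 0],
               [0, 0, -g0, g1], [0, 0, g1, g0]])

M1 = np.block([[I4, Z4], [Z4, J4 @ Q]])
M2 = np.block([[P4, Z4], [Z4, A1]])
M3 = np.block([[Ct, Z4], [Z4, A2]])
M4 = np.block([[B4, Z4], [Z4, A3]])
P8 = np.zeros((8, 8))
P8[np.arange(8), [0, 7, 1, 6, 2, 5, 3, 4]] = 1.0
B8 = np.block([[I4, J4], [J4, -I4]])

lhs = 0.5 * P8 @ M1 @ M2 @ M3 @ M4 @ B8   # right-hand side of (3)
```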

For the sake of notation, hereafter the DCT-II is referred to as DCT.

3 Proposed DCT approximations

In this section, new approximations for the DCT are sought. To this end, we notice that the factorization (3) naturally induces the following mapping:

$$\begin{aligned}\operatorname{T}_C:\ \mathbb{R}\times\mathbb{R}^4\times\mathbb{R}^2&\to\mathcal{M}_8(\mathbb{R})\\ (\alpha,\bm{\beta},\bm{\gamma})&\mapsto\mathbf{P}_8\,\mathbf{M}_1\,\mathbf{M}_2\,\mathbf{M}_3\,\mathbf{M}_4\,\mathbf{B}_8,\end{aligned}\qquad(4)$$

where $\mathcal{M}_8(\mathbb{R})$ is the space of $8\times 8$ matrices with real-valued entries, $\alpha\in\mathbb{R}$, $\bm{\beta}=[\beta_0\ \ \beta_1\ \ \beta_2\ \ \beta_3]\in\mathbb{R}^4$, and $\bm{\gamma}=[\gamma_0\ \ \gamma_1]\in\mathbb{R}^2$. The matrices $\mathbf{M}_2$, $\mathbf{M}_3$, and $\mathbf{M}_4$ are now regarded as matrix functions, with the constants in (3) understood as parameters:

$$\mathbf{M}_2=\mathbf{M}_2(\bm{\beta}),\qquad \mathbf{M}_3=\mathbf{M}_3(\alpha,\bm{\gamma}),\qquad \mathbf{M}_4=\mathbf{M}_4(\alpha).\qquad(5)$$

In particular, for the values

$$\begin{aligned}\alpha&=\cos\!\left(\frac{\pi}{4}\right),\\ \beta_n&=\cos\!\left(\frac{(2n+1)\pi}{16}\right),\quad n=0,1,2,3,\\ \gamma_n&=\cos\!\left(\frac{(2n+1)\pi}{8}\right),\quad n=0,1,\end{aligned}\qquad(6)$$

we have $\operatorname{T}_C(\alpha,\bm{\beta},\bm{\gamma})=2\,\mathbf{C}_8^{\text{II}}$. In the following, we vary the values of the parameters $\alpha$, $\bm{\beta}$, and $\bm{\gamma}$, aiming at the derivation of low-complexity matrices whose elements are restricted to the set $\mathcal{P}=\{0,\pm\frac{1}{2},\pm 1,\pm 2\}$.

To facilitate our approach, we consider the signum and round-off functions, respectively, given by:

$$\operatorname{sign}(x)=\begin{cases}1,&\text{if }x>0,\\ 0,&\text{if }x=0,\\ -1,&\text{if }x<0,\end{cases}\qquad \operatorname{round}(x)=\operatorname{sign}(x)\left\lfloor|x|+\frac{1}{2}\right\rfloor,$$

where $\lfloor x\rfloor=\max\{m\in\mathbb{Z}\mid m\le x\}$ is the floor function for $x\in\mathbb{R}$. These functions coincide with their implementations in the C and MATLAB languages. When applied to vectors or matrices, $\operatorname{sign}(\cdot)$ and $\operatorname{round}(\cdot)$ operate entry-wise.

Thus, considering the values in (6) and applying the above functions entry-wise, we obtain the following approximate vectors:

$$\begin{aligned}(\widetilde{\alpha},\widetilde{\bm{\beta}},\widetilde{\bm{\gamma}})&=\operatorname{sign}[(\alpha,\bm{\beta},\bm{\gamma})]=[1\ \ 1\ \ 1\ \ 1\ \ 1\ \ 1\ \ 1],\\ (\widehat{\alpha},\widehat{\bm{\beta}},\widehat{\bm{\gamma}})&=\operatorname{round}[(\alpha,\bm{\beta},\bm{\gamma})]=[1\ \ 1\ \ 1\ \ 1\ \ 0\ \ 1\ \ 0].\end{aligned}$$
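These vectors are easy to reproduce. One caveat: Python's built-in round() rounds halves to even, so the C/MATLAB-style rounding used above must be written explicitly. A sketch (helper names are ours), with the parameter vector ordered as $(\alpha,\beta_0,\beta_1,\beta_2,\beta_3,\gamma_0,\gamma_1)$:

```python
import math

def sign(x):
    return (x > 0) - (x < 0)

def round_c(x):
    # C/MATLAB-style rounding: halves go away from zero
    # (Python's built-in round() rounds halves to even instead)
    return sign(x) * math.floor(abs(x) + 0.5)

alpha = math.cos(math.pi / 4)
beta = [math.cos((2 * n + 1) * math.pi / 16) for n in range(4)]
gamma = [math.cos((2 * n + 1) * math.pi / 8) for n in range(2)]
params = [alpha] + beta + gamma   # (alpha, beta0..beta3, gamma0, gamma1)

sign_vec = [sign(p) for p in params]
round_vec = [round_c(p) for p in params]
```

Only $\beta_3=\cos(7\pi/16)\approx 0.195$ and $\gamma_1=\cos(3\pi/8)\approx 0.383$ round to zero; all seven parameters are positive, so the sign vector is all ones.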

Then, the following matrices are generated according to (5):

$$\widetilde{\mathbf{M}}_2=\mathbf{M}_2(\widetilde{\bm{\beta}}),\quad \widehat{\mathbf{M}}_2=\mathbf{M}_2(\widehat{\bm{\beta}}),\quad \widetilde{\mathbf{M}}_3=\mathbf{M}_3(\widetilde{\alpha},\widetilde{\bm{\gamma}}),\quad \widehat{\mathbf{M}}_3=\mathbf{M}_3(\widehat{\alpha},\widehat{\bm{\gamma}}),\quad \widetilde{\mathbf{M}}_4=\mathbf{M}_4(\widetilde{\alpha}),\quad \widehat{\mathbf{M}}_4=\mathbf{M}_4(\widehat{\alpha}),$$

which are explicitly given by:

$$\widetilde{\mathbf{M}}_2=\left[\begin{smallmatrix}1&0&0&0&0&0&0&0\\ 0&0&0&1&0&0&0&0\\ 0&1&0&0&0&0&0&0\\ 0&0&1&0&0&0&0&0\\ 0&0&0&0&1&0&0&1\\ 0&0&0&0&0&1&1&0\\ 0&0&0&0&0&1&-1&0\\ 0&0&0&0&1&0&0&-1\end{smallmatrix}\right],\qquad \widehat{\mathbf{M}}_2=\left[\begin{smallmatrix}1&0&0&0&0&0&0&0\\ 0&0&0&1&0&0&0&0\\ 0&1&0&0&0&0&0&0\\ 0&0&1&0&0&0&0&0\\ 0&0&0&0&1&0&0&0\\ 0&0&0&0&0&1&1&0\\ 0&0&0&0&0&1&-1&0\\ 0&0&0&0&0&0&0&-1\end{smallmatrix}\right],$$

$$\widetilde{\mathbf{M}}_3=\left[\begin{smallmatrix}1&1&0&0&0&0&0&0\\ 1&-1&0&0&0&0&0&0\\ 0&0&-1&1&0&0&0&0\\ 0&0&1&1&0&0&0&0\\ 0&0&0&0&1&1&0&0\\ 0&0&0&0&1&-1&0&0\\ 0&0&0&0&0&0&-1&1\\ 0&0&0&0&0&0&1&1\end{smallmatrix}\right],\qquad \widehat{\mathbf{M}}_3=\left[\begin{smallmatrix}1&1&0&0&0&0&0&0\\ 1&-1&0&0&0&0&0&0\\ 0&0&-1&0&0&0&0&0\\ 0&0&0&1&0&0&0&0\\ 0&0&0&0&1&1&0&0\\ 0&0&0&0&1&-1&0&0\\ 0&0&0&0&0&0&-1&1\\ 0&0&0&0&0&0&1&1\end{smallmatrix}\right],$$

$$\widetilde{\mathbf{M}}_4=\left[\begin{smallmatrix}1&0&0&1&0&0&0&0\\ 0&1&1&0&0&0&0&0\\ 0&1&-1&0&0&0&0&0\\ 1&0&0&-1&0&0&0&0\\ 0&0&0&0&0&0&0&1\\ 0&0&0&0&0&1&1&0\\ 0&0&0&0&0&-1&1&0\\ 0&0&0&0&1&0&0&0\end{smallmatrix}\right]=\widehat{\mathbf{M}}_4.$$

Invoking the factorization from (3), we define the following new transforms:

$$\begin{aligned}\widetilde{\mathbf{T}}_8&\triangleq\operatorname{T}_C\left(\widetilde{\alpha},\widetilde{\bm{\beta}},\widetilde{\bm{\gamma}}\right)=\mathbf{P}_8\,\mathbf{M}_1\,\widetilde{\mathbf{M}}_2\,\widetilde{\mathbf{M}}_3\,\widetilde{\mathbf{M}}_4\,\mathbf{B}_8,&(7)\\ \widehat{\mathbf{T}}_8&\triangleq\operatorname{T}_C\left(\widehat{\alpha},\widehat{\bm{\beta}},\widehat{\bm{\gamma}}\right)=\mathbf{P}_8\,\mathbf{M}_1\,\widehat{\mathbf{M}}_2\,\widehat{\mathbf{M}}_3\,\widehat{\mathbf{M}}_4\,\mathbf{B}_8.&(8)\end{aligned}$$

The numerical evaluation of (7) and (8) reveals the following matrix transforms:

$$\widetilde{\mathbf{T}}_8=\left[\begin{smallmatrix}1&1&1&1&1&1&1&1\\ 1&2&0&1&-1&0&-2&-1\\ 1&1&-1&-1&-1&-1&1&1\\ 1&0&-2&-1&1&2&0&-1\\ 1&-1&-1&1&1&-1&-1&1\\ 1&-2&0&1&-1&0&2&-1\\ 1&-1&1&-1&-1&1&-1&1\\ 1&0&2&-1&1&-2&0&-1\end{smallmatrix}\right],\qquad \widehat{\mathbf{T}}_8=\left[\begin{smallmatrix}1&1&1&1&1&1&1&1\\ 1&1&1&0&0&-1&-1&-1\\ 1&0&0&-1&-1&0&0&1\\ 1&0&-2&-1&1&2&0&-1\\ 1&-1&-1&1&1&-1&-1&1\\ 1&-2&0&1&-1&0&2&-1\\ 0&-1&1&0&0&1&-1&0\\ 0&-1&1&-1&1&-1&1&0\end{smallmatrix}\right].$$

The above transformations have simple inverse matrices. Direct matrix inversion rules applied to (7) and (8) furnish:

$$\begin{aligned}\widetilde{\mathbf{T}}_8^{-1}&=\frac{1}{2}\,\mathbf{B}_8\,\widetilde{\mathbf{M}}_4^{-1}\,\widetilde{\mathbf{M}}_3^{-1}\,\widetilde{\mathbf{M}}_2^{-1}\,\mathbf{M}_1\,\mathbf{P}_8^\top,&(9)\\ \widehat{\mathbf{T}}_8^{-1}&=\frac{1}{2}\,\mathbf{B}_8\,\widehat{\mathbf{M}}_4^{-1}\,\widehat{\mathbf{M}}_3^{-1}\,\widehat{\mathbf{M}}_2^{-1}\,\mathbf{M}_1\,\mathbf{P}_8^\top,&(10)\end{aligned}$$

where

$$\mathbf{P}_8^{-1}=\mathbf{P}_8^\top,\qquad \mathbf{M}_1^{-1}=\mathbf{M}_1,$$

$$\widetilde{\mathbf{M}}_2^{-1}=\frac{1}{2}\left[\begin{smallmatrix}2&0&0&0&0&0&0&0\\ 0&0&2&0&0&0&0&0\\ 0&0&0&2&0&0&0&0\\ 0&2&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&1\\ 0&0&0&0&0&1&1&0\\ 0&0&0&0&0&1&-1&0\\ 0&0&0&0&1&0&0&-1\end{smallmatrix}\right],\qquad \widehat{\mathbf{M}}_2^{-1}=\frac{1}{2}\left[\begin{smallmatrix}2&0&0&0&0&0&0&0\\ 0&0&2&0&0&0&0&0\\ 0&0&0&2&0&0&0&0\\ 0&2&0&0&0&0&0&0\\ 0&0&0&0&2&0&0&0\\ 0&0&0&0&0&1&1&0\\ 0&0&0&0&0&1&-1&0\\ 0&0&0&0&0&0&0&-2\end{smallmatrix}\right],$$

$$\widetilde{\mathbf{M}}_3^{-1}=\frac{1}{2}\,\widetilde{\mathbf{M}}_3,\qquad \widehat{\mathbf{M}}_3^{-1}=\frac{1}{2}\left[\begin{smallmatrix}1&1&0&0&0&0&0&0\\ 1&-1&0&0&0&0&0&0\\ 0&0&-2&0&0&0&0&0\\ 0&0&0&2&0&0&0&0\\ 0&0&0&0&1&1&0&0\\ 0&0&0&0&1&-1&0&0\\ 0&0&0&0&0&0&-1&1\\ 0&0&0&0&0&0&1&1\end{smallmatrix}\right],\qquad \widetilde{\mathbf{M}}_4^{-1}=\frac{1}{2}\left[\begin{smallmatrix}1&0&0&1&0&0&0&0\\ 0&1&1&0&0&0&0&0\\ 0&1&-1&0&0&0&0&0\\ 1&0&0&-1&0&0&0&0\\ 0&0&0&0&0&0&0&2\\ 0&0&0&0&0&1&-1&0\\ 0&0&0&0&0&1&1&0\\ 0&0&0&0&2&0&0&0\end{smallmatrix}\right]=\widehat{\mathbf{M}}_4^{-1},$$

and

$$\mathbf{B}_8^{-1}=\frac{1}{2}\,\mathbf{B}_8.$$
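The $\frac{1}{2}$-scaled inverses above follow from the fact that the butterfly blocks square to $2\,\mathbf{I}$; a quick numerical confirmation for $\mathbf{B}_8$ and $\widetilde{\mathbf{M}}_3$ (a sketch; names follow the text):

```python
import numpy as np

I4 = np.eye(4)
J4 = np.fliplr(np.eye(4))
B8 = np.block([[I4, J4], [J4, -I4]])

# M3~ is block diagonal with 2x2 butterflies; each block squares to 2 I2,
# hence M3~ @ M3~ = 2 I8 and M3~^(-1) = (1/2) M3~, and likewise for B8
M3t = np.zeros((8, 8))
for k, blk in enumerate([[[1, 1], [1, -1]], [[-1, 1], [1, 1]],
                         [[1, 1], [1, -1]], [[-1, 1], [1, 1]]]):
    M3t[2 * k:2 * k + 2, 2 * k:2 * k + 2] = blk
```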

Figure 1 depicts the signal flow graphs (SFGs) of the proposed fast algorithm for the transform $\widehat{\mathbf{T}}_8$ and its inverse $\widehat{\mathbf{T}}_8^{-1}$; Figures 1(a) and 1(b) correspond to (8) and (10), respectively. The SFGs for $\widetilde{\mathbf{T}}_8$ and $\widetilde{\mathbf{T}}_8^{-1}$ are similar and are omitted for brevity.
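The forward SFG amounts to a sequence of additions and subtractions only. A Python sketch of the fast algorithm for $\widehat{\mathbf{T}}_8$ (the stage grouping and function name are ours), which can be checked against the explicit matrix given above:

```python
# Explicit matrix of T^_8, used as ground truth
T8h = [
    [1,  1,  1,  1,  1,  1,  1,  1],
    [1,  1,  1,  0,  0, -1, -1, -1],
    [1,  0,  0, -1, -1,  0,  0,  1],
    [1,  0, -2, -1,  1,  2,  0, -1],
    [1, -1, -1,  1,  1, -1, -1,  1],
    [1, -2,  0,  1, -1,  0,  2, -1],
    [0, -1,  1,  0,  0,  1, -1,  0],
    [0, -1,  1, -1,  1, -1,  1,  0],
]

def That8_fast(x):
    """Forward fast algorithm for T^_8: additions/subtractions only."""
    x0, x1, x2, x3, x4, x5, x6, x7 = x
    # Stage B8: input butterflies (sums on top, differences on bottom)
    y0, y1, y2, y3 = x0 + x7, x1 + x6, x2 + x5, x3 + x4
    y4, y5, y6, y7 = x3 - x4, x2 - x5, x1 - x6, x0 - x7
    # Stage M^_4: B4 butterflies on top, A3 with alpha^ = 1 on bottom
    z0, z1, z2, z3 = y0 + y3, y1 + y2, y1 - y2, y0 - y3
    z4, z5, z6, z7 = y7, y5 + y6, y6 - y5, y4
    # Stage M^_3: C~(1, [1, 0]) on top, A2 on bottom
    w0, w1, w2, w3 = z0 + z1, z0 - z1, -z2, z3
    w4, w5, w6, w7 = z4 + z5, z4 - z5, z7 - z6, z6 + z7
    # Stage M^_2: permutation P4 on top, A1 with beta^ = [1, 1, 1, 0] on bottom
    u0, u1, u2, u3 = w0, w3, w1, w2
    v0, v1, v2, v3 = w4, w5 + w6, w5 - w6, -w7
    # Stages M1 and P8: output reordering only
    return [u0, v0, u1, v2, u2, v1, u3, v3]
```

Feeding the standard basis vectors through That8_fast reproduces the columns of $\widehat{\mathbf{T}}_8$ exactly, using integer additions and sign changes only.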

[Figure 1 omitted: panel (a) shows the SFG of $\widehat{\mathbf{T}}_8$, mapping inputs $x_0,\ldots,x_7$ to outputs $X_0,\ldots,X_7$; panel (b) shows the SFG of $\widehat{\mathbf{T}}_8^{-1}$, which additionally employs multiplications by $1/2$ and $\pm 2$.]

Figure 1: Signal flow graphs for $\widehat{\mathbf{T}}_8$ and $\widehat{\mathbf{T}}_8^{-1}$ (dotted lines indicate multiplication by $-1$).

3.1 Orthogonality and near orthogonality

The proposed transforms are nonorthogonal approximations for the DCT. This is also the case for the well-known SDCT [20] and the BAS approximation described in [7]. Indeed, for image and video processing, orthogonality is not a strict requirement; near orthogonality is sufficient to achieve very good energy compaction properties.

To quantify how close a given matrix is to an orthogonal matrix, we adopt the deviation from diagonality measure [16], described as follows. Let $\mathbf{M}$ be a square matrix. The deviation from diagonality of $\mathbf{M}$ is given by:

$$\delta(\mathbf{M})=1-\frac{\|\operatorname{diag}(\mathbf{M})\|_{\text{F}}^2}{\|\mathbf{M}\|_{\text{F}}^2},$$

where $\|\cdot\|_{\text{F}}$ denotes the Frobenius norm [51, p. 115]. For diagonal matrices, $\delta(\cdot)$ returns zero. Therefore, for a full-rank low-complexity transformation matrix $\mathbf{T}$, we can measure its closeness to orthogonality by computing $\delta(\mathbf{T}\,\mathbf{T}^\top)$.

Both the 8-point SDCT [20] and the BAS approximation proposed in [7] are nonorthogonal yet good DCT approximations; their deviations from diagonality are 0.2000 and 0.1774, respectively. Comparatively, the matrices $\widetilde{\mathbf{T}}_8$ and $\widehat{\mathbf{T}}_8$ have deviations from diagonality equal to 0.0714 and 0.0579, respectively. Because these values are smaller than those of the SDCT and BAS approximations, the proposed transformations are more nearly orthogonal.
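The deviation-from-diagonality figures for the proposed transforms can be reproduced directly from the definition of $\delta(\cdot)$; a sketch (the function name is ours):

```python
import numpy as np

def dev_from_diag(M):
    # delta(M) = 1 - ||diag(M)||_F^2 / ||M||_F^2
    return 1.0 - np.sum(np.diag(M) ** 2) / np.sum(M ** 2)

T8t = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
                [1, 2, 0, 1, -1, 0, -2, -1],
                [1, 1, -1, -1, -1, -1, 1, 1],
                [1, 0, -2, -1, 1, 2, 0, -1],
                [1, -1, -1, 1, 1, -1, -1, 1],
                [1, -2, 0, 1, -1, 0, 2, -1],
                [1, -1, 1, -1, -1, 1, -1, 1],
                [1, 0, 2, -1, 1, -2, 0, -1]], dtype=float)
T8h = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
                [1, 1, 1, 0, 0, -1, -1, -1],
                [1, 0, 0, -1, -1, 0, 0, 1],
                [1, 0, -2, -1, 1, 2, 0, -1],
                [1, -1, -1, 1, 1, -1, -1, 1],
                [1, -2, 0, 1, -1, 0, 2, -1],
                [0, -1, 1, 0, 0, 1, -1, 0],
                [0, -1, 1, -1, 1, -1, 1, 0]], dtype=float)
```

Evaluating dev_from_diag on $\mathbf{T}\,\mathbf{T}^\top$ for the two matrices yields approximately 0.0714 and 0.0580, matching the values quoted above.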

In [45], the problem of deriving DCT approximations based on nonorthogonal matrices was addressed. The approach is a variation of the polar decomposition method described in [21]. Let $\mathbf{T}$ be a full-rank low-complexity transformation matrix. If $\mathbf{T}$ satisfies the condition:

$$\mathbf{T}\,\mathbf{T}^\top=\mathbf{D},\qquad(11)$$

where $\mathbf{D}$ is a diagonal matrix, then it is possible to derive an orthonormal approximation $\hat{\mathbf{C}}$ linked to $\mathbf{T}$. This is accomplished by means of the polar decomposition [21]:

$$\hat{\mathbf{C}}=\mathbf{S}\,\mathbf{T},$$

where $\mathbf{S}=\sqrt{\left(\mathbf{T}\,\mathbf{T}^\top\right)^{-1}}$ is a positive definite matrix and $\sqrt{\cdot}$ denotes the matrix square root operation [22].

From the computational point of view, it is desirable that $\mathbf{S}$ be a diagonal matrix [13]. In this case, the computational complexity of $\hat{\mathbf{C}}$ is the same as that of $\mathbf{T}$, except for the scaling factors in the diagonal matrix $\mathbf{S}$. Moreover, in several applications, such scaling factors can be embedded into a related computation step. For example, in JPEG-like compression, the quantization step can absorb the diagonal elements [3, 9, 31, 14].

Since the transformations \widetilde{\mathbf{T}}_{8} and \widehat{\mathbf{T}}_{8} do not satisfy (11), one may consider approximating \mathbf{S} itself by replacing the off-diagonal elements of \mathbf{T}\,\mathbf{T}^{\top} with zeros, at the expense of not obtaining a precisely orthogonal approximation \hat{\mathbf{C}}. Therefore, we consider the following approximate matrix for \mathbf{S}:

\displaystyle\mathbf{S}^{\prime}=\sqrt{\left[\operatorname{diag}(\mathbf{T}\,\mathbf{T}^{\top})\right]^{-1}},

where \operatorname{diag}(\cdot) returns the diagonal matrix formed from the main diagonal of its matrix argument. Thus, the near-orthogonal approximations for the DCT-II associated with the proposed transforms are given by:

\displaystyle\widetilde{\mathbf{C}}_{8}=\widetilde{\mathbf{S}}_{8}\,\widetilde{\mathbf{T}}_{8},
\displaystyle\widehat{\mathbf{C}}_{8}=\widehat{\mathbf{S}}_{8}\,\widehat{\mathbf{T}}_{8},

where

\displaystyle\widetilde{\mathbf{S}}_{8}=\operatorname{diag}\left(\frac{1}{\sqrt{8}}\,,\,\frac{1}{\sqrt{12}}\,,\,\frac{1}{\sqrt{8}}\,,\,\frac{1}{\sqrt{12}}\,,\,\frac{1}{\sqrt{8}}\,,\,\frac{1}{\sqrt{12}}\,,\,\frac{1}{\sqrt{8}}\,,\,\frac{1}{\sqrt{12}}\right),
\displaystyle\widehat{\mathbf{S}}_{8}=\operatorname{diag}\left(\frac{1}{\sqrt{8}}\,,\,\frac{1}{\sqrt{6}}\,,\,\frac{1}{2}\,,\,\frac{1}{\sqrt{12}}\,,\,\frac{1}{\sqrt{8}}\,,\,\frac{1}{\sqrt{12}}\,,\,\frac{1}{2}\,,\,\frac{1}{\sqrt{6}}\right).

Notice that \widetilde{\mathbf{S}}_{8} and \widehat{\mathbf{S}}_{8} derive from (3.1).
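The diagonal orthogonalization described above can be sketched directly in numpy. The 2x2 matrix below is a hypothetical stand-in for the proposed 8-point matrices, used only to keep the illustration short:

```python
import numpy as np

def near_orthogonal_approximation(T):
    """Given a full-rank low-complexity matrix T, return (S_prime, C_hat)
    with S_prime = sqrt(inv(diag(T T^T))) and C_hat = S_prime @ T."""
    d = np.diag(T @ T.T)               # row energies of T
    S_prime = np.diag(1.0 / np.sqrt(d))
    return S_prime, S_prime @ T

# Hypothetical low-complexity matrix (for illustration only; the proposed
# T_8 matrices have entries restricted to {0, +-1/2, +-1, +-2}).
T = np.array([[1.0, 1.0], [1.0, -1.0]])
S_prime, C_hat = near_orthogonal_approximation(T)
print(C_hat @ C_hat.T)  # identity here, since T happens to have orthogonal rows
```

When the rows of T are only approximately orthogonal, C_hat is correspondingly only near orthogonal, which is exactly the trade-off discussed above.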

4 Performance assessment and computational cost

To measure the proximity of the new multiplierless transforms to the exact DCT, we adopted the total error energy [14] as a figure of merit. We also considered the coding gain relative to the KLT [18] as the measure for coding performance evaluation. For comparison, we selected the classical approximations SDCT [20] and BAS [7], which are nonorthogonal, as well as the HT [42] and the WHT [23], both orthogonal.

4.1 Performance measures

4.1.1 Total error energy

The total error energy \epsilon_{\text{total}} is a measure of spectral similarity between the exact DCT and a considered DCT approximation [14]. Although originally defined over the spectral domain [20], the total error energy for a given DCT approximation matrix \hat{\mathbf{C}} can be written as:

\displaystyle\epsilon_{\text{total}}=\pi\,\|\mathbf{C}-\hat{\mathbf{C}}\|_{\text{F}}^{2},

where \mathbf{C} is the exact DCT matrix and \|\cdot\|_{\text{F}} denotes the Frobenius norm [51]. The total error energy measurements for all discussed approximations are listed in Table 1.

Table 1: Total error energy of the considered transforms
Transform  \widehat{\mathbf{C}}_{8}  \widetilde{\mathbf{C}}_{8}  SDCT  BAS  WHT  HT
\epsilon_{\text{total}}  1.79  3.64  3.32  4.12  5.05  47.61

The proposed approximation \widehat{\mathbf{C}}_{8} presents the lowest total error energy among all considered transforms, while requiring only 22 additions. The BAS transform, which possesses the smallest arithmetic cost among the considered methods, presents a considerably higher total error energy. Thus, \widehat{\mathbf{C}}_{8} and the SDCT outperform the BAS transform. Comparatively, the HT and WHT are less suitable approximations for the DCT.
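A minimal numpy sketch of the total error energy computation follows. The orthogonalized SDCT used as an example assumes \mathbf{S}^{\prime} = \mathbf{I}/\sqrt{8}, since every row of \operatorname{sgn}(\mathbf{C}) has energy 8:

```python
import numpy as np

def total_error_energy(C_exact, C_approx):
    """epsilon_total = pi * ||C - C_hat||_F^2."""
    return np.pi * np.linalg.norm(C_exact - C_approx, 'fro') ** 2

# Orthonormal 8-point DCT-II matrix
N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
C[0, :] /= np.sqrt(2.0)

# Orthogonalized SDCT: sgn(C) scaled by 1/sqrt(8)
C_sdct = np.sign(C) / np.sqrt(8.0)

print(total_error_energy(C, C))        # 0 for the exact DCT itself
print(total_error_energy(C, C_sdct))   # cf. Table 1
```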

4.1.2 Coding gain relative to the KLT

For coding performance evaluation, images are assumed to be modeled after a first-order Markov process with correlation coefficient \rho, where 0\leq\rho<1 [32, 10, 18]. Natural images satisfy this assumption [32]. Then, the (m,n)-th element of the covariance matrix \mathbf{R}_{\mathbf{x}} of the input signal \mathbf{x} is given by r^{(\mathbf{x})}_{m,n}=\rho^{|m-n|} [10].

Let \mathbf{h}_{k} and \mathbf{g}_{k} be the k-th rows of \hat{\mathbf{C}} and \hat{\mathbf{C}}^{-1}, respectively. Thus, the unified coding gain of \hat{\mathbf{C}} is given by:

\displaystyle C_{g}(\hat{\mathbf{C}})=10\,\log_{10}\left[\prod_{k=1}^{N}\frac{1}{(A_{k}\,B_{k})^{1/N}}\right]\quad(\text{in dB}),

where A_{k}=\operatorname{su}[(\mathbf{h}_{k}^{\top}\,\mathbf{h}_{k})\circ\mathbf{R}_{\mathbf{x}}], \operatorname{su}(\cdot) returns the sum of the elements of its matrix argument [35], the operator \circ is the element-wise matrix product [41], B_{k}=\|\mathbf{g}_{k}\|_{2}^{2}, and \|\cdot\|_{2} is the Euclidean norm. For orthogonal transforms, the unified coding gain collapses into the usual coding gain as defined in [10, 18].

High coding gain values indicate better energy compaction capability into the transform domain [32]. In this sense, the KLT is optimal [10, 15]. Thus, an appropriate measure for evaluating the coding gain is given by [18]:

\displaystyle C_{g}(\hat{\mathbf{C}})-C_{g}(\text{KLT}),

where C_{g}(\text{KLT}) denotes the coding gain corresponding to the KLT. For example, for N=8 and \rho=0.95, the coding gains linked to the KLT and DCT are 8.8462 dB and 8.8259 dB, respectively [10]. Hence, the DCT coding gain relative to the KLT is C_{g}(\mathbf{C})-C_{g}(\text{KLT})=-0.0203 dB.
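The unified coding gain computation can be sketched as follows; applying it to the exact 8-point DCT with \rho = 0.95 recovers the 8.8259 dB figure quoted in the text:

```python
import numpy as np

def unified_coding_gain(C_hat, rho=0.95):
    """Unified coding gain (dB) under a first-order Markov model."""
    N = C_hat.shape[0]
    idx = np.arange(N)
    Rx = rho ** np.abs(np.subtract.outer(idx, idx))  # covariance matrix
    Ginv = np.linalg.inv(C_hat)
    log_sum = 0.0
    for k in range(N):
        h = C_hat[k, :]                    # k-th row of C_hat
        g = Ginv[k, :]                     # k-th row of its inverse
        A_k = np.sum(np.outer(h, h) * Rx)  # su[(h_k^T h_k) o R_x]
        B_k = np.sum(g ** 2)               # ||g_k||_2^2
        log_sum += np.log10(A_k * B_k)
    return -10.0 / N * log_sum

# 8-point DCT-II
N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
C[0, :] /= np.sqrt(2.0)
print(round(unified_coding_gain(C), 4))  # ~8.8259 dB
```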

Refer to caption
Figure 2: Coding gain relative to the KLT for the considered transforms.

Figure 2 compares the coding gain relative to the KLT for the considered transforms in the range 0\leq\rho<1. As expected, the DCT has the smallest difference with respect to the KLT, followed by the HT and WHT (both with the same values). In general, orthogonal transforms tend to show better coding gain performance than nonorthogonal ones. The proposed transforms \widehat{\mathbf{C}}_{8} and \widetilde{\mathbf{C}}_{8} outperform the SDCT and BAS, both nonorthogonal transformations. As \rho\to 1, the approximation \widehat{\mathbf{C}}_{8} performs as well as the HT and WHT. This scenario is realistic for image compression, because natural images exhibit high inter-pixel correlation [17].

4.2 Computational cost

The low-complexity matrices associated with the proposed approximations and their inverses possess multiplierless matrix factorizations, as shown in (7), (8), (9), and (10). Therefore, the only truly multiplicative elements are those found in the diagonal matrices \widetilde{\mathbf{S}}_{8} and \widehat{\mathbf{S}}_{8}. However, in the context of image and video coding, such diagonal matrices can be easily absorbed into the quantization step [3, 9, 31, 14]. As a consequence, in that context, the introduced approximations can be understood as fully multiplierless operations.

Table 2 presents the arithmetic complexity of the considered transforms. The computational costs of the exact DCT according to Chen's factorization (cf. (3)) [12] and of the integer DCT for HEVC [34] are also included for comparison. The proposed approximation \widehat{\mathbf{C}}_{8} requires only 22 additions. On the other hand, the computational cost of \widetilde{\mathbf{C}}_{8} is comparatively larger.

Table 2: Arithmetic complexity of the considered 8-point transforms
Transform Mult Add Shift Total
DCT [12]  16  26  0  42
HEVC [34]  0  50  30  80
\widehat{\mathbf{C}}_{8}  0  22  0  22
\widetilde{\mathbf{C}}_{8}  0  26  0  26
SDCT [20]  0  24  0  24
BAS [7]  0  21  0  21
WHT [23]  0  24  0  24
HT [42]  0  24  0  24

5 Low-complexity image compression

In this section, we describe a JPEG-like image compression computational experiment. Proposed transformations are evaluated in terms of their data compression capability.

5.1 JPEG-like compression

We considered a set of 30 standard 8-bit 512\times512 grayscale images obtained from [1]. The images were subdivided into 8\times8 blocks. Each block \mathbf{A} was submitted to the 2-D transformation given by [44]:

\displaystyle\hat{\mathbf{B}}=\hat{\mathbf{C}}\,\mathbf{A}\,\hat{\mathbf{C}}^{-1},

where \hat{\mathbf{C}} is a particular transformation. The 64 coefficients obtained in \hat{\mathbf{B}} were arranged according to the standard zig-zag sequence [46]. The first r coefficients were retained and the remaining coefficients were discarded. We adopted 1\leq r\leq 45. Then, for each block \hat{\mathbf{B}}, the 2-D inverse transform was applied to reconstruct the compressed image:

\displaystyle\hat{\mathbf{A}}=\hat{\mathbf{C}}^{-1}\,\hat{\mathbf{B}}\,\hat{\mathbf{C}}.

Finally, the resulting image was compared with the original image according to the peak signal-to-noise ratio (PSNR) [24] and the structural similarity index (SSIM) [50, 49]. The absolute percentage error (APE) of such metrics relative to the exact DCT was also considered. The PSNR is the most commonly used measure for image quality evaluation. However, the SSIM considers human visual system characteristics that are not captured by the PSNR [50]. Following the methodology proposed in [14], the average values of the measures over the 30 standard images were computed. This leads to statistically more robust results when compared with single-image analysis [30].
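The block-wise compression-and-reconstruction loop described above can be sketched as follows, shown here on a single synthetic block with the exact DCT; the zig-zag scan is the standard JPEG ordering:

```python
import numpy as np

def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in standard zig-zag scan order."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def compress_block(A, C_hat, r):
    """Transform a block (B = C A C^{-1}), keep the first r zig-zag
    coefficients, and reconstruct (A_rec = C^{-1} B C)."""
    Cinv = np.linalg.inv(C_hat)
    B = C_hat @ A @ Cinv
    mask = np.zeros_like(B)
    for (i, j) in zigzag_indices(A.shape[0])[:r]:
        mask[i, j] = 1.0
    return Cinv @ (B * mask) @ C_hat

def psnr(orig, rec, peak=255.0):
    mse = np.mean((orig - rec) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse) if mse > 0 else np.inf

# Exact 8-point DCT and a synthetic smooth (highly correlated) block
N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
C[0, :] /= np.sqrt(2.0)

A = np.add.outer(np.arange(8.0), np.arange(8.0)) * 16.0  # gradient block
rec = compress_block(A, C, r=6)
print(psnr(A, rec))  # high PSNR: smooth blocks compact well under the DCT
```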

5.2 Results

Refer to caption
(a) Average PSNR
Refer to caption
(b) Average PSNR absolute percentage error relative to the DCT.
Figure 3: PSNR measures for several compression ratios.
Refer to caption
(a) Average SSIM
Refer to caption
(b) Average SSIM absolute percentage error relative to the DCT
Figure 4: SSIM measures for several compression ratios.

Results of the still image compression experiment are presented in Figures 3 and 4. In Figure 4(b), the curve corresponding to the HT was suppressed because it presents excessively high values in comparison with the other curves. In terms of PSNR, \widehat{\mathbf{C}}_{8} outperforms both the SDCT and the BAS approximations, and provides results similar to those furnished by the WHT, but at a lower computational cost. In terms of SSIM, the transforms \widetilde{\mathbf{C}}_{8} and \widehat{\mathbf{C}}_{8} show similar results. Figures 3(a) and 3(b) show that \widehat{\mathbf{C}}_{8} is superior to the WHT at high compression ratios (r\leq 15).

A qualitative analysis is displayed in Figure 5. The standard Elaine image was compressed and reconstructed according to the SDCT, WHT, HT, BAS, and the proposed approximation \widehat{\mathbf{C}}_{8} for visual inspection. The compressed image resulting from the exact DCT is also exhibited. All images were compressed with r=6, which represents a removal of approximately 90.6\% of the transformed coefficients. The visual analysis of the obtained images shows the superiority of the proposed transform \widehat{\mathbf{C}}_{8} over the SDCT in image compression. Furthermore, Table 3 lists the PSNR and SSIM values for the Elaine image with r=6, along with values for two additional images (Lenna and Boat). The proposed transform \widehat{\mathbf{C}}_{8} outperforms the HT, WHT, and SDCT approximations in terms of PSNR and SSIM, and the BAS approximation in terms of PSNR for the Elaine and Lenna images.

Refer to caption
(a) DCT
Refer to caption
(b) SDCT
Refer to caption
(c) WHT
Refer to caption
(d) HT
Refer to caption
(e) BAS
Refer to caption
(f) \widehat{\mathbf{C}}_{8}
Figure 5: Elaine image (r=6): 5(a) DCT, 5(b) SDCT, 5(c) WHT, 5(d) HT, 5(e) BAS, and 5(f) \widehat{\mathbf{C}}_{8}.
Table 3: Quality measures for three compressed images, considering r=6r=6
Elaine image Lenna image Boat image
Transform PSNR SSIM PSNR SSIM PSNR SSIM
DCT [12]  31.03  0.95  29.90  0.95  26.94  0.92
\widehat{\mathbf{C}}_{8}  30.00  0.94  28.79  0.94  26.04  0.91
BAS [7]  29.45  0.94  28.28  0.95  26.13  0.92
WHT [23]  28.91  0.92  27.93  0.92  25.85  0.90
SDCT [20]  27.59  0.88  26.26  0.88  24.09  0.85
HT [42]  25.44  0.77  24.27  0.75  24.27  0.68

5.3 Blocking artifact analysis

A visually undesirable effect in image compression is the emergence of blocking artifacts [17, p. 573]. Figure 6 shows a qualitative comparison of the blocking artifacts resulting from \widehat{\mathbf{C}}_{8}, WHT, and BAS. The proposed approximation \widehat{\mathbf{C}}_{8} effected a lower degree of blocking artifacts compared with the WHT and BAS.

Refer to caption
(a) WHT
Refer to caption
(b) BAS
Refer to caption
(c) \widehat{\mathbf{C}}_{8}
Figure 6: Blocking artifact effect in the Elaine image (r=6): 6(a) WHT, 6(b) BAS, and 6(c) \widehat{\mathbf{C}}_{8}.

6 HEVC-compliant video encoding

In this section, we aim at demonstrating the practical, real-world applicability of the proposed DCT approximations for video coding. The HEVC standard requires not only an 8-point transform, but also 4-, 16-, and 32-point transforms. In order to derive Chen DCT approximations for larger blocklengths, we considered the algorithm proposed by Jridi-Alfalou-Meher (JAM) [29]. We embedded these low-complexity transforms into a publicly available reference software compliant with the HEVC standard [28].

The JAM algorithm consists of a scalable method for obtaining higher-order transforms from an 8-point DCT approximation. An N-point DCT approximation \check{\mathbf{T}}_{N}, where N is a power of two, is recursively obtained through:

\displaystyle\check{\mathbf{T}}_{N}=\frac{1}{\sqrt{2}}\mathbf{M}^{\text{per}}_{N}\begin{bmatrix}\check{\mathbf{T}}_{\frac{N}{2}}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\check{\mathbf{T}}_{\frac{N}{2}}\end{bmatrix}\mathbf{M}^{\text{add}}_{N},

where

\displaystyle\mathbf{M}^{\text{add}}_{N}=\begin{bmatrix}\mathbf{I}_{\frac{N}{2}}&\mathbf{\bar{I}}_{\frac{N}{2}}\\ \mathbf{\bar{I}}_{\frac{N}{2}}&-\mathbf{I}_{\frac{N}{2}}\end{bmatrix}\quad\text{and}\quad\mathbf{M}^{\text{per}}_{N}=\begin{bmatrix}\mathbf{P}_{N-1,\frac{N}{2}}&\mathbf{0}_{1,\frac{N}{2}}\\ \mathbf{0}_{1,\frac{N}{2}}&\mathbf{P}_{N-1,\frac{N}{2}}\end{bmatrix},

and \mathbf{P}_{N-1,\frac{N}{2}} is an (N-1)\times(N/2) matrix resulting from expanding the identity matrix \mathbf{I}_{\frac{N}{2}} by interlacing it with zero rows. The matrix \mathbf{M}^{\text{per}}_{N} is a permutation matrix and does not introduce any arithmetic cost. Matrix \mathbf{M}^{\text{add}}_{N} contributes with only N additions. Furthermore, the scaling factor 1/\sqrt{2} can be merged into the quantization step of the compression scheme and does not contribute to the arithmetic complexity of the transform. Thus, the additive complexity of \check{\mathbf{T}}_{N} is equal to twice the additive complexity of \check{\mathbf{T}}_{\frac{N}{2}} plus N additions [29].
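One JAM scaling step can be sketched as below. The interleaving permutation \mathbf{M}^{\text{per}}_{N} is implemented under our reading of the definition (even output rows from the first half-transform, odd rows from the second). Starting from an orthogonal 8-point matrix, the resulting 16-point matrix remains orthogonal, since \mathbf{M}^{\text{add}}_{N}(\mathbf{M}^{\text{add}}_{N})^{\top}=2\mathbf{I}_{N}:

```python
import numpy as np

def jam_step(T_half):
    """One JAM scaling step: a 2n-point transform from an n-point one.
    The 1/sqrt(2) factor is kept explicit here; in coding it is absorbed
    by the quantization step."""
    n = T_half.shape[0]
    N = 2 * n
    I, Ibar = np.eye(n), np.fliplr(np.eye(n))
    M_add = np.block([[I, Ibar], [Ibar, -I]])           # butterfly stage
    core = np.block([[T_half, np.zeros((n, n))],
                     [np.zeros((n, n)), T_half]])       # two half-transforms
    # Interleaving permutation (our reading of M_per)
    M_per = np.zeros((N, N))
    for j in range(n):
        M_per[2 * j, j] = 1.0
        M_per[2 * j + 1, n + j] = 1.0
    return (1.0 / np.sqrt(2.0)) * M_per @ core @ M_add

# Starting from the orthogonal 8-point DCT, the 16-point result is orthogonal.
N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
C8 = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
C8[0, :] /= np.sqrt(2.0)
T16 = jam_step(C8)
print(np.allclose(T16 @ T16.T, np.eye(16)))  # True
```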

6.1 Chen’s DCT approximations for large blocklengths

In its original form, the JAM algorithm employs the 8-point DCT approximation introduced in [14]. In this section, we adapt the JAM algorithm to derive DCT approximations based on Chen's algorithm for arbitrary power-of-two blocklengths (N>8). We are especially interested in 16- and 32-point low-complexity transformations for subsequent embedding into the HEVC standard. Let N>8 be a power of two. We introduce the Chen signed and rounded transformations, respectively, according to the following recursion:

\displaystyle\widetilde{\mathbf{T}}_{N}=\mathbf{M}^{\text{per}}_{N}\begin{bmatrix}\widetilde{\mathbf{T}}_{\frac{N}{2}}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\widetilde{\mathbf{T}}_{\frac{N}{2}}\end{bmatrix}\mathbf{M}^{\text{add}}_{N}\quad\text{and}\quad\widehat{\mathbf{T}}_{N}=\mathbf{M}^{\text{per}}_{N}\begin{bmatrix}\widehat{\mathbf{T}}_{\frac{N}{2}}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\widehat{\mathbf{T}}_{\frac{N}{2}}\end{bmatrix}\mathbf{M}^{\text{add}}_{N}. (12)

Based on (7) and (8), \widetilde{\mathbf{T}}_{\frac{N}{2}} and \widehat{\mathbf{T}}_{\frac{N}{2}} admit the following factorizations:

\displaystyle\widetilde{\mathbf{T}}_{\frac{N}{2}}=\check{\mathbf{P}}_{\frac{N}{2}}\,\mathbf{M}_{\frac{N}{2}}^{(1)}\,\widetilde{\mathbf{M}}_{\frac{N}{2}}^{(2)}\,\widetilde{\mathbf{M}}_{\frac{N}{2}}^{(3)}\,\widetilde{\mathbf{M}}_{\frac{N}{2}}^{(4)}\,\check{\mathbf{B}}_{\frac{N}{2}},
\displaystyle\widehat{\mathbf{T}}_{\frac{N}{2}}=\check{\mathbf{P}}_{\frac{N}{2}}\,\mathbf{M}_{\frac{N}{2}}^{(1)}\,\widehat{\mathbf{M}}_{\frac{N}{2}}^{(2)}\,\widehat{\mathbf{M}}_{\frac{N}{2}}^{(3)}\,\widehat{\mathbf{M}}_{\frac{N}{2}}^{(4)}\,\check{\mathbf{B}}_{\frac{N}{2}}.

Thus, applying the factorizations above in (12) and expanding them, we obtain:

\displaystyle\widetilde{\mathbf{T}}_{N}=\check{\mathbf{P}}_{N}\,\mathbf{M}_{N}^{(1)}\,\widetilde{\mathbf{M}}_{N}^{(2)}\,\widetilde{\mathbf{M}}_{N}^{(3)}\,\widetilde{\mathbf{M}}_{N}^{(4)}\,\check{\mathbf{B}}_{N}, (13)
\displaystyle\widehat{\mathbf{T}}_{N}=\check{\mathbf{P}}_{N}\,\mathbf{M}_{N}^{(1)}\,\widehat{\mathbf{M}}_{N}^{(2)}\,\widehat{\mathbf{M}}_{N}^{(3)}\,\widehat{\mathbf{M}}_{N}^{(4)}\,\check{\mathbf{B}}_{N}, (14)

where

\displaystyle\check{\mathbf{P}}_{N}=\mathbf{M}^{\text{per}}_{N}\begin{bmatrix}\check{\mathbf{P}}_{\frac{N}{2}}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\check{\mathbf{P}}_{\frac{N}{2}}\end{bmatrix},\quad\check{\mathbf{B}}_{N}=\begin{bmatrix}\check{\mathbf{B}}_{\frac{N}{2}}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\check{\mathbf{B}}_{\frac{N}{2}}\end{bmatrix}\mathbf{M}^{\text{add}}_{N},\quad\mathbf{M}_{N}^{(1)}=\begin{bmatrix}\mathbf{M}_{\frac{N}{2}}^{(1)}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\mathbf{M}_{\frac{N}{2}}^{(1)}\end{bmatrix},
\displaystyle\widetilde{\mathbf{M}}_{N}^{(i)}=\begin{bmatrix}\widetilde{\mathbf{M}}_{\frac{N}{2}}^{(i)}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\widetilde{\mathbf{M}}_{\frac{N}{2}}^{(i)}\end{bmatrix},\quad\widehat{\mathbf{M}}_{N}^{(i)}=\begin{bmatrix}\widehat{\mathbf{M}}_{\frac{N}{2}}^{(i)}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\widehat{\mathbf{M}}_{\frac{N}{2}}^{(i)}\end{bmatrix},\quad i=2,3,4.

Their inverse transformations possess the following factorizations:

\displaystyle\widetilde{\mathbf{T}}_{N}^{-1}=\frac{4}{N}\,\check{\mathbf{B}}_{N}\,\bigl[\widetilde{\mathbf{M}}_{N}^{(4)}\bigr]^{-1}\bigl[\widetilde{\mathbf{M}}_{N}^{(3)}\bigr]^{-1}\bigl[\widetilde{\mathbf{M}}_{N}^{(2)}\bigr]^{-1}\mathbf{M}_{N}^{(1)}\,\check{\mathbf{P}}_{N}, (15)
\displaystyle\widehat{\mathbf{T}}_{N}^{-1}=\frac{4}{N}\,\check{\mathbf{B}}_{N}\,\bigl[\widehat{\mathbf{M}}_{N}^{(4)}\bigr]^{-1}\bigl[\widehat{\mathbf{M}}_{N}^{(3)}\bigr]^{-1}\bigl[\widehat{\mathbf{M}}_{N}^{(2)}\bigr]^{-1}\mathbf{M}_{N}^{(1)}\,\check{\mathbf{P}}_{N}, (16)

where

\displaystyle\check{\mathbf{P}}_{N}=\begin{bmatrix}\check{\mathbf{P}}_{\frac{N}{2}}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\check{\mathbf{P}}_{\frac{N}{2}}\end{bmatrix}\mathbf{M}^{\text{per}}_{N},\quad\check{\mathbf{B}}_{N}=\mathbf{M}^{\text{add}}_{N}\begin{bmatrix}\check{\mathbf{B}}_{\frac{N}{2}}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\check{\mathbf{B}}_{\frac{N}{2}}\end{bmatrix},\quad\mathbf{M}_{N}^{(1)}=\begin{bmatrix}\mathbf{M}_{\frac{N}{2}}^{(1)}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\mathbf{M}_{\frac{N}{2}}^{(1)}\end{bmatrix},
\displaystyle\bigl[\widetilde{\mathbf{M}}_{N}^{(i)}\bigr]^{-1}=\begin{bmatrix}\bigl[\widetilde{\mathbf{M}}_{\frac{N}{2}}^{(i)}\bigr]^{-1}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\bigl[\widetilde{\mathbf{M}}_{\frac{N}{2}}^{(i)}\bigr]^{-1}\end{bmatrix},\quad\bigl[\widehat{\mathbf{M}}_{N}^{(i)}\bigr]^{-1}=\begin{bmatrix}\bigl[\widehat{\mathbf{M}}_{\frac{N}{2}}^{(i)}\bigr]^{-1}&\mathbf{0}_{\frac{N}{2}}\\ \mathbf{0}_{\frac{N}{2}}&\bigl[\widehat{\mathbf{M}}_{\frac{N}{2}}^{(i)}\bigr]^{-1}\end{bmatrix},\quad i=2,3,4.

In particular, for N=16, we have from (7) and (8) that \check{\mathbf{P}}_{8}=\mathbf{P}_{8}, \check{\mathbf{B}}_{8}=\mathbf{B}_{8}, \mathbf{M}_{8}^{(1)}=\mathbf{M}_{1}, \widetilde{\mathbf{M}}_{8}^{(i)}=\widetilde{\mathbf{M}}_{i}, and \widehat{\mathbf{M}}_{8}^{(i)}=\widehat{\mathbf{M}}_{i}, for i=2,3,4; therefore it yields:

\displaystyle\widetilde{\mathbf{T}}_{16}=\check{\mathbf{P}}_{16}\,\mathbf{M}_{16}^{(1)}\,\widetilde{\mathbf{M}}_{16}^{(2)}\,\widetilde{\mathbf{M}}_{16}^{(3)}\,\widetilde{\mathbf{M}}_{16}^{(4)}\,\check{\mathbf{B}}_{16},
\displaystyle\widehat{\mathbf{T}}_{16}=\check{\mathbf{P}}_{16}\,\mathbf{M}_{16}^{(1)}\,\widehat{\mathbf{M}}_{16}^{(2)}\,\widehat{\mathbf{M}}_{16}^{(3)}\,\widehat{\mathbf{M}}_{16}^{(4)}\,\check{\mathbf{B}}_{16},

where

\displaystyle\check{\mathbf{P}}_{16}=\mathbf{M}^{\text{per}}_{16}\begin{bmatrix}\mathbf{P}_{8}&\mathbf{0}_{8}\\ \mathbf{0}_{8}&\mathbf{P}_{8}\end{bmatrix},\quad\check{\mathbf{B}}_{16}=\begin{bmatrix}\mathbf{B}_{8}&\mathbf{0}_{8}\\ \mathbf{0}_{8}&\mathbf{B}_{8}\end{bmatrix}\mathbf{M}^{\text{add}}_{16},\quad\mathbf{M}_{16}^{(1)}=\begin{bmatrix}\mathbf{M}_{1}&\mathbf{0}_{8}\\ \mathbf{0}_{8}&\mathbf{M}_{1}\end{bmatrix},
\displaystyle\widetilde{\mathbf{M}}_{16}^{(i)}=\begin{bmatrix}\widetilde{\mathbf{M}}_{i}&\mathbf{0}_{8}\\ \mathbf{0}_{8}&\widetilde{\mathbf{M}}_{i}\end{bmatrix},\quad\widehat{\mathbf{M}}_{16}^{(i)}=\begin{bmatrix}\widehat{\mathbf{M}}_{i}&\mathbf{0}_{8}\\ \mathbf{0}_{8}&\widehat{\mathbf{M}}_{i}\end{bmatrix},\quad i=2,3,4.

The near orthogonal DCT approximations linked to the proposed low-complexity matrices are given by (cf. Section 3.1):

\displaystyle\widetilde{\mathbf{C}}_{N}=\widetilde{\mathbf{S}}_{N}\,\widetilde{\mathbf{T}}_{N},\quad\widehat{\mathbf{C}}_{N}=\widehat{\mathbf{S}}_{N}\,\widehat{\mathbf{T}}_{N},

where \widetilde{\mathbf{S}}_{N}=\sqrt{[\operatorname{diag}(\widetilde{\mathbf{T}}_{N}\,\widetilde{\mathbf{T}}_{N}^{\top})]^{-1}} and \widehat{\mathbf{S}}_{N}=\sqrt{[\operatorname{diag}(\widehat{\mathbf{T}}_{N}\,\widehat{\mathbf{T}}_{N}^{\top})]^{-1}}.

Because the entries of \widetilde{\mathbf{T}}_{8} and \widehat{\mathbf{T}}_{8} and their inverses are in the set \mathcal{P}=\{0,\pm\frac{1}{2},\pm 1,\pm 2\}, the matrices in (13), (14), (15), and (16) also have entries in \mathcal{P}. Moreover, (13), (14), (15), and (16) can be recursively obtained from (7), (8), (9), and (10). Thus, \widetilde{\mathbf{C}}_{N} and \widehat{\mathbf{C}}_{N} are low-complexity DCT approximations for blocklength N. The arithmetic complexities of the proposed 16- and 32-point Chen approximations and of the transformations prescribed in the HEVC standard are presented in Table 4. In terms of hardware implementation, the circuitry corresponding to \widetilde{\mathbf{C}}_{8} and \widehat{\mathbf{C}}_{8} and their inverses can be reused for both the direct and inverse Chen DCT approximations at larger blocklengths.
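The additive-complexity recursion A(N) = 2A(N/2) + N can be checked against Table 4 directly:

```python
def jam_additions(base_adds, N, base=8):
    """Additive complexity of the JAM-scaled transform: A(N) = 2 A(N/2) + N,
    starting from an 8-point transform with base_adds additions."""
    adds, n = base_adds, base
    while n < N:
        n *= 2
        adds = 2 * adds + n
    return adds

print(jam_additions(22, 16), jam_additions(22, 32))  # 60 152 (from C_hat_8)
print(jam_additions(26, 16), jam_additions(26, 32))  # 68 168 (from C_tilde_8)
```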

Table 4: Arithmetic complexity of the considered 16- and 32-point transforms
Transform Mult Add Shift Total
16-point exact DCT [12]  44  74  0  118
16-point transform in HEVC [34]  0  186  86  272
\widehat{\mathbf{C}}_{16}  0  60  0  60
\widetilde{\mathbf{C}}_{16}  0  68  0  68
32-point exact DCT [12]  116  194  0  310
32-point transform in HEVC [34]  0  682  278  960
\widehat{\mathbf{C}}_{32}  0  152  0  152
\widetilde{\mathbf{C}}_{32}  0  168  0  168

6.2 Results

The proposed approximations were embedded into the HEVC reference software [28]. For the video coding experiments, we considered two sets of videos: (i) Group A, with eleven CIF videos from [52]; and (ii) Group B, with six standard video sequences, one for each class specified in the Common Test Conditions (CTC) document for HEVC [6]. Such classification is based on resolution, frame rate and, as a consequence, the main applications of each kind of media [36]. All test parameters were set according to the CTC document, including the quantization parameter (QP), which assumes values in {22, 27, 32, 37}. As suggested in [29], we selected the Main profile and the All-Intra mode for our experiments.

The PSNR measurements, already furnished by the reference software, were obtained for each video frame and YUV channel. The overall PSNR was obtained from each frame according to [37]. We averaged the PSNR values over the first 100 frames of all videos in each group. Figure 7 shows the average PSNR in terms of the QP for each set of 8-, 16-, and 32-point transforms: \widetilde{\mathbf{C}}_{N}, \widehat{\mathbf{C}}_{N}, and the original transforms in the HEVC standard. The results in Figure 7 show no significant degradation in terms of PSNR regardless of the video group. The proposed approximations resulted in essentially the same frame quality while having a very low computational cost when compared to the transforms originally employed in HEVC.

Additionally, we computed the Bjøntegaard delta PSNR (BD-PSNR) and delta rate (BD-Rate) [19, 4] for the compressed videos considering all discussed 8- to 32-point transformations. The first 11 rows of Table 5 present the results for Group A, whilst the remaining rows correspond to Group B. We report a negligible impact on video quality associated with the modified HEVC employing the approximate transforms. Similar to the still image experiments, \widehat{\mathbf{C}}_{N} performed better than \widetilde{\mathbf{C}}_{N}, with a degradation of no more than 0.70 dB and 0.58 dB for Groups A and B, respectively. These declines in PSNR represent increases of 10.63% and 7.02% in bitrate, respectively.
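For reference, the BD-PSNR figure of merit can be sketched as follows, following the usual Bjøntegaard procedure (cubic polynomial fit of PSNR versus log-rate, averaged over the overlapping rate interval); the rate-distortion points in the example are hypothetical:

```python
import numpy as np

def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard delta-PSNR: average vertical gap between two RD curves,
    each fitted with a cubic polynomial in log10(bitrate)."""
    lr1, lr2 = np.log10(rates_ref), np.log10(rates_test)
    p1 = np.polyfit(lr1, psnr_ref, 3)
    p2 = np.polyfit(lr2, psnr_test, 3)
    lo, hi = max(lr1.min(), lr2.min()), min(lr1.max(), lr2.max())
    int1 = np.polyval(np.polyint(p1), hi) - np.polyval(np.polyint(p1), lo)
    int2 = np.polyval(np.polyint(p2), hi) - np.polyval(np.polyint(p2), lo)
    return (int2 - int1) / (hi - lo)

# Toy illustration: a curve uniformly 1 dB above another has BD-PSNR = 1 dB.
rates = np.array([100.0, 200.0, 400.0, 800.0])   # hypothetical bitrates (kbps)
psnrs = np.array([30.0, 33.0, 36.0, 39.0])       # hypothetical PSNR values (dB)
print(bd_psnr(rates, psnrs, rates, psnrs + 1.0))  # ~1.0
```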

Refer to caption
Refer to caption
Figure 7: Average PSNR for QP in {22,27,32,37} for videos in Groups (a) A and (b) B.
Table 5: Bjøntegaard metrics for the approximate transforms and tested video sequences
Video information BD-PSNR (dB) BD-Rate (%)
\widetilde{\mathbf{C}}_{N}  \widehat{\mathbf{C}}_{N}  \widetilde{\mathbf{C}}_{N}  \widehat{\mathbf{C}}_{N}
Akiyo  0.4600  0.2990  -7.0310  -4.6870
Bowing  0.5301  0.4316  -7.4519  -6.1509
Coastguard  0.7596  0.7026  -11.3634  -10.6298
Container  0.4075  0.3750  -6.3002  -5.8044
Foreman  0.2263  0.1627  -4.4006  -3.2148
Hall_monitor  0.2754  0.1952  -4.7577  -3.4125
Mobile  0.2752  0.2629  -2.7072  -2.5860
Mother_daughter  0.4202  0.3384  -7.8362  -6.4112
News  0.2539  0.1975  -3.4211  -2.6772
Pamphlet  0.4253  0.3680  -5.8660  -5.1057
Silent  0.3029  0.2399  -5.7215  -4.6042
PeopleOnStreet  0.5350  0.4734  -9.6530  -8.6227
BasketballDrive  0.3372  0.2531  -11.7780  -9.0093
RaceHorses  0.6444  0.5781  -7.7823  -7.0233
BlowingBubbles  0.2563  0.1986  -4.3438  -3.4080
KristenAndSara  0.4651  0.3807  -8.8416  -7.3234
BasketballDrillText  0.1984  0.1565  -3.7436  -2.9711

As a qualitative example, Figure 8 displays particular frames of the Silent and PeopleOnStreet video sequences after compression according to the original HEVC and to the modified versions of HEVC based on the proposed transforms. Visual inspection shows no sign of image quality degradation.

Refer to caption
(a) HEVC
Refer to caption
(b) \widetilde{\mathbf{C}}_{N}
Refer to caption
(d) HEVC
Refer to caption
(e) \widetilde{\mathbf{C}}_{N}
Refer to caption
(f) \widehat{\mathbf{C}}_{N}
Figure 8: Qualitative comparison of frames from the Silent and PeopleOnStreet videos compressed with the proposed Chen DCT approximations and with the default HEVC transforms (QP = 32).

7 Conclusion

We introduced two new multiplierless DCT approximations based on Chen's factorization. The suggested approximations were assessed and compared with other well-known approximations. The proposed transformation \widehat{\mathbf{C}}_{8} presented low total error energy and very close similarity to the exact DCT. Furthermore, \widehat{\mathbf{C}}_{8} presents a coding gain very close to that of the optimal KLT. The approximation \widehat{\mathbf{C}}_{8} outperformed the SDCT, BAS, and HT as a tool for JPEG-like still image compression at a lower computational cost. Adapting the JAM scalable method, we also proposed low-complexity Chen DCT approximations \widetilde{\mathbf{C}}_{N} and \widehat{\mathbf{C}}_{N}, where N\geq 16 is a power of two, and provided fast algorithms for their implementation. The introduced 8-, 16-, and 32-point approximations were embedded into an HEVC reference software and assessed for video compression. The proposed low-complexity transforms are thus suitable for image and video coding, being a realistic alternative for efficient, low-complexity image/video coding.

Acknowledgments

This research was partially supported by CAPES, CNPq, and FAPERGS, Brazil.

References

  • [1] The USC-SIPI Image Database, University of Southern California, Signal and Image Processing Institute, 2011.
  • [2] N. Ahmed, T. Natarajan, and K. R. Rao, Discrete Cosine Transform, IEEE Trans. Comput., C-23 (1974), pp. 90–93.
  • [3] F. M. Bayer and R. J. Cintra, DCT-like transform for image compression requires 14 additions only, Electron. Lett., 48 (2012), pp. 919–921.
  • [4] G. Bjøntegaard, Calculation of average PSNR differences between RD-curves, in 13th VCEG Meeting, Austin, TX, USA, Apr 2001. Document VCEG-M33.
  • [5] R. E. Blahut, Fast Algorithms for Signal Processing, Cambridge University Press, 2010.
  • [6] F. Bossen, Common test conditions and software reference configurations, 2013. Document JCT-VC L1100.
  • [7] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, A multiplication-free transform for image compression, in 2nd International Conference on Signals, Circuits and Syst. (SCS), Nov. 2008, pp. 1–4.
  • [8] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, A novel transform for image compression, in IEEE 53rd International Midwest Symposium on Circuits Syst. (MWSCAS), Aug. 2010, pp. 509–512.
  • [9] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, A low-complexity parametric transform for image compression, in IEEE International Symposium on Circuits Syst. (ISCAS), May 2011, pp. 2145–2148.
  • [10] V. Britanak, P. C. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations, Elsevier Science, 2010.
  • [11] T. S. Chang, C. S. Kung, and C. W. Jen, A simple processor core design for DCT/IDCT, IEEE Trans. Circuits Syst. Video Technol., 10 (2000), pp. 439–447.
  • [12] W. H. Chen, C. Smith, and S. Fralick, A fast computational algorithm for the Discrete Cosine Transform, IEEE Trans. Commun., 25 (1977), pp. 1004–1009.
  • [13] R. J. Cintra, An integer approximation method for discrete sinusoidal transforms, Circuits, Syst., and Signal Process., 30 (2011), pp. 1481–1501.
  • [14] R. J. Cintra and F. M. Bayer, A DCT approximation for image compression, IEEE Signal Process. Lett., 18 (2011), pp. 579–582.
  • [15] M. Effros, H. Feng, and K. Zeger, Suboptimality of the Karhunen-Loève transform for transform coding, IEEE Trans. Inf. Theory, 50 (2004), pp. 1605–1619.
  • [16] B. N. Flury and W. Gautschi, An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 169–184.
  • [17] R. C. Gonzalez and R. E. Woods, Digital image processing, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 3rd ed., 2006.
  • [18] J. Han, Y. Xu, and D. Mukherjee, A butterfly structured design of the hybrid transform coding scheme, in Picture Coding Symposium, 2013, pp. 1–4.
  • [19] P. Hanhart and T. Ebrahimi, Calculation of average coding efficiency based on subjective quality scores, J. Vis. Commun. Image R., 25 (2014), pp. 555–564.
  • [20] T. I. Haweel, A new square wave transform based on the DCT, Signal Process., 81 (2001), pp. 2309–2319.
  • [21] N. J. Higham, Computing the polar decomposition with applications, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 1160–1174.
  • [22] N. J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl., 88–89 (1987), pp. 405–430.
  • [23] K. J. Horadam, Hadamard matrices and their applications, Cryptography Commun., 2 (2010), pp. 129–154.
  • [24] Q. Huynh-Thu and M. Ghanbari, Scope of validity of PSNR in image/video quality assessment, Electron. Lett., 44 (2008), pp. 800–801.
  • [25] International Organisation for Standardisation, Generic coding of moving pictures and associated audio information – Part 2: Video, ISO/IEC JTC1/SC29/WG11 - coding of moving pictures and audio, ISO, 1994.
  • [26] International Telecommunication Union, ITU-T recommendation H.261 version 1: Video codec for audiovisual services at $p \times 64$ kbit/s, tech. rep., ITU-T, 1990.
  • [27] International Telecommunication Union, ITU-T recommendation H.263 version 1: Video coding for low bit rate communication, tech. rep., ITU-T, 1995.
  • [28] Joint Collaborative Team on Video Coding (JCT-VC), HEVC reference software documentation, 2013. Fraunhofer Heinrich Hertz Institute.
  • [29] M. Jridi, A. Alfalou, and P. K. Meher, A generalized algorithm and reconfigurable architecture for efficient and scalable orthogonal approximation of DCT, IEEE Trans. Circuits Syst. I, Reg. Papers, 62 (2015), pp. 449–457.
  • [30] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory, vol. 1 of Prentice Hall Signal Processing Series, Prentice Hall, Upper Saddle River, NJ, 1993.
  • [31] K. Lengwehasatit and A. Ortega, Scalable variable complexity approximate forward DCT, IEEE Trans. Circuits Syst. Video Technol., 14 (2004), pp. 1236–1248.
  • [32] J. Liang and T. D. Tran, Fast multiplierless approximations of the DCT with the lifting scheme, IEEE Trans. Signal Process., 49 (2001), pp. 3032–3044.
  • [33] C. Loeffler, A. Ligtenberg, and G. S. Moschytz, Practical fast 1-D DCT algorithms with 11 multiplications, in International Conference on Acoust., Speech, Signal Process. (ICASSP), May 1989, pp. 988–991.
  • [34] P. K. Meher, S. Y. Park, B. K. Mohanty, K. S. Lim, and C. Yeo, Efficient Integer DCT Architectures for HEVC, IEEE Trans. Circuits Syst. Video Technol., 24 (2014), pp. 168–178.
  • [35] J. K. Merikoski, On the trace and the sum of elements of a matrix, Linear Algebra Appl., 60 (1984), pp. 177–185.
  • [36] M. Naccari and M. Mrak, Chapter 5 – Perceptually optimized video compression, in Academic Press Library in Signal Processing: Image and Video Compression and Multimedia, vol. 5, Elsevier, 2014, pp. 155–196.
  • [37] J.-R. Ohm, G. J. Sullivan, H. Schwarz, T. K. Tan, and T. Wiegand, Comparison of the coding efficiency of video coding standards - including high efficiency video coding (HEVC), IEEE Trans. Circuits Syst. Video Technol., 22 (2012), pp. 1669–1684.
  • [38] M. T. Pourazad, C. Doutre, M. Azimi, and P. Nasiopoulos, HEVC: The new gold standard for video compression: How does HEVC compare with H.264/AVC?, IEEE Consum. Electron. Mag., 1 (2012), pp. 36–46.
  • [39] A. Puri, X. Chen, and A. Luthra, Video coding using the H.264/MPEG-4 AVC compression standard, Signal Process.: Image Commun., 19 (2004), pp. 793–849.
  • [40] N. Roma and L. Sousa, Efficient hybrid DCT-domain algorithm for video spatial downscaling, EURASIP J. Adv. Signal Process, 2007 (2007), pp. 1–16.
  • [41] G. A. F. Seber, A Matrix Handbook for Statisticians, John Wiley & Sons, Inc, 2008.
  • [42] J. Seberry, B. J. Wysocki, and T. A. Wysocki, On some applications of Hadamard matrices, Metrika, 62 (2005), pp. 221–239.
  • [43] N. Suehiro and M. Hatori, Fast algorithms for the DFT and other sinusoidal transforms, IEEE Trans. Acoust., Speech, Signal Process., 34 (1986), pp. 642–644.
  • [44] T. Suzuki and M. Ikehara, Integer DCT based on direct-lifting of DCT-IDCT for lossless-to-lossy image coding, IEEE Trans. Image Process., 19 (2010), pp. 2958–2965.
  • [45] C. Tablada, F. Bayer, and R. Cintra, A class of DCT approximations based on the Feig–Winograd algorithm, Signal Process., 113 (2015), pp. 38–51.
  • [46] G. Wallace, The JPEG still picture compression standard, IEEE Trans. Consum. Electron., 38 (1992), pp. 18–34.
  • [47] Z. Wang, Reconsideration of: A fast computational algorithm for the Discrete Cosine Transform, IEEE Trans. Commun., 31 (1983), pp. 121–123.
  • [48] Z. Wang, Fast algorithms for the discrete W transform and for the Discrete Fourier Transform, IEEE Trans. Acoust., Speech, Signal Process., 32 (1984), pp. 803–816.
  • [49] Z. Wang and A. C. Bovik, Mean squared error: Love it or leave it? A new look at signal fidelity measures, IEEE Signal Process. Mag., 26 (2009), pp. 98–117.
  • [50] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process., 13 (2004), pp. 600–612.
  • [51] D. Watkins, Fundamentals of Matrix Computations, Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts, Wiley, 2004.
  • [52] Xiph.Org Foundation, Xiph.org video test media, 2014.
  • [53] W. Yuan, P. Hao, and C. Xu, Matrix factorization for fast DCT algorithms, in IEEE International Conference on Acoust., Speech, Signal Process. (ICASSP), vol. 3, May 2006, pp. 948–951.