
Amercing: An Intuitive, Elegant and Effective Constraint for Dynamic Time Warping

Matthieu Herrmann [email protected] Geoffrey I. Webb [email protected] Department of Data Science and Artificial Intelligence and Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
Abstract

Dynamic Time Warping (DTW), and its constrained (CDTW) and weighted (WDTW) variants, are time series distances with a wide range of applications. They minimize the cost of non-linear alignments between series. CDTW and WDTW were introduced because DTW is too permissive in its alignments. However, CDTW uses a crude step function, allowing unconstrained flexibility within the window and none beyond it. WDTW's multiplicative weight is relative to the distances between aligned points along a warped path, rather than being a direct function of the amount of warping that is introduced. In this paper, we introduce Amerced Dynamic Time Warping (ADTW), a new, intuitive DTW variant that penalizes the act of warping by a fixed additive cost. Like CDTW and WDTW, ADTW constrains the amount of warping. However, it avoids both abrupt discontinuities in the amount of warping allowed and the limitations of a multiplicative penalty. We formally introduce ADTW, prove some of its properties, and discuss its parameterization. We show on a simple example how it can be parameterized to achieve an intuitive outcome, and demonstrate its usefulness on a standard time series classification benchmark. We provide a demonstration application in C++ [1].

keywords:
Time Series, Dynamic Time Warping, Elastic Distance
journal: Pattern Recognition

1 Introduction

Dynamic Time Warping (DTW) is a distance measure for time series. First developed for speech recognition [2, 3], it has been adopted across a broad spectrum of applications including gesture recognition [4], signature verification [5], shape matching [6], road surface monitoring [7], neuroscience [8] and medical diagnosis [9].

DTW sums pairwise distances between aligned points in two series. To allow for similar events that unfold at different rates, DTW provides elasticity in which points are aligned. However, it is well understood that DTW can be too flexible in the alignments it allows. We illustrate this with respect to the three series shown in Figure 1.

(a) $S=\{1,1,-1,1,1,1\}$
(b) $T=\{1,1,1,-1,1,1\}$
(c) $U=\{1,1,1,1,-1,1\}$
Figure 1: Three time series $S$, $T$ and $U$. No current variant of DTW captures the intuition that $S$ is identical to itself, and not identical to either $T$ or $U$, and that $S$ is more similar to $T$ than it is to $U$.

A natural expectation for any time series distance function $\operatorname{dist}$ is that $\operatorname{dist}(S,S)=0<\operatorname{dist}(S,T)<\operatorname{dist}(S,U)$. However, $\operatorname{DTW}(S,S)=\operatorname{DTW}(S,T)=\operatorname{DTW}(S,U)=0$. While this may be appropriate for some tasks, where all that matters is that these sequences all contain five 1s and one $-1$, it is inappropriate for others.

Two variants have been developed to constrain DTW's flexibility: Constrained DTW (CDTW [2, 3], see Section 2.2) and Weighted DTW (WDTW [10], see Section 2.3). $\operatorname{CDTW}_{w}$ constrains the alignments from warping by more than a window, $w$. Within the window any warping is allowed, and none beyond it. Depending on $w$, there are three possibilities for how CDTW might assess our example:

1. For $w\geq 2$, $\operatorname{CDTW}_{w}(S,S)=\operatorname{CDTW}_{w}(S,T)=\operatorname{CDTW}_{w}(S,U)=0$.

2. $\operatorname{CDTW}_{1}(S,S)=\operatorname{CDTW}_{1}(S,T)=0<\operatorname{CDTW}_{1}(S,U)=8$ (see Figure 3).

3. $\operatorname{CDTW}_{0}(S,S)=0<\operatorname{CDTW}_{0}(S,T)=\operatorname{CDTW}_{0}(S,U)$.

WDTW applies a multiplicative penalty to alignments that increases the more the alignment is warped. However, because the penalty is multiplicative, the perfectly matched elements in our example still lead to $\operatorname{WDTW}(S,S)=\operatorname{WDTW}(S,T)=\operatorname{WDTW}(S,U)=0$ (see Figure 4).

None of DTW, WDTW or CDTW can be parameterized to obtain the natural expectation that $\operatorname{dist}(S,S)=0<\operatorname{dist}(S,T)<\operatorname{dist}(S,U)$.

In this paper, we present Amerced DTW (ADTW), an intuitive, elegant and effective variant of DTW that applies a tunable additive penalty $\omega$ for warping an alignment. To align the $-1$ in $S$ with the $-1$ in $T$, it is necessary to warp the alignments by 1 step, incurring a cost of $\omega$. A further compensatory warping step is required to bring the two end points back into alignment, incurring another cost of $\omega$. Thus, the total cost of aligning the two series is $\operatorname{ADTW}_{\omega}(S,T)=2\omega$. Aligning the $-1$ in $S$ with the $-1$ in $U$ requires warping by two steps, incurring a penalty of $2\omega$, with a further $2\omega$ penalty required to realign the end points, resulting in $\operatorname{ADTW}_{\omega}(S,U)=4\omega$. Thus, ADTW can be parameterized to be in line with natural expectations for our example, or to allow the flexibility to treat the three series as equivalent if that is appropriate for an application:

1. For $0<\omega<4$ (see Figure 5), $\operatorname{ADTW}_{\omega}(S,S)=0<\operatorname{ADTW}_{\omega}(S,T)=2\omega<\operatorname{ADTW}_{\omega}(S,U)=\min(4\omega,8)$. We have $\operatorname{ADTW}_{\omega}(S,U)=\min(4\omega,8)$ because the path will not be warped if the resulting cost of $8$ is cheaper than 4 warping penalties.

2. For $\omega\geq 4$, $\operatorname{ADTW}_{\omega}(S,S)=0<\operatorname{ADTW}_{\omega}(S,T)=\operatorname{ADTW}_{\omega}(S,U)=8$, because the penalty for warping is greater than the cost of not warping.

3. For $\omega=0$, $\operatorname{ADTW}_{0}(S,S)=\operatorname{ADTW}_{0}(S,T)=\operatorname{ADTW}_{0}(S,U)=0$, because there is no penalty for warping.

We show that this approach is not only intuitive, but also often provides superior outcomes in practice.

The remainder of this paper is organised as follows. In Section 2, we review the literature related to DTW and its variants. ADTW is presented in Section 3, and Section 4 discusses how to parameterize it. We then present the results of our experiments in Section 5, and conclude in Section 6.

2 Background and Related Work

DTW is a foundational technique for a wide range of time series data analysis tasks, including similarity search [11], regression [12], clustering [13], anomaly and outlier detection [14], motif discovery [15], forecasting [16], and subspace projection [17].

In this paper, for ease of exposition, we only consider univariate time series, although ADTW extends directly to the multivariate case. We denote series by the capital letters $S$, $T$ and $U$. The letter $\ell$ denotes the length of a series; subscripting (e.g. $\ell_{S}$) is used to disambiguate between the lengths of different series. The elements $S_{1},S_{2},\dots,S_{i},\dots,S_{\ell_{S}}$ with $1\leq i\leq\ell_{S}$ are the elements of the series $S=(S_{1},S_{2},\dots,S_{i},\dots,S_{\ell_{S}})$, drawn from a domain $\mathbb{D}$ (e.g. the real numbers).

We assess ADTW on time series classification tasks. Given a training set $\mathcal{D}=\langle\mathcal{D}_{1},\mathcal{D}_{2},\ldots,\mathcal{D}_{N}\rangle$ of labeled time series $\mathcal{D}_{i}$, time series classification learns a classifier which is then assessed with respect to classifying an evaluation set $\mathcal{T}=\langle\mathcal{T}_{1},\mathcal{T}_{2},\ldots,\mathcal{T}_{M}\rangle$ of labeled time series.

2.1 Dynamic Time Warping and Variants

Dynamic Time Warping (DTW) [2, 3] handles alignment of series with distortion and disparate lengths. An alignment $\mathcal{A}(S,T)=((i_{1},j_{1}),\ldots,(i_{\lambda},j_{\lambda}))$ between two series $S$ and $T$ is made of tuples $\mathcal{A}(S,T)_{k}=(i_{k},j_{k})$, $1\leq k\leq\lambda$, representing the point-to-point alignments of $S_{i_{k}}$ with $T_{j_{k}}$. A warping path is an alignment $\mathcal{A}(S,T)$ with the following properties:

1. Series extremities are aligned with each other, i.e. $\mathcal{A}(S,T)_{1}=(1,1)$ and $\mathcal{A}(S,T)_{\lambda}=(\ell_{S},\ell_{T})$.

2. It is continuous, i.e. $\forall k\in\{2,\ldots,\lambda\}\ (i_{k-1}\leq i_{k}\leq i_{k-1}+1)\wedge(j_{k-1}\leq j_{k}\leq j_{k-1}+1)$.

3. It is monotonic, i.e. $\forall k\in\{2,\ldots,\lambda\}\ \mathcal{A}_{k}\neq\mathcal{A}_{k-1}$.

Given a cost function $\gamma:\mathbb{D}\times\mathbb{D}\rightarrow\mathbb{R}$, DTW finds a warping path $\mathcal{A}(S,T)$ minimizing the cumulative sum of $\gamma(S_{i_{k}},T_{j_{k}})$:

$$\operatorname{DTW}(S,T)=\min_{\mathcal{A}(S,T)}\sum_{k=1}^{\lambda}\gamma(S_{i_{k}},T_{j_{k}})\qquad(1)$$

In this paper, following common practice in time series classification, we use the squared L2 norm as the cost function for all distances.

$\operatorname{DTW}(S,T)$ can be computed on a 0-indexed $(\ell_{S}+1)\times(\ell_{T}+1)$ cost matrix $M_{\operatorname{DTW}(S,T)}$. A cell $M_{\operatorname{DTW}(S,T)}(i,j)$ represents the minimal cumulative cost of aligning the first $i$ points of $S$ with the first $j$ points of $T$. It follows that

$$\operatorname{DTW}(S,T)=M_{\operatorname{DTW}(S,T)}(\ell_{S},\ell_{T})$$

The cost matrix $M_{\operatorname{DTW}(S,T)}$ is defined by a set of recursive equations (2). The first two equations, (2a) and (2b), define the border conditions. The third, (2c), computes the cost of a cell $(i,j)$ by adding the cost of aligning $S_{i}$ with $T_{j}$, given by $\gamma(S_{i},T_{j})$, to the cost of the smallest of its predecessors. Figure 2 shows examples of DTW cost matrices.

$$\begin{aligned}
M_{\operatorname{DTW}(S,T)}(0,0)&=0 &&\text{(2a)}\\
M_{\operatorname{DTW}(S,T)}(i,0)&=M_{\operatorname{DTW}(S,T)}(0,j)=+\infty &&\text{(2b)}\\
M_{\operatorname{DTW}(S,T)}(i,j)&=\gamma(S_{i},T_{j})+\min\begin{cases}M_{\operatorname{DTW}(S,T)}(i-1,j-1)\\ M_{\operatorname{DTW}(S,T)}(i-1,j)\\ M_{\operatorname{DTW}(S,T)}(i,j-1)\end{cases} &&\text{(2c)}
\end{aligned}$$

An efficient implementation technique [18] allows DTW and its variants, including ADTW, to be computed with an $O(\ell)$ space complexity, and a pruned time complexity; the worst case $O(\ell^{2})$ can usually be avoided.
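To make the recursion concrete, the following is a minimal C++ sketch of equations (2a)–(2c) with the squared L2 cost. It is the plain quadratic dynamic program; the function and variable names are ours for illustration, and it omits the pruning and $O(\ell)$-space optimizations of [18] used in our demonstration application [1].

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Plain DTW per equations (2a)-(2c), with gamma(a, b) = (a - b)^2.
double dtw(const std::vector<double>& S, const std::vector<double>& T) {
    const std::size_t lS = S.size(), lT = T.size();
    const double INF = std::numeric_limits<double>::infinity();
    // (lS+1) x (lT+1) cost matrix; borders set per (2a) and (2b).
    std::vector<std::vector<double>> M(lS + 1, std::vector<double>(lT + 1, INF));
    M[0][0] = 0.0;
    for (std::size_t i = 1; i <= lS; ++i) {
        for (std::size_t j = 1; j <= lT; ++j) {
            const double d = S[i - 1] - T[j - 1];  // series are 1-indexed in (2c)
            M[i][j] = d * d + std::min({M[i - 1][j - 1], M[i - 1][j], M[i][j - 1]});
        }
    }
    return M[lS][lT];  // DTW(S, T)
}
```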

(a) $\operatorname{DTW}(S,S)=0$
(b) $\operatorname{DTW}(S,T)=0$
(c) $\operatorname{DTW}(S,U)=0$
Figure 2: Cost matrices and warping paths for $\operatorname{DTW}(S,S)$, $\operatorname{DTW}(S,T)$, and $\operatorname{DTW}(S,U)$.

2.2 Constrained DTW

In Constrained DTW (CDTW), the warping path is restricted to a subarea of the cost matrix. Different constraints, with different subarea "shapes", exist [3, 19]. We focus on the popular Sakoe-Chiba band [3], also known as a warping window (or window for short). The window is a parameter $w$ controlling how far the warping path can deviate from the diagonal. Given a cost matrix $M_{\operatorname{CDTW}_{w}(S,T)}$, a row index $1\leq i\leq\ell$ and a column index $1\leq j\leq\ell$, we constrain $(i,j)$ by $|i-j|\leq w$. A window of 0 forces the warping path onto the diagonal, while a window of $\ell-1$ is equivalent to DTW (no constraint). For example, with $w=1$ the warping path can only step one cell away from each side of the diagonal (Figure 3).
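As a sketch (again with names of our own choosing, and without the optimizations of [18]), the band only changes the iteration bounds of the DTW recursion: cells with $|i-j|>w$ are never filled and remain at $+\infty$.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// CDTW with a Sakoe-Chiba band of width w: the DTW recursion restricted
// to cells with |i - j| <= w. Cells outside the band stay at +infinity.
double cdtw(const std::vector<double>& S, const std::vector<double>& T,
            std::size_t w) {
    const std::size_t lS = S.size(), lT = T.size();
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> M(lS + 1, std::vector<double>(lT + 1, INF));
    M[0][0] = 0.0;
    for (std::size_t i = 1; i <= lS; ++i) {
        const std::size_t jmin = (i > w) ? i - w : 1;
        const std::size_t jmax = std::min(lT, i + w);
        for (std::size_t j = jmin; j <= jmax; ++j) {
            const double d = S[i - 1] - T[j - 1];
            M[i][j] = d * d + std::min({M[i - 1][j - 1], M[i - 1][j], M[i][j - 1]});
        }
    }
    return M[lS][lT];  // +infinity when w < |lS - lT|: CDTW is then undefined
}
```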

(a) $\operatorname{CDTW}_{1}(S,S)=0$
(b) $\operatorname{CDTW}_{1}(S,T)=0$
(c) $\operatorname{CDTW}_{1}(S,U)=8$
Figure 3: Cost matrices and warping paths for $\operatorname{CDTW}_{1}(S,S)$, $\operatorname{CDTW}_{1}(S,T)$, and $\operatorname{CDTW}_{1}(S,U)$.

An effective window increases the usefulness of CDTW over DTW by preventing spurious alignments. For example, nearest neighbor classification using CDTW as the distance measure (NN-CDTW) is significantly more accurate than nearest neighbor classification using DTW (NN-DTW, see Section 5). CDTW is also faster to compute than DTW, as cells of the cost matrix beyond the window are ignored. However, the window is a parameter that must be learned. The Sakoe-Chiba band is part of the original definition of DTW and the de facto default constraint for CDTW (e.g. in [20, 21, 22]).

One of the issues that must be addressed in any application of CDTW is how to select an appropriate value for $w$. A method deployed when the UCR benchmark archive was first established has become the de facto default method for time series classification [23]. This method applies leave-one-out cross validation to nearest neighbor classification over the training data $\mathcal{D}$ for all $w\in\{0,0.01\ell,0.02\ell,\ldots,\ell\}$. The best value of $w$ is selected with respect to the performance measure of interest (typically the lowest error).

2.3 Weighted DTW

Weighted Dynamic Time Warping (WDTW) [10] penalizes phase difference between two points. The alignment cost of a cell $(i,j)$ is weighted according to its distance to the diagonal, $\delta=|i-j|$. In other words, WDTW relies on a weight function (3a) to define a new cost function $\gamma'$ (3b). A large weight decreases the chance that a cell lies on an optimal path. The weight factor parameter $g$ controls the penalization, and usually lies within $0.01$ to $0.6$ [10]. Figure 4 shows several WDTW cost matrices.

$$\begin{aligned}
\operatorname{weight}(\delta)&=\frac{1}{1+e^{-g(\delta-\ell/2)}} &&\text{(3a)}\\
\gamma'(S_{i},T_{j})&=\gamma(S_{i},T_{j})\cdot\operatorname{weight}(|i-j|) &&\text{(3b)}
\end{aligned}$$

WDTW applies a penalty to every pair of aligned points that are off the diagonal. Hence, the longer the off-diagonal path, the greater the penalty. Thus, it favors many small deviations from the diagonal over fewer longer deviations.
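A sketch of the weighted cost of equations (3a) and (3b), with names of our own choosing; this replaces $\gamma$ in the DTW recursion:

```cpp
#include <cmath>
#include <cstddef>

// WDTW weight (3a): a sigmoid in the distance delta = |i - j| from the
// diagonal, with weight factor g and series length len.
double wdtw_weight(double delta, double g, double len) {
    return 1.0 / (1.0 + std::exp(-g * (delta - len / 2.0)));
}

// Weighted cost gamma' (3b) for aligning S_i with T_j.
double wdtw_cost(double si, double tj, std::size_t i, std::size_t j,
                 double g, double len) {
    const double d = si - tj;
    const double delta = (i > j) ? double(i - j) : double(j - i);
    return d * d * wdtw_weight(delta, g, len);
}
```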

(a) $\operatorname{WDTW}_{0.1}(S,S)$
(b) $\operatorname{WDTW}_{0.1}(S,T)$
(c) $\operatorname{WDTW}_{0.1}(S,U)$
Figure 4: Cost matrices and warping paths for $\operatorname{WDTW}_{0.1}(S,S)$, $\operatorname{WDTW}_{0.1}(S,T)$, and $\operatorname{WDTW}_{0.1}(S,U)$.

WDTW is also confronted with the problem of selecting an appropriate value for its parameter, $g$. We use the method employed in the Elastic Ensemble [20], modeled on the de facto method for parameterizing CDTW. This method applies leave-one-out cross validation to nearest neighbor classification over the training data $\mathcal{D}$ for all $g\in\{0.01,0.02,\ldots,1.0\}$.

2.4 Squared Euclidean Distance

One simple distance measure sums the cost of aligning successive points in the two series. This is equivalent to CDTW with $w=0$, and is only defined for series of the same length. When the cost function for aligning two points $S_{i}$ and $T_{j}$ is $\gamma(S_{i},T_{j})=(S_{i}-T_{j})^{2}$, this distance measure is known as the Squared Euclidean Distance:

$$\operatorname{SQED}(S,T)=\sum_{i=1}^{\ell}(S_{i}-T_{i})^{2}\qquad(4)$$
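Equation (4) is a simple loop; a minimal sketch, assuming same-length series (name ours):

```cpp
#include <vector>

// Squared Euclidean distance (4): the cost of the w = 0 diagonal path.
double sqed(const std::vector<double>& S, const std::vector<double>& T) {
    double sum = 0.0;
    for (std::size_t i = 0; i < S.size(); ++i) {
        const double d = S[i] - T[i];
        sum += d * d;
    }
    return sum;
}
```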

3 Amerced Dynamic Time Warping

Amerced Dynamic Time Warping (ADTW) provides a new, intuitive, elegant, and effective mechanism for constraining the alignments within the DTW framework. It achieves this by the introduction of a novel constraint, amercing — the application of a simple additive penalty $\omega$ every time an alignment is warped ($i-j$ changes). This addresses the following problems:

1. DTW can be too flexible in its alignments;

2. CDTW uses a crude step function — any flexibility is allowed within the window and none beyond it;

3. WDTW applies a multiplicative weight, and hence promotes large degrees of warping if they lead to low cost alignments — the penalty incurred for warping is dependent on the costs of the warped alignments. Further, as the penalty is paid for every off-diagonal alignment (where $i\neq j$), WDTW penalizes off-diagonal paths for their length, rather than for the number of times they adjust the alignment¹.

¹ By adjusting the alignment we mean having successive alignments $\mathcal{A}_{k}$ and $\mathcal{A}_{k+1}$ such that $i_{k+1}-i_{k}\neq j_{k+1}-j_{k}$.

3.1 Formal definition of ADTW

Given two time series $S$ and $T$, a cost function $\gamma:\mathbb{D}\times\mathbb{D}\rightarrow\mathbb{R}$, and an amercing penalty $\omega\in\mathbb{R}$ (see Section 4), ADTW finds a warping path $\mathcal{A}(S,T)$ minimizing the cumulative sum of amerced costs:

$$\operatorname{ADTW}_{\omega}(S,T)=\min_{\mathcal{A}(S,T)}\left[\gamma(S_{i_{1}},T_{j_{1}})+\sum_{k=2}^{\lambda}\Big(\gamma(S_{i_{k}},T_{j_{k}})+\mathbb{1}(i_{k}-i_{k-1}\neq j_{k}-j_{k-1})\,\omega\Big)\right]\qquad(5)$$

where $\mathbb{1}(i_{k}-i_{k-1}\neq j_{k}-j_{k-1})\,\omega$ indicates that the amercing penalty $\omega$ is only applied if a step from $(i_{k-1},j_{k-1})$ to $(i_{k},j_{k})$ is not an increment on both series, i.e. it is not the case that $i_{k}=i_{k-1}+1$ and $j_{k}=j_{k-1}+1$. $\operatorname{ADTW}_{\omega}(S,T)$ can be computed on a cost matrix $M_{\operatorname{ADTW}_{\omega}(S,T)}$ (6) with

$$\operatorname{ADTW}_{\omega}(S,T)=M_{\operatorname{ADTW}_{\omega}(S,T)}(\ell_{S},\ell_{T}).$$

$M_{\operatorname{ADTW}}$ has the same border conditions as $M_{\operatorname{DTW}}$ (equations (6a) and (6b)). The other matrix cells are computed recursively, amercing the off-diagonal alignments by the penalty $\omega$ (6c). Methods for choosing $\omega$ are discussed in Section 4.

$$\begin{aligned}
M_{\operatorname{ADTW}_{\omega}(S,T)}(0,0)&=0 &&\text{(6a)}\\
M_{\operatorname{ADTW}_{\omega}(S,T)}(i,0)&=M_{\operatorname{ADTW}_{\omega}(S,T)}(0,j)=+\infty &&\text{(6b)}\\
M_{\operatorname{ADTW}_{\omega}(S,T)}(i,j)&=\min\begin{cases}M_{\operatorname{ADTW}_{\omega}(S,T)}(i-1,j-1)+\gamma(S_{i},T_{j})\\ M_{\operatorname{ADTW}_{\omega}(S,T)}(i-1,j)+\gamma(S_{i},T_{j})+\omega\\ M_{\operatorname{ADTW}_{\omega}(S,T)}(i,j-1)+\gamma(S_{i},T_{j})+\omega\end{cases} &&\text{(6c)}
\end{aligned}$$
(a) $\operatorname{ADTW}_{3}(S,S)$
(b) $\operatorname{ADTW}_{3}(S,T)$
(c) $\operatorname{ADTW}_{3}(S,U)$
Figure 5: Cost matrices and warping paths for $\operatorname{ADTW}_{3}(S,S)$, $\operatorname{ADTW}_{3}(S,T)$, and $\operatorname{ADTW}_{3}(S,U)$.

3.2 Properties of ADTW

$\operatorname{ADTW}_{\omega}(S,T)$ is monotonic with respect to $\omega$:

$$\alpha<\beta\implies\operatorname{ADTW}_{\alpha}(S,T)\leq\operatorname{ADTW}_{\beta}(S,T)\qquad(7)$$
Proof. For any warping path $\mathcal{A}(S,T)$ that minimizes (5) under $\omega=\beta$, the same path will result in an equivalent or lower score for $\omega=\alpha$. Hence the minimum score for a warping path under $\alpha$ cannot exceed the minimum score under $\beta$. ∎

$$\operatorname{ADTW}_{0}(S,T)=\operatorname{DTW}(S,T)\qquad(8)$$
Proof. With the amercing term $\omega$ set to zero, (5) is equivalent to (1). ∎

If $\ell_{S}=\ell_{T}$,

$$\operatorname{ADTW}_{\infty}(S,T)=\operatorname{SQED}(S,T)\qquad(9)$$
Proof. With $\omega=\infty$, any off-diagonal alignment will receive an infinite penalty. Hence the minimal cost warping path must follow the diagonal. ∎

When $0\leq\omega\leq\infty$,

$$\operatorname{DTW}(S,T)\leq\operatorname{ADTW}_{\omega}(S,T)\leq\operatorname{SQED}(S,T)\qquad(10)$$
Proof.

This follows from (7), (8) and (9). ∎

This observation provides useful and intuitive bounds for the measure.

ADTW is symmetric with respect to the order of its arguments:

$$\operatorname{ADTW}_{\omega}(S,T)=\operatorname{ADTW}_{\omega}(T,S)\qquad(11)$$
Proof. If we consider the cost matrix, such as illustrated in Figure 5, swapping $S$ for $T$ has the effect of flipping the matrix on the diagonal. This does not affect the cost of the minimal path. ∎

That a distance measure should satisfy this symmetry is intuitive, as it does not seem appropriate that the order of the arguments to a distance function should affect its value. This symmetry also holds for the other variants of DTW.

ADTW is symmetric with respect to the order in which the series are processed. With $\operatorname{reverse}(S)=(S_{\ell_{S}},S_{\ell_{S}-1},\ldots,S_{1})$, we have:

$$\operatorname{ADTW}_{\omega}(S,T)=\operatorname{ADTW}_{\omega}(\operatorname{reverse}(S),\operatorname{reverse}(T))\qquad(12)$$
Proof. For any warping path $\mathcal{A}(S,T)=((i_{1},j_{1}),\ldots,(i_{\lambda},j_{\lambda}))$, there is a matching $\mathcal{A}(\operatorname{reverse}(S),\operatorname{reverse}(T))=((i_{\lambda},j_{\lambda}),\ldots,(i_{1},j_{1}))$. As these warping paths both contain the same alignments, the cost terms $\gamma$ will be the same. As $\mathbb{1}(i_{k}-i_{k-1}\neq j_{k}-j_{k-1})\,\omega=\mathbb{1}(i_{k-1}-i_{k}\neq j_{k-1}-j_{k})\,\omega$, the amercing penalty terms will also be identical. ∎

This symmetry is also intuitive, as it does not seem appropriate that the direction from which one calculates a distance measure should affect its value. Neither CDTW nor WDTW has this symmetry when $\ell_{S}\neq\ell_{T}$. This is because both measures apply constraints relative to the diagonal, which is different depending on whether one traces it from the top left or the bottom right of the cost matrix. Unlike $\operatorname{CDTW}(S,T)$, $\operatorname{ADTW}(S,T)$ is well defined when $\ell_{S}\neq\ell_{T}$: if $w<|\ell_{S}-\ell_{T}|$ then $\operatorname{CDTW}_{w}(S,T)$ is undefined.

4 Parameterization

As shown in Section 3.2, $\operatorname{ADTW}_{\omega}(S,T)$ can be parameterized so as to range from being as flexible as DTW to being as constrained as SQED. If the situation requires a large amount of warping, a small penalty should be used. Conversely, a large penalty helps to avoid warping. Hence, $\omega$ must be tuned for the task at hand, ideally using expert knowledge. Without the latter, one has to fall back on some automated approach. In this paper, we evaluate ADTW on a classification benchmark (Section 5) due to its clearly defined objective evaluation measures. Our automated parameter selection method has been designed to work in this context. How best to parameterize ADTW in other scenarios remains an important topic for future investigation.

The penalty range $0\leq\omega\leq\infty$ is continuous, and hence cannot be exhaustively assessed unless the objective function is convex with respect to $\omega$, or there are other properties that can be exploited for efficient complete search. Instead, we use a method similar to the de facto standard for parameterizing CDTW and WDTW. To this end we define a range from small to large penalties. We select from this range the one that achieves the lowest error when used as the distance measure for nearest neighbor classification, as assessed through leave-one-out cross validation (LOOCV) on the training set.
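As a sketch of this selection loop (the names and the error-rate interface are ours), each candidate penalty is scored by the LOOCV error of the resulting nearest neighbor classifier, e.g. by passing a lambda that binds adtw to a fixed candidate $\omega$:

```cpp
#include <functional>
#include <limits>
#include <vector>

// LOOCV error of an NN1 classifier on the training set, for one distance
// (e.g. [&](const auto& a, const auto& b) { return adtw(a, b, omega); }).
// labels[i] is the class of train[i].
double loocv_error(const std::vector<std::vector<double>>& train,
                   const std::vector<int>& labels,
                   const std::function<double(const std::vector<double>&,
                                              const std::vector<double>&)>& dist) {
    std::size_t errors = 0;
    for (std::size_t q = 0; q < train.size(); ++q) {
        double best = std::numeric_limits<double>::infinity();
        int predicted = -1;
        for (std::size_t c = 0; c < train.size(); ++c) {
            if (c == q) continue;  // leave the query series out
            const double d = dist(train[q], train[c]);
            if (d < best) { best = d; predicted = labels[c]; }
        }
        if (predicted != labels[q]) ++errors;
    }
    return double(errors) / double(train.size());
}
```

The selected $\omega$ is the candidate minimizing this error; as described below, ties are broken by taking the median of the tied candidates.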

This search leads to a major issue with an additive penalty such as $\omega$: its scale. A small penalty in one context may be huge in another, as the impact of the penalty depends on the magnitude of the costs at each step along a warping path. Hence, we must find a reasonable way to determine the scale of penalties. We achieve this by defining a maximal penalty $\omega'$, and multiplying it by a ratio $0\leq r\leq 1$ (13). Our penalty range is then $0\leq\omega\leq\omega'$. This leads to two questions: how to define $\omega'$, and how to sample $r$.

$$\omega=\omega'\cdot r\qquad(13)$$

Let us first consider two series $S$ and $T$. As $\operatorname{ADTW}_{\omega}(S,T)\leq\operatorname{SQED}(S,T)$, taking $\omega=\operatorname{SQED}(S,T)$ ensures that $\operatorname{ADTW}_{\omega}(S,T)=\operatorname{SQED}(S,T)$: a single warping step then costs as much as the full diagonal. In turn, this ensures that we do not need to test any value beyond $\omega'=\operatorname{SQED}(S,T)$. However, we need a single penalty for all series from $\mathcal{D}$, so we sample random pairs of series and set $\omega'$ to their average SQED, $\omega'=\operatorname{mean}_{S,T\sim\mathcal{D}}\operatorname{SQED}(S,T)$.

We now have to find the best $r$; we take a short detour to explain our approach. Intuitively, $\omega'$ represents the cost of $\ell$ steps along the diagonal, and we can define an average step cost $\alpha=\frac{\omega'}{\ell}$. In turn, we have

$$\omega=\alpha\times(\ell\times r')\qquad(14)$$

where $0\leq r'\leq 1$. As $\alpha$ remains a large penalty, we need to start sampling $r'_{1}$ at a value below $\frac{1}{\ell}$. Without expert knowledge, our best guess is to start several orders of magnitude $m$ lower. Simplifying equation (14) back to equation (13), $r_{1}$ must be small enough to account for both $m$ and the magnitude of $\ell$. Then, successive $r_{i}$ must gradually increase toward $1$. If several "best" $r_{i}$ are found, we take the median value.

As leave-one-out cross validation is time consuming, and to ensure a fair comparison with the methods for parameterizing CDTW and WDTW, we limit the search to a hundred values [20, 24]. The formula $r_{i}=(\frac{i}{100})^{5}$ for $1\leq i\leq 100$ fulfills our requirements, covering a range from $10^{-10}$ to 1 and favoring smaller penalties, which tend to be more useful than large ones, as any sufficiently large penalty confines the warping path to the diagonal.
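Putting the pieces together, a sketch of the candidate generation (the pair count and the fixed seed are arbitrary choices of ours; sqed is the sketch from Section 2.4):

```cpp
#include <cmath>
#include <random>
#include <vector>

// Generate the 100 candidate penalties omega_i = omega' * (i/100)^5,
// where omega' is the mean SQED over randomly sampled pairs of training
// series (all series assumed to share the same length).
std::vector<double> candidate_penalties(const std::vector<std::vector<double>>& train,
                                        std::size_t n_pairs = 1000) {
    std::mt19937 rng(42);  // arbitrary fixed seed, for reproducibility
    std::uniform_int_distribution<std::size_t> pick(0, train.size() - 1);
    double sum = 0.0;
    for (std::size_t p = 0; p < n_pairs; ++p) {
        sum += sqed(train[pick(rng)], train[pick(rng)]);  // sqed: see Section 2.4
    }
    const double omega_prime = sum / double(n_pairs);  // maximal penalty omega'
    std::vector<double> omegas;
    omegas.reserve(100);
    for (int i = 1; i <= 100; ++i) {
        omegas.push_back(omega_prime * std::pow(i / 100.0, 5.0));  // r_i = (i/100)^5
    }
    return omegas;
}
```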

5 Experiments

Due to its clearly defined objective evaluation measures, we evaluate ADTW against other distances on a classification benchmark, using the UCR archive [23] in its 128-dataset version. We retained 112 datasets after filtering out those with series of variable length, containing missing data, or having only one training exemplar per class (which leads to spurious results under LOOCV). The distances are used in nearest neighbor classifiers. The archive provides default train and test splits. For each dataset, the parameters $w$, $g$ and $\omega$ are learned on the training data $\mathcal{D}$ following the methods described in Sections 2 and 4. We report the accuracy obtained on the test split. Our results are available online [1].

Following Demšar [25], we compare classifiers using the Wilcoxon signed-rank test. The results can be graphically presented in mean-rank diagrams (Figures 6 and 9), where classifiers that are not significantly different (at a 0.05 significance level, adjusted with the Holm correction for multiple testing) are linked by a horizontal bar. A rank closer to 1 indicates a more accurate classifier.

We compare ADTW against SQED, DTW, CDTW and WDTW. Figure 6 summarizes the outcome of this comparison. It shows that ADTW attains a significantly better rank on accuracy than any other measure.

Figure 6: Test accuracy ranking of NN1 classifiers over 112 datasets from UCR128.

Mean rank diagrams are useful to summarize relative performance, but do not tell the full story — a classifier consistently outperforming another would be deemed significantly better ranked, even if the actual gain is minimal. Accuracy scatter plots (see Figures 7 and 8) allow us to visualize how one classifier performs relative to another. The space is divided by a diagonal representing equal accuracy, and each dot represents a dataset. Points close to the diagonal indicate comparable accuracy between classifiers; conversely, points far away indicate different accuracies.

Figure 7 shows the accuracy scatter plots of ADTW against the other distances. Points above the diagonal indicate datasets where ADTW is more accurate, and points below it datasets where the alternative is more accurate. We also indicate the number of ties and wins per classifier, and the resulting Wilcoxon score. ADTW is almost always more accurate than SQED and DTW — usually substantially so, and the majority of points remain well above the diagonal for CDTW and WDTW, albeit by a smaller margin. We note a point well below the diagonal in Figures 7(a) and 7(c). This is the "Rock" dataset, for which our parameterization process fails to select a high enough penalty. Although ADTW has the capacity to behave like SQED if appropriately parameterized, our parameterization process fails to achieve this for "Rock." Effective parameterization remains an open problem.

(a) ADTW vs. SQED
(b) ADTW vs. DTW
(c) ADTW vs. CDTW
(d) ADTW vs. WDTW
Figure 7: Accuracy of NN1-ADTW vs. NN1 with other distances on 112 datasets from UCR128. Each point represents a dataset. Points above the diagonal indicate that ADTW gives better accuracy than the alternative, and conversely for points below it.

We also compare against a hypothetical BEST classifier, a classifier that uses the most accurate among all of ADTW's competitors. Such a classifier is not feasible to obtain in practice, as it is not possible to be certain which classifier will obtain the lowest error on previously unseen test data. The accuracy scatter plot against the hypothetical BEST classifier is shown in Figure 8. A Wilcoxon signed-rank test assesses ADTW as significantly more accurate than BEST at a 0.05 significance level ($p=4\times 10^{-4}<0.05$). These results suggest that ADTW is a good default choice of DTW variant.

(a) ADTW vs. BEST
Figure 8: Accuracy of NN1-ADTW vs. the best classifier per dataset among NN1-{SQED, DTW, CDTW, WDTW}, on 112 datasets from UCR128. A point above the diagonal indicates that ADTW gives better accuracy than any alternative, and a point below the diagonal indicates that the best alternative gives higher accuracy than ADTW.

Finally, we present a sensitivity analysis for the exponent used to tune $\omega$. Figure 9 shows the effect of using different values for the exponent $e$ when sampling the ratio $r$ under LOOCV at train time. All exponents above 1 lead to comparable results. While $e=4$ leads to the best results on this benchmark, we have no theoretical grounds on which to prefer it, and in other applications we have examined, slightly higher values lead to slightly greater accuracy.

Figure 9: Test accuracy ranking of ADTW-$e$ when varying the exponent $e$ used at train time, over 112 datasets from UCR128.

6 Conclusions

DTW is a popular time series distance measure that allows flexibility in the series alignments [2, 3]. However, it can be too flexible for some applications, and two main forms of constraint have been developed to limit the alignments. Windows provide a strict limit on the distance over which points may be aligned [3, 19]. Multiplicative weights penalize all off-diagonal points proportionally to their distance from the diagonal and the cost of their alignment [10].

However, windows introduce an abrupt discontinuity: within the window there is unconstrained flexibility, and beyond it there is none. Multiplicative weights allow large warping at little cost if the aligned points have low cost. Further, they penalize the length of an off-diagonal path rather than the number of times the path deviates from the diagonal. What's more, neither windows nor multiplicative weights are symmetric with respect to reversing the series (they do not guarantee $\operatorname{dist}(S,T)=\operatorname{dist}(\operatorname{reverse}(S),\operatorname{reverse}(T))$ when $\ell_{S}\neq\ell_{T}$).

Amerced DTW introduces a tunable additive penalty $\omega$ that produces a smooth range of distance measures such that $\operatorname{ADTW}_{\omega}(S,T)$ is monotonic with respect to $\omega$, $\operatorname{ADTW}_{0}(S,T)=\operatorname{DTW}(S,T)$, and $\operatorname{ADTW}_{\infty}(S,T)=\operatorname{SQED}(S,T)$. It is symmetric with respect to both the order of its arguments and reversing the series, irrespective of whether the series share the same length. It has the intuitive property of penalizing the number of times a path deviates from simply aligning successive points in each series, rather than penalizing the length of the paths following such a deviation.

Our experiments indicate that when ADTW is employed for nearest neighbor classification on the widely used UCR benchmark, it results in more accurate classification significantly more often than any other DTW variant.

The application of ADTW to the many other types of task to which DTW is often applied remains a productive direction for future investigation. These include similarity search [11], regression [12], clustering [13], anomaly and outlier detection [14], motif discovery [15], forecasting [16], and subspace projection [17]. One issue that will need to be addressed in each of these domains is how best to tune the amercing penalty $\omega$, especially if a task does not have objective criteria by which utility may be judged. We hope that ADTW will prove as effective in these other applications as it has proved to be in classification. A C++ implementation of ADTW is available at [1].

Acknowledgments

This work was supported by the Australian Research Council award DP210100072. We would like to thank the maintainers and contributors for providing the UCR Archive, and Hassan Fawaz for the critical diagram drawing code [26]. We are also grateful to Eamonn Keogh, Chang Wei Tan, and Mahsa Salehi for insightful comments on an early draft.

References

  • [1] M. Herrmann, NN1 ADTW demonstration application and results, https://github.com/HerrmannM/paper-2021-ADTW (2021).
  • [2] H. Sakoe, S. Chiba, Recognition of continuously spoken words based on time-normalization by dynamic programming, Journal of the Acoustical Society of Japan 27 (9) (1971) 483–490.
  • [3] H. Sakoe, S. Chiba, Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing 26 (1) (1978) 43–49. doi:10.1109/TASSP.1978.1163055.
  • [4] H. Cheng, Z. Dai, Z. Liu, Y. Zhao, An image-to-class dynamic time warping approach for both 3d static and trajectory hand gesture recognition, Pattern Recognition 55 (2016) 137–147.
  • [5] M. Okawa, Online signature verification using single-template matching with time-series averaging and gradient boosting, Pattern Recognition 112 (2021) 107699.
  • [6] Z. Yasseen, A. Verroust-Blondet, A. Nasri, Shape matching by part alignment using extended chordal axis transform, Pattern Recognition 57 (2016) 115–135.
  • [7] G. Singh, D. Bansal, S. Sofat, N. Aggarwal, Smart patrolling: An efficient road surface monitoring using smartphone sensors and crowdsourcing, Pervasive and Mobile Computing 40 (2017) 71–88.
  • [8] Y. Cao, N. Rakhilin, P. H. Gordon, X. Shen, E. C. Kan, A real-time spike classification method based on dynamic time warping for extracellular enteric neural recording with large waveform variability, Journal of Neuroscience Methods 261 (2016) 97–109.
  • [9] R. Varatharajan, G. Manogaran, M. K. Priyan, R. Sundarasekar, Wearable sensor devices for early detection of alzheimer disease using dynamic time warping algorithm, Cluster Computing 21 (1) (2018) 681–690.
  • [10] Y.-S. Jeong, M. K. Jeong, O. A. Omitaomu, Weighted dynamic time warping for time series classification, Pattern Recognition 44 (9) (2011) 2231–2240. doi:10.1016/j.patcog.2010.09.022.
  • [11] T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, E. Keogh, Searching and mining trillions of time series subsequences under dynamic time warping, in: Proc. 18th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2012, pp. 262–270.
  • [12] C. W. Tan, C. Bergmeir, F. Petitjean, G. I. Webb, Time series extrinsic regression, Data Mining and Knowledge Discovery 35 (3) (2021) 1032–1060. doi:10.1007/s10618-021-00745-9.
  • [13] F. Petitjean, A. Ketterlin, P. Gançarski, A global averaging method for dynamic time warping, with applications to clustering, Pattern Recognition 44 (3) (2011) 678–693.
  • [14] D. M. Diab, B. AsSadhan, H. Binsalleeh, S. Lambotharan, K. G. Kyriakopoulos, I. Ghafir, Anomaly detection using dynamic time warping, in: 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), IEEE, 2019, pp. 193–198.
  • [15] S. Alaee, R. Mercer, K. Kamgar, E. Keogh, Time series motifs discovery under DTW allows more robust discovery of conserved structure, Data Mining and Knowledge Discovery 35 (3) (2021) 863–910.
  • [16] K. Bandara, H. Hewamalage, Y.-H. Liu, Y. Kang, C. Bergmeir, Improving the accuracy of global forecasting models using time series data augmentation, Pattern Recognition 120 (2021) 108148.
  • [17] H. Deng, W. Chen, Q. Shen, A. J. Ma, P. C. Yuen, G. Feng, Invariant subspace learning for time series data based on dynamic time warping distance, Pattern Recognition 102 (2020) 107210. doi:https://doi.org/10.1016/j.patcog.2020.107210.
  • [18] M. Herrmann, G. I. Webb, Early abandoning and pruning for elastic distances including dynamic time warping, Data Mining and Knowledge Discovery 35 (6) (2021) 2577–2601. doi:10.1007/s10618-021-00782-4.
  • [19] F. Itakura, Minimum prediction residual principle applied to speech recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing 23 (1) (1975) 67–72. doi:10.1109/TASSP.1975.1162641.
  • [20] J. Lines, A. Bagnall, Time series classification with ensembles of elastic distance measures, Data Mining and Knowledge Discovery 29 (3) (2015) 565–592. doi:10.1007/s10618-014-0361-2.
  • [21] B. Lucas, A. Shifaz, C. Pelletier, L. O’Neill, N. Zaidi, B. Goethals, F. Petitjean, G. I. Webb, Proximity Forest: An effective and scalable distance-based classifier for time series, Data Mining and Knowledge Discovery 33 (3) (2019) 607–635. doi:10.1007/s10618-019-00617-3.
  • [22] A. Shifaz, C. Pelletier, F. Petitjean, G. I. Webb, TS-CHIEF: A scalable and accurate forest algorithm for time series classification, Data Mining and Knowledge Discovery 34 (3) (2020) 742–775. doi:10.1007/s10618-020-00679-8.
  • [23] H. A. Dau, A. Bagnall, K. Kamgar, C.-C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana, E. Keogh, The UCR Time Series Archive, arXiv:1810.07758 [cs, stat] (Sep. 2019). arXiv:1810.07758.
  • [24] C. W. Tan, F. Petitjean, G. I. Webb, FastEE: Fast Ensembles of Elastic Distances for time series classification, Data Mining and Knowledge Discovery 34 (1) (2020) 231–272. doi:10.1007/s10618-019-00663-x.
  • [25] J. Demšar, Statistical Comparisons of Classifiers over Multiple Data Sets, Journal of Machine Learning Research 7 (2006) 1–30.
  • [26] H. Ismail Fawaz, Critical difference diagram with Wilcoxon-Holm post-hoc analysis, https://github.com/hfawaz/cd-diagram (2019).