This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Ridge detection for nonstationary multicomponent signals with time-varying wave-shape functions and its applications

Yan-Wei Su Department of Applied Mathematics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan [email protected] Gi-Ren Liu Department of Mathematics, National Cheng-Kung University, Tainan, Taiwan and National Center for Theoretical Sciences, National Taiwan University, Taipei, Taiwan [email protected] Yuan-Chung Sheu Department of Applied Mathematics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan [email protected]  and  Hau-Tieng Wu Courant Institute of Mathematical Sciences, New York University, New York, NY, 10012 USA [email protected]
Abstract.

We introduce a novel ridge detection algorithm for time-frequency (TF) analysis, particularly tailored for intricate nonstationary time series encompassing multiple non-sinusoidal oscillatory components. The algorithm is rooted in the distinctive geometric patterns that emerge in the TF domain due to such non-sinusoidal oscillations. We term this method shape-adaptive mode decomposition-based multiple harmonic ridge detection (SAMD-MHRD). A swift implementation is available when supplementary information is at hand. We demonstrate the practical utility of SAMD-MHRD through its application to a real-world challenge. We employ it to devise a cutting-edge walking activity detection algorithm, leveraging accelerometer signals from an inertial measurement unit across diverse body locations of a moving subject.

1. Introduction

Applying time-frequency (TF) analysis [15] to study nonstationary time series with multiple oscillatory components has gained significant attention. Unlike traditional time or frequency domain analysis in time series [5], TF analysis follows a ’divide-and-conquer’ approach. By assuming local signal stationarity, we extract and combine information to create the TF representation (TFR)111Other domains like time-scale, ambiguity, and time-lag are beyond our scope here.. TFR is a function in the TF domain, a two-dimensional space with time on the x-axis and frequency on the y-axis.

Guided by this concept, various TF analysis algorithms emerged, ranging from linear methods like short-time Fourier transform (STFT) to nonlinear approaches like synchrosqueezing transform (SST) and its variations [41], along with others [15]. An effective TF analysis algorithm accurately captures signal components across frequencies and times in the TF domain [15], enhancing encoded information use. The TFR, an extension of the Fourier transform, serves this purpose. Researchers manipulate the TFR to achieve tasks like signal decomposition, feature extraction, and denoising after obtaining it. Fig. 1 illustrates the signal processing flowchart. This paper focuses on TFR determined by STFT and SST to avoid distracting from the array of TF analysis algorithms.

An essential TFR manipulation step is ridge detection (RD), involving identifying ridges on the TFR that represent dominant signal traits.222RD can be discussed more generally [16, 35], where ridges are defined as curves on a surface. In this paper, we focus on TF analysis, where ridges are time-parametrized [7, Section III]. Mathematically, a ridge is a sequence of local maxima of the TFR along the time-indexed frequency axis, often visualized as a curve on the TF domain. Under conditions like the adaptive harmonic model [12], capturing multiple oscillatory components with slowly varying amplitude and frequency, ridges correspond to instantaneous frequencies (IF) of different components [13], as shown in Fig. 1. Ridge information enables tasks like component reconstruction [12] for signal decomposition [11] and denoising [8]. Further, phase and IF of each component can inform feature design for machine learning [1, 2].

While the TF analysis-based signal processing flow in Fig. 1 has been utilized for real-world problems extensively, the pivotal step, RD, remains challenging. Literature has witnessed various endeavors to create efficient RD algorithms, encompassing path optimization with regularity constraints [6, 8], its frequency modulation extension [17, 10, 19], Markov chain Monte Carlo (MCMC) [7], weighted ridge reconstruction (WRR) and self-paced ridge reconstruction (SPRR) [43], and a pseudo-Bayesian approach [20]. TFR preprocessing via singular value decomposition (SVD) [29] or TF domain division using the reassignment vector [22, 19] before ridge determination is also viable. Noise impact is commonly mitigated via thresholding. Most RD algorithms, except [7], detect one ridge at a time from the entire TF domain or a portion, obtaining all ridges iteratively, often using a peeling scheme.

Despite successful applications, these algorithms have persistent limitations. All are founded on the assumption of sinusoidal oscillations and often require well-separated IFs; that is, no mode mixing. This assumption is challenged when oscillatory patterns lack sinusoidal traits, yielding multiple harmonics in the TFR. Refer to Fig. 1(b)(e) for an instance involving a non-sinusoidal oscillation depicted in Fig. 1(a) using second-order SST [27]. This pattern is termed the wave-shape function (WSF) [40]. Real-world instances, particularly in biomedicine like the wrist accelerometer signal in Fig. 2, exhibit complex non-sinusoidal WSFs. Even with well-separated IFs of fundamental components, overlapping IFs of their harmonics can arise. See an example in Fig. 3. This challenge, especially relevant to digital health, where diverse biomedical signals with intricate WSFs are prevalent, challenges the applicability of existing RD algorithms. Thus, a tailored RD algorithm is urgently needed.

In this paper, we introduce an innovative RD algorithm tailored for non-sinusoidal WSFs’ influence on the TFR’s geometric structure, and show its outperformance over existing methods. The algorithm’s foundation rests on a fundamental theoretical insight. When WSFs exhibit non-sinusoidal traits, STFT and SST generate a unique geometric pattern in the TFR. This pattern, recurrent along the frequency axis at a fixed time point, is evident in Fig. 1(b)(c), denoted by red arrows. Our approach involves fitting multiple ridges simultaneously to the TFR respecting this structure. When there are multiple oscillatory components, shape-adaptive mode decomposition (SAMD) [11] is applied to replace the commonly applied peeling scheme. We term it SAMD-based multiple harmonic RD (SAMD-MHRD). Incorporating this geometric structure is akin to accessing more ridge information, akin to the principles in cepstrum [28] and de-shape STFT [21]. Fig. 1(g) depicts SAMD-MHRD yielding four harmonic fits. SAMD-MHRD not only offers an accurate RD, but also enhances SAMD’s decomposition capabilities. As a practical application, it aids accurate walking activity detection via accelerometers placed on various body areas, particularly the wrist.

Refer to caption
Figure 1. The overall flowchart of analyzing nonstationary oscillatory time series and the challenge of RD. (a) A portion of the simulated noisy signal Y(t)Y(t) with the signal-to-noise level of 5 dB is shown as the gray curve, with the clean signal superimposed as the red curve. The clean signal has one intrinsic mode type (IMT) function, whose fundamental component is weak. See (3) for the precise definition. (b) The spectrogram of Y(t)Y(t). (c) The second order SST. (d) The extracted lowest-frequency ridge from the TFR shown in (c) by the existing curve extraction algorithm, Single-RD (9), is superimposed as the red curve. The extracted ridge is biased by the ridges of the harmonics after the 19th second. The true instantaneous frequency (IF) is superimposed as the blue-dashed line. (e) Same as (d), but by another existing RD algorithm, RRP-RD. (f) Same as (d), but by our proposed curve extraction algorithm SAMD-MHRD. (g) The decomposed harmonics of the IMT function are shown in red, and the true harmonics are shown in gray. Note that from the 29th to the 31st second, the fundamental component is less accurately recovered due to the low signal-to-noise ratio. It is the same for the fourth harmonic. (h) The summation of the decomposed harmonics is shown in red, and the true IMT function is shown in black.

The paper’s structure is as follows: Section 2 introduces the mathematical model for nonstationary multicomponent signals with time-varying WSF. This section also summarizes the application of TF analysis tools in signal processing, discusses RD as a pivotal analysis step, and highlights existing challenges. Section 3 elaborates on the proposed SAMD-MHRD algorithm. Numerical examples are presented in Section 4, while the real-world application is demonstrated in Section 5. Finally, Section 6 contains the discussion and conclusion.

2. Mathematical models and analysis tools

We begin by examining the mathematical model for non-stationary multicomponent signals with time-varying WSFs. Following that, we provide brief overviews of STFT and SST, illustrating their application in diverse signal processing tasks. Subsequently, we identify the constraints of current RD algorithms, which serve as the inspiration for this study.

2.1. Mathematical model

We consider the adaptive non-harmonic model (ANHM) to model a non-sinusoidal oscillatory signal. Fix a small constant ϵ>0\epsilon>0 and LL\in\mathbb{N}. Assume the signal satisfies

(1) Y0(t)==1La(t)s(ϕ(t))+T(t)+Φ(t),Y_{0}(t)=\sum_{\ell=1}^{L}a_{\ell}(t)s_{\ell}(\phi_{\ell}(t))+T(t)+\Phi(t)\,,

where a(t)s(ϕ(t))a_{\ell}(t)s_{\ell}(\phi_{\ell}(t)) is called the \ell-th intrinsic model type (IMT) function that satisfies

  • (C1)

    ϕ(t)\phi_{\ell}(t) is a C2C^{2} function called the phase function of the \ell-th IMT function. We assume ϕ>0\phi^{\prime}_{\ell}>0;

  • (C2)

    ϕ(t)\phi_{\ell}^{\prime}(t) is the instantaneous frequency (IF) of the \ell-th IMT function so that |ϕ′′(t)|ϵϕ(t)|\phi^{\prime\prime}_{\ell}(t)|\leq\epsilon\phi^{\prime}_{\ell}(t) for all tt\in\mathbb{R} and inftϕ1(t)>Δ\inf_{t\in\mathbb{R}}\phi_{1}^{\prime}(t)>\Delta for some Δ>0\Delta>0. When L>1L>1, we assume ϕj+1(t)ϕj(t)>Δ>0\phi_{j+1}^{\prime}(t)-\phi_{j}^{\prime}(t)>\Delta>0 for all j=1,,L1j=1,\ldots,L-1 and tt\in\mathbb{R};

  • (C3)

    a(t)>0a_{\ell}(t)>0 is a C1C^{1} function denoting the amplitude modulation (AM) of the \ell-th IMT function so that |a(t)|ϵϕ(t)|a_{\ell}^{\prime}(t)|\leq\epsilon\phi^{\prime}_{\ell}(t) for all tt\in\mathbb{R};

  • (C4)

    ss_{\ell} is a smooth 11-period function with mean 0 and unit L2L^{2} norm so that |s^(1)|>0|\hat{s}_{\ell}(1)|>0, which is called the wave-shape function (WSF) of the \ell-th IMT function;

  • (C5)

    Φ()\Phi(\cdot) is a mean 0 stationary random process in the wide sense with finite variance;

  • (C6)

    T(t)T(t) is a smooth function so that its Fourier transform, T^\widehat{T}, is compactly supported in [Δ,Δ][-\Delta,\Delta].

The model’s identifiability is discussed in [8]. A typical example adhering to (1) is the accelerometer signal, with L=1L=1, shown in Fig. 2. Clearly, the IMT function does not oscillate sinusoidally. This model assumes one WSF for each IMT function. In essence, the \ell-th IMT function arises from stretching ss_{\ell} by ϕ\phi_{\ell}, subsequently scaled by aa_{\ell}. Nevertheless, in practical instances, researchers have observed WSFs not to be fixed [2, 14, 39]. See Fig. 2 for an example. It is evident that the oscillatory pattern varies over time. Notably, both STFT and SST reveal that the strength of the fundamental component (indicated by the red arrow) weakens, while the third, sixth, and seventh harmonics (indicated by purple arrows) strengthen after 225 seconds. As a result, (1) lacks appropriateness as a model.

Refer to caption
Figure 2. Top row: an accelerometer signal. Bottom left: the spectrogram of the accelerometer signal. Bottom right: the SST of the accelerometer signal. The fundamental component is indicated by the red arrow, and the harmonics are indicated by the blue arrows.

To properly model such signals, we consider the generalization of (1) in [21]. To motivate this generalization, assume L=1L=1 to simplify the discussion. We have

(2) a1(t)s1(ϕ1(t))=j=1(αja1(t))cos(2πjϕ1(t)+βj),\displaystyle a_{1}(t)s_{1}(\phi_{1}(t))=\sum_{j=1}^{\infty}(\alpha_{j}a_{1}(t))\cos(2\pi j\phi_{1}(t)+\beta_{j})\,,

where {αj}[0,)\{\alpha_{j}\}\subset[0,\infty) and {βj}[0,2π)\{\beta_{j}\}\subset[0,2\pi) are determined by Fourier coefficients of s1s_{1}. The generalization idea in [21] involves permitting time-varying characteristics for each harmonic in (2), given specific conditions. Thus, consider

(3) Y(t)==1Lj=1Da,j(t)cos(2πϕ,j(t))+T(t)+Φ(t),Y(t)=\sum_{\ell=1}^{L}\sum_{j=1}^{D_{\ell}}a_{\ell,j}(t)\cos(2\pi\phi_{\ell,j}(t))+T(t)+\Phi(t)\,,

where DD_{\ell}\in\mathbb{N} is called the harmonic order, and

  1. (C7)

    ϕ,1\phi_{\ell,1} behaves like ϕ\phi_{\ell} in (C1)-(C2), ϕ,jC2\phi_{\ell,j}\in C^{2}, and |jϕ,j(t)/ϕ,1(t)|ϵ|j-\phi^{\prime}_{\ell,j}(t)/\phi^{\prime}_{\ell,1}(t)|\leq\epsilon^{\prime} for j=2,j=2,\ldots and a small constant ϵ0\epsilon^{\prime}\geq 0 for all tt\in\mathbb{R};

  2. (C8)

    a,1(t)a_{\ell,1}(t) behaves like aa_{\ell} in (C3), a,jC1a_{\ell,j}\in C^{1} and |a,l(t)|ϵϕ,1(t)|a^{\prime}_{\ell,l}(t)|\leq\epsilon^{\prime}\phi^{\prime}_{\ell,1}(t) for all tt\in\mathbb{R}.

Conditions (C7) and (C8) capture the time-varying WSF, assuming slow variation. This model is a simplified version of that in [21] to facilitate discussion. For j1j\geq 1, a,j(t)cos(2πϕ,j(t))a_{\ell,j}(t)\cos(2\pi\phi_{\ell,j}(t)) is called the jj-th harmonic of the \ell-th IMT function, and a,1(t)cos(2πϕ,1(t))a_{\ell,1}(t)\cos(2\pi\phi_{\ell,1}(t)) is the fundamental component. When D=1D_{\ell}=1, the \ell-th IMT function oscillates sinusoidally. In this work, we assume the knowledge of LL and apply [31] to estimate DD_{\ell} to avoid the distraction. Although there are tools when D=1D_{\ell}=1 for all \ell [37, 32, 19, 31], estimating LL in general is challenging. The slowly varying IF condition (C2) can be extended to a fast varying one [18], which requires more technical details. To stay focused, we will discuss this generalization in future work.

2.2. Analysis by TF analysis tools—a quick summary

Due to the non-stationary nature of ANHM, conventional time series methods may be limited. A common strategy to analyze such signals involves employing TF analysis algorithms [15]. The core notion behind TF analysis is a “divide-and-conquer” approach. Locally assuming a “stable”, “time-invariant”, or “stationary” structure, we estimate this structure accurately in patches. By combining local estimates, we construct a TFR, which enables solving signal processing tasks like AM and IF estimation of each IMT function, signal decomposition into essential components (IMT functions and their harmonics), denoising, and dynamic feature extraction. This paper concentrates on STFT and SST, and we refer readers to [15] for other TF analysis algorithms.

Below, we explain the overall idea. Mathematically, for a function ff satisfying mild conditions (e.g., a tempered distribution), the STFT is defined as

(4) Vf(h)(t,ξ):=f(x)h(xt)ei2πξ(xt)𝑑x,V^{(h)}_{f}(t,\xi):=\int f(x)h(x-t)e^{-i2\pi\xi(x-t)}dx\,,

where tt\in\mathbb{R} is the time, ξ\xi\in\mathbb{R} is the frequency, and hh is a chosen window that is smooth and decays sufficiently fast (e.g., the Gaussian function). The spectrogram is |Vf(h)(t,ξ)|2|V^{(h)}_{f}(t,\xi)|^{2}, yielding a function on 2\mathbb{R}^{2}. When L>1L>1, the essential support of h^\hat{h} should reside in [Δ/2,Δ/2][-\Delta/2,\Delta/2] or the spectral interference of different IMT functions might happen. Generally, any TF analysis method’s output is a function on ×+\mathbb{R}\times\mathbb{R}^{+}, known as the TFR. We call ×+\mathbb{R}\times\mathbb{R}^{+} the TF domain. See Fig. 1(b) for an example, where the input signal follows ANHM with L=1L=1, where the fundamental component is weaker compared to its second harmonic and diminishes over time, as evident in the spectrogram. Nonetheless, STFT encounters the uncertainty principle [30]; that is, the TFR appears ’blurred’ due to the chosen window. To address this, SST [12], a reassignment algorithm [3] variant, and its generalizations like second-order SST [27] were introduced. They utilize the phase information within the complex-valued TFR obtained through STFT to sharpen the STFT-based TFR. Let Sf(h,[1]):×+S_{f}^{(h,[1])}:\mathbb{R}\times\mathbb{R}^{+}\to\mathbb{C} and Sf(h,[2]):×+S_{f}^{(h,[2])}:\mathbb{R}\times\mathbb{R}^{+}\to\mathbb{C} denote the original SST [12] and second-order SST [27] TFRs of function ff, respectively.

To proceed, the conventional procedure entails extracting ridges connected to harmonic IFs. In contrast to STFT, a sharper TFR, as determined by SST, can enhance ridge estimation accuracy [24]. It has been theoretically established that when a signal adheres to ANHM with sinusoidal WSFs, the ridges closely approximate IFs of harmonics of all IMT functions [13, 12]. Once ridges are accurately extracted, IMTs can be reconstructed through a reconstruction formula. Specifically, suppose the extracted ridge c(t)c(t) approximates IF of a,j(t)cos(2πϕ,j(t))a_{\ell,j}(t)\cos(2\pi\phi_{\ell,j}(t)) in (3) for some \ell and jj. Then, if ϕ,j(t)\phi_{\ell,j}^{\prime}(t) does not intersect with and is away from IF of any other harmonics, the following formula holds:

(5) |ξc(t)|<ΔqSf(h,[q])(t,ξ)𝑑ξa,j(t)e2πiϕ,j(t),\int_{|\xi-c(t)|<\Delta_{q}}S_{f}^{(h,[q])}(t,\xi)d\xi\approx a_{\ell,j}(t)e^{2\pi i\phi_{\ell,j}(t)}\,,

where q=1,2q=1,2 and user-defined bandwidth Δq>0\Delta_{q}>0. This formula yields estimates of a,j(t)a_{\ell,j}(t) through absolute value extraction and ϕ,j(t)\phi_{\ell,j}(t) through phase unwrapping [1]. With these estimates, further signal processing tasks become feasible. Regrettably, the condition that ϕ,j(t)\phi_{\ell,j}^{\prime}(t) must not intersect with or approach the IF of other IMT function’s harmonics, as required by (5), is excessively stringent and practically infeasible. Thus, an alternative approach becomes necessary. Various mode retrieval algorithms for STFT exist, such as ridge-based skeleton [6], integration approach [19], basin attractor utilization [22, 23, 19], penalization with detected ridges [7], and others. However, these methods face similar limitations posed by non-sinusoidal WSFs.

Remark.

It is worth noting that the phase can be estimated by cumulatively summing the estimated IF. However, this method is suboptimal due to accumulating discretization errors and the need for a separate initial phase estimation. This approach is only advisable when (5) or other reconstruction formulas cannot be applied.

See Section A for a discussion of the numerical implementation of STFT and SST. Below, we denote the discretized TFR of ff, either determined by STFT or SST, as 𝐑N×M\mathbf{R}\in\mathbb{C}^{N\times M}, where NN is the number of the sampling points of the signal with the sampling period Δt\Delta_{t}, and MM is the number of the frequency domain ticks, each bin spaced by 12MΔt\frac{1}{2M\Delta_{t}}.

2.3. Ridge detection as a key step—existing algorithms

The key step in the signal processing process outlined in Section 2.2 involves an accurate RD algorithm. RD algorithms can be broadly categorized into three groups. One involves detecting ridges individually and gathering all ridges through an iterative peeling scheme [6, 8, 10]. The second is the multiridge scheme, which detects multiple ridges concurrently [7]. The third is the preprocessing scheme, where the TF domain is segmented to include one ridge per portion or the TFR is enhanced before applying either peeling or multiridge algorithms [19]. Given that our proposed algorithm adheres to the peeling scheme and the focus is handling WSFs, our focus remains on reviewing algorithms within this category.

In the peeling scheme, once we obtain a ridge c1:[N][M]c_{1}:[N]\rightarrow[M], where [N]:={1,,N}[N]:=\{1,\cdots,N\}, from 𝐑1:=𝐑\mathbf{R}_{1}:=\mathbf{R}, we set iteratively the following masked TFR, =2,,L+1\ell=2,\ldots,L+1:

(8) 𝐑(n,m)={0if (n,m)B1,n𝐑1(n,m)otherwise.,\displaystyle\mathbf{R}_{\ell}(n,m)=\left\{\begin{array}[]{ll}0&\mbox{if }(n,m)\in B_{\ell-1,n}\\ \mathbf{R}_{\ell-1}(n,m)&\mbox{otherwise}\,.\end{array}\right.\,,

where B1,n:=[c1(n)η(n),c1(n)+η+(n)]B_{\ell-1,n}:=[c_{\ell-1}(n)-\eta_{-}(n),c_{\ell-1}(n)+\eta_{+}(n)], c1c_{\ell-1} is the ridge extracted from 𝐑1\mathbf{R}_{\ell-1}, and η(n)0\eta_{-}(n)\geq 0 and η+(n)0\eta_{+}(n)\geq 0 is the bandwidth selected by the user. Note that we assume the knowledge of LL in this work. Practically, the selection of η(n)\eta_{-}(n) and η+(n)\eta_{+}(n) depends on the TFR and profoundly impacts RD performance. Several methods exist for masking the TFR. For instance, one can opt for a constant frequency bandwidth (CFB) approach, where η(n)\eta_{-}(n) and η+(n)\eta_{+}(n) remain constant for all nn [10]. Alternatively, the bandwidth can adapt to noise levels, known as the varying frequency bandwidth (VFB) [10]. Another option involves the modulation-based (MB) approach, which relies on estimating spectral spreading due to chirp [10].

Numerous algorithms exist for fitting a ridge to the (masked) TFR 𝐑\mathbf{R}_{\ell} in each iteration. These algorithms share the common goal of approximating a curve within the TFR to capture maximum energy while meeting specific conditions. A frequently used method is to solve the following path optimization problem [6, 8]:

(9) c=argmaxc:[N][M]j=1N|𝐑~(j,c(j))|λ1j=1N1|Δc(j)|2,c^{*}=\operatorname*{argmax}_{c:[N]\rightarrow[M]}\sum_{j=1}^{N}\big{|}\widetilde{\mathbf{R}}(j,c(j))\big{|}-\lambda_{1}\sum_{j=1}^{N-1}\left|\Delta c(j)\right|^{2}\,,

where Δc[M]N1:={(m1,,mN1):mk[M], 1kN1}\Delta c\in[M]^{N-1}:=\{(m_{1},\cdots,m_{N-1}):m_{k}\in[M],\,1\leq k\leq N-1\} so that Δc(j):=c(j+1)c(j)\Delta c(j):=c(j+1)-c(j) for j=1,,N1j=1,\ldots,N-1, which is the numerical differentiation of the curve cc, λ1>0\lambda_{1}>0 is the penalty term constraining the regularity of the fit curve cc^{*}, and 𝐑~(,q)=log|𝐑(,q)|i=1Nj=1M|𝐑(i,j)|\widetilde{\mathbf{R}}(\ell,q)=\log\frac{|\mathbf{R}(\ell,q)|}{\sum_{i=1}^{N}\sum_{j=1}^{M}|\mathbf{R}(i,j)|} is a normalization of the matrix 𝐑\mathbf{R}. In practice, various approaches to solving the optimization problem (9) exist, including the simulated annealing algorithm [6]. However, we recommend the more efficient penalized forward-backward greedy algorithm, referred to as the single curve extraction algorithm (Single-RD) [6, 8]. Note that we could further enhance regularity by considering second-order numerical differentiation in (9), or even incorporating available noise information [6]. For simplicity, we omit these aspects from the current discussion.

Several other algorithms proposed in [17, 10, 19] could be viewed as solving variations of (9). For example, the frequency modulation (FM) approach proposed in [10] solves c=argmaxc:[N][M]j=1N|𝐑(j,c(j))|2c^{*}=\operatorname*{argmax}_{c:[N]\rightarrow[M]}\sum_{j=1}^{N}\left|\mathbf{R}(j,c(j))\right|^{2} such that |Δc(j)KN2q^(j,c(j))|C|\Delta c(j)-\frac{K}{N^{2}}\hat{q}(j,c(j))|\leq C for some C0C\geq 0, where q^\hat{q} is the chirp rate estimate [10, eq. (6)]. This optimization problem could be solved by, for example, the recurring principle [19, eq. (11)], leading to the FM-RD algorithm. In [19], to handle substantial noise, the concept of a relevant ridge portion (RRP) is introduced, giving rise to the RRP-RD algorithm. RRPs correspond to the desired ridges and are employed to segment the TF domain into specific structures known as basins of attraction [23]. The spline least-square approximation is applied to RRPs for RD.

As an example, consider the simplest scenario with only one IMT function (L=1L=1 in (3)) that oscillates sinusoidally (a1,j=0a_{1,j}=0 for all j2j\geq 2). In this case, the ridge cc^{*} determined by any of these RD algorithms provides an accurate estimate of the associated fundamental component’s IF [12], followed by other signal processing processes. Nonetheless, this simplest scenario is rarely encountered in practice.

2.4. Challenges of existing approaches

To illustrate the challenges posed by existing RD algorithms, consider scenarios closer to real-world situations. First, when L>1L>1 in (3) and all IMT functions oscillate sinusoidally, we can apply any of the aforementioned RD algorithms to estimate the IFs associated with all IMT functions, assuming the IF separation condition in (C2) holds. Second, when L=1L=1 in (3) but the WSF significantly deviates from being sinusoidal, we can apply the above RD algorithms based on (C7) to estimate all ridges linked to the harmonics. The ridge associated with the lowest IF can then be assumed as the fundamental component’s IF. In these more realistic scenarios, successfully applying existing RD algorithms becomes more challenging. An example in Fig. 1(f) demonstrates this complexity, where the strength of the fundamental component is weak in the middle while the strength of the second harmonic dominates. Such shifts in strengths can easily confuse existing RD algorithms.

In real-world scenarios, more intricate situations often challenge the aforementioned RD algorithms. When L>1L>1 in (3) and the WSFs deviate significantly from sinusoidal patterns, the TFR can become complex with multiple curves representing different harmonics of distinct IMT functions. This complexity involves interference among harmonics from different IMT functions and can lead to mode mixing. For instance, as shown in Fig. 3, two IMT functions exhibit non-sinusoidal WSFs. Here, the second harmonic of the first IMT function (red arrows) intersects with the fundamental component of the second IMT function (blue arrows), making it challenging to accurately determine the IF of the fundamental component for each IMT function. Given that numerous signal processing tasks, such as SAMD [11], rely on accurately acquiring the phase of each IMT function and considering the prevalence of such signals, particularly in biomedical applications, there is a compelling need for a new RD algorithm.

Refer to caption
Figure 3. Illustration of Single-RD, FM-RD and RRP-RD on a two-oscillatory components signal. (a)-Left: Spectrogram. We can see two curves that are associated with the fundamental components of the two IMT functions. (a)-Middle: The second-order SST. (a)-Right: The detection result using SAMD-MHRD (resp. FM-RD and RRP-RD) is superimposed on the 2nd-order SST as the red (resp. purple and blue) curves. (b) A portion of the first and second IMT functions is shown for visual inspection.

3. Proposed ridge detection algorithm

To address the aforementioned challenge, we introduce a novel RD algorithm within a peeling scheme framework, termed SAMD-based multiple harmonic RD (SAMD-MHRD). This algorithm consists of two main iterative steps. First, it involves fitting ridges for multiple harmonics of a single IMT function, referred to as the multiple harmonic RD (MHRD) component. Subsequently, the algorithm progresses by iteratively “peeling off ridges” linked to the identified harmonics of the IMT function, constituting the SAMD phase. These steps are repeated until all IMT functions have been processed. It is important to acknowledge that each of these steps possesses its own interest and can be employed independently in alternative scenarios. Algorithm 1 provides a comprehensive summary of the SAMD-MHRD procedure. Below, we elaborate on these two steps and the iterative procedure. For the purpose of reproducibility, the Matlab implementation of the proposed algorithm and codes to regenerate results in Section 4 are available at https://github.com/yanweiSu/ridges-detection.

Before detailing the algorithm’s specifics, we highlight three innovations in SAMD-MHRD. First, we utilize the geometric structure of the TFR. To illustrate the idea, suppose L=1L=1 and ϵ=0\epsilon^{\prime}=0 in (C7). Then the STFT of j=1a1,j(t)cos(2πjϕ1,1(t))\sum_{j=1}^{\infty}a_{1,j}(t)\cos(2\pi j\phi_{1,1}(t)) can be approximated by 12j=1a1,j(t)h^(ξjϕ1,1(t))e2πijϕ1,1(t){\frac{1}{2}}\sum_{j=1}^{\infty}a_{1,j}(t)\hat{h}(\xi-j\phi^{\prime}_{1,1}(t))e^{2\pi ij\phi_{1,1}(t)} [12]; that is, at a fixed time tt, the ridges associated with different harmonics are “periodic” in the frequency axis with the period ϕ1,1(t)\phi^{\prime}_{1,1}(t). This underlying periodic structure serves as the fundamental geometric element driving the design of SAMD-MHRD. Secondly, unlike the masking approach in (8), SAMD-MHRD employs the SAMD technique for ridge peeling. The masking technique can be limited in handling non-sinusoidal WSFs due to potential harmonic IF overlaps between different IMT functions. This can lead to error accumulation during iterations. In contrast, SAMD-based peeling effectively addresses these challenges and eliminates the need to determine optimal masking bandwidth. Third, SAMD-MHRD not only identifies ridges for all IMT functions but also decomposes the IMT functions themselves. Additionally, while SAMD aids ridge peeling in each iteration, SAMD-MHRD algorithm concurrently achieves signal decomposition like the original SAMD.

3.1. Step 1: fit multiple ridges for harmonics (MHRD)

Given an input signal f1=ff_{1}=f that satisfies (3) with L1L\geq 1, let 𝐑1N×M\mathbf{R}_{1}\in\mathbb{C}^{N\times M} represent the discretized TFR of f1f_{1}, obtained from STFT or SST. Choose K1K\geq 1 as the intended number of harmonics to extract simultaneously. Using the notations from (9), we fit KK ridges for the harmonics of an IMT function by solving the optimization problem:

(10) 𝐜=\displaystyle\mathbf{c}^{*}= argmax𝐜:[N][M]Kk=1K[j=1N|𝐑~1(j,ek𝐜(j))|\displaystyle\,\underset{\mathbf{c}:[N]\rightarrow[M]^{K}}{\operatorname*{argmax}}\sum_{k=1}^{K}\Big{[}\sum_{j=1}^{N}\left|\widetilde{\mathbf{R}}_{1}\left(j,e_{k}^{\top}\mathbf{c}(j)\right)\right|
λkj=2N|ek(𝐜(j)𝐜(j1))|2μkj=1N|ek𝐜(j)ke1𝐜(j)|2],\displaystyle-\lambda_{k}\sum_{j=2}^{N}\left|e_{k}^{\top}\left(\mathbf{c}(j)-\mathbf{c}(j-1)\right)\right|^{2}-\mu_{k}\sum_{j=1}^{N}\left|e_{k}^{\top}\mathbf{c}(j)-ke_{1}^{\top}\mathbf{c}(j)\right|^{2}\Big{]}\,,

where ekKe_{k}\in\mathbb{R}^{K}, k=1,,Kk=1,\ldots,K, is a unit vector with ek(k)=1e_{k}(k)=1, λk0\lambda_{k}\geq 0 is the smoothness penalty as in (9), and μk0\mu_{k}\geq 0 stands for the similarity penalty that captures the time-varying wave-shape condition stated in (C7). Note that μk>0\mu_{k}>0 forces the estimated IF of the kk-th harmonic, ek𝐜e_{k}^{\top}\mathbf{c} to satisfy ek𝐜()ke1𝐜()e_{k}^{\top}\mathbf{c}(\ell)\approx ke_{1}^{\top}\mathbf{c}(\ell) at time Δt\ell\Delta_{t}, and hence the geometric structure is respected. The output 𝐜[M]K×N\mathbf{c}^{*}\in[M]^{K\times N} represents the IFs of the first KK harmonics of an IMT function of the input signal ff.

We simplify (10) by introducing a set

(11) 𝐒(K)\displaystyle\mathbf{S}^{(K)} (β)={𝐜:[N][M]K:\displaystyle\,(\beta)=\big{\{}\mathbf{c}:[N]\to[M]^{K}:
|ek𝐜(j)ke1𝐜(j)|βe1𝐜(j)k[K]},\displaystyle|e_{k}^{\top}\mathbf{c}(j)-ke_{1}^{\top}\mathbf{c}(j)|\leq\beta e_{1}^{\top}\mathbf{c}(j)\;\forall k\in[K]\big{\}}\,,

where β0\beta\geq 0 is the chosen “bandwidth” that echos the similarity penalty μk\mu_{k} in (10), so that (10) becomes

(12) 𝐜=argmax𝐜𝐒(K)(β)k=1K[j=1N|𝐑~1(j,ek𝐜(j))|λkj=2N|ek(𝐜(j)𝐜(j1))|2].\mathbf{c}^{*}=\underset{\mathbf{c}\in\mathbf{S}^{(K)}(\beta)}{\operatorname*{argmax}}\sum_{k=1}^{K}\bigg{[}\sum_{j=1}^{N}\left|\widetilde{\mathbf{R}}_{1}\left(j,e_{k}^{\top}\mathbf{c}(j)\right)\right|-\lambda_{k}\sum_{j=2}^{N}\left|e_{k}^{\top}(\mathbf{c}(j)-\mathbf{c}(j-1))\right|^{2}\bigg{]}\,.

Solving (12) directly can be inefficient. We recommend a segmentwise approach. This method leverages the slowly-varying IF’s uniform continuity by approximating the curve with a finite sequence of “thin rectangles” {[tq1,tq]×[Lq,Uq]}q=1Q\{[t_{q-1},\,t_{q}]\times[L_{q},U_{q}]\}_{q=1}^{Q}. First, estimate the fundamental IF c^\hat{c} by running the algorithm on [t0,t1][t_{0},\,t_{1}] over the whole frequency domain. Then, for each segment [tq,tq+1][t_{q},\,t_{q+1}], run the algorithm over a frequency band [Bq,Bq]+c^(tq)[-B_{q},B_{q}]+\hat{c}(t_{q}), where Bq>0B_{q}>0 is chosen properly to ensure the box [tq,tq+1]×[c^(tq)Bq,c^(tq)+Bq][t_{q},\,t_{q+1}]\times[\hat{c}(t_{q})-B_{q},\hat{c}(t_{q})+B_{q}] covers the ridge. This process determine the fundamental IF over [tq,tq+1][t_{q},t_{q+1}]. Note that a smaller BqB_{q} reduces the search space and improves efficiency, but it must not be too small to avoid missing the ridge. We suggest using Bq=1B_{q}=1 and tqtq1=1t_{q}-t_{q-1}=1 for all qq, based on the slowly-varying IF assumption to ensure the ridge is covered. This section of the algorithm is referred to as MHRD and is summarized in Algorithm 2.

Algorithm 1 SAMD-MHRD algorithm.
1:  Input: 𝐟0\mathbf{f}_{0}: the signal, LL\in\mathbb{N}: the number of IMT functions, II\in\mathbb{N}: the iteration times, {tq}q=0Q\{t_{q}\}_{q=0}^{Q}: a strictly increasing sequence of the time segment points with tQ:=Nt_{Q}:=N and t0:=0t_{0}:=0, {Bq}q=1Q\{B_{q}\}_{q=1}^{Q}: the width of the fundamental band in each time interval.
2:  𝐟𝐟0\mathbf{f}\leftarrow\mathbf{f}_{0}
3:  for i=1:Ii=1:I do
4:     for =1:L\ell=1:L do
5:        Run the second order STFT-SST: 𝐑𝐒𝐟[2]\mathbf{R}\leftarrow\mathbf{S}^{[2]}_{\mathbf{f}}.
6:        𝐜𝖬𝖧𝖱𝖣(𝐑,K,{λk}k=1K,β,{tq},{Bq})\mathbf{c}_{\ell}\leftarrow\mathsf{MHRD}(\mathbf{R},K,\{\lambda_{k}\}_{k=1}^{K},\beta,\{t_{q}\},\{B_{q}\}) (See Algorithm 2).
7:     end for
8:     Reorder {e1𝐜1,,e1𝐜L}\{e_{1}^{\top}\mathbf{c}_{1},\cdots,e_{1}^{\top}\mathbf{c}_{L}\} from low value to high value as {e1𝐜(1),,e1𝐜(L)}\{e_{1}^{\top}\mathbf{c}_{(1)},\cdots,e_{1}^{\top}\mathbf{c}_{(L)}\}, where e1e_{1} is defined in (10).
9:     for =1:L\ell=1:L do
10:        𝐱,1[i]Reconstruction with e1𝐜()\mathbf{x}_{\ell,1}^{[i]}\leftarrow\text{Reconstruction with }e_{1}^{\top}\mathbf{c}_{(\ell)} by (5).
11:        𝐱[i]\mathbf{x}^{[i]}_{\ell}\leftarrow SAMD with the estimated fundamental mode 𝐱,1[i]\mathbf{x}_{\ell,1}^{[i]}.
12:        𝐟𝐟0𝐱[i]\mathbf{f}\leftarrow\mathbf{f}_{0}-\mathbf{x}^{[i]}_{\ell}.
13:     end for
14:  end for
15:  Output: {𝐱1[i]}i=1I,,{𝐱L[i]}i=1I\{\mathbf{x}_{1}^{[i]}\}_{i=1}^{I},\ldots,\{\mathbf{x}_{L}^{[i]}\}_{i=1}^{I}.
Algorithm 2 Multiple harmonic ridge detection algorithm. MHRD(𝐑,K,{λk}k=1K,β,{tq}q=1Q,{Bq}q=2Q)\big{(}\mathbf{R},K,\{\lambda_{k}\}_{k=1}^{K},\beta,\{t_{q}\}_{q=1}^{Q},\{B_{q}\}_{q=2}^{Q}\big{)}
1:  Input: 𝐑N×M\mathbf{R}\in\mathbb{C}^{N\times M}: the TFR, KK: the number of desired harmonics, λ=(λ1,,λK)\lambda=(\lambda_{1},\cdots,\lambda_{K}): the smoothness penalty in (12), β\beta: the “bandwidth” in (11).
2:  Assign t00t_{0}\leftarrow 0. For each q[Q]q\in[Q], define a submatrix 𝐑q(tqtq1)×M\mathbf{R}_{q}\in\mathbb{C}^{(t_{q}-t_{q-1})\times M} of 𝐑\mathbf{R} as 𝐑q(,m)=𝐑(+tq1,m)\mathbf{R}_{q}(\ell,m)=\mathbf{R}(\ell+t_{q-1},m) for all {1,,tqtq1}\ell\in\{1,\cdots,t_{q}-t_{q-1}\} and m[M]m\in[M].
3:  Solve (12) over 𝐑1\mathbf{R}_{1} and get 𝐜^1[M]t1×K\hat{\mathbf{c}}_{1}\in[M]^{t_{1}\times K}. Assign 𝐜^𝐜^1\hat{\mathbf{c}}\leftarrow\hat{\mathbf{c}}_{1}.
4:  for  q=2:Qq=2:Q  do
5:     𝐒(K)(β)𝐒(K)(β){𝐜[M](tqtq1)×K:𝐜(,1)[Bq,Bq]+𝐜^(tq1,1) and 𝐜(1,)=𝐜^(tq1,)}\mathbf{S}^{(K)}(\beta)\leftarrow\mathbf{S}^{(K)}(\beta)\cap\{\mathbf{c}\in[M]^{(t_{q}-t_{q-1})\times K}:\mathbf{c}(\cdot,1)\in[-B_{q},B_{q}]+\hat{\mathbf{c}}(t_{q-1},1)\text{ and }\mathbf{c}(1,\cdot)=\hat{\mathbf{c}}(t_{q-1},\cdot)\}.
6:     Solve (12) over 𝐑q\mathbf{R}_{q} and 𝐒(K){\bf S}^{(K)} and get 𝐜^q[M](tqtq1)×K\hat{\mathbf{c}}_{q}\in[M]^{(t_{q}-t_{q-1})\times K}. Assign 𝐜^[𝐜^𝐜^q]\hat{\mathbf{c}}\leftarrow[\hat{\mathbf{c}}^{\top}\;\hat{\mathbf{c}}_{q}^{\top}]^{\top}.
7:  end for
8:  Output: 𝐜^[M]N×K\widehat{\mathbf{c}}\in[M]^{N\times K}, where the estimated IF of the kk-th harmonic is given by the kk-th column of 𝐜^\hat{\mathbf{c}}.

3.2. Step 2: Peel off ridges for harmonics by SAMD

After identifying one IMT function’s harmonic ridges using MHRD, we employ the SAMD algorithm [11] to extract the corresponding IMT function (𝐱1[1]\mathbf{x}^{[1]}_{1} in Algorithm 1) related to the detected ridges with its fundamental component’s IF. In short, SAMD is an optimization-based fitting algorithm that uses the estimated fundamental component of an IMT function (𝐱,1[i]\mathbf{x}_{\ell,1}^{[i]} in Algorithm 1 using (5) with e1𝐜()e_{1}^{\top}\mathbf{c}_{(\ell)}) as inputs. The loss function accounts for the time-varying WSF, and its minimizer provides an estimate of the IMT function. See [11] for details.

Subtracting this extracted IMT function from the input signal f1f_{1} results in a new signal denoted as f2f_{2}, containing (L1)(L-1) IMT functions. The discretized TFR of f2f_{2}, determined by STFT or SST, is represented as 𝐑2N×M\mathbf{R}_{2}\in\mathbb{C}^{N\times M}. Note that ridges related to the extracted IMT function do not exist in 𝐑2\mathbf{R}_{2}. This step thus serves as a replacement for (8).

Having obtained the initial IMT function and 𝐑2\mathbf{R}_{2}, we can proceed to iterate Step 1 (MHRD) on f2f_{2} and 𝐑2\mathbf{R}_{2} to extract the harmonic ridges of one of the remaining IMT functions. Then, Step 2 is once again executed to extract the associated IMT function using the SAMD algorithm. This iterative process is repeated for a total of LL cycles, resulting in the complete set of harmonic ridges for all IMT functions. These initial ridge estimates, denoted as c1,,cLc_{1}^{*},\ldots,c_{L}^{*}, are ordered based on the fundamental components’ IFs, from low to high.

In practice, it is not guaranteed which IMT function will be extracted in Steps 1 and 2. It is possible that the IF and phase of its associated fundamental component is affected by the harmonics of other IMT functions with lower IF. Thus, with the initial estimate c1,,cLc_{1}^{*},\ldots,c_{L}^{*}, we repeat Steps 1 and 2 on f1f_{1} and 𝐑1\mathbf{R}_{1} by replacing 𝐒(K)(β)\mathbf{S}^{(K)}(\beta) in (12) by

(13) 𝐒¯(K)\displaystyle\bar{\mathbf{S}}^{(K)} (β)={𝐜:[N][M]K1:\displaystyle\,(\beta)=\big{\{}\mathbf{c}:[N]\to[M]^{K-1}:
|ek1𝐜(j)kc1(j)|βc1(j)k{2,,K}}\displaystyle|e_{k-1}^{\top}\mathbf{c}(j)-kc^{*}_{1}(j)|\leq\beta c^{*}_{1}(j)\;\forall k\in\{2,\ldots,K\}\big{\}}

and estimating the phase via (5). This ridge estimate is less influenced by harmonics from other IMT functions due to the IF separation condition (C2). We then repeat Steps 1 and 2 on f2f_{2} and 𝐑2\mathbf{R}_{2} with 𝐒(K)\mathbf{S}^{(K)} modified in the same way as (13) by considering c2c_{2}^{*} . We continue this process iteratively with c3,c4,c_{3}^{*},c_{4}^{*},\ldots and so on until ridges of all LL IMT functions are extracted, denoted as c¯1,,c¯L\bar{c}_{1}^{*},\ldots,\bar{c}_{L}^{*}.

3.3. Iteration

While it is valid to conclude the process after these iterations, it is possible to enhance the results through an additional iteration of the LL cycles with c¯1,,c¯L\bar{c}_{1}^{*},\ldots,\bar{c}_{L}^{*}. This reiteration step has the potential to yield improved outcomes, and we recommend considering this option for I1I\geq 1 times. It is worth noting that the output includes the decomposition of all IMT functions from the input signal ff.

3.4. Speed up SAMD-MHRD with auxiliary information

Solving the optimization problem (10) can be time-consuming, especially when KK is large and β\beta is wide. However, if the phase of the fundamental component of the targeted IMT function is known or can be well estimated in advance, the algorithm can be accelerated. Let 𝐑~\widetilde{\mathbf{R}}, λk\lambda_{k} and μk\mu_{k} be the same as (10), and c1:[N][M]c_{1}:[N]\rightarrow[M] be the known IF of the fundamental component; that is, the extra auxiliary information. In this case, we only need to estimate the higher-order harmonics, and hence (10) is reduced to

(14) 𝐜=argmax𝐜:[N][M]Kk=1K1c1(k+1)(ek𝐜),\displaystyle\mathbf{c}^{*}=\operatorname*{argmax}_{\mathbf{c}:[N]\rightarrow[M]^{K}}\sum_{k=1}^{K-1}\mathcal{L}^{(k+1)}_{c_{1}}(e_{k}^{\top}\mathbf{c})\,,

where c1(k)(𝐟):=j=1N|𝐑~(j,𝐟(j))|λkj=2N|𝐟(j)𝐟(j1)|2μk=1N|𝐟(j)kc1(j)|2\mathcal{L}_{c_{1}}^{(k)}(\mathbf{f}):=\sum_{j=1}^{N}\left|\tilde{\mathbf{R}}\left(j,\mathbf{f}(j)\right)\right|-\lambda_{k}\sum_{j=2}^{N}|\mathbf{f}(j)-\mathbf{f}(j-1)|^{2}-\mu_{k}\sum_{\ell=1}^{N}|\mathbf{f}(j)-kc_{1}(j)|^{2} for k{2,,K}k\in\{2,\cdots,K\} and 𝐟:[N][M]\mathbf{f}:[N]\rightarrow[M]. The following proposition shows that the optimization problem (14) can be reduced to several single-valued curve optimization problems, where the proof can be found in Section C.

Proposition 3.1.

Fix K2K\geq 2. The curve 𝐡k:[N][M]\mathbf{h}_{k}:[N]\rightarrow[M] satisfying

(15) 𝐡k=argmaxh:[N][M]c1(k+1)(h),\mathbf{h}_{k}=\operatorname*{argmax}_{h:[N]\rightarrow[M]}\mathcal{L}_{c_{1}}^{(k+1)}(h)\,,

where 1kK11\leq k\leq K-1, is one of the rows of 𝐜\mathbf{c}^{*} in (14).

This proposition allows us to estimate harmonics’ IFs by solving a modified single curve extraction program with respect to c1c_{1} for each 2kK2\leq k\leq K; that is,

(16) 𝐡k=argmaxh:[N][M]c1(k)(h),\mathbf{h}_{k}^{*}=\operatorname*{argmax}_{h:[N]\rightarrow[M]}\mathcal{L}_{c_{1}}^{(k)}(h),

instead of solving (14) directly. This reduces the complexity of the original problem. This auxiliary information is often available in many applications. For example, when analyzing photoplethysmogram (PPG) and electrocardiogram (ECG) signals, we can use the cardiac phase from the ECG as an accurate phase estimate for the cardiac component in the PPG signal. This allows us to speed up SAMD-MHRD when analyzing the cardiac component in the PPG signal.

3.5. Parameters selection

To optimize the performance of SAMD-MHRD for a given KK, selecting suitable penalties λ1,,λK\lambda_{1},\ldots,\lambda_{K} is crucial. To address this, we propose a simple grid search approach with the following rationale. Assuming that the algorithm accurately detects the ridges corresponding to the true IFs, we can consider masking the extracted curves on the TFR, similar to (8), by appropriately choosing η\eta_{-} and η+\eta_{+}. To simplify the discussion, we apply the CFB approach [10] below, while VFB and MB can also be considered. The resulting TFR should exhibit reduced signal components. This insight prompts us to employ the α\alpha-Rényi entropy [4, 33], where α>0\alpha>0, as a metric to quantify and compare the energy distribution along the frequency axis of both the masked and original TFRs.

Consider the parameter set

𝒮δλ,Λ:={\displaystyle\mathcal{S}_{\delta_{\lambda},\Lambda}:=\big{\{} (λ1,,λK):λk=(1(k1)δλ)λ1\displaystyle(\lambda_{1},\cdots,\lambda_{K}):\lambda_{k}=(1-(k-1)\delta_{\lambda})\lambda_{1}
(17) for k=1,,K,λ1Λ},\displaystyle\text{ for }k=1,\ldots,K\,,\lambda_{1}\in\Lambda\big{\}}\,,

where δλ>0\delta_{\lambda}>0 is small, KK satisfies 1(K1)δλ>01-(K-1)\delta_{\lambda}>0, and Λ(0,)\Lambda\subset(0,\infty) is a finite set. Note that while 𝒮δλ,Λ\mathcal{S}_{\delta_{\lambda},\Lambda} looks like a high-dimensional space, it is effectively a one-dimensional space. In this context, the ordering λk<λ\lambda_{k}<\lambda_{\ell} for k>k>\ell adheres to the slowly varying condition |ϕj′′|ϵjϕ1|\phi_{j}^{\prime\prime}|\leq\epsilon j\phi_{1}^{\prime} for j1j\geq 1. For the parameter β\beta required to satisfy the WSF condition in 𝐒(K)(β)\mathbf{S}^{(K)}(\beta) (11), we consider a finite subset M(0,21)M\subset(0,2^{-1}).

Take the TFR 𝐑\mathbf{R}. For each λ1Λ\lambda_{1}\in\Lambda and βM\beta\in M, extract the curves by SAMD-MHRD with (λ1,,λK)Sδλ,Λ(\lambda_{1},\cdots,\lambda_{K})\in S_{\delta_{\lambda},\Lambda} and β\beta, and denote the result to be 𝐜(λ1,β)\mathbf{c}^{*}(\lambda_{1},\beta). Then, “mask” 𝐑\mathbf{R} by the ridges 𝐜(λ1,β)\mathbf{c}^{*}(\lambda_{1},\beta) following (8), and denote the masked TFR as 𝐑(λ1,β),Δ\mathbf{R}_{(\lambda_{1},\beta),\Delta}, where the masking size Δ>0\Delta>0 respects the one used in the reconstruction formula (5). Define a sequence (q(λ1,β)(1),,q(λ1,β)(N))(q^{(\lambda_{1},\beta)}(1),\cdots,q^{(\lambda_{1},\beta)}(N)) by q(λ1,β)():=Rα(|𝐑(λ1,β),η(,)|)q^{(\lambda_{1},\beta)}(\ell):=R_{\alpha}(|\mathbf{R}_{(\lambda_{1},\beta),\eta}(\ell,\cdot)|), [N]\ell\in[N]. Finally, we choose

(18) (λ1,β)=argmax(λ1,β)Λ×M=1Nq(λ1,β)()N(\lambda_{1}^{*},\beta^{*})=\operatorname*{argmax}_{(\lambda_{1},\beta)\in\Lambda\times M}\sum_{\ell=1}^{N}\frac{q^{(\lambda_{1},\beta)}(\ell)}{N}

as the optimal (λ1,β)(\lambda_{1},\beta) in Λ×M\Lambda\times M. In practice, Λ\Lambda is chosen to range from 10110^{-1} to 10110^{1} and MM is chosen to range from 262^{-6} to 212^{-1}, both with a uniform grid in the log scale.

4. Numerical simulation

4.1. Single IMT function (L=1L=1)

In this first example, we compare different RD algorithms on signals with complicated WSF dynamics. Denote the smoothed standard Brownian motion X1(t)=WKB(t)X_{1}(t)=W\ast K_{B}(t), where WW is the standard Brownian motion, \ast indicates convolution, and KBK_{B} is the Gaussian function with the standard deviation B=20B=20 seconds. Over [0,50][0,50], set

A1(t)=e(t1030)2(30t|X1(s)|X1L[0,50]𝑑s+52)A_{1}(t)=e^{-\left(\frac{t-10}{30}\right)^{2}}\left(3\int_{0}^{t}\frac{|X_{1}(s)|}{\|X_{1}\|_{L^{\infty}[0,50]}}ds+\frac{5}{2}\right)

and

ϕ(t)=[ϕ0(t)+0tX2(s)X2L[0,50]𝑑s]+UKB(t),\phi_{\ell}(t)=\ell[\phi_{0}(t)+\int_{0}^{t}\frac{X_{2}(s)}{\|X_{2}\|_{L^{\infty}[0,50]}}ds]+U_{\ell}\ast K_{B^{\prime}}(t)\,,

where X2X_{2} is an independent and identical copy of X1X_{1}, UU_{\ell} are independent white Gaussian process with mean 0 and variance 0.10.1, B=5B^{\prime}=5 seconds, and

(19) ϕ0(t)=0.97t+t1.938+320t0se(u252.5)2050e(τ252.5)2𝑑τ𝑑u𝑑s.\phi_{0}(t)=0.97t+\frac{t^{1.9}}{38}+\frac{3}{2}\int^{t}_{0}\int_{0}^{s}\frac{e^{-\left(\frac{u-25}{2.5}\right)^{2}}}{\int_{0}^{50}e^{-\left(\frac{\tau-25}{2.5}\right)^{2}}d\tau}duds\,.

Note that the term UKB(t)U_{\ell}\ast K_{B^{\prime}}(t) models the time-varying WSF. Consider a non-stationary noise Φ\Phi, where for n=1,,5,000n=1,\ldots,5,000, Φ(n)\Phi(n) follows an ARMA(1,1)(1,1) process with Student’s t4 innovation process, and for n=5,000,,10,000n=5,000,\ldots,10,000, Φ(n)\Phi(n) i.i.d. follows a Student’s t5 distribution. We assume Φ\Phi is independent of X1X_{1} and X2X_{2} The simulated signal is defined over [0,50][0,50] with the sampling rate 200200Hz so that

(20) Y1(n):=s1(n/200)+σΦ(n),Y_{1}(n):=s_{1}(n/200)+\sigma\Phi(n),

where n=1,,10,000n=1,\ldots,10,000, s1(t):=A1(t)x1(t)s_{1}(t):=A_{1}(t)x_{1}(t), σ0\sigma\geq 0 is determined by the desired signal-to-noise ratio (SNR), defined as 20log10STD(s1)STD(Φ)20\log_{10}\frac{\texttt{STD}(s_{1})}{\texttt{STD}(\Phi)} with STD standing for the standard deviation, x1(t)=D1cos(2πϕ1(t))+(u1+u2)cos(2πϕ2(t))+u1cos(2πϕ3(t))x_{1}(t)=D_{1}\cos(2\pi\phi_{1}(t))+(u_{1}+u_{2})\cos(2\pi\phi_{2}(t))+u_{1}\cos(2\pi\phi_{3}(t)), D1(0,1]D_{1}\in(0,1] models the intensity of the fundamental component, and u1,u2u_{1},u_{2} are independent random variables uniformly distributed on [0,1)[0,1) that are independent of X(t)X(t) and Φ(n)\Phi(n). See Figure 8 in Section B for a realization of Y1(t)Y_{1}(t).

Then, we realize independently Y1(t)Y_{1}(t) over [0,50][0,50] for 100100 times. For a fair comparison with existing algorithms, we utilize the provided code from the authors [10, 19], following their recommended guidelines.333The code for FM-RD is obtained through private communication with the authors. The code for RRP-RD is available at https://github.com/Nils-Laurent/RRP-RD. However, since these algorithms are not tailored to handle WSFs, we need to make minor adjustments and assume the number of harmonics is known. In the case of FM-RD, we incorporate the peeling step (8) as a post-processing procedure. Essentially, we run FM-RD iteratively and apply TFR masking three times. Among the three extracted curves, we identify the one with the lowest frequency value as the result for the fundamental component’s ridge. Regarding RRP-RD, we concurrently detect three relevant ridge portions on the TF domain and extract the corresponding ridges. From these, we choose the ridge associated with the lowest frequency value as the outcome. If RRP-RD is unsuccessful in identifying three ridge portions, we rerun it to detect two RRPs. If this attempt also fails, we proceed with a single RRP. Finally, we assign the ridge with the lowest frequency as the final outcome.

Let c1()c_{1}^{(\square)} be the detected ridge by the algorithm \square, which could be S, FM, RRP or MH, standing for the Single-RD, FM-RD, RRP-RD or SAMD-MHRD. To compare different algorithms, we consider the following metric:

(21) δ():=ϕ1c1()2/ϕ12.\delta^{(\square)}:=\|\phi_{1}^{\prime}-c_{1}^{(\square)}\|_{2}/\|\phi_{1}^{\prime}\|_{2}\,.

We run the Wilcoxon signed-rank test on 100100 realizations to test the null hypothesis that the median of δ()δ(MH)\delta^{(\square)}-\delta^{(\textsf{MH})} is zero, with a significance level of 0.050.05 and Bonferroni correction.

The results are depicted in Fig. 4, illustrating the distributions of δ(){\delta^{(\square)}} for a fixed D1D_{1} (rows: D1=0.1D_{1}=0.1, D1=0.2D_{1}=0.2, D1=0.5D_{1}=0.5) and a constant noise level (columns: noise-free, SNR = 5 dB, SNR = 0 dB) with asterisks indicating instances where p<0.05/27p<0.05/27. The outcomes reveal that in the absence of noise, all algorithms yield comparable results. However, when encountering scenarios with a weak fundamental component and noise interference, FM-RD and RRP-RD exhibit limitations compared to SAMD-MHRD. More numerical results for RRP-RD with different number of ridge portions can be found in Section B.

Refer to caption
Figure 4. Log-scale distributions of δ()\delta^{(\square)} for various RD algorithms. Top to bottom rows correspond to D1=0.1D_{1}=0.1, D1=0.2D_{1}=0.2, and D1=0.5D_{1}=0.5, respectively. From left to right columns: noise-free, SNR = 5 dB, and SNR = 0 dB.

4.2. Multiple IMT functions (L>1L>1)

Following the same notations used in Section 4.1, we consider the case when L=2L=2. Denote X3X_{3} and X4X_{4} as two independent copies of the smoothed standard Brownian motion and are independent of X1X_{1}, X2X_{2}, Φ\Phi, u1u_{1} and u2u_{2}. Over [0,50][0,50], set two random processes

A2(t):=e(t4025)1.8(30t|X3(s)|X3L[0,50]𝑑s+2.3)A_{2}(t):=e^{-\left(\frac{t-40}{25}\right)^{1.8}}\left(3\int_{0}^{t}\frac{|X_{3}(s)|}{\|X_{3}\|_{L^{\infty}[0,50]}}ds+2.3\right)

and

ϕ2(t)=ϕ0(t)+(ϕ(t)ϕ0(t))+0tX4(s)X4L[0,50]𝑑s,\phi_{2}(t)=\phi_{0}^{*}(t)+(\phi(t)-\phi_{0}(t))+\int_{0}^{t}\frac{X_{4}(s)}{\|X_{4}\|_{L^{\infty}[0,50]}}ds\,,

where ϕ0(t)=2.33t+0.2t2\phi_{0}^{*}(t)=2.33t+0.2t^{2}. The second simulated signal is defined over [0,50][0,50] with the sampling rate 200200Hz so that

(22) Y2(n):=s1(n/200)+s2(n/200)+σΦ(n),Y_{2}(n):=s_{1}(n/200)+s_{2}(n/200)+\sigma\Phi(n),

where s2(t):=A2(t)x2(t)s_{2}(t):=A_{2}(t)x_{2}(t), x2(t)=D2cos(2πϕ1(t))+(u1+u2)cos(2πϕ2(t))+(u1)cos(2πϕ3(t))]x_{2}(t)=D_{2}\cos(2\pi\phi_{1}^{*}(t))+(u_{1}^{*}+u_{2}^{*})\cos(2\pi\phi_{2}^{*}(t))+(u_{1}^{*})\cos(2\pi\phi_{3}^{*}(t))\big{]}, D2(0,1]D_{2}\in(0,1] models the intensity of the fundamental component, u1u_{1}^{*} and u2u_{2}^{*} are the independent copies of u1u_{1} and σ0\sigma\geq 0 is again determined by the desired SNR. Then we realize 100 copies of Y2Y_{2} for each selected SNR.

See Fig. 5 for the RD results and a decomposition example of SAMD-MHRD. Both IMT functions are successfully recovered. We then run SAMD-MHRD and evaluate the reconstruction error of the jj-th IMT function as

(23) δj[q]=sjs^j[q]2/sj2,\delta^{[q]}_{j}=\|s_{j}-\widehat{s}^{[q]}_{j}\|_{2}/\|s_{j}\|_{2}\,,

where the superscript qq indicates the qq-th iteration. See Fig. 6 for the comparison of δj[1]\delta^{[1]}_{j} and δj[3]\delta^{[3]}_{j} under different SNRs. As SNR decreases, errors increase. Additionally, the second IMT exhibits significantly larger recovery error than the first IMT, as tested with the Wilcoxon test at a significance level of 0.0050.005, which is reasonable due to its dependence on the harmonics of the first IMT. Iteration improvement is statistically significant except for the second IMT at high noise levels.

Refer to caption
Figure 5. (a) A signal fulfilling the ANHM with L=2L=2 contaminated by noise Φ\Phi with an SNR of 5dB is shown as the grey curve. The clean signal is superimposed as the red curve. (b)-Left: the 2nd-order SST of the noisy signal. (b)-Middle: the true IFs of the first three harmonics of the first IMT function are superimposed as blue-dashed curves. (b)-Right: the 2nd-order SST of the extracted first IMT function by SAMD-MHRD with 3 iterations. (c) and (d): The reconstructed first and second IMT functions are shown as the red curves and the true IMT functions are superimposed as the black curves. (e): The superposition of the reconstructed IMT functions is shown as the red curve and the clean signal is superimposed as the black curve.
Refer to caption
Figure 6. Errorbars of 100 realizations of log(δ1[q])\log(\delta_{1}^{[q]}) (left) and log(δ2[q])\log(\delta_{2}^{[q]}) (right), where δj[q]\delta_{j}^{[q]} is defined in (23). Blue: first iteration (q=1q=1), pink: third iteration (q=3q=3). Asterisk: recovery error of IMT2 is significantly greater than that of IMT1 at the same noise level. Circle: log(δ1[1])\log(\delta_{1}^{[1]}) is significantly greater than log(δ1[3])\log(\delta_{1}^{[3]}) at the same noise level.

5. Application to walking prediction from IMU

We now apply the proposed algorithm to detect the walking activity from the IMU signal. The data obtained from the triaxial accelerometer signals are denoted as fx[loc](t)f^{[\texttt{loc}]}_{x}(t), fy[loc](t)f^{[\texttt{loc}]}_{y}(t), and fz[loc](t)f^{[\texttt{loc}]}_{z}(t), where loc indicates the sensor’s placement location, and xx, yy, and zz represent the axes. We focus on the physical activity signal, defined as

Y[loc](t)=[fx[loc](t)]2+[fy[loc](t)]2+[fz[loc](t)]2.Y^{[\texttt{loc}]}(t)=\sqrt{[f^{[\texttt{loc}]}_{x}(t)]^{2}+[f^{[\texttt{loc}]}_{y}(t)]^{2}+[f^{[\texttt{loc}]}_{z}(t)]^{2}}\,.

To detect the walking activity, we modify the ANHM model in (3) with L=1L=1 slightly to account for Y[loc](t)Y^{[\texttt{loc}]}(t). Specifically, we express it as follows:

(24) Y[loc](t)=q=1Qj=1Daq,j(t)cos(2πϕq,j(t))χIq+Φ(t),Y^{[\texttt{loc}]}(t)=\sum_{q=1}^{Q}\sum_{j=1}^{D}a_{q,j}(t)\cos(2\pi\phi_{q,j}(t))\chi_{I_{q}}+\Phi(t)\,,

where IqI_{q}, q=1,,Qq=1,\ldots,Q, are disjoint connected intervals in \mathbb{R} that describes the wax-and-wane effect of walking; that is, a subject might walk for a bit, stop for a while, and continue to walk. Physically, j=1aq,j(t)2\sqrt{\sum_{j=1}^{\infty}a_{q,j}(t)^{2}} reflects the intensity of the walking activity over IqI_{q}, with both strength and pattern exhibiting variations between different steps. The walking activity encompasses a fusion of activities across body locations, resulting in the dependence of aq,j(t)a_{q,j}(t) on sensor placement and synchronization of ϕq,1(t)\phi_{q,1}(t) across sensors. Even within the same location, ϕq,1(t)\phi_{q,1}(t), ϕq,1(t)\phi_{q,1}^{\prime}(t), and aq,j(t)a_{q,j}(t) can differ between different walking bouts. For instance, the walking pattern on a flat surface during interval IiI_{i} might differ from the pattern during stair climbing in interval IjI_{j}. This model reduces to (3) with L=1L=1 when Q=1Q=1, and has been used in [42] for cadence estimation.

Finally, we make an assumption regarding the durations of walking intervals. The definition of “walking” lacks universal consensus [38], leading to questions about whether a mere one or two steps qualify as “walking” and what is the number of continuous steps needed to classify an interval as “walking”. While our work does not aim to address this scientific matter, we provide a mathematical definition based on the properties of the TF analysis tools we utilize. Building upon (C1)-(C8), we introduce the following assumption:

  • (C9)

    For each q=1,,Qq=1,\ldots,Q, we have |Iq|>C/ϕq,1(t)|I_{q}|>C/\phi_{q,1}^{\prime}(t) for all tIqt\in I_{q} and a sufficiently large C>1C>1.

That is, we require |Iq|>ϕq,11(2πC)ϕq,11(0)|I_{q}|>\phi_{q,1}^{-1}(2\pi C)-\phi_{q,1}^{-1}(0), as the phase advances by 2π2\pi after each cycle. Empirical evidence suggests that, to reliably estimate the IF of an oscillatory signal using STFT or SST, the window function should span around 88 to 1010 oscillatory cycles. Therefore, we designate a process as “walking” only if a subject performs more than 88 continuous steps by setting C=8C=8. Note that IqI_{q} can vary depending on the walking pace; faster cadence leads to shorter IqI_{q}. This definition aligns with the concept of sustained harmonic walking (SHW) presented in [38], where SHW involves walking for at least 1010 seconds (in line with C9) with minimal variability in step frequency (in line with C2).

In practice, I1,,IQI_{1},\ldots,I_{Q} are unknown. The main signal processing mission we consider in this section is estimating I1,,IQI_{1},\ldots,I_{Q} from one realization of the random process YY (or recorded accelerometer signal).

5.1. Database

We analyze the Indiana University Walking and Driving Study (IUWDS)444https://physionet.org/content/accelerometry-walk-climb-drive/1.0.0/ dataset, collected in 2015 from n=32n=32 individuals (19 females) aged 21 to 51 years. This study took place in a semi-controlled environment, where participants were directed to follow a route involving walking on flat ground and climbing stairs. Data was recorded using ActiGraph GT3X+ accelerometers at 100 Hz. Fig. 2 provides an example of this dataset. The study was ethically approved by the IRB of Indiana University, and participants provided informed written consent. The dataset comprises recordings from four locations: wrist (wr), hip (hi), left ankle (la), and right ankle (ra), alongside expert annotations.

5.2. Our proposed index for walking detection

To harness the distinctive properties of WSF, we propose employing SAMD-MHRD for devising a walking detection metric. Denote the extracted IFs of the harmonics by SAMD-MHRD as 𝐜K×N\mathbf{c}\in\mathbb{R}^{K\times N}, with KK\in\mathbb{N} representing the user-specified number of harmonics. We introduce the SST walking strength index (SST-WSI) at time Δt\ell\Delta_{t} (where =1,,N\ell=1,\ldots,N) as follows:

SSST(Δt):=\displaystyle{\textsf{S}}_{\texttt{SST}}(\ell\Delta_{t}):= k=1Q|jB(k)𝐒(Δt,j)|[q=1M|𝐒(Δt,q)|]1,\displaystyle\,\sum_{k=1}^{Q}\bigg{|}\sum_{j\in B^{(k)}_{\ell}}\mathbf{S}(\ell\Delta_{t},\,j)\bigg{|}\left[\sum_{q=1}^{M}\left|\mathbf{S}(\ell\Delta_{t},\,q)\right|\right]^{-1},

where 𝐒\mathbf{S} is the SST of Y[loc](t)Y^{[\texttt{loc}]}(t),

(25) B(k)=\displaystyle B^{(k)}_{\ell}= {q:max{1,ek𝐜()b}q\displaystyle\,\left\{q:\,\max\{1,\,e_{k}^{\top}\mathbf{c}(\ell)-b\}\leq q\right.
min{M,ek𝐜()+b}}[M],\displaystyle\quad\left.\leq\min\{M,\,e_{k}^{\top}\mathbf{c}(\ell)+b\}\right\}\subset[M],

𝐜\mathbf{c} is determined by SAMD-MHRD, bb\in\mathbb{N} is the bandwidth chosen by the user, and [fs2Mq=1M|𝐒(Δt,q)|]1\left[\frac{f_{s}}{2M}\sum_{q=1}^{M}\left|\mathbf{S}(\ell\Delta_{t},\,q)\right|\right]^{-1} is a normalization factor. We set bb so that B(k)B^{(k)}_{\ell} and B(k1)B^{(k-1)}_{\ell} are sufficiently separate. An intuitive interpretation of SST-WSI is capturing the ratio of energy associated with the walking activity. Specifically, the extracted curves e1𝐜,,eK𝐜{e_{1}^{\top}\mathbf{c},\cdots,e_{K}^{\top}\mathbf{c}} correspond to the most prominent features in the TFR, effectively capturing instances of walking activity. See Section D.1.1 for a justification of SST-WSI. Since the WSF depends on the subject, sensor location and the walking pattern, DD needed for the walking activity detection depends on these parameters. To simplify the discussion, we fix an empirical DD in this work.

5.3. Results

The number of subjects is 3232, with the numbers of walking and non-walking segments in the database 21,24521,245 and 38,62338,623. We consider four existing walking detection indices, including the SHW index, the Entropy Ratio index, the Hilbert transform based index called the Hilbert-WSI, and the index of freezing of gait called FOG-WSI [26]. Details of these indices’ implementation can be found in Section D.1 in the supplementary materials. For the SST-WSI index, we set D=8D=8 and b=0.08b=0.08Hz.

We use Leave One Subject Out Cross Validation (LOSOCV) to evaluate the performance. In each round, one subject is used as the test case while the others determine the threshold. This threshold is then applied to the test subject, and performance is recorded. Results are shown in Fig. 7. For Hilbert-WSI, some validation results give zero true positives, and we define its F1-score to be 0. We use the Wilcoxon signed-rank test (p=0.05p=0.05) with Bonferroni correction to assess significance between indices. SST-WSI and Entropy-Ratio are the top-performing indices, with SST-WSI consistently better.

Finally, we combine SST-WSI and Entropy-Ratio using a support vector machine. Fig. 7 presents the LOSOCV results, indicating that this combination outperforms other indices. Additional results for different sensor placements can be found in Section D.2.

Refer to caption
Figure 7. The performance records of conducting LOSOCV on analyzing the WRIST signal. Left: The accuracy distributions. Right: The F1-score distributions. The asterisk (resp. square and pentagram) marks indicate the performance is lower than that of SST-WSI with statistical significance (resp. Entropy-Ratio group and SST-WSI+Entropy-Ratio, denoted as S+E) with the pp-value less than 0.05/18=0.00280.05/18=0.0028.

6. Discussion and Conclusion

We introduce a novel RD algorithm, SAMD-MHRD, designed for both STFT and SST, to effectively manage nonstationary signals featuring intricate non-sinusoidal WSFs. Numerical investigations and real-world applications showcase its practical promise. This endeavor can be seen as an extension of [40], filling the RD gap toward signal processing missions.

Naturally, one may question its adaptability to other TFRs, such as the commonly used CWT. The CWT’s inherent affine structure can be limiting for oscillatory signals with non-sinusoidal WSFs, as it requires a small scale to capture higher-order harmonics while dealing with broad spectral support, leading to spectral interference among these harmonics. On the other hand, SAMD-MHRD may be applicable to other TF analysis algorithms that can separate harmonics within their TFRs. Further exploration in this area is needed.

Another strategy for handling non-sinusoidal WSFs involves using de-shape STFT [21]. This method aims to mitigate the impact of harmonics in complex situations and could potentially be used to extract the fundamental IF of each IMT function. However, its success relies on the presence of a strong fundamental component. In cases where the fundamental component is weak or absent, preprocessing steps like rectification or nonlinear transforms [36], or the application of a comb filter as explored in [38], might be necessary to enhance the fundamental IF before applying de-shape STFT. Note that the fundamental component can be absent since s(t)=k=1αkcos(2πkt+βk)s(t)=\sum_{k=1}^{\infty}\alpha_{k}\cos(2\pi kt+\beta_{k}), where αk0\alpha_{k}\geq 0 and βk[0,2π)\beta_{k}\in[0,2\pi) is 11-periodic if the greatest common divisor of {kαk>0}\{k\mid\alpha_{k}>0\} is 11. We plan to explore the development of an RD algorithm based on this concept in our future work.

Several challenges and open questions remain in this study. Firstly, we assume knowledge of LL to simplify the analysis. Extending existing tools [37, 32, 19, 31] to handle more general WSF cases is a promising avenue. Secondly, our assumption that the IFs of fundamental components do not overlap needs refinement to address the common scenario of mode mixing. Extending the RD algorithm to accommodate the synchrosqueezed chirplet transform [9] could be a solution for this challenge. The walking detection problem is essentially a change point detection problem. We can potentially generalize the existing bootstrapping-based change point detection algorithm by integrating information related to IF and IA and techniques from [31] to yield a more accurate index. These promising directions will be explored in our future work.

Acknowledgement

The authors would like to thank the anonymous reviewers for their constructive feedbacks.

References

  • [1] Aymen Alian, Yu-Lun Lo, Kirk Shelley, and Hau-Tieng Wu. Reconsider phase reconstruction in signals with dynamic periodicity from the modern signal processing perspective. Foundations of Data Science, 4(3):355–393, 2022.
  • [2] Aymen Alian, Kirk Shelley, and Hau-Tieng Wu. Amplitude and phase measurements from harmonic analysis may lead to new physiologic insights: lower body negative pressure photoplethysmographic waveforms as an example. J. Clin. Monit. Comput., 37(1):127–137, 2023.
  • [3] François Auger and Patrick Flandrin. Improving the readability of time-frequency and time-scale representations by the reassignment method. IEEE Trans. Signal Process, 43(5):1068–1089, 1995.
  • [4] Richard G Baraniuk, Patrick Flandrin, Augustus JEM Janssen, and Olivier JJ Michel. Measuring time-frequency information content using the Rényi entropies. IEEE Trans. Inf. Theory, 47(4):1391–1409, 2001.
  • [5] Peter J Brockwell and Richard A Davis. Time series: theory and methods. Springer science & business media, 2009.
  • [6] René A Carmona, Wen L Hwang, and Bruno Torrésani. Characterization of signals by the ridges of their wavelet transforms. IEEE Trans. Signal Process, 45(10):2586–2590, 1997.
  • [7] René A Carmona, Wen L Hwang, and Bruno Torrésani. Multiridge detection and time-frequency reconstruction. IEEE Trans. Signal Process, 47(2):480–492, 1999.
  • [8] Yu-Chun Chen, Ming-Yen Cheng, and Hau-Tieng Wu. Non-parametric and adaptive modelling of dynamic periodicity and trend with heteroscedastic and dependent errors. J. R. Stat. Soc. Ser. B. Stat. Methodol., 76(3):651–682, 2014.
  • [9] Ziyu Chen and Hau-Tieng Wu. Disentangling modes with crossover instantaneous frequencies by synchrosqueezed chirplet transforms, from theory to application. Appl. Comput. Harmon. Anal., 62:84–122, 2023.
  • [10] Marcelo A. Colominas, Sylvain Meignen, and Duong-Hung Pham. Fully adaptive ridge detection based on stft phase information. IEEE Signal Processing Letters, 27:620–624, 2020.
  • [11] Marcelo A Colominas and Hau-Tieng Wu. Decomposing non-stationary signals with time-varying wave-shape functions. IEEE Trans. Signal Process, 69:5094–5104, 2021.
  • [12] Ingrid Daubechies, Jianfeng Lu, and Hau-Tieng Wu. Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool. Appl. Comput. Harmon. Anal., 30(2):243–261, 2011.
  • [13] Nathalie Delprat, Bernard Escudié, Philippe Guillemain, Richard Kronland-Martinet, Philippe Tchamitchian, and Bruno Torresani. Asymptotic wavelet and gabor analysis: Extraction of instantaneous frequencies. IEEE Trans. Inf. Theory, 38(2):644–664, 1992.
  • [14] Anna-Maria Eid, Mohamed Elgamal, Antonio Gonzalez-Fiol, Kirk H Shelley, Hau-Tieng Wu, and Aymen Awad Alian. Using the ear photoplethysmographic waveform as an early indicator of central hypovolemia in healthy volunteers utilizing lbnp induced hypovolemia model. Physiol. Meas., 2023.
  • [15] Patrick Flandrin. Time-frequency/time-scale analysis. Academic press, 1998.
  • [16] Peter Hall, Wei Qian, and DM Titterington. Ridge finding from noisy data. Journal of Computational and Graphical Statistics, 1(3):197–211, 1992.
  • [17] D. Iatsenko, P.V.E. McClintock, and A. Stefanovska. Extraction of instantaneous frequencies from ridges in time–frequency representations of signals. Signal Processing, 125:290–303, 2016.
  • [18] Matthieu Kowalski, Adrien Meynard, and Hau-tieng Wu. Convex optimization approach to signals with fast varying instantaneous frequency. Applied and computational harmonic analysis, 44(1):89–122, 2018.
  • [19] Nils Laurent and Sylvain Meignen. A novel ridge detector for nonstationary multicomponent signals: Development and application to robust mode retrieval. IEEE Trans. Signal Process, 69:3325–3336, 2021.
  • [20] Quentin Legros and Dominique Fourer. A novel pseudo-bayesian approach for robust multi-ridge detection and mode retrieval. In 2021 29th European Signal Processing Conference (EUSIPCO), pages 1925–1929, 2021.
  • [21] Chen-Yun Lin, Li Su, and Hau-Tieng Wu. Wave-shape function analysis–when cepstrum meets time-frequency analysis. J. Fourier Anal. Appl., 24(2):451–505, 2018.
  • [22] S. Meignen, T. Gardner, and T. Oberlin. Time-frequency ridge analysis based on the reassignment vector. In 2015 23rd European Signal Processing Conference (EUSIPCO), pages 1486–1490, 2015.
  • [23] Sylvain Meignen, Thomas Oberlin, Philippe Depalle, Patrick Flandrin, and Stephen McLaughlin. Adaptive multimode signal reconstruction from time–frequency representations. Philos. Trans. R. Soc. A-Math. Phys. Eng. Sci., 374(2065):20150205, 2016.
  • [24] Sylvain Meignen, Duong-Hung Pham, and Stephen McLaughlin. On demodulation, ridge detection, and synchrosqueezing for multicomponent signals. IEEE Trans. Signal Process, 65(8):2093–2103, 2017.
  • [25] Adrien Meynard and Hau-Tieng Wu. An efficient forecasting approach to reduce boundary effects in real-time time-frequency analysis. IEEE Transactions on Signal Processing, 69:1653–1663, 2021.
  • [26] Steven T Moore, Hamish G MacDougall, and William G Ondo. Ambulatory monitoring of freezing of gait in parkinson’s disease. Journal of neuroscience methods, 167(2):340–348, 2008.
  • [27] Thomas Oberlin, Sylvain Meignen, and Valérie Perrier. Second-order synchrosqueezing transform or invertible reassignment? towards ideal time-frequency representations. IEEE Trans. Signal Process, 63(5):1335–1344, 2015.
  • [28] Alan V. Oppenheim and Ronald W. Schafer. From frequency to quefrency: A history of the cepstrum. IEEE Signal Processing Magazine, 21(5):95–106, 2004.
  • [29] N. Ozkurt and F.A. Savaci. Determination of wavelet ridges of nonstationary signals by singular value decomposition. IEEE Trans. Circuits Syst. II, 52(8):480–485, 2005.
  • [30] Benjamin Ricaud and Bruno Torrésani. A survey of uncertainty principles and some signal processing applications. Advances in Computational Mathematics, 40:629–650, 2014.
  • [31] Joaquin Ruiz and Marcelo A Colominas. Wave-shape function model order estimation by trigonometric regression. Signal Processing, 197:108543, 2022.
  • [32] N. Saulig, N. Pustelnik, P. Borgnat, P. Flandrin, and V. and Sucic. Instantaneous counting of components in nonstationary signals. In Proc. 21st Eur. Signal Process. Conf. 2013, pages 1–5, 2013.
  • [33] Yae-Lin Sheu, Liang-Yan Hsu, Pi-Tai Chou, and Hau-Tieng Wu. Entropy-based time-varying window width selection for nonlinear-type time–frequency analysis. Int. J. Data. Sci. Anal., 3:231–245, 2017.
  • [34] Matt Sourisseau, Hau-Tieng Wu, and Zhou Zhou. Asymptotic analysis of synchrosqueezing transform—toward statistical inference with nonlinear-type time-frequency analysis. The Annals of Statistics, 50(5):2694–2712, 2022.
  • [35] Carsten Steger. An unbiased detector of curvilinear structures. IEEE Transactions on pattern analysis and machine intelligence, 20(2):113–125, 1998.
  • [36] Stefan Steinerberger and Hau-Tieng Wu. Fundamental component enhancement via adaptive nonlinear activation functions. Appl. Comput. Harmon. Anal., 2022.
  • [37] V. Sucic, N. Saulig, and B. Boashash. Estimating the number of components of a multicomponent nonstationary signal using the short-term time-frequency Rényi entropy. EURASIP Journal on Advances in Signal Processing, 2011(1):1–11, 2011.
  • [38] Jacek K Urbanek, Vadim Zipunnikov, Tamara Harris, William Fadel, Nancy Glynn, Annemarie Koster, Paolo Caserotti, Ciprian Crainiceanu, and Jaroslaw Harezlak. Prediction of sustained harmonic walking in the free-living environment using raw accelerometry data. Physiol. Meas., 39(2):02NT02, 2018-02-28.
  • [39] Shen-Chih Wang, Chien-Kun Ting, Cheng-Yen Chen, Chinsu Liu, Niang-Cheng Lin, Che-Chuan Loong, Hau-Tieng Wu, and Yu-Ting Lin. Arterial blood pressure waveform in liver transplant surgery possesses variability of morphology reflecting recipients’ acuity and predicting short term outcomes. J. Clin. Monit. Comput., pages 1–11, 2023.
  • [40] H.-T. Wu. Instantaneous frequency and wave shape functions (I). Appl. Comput. Harmon. Anal., 35:181–199, 2013.
  • [41] Hau-Tieng Wu. Current state of nonlinear-type time–frequency analysis and applications to high-frequency biomedical signals. Current Opinion in Systems Biology, 23:8–21, 2020.
  • [42] Hau-Tieng Wu and Jaroslaw Harezlak. Application of de-shape synchrosqueezing to estimate gait cadence from a single-sensor accelerometer placed in different body locations. Physiol. Meas., 44(5):055009, 2023.
  • [43] Xiangxiang Zhu, Zhuosheng Zhang, Jinghuai Gao, and Wenting Li. Two robust approaches to multicomponent signal reconstruction from stft ridges. Mechanical Systems and Signal Processing, 115:720–735, 2019.

Appendix A Numerical implementation of TF analysis tools

We summarize the numerical implementation of STFT and SST in this section. Suppose the signal ff is discretized with a sampling rate fsf_{s} over the time interval [0,T][0,T], where T>0T>0, denoted as 𝐟\mathbf{f}; that is, 𝐟=(f(0),f(1/fs),,f(T/fs))\mathbf{f}=(f(0),f(1/f_{s}),\cdots,f(T/f_{s})) with N=T/fsN=\lfloor T/f_{s}\rfloor, where x\lfloor x\rfloor means the largest integer smaller or equal to xx\in\mathbb{R}. Denote Δt=1/fs>0\Delta_{t}=1/f_{s}>0 as the discretization grid in the time domain. Denote Δξ>0\Delta_{\xi}>0 as the frequency resolution chosen by user. Then, the discretization of STFT of ff is given by

(26) 𝐕(,q)=Vf(h)(Δt,qΔξ),\mathbf{V}(\ell,q)=V_{f}^{(h)}(\ell\Delta_{t},q\Delta_{\xi})\,,

where =1,,N\ell=1,\ldots,N is the total number of samples in the time domain and q=1,,Mq=1,\ldots,M, where M=fs2ΔξM=\lfloor\frac{f_{s}}{2\Delta_{\xi}}\rfloor is the total number of ticks in the frequency domain. In other words, the discretization of the TFR determined by STFT is a complex matrix 𝐕N×M\mathbf{V}\in\mathbb{C}^{N\times M}. Note that we skip evaluating STFT at frequency 0.

Numerically, this discretization is carried out in the following way. Discretize the window hh and denote it as 𝐡2K+1\mathbf{h}\in\mathbb{R}^{2K+1}, where KK\in\mathbb{N} and 2K+12K+1 indicates the window length. A concrete example is discretizing the Gaussian with the standard deviation σ>0\sigma>0 over the interval [0.5,0.5][-0.5,0.5] with a discretization grid size of 12K\frac{1}{2K} as the window by setting

(27) 𝐡(k)=e(k12K0.5)22σ2\displaystyle\mathbf{h}(k)=e^{-\frac{\left(\frac{k-1}{2K}-0.5\right)^{2}}{2\sigma^{2}}}

for k=1,,2K+1k=1,\ldots,2K+1. Note that the discretization grid size 12K\frac{1}{2K} may be different from Δt\Delta_{t}. As the rule of thumb, we choose KK so that the window covers about 8108\sim 10 cycles if the data is not too noisy, and choose a larger KK so that the window covers more, like 152015\sim 20 cycles if the data is noisy. Usually, if the Gaussian function is taken as the window, we suggest to choose σ\sigma so that the Gaussian function is “essentially supported” on [0.5,0.5][-0.5,0.5]; that is, 0.50.5et22σ2𝑑t\int_{-0.5}^{0.5}e^{-\frac{t^{2}}{2\sigma^{2}}}dt is close to 11. We mention that it is possible to a time-dependent σ\sigma to enhance the overall performance. For example, take the Reńyi entropy as a metric to determine the optimal σ\sigma [33]. We skip this detail and refer readers to [33] for details. To implement 𝐕\mathbf{V}, extend 𝐟\mathbf{f} beyond 1,,N1,\ldots,N so that 𝐟(l):=0\mathbf{f}(l):=0 when l<1l<1 or l>Nl>N. It is possible to extend 𝐟\mathbf{f} by forecasting to avoid the boundary effect [25], but we skip this technical detail to simplify the discussion. Then, compute

(28) 𝐕(n,m)=k=12K+1𝐟(n+kK1)𝐡(k)ei2π(k1)m2M,\mathbf{V}(n,m)=\sum_{k=1}^{2K+1}\mathbf{f}(n+k-K-1)\mathbf{h}(k)e^{-i2\pi\frac{(k-1)m}{2M}},

where n=1,,Nn=1,\ldots,N is the index in the time domain and m=1,,Mm=1,\ldots,M is the index in the frequency domain.

The numerical implementation of SST is denoted as 𝐒N×M\mathbf{S}\in\mathbb{C}^{N\times M}, which is a nonlinear transform of STFT. Here we focus on its numerical implementation and refer readers to [12, 8] for its theory in the continuous setup. The main idea beyond SST is utilizing the phase information hidden in STFT. Consider the discretization of the derivative of the window function hh, which is denoted as 𝐡2K+1\mathbf{h}^{\prime}\in\mathbb{R}^{2K+1}. In the Gaussian window case, it is defined as

𝐡(k)=(k12K0.5)𝐡(k)σ2.\mathbf{h}^{\prime}(k)=-\left(\frac{k-1}{2K}-0.5\right)\frac{\mathbf{h}(k)}{\sigma^{2}}\,.

Then, compute another STFT of 𝐟\mathbf{f} with the window 𝐡\mathbf{h}^{\prime}, denoted as 𝐕N×M\mathbf{V}^{\prime}\in\mathbb{C}^{N\times M}, by

(29) 𝐕(n,m)=k=12K+1𝐟(n+kK1)𝐡(k)ei2π(k1)m2M,\mathbf{V}^{\prime}(n,m)=\sum_{k=1}^{2K+1}\mathbf{f}(n+k-K-1)\mathbf{h}^{\prime}(k)e^{\frac{-i2\pi(k-1)m}{2M}}\,,

where n=1,,Nn=1,\ldots,N is the index in the time domain and m=1,,Mm=1,\ldots,M is the index in the frequency domain. Next, compute the frequency reassignment operator, denoted as ΩN×M{\Omega}\in\mathbb{C}^{N\times M}, which is defined as

(32) Ω(n,m)={(N2π(2K+1)𝐕(n,m)𝐕(n,m)) when |𝐕(n,m)|>υ when |𝐕(n,m)|υ,\displaystyle{\Omega}(n,m)=\left\{\begin{array}[]{ll}-\Im\left(\frac{N}{2\pi(2K+1)}\frac{\mathbf{V}^{\prime}(n,m)}{\mathbf{V}(n,m)}\right)&\mbox{ when }|\mathbf{V}(n,m)|>\upsilon\\ -\infty&\mbox{ when }|\mathbf{V}(n,m)|\leq\upsilon\,,\end{array}\right.

where \Im is the operator of evaluating the imaginary part and υ>0\upsilon>0 is a threshold chosen by the user. In practice, υ\upsilon is set to avoid possible blowup situation that |𝐕(n,m)|0|\mathbf{V}(n,m)|\approx 0 and |𝐕(n,m)||\mathbf{V}^{\prime}(n,m)| is large, and can be set to 1010 times of the machine epsilon. Ω(n,m){\Omega}(n,m) gives the instantaneous frequency information of some IMT inside the signal at time nΔtn\Delta_{t} and frequency mΔξm\Delta_{\xi}. See [12, 8, 34] for more technical details. Finally, the SST of 𝐟\mathbf{f} is computed by

(33) 𝐒(n,m)=l;m=lΩ(n,m)𝐕(n,l),\displaystyle\mathbf{S}(n,m)=\sum_{l;\,m=l-{\Omega}(n,m)}\mathbf{V}(n,l)\,,

where n=1,,Nn=1,\ldots,N is the index in the time domain and m=1,,Mm=1,\ldots,M is the index in the frequency domain. In short, to detect if the input signal oscillates at frequency mΔξm\Delta_{\xi}, we find all STFT coefficients sharing the same instantaneous frequency information by searching over Ω\Omega, sum them together, and put the result in the mm-th entry of the nn-th row of 𝐒\mathbf{S}. This step is called “squeezing”. Since the squeezing happens at a fixed time, this motivated the “synchro” part of the nomination SST.

In this paper, we denote the discretized TFR of ff, either determined by STFT or SST, as 𝐑N×M\mathbf{R}\in\mathbb{C}^{N\times M}.

Appendix B More details of numerical simulation

In this section, we provide more details of numerical simulation in Section 4. A realization of the simulated signal with a signal-to-noise ratio of 0 dB, as discussed in Section IV.A, is shown in Figure 8. For RRP-RD, a key parameter is the number of ridge portions, set to 33 in the main article. However, noise can cause ridges to split into different ridge portions. We extended the simulation to test various numbers of ridge portions, with results shown in Figure 9. We used Wilcoxon’s signed-rank test with a significance level of 0.050.05 and Bonferroni correction to compare RRP-RD performance across different ridge portions. The results show that performance with 2 ridge portions is significantly lower than with 33 or 44. Increasing from 33 to 44 ridge portions slightly worsens performance, particularly when D1D_{1} is small. In some cases the difference is significant.

Refer to caption
Figure 8. A realization of the simulated signal with a signal-to-noise ratio of 0 dB, as discussed in Section IV.A, is shown. The red curve is the clean signal, and the gray curve is the noisy signal. For better visualization, the segment from the 15th to the 35th second is displayed.
Refer to caption
Figure 9. Additional log-RMSE distributions for the RRP ridge detection method with varying numbers of ridge portions. The box plot displays log-RMSE, with the median represented by a thick black bar. The notation δpRRP\delta_{p}^{\textsf{RRP}} denotes the log-RMSE of the detected fundamental IF curve given by the RRP RD algorithm, with the number of ridge set to be pp, where we have tested on p=2p=2, 33 and 44. Top to bottom rows: D1=0.1D_{1}=0.1, D1=0.2D_{1}=0.2 and D1=0.5D_{1}=0.5, respectively. Left to right columns: noise-free, SNR = 5 dB and SNR = 0 dB, respectively. The asterisk symbol indicates that the median of δpRRP\delta_{p}^{\textsf{RRP}}, where p=3,4p=3,4, is significantly lower than δ2RRP\delta_{2}^{\textsf{RRP}}. The circle symbol indicates that the median of δ4RRP\delta_{4}^{\textsf{RRP}} is significantly higher than δ3RRP\delta_{3}^{\textsf{RRP}}.

Appendix C Proof of Proposition 3.1

Proof.

Let 𝐡[M](K1)×N\mathbf{h}\in[M]^{(K-1)\times N} so that its kk-th row is 𝐡k\mathbf{h}_{k}. Then,

c1(k+1)(ek𝐜)c1(k+1)(ek𝐡)=maxh:[N][M]c1(k+1)(h),\mathcal{L}_{c_{1}}^{(k+1)}(e_{k}^{\top}\mathbf{c}^{*})\leq\mathcal{L}_{c_{1}}^{(k+1)}(e_{k}^{\top}\mathbf{h})=\max_{h:[N]\rightarrow[M]}\mathcal{L}_{c_{1}}^{(k+1)}(h)\,,

where ekK1e_{k}\in\mathbb{R}^{K-1} is a unit vector with ek(k)=1e_{k}(k)=1. Thus,

max𝐜:[N][M]K1k=1K1c1(k+1)(ek𝐜)=\displaystyle\max_{\mathbf{c}:[N]\rightarrow[M]^{K-1}}\sum_{k=1}^{K-1}\mathcal{L}_{c_{1}}^{(k+1)}(e_{k}^{\top}\mathbf{c})= k=1K1c1(k+1)(ek𝐜)\displaystyle\,\sum_{k=1}^{K-1}\mathcal{L}_{c_{1}}^{(k+1)}(e_{k}^{\top}\mathbf{c}^{*})
(34) \displaystyle\leq k=1K1c1(k+1)(ek𝐡).\displaystyle\,\sum_{k=1}^{K-1}\mathcal{L}_{c_{1}}^{(k+1)}(e_{k}^{\top}\mathbf{h}).

Since 𝐜\mathbf{c}^{*} is a solution to (14), it follows that k=1K1c1(k+1)(ek𝐜)=k=1K1c1(k+1)(ek𝐡)\sum_{k=1}^{K-1}\mathcal{L}_{c_{1}}^{(k+1)}(e_{k}^{\top}\mathbf{c}^{*})=\sum_{k=1}^{K-1}\mathcal{L}_{c_{1}}^{(k+1)}(e_{k}^{\top}\mathbf{h}). ∎

Appendix D More details for walking prediction from IMU

D.1. Existing indices for walking detection

For a fair comparison, we consider several existing walking detection indices. In [38], the proposed method is to utilize the comb filters to detect the SHW state. The idea is to use a set of “comb teeth functions” on the spectrogram to capture the most possible fundamental step-to-step frequency that has the maximum energy allocation on its multiples. A comb teeth function is determined by its fundamental frequency sDfs\in D_{f}, where DfD_{f} is a set of candidate fundamental frequencies. With the help of a comb filter, they define a function Y(t,s)=maxk=x,y,z{Yk(t,s)}Y(t,s)=\max_{k=x,y,z}\{Y_{k}(t,s)\}, Yk(t,s)Y_{k}(t,s), where ss is the possible fundamental frequency sDfs\in D_{f} and tt\in\mathbb{R} is time, which describes the strength of the periodic content of the accelerometry signal. Finally, the threshold δ\delta that decides whether there is a SHW or not is determined by the density function of maxsDf{Y(t,s)}\max_{s\in D_{f}}\{Y(t,s)\} for all SHW and non-SHW periods for all subjects. At the end, the SHW index is given by

y^δ(t)={1,if maxsDf{Y(t,s)}>δ0,otherwise,\hat{y}_{\delta}(t)=\left\{\begin{array}[]{l}1,\;\text{if }\max_{s\in D_{f}}\{Y(t,s)\}>\delta\\ 0,\;\text{otherwise,}\end{array}\right.

where y^δ(t)=1\hat{y}_{\delta}(t)=1 means the prediction state is SHW and y^δ(t)=0\hat{y}_{\delta}(t)=0 means non-SHW. For each given accelerometry signal, the STFT is computed using the Hanning window of length 10 seconds, the number of the harmonics is set to be 6, and the range of the fundamental frequency DfD_{f} was set to be 1.2 Hz to 4.0 Hz. The kernel density estimation is applied with the normal kernel function to give the estimated densities. We implement the SHW index [38] following all parameters suggested in [38] by choosing the number of harmonics nm=6n_{m}=6, Hanning window with window length τ=10\tau=10 seconds for the STFT, and 7171 candidates for the step-to-step frequency Df={1.2Hz,1.24Hz,,4Hz}D_{f}=\{1.2\text{Hz},1.24\text{Hz},\cdots,4\text{Hz}\} to generate the statistics maxsDf{Y(t,s)}\max_{s\in D_{f}}\{Y(t,s)\}.

Another common idea to detect walking activity is using the entropy ratio. This idea is similar to that used in the parameter selection in Section 3.5. After extracting the IF curves 𝐜\mathbf{c} from the SST 𝐒\mathbf{S}, we construct a masked TFR 𝐒1\mathbf{S}_{1} following (8), and compute the sequence q():=Rα(|𝐒1(Δt,)|)q(\ell):=R_{\alpha}\left(\left|\mathbf{S}_{1}(\ell\Delta_{t},\cdot)\right|\right). Similarly, we define an entropy sequence p():=Rα(|𝐒(Δt,)|)p(\ell):=R_{\alpha}\left(\left|\mathbf{S}(\ell\Delta_{t},\cdot)\right|\right) associated with 𝐒\mathbf{S}. The pointwise ratio of these two sequences tells whether the harmonics exist at certain time segment or not. Hence, the Entropy Ratio index is defined as

(35) Qα,SST(Δt):=med[p()q(),10],=1,,N.Q_{\alpha,\texttt{SST}}(\ell\Delta_{t}):=\textsf{med}\left[\frac{p(\ell)}{q(\ell)},10\right],\;\ell=1,\cdots,N.

The notation 𝗆𝖾𝖽[,10]\mathsf{med}[\cdot,10] is the median filter with a 1010 seconds bandwidth. Intuitively, the index Qα,SST(Δt)Q_{\alpha,\texttt{SST}}(\ell\Delta_{t}) should be small if Δt\ell\Delta_{t} is during a walking period, since the frequency concentration measurement p()p(\ell) of that time on the SST should be small, and the one for the curves-removed TFR, q()q(\ell), should be relatively large. On the contrary, if the state is not in a walking period, then Qα,SSTQ_{\alpha,\texttt{SST}} is expected to be close to 1. To implement the Entropy Ratio index, with the IF curve 𝐜\mathbf{c}, we construct 𝐒1\mathbf{S}_{1} with a bandwidth of 0.040.04Hz and take α=2.4\alpha=2.4.

The next one is based on a bandpass filter to extract the possible walking pattern via the Hilbert transform. The ratio of the energy between 0.5 and 3 Hz and the energy between 0.3 and 8 Hz is called the Hilbert-WSI. This index comes from capturing the energy associated with the fundamental frequency, and the spectral range is chosen to cover the common momentary speed of stride, which is suitable for our data. Another index used in evaluating the freezing of gait (FOG) in patients with Parkinson’s disease is defined as the ratio of the energy between 0.5 and 3 Hz and the energy between 3 and 8 Hz [26]. The index is called iFOG in [26], but for the purpose of unifying the notation, we call it FOG-WSI. Note that while FOG-WSI was not designed for the walking detection, its design can potentially be used for walking detection, and we considered it in this work.

D.1.1. Justification of SST-WSI

It is important to note that SST-WSI’s design accounts for the expansion (2), and the influence of SSST(Δt){\textsf{S}}_{\texttt{SST}}(\ell\Delta_{t}) characterizes the walking strength at time Δt\ell\Delta_{t}. Indeed, when ΔtIj\ell\Delta_{t}\in I_{j} and bfs2M=ϵ1/3\frac{bf_{s}}{2M}=\epsilon^{1/3}, where ϵ>0\epsilon>0 is from the slowly varying assumption for aq,ka_{q,k} and ϕq,k\phi^{\prime}_{q,k}, by [12, 8] we have for k=1,,Kk=1,\ldots,K,

(36) fs2MqB(k)𝐒(Δt,q)aq,k(Δt)e2πiϕq,k(Δt)\displaystyle\frac{f_{s}}{2M}\sum_{q\in B^{(k)}_{\ell}}\mathbf{S}(\ell\Delta_{t},\,q)\approx a_{q,k}(\ell\Delta_{t})e^{2\pi i\phi_{q,k}(\ell\Delta_{t})}

is the reconstruction of the kk-th harmonic when ΔtIq\ell\Delta_{t}\in I_{q}. Thus, the SST-WSI evaluates directly the walking strength since

SSST(Δt)\displaystyle{\textsf{S}}_{\texttt{SST}}(\ell\Delta_{t}) k=1K|aq,k(Δt)ei2πϕq,k(Δt)|2k=1Kaq,k(Δt)2,\displaystyle\,\approx\sum_{k=1}^{K}\left|a_{q,k}(\ell\Delta_{t})e^{i2\pi\phi_{q,k}(\ell\Delta_{t})}\right|^{2}\approx\sum_{k=1}^{K}a_{q,k}(\ell\Delta_{t})^{2}\,,

which is the strength of the walking activity at time Δt\ell\Delta_{t}.

D.2. More results

The comparison of various walking detection indices when the accelerometer signal is recorded from the left ankle (right ankle and hip, respectively) is shown in Figure 10 (Figures 11 and 12, respectively). Note that in the HIP signal experiment, all the validation results of the Hilbert-WSI group has zero True-Positive rate, which leads to 0 F1 (marked by the red frame).

Refer to caption
Figure 10. The performance distributions of the LEFT-ANKLE signal. Left: The accuracy distributions. Right: The F1-score distributions.
Refer to caption
Figure 11. The performance distributions of the RIGHT-ANKLE signal. Left: The accuracy distributions. Right: The F1-score distributions.
Refer to caption
Figure 12. The performance distributions of the HIP signal. Left: The accuracy distributions. Right: The F1-score distributions.