This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Fundamental component enhancement via adaptive nonlinear activation functions

Stefan Steinerberger Department of Mathematics, University of Washington, Seattle, WA 98195, USA [email protected]  and  Hau-tieng Wu Department of Mathematics and Department of Statistical Science, Duke University, Durham, NC, USA [email protected]
Abstract.

In many real world oscillatory signals, the fundamental component of a signal f(t)f(t) might be weak or does not exist. This makes it difficult to estimate the instantaneous frequency of the signal. Traditionally, researchers apply the rectification trick, working with |f(t)||f(t)| or ReLu(f(t))\mbox{ReLu}(f(t)) instead, to enhance the fundamental component. This raises an interesting question: what type of nonlinear function g:g:\mathbb{R}\rightarrow\mathbb{R} has the property that g(f(t))g(f(t)) has a more pronounced fundamental frequency? g(t)=|t|g(t)=|t| and g(t)=ReLu(t)g(t)=\mbox{ReLu}(t) seem to work well in practice; we propose a variant of g(t)=1/(1|t|)g(t)=1/(1-|t|) and provide a theoretical guarantee. Several simulated signals and real signals are analyzed to demonstrate the performance of the proposed solution.

Key words and phrases:
fundamental component; rectification; activation function
2020 Mathematics Subject Classification:
42A99, 92C55.
S.S. was partly supported by the NSF (DMS-2123224) and the Alfred P. Sloan Foundation.

1. Introduction

1.1. The problem.

We start by describing the problem in the simplest possible setting: suppose we are given a 11-periodic signal

(1) f(t)=k=0akcos(2πkt)+bksin(2πkt)f(t)=\sum_{k=0}^{\infty}a_{k}\cos{(2\pi kt)}+b_{k}\sin{(2\pi kt)}

and define the support of SS to be S={k:ak2+bk2>0}S=\left\{k\in\mathbb{N}:a_{k}^{2}+b_{k}^{2}>0\right\}. Our goal is to recover gcd(S)\gcd(S), the greatest common divisor of all the elements in SS. If we have recorded two or more periods of the signal, then the gcd\gcd will be bigger than 1. In that case, we call gcd(S)\gcd(S) the fundamental frequency. Mathematically speaking, there is no difficulty: compute the Fourier transform of ff and check.

Refer to caption
Figure 1. Left: f(t)=0.8cos(2π6t)+1.4cos(2π9t)+0.9cos(2π33t)f(t)=0.8\cos{(2\pi 6t)}+1.4\cos{(2\pi 9t)}+0.9\cos{(2\pi 33t)} has gcd(S)=3\gcd(S)=3. Right: the spectrogram shows three distinct frequencies but not the fundamental frequency 3n.

There are several reasons why this seemingly trivial problem is difficult in practice.

  1. (1)

    In practice, all Fourier coefficients of ff will be distinct from 0; some form of thresholding to decide significance is required.

  2. (2)

    The actual fundamental frequency may not be constant, it may change from time to time, or it may not exist at all [1, 4, 10, 11, 13].

  3. (3)

    We only have ff sampled at finitely many points; moreover, in practice one should always expect some form of noise.

1.2. Motivation from medicine.

We describe an explicit problem coming from medical signal processing which was partially the motivation for our paper. Our example can be seen in Figure 2(a) and comes from a phonocardiogram (PCG) signal [2]: this is a sound signal created by the vibrations created by the closure of the heart valves. One fundamental component of the signal, one heartbeat, can be seen within each of the two red boxes: it is comprised of two ingredients, S1 and S2 (indicated by blue errors): S1 is due to the atrioventricular valves closing at the beginning of systole and S2 is the consequence of the aortic and pulmonary valves closing at the end of systole.

Refer to caption
Figure 2. Phonocardiogram (a) and Spectrogram (b): no clear fundamental frequency is visible. The heart rate of this subject is about 2Hz; that is, there are two cardiac cycles per second.

Due to the heart rate variability, the periods between two consecutive cycles are not fixed: an important but challenging problem is to estimate the heart rate at a local point in time, this amounts to estimating the time-varying period, or the instantaneous frequency. Naturally, one would be inclined to apply time-frequency (TF) analysis [3]. This signal oscillates roughly twice per second, so the frequency is around 2 Hz, and we would expect to see a dominant curve around 2 Hz that represents the fundamental component of the signal. However, we cannot see anything concrete around 2 Hz from the standard spectrogram shown in Figure 2(b) and, in this sense, the fundamental component of the signal does not exist. In short, we need a different solution if we want to estimate the time-varying heart rate.

1.3. The rectification trick.

There is a surprisingly simple and widely applied solution; that is, take the ‘rectification’ of the signal before running any sort of time-frequency analysis. Mathematically speaking, take the signal |f(t)|\left|f(t)\right| where f(t)f(t) is the signal of interest. We refer to Figure 3 where the signal from Figure 2 has been analyzed after rectification: we analyze |f(t)|\left|f(t)\right| instead of f(t)f(t).

Refer to caption
Figure 3. Rectified Phonocardiogram (a) and its Spectrogram (b): a fundamental frequency around 2 Hz starts emerging, indicated by red arrows.

Here, rectification leads to a dramatically different and much more informative spectrogram in which a clearly defined fundamental frequency exists (indicated by the red arrows). This simple solution has been widely applied in practice if we want to estimate the fundamental frequency, for example, in the problem of extracting fetal electrocardiogram (ECG) from the trans-abdominal maternal ECG [9] or extracting f-wave from the ECG signal of a patient with atrial fibrillation [6], which is a far from complete list. To the best of our knowledge, there is no theoretical argument explaining why the rectification trick works to enhance the fundamental component. We believe this to be of substantial theoretical interest.

Open problem. Why is standard time frequency analysis applied to |f(t)|\left|f(t)\right| better able to recover the fundamental underlying frequency? Why does the ‘rectification’ trick work?

Abstracting the trick, we may interpret the rectification trick as the application of a nonlinear activation function g(t)=|t|g(t)=|t|: instead of analyzing f(t)f(t), one analyzes g(f(t))g(f(t)). Naturally, there are many other choices of the activation function [7], like the commonly used Rectified Linear Unit (ReLU) function g(t)=ReLU(f(t))g(t)=\mbox{ReLU}(f(t)) and other widely applied activation functions from the theory of neural networks that are at our disposal. In practice, ReLU seems to work and work roughly as well as the absolute value. We do not know of any theoretical support for any of these functions and can now formulate the main question that motivates our paper.

Open problem. Which nonlinear activation function g:g:\mathbb{R}\rightarrow\mathbb{R} leads to the ‘best’ recovery of the fundamental frequency?

One of the main points of our paper is to discuss the problem from the point of view of classical Fourier analysis and to propose, based on that, a somewhat unorthodox choice, provide theoretical support and discuss its behavior in practice.

2. Main Result

2.1. An adaptive activation function.

We define, for ε>0\varepsilon>0 small, an adaptive activation function. Introducing, for 0<ε<10<\varepsilon<1, the function hε:[1,1]0h_{\varepsilon}:[-1,1]\rightarrow\mathbb{R}_{\geq 0}

hε(x)=11(1ε)|x|,h_{\varepsilon}(x)=\frac{1}{1-(1-\varepsilon)|x|},

we propose to run standard time frequency on the normalized signal

hε(f(t)fL)=11(1ε)|f(t)|fLh_{\varepsilon}\left(\frac{f(t)}{\|f\|_{L^{\infty}}}\right)=\frac{1}{1-(1-\varepsilon)\frac{|f(t)|}{\|f\|_{L^{\infty}}}}

and expect it to enhance the fundamental component of ff. This activation function is adaptive to the input signal since it depends on the LL^{\infty} norm of the input signal. Note that hεh_{\varepsilon} is also 2π2\pi-periodic. So by construction, it inherits the periodicity of ff. The question is now: can we determine the strength of the fundamental component of hεh_{\varepsilon}? To provide a theoretical analysis this question, we put some assumptions.

  1. (1)

    We assume that the function f:[0,2π]f:[0,2\pi]\rightarrow\mathbb{C} is of the type

    0f(t)=k=1nakeimktak,mk,0<m1<m2<<mn.0\neq f(t)=\sum_{k=1}^{n}a_{k}e^{im_{k}t}\quad a_{k}\in\mathbb{C},m_{k}\in\mathbb{N},~{}0<m_{1}<m_{2}<\dots<m_{n}.

    In particular, this function will have fundamental frequency gcd(m1,,mn)\gcd(m_{1},\dots,m_{n}).

  2. (2)

    We assume that the function g(t)=|f(t)|g(t)=\left|f(t)\right| assumes its global maximum in finitely many points {t1,,tm}[0,2π]\left\{t_{1},\dots,t_{m}\right\}\subset[0,2\pi] and

  3. (3)

    that these maxima are non-degenerate: g′′(ti)<0g^{\prime\prime}(t_{i})<0 for all 1im1\leq i\leq m.

We can now state our result which shows that, as ε0\varepsilon\rightarrow 0, the method will indeed recover the fundamental frequency (which we expect to be 1 in the generic setting).

Theorem.

We have, as ε0\varepsilon\rightarrow 0,

02πhε(f(t)fL)eit𝑑t=1εj=1mπeitjg′′(tj)2gL+𝒪(1ε1/4).\int_{0}^{2\pi}h_{\varepsilon}\left(\frac{f(t)}{\|f\|_{L^{\infty}}}\right)e^{it}dt=\frac{1}{\sqrt{\varepsilon}}\sum_{j=1}^{m}\frac{\pi\cdot e^{it_{j}}}{\sqrt{-\frac{g^{\prime\prime}(t_{j})}{2\|g\|_{L^{\infty}}}}}+\mathcal{O}\left(\frac{1}{\varepsilon^{1/4}}\right).

Remarks.

  1. (1)

    We note that for ‘generic’ signals (say, random signals picked from a suitable probability distribution), we expect that there is a unique maximum t1t_{1}: in that case, the leading order expansion is always guaranteed to be of order

    hε^(1)=1επeit1g′′(t1)2gL+𝒪(1ε1/4)\widehat{h_{\varepsilon}}(1)=\frac{1}{\sqrt{\varepsilon}}\frac{\pi\cdot e^{it_{1}}}{\sqrt{-\frac{g^{\prime\prime}(t_{1})}{2\|g\|_{L^{\infty}}}}}+\mathcal{O}\left(\frac{1}{\varepsilon^{1/4}}\right)

    and we are guaranteed that the fundamental frequency is recovered.

  2. (2)

    If there is more than one maximum, then we only fail at recovering the fundamental frequency if

    j=1meitjg′′(tj)=0.\sum_{j=1}^{m}\frac{e^{it_{j}}}{\sqrt{-g^{\prime\prime}(t_{j})}}=0.

    This is something that we do not expect to happen for generic signal (it corresponds to a precise algebraic identity).

  3. (3)

    If the frequencies share a nontrivial greatest common divisor

    G=gcd(m1,,mn)>1,G=\gcd(m_{1},\dots,m_{n})>1,

    then the maxima of g(t)g(t) will be GG-periodic on the unit circle and the sum will vanish. The Theorem can then be applied to the function g(t)=f(t/G)g(t)=f(t/G) to conclude that the fundamental frequency GG will be recovered.

2.2. General Activation Functions

The purpose of this section is to give some perspective on the problem from the point of view of classical Fourier analysis. We note that the intrinsically nonlinear nature of the problem does pose an interesting challenge, nonetheless, there are some natural considerations partially inspired by a classical paper of Rudin [8] that seem like they might be relevant. We recall that our goal is, given a function ff of the type

0f(t)=k=1nakeimktak,mk,0<m1<m2<<mn0\neq f(t)=\sum_{k=1}^{n}a_{k}e^{im_{k}t}\qquad a_{k}\in\mathbb{C},m_{k}\in\mathbb{N},0<m_{1}<m_{2}<\dots<m_{n}

to come with a nonlinear function g:g:\mathbb{R}\rightarrow\mathbb{R} such that g(f(t))g(f(t)) has a clear frequency contribution at gcd(m1,,mn)\gcd\left(m_{1},\dots,m_{n}\right). The main difficulty in practice is, of course, that we do not actually know the precise formula of ff when only ff is given, and have no real idea what the actual frequencies m1,,mnm_{1},\dots,m_{n} might be. We simplify things by assuming that the nonlinearity gg can be expanded into a Taylor series

g(x)=b0+b1x+b2x2+b3x3+.g(x)=b_{0}+b_{1}x+b_{2}x^{2}+b_{3}x^{3}+\dots.

Then each individual power applied to ff can be properly analyzed: note that

(k=1nakeimkt)k=(mi1++mir=ai1ai2air)eit.\left(\sum_{k=1}^{n}a_{k}e^{im_{k}t}\right)^{k}=\sum_{\ell\in\mathbb{Z}}\left(\sum_{m_{i_{1}}+\dots+m_{i_{r}}=\ell}a_{i_{1}}a_{i_{2}}\dots a_{i_{r}}\right)e^{i\ell t}.

This shows that large powers naturally lead to a function whose frequencies is comprised of sums of the individual frequencies. If we replace powers by powers of the absolute value, then at least for even powers the identity |x|2k=xkx¯k|x|^{2k}=x^{k}\overline{x}^{k} leads to a way of writing

X=|k=1nakeimkt|2kX=\left|\sum_{k=1}^{n}a_{k}e^{im_{k}t}\right|^{2k}

as

X=1,2(mi1++mir=1ai1air)(mi1++mir=2ai1¯air¯)ei(12)t.X=\sum_{\ell_{1},\ell_{2}\in\mathbb{Z}}\left(\sum_{\scriptsize m_{i_{1}}+\dots+m_{i_{r}}=\ell_{1}}a_{i_{1}}\dots a_{i_{r}}\right)\left(\sum_{m_{i_{1}}+\dots+m_{i_{r}}=\ell_{2}}\overline{a_{i_{1}}}\dots\overline{a_{i_{r}}}\right)e^{i(\ell_{1}-\ell_{2})t}.

There is a nontrivial contribution coming from the coefficients (ak)k(a_{k})_{k\in\mathbb{Z}}, however, if we only consider the support of the frequencies, then a rather clear picture emerges: denoting the set of frequencies that make up ff by

M={m1,,mn},M=\left\{m_{1},\dots,m_{n}\right\},

we expect that

supp|f|2k=(M+M++M)ktimes(M+M++M)ktimes.\mbox{supp}~{}\mathcal{F}|f|^{2k}=\underbrace{(M+M+\dots+M)}_{k~{}\mbox{\tiny times}}-\underbrace{(M+M+\dots+M)}_{k~{}\mbox{\tiny times}}.

However, we also expect, for kk sufficiently large, that the numbers that can be appropriately written as sums and differences of elements in MnM_{n} have a nice limiting behavior and (for a suitabye lenient definition of lim\lim)

limk(M+M++M)ktimes(M+M++M)ktimes=gcd(m1,,mn).\lim_{k\rightarrow\infty}\quad\underbrace{(M+M+\dots+M)}_{k~{}\mbox{\tiny times}}-\underbrace{(M+M+\dots+M)}_{k~{}\mbox{\tiny times}}=\gcd(m_{1},\dots,m_{n})\cdot\mathbb{Z}.

These heuristic considerations suggest the following rough guidelines:

  1. (1)

    The Fourier transform of g(f(t))g(f(t)) depends on the Taylor expansion of gg.

  2. (2)

    If the function gg is smooth, then the Taylor expansion will have rapidly decaying coefficients and sum sets of the type

    (M+M++M)ktimesonly contribute for smallk.\underbrace{(M+M+\dots+M)}_{k~{}\mbox{\tiny times}}\quad\mbox{only contribute for small}~{}k.

    A choice like g(x)=cos(x)g(x)=\cos{(x)} will not lead to a good detection function.

  3. (3)

    This suggests functions gg for which the Taylor expansion decays slowly: functions that are discontinuous or have discontinuous first derivative.

  4. (4)

    Moreover, introducing the absolute value g(|x|)g(|x|) leads to sum-difference sets which might generally lead to better results.

These principles naturally suggest functions like g(x)=|x|g(x)=|x| or g(x)=ReLu(x)g(x)=\mbox{ReLu}(x). However, we note that both of these functions are actually continuous which leads to a decay of coefficients in the Taylor expansion. Indeed, considering all these principles, we should focus our attention on functions of the form

g(x)=b0+b1|x|+b2|x|2+b3|x|3+g(x)=b_{0}+b_{1}|x|+b_{2}|x|^{2}+b_{3}|x|^{3}+\dots

for which the (bj)j=0(b_{j})_{j=0}^{\infty} decay slowly. A particularly natural candidate is

11|x|=1+|x|+|x|2+\frac{1}{1-|x|}=1+|x|+|x|^{2}+\dots

and this is how we arrive at our proposed construction. As we will show in Section 3, this construction is not only well motivated by the aforementioned arguments but also naturally incorporates a second approach which is related to asymptotic analysis and encapsulated by Theorem 1.

3. Proof of the Theorem

Proof.

We now give a proof of the result. It follows from the assumptions that each local maximum of gg behaves locally around the maximum like a parabola (since g′′g^{\prime\prime} does not vanish and is negative). This allows us to treat all the points where ff assumes a maximum in isolation. We start by considering a point t0t_{0} such that g(t0)gLg(t_{0})\neq\|g\|_{L^{\infty}}. In that case, we observe that

hε(t0)=111εfL|f(t0)|11|f(t0)|fLfLfL|f(t0)|.h_{\varepsilon}(t_{0})=\frac{1}{1-\frac{1-\varepsilon}{\|f\|_{L^{\infty}}}|f(t_{0})|}\leq\frac{1}{1-\frac{|f(t_{0})|}{\|f\|_{L^{\infty}}}}\leq\frac{\|f\|_{L^{\infty}}}{\|f\|_{L^{\infty}}-|f(t_{0})|}.

This quantity is uniformly bounded as ε0\varepsilon\rightarrow 0. Since our main result is about the asymptotic growth of an integral over a bounded region as ε0\varepsilon\rightarrow 0 and since we are only interested in terms at scale ε1/2\sim\varepsilon^{-1/2} or larger, we may disregard points where hε(t)h_{\varepsilon}(t) remains bounded as ε0\varepsilon\rightarrow 0. We can now proceed as follows: since there are only finitely many points t1,t2,,tmt_{1},t_{2},\dots,t_{m} that achieve the global maxima of gg and since all of them are non-degenerate, there exists δ>0\delta>0 such that the preimage

g1[fLδ,fL]=j=1mIjg^{-1}\left[\|f\|_{L^{\infty}}-\delta,\|f\|_{L^{\infty}}\right]=\bigcup_{j=1}^{m}I_{j}

can be written as the union of mm disjoint intervals. By making δ\delta sufficiently small, we can also ensure that each of the intervals contains a point in which |f(t)||f(t)| assumes its global maximum. Moreover, we can also infer that, as δ0\delta\rightarrow 0, these intervals scale asymptotically like |Ij|δ1/2|I_{j}|\sim\delta^{-1/2}, where the implicit constant depends on the value of the second derivative in tjt_{j}.

fL\|f\|_{L^{\infty}}fLδ\|f\|_{L^{\infty}}-\deltaI1I_{1}I2I_{2}I3I_{3}
Figure 4. A sketch of the decomposition.

Then, uniformly as ε0\varepsilon\rightarrow 0, we have

02πhε(t)1{|f|<fLδ}(t)𝑑t02πfLfLg(t)1{|f|<fLδ}(t)𝑑t\displaystyle\int_{0}^{2\pi}h_{\varepsilon}(t)1_{\left\{|f|<\|f\|_{L^{\infty}}-\delta\right\}}(t)dt\leq\int_{0}^{2\pi}\frac{\|f\|_{L^{\infty}}}{\|f\|_{L^{\infty}}-g(t)}1_{\left\{|f|<\|f\|_{L^{\infty}}-\delta\right\}}(t)dt
\displaystyle\leq 02πfLδ1{|f|<fLδ}(t)𝑑t2πfLδ.\displaystyle\,\int_{0}^{2\pi}\frac{\|f\|_{L^{\infty}}}{\delta}1_{\left\{|f|<\|f\|_{L^{\infty}}-\delta\right\}}(t)dt\leq\frac{2\pi\|f\|_{L^{\infty}}}{\delta}\,.

Note that 2πfL/δ2\pi\|f\|_{L^{\infty}}/\delta is independent of ϵ\epsilon. This shows that contributions at scale ε1/2\varepsilon^{-1/2} can only come from I1,I2,,ImI_{1},I_{2},\dots,I_{m}. Let us now assume that gg assumes a global maximum in t1I1t_{1}\in I_{1}. Then, locally around t1t_{1}, we have the expansion

g(t)=gL+g′′(t1)2(tt1)2+𝒪((tt1)3).g(t)=\|g\|_{L^{\infty}}+\frac{g^{\prime\prime}(t_{1})}{2}(t-t_{1})^{2}+\mathcal{O}((t-t_{1})^{3}).

Therefore, we have, locally around t1t_{1},

hε(t)\displaystyle h_{\varepsilon}(t) =11(1ε)(1+g′′(t1)2gL(tt1)2+𝒪((tt1)3))\displaystyle=\frac{1}{1-(1-\varepsilon)\left(1+\frac{g^{\prime\prime}(t_{1})}{2\|g\|_{L^{\infty}}}(t-t_{1})^{2}+\mathcal{O}((t-t_{1})^{3})\right)}
(2) =1ε(1ε)g′′(t1)2gL(tt1)2+𝒪((tt1)3).\displaystyle=\frac{1}{\varepsilon-(1-\varepsilon)\frac{g^{\prime\prime}(t_{1})}{2\|g\|_{L^{\infty}}}(t-t_{1})^{2}+\mathcal{O}((t-t_{1})^{3})}.

Note that g′′<0g^{\prime\prime}<0. Thus, the denominator is, up to second order, growing away from t1t_{1}. Our next ingredient will be the identity

(3) A,B,C>0|t|>C1A+Bt2𝑑t=π2arctan(BAC)AB\displaystyle\forall~{}A,B,C>0\qquad\quad\int_{|t|>C}\frac{1}{A+Bt^{2}}dt=\frac{\pi-2\arctan\left(\frac{\sqrt{B}}{\sqrt{A}}C\right)}{\sqrt{AB}}

In particular, letting C0C\rightarrow 0, we recover

A,B>01A+Bt2𝑑t=πAB.\forall~{}A,B>0\qquad\quad\int_{\mathbb{R}}\frac{1}{A+Bt^{2}}dt=\frac{\pi}{\sqrt{AB}}.

In our case, we have A=ε>0A=\varepsilon>0, while

B=(1ε)g′′(t1)2gL>0B=-(1-\varepsilon)\frac{g^{\prime\prime}(t_{1})}{2\|g\|_{L^{\infty}}}>0

converges to a fixed constant as ε\varepsilon gets small (and is, in particular, bounded away from 0). This implies

ε1/4ε1/41ε+Bt2𝑑t\displaystyle\int_{-\varepsilon^{1/4}}^{\varepsilon^{1/4}}\frac{1}{\varepsilon+Bt^{2}}dt =1ε+Bt2𝑑t|t|ε1/41ε+Bt2𝑑t\displaystyle=\int_{-\infty}^{\infty}\frac{1}{\varepsilon+Bt^{2}}dt-\int_{|t|\geq\varepsilon^{1/4}}\frac{1}{\varepsilon+Bt^{2}}dt
=πεB+𝒪(B1/2ε1/4)\displaystyle=\frac{\pi}{\sqrt{\varepsilon B}}+\mathcal{O}(B^{-1/2}\varepsilon^{-1/4})

when ε\varepsilon is sufficiently small. The second equality comes from (3). Indeed, by choosing C=ε1/4C=\varepsilon^{1/4}, we see that the term B/AC\sqrt{B}/\sqrt{A}C is of order ε1/4\varepsilon^{-1/4}, and hence

π2arctan(BAC)ABC1\pi-2\arctan\left(\frac{\sqrt{B}}{\sqrt{A}}C\right)\lesssim\frac{\sqrt{A}}{\sqrt{B}}C^{-1}

whenever ε\varepsilon is sufficiently small. Combined with the numerator, we obtain the desired control of |t|ε1/41/(ε+Bt2)𝑑t\int_{|t|\geq\varepsilon^{1/4}}1/(\varepsilon+Bt^{2})dt. From this control, we deduce

I1hε(t)𝑑t=πε1g′′(t1)2gL+𝒪(1ε1/4)\int_{I_{1}}h_{\varepsilon}(t)dt=\frac{\pi}{\sqrt{\varepsilon}}\frac{1}{\sqrt{-\frac{g^{\prime\prime}(t_{1})}{2\|g\|_{L^{\infty}}}}}+\mathcal{O}\left(\frac{1}{\varepsilon^{1/4}}\right)

when ε\varepsilon is sufficiently small. Indeed, since B1B\sim 1, the error, g′′(t1)/(2gL)+𝒪(tt1)g^{\prime\prime}(t_{1})/(2\|g\|_{L^{\infty}})+\mathcal{O}(t-t_{1}) in (2), is controlled when ε\varepsilon is sufficiently small. Using continuity of eite^{it} and summing over the mm intervals, we arrive our conclusion

02πhε(t)eit𝑑t=πεj=1meitjg′′(tj)2gL+𝒪(1ε1/4).\int_{0}^{2\pi}h_{\varepsilon}(t)e^{it}dt=\frac{\pi}{\sqrt{\varepsilon}}\sum_{j=1}^{m}\frac{e^{it_{j}}}{\sqrt{-\frac{g^{\prime\prime}(t_{j})}{2\|g\|_{L^{\infty}}}}}+\mathcal{O}\left(\frac{1}{\varepsilon^{1/4}}\right).

4. Numerical results

All Matlab codes are available in for the reproducibility purposes.

4.1. Synthetic data.

We generate synthetic data to investigate how different activation functions enhance the fundamental component. Fix the sampling rate to be 512512 Hz. First, randomly select an integer KK between 55 and 100100 according to a uniform distribution. Then, randomly select KK integers between 22 and 250250 following the probability density function Ce(l/100)2Ce^{-(l/100)^{2}}, where CC is the normalization constant, so that the chosen integers have 1 as the greatest common divisor. Denote the selected integers as {jk}k=1K\{j_{k}\}_{k=1}^{K}. Then, set the function f(t)f(t) as

f(t)=k=1Kakcos(2πjkt+ϕk),f(t)=\sum_{k=1}^{K}a_{k}\cos(2\pi j_{k}t+\phi_{k})\,,

where aka_{k}, k=1,,Kk=1,\ldots,K, are identically and independently (i.i.d.) sampled from (0,1](0,1], and ϕk\phi_{k}, k=1,,Kk=1,\ldots,K, are i.i.d. sampled from (0,2π](0,2\pi]. Note that by construction, f(t)f(t) is a toy example as that shown in (1) with f^(1)=0\hat{f}(1)=0 but 11 as the fundamental frequency. For a given activation function g(t)g(t), we evaluate the resulting fundamental component enhancement by evaluating the fundamental component energy ratio, defined as

(4) |gf^(1)|2l=1256|gf^(l)|2.\frac{|\widehat{g\circ f}(1)|^{2}}{\sum_{l=1}^{256}|\widehat{g\circ f}(l)|^{2}}\,.

Then, repeat the above procedure for 10510^{5} times. The results of different activation functions are shown in Figure 5 as histograms of fundamental component energy ratios. Quantitatively, the median and median absolute deviation of the fundamental component energy ratio over 10510^{5} times are 0.28% and 0.46% for the rectification, 0.07% and 0.12% for the RELU, 0.29% and 0.46% for h0.2h_{0.2}, 0.31% and 0.42% for h0.1h_{0.1}, and 0.33% and 0.33% for h0.05h_{0.05}.

Refer to caption
Figure 5. Effect of different activation functions on fundamental component enhancement. The histograms of the fundamental component enhancement by 5 different activation functions, including |||\cdot|, RELU, h0.2h_{0.2}, h0.1h_{0.1} and h0.05h_{0.05}, are shown from left to right.

4.2. Semi-real Medical Example.

We generate a semi-real dataset and compare the proposed activation function with ϵ=0.2\epsilon=0.2, ϵ=0.1\epsilon=0.1 and ϵ=0.05\epsilon=0.05 with the usually applied rectification and RELU activation function. We consider two databases. The first one is the noninvasive trans-abdominal ECG database from the PhysioNet Computing in Cardiology Challenge 2013 (CinC2013)111https://physionet.org/content/challenge-2013/1.0.0/. There are in total 75 recording, each recording contains 4 channels, and each record lasts for 1 minute. The second one is the Taiwan Integrated Database for Intelligent Sleep (TIDIS)222https://tidis.org. There are in total 20 whole night polysomnogram recordings, each recording contains 2 channels, and each record lasts for about 6 hours. For each recording in CinC2013, we detect the maternal R peaks by applying the standard R peak detection algorithm to the mean of 4 channels; for each recording in TIDIS, we apply the same standard R peak detection algorithm to the mean of two channels. For the kk-th channel in the ii-th recording, we randomly generate an 1-periodic function, denoted as sik(0)(t)s_{ik}^{(0)}(t), by averaging 1010 randomly selected cardiac cycles. Then we intentionally remove the fundamental component via the Fourier transform, and denote the resulting 1-periodic function as sik(t)s_{ik}(t) that is sampled at 512 Hz. Next, we apply the jj-th activation function to sik(t)s_{ik}(t), and denote the resulting signal as gik(j)(t)g^{(j)}_{ik}(t). To evaluate how much the fundamental is activated, we record the energy ratio of the fundamental component defined in (4), denoted as rik(j)r^{(j)}_{ik}. For the kk-th channel in the ii-th recording, the above procedure is repeated 100100 times. The histograms of all collected energy ratios for different activation functions are shown in Figure 6. We see that the proposed activation function gives a delta-like signal with the peak centered at the maximal value. Hence, the power spectrum is flatter compared with both rectification and RELU. This explains why the median of the energy ratio is smaller if we apply the proposed activation function since the proposed activation function tends to flatten the spectrum, particularly when ϵ\epsilon is small. We shall mention that the main difference between these two databases is that the cardiac cycle is recorded from the abdomen in the CinC2013 database, and from the chest in the TIDIS database. Thus the cardiac cycles are of different morphology, and hence different energy ratio of the fundamental component. Note that after the averaging, the fetal ECG impact is reduced, and we obtain a reasonably clean maternal ECG cycles. See Figure 7 for more details.

Refer to caption
Figure 6. Illustration of the effect of different activation functions on the fundamental component enhancement. The first row shows one cardiac cycle (left two subplots) and the simulated cardiac cycle without the fundamental component (right two subplots) from the CinC2013 database. The second row shows the resulting signal after applying 5 different activation functions, including |||\cdot|, RELU, h0.2h_{0.2}, h0.1h_{0.1} and h0.05h_{0.05} from left to right. The third row shows the associated power spectra (PS) of those 1-periodic functions shown on the second row. The fourth row shows the histograms of energy ratio of the fundamental component after applying different activation functions, where the x-axis is the log of the energy ratio. The (medians, median absolute deviation) of the energy ratios are (11.7%, 7%), (2.9%, 5.2%), (4.4%, 3.9%), (2.2%, 3.3%) and (9.8%, 10.8%) respectively. a.u. means arbitrary unit.

To have a complete picture of the fundamental enhancement, we repeat the same data preparation procedure above, except removing the fundamental component; that is, we simulate the practical situation that the fundamental component may or may not exist, and its strength may or may not be weak. In addition to record the energy ratio, also denoted as rik(j)r^{(j)}_{ik} by a bit abuse of notation, after applying the activation function, we also record the energy ratio before, denoted as rik(0)r^{(0)}_{ik}. In Figure LABEL:Figure6simu, we plot the 2-dim histogram of the distribution of {(log(rik(0)),log(rik(j))/log(rik(0)))}\{(\log(r^{(0)}_{ik}),\log(r^{(j)}_{ik})/\log(r^{(0)}_{ik}))\}. By definition, when log(rik(j))/log(rik(0)))>1\log(r^{(j)}_{ik})/\log(r^{(0)}_{ik}))>1, we obtain an enhancement of the energy ratio of the fundamental component. The portion of cardiac cycles that the fundamental component strength is enhanced is recorded in the title. It is clear that all activation functions enhance the energy ratio of the fundamental component when it is weak.

Note that at the first glance, our proposed activation function seems to provide a worse result due to the smaller portion of enhanced cardiac cycles. However, we should emphasize that due to the “flattening” nature of our activation function, the energy ratio of the fundamental component is smaller.

Refer to caption
Refer to caption
Figure 7. Top: energy ratio of the fundamental component, rik(0)r^{(0)}_{ik} of all cardiac cycles. The first part marked by the red box comes from the TIDIS database, and the second part marked by the blue box comes from the CinC2013 database. Bottom: two-dimensional histograms of energy ratio of the fundamental component after applying different activation functions, |||\cdot|, RELU, h0.2h_{0.2}, h0.1h_{0.1} and h0.05h_{0.05} from left to right, where the x-axis is log(rik(0))\log(r^{(0)}_{ik}), and the y-axis is log(rik(j))/log(rik(0)))\log(r^{(j)}_{ik})/\log(r^{(0)}_{ik})). The portion of cardiac cycles that are enhanced by the nonlinear activation function is listed in the title.

4.3. Medical Example.

Next, we consider the PCG signal as a real example. In Figure 8, we show the spectrogram of the PCG signal shown in shown in Figure LABEL:Figure3PCG(a) composed with RELU (left) and h0.1h_{0.1}. We could see that both RELU and h0.1h_{0.1} provide the fundamental component information; that is, there is an identifiable curve around 2Hz. To have a quantification of this finding, we analyze the maternal PCG signals in the Shiraz University Fetal Heart Sounds Database333https://physionet.org/content/sufhsdb/1.0.1/. There are in total 92 recordings. See Figure 9(a) for a PCG signal different from that in Figure 2(a). It is clear that the spectrogram provides limited information about the heart rate. See Figure 9(b) and recall Figure 2(b) for an illustration. To quantify the fundamental component enhancement performance, we carry out the following steps. First, run the de-shape algorithm [5] to determine the instantaneous frequency (IF) of the PCG. See Figure 9(c-d) for the de-shape spectrogram and the detected IF. Denote the estimated IF as ϕ(t)\phi^{\prime}(t). The detected heart rate is confirmed by reading and hearing the signal. Second, determine the energy ratio around the band [ϕ(t)0.2,ϕ(t)+0.2][\phi^{\prime}(t)-0.2,\phi^{\prime}(t)+0.2] Hz; that is,

R:=0Tϕ(t)0.2ϕ(t)+0.2|V(t,ξ)|2𝑑ξ0T1/TU|V(t,ξ)|2𝑑ξ,R:=\frac{\int_{0}^{T}\int_{\phi^{\prime}(t)-0.2}^{\phi^{\prime}(t)+0.2}|V(t,\xi)|^{2}d\xi}{\int_{0}^{T}\int_{1/T}^{U}|V(t,\xi)|^{2}d\xi}\,,

where |V|2|V|^{2} is the spectrogram of the signal of length TT with the sampling rate 1/U1/U. Here, since our focus is the heart rate, we downsample the signal to U=100U=100 Hz. Over the 92 recordings, the mean and standard deviation of RR determined from the original signal, the rectified signal, the RELU activated signal and h0.1h_{0.1} are 0.03%±0.01%0.03\%\pm 0.01\%, 2.33%±0.56%2.33\%\pm 0.56\%, 1.64%±0.41%1.64\%\pm 0.41\%, and 1.25%±0.28%1.25\%\pm 0.28\%. See Figure 9 for an example of the overall procedure.

Refer to caption
Figure 8. The spectrogram of the PCG signal composed with RELU (left) and h0.01h_{0.01}, where we use the same window and the same dynamic range as those shown in Figure 2(b).
Refer to caption
Figure 9. (a) a PCG signal f(t)f(t). (b) the spectrogram of f(t)f(t), where we show only the frequency range from 0 Hz to 5 Hz. (c) the de-shape spectrogram of |f(t)||f(t)|, where we show only the frequency range from 0 Hz to 5 Hz. (d) the de-shape spectrogram superimposed with a red curve representing the detected instantaneous frequency. (e)-(g) are spectrograms associated with |f(t)||f(t)|, RELU(f(t))\texttt{RELU}(f(t)) and h0.1h_{0.1}. In (e), two red curves are superimposed representing the spectral band that we use to calculate the energy ratio of the fundamental component associated with the detected instantaneous frequency. In all time-frequency representations, the dynamic range is set to be the 0% and 99.95% percentiles of all entries.

References

  • [1] Marcelo A. Colominas, and Hau-Tieng Wu. Decomposing Non-Stationary Signals With Time-Varying Wave-Shape Functions. IEEE Transactions on Signal Processing 69 (2021): 5094-5104.
  • [2] Louis-Gilles Durand, and Philippe Pibarot. Most recent advancements in digital signal processing of the phonocardiogram. Critical Reviews in Biomedical Engineering 45 (2017): 1-6.
  • [3] Patrick Flandrin. Time-frequency/time-scale analysis. Academic press, 1998.
  • [4] Thomas Y. Hou, and Zuoqiang Shi. Extracting a shape function for a signal with intra-wave frequency modulation. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374.2065 (2016): 20150194.
  • [5] Chen-Yun Lin, Li Su, and Hau-Tieng Wu. Wave-shape function analysis. Journal of Fourier Analysis and Applications 24.2 (2018): 451-505.
  • [6] John Malik, Neil Reed, Chun-Li Wang, and Hau-Tieng Wu. Single-lead f-wave extraction using diffusion geometry. Physiological measurement 38:7 (2017): 1310.
  • [7] Andrinandrasana David Rasamoelina, Fouzia Adjailia, and Peter Sincak. A review of activation function for artificial neural network. 2020 IEEE 18th World Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE, 2020.
  • [8] Walter Rudin, Trigonometric series with gaps. J. Math. Mech. 9 (1960): 203–227.
  • [9] Pei-Chun Su, Stephen Miller, Salim Idriss, Pier Barker and Hau-Tieng Wu, Recovery of the fetal electrocardiogram for morphological analysis from two trans-abdominal channels via optimal shrinkage. Physiological measurement 40.11 (2019): 115005.
  • [10] Adrien Meynard, and Bruno Torrésani. Spectral analysis for nonstationary audio. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.12 (2018): 2371-2380.
  • [11] Hau-Tieng Wu. Instantaneous frequency and wave shape functions (I). Applied and Computational Harmonic Analysis 35.2 (2013): 181-199.
  • [12] Jieren Xu, Haizhao Yang, and Ingrid Daubechies. Recursive diffeomorphism-based regression for shape functions. SIAM Journal on Mathematical Analysis 50.1 (2018): 5-32.
  • [13] Haizhao Yang. Multiresolution mode decomposition for adaptive time series analysis. Applied and Computational Harmonic Analysis (2019).