

Analog Gated Recurrent Neural Network for Detecting Chewing Events

Kofi Odame, Maria Nyamukuru, Mohsen Shahghasemi, Shengjie Bi, David Kotz
Abstract

We present a novel gated recurrent neural network to detect when a person is chewing on food. We implemented the neural network as a custom analog integrated circuit in a 0.18 μm CMOS technology. The neural network was trained on 6.4 hours of data collected from a contact microphone that was mounted on volunteers' mastoid bones. When tested on 1.6 hours of previously-unseen data, the neural network identified chewing events at a 24-second time resolution. It achieved a recall of 91% and an F1-score of 94% while consuming 1.1 μW of power. A system for detecting whole eating episodes—like meals and snacks—that is based on the novel analog neural network consumes an estimated 18.8 μW of power.

Index Terms:
Eating detection, wearable devices, analog LSTM, neural networks.

I Introduction

Monitoring food intake and eating habits is important for managing and understanding obesity, diabetes and eating disorders [1, 2, 3]. Because self-reporting is unreliable, many wearable devices have been proposed to automatically monitor and record individuals' dietary habits [4, 5, 6]. The challenge is that if these devices are too bulky (generally due to a large battery), or if they require frequent charging, then they intrude on the user's normal daily activities and are thus prone to poor user adherence and acceptance [7, 8, 9, 10].

We recently addressed this problem with a long short-term memory (LSTM) neural network for eating detection that is implementable on a low-power microcontroller [11, 12]. However, our previous approach relied on a power-consumptive analog-to-digital converter (ADC). It also required the microcontroller unit (MCU) to unnecessarily spend power to process irrelevant (i.e. non-eating related) data.

Analog LSTM neural networks have been proposed as a way to eliminate the ADC and also to minimize the microcontroller’s processing of irrelevant data. Unfortunately, the state-of-the-art analog LSTMs [13, 14, 15, 16, 17] are implemented with operational amplifiers (opamps), current/voltage converters, Hadamard multiplications and internal ADCs and digital-to-analog converters (DACs). These peripheral components represent a significant amount of overhead cost in terms of power consumption, which diminishes the benefits of an analog LSTM (see Table I).

In this paper, we introduce a power-efficient analog neural network that contains no DACs, ADCs, opamps or Hadamard multiplications. Our novel approach is based on a current-mode adaptive filter, and it eliminates over 90% of the power requirements of a more conventional solution.

Refer to caption
Figure 1: Block diagram of proposed eating detection system. From the contact microphone output, the ZCR and RMS blocks extract features based on zero-crossing rate and root-mean-square. The analog neural network (labelled "AFUA") processes these features and produces a one-hot encoded output that predicts the presence or absence of a chewing event. The microcontroller ("μC") merges and filters the individual chewing events into whole eating episodes. The analog signal processing chain up to the AFUA block consumes 1.8 μW of power. The microcontroller is active only 9% of the time, during which it consumes 180 μW of power.

II Eating Detection System

Figure 1 shows our proposed Adaptive Filter Unit for Analog (AFUA) long short-term memory as part of a signal processing system for detecting eating episodes. The input to the system is produced by a contact microphone that is mounted on the user's mastoid bone. Features are extracted from the contact microphone signal and input to the AFUA neural network, which infers whether or not the user is chewing. The AFUA's output is a one-hot encoding ((2, 0) = chewing; (0, 2) = not chewing) of the predicted class label. Finally, a microcontroller processes the predicted class labels and groups the chewing events into discrete eating episodes, like a meal, or a snack [4, 5]. Following is a detailed description of the feature extraction and neural network components of the system.

Circuit     Neuron Type  m×n      Power Consumption Overhead (%)
                                  ADC  DAC  Buffer  Opamp, V/I  Total
This work   AFUA         10×16    0    0    3       0           3
[18]        GRU          10×16    0    0    32      0           32
[16]        LSTM         128×128  12   25   1       30          68
[17]        LSTM         16×16    3    17   8       1           29
TABLE I: Compared to other analog LSTM circuits, AFUA has the fewest peripheral components and hence the lowest overhead cost (see Section IV-A derivation). m and n are the number of hidden states and inputs, respectively. Note: for a given circuit, the larger the m×n product, the smaller the overhead. For fair comparison, we report the AFUA overhead cost for m×n = 10×16.

II-A Feature Extraction

As demonstrated in Fig. 2, chewing is characterized by quasi-periodic bursts of large amplitude, low frequency signals that can be measured by a contact microphone or accelerometer that is mounted on the head [11, 5]. We can use the root mean square (RMS) and the zero-crossing rate (ZCR) to capture the signal’s amplitude and frequency, respectively. A second ZCR operation applied to the RMS and the initial ZCR will produce information about the signal’s periodicity. We implement the ZCR and RMS blocks based on the well-known rectifying current mirror. The details of the ZCR and RMS design may be found in [19, 20].
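As a rough illustration of this feature pipeline, the sketch below computes frame-wise RMS and ZCR sequences and then applies a second ZCR to each. The frame length and the mean-removal step before the second ZCR are our assumptions for illustration; they are not specified in the text.

```python
import numpy as np

def zcr(x):
    """Zero-crossing rate: fraction of adjacent samples with opposite sign."""
    return float(np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])))

def frame_rms(x, frame):
    """Frame-wise root-mean-square envelope of the signal."""
    n = len(x) // frame
    return np.sqrt((x[:n * frame].reshape(n, frame) ** 2).mean(axis=1))

def frame_zcr(x, frame):
    """Frame-wise zero-crossing rate of the signal."""
    n = len(x) // frame
    return np.array([zcr(f) for f in x[:n * frame].reshape(n, frame)])

def chewing_features(x, frame=250):
    """ZCR-RMS and ZCR-ZCR features of one analysis window."""
    env = frame_rms(x, frame)
    rate = frame_zcr(x, frame)
    # A second ZCR on the mean-removed sequences captures burst periodicity.
    return zcr(env - env.mean()), zcr(rate - rate.mean())
```

Both features are rates in [0, 1]; a quasi-periodic chewing burst yields a steady second-stage crossing rate, while aperiodic activity does not.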

Refer to caption
Figure 2: Typical time series data for chewing and talking events. Top panel: data from contact microphone shows that chewing (time < 0 s) is characterized by quasi-periodic bursts. No quasi-periodicity is observed during talking (time ≥ 0 s). Bottom panel: duration between signal bursts ("Tperiod"). For the chewing event (time < 0 s), Tperiod is relatively constant. In contrast, Tperiod varies widely during the talking event.

II-B AFUA Neural Network

Fundamentally, an LSTM is a neuron that selectively retains, updates or erases its memory of input data [21]. The gated recurrent unit (GRU) is a simplified version of the classical LSTM, and it is described with the following set of equations [22]:

r_j = \sigma([\mathbf{W}_r\mathbf{x}]_j + [\mathbf{U}_r\mathbf{h}^{\langle t-1\rangle}]_j)    (1)
z_j = \sigma([\mathbf{W}_z\mathbf{x}]_j + [\mathbf{U}_z\mathbf{h}^{\langle t-1\rangle}]_j)    (2)
\tilde{h}_j^{\langle t\rangle} = \tanh([\mathbf{W}\mathbf{x}]_j + [\mathbf{U}(\mathbf{r}\odot\mathbf{h}^{\langle t-1\rangle})]_j)    (3)
h_j^{\langle t\rangle} = z_j h_j^{\langle t-1\rangle} + (1 - z_j)\tilde{h}_j^{\langle t\rangle},    (4)

where \mathbf{x} is the input, h_j is the hidden state, \tilde{h}_j is the candidate state, r_j is the reset gate and z_j is the update gate. Also, \mathbf{W}_* and \mathbf{U}_* are learnable weight matrices.
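For concreteness, Eqns. (1)-(4) can be stated directly in NumPy. This is a generic textbook GRU step, not the authors' circuit; the weight shapes are whatever the layer dimensions dictate.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def gru_step(x, h_prev, Wr, Ur, Wz, Uz, W, U):
    """One discrete-time GRU update, following Eqns. (1)-(4)."""
    r = sigmoid(Wr @ x + Ur @ h_prev)            # reset gate, Eqn. (1)
    z = sigmoid(Wz @ x + Uz @ h_prev)            # update gate, Eqn. (2)
    h_cand = np.tanh(W @ x + U @ (r * h_prev))   # candidate state, Eqn. (3)
    return z * h_prev + (1.0 - z) * h_cand       # new hidden state, Eqn. (4)
```

With all weights zero, both gates sit at σ(0) = 0.5 and the candidate is tanh(0) = 0, so the state simply decays by half each step.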

To implement the GRU in an efficient analog integrated circuit that contains no DACs, ADCs, operational amplifiers or multipliers, we can transform Eqns. (1)-(4) as follows. The σ function of Eqn. (2) gives z_j a range of (0, 1), and the extrema of this range reveal the basic mechanism of the update equation, Eqn. (4). For z_j = 0, the update equation is h_j^⟨t⟩ = h̃_j^⟨t⟩. For z_j = 1, the update equation becomes h_j^⟨t⟩ = h_j^⟨t−1⟩. Without loss of generality, we can replace (1 − z_j) with z_j (this merely inverts the logic of the update gate, and inverts the sign of the \mathbf{W}_z and \mathbf{U}_z weight matrices). So, replacing (1 − z_j) and rearranging the update equation gives us

\left(h_j^{\langle t\rangle} - h_j^{\langle t-1\rangle}\right)/z_j + h_j^{\langle t-1\rangle} = \tilde{h}_j^{\langle t\rangle},    (5)

which is simply a first-order low pass filter with a continuous-time form of

\frac{\tau}{z_j(t)}\frac{dh_j}{dt} + h_j(t) = \tilde{h}_j(t),    (6)

where τ = ΔT, the time step of the discrete-time system. The gating mechanics of the continuous- versus discrete-time update equations are equivalent, modulo the inverted logic: for z_j(t) = 0, Eqn. (6) is a low-pass filter with an infinitely large time constant, and h_j(t) does not change (this is equivalent to h_j^⟨t⟩ = h_j^⟨t−1⟩ in discrete time). For z_j(t) = 1, Eqn. (6) is a low-pass filter with a time constant of τ = ΔT. Since the ΔT time step is small relative to the GRU's dynamics, a time constant of τ = ΔT produces h_j(t) ≈ h̃_j(t) (equivalent to h_j^⟨t⟩ = h̃_j^⟨t⟩ in discrete time).

Various studies have found the reset gate unnecessary with slow-changing signals, and also for event detection [12]. Both these scenarios describe our eating detection application, so we can discard the reset gate.

Finally, if we translate the origins [23] of both h_j(t) and h̃_j(t) to 1, then we can replace the tanh with a saturating function that has a range of (0, 2). Such a saturating function can easily be implemented in analog circuitry, by taking advantage of the unidirectional nature of a transistor's drain-source current. We replace both the tanh and the σ with the following saturating function,

f(y) = \frac{\max(y,0)^2}{1+\max(y,0)^2},    (7)

translate the origin and discard the reset gate to arrive at the Adaptive Filter Unit for Analog LSTM (AFUA):

z_j = f([\mathbf{W}_z\mathbf{x}]_j + [\mathbf{U}_z(\mathbf{h}-\mathbf{1}) + \mathbf{b}_z]_j)    (8)
\tilde{h}_j = f([\mathbf{W}\mathbf{x}]_j + [\mathbf{U}(\mathbf{h}-\mathbf{1}) + \mathbf{b}]_j)    (9)
\frac{\tau}{z_j}\frac{dh_j}{dt} = 2\tilde{h}_j - h_j,    (10)

where [·]_j is the j'th element of the vector. Also, \mathbf{x} is the input, h_j is the hidden state and h̃_j is the candidate state. The variable τ is the nominal time constant, while z_j controls the state update rate in Eqn. (10). \mathbf{W}_z, \mathbf{U}_z, \mathbf{W}, \mathbf{U} are learnable weight matrices, while \mathbf{b}_z, \mathbf{b} are learnable bias vectors. The AFUA resembles the eGRU [12], which we previously showed can be used for cough detection and keyword spotting. But while the eGRU is a conventional digital, discrete-time neural network, the AFUA is a continuous-time system, implementable as an analog integrated circuit.
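A minimal numerical sketch of the AFUA dynamics, Eqns. (8)-(10), integrated with forward Euler. The weights, time step and time constant below are arbitrary illustrative values, not trained parameters.

```python
import numpy as np

def f(y):
    """AFUA saturating activation, Eqn. (7): range (0, 1)."""
    p = np.maximum(y, 0.0)
    return p ** 2 / (1.0 + p ** 2)

def afua_step(x, h, Wz, Uz, bz, W, U, b, tau, dt):
    """One forward-Euler step of the continuous-time system, Eqns. (8)-(10)."""
    z = f(Wz @ x + Uz @ (h - 1.0) + bz)      # update gate, Eqn. (8)
    h_cand = f(W @ x + U @ (h - 1.0) + b)    # candidate state, Eqn. (9)
    dh = (z / tau) * (2.0 * h_cand - h)      # state dynamics, Eqn. (10)
    return h + dt * dh
```

Because f has range (0, 1), the gate z stays in (0, 1) and the drive 2h̃ stays in (0, 2), so a small-step Euler trajectory started inside (0, 2) remains there, consistent with the (0, 2) span of h_j discussed in Section IV-A.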

Refer to caption
Figure 3: High-level architecture of the AFUA neural network, which has a two-dimensional input feature vector, \mathbf{x} = [x_0, x_1]^T. The network keeps a memory of past inputs by feeding back its hidden states, h_0, h_1, to the vector matrix multiplier (VMM). The persistence of the network's memory depends on the time constants, set by z_0, z_1, of the adaptive low-pass filters in the "update" block. Finally, the "activation" block provides the saturating nonlinearities described by Eqn. (7).

III AFUA circuit implementation

Figure 3 shows the high-level block diagram of the AFUA neural network. It comprises two AFUA cells (with corresponding hidden states h_0 and h_1), and it accepts two inputs, x_0 and x_1. Unlike previous LSTMs [13, 14, 15, 16, 17], the AFUA network contains no digital-to-analog converters, analog-to-digital converters, operational amplifiers or four-quadrant multipliers. Avoiding these power-consumptive components is what makes the AFUA implementation so efficient. Following are the circuit implementation details of the AFUA.

III-A Dimensionalization

To realize the AFUA Eqns. (8), (9), (10) and (7) as an analog circuit, we first "dimensionalize" each variable and implement it as the ratio of a time-varying current and a fixed unit current, I_unit [24, 25]. For instance, we represent the update gate variable, z_j, as I_z/I_unit.

III-B Activation Function

The Eqn. (7) function is implemented as the current-starved current mirror shown in Fig. 4. Kirchhoff’s Current Law applied to the source of transistor M3 gives

I_{\rm out} = I_3 = I_{\rm unit} - I_4.    (11)

The transistors are all sized equally, meaning that, from Kirchhoff's Voltage Law, the gate-source voltage of transistor M3 is

V_{\rm GS3} = 2V_{\rm GS1} + V_{\rm GS4} - 2V_{\rm GSa},    (12)

where we have assumed that the body effect in M2 and Mb is negligible. If we operate the transistors in the subthreshold region, then Eqn. (12) implies

I_{\rm out} = I_3 = \frac{I_4 I_1^2}{I_{\rm unit}^2}.    (13)

Combining Eqns. (11) and (13) gives us

I_{\rm out} = \frac{I_{\rm unit} I_1^2}{I_{\rm unit}^2 + I_1^2}.    (14)

Now, the current flowing through a diode-connected nMOS is unidirectional, meaning I_1 = max(I_in, 0), and we can write

I_{\rm out} = I_{\rm unit}\cdot\frac{\max(I_{\rm in},0)^2}{I_{\rm unit}^2+\max(I_{\rm in},0)^2},    (15)

which is a dimensionalized analog of Eqn. (7). The measurement results in Fig. 5 illustrate the nonlinear, saturating behavior of this activation function.
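One can verify numerically that Eqn. (15) is exactly Eqn. (7) applied to dimensionalized currents, i.e. I_out = I_unit · f(I_in/I_unit). A short check, with the I_unit = 10.5 nA value quoted for Fig. 5:

```python
import numpy as np

def f(y):
    """Dimensionless activation, Eqn. (7)."""
    p = np.maximum(y, 0.0)
    return p ** 2 / (1.0 + p ** 2)

def i_out(i_in, i_unit):
    """Current-starved mirror output, Eqn. (15)."""
    p = np.maximum(i_in, 0.0)
    return i_unit * p ** 2 / (i_unit ** 2 + p ** 2)
```

Dividing the numerator and denominator of Eqn. (15) by I_unit² recovers Eqn. (7) in the variable I_in/I_unit, which is the dimensionalization described in Section III-A.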

Refer to caption
Figure 4: Activation function circuit schematic. A version of the input signal, I_in, is reflected as current I_out. The tail bias current source of the M3-M4 differential pair limits the output current to I_out < I_unit. Also, the one-sidedness of the nMOS drain current limits I_out to positive values only. In summary, the activation function circuit produces 0 A ≤ I_out < I_unit.
Refer to caption
Figure 5: Activation function transfer curve. Chip measurements of the Fig. 4 circuit closely match the theoretically-predicted behavior of Eqn. (15) for I_unit = 10.5 nA.

III-C State Update

The AFUA state update, Eqn. (10), is implemented as the adaptive filter shown in Fig. 6. The currents I_h, I_h̃ and I_z represent the hidden state h_j, the candidate state h̃_j and the update gate z_j, respectively. From the translinear loop principle, the Fig. 6 circuit's dynamics can be written as [26, 24]

\underbrace{\frac{C_z U_{\rm T}}{\kappa I_{\rm unit}}}_{\tau}\frac{I_{\rm unit}}{I_z}\frac{dI_h}{dt} = 2I_{\tilde{h}} - I_h,    (16)

where κ is the body-effect coefficient and U_T is the thermal voltage [27]. Just as z_j does for h_j in Eqn. (10), I_z controls the update speed of I_h (see Fig. 7).
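A quick numerical sketch of Eqn. (16), with I_unit and τ normalized to 1 for illustration, reproduces the behavior of Fig. 7: I_h settles to 2I_h̃ at a rate set by I_z.

```python
# Illustrative normalization: I_unit = tau = 1 (physically,
# tau = Cz*UT/(kappa*I_unit), per Eqn. (16)).
I_UNIT, TAU = 1.0, 1.0

def settle(I_z, I_htilde, T=20.0, dt=1e-3, I_h0=0.0):
    """Forward-Euler integration of Eqn. (16) with constant inputs."""
    I_h = I_h0
    for _ in range(int(T / dt)):
        # dI_h/dt = (I_z / (tau * I_unit)) * (2*I_htilde - I_h)
        I_h += dt * (I_z / (TAU * I_UNIT)) * (2.0 * I_htilde - I_h)
    return I_h
```

A large I_z drives I_h quickly to its 2I_h̃ steady state, while a small I_z effectively freezes the state, which is the analog realization of the update gate.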

Refer to caption
Figure 6: State update circuit schematic. The output I_h is a low-pass-filtered version of the input, 2I_h̃. The filter's time constant is inversely proportional to the value of the current I_z. So, large values of I_z increase the rate at which I_h updates to 2I_h̃, while small values of I_z slow down this process.
Refer to caption
Figure 7: State update circuit response. Chip measurements of the Fig. 6 circuit show that the output, I_h, follows the input, I_h̃, at a rate that is determined by the value of current I_z.

III-D Vector Matrix Multiplication

Figure 8 depicts the components of our vector-matrix multiplication (VMM) block. These are the soma and synapse circuits that are common in the analog neuromorphic literature [28]. Crucially, the soma-synapse architecture is current-in, current-out. This means that, unlike other approaches for implementing GRU and LSTM networks [14, 15, 16], the VMM does not need power-consumptive operational amplifiers to convert signals between the current and voltage domains.

Refer to caption
Figure 8: Vector matrix multiplier circuit components. The soma (top panel) and the synapse (bottom panel) form a programmable current mirror. The current mirror’s gain is stored in registers wsgn, w0, w1. These represent the neural network’s 3-bit quantized learned weights.

IV Circuit Analysis

IV-A Current Consumption

Since the activation function, Eqn. (7), has a range of (0, 1), the z_j and h̃_j variables are likewise limited to (0, 1). Also, from Eqn. (10), h_j spans (0, 2). This means that all update gate and candidate state currents have a maximum value of I_unit, while the hidden state currents have a maximum value of 2I_unit. With this information, we can calculate upper bounds on the current consumption of each circuit component.

Refer to caption
Figure 9: AFUA network current consumption when presented with 200 different input patterns. The scatter plot shows that the network's total current consumption is largely determined by the VMM. The average total current consumption is 62I_unit.

IV-A1 Activation Function

Not counting the input current that is supplied by the VMM, Fig. 4 shows that the only current consumed by the activation function block is the differential-pair tail current of I_unit. There are two activation functions per AFUA cell (one each for z_j and h̃_j). So, for an m-unit AFUA layer, the activation function blocks draw a total current of m × 2I_unit.

IV-A2 State Update

The total current flowing through the four branches of the state update circuit (Fig. 6) is 2I_h̃ + 2I_z + I_h, which has a worst-case value of 6I_unit. For our m-unit AFUA network, the state update circuits consume at most m × 6I_unit.

IV-A3 VMM soma

The soma is a current-mode buffer that drives a differential signal onto each row of the VMM (see Fig. 8). For the somas on the input and bias rows, the maximum current consumption is 2I_unit. The somas driving the hidden state rows consume at most 4I_unit each. So, with n inputs, m hidden states and one bias row, the somas will consume a maximum total current of (n + 2m + 1) × 2I_unit.

IV-A4 VMM core

As depicted in Fig. 8, each multiplier element in the VMM core comprises a number of current sources that are switched on or off, depending on the values of the weight bits. At worst, all current sources are switched on, in which case the VMM elements that process state variables each consume 6I_unit, while those that process input variables or biases each consume 3I_unit. The maximum current draw of each VMM column for an n-input AFUA layer with m hidden states is therefore (n + 2m + 1) × 3I_unit. There are 2m columns, giving a total maximum VMM core current consumption of m(n + 2m + 1) × 6I_unit.

IV-A5 Total Current Consumption

From the previous subsections, we conclude that the worst-case total current consumption of an n-input AFUA layer with m hidden states is

I_{\rm tot} \leq (\underbrace{m(14+6(n+2m))}_{\rm core} + \underbrace{4m+2n+2}_{\rm VMM~soma})\times I_{\rm unit},    (17)

where 'core' includes the activation function, VMM core and state update current consumption. The VMM soma is peripheral to the AFUA's operation and represents overhead cost. For instance, a 16-input, 10-unit AFUA layer would spend 3% of its power budget as overhead.
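Eqn. (17) and the quoted overhead figure can be reproduced with a few lines:

```python
def worst_case_current(m, n):
    """Worst-case AFUA layer supply current in units of I_unit, Eqn. (17).
    Returns the total and the fraction attributable to VMM-soma overhead."""
    core = m * (14 + 6 * (n + 2 * m))   # activation + state update + VMM core
    soma = 4 * m + 2 * n + 2            # VMM soma (peripheral overhead)
    total = core + soma
    return total, soma / total
```

For m = 10, n = 16 this gives a core of 2300 and a soma of 74 (in units of I_unit), so the overhead is 74/2374 ≈ 3%, matching the Table I entry.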

Empirically, we found that the average current consumption of some of the AFUA blocks is significantly lower than their estimated worst-case values. In particular, the VMM consumes only 48I_unit on average (see Fig. 9). This leads to an average AFUA total current consumption of 62I_unit. The specific choice of I_unit depends on the desired operating speed, as we discuss in the following subsection.

IV-B Estimated Power Efficiency

The power efficiency of neural networks is conventionally measured in operations per Watt. But this metric does not apply directly to a system like the AFUA, since it executes all of its operations continuously and simultaneously. However, we can estimate the AFUA’s power efficiency by considering the performance of an equivalent discrete time system.

To arrive at the discrete-form AFUA unit, we first replace the state variables of Eqns. (8), (9) and (10) with their discrete-time counterparts. This includes the discretization dh_j/dt = (h_j^⟨t⟩ − h_j^⟨t−1⟩)/ΔT, where ΔT is the sampling period. Then, we set τ = ΔT to produce the following expression.

z_j = f([\mathbf{W}_z\mathbf{x}]_j + [\mathbf{U}_z(\mathbf{h}-\mathbf{1}) + \mathbf{b}_z]_j)
\tilde{h}_j^{\langle t\rangle} = f([\mathbf{W}\mathbf{x}]_j + [\mathbf{U}(\mathbf{h}^{\langle t-1\rangle}-\mathbf{1}) + \mathbf{b}]_j)
h_j^{\langle t\rangle} = 2z_j\tilde{h}_j^{\langle t\rangle} + (1 - z_j)h_j^{\langle t-1\rangle}.    (18)

Recall that \mathbf{W}, \mathbf{W}_z are 2×2 matrices, \mathbf{U}, \mathbf{U}_z are 1×2 vectors and z_j are scalars, meaning that each discretized AFUA unit executes 14 multiply operations per time step. Also, there are 2 divisions due to the two activation functions (see Eqn. (7)). Not counting additions and subtractions, each discretized AFUA unit executes 16 operations per time step, making for a total of 32 operations/step performed by the network. Assuming the sampling period of ΔT = 2 ms used in our previous eating detection systems [6, 11], this implies the AFUA performs the equivalent of 16,000 operations per second.

Now, setting τ=ΔT=2\tau=\Delta T=2 ms requires a unit current of

I_{\rm unit} = \frac{C_z U_{\rm T}}{\kappa\tau} = 500\cdot\frac{C_z U_{\rm T}}{\kappa},    (19)

where C_z = 57 fF is the integrating capacitor of the translinear loop filter, U_T = 26 mV at room temperature and κ ≈ 0.42. This gives I_unit = 1.8 pA. With a total current consumption of 62I_unit, a voltage supply of 1.8 V and 16K operations per second, the AFUA's equivalent power efficiency is 76 TOps/W.
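These figures can be checked directly (constants taken from the text; the result lands near the quoted 76 TOps/W, with the small difference attributable to rounding I_unit to 1.8 pA):

```python
# Constants from the text (Section IV-B).
Cz = 57e-15       # integrating capacitance, F
UT = 26e-3        # thermal voltage at room temperature, V
kappa = 0.42      # body-effect coefficient
tau = 2e-3        # target time constant (= sampling period), s
Vdd = 1.8         # supply voltage, V

I_unit = Cz * UT / (kappa * tau)     # Eqn. (19): roughly 1.8 pA
P_avg = 62 * I_unit * Vdd            # average supply current is 62*I_unit
ops_per_s = 32 * 500                 # 32 equivalent ops/step at 500 steps/s
tops_per_watt = ops_per_s / P_avg / 1e12
```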

IV-C Mismatch

Due to random variations in doping and geometry, transistors that are nominally identical will exhibit mismatch when fabricated in a physical ASIC. To understand the effect of mismatch and other non-idealities on the AFUA neural network's performance, we performed Monte Carlo analyses with foundry-provided manufacturing and test data. The Monte Carlo analyses included mismatch and process variation, as well as power supply voltage and temperature corners of {1.6 V, 2 V} and {0 °C, 35 °C}, respectively.

Figure 10 shows the variation in classification accuracy for 250 Monte Carlo runs of one implementation of the AFUA neural network. The median accuracy across all runs is 0.90. Most of the variation in accuracy is due to mismatch, and the AFUA neural network is largely robust to temperature, voltage and process variation. The neural network is also unaffected by circuit noise (this is a direct result of the network's ability to generalize). To mitigate the effect of mismatch, we can use larger transistors [29], calibrate the network's learning algorithm for each individual chip [28], or incorporate mismatch data into a fault-tolerant learning algorithm [30].

Refer to caption
Figure 10: Monte Carlo analysis performed for 250 runs, including mismatch and process variation, as well as power supply voltage and temperature corners of {1.6 V, 2 V} and {0 °C, 35 °C}, respectively. Nominal power supply voltage and temperature are 1.8 V and 27 °C. Median accuracy is 90%.

V Experimental Methods

V-A Data Collection

Training and testing data was collected from study volunteers in a laboratory setting. All aspects of the study protocol were reviewed and approved by the Dartmouth College Institutional Review Board (Committee for the Protection of Human Subjects-Dartmouth; Protocol Number: 00030005).

Refer to caption
Figure 11: Left panel: a contact microphone was used to collect acoustic data from the mastoid bone as study participants performed various eating and non-eating tasks [31]. Right panel: prototype of the complete wearable device that we are developing for dietary monitoring [6].

The data used for this study was previously collected in a controlled laboratory setting from 20 participants (8 females, 12 males; aged 21-30) who were instructed to perform both eating and non-eating-related activities. During these activities, a contact microphone (see Fig. 11) was secured behind the ear with a headband, to measure any acoustic signals present at the tip of the mastoid bone [31]. The output of the contact microphone was digitized and stored using a 20 kSa/s, 24-bit data acquisition device (DAQ).

Participants were asked to eat a variety of foods—including carrots, protein bars, crackers, canned fruit, instant food, and yogurt—for at least 2 minutes per food type. This resulted in 4 hours of eating data in total. Non-eating activities included talking and silence for 5 minutes each, followed by coughing, laughing, drinking water, sniffling, and deep breathing for 24 seconds each. This resulted in 4 hours of non-eating data in total. Each activity occurred separately and was labelled, based on activity type, as eating or non-eating.

We down-sampled the DAQ data to 500 Hz and applied a high-pass filter with a 20 Hz cutoff frequency to attenuate noise. We segmented the positive class data (chewing) and negative class data (not chewing) into 24-second windows with no overlap. The positive and negative class data were labelled with the one-hot encodings (2, 0) and (0, 2), respectively. Finally, we extracted the ZCR-RMS and ZCR-ZCR features of the windows to produce 2-dimensional input vectors to be processed by the AFUA network.
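The windowing and labelling step can be sketched as follows (the window length and one-hot values come from the text; the function name and array layout are our own):

```python
import numpy as np

FS = 500          # Hz, after down-sampling
WIN = 24 * FS     # 24-second windows, no overlap

def make_windows(x, label):
    """Split a recording into 24 s windows with one-hot labels:
    (2, 0) for chewing, (0, 2) for not chewing."""
    n = len(x) // WIN
    windows = x[:n * WIN].reshape(n, WIN)
    onehot = np.tile([2.0, 0.0] if label else [0.0, 2.0], (n, 1))
    return windows, onehot
```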

V-B Neural Network Training

For training, the AFUA neural network was implemented in Python, using a custom layer defined by the discretized system of Eqn. (18). Chip-specific parameters were extracted for each neuron and incorporated into the custom layers. The AFUA network was trained and validated on the laboratory data (train/valid/test split: 68/12/20) using the TensorFlow Keras v2.0 package. Training was performed with the Adam optimizer [32] and a weighted binary cross-entropy loss function to learn full-precision weights.

Python training was followed by a quantization step that converted the full-precision weights to signed 3-bit values (0, ±1, ±2, ±3). An alternative approach would have been to directly incorporate the quantization process into the network's computational graph [12]. However, we found that such an approach only slows down training, with no improvement in our network's classification performance.
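A minimal post-training quantizer of this kind might look as follows. The symmetric per-matrix scale factor is our assumption for illustration; the paper does not specify how the full-precision weights are mapped onto the 3-bit levels.

```python
import numpy as np

def quantize_3bit(w):
    """Map full-precision weights onto the signed 3-bit levels 0, ±1, ±2, ±3.
    Assumes a simple symmetric per-matrix scale (illustrative choice)."""
    scale = max(float(np.max(np.abs(w))) / 3.0, 1e-12)  # guard against all-zero w
    q = np.clip(np.round(w / scale), -3, 3).astype(int)
    return q, scale
```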

Refer to caption
Figure 12: Accuracy and loss training graphs for discretized AFUA neural network. We performed training in Python using the TensorFlow Keras v2.0 package. Validation set performance tracked that of the training set, indicating good generalization. The learned weights were quantized and programmed onto the AFUA ASIC’s on-chip registers.

V-C Chip Measurements

Refer to caption
Figure 13: Die photo of the AFUA ASIC, implemented in a 0.18 μm CMOS process. The synapse circuits (labelled "VMM core") consume most of the 200 μm × 280 μm circuit area.

The AFUA was implemented, fabricated and tested as an integrated circuit in a standard 0.18 μm mixed-signal CMOS process with a 1.8 V power supply. To simplify the measurement process and associated instrumentation, the ASIC I/O infrastructure includes current buffers that scale input currents by 1/100 and that multiply output currents by 100.

The AFUA neural network was programmed by storing the 3-bit version of each learned weight onto its corresponding on-chip register in the VMM array.

The network was then evaluated on the test dataset. Specifically, each 24-second long window of 2-dimensional feature vectors from the test dataset was dimensionalized, scaled to 100 × I_unit, and input to the ASIC with an arbitrary waveform generator. We set I_unit ≈ 10 nA with an off-chip resistor. According to Eqn. (19), this I_unit creates a time constant of τ = 0.36 μs, allowing for faster-than-real-time chip measurements—an important consideration, given the large amount of test data to be processed.

Refer to caption
Figure 14: AFUA chip measurement response to different input patterns (I_x1, I_x0) taken from the test dataset. The circuit's class prediction is encoded as output currents (I_h1, I_h0).

Output currents I_h0, I_h1 were each measured from the voltage drop across an off-chip sense resistor. The ASIC's steady-state response was then taken as the classification decision. An output value of (I_h1, I_h0) = (2I_unit, 0) means that the circuit classified the input as eating, while (I_h1, I_h0) = (0, 2I_unit) corresponds to non-eating. From these measurements, we calculated the algorithm's test accuracy, loss, precision, recall, and F1-score.

VI Results and Discussion

               Window Size (s)  Accuracy  F1-Score  Precision  Recall  Power (mW)
This work            24           0.94      0.94      0.96      0.91      0.019
FitByte [33]          5            -         -        0.83      0.94      105
TinyEats [11]         4           0.95      0.95      0.95      0.95      40
Auracle [31]          3           0.91       -        0.95      0.87      offline
EarBit [4]           35           0.90      0.91      0.87      0.96      offline
AXL [5]              20            -        0.91      0.87      0.95      offline
TABLE II: Comparison between proposed eating detection system and previous solutions. Three of the classification algorithms [31, 4, 5] were implemented offline; since these are not embedded solutions, their power consumption is not reported.

VI-A Classification Performance

Figure 14 shows the AFUA chip's typical response to input data. The input currents I_x1, I_x0 represent the ZCR-RMS and ZCR-ZCR features extracted from the contact microphone signal. Inputting a stream of I_x1, I_x0 patterns produces output currents I_h1, I_h0, which represent the hidden states of the AFUA network.

According to our encoding scheme, (I_h1, I_h0) = (2I_unit, 0) means that the circuit classified the input as chewing, while (I_h1, I_h0) = (0, 2I_unit) corresponds to a prediction of not chewing. But the presence of noise and circuit non-ideality produces some ambiguity in the encoding: some AFUA output patterns can be interpreted as either chewing or not chewing, depending on the choice of threshold used to distinguish between 0 A and 2I_unit. Figure 15 is the receiver operating characteristic (ROC) curve produced by varying this threshold current. The highlighted point on the ROC is a representative operating point, where the classifier produced a sensitivity of 0.91 and a specificity of 0.96. This corresponds to a false alarm rate of (1 − specificity) = 0.039.

Figure 15: Receiver operating characteristic curve from AFUA chip measurements. These results were produced from the AFUA chip response to $1.6$ hours of previously-unseen test data. The highlighted point corresponds to a sensitivity of $0.91$ and a specificity of $0.96$.
Figure 16: Power consumption of the eating detection system. The feature extraction and AFUA circuitry continuously consume $1.8~\mu$W of power. The microcontroller is active for $9\%$ of the time, during which it consumes $180~\mu$W of power. For the remaining $91\%$ of the time, the microcontroller consumes $0.72~\mu$W while in standby mode. On average (red dashed line), the whole system consumes an estimated $18.8~\mu$W.

VI-B System-level Considerations

In this section, we consider the impact of using the AFUA neural network in a complete eating event detection system. To process a $500$~Hz signal, the ZCR and RMS feature extraction blocks consume a total of $0.68~\mu$W [19]. The AFUA network consumes $1.1~\mu$W, assuming $I_{\rm unit} = 10$~nA. Finally, a microcontroller from the MSP430x series (Texas Instruments Inc., Dallas, TX) running at $1$~MHz consumes $180~\mu$W when active and $0.72~\mu$W when in standby mode [34].

The feature extraction and AFUA circuitry are always on, while the microcontroller remains in standby mode until a potential chewing event is detected. The fraction of time the microcontroller is in the active mode depends on how often the user eats, as well as on the sensitivity and specificity of the AFUA network. Assuming the user spends $6\%$ of the day eating [35], then, using the classifier operating point highlighted in Fig. 15, the fraction of time that the microcontroller is active is

\begin{align}
\textsc{active} &= \textsc{eat} \times \textsc{sens} + (1 - \textsc{spec}) \times (1 - \textsc{eat}) \tag{20}\\
&= 0.06 \times 0.91 + (1 - 0.96) \times (1 - 0.06) \nonumber\\
&= 0.09. \nonumber
\end{align}

So, the microcontroller consumes an average of $180~\mu\text{W} \times 0.09 + 0.72~\mu\text{W} \times (1 - 0.09) = 16.9~\mu$W. As Fig. 16 shows, the average power consumption of the complete AFUA-based eating detection system is $18.8~\mu$W. If we attempted to implement the system with a front-end ADC (12-bit, 500~Sa/s [6, 36]) followed by a digital LSTM [37, 38], then the ADC alone would consume over $240~\mu$W of power [39].
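The power-budget arithmetic above can be reproduced in a short script. All constants are taken from the text; the small difference between the computed total and the paper's quoted $18.8~\mu$W comes from where intermediate values are rounded:

```python
# Duty-cycle and power-budget arithmetic from Eq. (20) and the text.
EAT = 0.06        # fraction of the day spent eating [35]
SENS = 0.91       # AFUA sensitivity at the chosen operating point
SPEC = 0.96       # AFUA specificity at the chosen operating point

P_MCU_ACTIVE = 180.0     # MCU active power, uW [34]
P_MCU_STANDBY = 0.72     # MCU standby power, uW [34]
P_FRONTEND = 0.68 + 1.1  # feature extraction + AFUA, uW (always on)

# Eq. (20): MCU wake fraction, from true detections (eat * sens)
# plus false alarms ((1 - spec) * (1 - eat)).
active = EAT * SENS + (1 - SPEC) * (1 - EAT)  # = 0.0922, quoted as 0.09
duty = round(active, 2)                        # follow the paper's rounding

p_mcu = P_MCU_ACTIVE * duty + P_MCU_STANDBY * (1 - duty)  # ~16.9 uW
p_total = p_mcu + P_FRONTEND                              # ~18.6-18.8 uW

print(f"active fraction: {duty:.2f}")
print(f"MCU average:     {p_mcu:.1f} uW")
print(f"system average:  {p_total:.1f} uW")
```

This makes the trade-off explicit: because the MCU dominates the active-mode power, the classifier's specificity (which sets the false-alarm rate) has a first-order effect on the system's average power.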

Table II compares our work to other recent eating detection solutions. The different approaches all achieve roughly the same classification accuracy, but our work differs in one critical aspect: while the others depend on offline processing, or on tens of milliwatts of power to operate, our approach requires only an estimated $18.8~\mu$W.

VII Conclusion

We have introduced the AFUA, an adaptive filter unit for analog long short-term memory, as part of an eating event detection system. Measurement results of the AFUA implemented in a $0.18~\mu$m CMOS technology showed that it can identify chewing events at a 24-second time resolution with a recall of $91\%$ and an F1-score of $94\%$, while consuming $1.1~\mu$W of power. The AFUA precludes the need for an analog-to-digital converter, and it also prevents a downstream microcontroller from unnecessarily processing irrelevant data. If a signal processing system were built around the AFUA to detect whole eating episodes (that is, meals and snacks), the complete system would consume less than $20~\mu$W of power. This opens up the possibility of unobtrusive, batteryless wearable devices for long-term monitoring of dietary habits.

VIII Acknowledgments

This work was supported in part by the U.S. National Science Foundation, under award numbers CNS-1565269 and CNS-1835983. The views and conclusions contained in this document are those of the authors and do not necessarily represent the official policies, either expressed or implied, of the sponsors.

References

  • [1] Ki Soo Kang “Nutritional counseling for obese children with obesity-related metabolic abnormalities in Korea” In Pediatric gastroenterology, hepatology & nutrition 20.2 The Korean Society of Pediatric Gastroenterology, HepatologyNutrition, 2017, pp. 71–78
  • [2] Laura M O’Connor et al. “Dietary dairy product intake and incident type 2 diabetes: a prospective study using dietary data from a 7-day food diary” In Diabetologia 57.5 Springer, 2014, pp. 909–917
  • [3] Robert Turton et al. “To go or not to go: A proof of concept study testing food-specific inhibition training for women with eating and weight disorders” In European Eating Disorders Review 26.1 Wiley Online Library, 2018, pp. 11–21
  • [4] Abdelkareem Bedri et al. “EarBit: using wearable sensors to detect eating episodes in unconstrained environments” In Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies 1.3 ACM New York, NY, USA, 2017, pp. 1–20
  • [5] Muhammad Farooq and Edward Sazonov “Accelerometer-based detection of food intake in free-living individuals” In IEEE sensors journal 18.9 IEEE, 2018, pp. 3752–3758
  • [6] Shengjie Bi et al. “Auracle: Detecting eating episodes with an ear-mounted sensor” In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2.3 ACM New York, NY, USA, 2018, pp. 1–27
  • [7] Ana Isabel Canhoto and Sabrina Arp “Exploring the factors that support adoption and sustained use of health and fitness wearables” In Journal of Marketing Management 33.1-2 Taylor & Francis, 2017, pp. 32–60
  • [8] Yiwen Gao, He Li and Yan Luo “An empirical study of wearable technology acceptance in healthcare” In Industrial Management & Data Systems Emerald Group Publishing Limited, 2015
  • [9] Lucy E Dunne et al. “The social comfort of wearable technology and gestural interaction” In 2014 36th annual international conference of the IEEE engineering in medicine and biology society, 2014, pp. 4159–4162 IEEE
  • [10] Brian K Hensel, George Demiris and Karen L Courtney “Defining obtrusiveness in home telehealth technologies: A conceptual framework” In Journal of the American Medical Informatics Association 13.4 BMJ Group BMA House, Tavistock Square, London, WC1H 9JR, 2006, pp. 428–431
  • [11] Maria T Nyamukuru and Kofi M Odame “Tiny Eats: Eating Detection on a Microcontroller” In 2020 IEEE Second Workshop on Machine Learning on Edge in Sensor Systems (SenSys-ML), 2020, pp. 19–23 IEEE
  • [12] Justice Amoh and Kofi M Odame “An optimized recurrent unit for ultra-low-power keyword spotting” In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3.2 ACM New York, NY, USA, 2019, pp. 1–17
  • [13] Ian D Jordan and Il Memming Park “Birhythmic analog circuit maze: a nonlinear neurostimulation testbed” In Entropy 22.5 Multidisciplinary Digital Publishing Institute, 2020, pp. 537
  • [14] Kazybek Adam, Kamilya Smagulova and Alex Pappachen James “Memristive LSTM network hardware architecture for time-series predictive modeling problems” In 2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2018, pp. 459–462 IEEE
  • [15] Olga Krestinskaya, Khaled Nabil Salama and Alex Pappachen James “Learning in memristive neural network architectures using analog backpropagation circuits” In IEEE Transactions on Circuits and Systems I: Regular Papers 66.2 IEEE, 2018, pp. 719–732
  • [16] Jianhui Han et al. “ERA-LSTM: An efficient ReRAM-based architecture for long short-term memory” In IEEE Transactions on Parallel and Distributed Systems 31.6 IEEE, 2019, pp. 1328–1342
  • [17] Zhou Zhao, Ashok Srivastava, Lu Peng and Qing Chen “Long short-term memory network design for analog computing” In ACM Journal on Emerging Technologies in Computing Systems (JETC) 15.1 ACM New York, NY, USA, 2019, pp. 1–27
  • [18] Qin Li et al. “NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting” In IEEE Transactions on Circuits and Systems I: Regular Papers IEEE, 2021
  • [19] Michael W Baker, Serhii Zhak and Rahul Sarpeshkar “A micropower envelope detector for audio applications [hearing aid applications]” In Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS’03. 5, 2003, pp. V–V IEEE
  • [20] R Sarpeshkar et al. “An analog bionic ear processor with zero-crossing detection” In ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005., 2005, pp. 78–79 IEEE
  • [21] Sepp Hochreiter and Jürgen Schmidhuber “Long short-term memory” In Neural computation 9.8 MIT Press, 1997, pp. 1735–1780
  • [22] Kyunghyun Cho et al. “Learning phrase representations using RNN encoder-decoder for statistical machine translation” In arXiv preprint arXiv:1406.1078, 2014
  • [23] Steven H Strogatz “Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering” CRC press, 2018
  • [24] K.M. Odame and B.A. Minch “The translinear principle: a general framework for implementing chaotic oscillators” In International Journal of Bifurcation and Chaos 15.08 World Scientific, 2005, pp. 2559–2568
  • [25] Kofi Odame and Bradley Minch “Implementing the Lorenz oscillator with translinear elements” In Analog Integrated Circuits and Signal Processing 59.1 Springer, 2009, pp. 31–41
  • [26] J Mulder, WA Serdijn, AC Van der Woerd and AHM Van Roermund “Dynamic translinear RMS-DC converter” In Electronics letters 32.22 IET, 1996, pp. 2067–2068
  • [27] Christian C. Enz, François Krummenacher and Eric A. Vittoz “An analytical MOS transistor model valid in all regions of operation and dedicated to low-voltage and low-current applications” In Analog Integr. Circuits Signal Process. 8.1 Hingham, MA, USA: Kluwer Academic Publishers, 1995, pp. 83–114 DOI: http://dx.doi.org/10.1007/BF01239381
  • [28] Jonathan Binas et al. “Precise deep neural network computation on imprecise low-power analog hardware” In arXiv preprint arXiv:1606.07786, 2016
  • [29] Marcel JM Pelgrom, Aad CJ Duinmaijer and Anton PG Welbers “Matching properties of MOS transistors” In IEEE Journal of solid-state circuits 24.5 IEEE, 1989, pp. 1433–1439
  • [30] AS Orgenci, G Dundar and S Balkur “Fault-tolerant training of neural networks in the presence of MOS transistor mismatches” In IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 48.3 IEEE, 2001, pp. 272–281
  • [31] Shengjie Bi et al. “Toward a wearable sensor for eating detection” In Proceedings of the 2017 Workshop on Wearable Systems and Applications, 2017, pp. 17–22
  • [32] Diederik P Kingma and Jimmy Ba “Adam: A method for stochastic optimization” In arXiv preprint arXiv:1412.6980, 2014
  • [33] Abdelkareem Bedri et al. “Fitbyte: Automatic diet monitoring in unconstrained situations using multimodal sensing on eyeglasses” In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–12
  • [34] “MSP430FR596x, MSP430FR594x Mixed-Signal Microcontrollers datasheet” (Rev. G), 2018 Texas Instruments
  • [35] Jim P Stimpson, Brent A Langellier and Fernando A Wilson “Time Spent Eating, by Immigrant Status, Race/Ethnicity, and Length of Residence in the United States” In Preventing Chronic Disease 17 Centers for Disease Control and Prevention, 2020
  • [36] Texas Instruments “CC2640R2F Datasheet”, 2020
  • [37] Dongjoo Shin, Jinmook Lee, Jinsu Lee and Hoi-Jun Yoo “14.2 DNPU: An 8.1 TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks” In 2017 IEEE International Solid-State Circuits Conference (ISSCC), 2017, pp. 240–241 IEEE
  • [38] Juan Sebastian P Giraldo and Marian Verhelst “Laika: A 5uW programmable LSTM accelerator for always-on keyword spotting in 65nm CMOS” In ESSCIRC 2018-IEEE 44th European Solid State Circuits Conference (ESSCIRC), 2018, pp. 166–169 IEEE
  • [39] Texas Instruments “ADS1000-Q1 Datasheet”, 2015