Optimising complexity of CNN models for resource constrained devices: QRS detection case study
Abstract
Traditional DL models are complex and resource hungry; thus, care needs to be taken in designing Internet of (Medical) Things (IoT, or IoMT) applications, balancing the efficiency-complexity trade-off. Recent IoT solutions tend to avoid deep-learning methods due to such complexities, and classical filter-based methods are commonly used instead. We hypothesise that a shallow CNN model can offer a satisfactory level of performance by leveraging other essential solution components, such as post-processing, that suit resource-constrained environments. In an IoMT application context, with QRS-detection and R-peak localisation from the ECG signal as a case study, the complexities of CNN models and post-processing were varied to identify a set of combinations suitable for a range of target resource-limited environments. To the best of our knowledge, finding a deployable configuration by incrementally increasing the CNN model complexity, as required to match the target's resource capacity, while leveraging the strength of post-processing, is the first of its kind. The results show that a shallow 2-layer CNN with suitable post-processing can achieve a 90% F1-score, and the scores continue to improve for 8-32 layer CNNs, which can be used to profile the target constrained environment. The outcome shows that it is possible to design an optimal DL solution with known target performance characteristics and resource (computing capacity and memory) constraints.
Index Terms:
Convolutional neural network (CNN), deep-learning, ECG, generalization, internet of (medical) things, post-processing, QRS-complex
I Introduction
Often, deep-learning (DL) based solutions strive to design a model complex enough, in terms of depth or other techniques, to understand a phenomenon and achieve high performance for a given task. This holds particularly for computer vision tasks [1, 2, 3], as well as physiological time-series tasks [4, 5]. A deep model consists of a large number of layers and, thus, a large number of parameters, which require high computation capacity to train on a sufficiently large dataset. The time and space complexity of deploying such a deep model often exceeds the capacity of resource-constrained environments (computing capacity, memory, energy, etc.). The use of DL-based models in the client-side environment (sensor, smart-watch, or smart-phone) of IoT-based solutions, particularly Internet of Medical Things (IoMT) based physiological signal monitoring solutions, is still unpopular, and traditional filter-based approaches are commonly used [6, 7, 8, 9]. A shallow DL model may not be outstanding on its own, but it has the capacity to offer a better solution by leveraging the strengths of other essential components in an end-to-end IoMT solution architecture, a direction that still lacks rigorous exploration.

DL algorithms, as well as filter-based classical methods, that deal with physiological signals often require the input signal to be pre-processed and the algorithm's output to be post-processed before a decision is finalised. Back-propagation training of DL models generally requires the data to contain a certain level of noise; where the data is clean, noise from an external source is added so that the model is forced to learn sufficiently characteristic features, reduce over-fitting, and generalise better over unknown test data [10, 11, 12]. This noise robustness, in particular, is crucial for tasks involving physiological signals, which are inherently contaminated with noise from multiple sources and for which threshold-based classical methods commonly fail to generalise over unknown test data. Classical filter-based methods in the QRS-detection literature, for example, rarely report performance on multiple ECG datasets [13, 14, 15, 16, 17], whereas recent DL approaches test models across a broad range of datasets [5, 18]. In spite of this benefit, recent IoMT studies are often found using traditional methods [13, 14, 15, 16, 17]. This reluctance to use DL models can probably be attributed to the resources required by deep models, which seem overwhelming for resource-constrained environments.
A solution to a given physiological bio-signal task offered by a DL method can be decomposed into its components, including pre-processing, the model itself, and post-processing, as shown in Figure 1. Since the pre-processing mainly prepares the input, the remaining two components were considered variable. This study decouples the CNN models from the post-processing to clearly understand their relative importance; their complexities were then varied to leverage the strengths of their combinations and find a solution customised for a resource-constrained environment. Treating a DL-based solution as a compositional optimisation of its components, suitable for resource-constrained environments, has not been actively pursued in recent IoMT studies, yet it could be an attractive alternative to the traditional approaches currently used.
A task of QRS-detection and R-peak localisation was selected as a case study to explore the strategy of incrementally increasing the complexity of a DL-based method's components (DL model and post-processing) to offer an IoMT solution for continuous ECG signal monitoring. Monitoring physiological signals using wearable sensors is increasingly common, and monitoring the ECG signal is of particular importance since it can detect a variety of heart problems, including arrhythmias, coronary heart disease, heart attacks, and cardiomyopathy [19, 20, 21, 22]. The ECG signal consists of the P-wave, the QRS-complex, and the T-wave. The QRS-detection literature contains traditional digital filter-based approaches [23, 13, 14, 15, 16, 17] and classical machine-learning approaches [24, 25, 26] to detect QRS and localise R-peaks. Recent deep-learning-based QRS-detection and R-peak localisation studies report superior performance across multiple validation datasets, involving the use of complex CNN architectures and essential post-processing [5, 27, 28, 29, 30, 31]. The post-processing (PP) in these studies can be summarised as:
• minimal PP (basic Salt and Pepper filter) [27],
• moderate PP, which uses domain knowledge such as a minimum QRS width to discard short-lived artefacts, and
• advanced PP, which additionally enforces a minimum R-R interval between consecutive detections [5].
It is sometimes unclear whether the resultant QRS-detection performance is due to the strength of the model or of the post-processing, and very limited information is available to quantify the post-processing effect. In addition, none of these deep-learning-based approaches consider the limitations of resource-constrained devices in their algorithm design.
This study identifies the relative complexities of CNN models and post-processing so that a set of combinations can be obtained. The CNN model's complexity was increased by increasing its depth, since the depth parameter significantly contributes to a model's learning capacity [2], while the complexity of the post-processing was handled by segregating it into three levels based on the amount of domain knowledge used. A set of model-and-post-processing configurations was then identified to suit the resource capacities of corresponding target environments. To the best of our knowledge, selecting a CNN model by incrementally increasing its complexity based on a target resource-constrained environment, while leveraging the strength of post-processing to propose a set of configurations, is unique.
The implementation is released as open source on GitHub: https://github.com/deakin-deep-dreamer/qrs_postprocess012
II METHODOLOGY
II-A Problem Formulation
The QRS-detection and R-peak localisation task was formulated as a segmentation problem in which each input sample receives a prediction through the CNN model. By labeling the five samples on each side of an annotated R-location (approx. 0.05 seconds per side, forming a 0.1-second QRS region) as 1s and the remaining samples as 0s, similar to Jia et al. [27], a binary mask was created to train the CNN model to predict a similar mask, which is later used to localise R-peaks.
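For illustration, a minimal sketch of the mask construction is given below, assuming a 100 Hz signal; the function name make_qrs_mask and the half_width parameter are illustrative, not part of the released implementation.

```python
import numpy as np

def make_qrs_mask(n_samples, r_locations, half_width=5):
    """Sample-wise binary mask: label ~0.05 s (5 samples at 100 Hz) on
    each side of every annotated R-location as 1, forming a ~0.1 s QRS
    region, and leave all other samples as 0."""
    mask = np.zeros(n_samples, dtype=np.int64)
    for r in r_locations:
        lo = max(0, r - half_width)
        hi = min(n_samples, r + half_width + 1)
        mask[lo:hi] = 1
    return mask
```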
II-B Application Framework
The proposed QRS detection algorithm fits within the application framework shown in Figure 1. The ECG signal can be acquired using sensors and sent to a destination using wireless protocols, including Bluetooth low-energy (BLE), among others. A resource-constrained embedded device works as an end-point to receive the raw ECG signal. A suitable QRS detection method can be determined by leveraging the strengths of the DL model and the required post-processing for a target resource-constrained environment, such as a smart-phone, smart-watch, or embedded router hardware, which may connect multiple sensors (Figure 1). The output information (ECG signal and R-peak locations) can be processed locally or transferred to a cloud server for further processing.
II-C ECG Data
DB name | Sampling rate (Hz) | No. of records | Records used | Length (min/record) | No. of beats |
EDB | 250 | 90 | 90 | 120 | 791665 |
INCART | 257 | 75 | 75 | 30 | 175900 |
MIT-BIH-Arr | 360 | 48 | 46 | 30 | 102941 |
NSTDB | 360 | 12 | 12 | 30 | 25590 |
QT | 250 | 82 | 80 | 15 | 84883 |
SVDB | 128 | 78 | 78 | 30 | 184583 |
STDB | 360 | 28 | 28 | varies | 76175 |
TWADB | 500 | 100 | 100 | 2 | 18993 |
The data were sourced from eight PhysioNet [32] databases, including the MIT-BIH Arrhythmia Database [33], INCART, QT [34], EDB (European ST-T Database) [35], STDB (MIT-BIH ST Change Database), TWADB (T-Wave Alternans Challenge Database), NSTDB (MIT-BIH Noise Stress Test Database) [36], and SVDB (MIT-BIH Supraventricular Arrhythmia Database) [37], as summarised in Table I. The first channel of each record was used. The STDB contains variable-length records, on average 28 minutes long.
II-D Pre-processing
Resampling
The records were sourced at heterogeneous sampling rates (128-500 Hz, Table I) and were resampled to a common 100 Hz rate, which the kernel-size and post-processing thresholds quoted throughout this paper assume.
Segmentation
A three-second segmentation with two seconds of overlap was applied, allowing each QRS region to receive the model's prediction multiple times (which may increase the detection likelihood) and providing more long-enough segments for training without significantly increasing the model's complexity (a longer input segment increases the computation of the convolution operations). For test ECG records, a non-overlapping 3 s segmentation was used to generate predictions.
Normalisation
The heterogeneous datasets were normalised using standard scores, and the normalisation was performed at the segment level instead of the recording level to localise the effect of noise.
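A sketch of the segmentation and per-segment normalisation, assuming a 100 Hz signal, could look as follows (function and variable names are illustrative):

```python
import numpy as np

def segment_and_normalise(ecg, fs=100, seg_sec=3, overlap_sec=2):
    """Cut a 1-D ECG record into 3 s windows with 2 s overlap
    (stride = 1 s) and z-score normalise each segment independently,
    so noisy segments do not skew the scaling of clean ones."""
    seg_len = seg_sec * fs
    stride = (seg_sec - overlap_sec) * fs
    segments = []
    for start in range(0, len(ecg) - seg_len + 1, stride):
        seg = ecg[start:start + seg_len].astype(np.float64)
        std = seg.std()
        seg = (seg - seg.mean()) / (std if std > 0 else 1.0)
        segments.append(seg)
    return np.stack(segments) if segments else np.empty((0, seg_len))
```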
II-E CNN Model
The focus of this study was to observe the performance of CNN models with a gradual increase of their complexity (by increasing their depth [2], in the range of 2-64 layers) and to leverage the strength of post-processing (PP 1-3) to yield a set of configurations for a range of resource-constrained environments. Shallow CNN models are often utilised in QRS detection [40, 18, 29]; thus, a simple convolution-only model (no sub-sampling layer, with the output dimension the same as the input, following a philosophy similar to Pelt et al. [41]) was selected as the baseline, referred to as baseline-convnet in the text.

The block diagram of the network (Figure 2) consists of a low-convolution block, transforming the raw input into an intermediate representation, followed by a feature-extraction block consisting of one or more convolution layers, and finally a scoring layer that projects onto binary decision-planes to decide sample-wise predictions. The convolution kernel hyper-parameter was optimised by adopting the domain-specific heuristic that the average duration of a QRS-complex is 60 milliseconds, so a kernel of approximately 44 milliseconds (5 samples for a 100 Hz signal) should be long enough to capture most QRS features, following Guo et al. [22]. The CNN model was implemented in Python 3.7 using the PyTorch 1.8.1 API.
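A minimal PyTorch sketch of such a convolution-only baseline is given below; the layer width (n_filters) is an assumption, since only the depth range and the kernel heuristic are specified above.

```python
import torch
import torch.nn as nn

class BaselineConvNet(nn.Module):
    """Convolution-only segmentation network: no pooling/sub-sampling,
    so the output length equals the input length and every sample
    receives a prediction. Depth is the complexity knob varied in this
    study (2-64 layers)."""
    def __init__(self, depth=2, n_filters=16, kernel=5):
        super().__init__()
        pad = kernel // 2                       # 'same' padding keeps length
        # low-convolution block: raw input -> intermediate representation
        layers = [nn.Conv1d(1, n_filters, kernel, padding=pad), nn.ReLU()]
        # feature-extraction block: one or more convolution layers
        for _ in range(depth - 1):
            layers += [nn.Conv1d(n_filters, n_filters, kernel, padding=pad),
                       nn.ReLU()]
        self.features = nn.Sequential(*layers)
        # scoring layer: project onto 2 decision planes (QRS / non-QRS)
        self.score = nn.Conv1d(n_filters, 2, kernel_size=1)

    def forward(self, x):                       # x: (batch, 1, n_samples)
        return self.score(self.features(x))    # (batch, 2, n_samples)
```

For example, `BaselineConvNet(depth=2)(torch.randn(8, 1, 300))` maps a batch of 3 s, 100 Hz segments to sample-wise logits of the same length.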
II-F Model Training and Testing
The MIT-BIH-Arrhythmia dataset is comparatively less noisy and was used for model training to learn signal morphology, while the other datasets were used for cross-database testing. The baseline-convnet was optimised through an internal subject-wise five-fold cross-validation on the MIT-BIH-Arrhythmia dataset, where four folds were used for training and the remaining fold's subjects were used to validate the model and stop the training when appropriate. The five-fold approach yields five optimised models, which were then used for cross-database testing, producing five validation scores per validation dataset; the scores were finally averaged.
An early-stopping mechanism was used in model training, instead of a fixed number of epochs, to detect convergence early. At the end of each epoch, the trained model was evaluated on one of the folds (the internal five-fold approach mentioned above); if the validation loss does not improve for seven consecutive epochs (the early-stopping patience), training is stopped, while the maximum number of epochs was set to 50. In most cases, training terminated well before reaching the maximum epoch limit. The initial learning rate was set to 0.01 to facilitate eager learning, and a learning-rate scheduler was used to decrease the rate by a factor of 0.1 with a patience of 5, meaning that if the validation loss does not decrease for five consecutive epochs, the rate is decreased. A cross-entropy loss was found suitable, possibly due to its use of the natural logarithm, which accounts for the difference between the actual label and the predicted outcome in a more granular way. To back-propagate the calculated loss, the Adam optimiser was adopted for faster convergence [42].
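The described training setup can be sketched as a simplified loop, assuming sample-wise integer masks as targets (data-loader construction is omitted):

```python
import torch

def train(model, train_loader, val_loader, max_epochs=50, patience=7):
    """Adam at an initial LR of 0.01, ReduceLROnPlateau (factor 0.1,
    patience 5) on the validation loss, and early stopping after 7
    epochs without improvement, as described above."""
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1,
                                                       patience=5)
    loss_fn = torch.nn.CrossEntropyLoss()
    best, stall = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:        # y: (batch, n_samples) int64 mask
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        sched.step(val)
        if val < best:
            best, stall = val, 0
        else:
            stall += 1
            if stall >= patience:        # early stopping
                break
```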
II-G Post-processing Algorithm
The post-processing refines the CNN model's prediction-stream just before the R-peaks are localised. It is separated into three levels, minimal (Algorithm 1), moderate (Algorithm 2), and advanced (Algorithm 3), based on the level of domain-specific knowledge required. The inputs of the three algorithms are:
Algorithm 1 (minimal PP):
• PRED_STREAM: binary prediction stream (the CNN model's output).
• ONES_PATTERNS: sequences of 1s with isolated 0s (11011, 1101).
• ZEROS_PATTERNS: sequences of 0s with isolated 1s (00100, 0010).

Algorithm 2 (moderate PP):
• PRED_STREAM: binary prediction stream (the CNN model's output).
• DOMAIN_KNOWLEDGE: a valid QRS extent should be at least 64 milliseconds; 64 milliseconds of a 100 Hz signal yields 6 samples.

Algorithm 3 (advanced PP):
• PRED_STREAM: binary prediction stream (the CNN model's output).
• DOMAIN_KNOWLEDGE: two consecutive QRS nodes should be at least 200 milliseconds (20 samples at 100 Hz) apart (minimum R-R distance).
The minimal post-processing (PP) is a basic Salt and Pepper filter which removes isolated ones (or zeros) recursively within a group of consecutive zeros (or ones). Two patterns of consecutive ones with isolated zeros inside, 11011 and 1101 (and their zero counterparts 00100 and 0010), were searched for in sequence in the input binary stream and replaced with all ones (or all zeros) of the corresponding pattern lengths (i.e. 5 and 4 ones). Matching the longer pattern 11011 (or 00100) before the shorter 1101 (or 0010) was found effective, as it maximises the lengths of consecutive ones (or zeros). The moderate PP filters out candidate nodes whose confidence-score is less than a threshold of a 64 millisecond equivalent number of samples [5] (approx. 6 samples for a 100 Hz signal), removing QRS-like short-lived artefacts. The advanced PP takes the R-R interval into account and filters out nodes whose R-R interval falls below a minimum threshold of a 200 millisecond equivalent number of samples (20 samples for a 100 Hz signal worked well in our case, whereas Cai et al. [5] used 100 milliseconds).
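The three levels might be sketched as follows, assuming a 100 Hz prediction stream; the tie-breaking rule in pp3_min_rr (dropping the shorter of two too-close candidate runs) is an assumption, as the paper does not specify which of the two nodes is removed.

```python
import numpy as np

def pp1_salt_pepper(pred):
    """Minimal PP (Algorithm 1): recursively flip isolated 0s inside runs
    of 1s (patterns 11011, 1101) and isolated 1s inside runs of 0s
    (patterns 00100, 0010), matching the longer patterns first."""
    s = "".join(map(str, pred))
    changed = True
    while changed:
        changed = False
        for pat, rep in (("11011", "11111"), ("1101", "1111"),
                         ("00100", "00000"), ("0010", "0000")):
            if pat in s:
                s = s.replace(pat, rep)
                changed = True
    return np.fromiter(map(int, s), dtype=np.int64)

def _runs(pred, value=1):
    """Yield (start, end) index pairs of maximal runs of `value`."""
    start = None
    for i, v in enumerate(pred):
        if v == value and start is None:
            start = i
        elif v != value and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(pred)

def pp2_min_width(pred, fs=100, min_ms=64):
    """Moderate PP (Algorithm 2): drop candidate QRS runs narrower than
    ~64 ms (6 samples at 100 Hz)."""
    out = pred.copy()
    min_len = int(round(min_ms * fs / 1000))
    for s, e in list(_runs(out)):
        if e - s < min_len:
            out[s:e] = 0
    return out

def pp3_min_rr(pred, fs=100, min_rr_ms=200):
    """Advanced PP (Algorithm 3): enforce a minimum R-R distance of
    ~200 ms (20 samples at 100 Hz) between candidate QRS runs; of two
    runs that are too close, drop the shorter (assumed rule)."""
    out = pred.copy()
    runs = list(_runs(out))
    min_rr = int(round(min_rr_ms * fs / 1000))
    i = 0
    while i + 1 < len(runs):
        (s1, e1), (s2, e2) = runs[i], runs[i + 1]
        if ((s2 + e2) // 2) - ((s1 + e1) // 2) < min_rr:  # centres ~ R-peaks
            j = i if (e1 - s1) <= (e2 - s2) else i + 1    # shorter run
            out[runs[j][0]:runs[j][1]] = 0
            runs.pop(j)
        else:
            i += 1
    return out
```

Each function makes a single sequential pass (the pattern replacement loops until stable but touches each position a bounded number of times), in line with the linear run-time complexity noted later.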
II-H Validation score
In this study, the F1-score was used as the validation score for the proposed models. The F1-score is calculated as

F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}    (1)

where TP, FP, and FN are the numbers of true positives, false positives, and false negatives, respectively.
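Expressed in code, with TP, FP, and FN obtained by matching detected R-peaks to annotations (the matching tolerance is not restated here), the score reduces to:

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0 when nothing is
    detected or annotated."""
    denom = 2 * tp + fp + fn
    return (2 * tp / denom) if denom else 0.0
```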
III Implementation
The application framework shown in Figure 1 has been implemented on a Raspberry Pi, as shown in Figure 3.

In this Raspberry Pi 4 Model B based implementation, the ECG signal was received from the sensor over a Bluetooth low-energy interface, processed by the proposed set of QRS detection algorithms (configurations combining CNN model and post-processing complexity), and finally the ECG signal, along with R-peak annotations, was sent to the cloud using a web interface. Other low-resource devices were not considered for implementation in this study; instead, a resource-configuration-based assumption is discussed.
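As an illustration only, the end-point loop might look like the following sketch, reusing the earlier sketches' helpers; read_ble_samples() and post_to_cloud() are hypothetical stand-ins for the BLE and web interfaces and are not part of the released implementation.

```python
import torch

def run_endpoint(model, fs=100, seg_sec=3):
    """Hypothetical receive-predict-postprocess-forward loop for the
    embedded end-point."""
    model.eval()
    seg_len, buffer = seg_sec * fs, []
    while True:
        buffer.extend(read_ble_samples())      # raw ECG via BLE (assumed API)
        while len(buffer) >= seg_len:
            seg = torch.tensor(buffer[:seg_len], dtype=torch.float32)
            del buffer[:seg_len]               # non-overlapping test segments
            seg = (seg - seg.mean()) / (seg.std() + 1e-8)  # segment z-score
            with torch.no_grad():
                pred = model(seg.view(1, 1, -1)).argmax(dim=1)[0].numpy()
            pred = pp3_min_rr(pp2_min_width(pp1_salt_pepper(pred)))
            r_peaks = [(s + e) // 2 for s, e in _runs(pred)]  # run centres
            post_to_cloud(seg.numpy(), r_peaks)               # assumed web API
```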
IV RESULTS
Figure 4 shows the variation in F1-scores with increasing model depth (x-axis scaled using log2) using the post-processing (PP) methods minimal, moderate, and advanced (indexed as PP 1-3, shown in Algorithms 1, 2, and 3).
The PP-1 F1-scores of the 2-layer CNN (Figure 4-a) are the lowest and increase through the 4- and 8-layer models up to 16 layers. Between 16 and 32 layers, the F1-scores increase marginally for INCART, QT, EDB, STDB, and SVDB, decrease marginally for NSTDB, but increase significantly (around 7%) for TWADB. Beyond 32 layers, the F1-scores decline for all datasets, with INCART and TWADB decreasing by a comparatively higher margin of around 3%. The NSTDB F1-score varied by only around 2% (between 86.5% and 89.8%) across depths, similar to the group QT, EDB, STDB, and SVDB, which also varied by 2%, but the INCART and TWADB F1-scores varied by around 8% (between 87% and 95%) and 14% (between 81% and 95%), respectively.
With the moderate and advanced PP (Figure 4-b,c), the F1-score variation across datasets flattens compared to the minimal PP (Figure 4-a). The advanced post-processed F1-score response is almost identical to the moderate one across network depths, with the exception of TWADB, for which the advanced PP shows a significant improvement starting from the 2-layer network (around a 13% margin, from 83% with moderate to 96% with advanced PP).
Figure 5 shows the three post-processing steps' relative importance across the validation datasets for networks of 2, 4, and 64 layers (intermediate depths are not reported, since the variation increases only marginally, and for brevity).
The three post-processing methods (indexed as PP 1-3) are represented as blue, yellow, and green bars, in order. In Figure 5-a, PP-1 (blue) shows the lowest F1-scores of the three PPs, its lowest being around 81% for TWADB. PP-2 (yellow) is higher than PP-1 (blue) for the 2-layer depth (Figure 5-a), with a margin of 5% for all datasets. PP-3, in the same figure, is marginally superior to PP-2 for all but TWADB, for which the margin is around 15%. With the 4-layer CNN (Figure 5-b), the PP-1 scores improve proportionately, though TWADB and NSTDB improve only marginally (2%). The deepest 64-layer CNN (Figure 5-c) almost equalises the effects of the PP methods (PP 1-3) across datasets, including TWADB, where PP-1 trails PP-3 by only around a 1% margin.
The characteristic curves in Figure 6 depict the score variation of the different PPs w.r.t. network depth, for each validation dataset. For almost all datasets, PP-2 and PP-3 are indistinguishable, except for TWADB, for which all PPs tend to converge at the 32-layer CNN (Figure 6-e). PP-2 is marginally (1%) lower than PP-3 for datasets such as QTDB and STDB (Figure 6-b, d) across network depths, but the difference is in the range of 2-10% for INCART, EDB, and NSTDB (Figure 6-a, c, f) and 12% for TWADB (Figure 6-e).
The characteristic curves in Figure 7 show the response of the F1-scores to network depth w.r.t. the PPs, across the validation datasets. The shallow 2-layer CNN with PP-1 yields the lowest F1-scores across datasets, but with PP-2 the score improves (by margins in the range of around 1-6%). PP-3, on the other hand, achieves only a marginal F1-score improvement over the moderate post-processing, with the exception of TWADB (Figure 7-e), which improves by around a 13% margin. Unlike for INCART and TWADB, the 8-, 16-, and 32-layer networks are almost indistinguishable for the remaining validation datasets across PPs.
The CNN models were trained with a large amount of data on GPU hardware (an Nvidia Tesla Volta V100-SXM2-32GB server), where the additional time required per training epoch grows steadily with CNN depth (Table II), roughly doubling each time the depth doubles.
No. of layers | 2 | 4 | 8 | 16 | 32 | 64 |
No. of CNN parameters | 6244 | 12100 | 23812 | 47236 | 94084 | 187780 |
CNN memory (kB) | 34 | 61 | 117 | 227 | 448 | 891 |
Avg. train time per mini-batch | 26s | 33s | 46s | 70s | 120s | 220s |
Avg. test time for INCART | 8s | 8s | 8s | 8s | 8s | 8s |
PP complexity (minimal, moderate, or advanced) | O(n) | O(n) | O(n) | O(n) | O(n) | O(n) |
The table also shows the number of CNN model parameters and the memory requirement (in kB), both of which grow with model depth, roughly doubling as the depth doubles; deeper models require more memory due to their larger number of parameters. The validation (test) time, on the other hand, is constant across network depths, as shown for the INCART dataset run on a Raspberry Pi 4. Each post-processing method scans the output prediction-stream sequentially and updates individual bits of the stream as required; thus, their run-time complexity is O(n).
V DISCUSSION
IoMT solutions often resort to traditional filter-based algorithms for embedded, resource-constrained environments and are reluctant to use deep-learning-based algorithms due to their high resource requirements, in spite of strengths including noise resistance and automated feature extraction and learning. In a sensor-based end-to-end solution, a shallow CNN model has the capacity to offer better performance by leveraging other core components, such as post-processing. To that end, in this study, the complexity of CNN models (only the depth parameter was varied, since it dominates a model's learning capacity) was incrementally increased and combined with a set of post-processing methods. A use case of IoMT-based ECG signal monitoring was chosen, where the post-processing was segregated by increasing use of domain knowledge: minimal PP-1 (basic salt & pepper filter), moderate PP-2 (minimum QRS extent of 64 milliseconds), and advanced PP-3 (minimum R-R distance of 200 milliseconds). The results of this study show that it is possible to design an optimal DL solution with known target performance characteristics and resource (computing capacity and memory) constraints.
A combination of a shallow 2-layer CNN and PP-1 seems capable of achieving more than an 85% F1-score for almost all datasets (except TWADB, which scores around 81%). If this is considered a basic configuration, superior performance can be achieved by altering the post-processing alone: with PP-2 and PP-3, the score of the shallow 2-layer CNN improves by 3% and 15%, respectively, w.r.t. PP-1. Different CNN architectures and complexities have been used in the QRS-detection literature [5, 28, 43, 27]; however, the use of a shallow 2-layer CNN with the emphasis placed on post-processing is scarce. The strength of post-processing can be leveraged to process a primitive CNN model's output and achieve comparable performance across a broad range of datasets. Such a configuration would be embedded-device friendly, both in terms of the CNN model (the 2-layer CNN has the lowest memory footprint) and the post-processing (linear run-time complexity).
The quality of CNN model’s prediction-stream improves with the increase of its complexity. This can be understood by observing the improvement of PP-1 F1-scores for CNNs with increasing depths (characteristic curve, shown in Figure 6). The PP-1 is a basic Salt & Pepper filter, which was found to reach almost the highest level scores of PP-3 for CNN model of depth-8 (QT, and STDB, in Figure 6-b,d), depth-16 (INCART, EDB, and NSTDB, in Figure 6-a,c,f), and depth-32 (TWADB, in Figure 6-e). Deep CNN models (depth 8 layers) learn better from the training data [44], yield better prediction-stream and reduces the effect of post-processing, although, each dataset has its own depth requirement to achieve the top score. It is particularly important to observe that a shallow 2-layer CNN model may be difficult to train to learn a complex function [45, 44] (i.e. if an input sample belongs to a QRS region or not), that requires PP-2 (for most datasets, in Figure 6,7) or PP-3 (for all datasets) to perform better than PP-1, because it seems that the use of domain-knowledge-based post-processing is complementing the shortcoming of the capacity of a shallow model. The selection of PP-2 or PP-3 (instead of PP-1) with deep CNN models (depth 8 layers) was found to be a better choice to achieve the top performance for most of the datasets (except TWADB, for which PP-3 is required, instead of PP-2). This requirement, however, may not comply with a target resource-constrained device, which has memory or computation limitation, thus, requires a shallow CNN model.
For a CNN model to have greater generalisability, it should perform at almost the same level across a broad range of test ECG data. The characteristic curves of the datasets, w.r.t. PPs or CNN depths (Figures 6 and 7), show the peculiarity of some of them (i.e. INCART and TWADB), which achieve good performance with a resource-constrained configuration (shallow 2-layer CNN with PP-3) but require a deep CNN model (8- or 16-layer CNN with PP-3) to achieve peak performance. A deep CNN model appears to capture subtle patterns that enable it to produce a better prediction-stream for a wide range of test records (with greater inter- and intra-patient variance). As one may assume, the minimum configuration (shallow 2-layer CNN with PP-3) may not guarantee a similar level of performance for a test ECG record that differs markedly from the training set, as the shallow CNN is likely to fail to learn the general features during training. Data diversity is one of the challenges in designing a universal QRS-detector [46], and there are studies in the literature that cross-validate models with fractions of a single dataset or a few datasets. Having a deep CNN model that generates a better prediction-stream seems beneficial for a clinical context where performance is the prime requirement.
A suitable configuration can be derived for a target resource-constrained device by matching its memory and computation capacity with those of the CNN models. These observations can be summarised in the following points, under the simplifying assumption that the validation datasets represent a universal set of datasets:
• Target device has extremely low memory and computing capacity: a combination of the 2-layer CNN and PP-3 yields the best performance.
• Target device has low memory and computing capacity: a combination of the 8-layer CNN and PP-3 yields the top performance (for all but TWADB).
• Target device has sufficient resources: a combination of the 32-layer CNN and PP-2 or PP-3 yields the top performance for all datasets.
VI CONCLUSION
Deep-learning models are often not the default choice in IoMT end-to-end applications, which require models to be deployed in resource-constrained environments; classical filter-based approaches are commonly encountered in such contexts due to their small memory and computation footprint. In this study, we explored ways of optimising DL solutions to improve their deployability on resource-constrained (computation capacity and memory) devices. In a QRS-detection case study, it was shown that effectively combining CNN models and post-processing, by varying their complexities, can yield a set of configurations suitable for a range of target resource-constrained environments. It was found that a shallow 2-layer CNN can achieve comparable performance with suitable post-processing, targeting a low-resource environment; however, an 8-layer CNN would be required to achieve the top scores, which may suit comparatively richer target environments. The results of this study pave the way for designing DL solutions for known resource constraints and target performance characteristics. Finally, this study contributes to improving the deployability of DL solutions on resource-constrained devices.
VII Future Work
CNN model’s complexity optimisation to yielding a compositional solution to test on real-life data for a given task (i.e. QRS-detection and R-peak localisation) to monitor performance and power-consumption of resource constrained environment could be a promising future direction.
Acknowledgment
This research was undertaken with the assistance of resources and services from the National Computational Infrastructure (NCI), which is supported by the Australian Government.
References
- [1] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
- [2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1–9.
- [4] A. Y. Hannun, P. Rajpurkar, M. Haghpanahi, G. H. Tison, C. Bourn, M. P. Turakhia, and A. Y. Ng, “Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network,” Nature Medicine, vol. 25, no. 1, pp. 65–69, 2019. [Online]. Available: http://www.nature.com/articles/s41591-018-0268-3
- [5] W. Cai and D. Hu, “Qrs complex detection using novel deep learning neural networks,” IEEE Access, vol. 8, pp. 97 082–97 089, 2020.
- [6] K. Zhao, Y. Li, G. Wang, Y. Pu, and Y. Lian, “A robust qrs detection and accurate r-peak identification algorithm for wearable ecg sensors,” Science China Information Sciences, vol. 64, no. 8, pp. 1–17, 2021.
- [7] C.-L. Chen and C.-T. Chuang, “A qrs detection and r point recognition method for wearable single-lead ecg devices,” Sensors, vol. 17, no. 9, p. 1969, 2017.
- [8] X. Tang, Q. Hu, and W. Tang, “A real-time qrs detection system with pr/rt interval and st segment measurements for wearable ecg sensors using parallel delta modulators,” IEEE transactions on biomedical circuits and systems, vol. 12, no. 4, pp. 751–761, 2018.
- [9] D. Berwal, A. Kumar, and Y. Kumar, “Design of high performance qrs complex detector for wearable healthcare devices using biorthogonal spline wavelet transform,” ISA transactions, vol. 81, pp. 222–230, 2018.
- [10] R. Reed and R. J. Marks II, Neural smithing: supervised learning in feedforward artificial neural networks. MIT Press, 1999.
- [11] C. M. Bishop et al., Neural networks for pattern recognition. Oxford university press, 1995.
- [12] D. S. Chen and R. C. Jain, “A robust backpropagation learning algorithm for function approximation,” IEEE Transactions on Neural Networks, vol. 5, no. 3, pp. 467–479, 1994.
- [13] M. L. Ahlstrom and W. J. Tompkins, “Automated high-speed analysis of holter tapes with microcomputers,” IEEE Transactions on Biomedical Engineering, no. 10, pp. 651–657, 1983.
- [14] H. Baharestani, W. Tompkins, J. Webster, and R. Mazess, “Heart rate recorder,” Medical and Biological Engineering and Computing, vol. 17, no. 6, pp. 719–723, 1979.
- [15] P. O. Borjesson, O. Pahlm, L. Sornmo, and M.-E. Nygards, “Adaptive qrs detection based on maximum a posteriori estimation,” IEEE Transactions on Biomedical Engineering, no. 5, pp. 341–351, 1982.
- [16] V. X. Afonso, W. J. Tompkins, T. Q. Nguyen, and S. Luo, “Ecg beat detection using filter banks,” IEEE transactions on biomedical engineering, vol. 46, no. 2, pp. 192–202, 1999.
- [17] M. Bahoura, M. Hassani, and M. Hubin, “Dsp implementation of wavelet transform for real time ecg wave forms detection and heart rate analysis,” Computer methods and programs in biomedicine, vol. 52, no. 1, pp. 35–44, 1997.
- [18] Y. Xiang, Z. Lin, and J. Meng, “Automatic qrs complex detection using two-level convolutional neural network,” Biomedical engineering online, vol. 17, no. 1, pp. 1–17, 2018.
- [19] D. De Bacquer, G. De Backer, M. Kornitzer, and H. Blackburn, “Prognostic value of ecg findings for total, cardiovascular disease, and coronary heart disease death in men and women,” Heart, vol. 80, no. 6, pp. 570–577, 1998.
- [20] E. Vallès, V. Bazan, and F. E. Marchlinski, “Ecg criteria to identify epicardial ventricular tachycardia in nonischemic cardiomyopathy,” Circulation: Arrhythmia and Electrophysiology, vol. 3, no. 1, pp. 63–71, 2010.
- [21] E. J. d. S. Luz, W. R. Schwartz, G. Cámara-Chávez, and D. Menotti, “Ecg-based heartbeat classification for arrhythmia detection: A survey,” Computer methods and programs in biomedicine, vol. 127, pp. 144–164, 2016.
- [22] L. Guo, G. Sim, and B. Matuszewski, “Inter-patient ecg classification with convolutional and recurrent neural networks,” Biocybernetics and Biomedical Engineering, vol. 39, no. 3, pp. 868–879, 2019.
- [23] B.-U. Kohler, C. Hennig, and R. Orglmeister, “The principles of software qrs detection,” IEEE Engineering in Medicine and biology Magazine, vol. 21, no. 1, pp. 42–57, 2002.
- [24] S. Mehta and N. Lingayat, “Svm-based algorithm for recognition of qrs complexes in electrocardiogram,” Irbm, vol. 29, no. 5, pp. 310–317, 2008.
- [25] M. Kropf, D. Hayn, and G. Schreier, “Ecg classification based on time and frequency domain features using random forests,” in 2017 Computing in Cardiology (CinC). IEEE, 2017, pp. 1–4.
- [26] I. Saini, D. Singh, and A. Khosla, “Qrs detection using k-nearest neighbor algorithm (knn) and evaluation on standard ecg databases,” Journal of advanced research, vol. 4, no. 4, pp. 331–344, 2013.
- [27] M. Jia, F. Li, Z. Chen, X. Xiang, and X. Yan, “High noise tolerant r-peak detection method based on deep convolution neural network,” IEICE Transactions on Information and Systems, vol. 102, no. 11, pp. 2272–2275, 2019.
- [28] W. Liu, X. Wang, H. Gao, C. Yang, J. Li, and C. Liu, “An octave convolution neural network-based qrs detector,” in 2020 International Conference on Sensing, Measurement & Data Analytics in the era of Artificial Intelligence (ICSMD). IEEE, 2020, pp. 413–418.
- [29] M. Šarlija, F. Jurišić, and S. Popović, “A convolutional neural network based approach to qrs detection,” in Proceedings of the 10th international symposium on image and signal processing and analysis. IEEE, 2017, pp. 121–125.
- [30] J. S. Lee, M. Seo, S. W. Kim, and M. Choi, “Fetal QRS detection based on convolutional neural networks in noninvasive fetal electrocardiogram,” vol. 4, pp. 75–78.
- [31] J. S. Lee, S. J. Lee, M. Choi, M. Seo, and S. W. Kim, “Qrs detection method based on fully convolutional networks for capacitive electrocardiogram,” Expert Systems with Applications, vol. 134, pp. 66–78, 2019.
- [32] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals,” circulation, vol. 101, no. 23, pp. e215–e220, 2000.
- [33] G. B. Moody and R. G. Mark, “The impact of the mit-bih arrhythmia database,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 45–50, 2001.
- [34] P. Laguna, R. G. Mark, A. Goldberg, and G. B. Moody, “A database for evaluation of algorithms for measurement of qt and other waveform intervals in the ecg,” in Computers in cardiology 1997. IEEE, 1997, pp. 673–676.
- [35] A. Taddei, G. Distante, M. Emdin, P. Pisani, G. Moody, C. Zeelenberg, and C. Marchesi, “The european st-t database: standard for evaluating systems for the analysis of st-t changes in ambulatory electrocardiography,” European heart journal, vol. 13, no. 9, pp. 1164–1172, 1992.
- [36] G. B. Moody, W. Muldrow, and R. G. Mark, “A noise stress test for arrhythmia detectors,” Computers in cardiology, vol. 11, no. 3, pp. 381–384, 1984.
- [37] S. D. Greenwald, R. S. Patil, and R. G. Mark, Improved detection and classification of arrhythmias in noise-corrupted electrocardiograms using contextual information. IEEE, 1990.
- [38] E. Ajdaraga and M. Gusev, “Analysis of sampling frequency and resolution in ecg signals,” in 2017 25th Telecommunication Forum (TELFOR). IEEE, 2017, pp. 1–4.
- [39] J. G. Proakis, Digital signal processing: principles algorithms and applications. Pearson Education India, 2001.
- [40] B. S. Chandra, C. S. Sastry, and S. Jana, “Robust heartbeat detection from multimodal data via cnn-based generalizable information fusion,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 3, pp. 710–717, 2018.
- [41] D. M. Pelt and J. A. Sethian, “A mixed-scale dense convolutional neural network for image analysis,” Proceedings of the National Academy of Sciences, vol. 115, no. 2, pp. 254–259, 2018. [Online]. Available: http://www.pnas.org/lookup/doi/10.1073/pnas.1715832114
- [42] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- [43] J. Laitala, M. Jiang, E. Syrjälä, E. K. Naeini, A. Airola, A. M. Rahmani, N. D. Dutt, and P. Liljeberg, “Robust ecg r-peak detection using lstm,” in Proceedings of the 35th annual ACM symposium on applied computing, 2020, pp. 1104–1111.
- [44] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556
- [45] F. Seide, G. Li, and D. Yu, “Conversational speech transcription using context-dependent deep neural networks,” in Twelfth annual conference of the international speech communication association, 2011.
- [46] A. Habib, C. Karmakar, and J. Yearwood, “Impact of ecg dataset diversity on generalization of cnn model for detecting qrs complex,” IEEE access, vol. 7, pp. 93 275–93 285, 2019.