SAIA: Split Artificial Intelligence Architecture for Mobile Healthcare Systems

Di Zhuang, Nam Nguyen, Keyu Chen, and J. Morris Chang
{dizhuang, namnguyen2, keyu, chang5}@usf.edu
Department of Electrical Engineering, University of South Florida, Tampa, FL 33620

Abstract

As the advancement of deep learning (DL), the Internet of Things and cloud computing techniques for biomedical and healthcare problems, mobile healthcare systems have received unprecedented attention. Since DL techniques usually require enormous amount of computation, most of them cannot be directly deployed on the resource-constrained mobile and IoT devices. Hence, most of the mobile healthcare systems leverage the cloud computing infrastructure, where the data collected by the mobile and IoT devices would be transmitted to the cloud computing platforms for analysis. However, in the contested environments, relying on the cloud might not be practical at all times. For instance, the satellite communication might be denied or disrupted. We propose SAIA, a Split Artificial Intelligence Architecture for mobile healthcare systems. Unlike traditional approaches for artificial intelligence (AI) which solely exploits the computational power of the cloud server, SAIA could not only relies on the cloud computing infrastructure while the wireless communication is available, but also utilizes the lightweight AI solutions that work locally on the client side, hence, it can work even when the communication is impeded. In SAIA, we propose a meta-information based decision unit, that could tune whether a sample captured by the client should be operated by the embedded AI (i.e., keeping on the client) or the networked AI (i.e., sending to the server), under different conditions. In our experimental evaluation, extensive experiments have been conducted on two popular healthcare datasets. Our results show that SAIA consistently outperforms its baselines in terms of both effectiveness and efficiency.

keywords:

Split Artificial Intelligence; Mobile Healthcare System; Internet of Things; Algorithm Selection; Deep Learning; Machine Learning; Fusion; Skin Lesion; Nail Fungus; Onychomycosis; Embedded AI; Networked AI; Decision Unit; Data Pre-processing; Resource-constrained.

^†^†journal: Knowledge-based Systems

1 Introduction

As the advancement of modern technologies, such as wireless communication, data mining, machine learning, the Internet of Things (IoT), cloud computing and edge computing, the mobile healthcare systems become more and more feasible and popular. Numerous intelligent mobile healthcare systems are developed on various mobile and IoT devices [1]. The emergence and breakthrough of deep learning, that has been shown to achieve extraordinary results in a variety of real-world applications, such as skin lesion analysis [2], active authentication [3], facial recognition [4, 5], botnet detection [6, 7] and community detection [8], is one of the primary driver for such mobile healthcare systems. However, since the deep learning techniques require enormous amount of computation resources, most of them cannot be directly deployed on the resource-constrained mobile and IoT devices.

One common solution to tackle such problem is cloud computing, where the data could be transmitted to the cloud computing platforms for operations. For instance, several Machine Learning as a Service (MLaS) systems were introduced in the recent years (e.g., Google Cloud AutoML [9] and Amazon SageMaker [10]). These systems were mostly intended to utilize the high computational power of cloud servers, for ML applications, in addition to enable scalability in the cloud (horizontal scaling). However, in contested environments, relying on the server to generate actionable intelligence might not be practical at all times. For instance, the satellite communication might be denied or disrupted. For such situations, the mobile and IoT devices have to be enabled to generate actionable intelligence that might be required for the success of certain operations (i.e., providing healthcare services). Hence, it is imperative to design a Split Artificial Intelligence Architecture (SAIA), unlike the traditional AI architecture, that can not only exploit the computational power of the server, but also utilize lightweight AI solutions that work locally on the mobile or IoT devices.

Designing an effective and efficient SAIA system has to meet several challenging requirements. First, the client side (i.e., mobile or IoT devices) should have lightweight (in terms of storage size, power consumption and inference time) AI solutions, that could provide fundamental services (i.e., acceptable classification precision for certain classes or subsets of data) even when the satellite communication is denied or disrupted. Second, the server side (i.e., cloud server) should have complex “full-sized” AI solutions, that could provide the state-of-the-art performance on the selected applications. Third, the usage of AI solutions (on the whole or subset of the data) shifting between the client side and the server side should depend on the application precision requirement, the resource availability and the data characteristics, and such trade-off should be able to be optimized. Last but not least, the adjustment of AI usage between the client and the server should be efficient and intelligent. For instance, if the lightweight AI is able to recognize the class of given data (w.h.p.), the data should not be sent to the server, even when the communication is unimpeded.

To date, a few approaches have been proposed to tackle the problem of running deep learning techniques on the mobile and IoT devices. For instance, Knowledge Distillation (KD) [11, 12, 13] has been proposed to compress a model by teaching a simplified student DNN model, step by step, exactly what to do using a complex pre-trained teacher DNN model, and then deploy the student DNN model on the mobile devices [14]. Although KD could dramatically reduce the complexity of the student model, the overall performance of a student model still would be as good as its teacher model. Moreover, solely deploying a lightweight model on the client side loses the chance and advantage of using a more advanced model on the server side, that could be the ensemble/fusion of several well-trained DNN models. Split-DNN architectures [15, 16, 17] have also been proposed to offload the execution of complex DNN models to compute-capable servers from the mobile and IoT devices, where a DNN is split into head and tail sections, deployed at the client side and the server side, respectively. Matsubara et al. [18] proposes a KD-based Split-DNN framework to reduce the communication cost between the client and the server. However, such approaches usually cannot fully rely on the client-side model, thus unable to work if the communication is impeded. To summarize, KD and split-DNN focus on either deploying lightweight models on the client side or pushing the most of the DNN computation to the server side in an efficient fashion. However, none of the existing approaches could adjust the AI usage on between the client and the server depending on the device’s condition (e.g., storage size, power consumption and communication bandwidth).

In this paper, we propose SAIA, a Split Artificial Intelligence Architecture for mobile healthcare systems. SAIA enables the client to produce actionable intelligence locally using its embedded AI unit (e.g., conventional ML classifiers). When the satellite communication is available, the reduced feature data (or compressed raw data) could be uploaded to the server, and be processed by the networked AI units that utilize more powerful AI algorithms (e.g., ensemble of multiple advanced DNN classifiers); thus generating more confident and detailed AI results. The embedded AI client might need communication with the server if the confidence score of the decision is below a certain threshold, or periodically, when the satellite communication is available, to generate more confident and detailed AI results using the more powerful networked AI on the server side. In SAIA, we also propose a decision unit that trains on the meta-information (e.g., soft labels) outputted by the embedded AI and is deployed on the client side to decide and control whether a sample captured by the client should be operated on the client side or sent to the server side. We also enable the decision unit to utilize a parameter, namely $\epsilon$ , to tune the criteria of how much data could be sent to the networked AI. As such, our SAIA framework could work under different conditions (e.g., unimpeded communication bandwidth or satellite communication is denied or disrupted).

In the experimental evaluation, we trained three conventional machine learning models (i.e., SVM, RF and DART) for the embedded AI and an ensemble of twelve advanced DNN models for the networked AI, on two popular healthcare benchmark datasets: the ISIC research dataset for skin image analysis [19, 20, 21] and the onychomycosis (a.k.a. Nail fungus) dataset [22]. Our experimental results show that our SAIA framework is effective and efficient while switching the computation between the embedded AI and the networked AI. Also, our design of SAIA’s decision unit consistently outperforms its baseline (i.e., randomly selected sending) in terms of both effectiveness and efficiency.

To summarize, our work has the following contributions:

$\bullet$ We present SAIA, a novel, effective and efficient split artificial intelligence architecture. To the best of our knowledge, this is the first work to apply split artificial intelligence architecture in mobile healthcare systems.

$\bullet$ In SAIA, we propose a meta-information based decision unit, that could tune whether a sample captured by the client should be operated by the embedded AI or the networked AI, under different conditions.

$\bullet$ A comprehensive experimental evaluation on two large scale healthcare datasets has been conducted. We have implemented three popular conventional MLs as the embedded AI, and utilized an ensemble of twelve advanced DNN classifiers as the networked AI. For the sake of reproducibility and convenience of future studies about split artificial intelligence architecture, we have released our prototype implementation of SAIA, information regarding the experiment datasets and the code of our evaluation experiments.¹¹1https://tinyurl.com/y92epzfd

The rest of this paper is organized as follows: Section 2 presents SAIA, including the design of the embedded AI, the networked AI and the decision unit. Section 3 presents the experimental evaluation. Section 4 presents the related literature review. Section 5 concludes.

Refer to caption — Figure 1: The Overview of SAIA Framework.

2 Methodology

2.1 SAIA Framework Overview

Our proposed Split Artificial Intelligence Architecture (SAIA), as shown in Fig. 1, consists of four components (i.e., the data pre-processing interface, the embedded AI, the networked AI and the decision unit) that work synergistically between the client side and the server side. For each use case, SAIA has two phases: preparation and operation. In the preparation phase, four components would be prepared and trained accordingly: (i) the data pre-processing interface (including objection detection, semantic segmentation and feature extraction) (Section 2.2); (ii) the embedded AI contains certain lightweight classification classifier(s) (Section 2.3); (iii) the networked AI trains a multi-classifier fusion of several advanced DNN classifiers (Section 2.4); and (iv) the decision unit is a lightweight ML classifier that trains on a set of labeled meta data (Section 2.5).

In the operation phase, (i) the client (i.e., mobile or IoT devices) receives the data, passes the data through the data pre-processing interface (including objection detection, semantic segmentation and feature extraction); (ii) the client evaluates the data on the embedded AI (i.e., the lightweight classifier(s)), and produces the unlabeled meta data accordingly; (iii) the decision unit (DU) evaluates the meta data, if DU decides to keep the data on the client side, the testing result of the embedded AI is returned, otherwise, the data would be sent to the networked AI for further evaluation. Below presents the details about the design of each components.

2.2 Data Pre-processing Interface

In this component, we design and implement a set of objection detection, semantic segmentation and feature extraction algorithms that could fit on various image-based healthcare applications.

Objection detection: Since the medical images captured by the mobile and IoT devices usually contain complex background, it is of vital importance to separate the region-of-interest (ROI) from the background. We investigate two fast object detection approaches: (i) Faster R-CNN [23] and (ii) Single Shot Detector (SSD) [24]. We also conducted a preliminary experiment using both object detection approaches on the onychomycosis dataset [22] (more details about the dataset are described in Section 3.2), where we annotated 2,000 images that each contains a full-hand, and then trained and applied Faster R-CNN and SSD on the annotated images. In the result, Faster R-CNN obtains in a Jaccard Index of 99.6 in comparison with 98.2 obtained from SSD. Therefore, we decided to apply Faster R-CNN on the onychomycosis dataset (Section 3.2).

Semantic segmentation: We utilize Otsu’s thresholding segmentation algorithm [25] for the image semantic segmentation. By finding the optimal threshold from the histogram of pixel counts, the algorithm isolates the region-of-interest (ROI) (e.g., objects) from the complex backgrounds. For instance, it could separate the skin lesion from the normal skin and artifacts (hairs, badges and black borders). Compared with the other semantic segmentation approaches, such as U-Net CNN [26], Otsu’s thresholding segmentation algorithm not only has been shown to be effective in many medical image segmentation tasks [27, 28, 29], but also is more efficient in terms of storage usage, energy consumption and inference time while deploying on mobile and IoT devices.

Feature extraction: We design and apply different sets of feature extraction techniques to different healthcare applications. For instance, in skin lesion detection, melanoma (i.e., cancerous skin lesion) usually proliferates asymmetrically then appears as irregular shapes. Hence, the derived segmentation maps and corresponding gray-scale images are used to extract 9 structural features, including: asymmetric index [30], eccentricity, perimeter, max/min/mean intensity, solidity, compactness and circularity [31] . Furthermore, color variations are also effective characteristics to distinguish different types of skin lesions. Instead of using convention RGB color space, we employ CIELUV color space that enables us to well perceive the differences in colors. Besides, LUV color space also decouples the chromaticity (UV) and luminance (L), which yields invariant features in respect to the light condition. Instead of taking only statistics (mean, standard deviation, skewness and kurtosis) from LUV histogram [32], we utilize the whole distribution of colors , which are separated into 3 channels (i.e., L, U and V). As a result, $3\times 255$ color features would be generated from three normalized histograms. Moreover, we observe that the texture of lesions can also distinguish skin lesion types. Hence, local binary patterns (LBP) [33] analysis is applied to capture the textured information. We investigated several sets of radius and number of surrounding points then observed that using radius of 3 and 8 neighboring points yields the best performance of Embedded-AI with 26 textured features from each normalized LBP histogram. On the other hand, since the onychomycosis detection does not appear to be “shape-sensitive”, we only adopt the color-based features (i.e., LUV) and texture features (i.e., LBP) to it.

2.3 Client-side Embedded AI

Since the client device has limited computational power, and limited battery life, we intend to utilize lightweight ML algorithms. The embedded AI solutions will be used to generate initial classification results; thus enabling embedded artificial intelligence. The use of lightweight algorithms would decrease the burden on battery life; which enables the operator’s equipment to last longer in contested environments. These algorithms also require less computational power, which results in producing intelligence in a more timely manner (than more complex algorithms).

These lightweight classification algorithms could be distance-based algorithms that are based on Euclidean or Manhattan distances, or it could be logistic regression. In both cases, the computation would be linear in the number of (reduced) features; and do not involve multiple layers of computations; thus, providing more timely intelligent results and consuming less power. Examples of such algorithms include:

Decision Tree (DT): DT [34] is a non-linear classifier. It is rule-based learning method that would construct a tree where the leaves represent class labels, and the branches represent conjunctions of features that lead to those class labels. The tree structure would depend on the algorithm and data used to generate it, but in certain situations, it might be lightweight and suitable for our embedded AI (e.g. if it was of linear complexity). The advantage is that it can handle non-linearly separable data (better than Logistic Regression).

Random Forests (RF): Decision tree classifier [34] usually yields high-variance and low-bias results, thus bagging (or bootstrap aggregation) is a remedy for such issues. RF [35] is a large collection of de-correlated trees, which are aggregated by taking their averages. Besides, trees generated in bagging is identically distributed, the expected value from this bagging set of trees is the same as the expectation of any tree in this set. Thus, variance reduction is only remedy of improvement.

Support Vector Machine (SVM): SVM [36] is a very popular non-linear classifier. It is a maximal margin classifier, meaning it tries to find a separating hyperplane that maximizes the margin between the different classes (while logistic regression, for example, tries to find any separating boundary). Using the Kernel trick, Kernel SVM can effectively discriminate between non-linearly separable classes, without incurring the cost of explicitly transforming the data to higher dimensions. A sample Kernel function is the RBF Kernel:

K(x,y)=exp\left(-\frac{\|x-y\|^{2}}{2\sigma^{2}}\right)

(1)

Kernel SVM requires storing a number of the training samples, called support vectors. Let this number be $n<<N$ , and it is much smaller than the total number of training samples $N$ . This means that Kernel SVM would have space and time complexity of $O(n\cdot M)$ . While Kernel SVM is very useful in many applications, its training might not scale well with datasets that have large number of training samples (beyond tens of thousands).

Dropouts meet Multiple Additive Regression Trees (DART): DART [37] is an evolution of gradient boosting machine, that adopts the dropouts for regularization (preventing over-fitting) from deep neural networks. Boosted trees with XGboost [38] is one of the most well-performed learning structure, which results in a great number of winning solutions of data science competitions [39]. Apart from classification, boosting tree can be used in a wide range of problem such as regularized regression (Ridge and Lasso) [40], quantile regression [41] or survival analysis [42, 43]. Motivated by systems optimization and fundamental principle of machine learning, XGboost [38] is an efficient and flexible library with implementation of parallel tree boosting which enables fast and accurate results.

In our preliminary experiments, DART [37] outperforms the conventional multiple additive regression tree (MART) [44] and AdaBoost [45] in both datasets (i.e., skin lesion [46, 20, 47] and onychomycosis dataset [22]) in terms of both training time and accuracy. All the implementations regarding DART [37] in this work utilize both XGboost library for Python 3 [48] and Microsoft’s LightGBM framework [49] for gradient boosting machine.

2.4 Server-side Networked AI

Since the embedded AI algorithms might not handle non-linearly separable classification problems well, we aim to use more powerful algorithms in the networked AI. Such algorithms would include various advanced DNNs, and we design it as a multi-classifier fusion of those classifiers. These algorithms are more computationally intensive; but they can produce more accurate results; thus, providing more confident intelligence. This type of computation can be facilitated on the server side by using state-of-the-art big data technologies. Below presents the details of our multi-classifier fusion approach.

2.4.1 Multi-classifier Fusion

In multi-classifier fusion, we define a classification space, as shown in Figure 2, where there are $m$ classes and $k$ classifiers. Let $\mathcal{M}=\{M_{1},M_{2},\dots,M_{k}\}$ denote the set of base classifiers and $\mathcal{C}=\{C_{1},C_{2},\dots,C_{m}\}$ denote the set of classes. Let $p^{m}_{kj}$ denote the posterior probability of given sample $j$ identified by classifier $M_{k}$ as belonging to class $C_{m}$ , where $P_{kj}=\{p^{1}_{kj},p^{2}_{kj},\dots,p^{m}_{kj}\}$ and $\sum_{l=1}^{m}p^{l}_{kj}=1$ . Hence, all the posterior probabilities form a $k\times m$ decision matrix as follows:

P_{j}=\begin{bmatrix}p^{1}_{1j}&p^{2}_{1j}&\cdots&p^{m}_{1j}\\ p^{1}_{2j}&p^{2}_{2j}&\cdots&p^{m}_{2j}\\ \vdots&\vdots&\ddots&\vdots\\ p^{1}_{kj}&p^{2}_{kj}&\cdots&p^{m}_{kj}\\ \end{bmatrix}

(2)

Since the importance of different classifiers might be different, we assign a wight $w_{i}$ to the decision vector (i.e., posterior probabilities vector) of each classifier $C_{i}$ , where $i\in\{1,2,\dots,k\}$ . Let $P_{m}(j)$ denote the sum of the posterior probabilities, that sample $j$ belonging to class $m$ , of all the classifiers. Then, we have

P_{m}(j)=\sum_{i=1}^{k}w_{i}\cdot p^{m}_{ij}

(3)

The final decision (i.e., class) $D(j)$ of sample $j$ is determined by the maximum posterior probabilities sum:

D(j)=\textit{$\underset{i}{max}\ P_{i}(j)$},\ i\in\{1,2,\dots,m\}

(4)

In our networked AI, we adopt the average fusion strategy in the multi-classifier fusion, where all the classifiers use the same static weight as $\frac{1}{k}$ .

2.5 Split Artificial Intelligence Decision Unit

The core component of our proposed SAIA framework is the decision unit component, which controls whether a client-side captured sample (e.g., image) would be sent to the networked AI, or would be processed by the embedded AI. We adopt a meta-information based algorithm selection approach in the design of our decision unit component. In the training phase, we (i) use a set of meta-information generation samples (apart from the training/testing samples of the embedded AI and the networked AI) to generate a set of meta-information (e.g., various features directly extracted from each sample, and the soft predicted probabilities by the embedded AI for each sample), (ii) use our customized decision rule to generate the true label (i.e., “kept for the embedded AI” or “sent to the networked AI”) of each sample, and (iii) use the meta-information and true labels of those samples to train a lightweight binary classifier as the decision unit. In the testing phase, our framework extracts the same set of meta-information from each testing sample, and tests it through the pre-trained decision unit to determine whether sending the sample to the server or not.

To be simplified, we use the soft predicted probabilities provided by the embedded AI as the meta-information, and use gradient boosted trees to build the decision unit classifier. In this work, we adopt a basic decision rule: (i) if communication resources are available, we will send the meta-information of given sample to the decision unit, and then, if the embedded AI and the networked AI produce different predicted results (e.g., classes of a healthcare application) and the networked AI is correct, the sample will be sent to the server (i.e., using the networked AI), otherwise, it will be kept on the client (i.e., using the embedded AI); (ii) if the communication resources are not available, we will keep everything on the client (i.e., using the embedded AI).

Other than just using a “yes or no” binary decision, we also design the decision unit to utilize a parameter, namely $\epsilon$ , to tune the criteria of how much data could be sent to the networked AI. To be specific, we propose a weighted loss function, where the gradients of the samples that should be “sent to the networked AI” are scaled by a parameter $\epsilon$ . In the binary classification problem of the decision unit, let us denote the samples that should be “sent to the networked AI” as the positive class, i.e., $(y_{i}=1)$ , and denote the samples that should be “kept for the embedded AI” as the negative class, i.e., $(y_{i}=0)$ .

Then, the objective function of the gradient boosted trees at iteration $t$ is optimized by the simplified second-order approximation [38] of the original loss function, which is defined as below:

\begin{split}L^{(t)}\approx&\sum_{i=1}^{n}S(y_{i})\cdot\bigg{[}l(y_{i},\hat{y}^{(t-1)})\cdot g_{i}\cdot f_{t}(x_{i})\\ &+\frac{1}{2}\cdot h_{i}\cdot f_{t}^{2}(x_{i})\bigg{]}+\Omega(f_{t})\end{split}

(5)

where $l(y_{i},\hat{y}^{(t-1)})$ is the cross-entropy loss function, $g_{i}=\partial_{\hat{y}^{(t-1)}}l(y_{i},\hat{y}^{(t-1)})$ and $h_{i}=\partial^{2}_{\hat{y}^{(t-1)}}l(y_{i},\hat{y}^{(t-1)})$ are the gradient and hessian statistic of the loss function and $\Omega(f_{t})$ is the penalized term. Our customized function $S(y_{i})$ is defined as below:

S(y_{i})=\begin{cases}\epsilon,&\mbox{if }y_{i}=1\\ 1,&\mbox{if }y_{i}=0\end{cases}

(6)

where $\epsilon$ is a predefined hyperparameter that could be adjusted to increase/decrease the expected true positive rate of the decision unit, so that to optimize the amount of data that should be sent to the server.

3 Experimental Evaluation

3.1 Experiment Environment

We implemented our embedded AI on a Google Pixel 4 XL smartphone that has a Qualcomm Snapdragon 855 chip-set, 6GB RAM and Android 10.0 OS. Our network-AI was implemented and performed on a server with Intel $\circledR$ Core^TM [email protected] CPU, 128GB RAM and 4 GTX 1080Ti 11GB GPUs.

3.2 Experiment Datasets

We investigated two popular benchmark healthcare image datasets in our experimental evaluation: (i) International Skin Imaging Collaboration Challenge 2019 (ISIC 2019) [46, 20, 47] and (ii) Onychomycosis dataset [22]. ISIC 2019 has training and testing sets with overall 33,569 images. Since the ground truth of the testing data was not available, we only employed its original training data in our evaluation. It contains 25,331 images of 8 skin lesion diseases (i.e., 8 classes): melanoma (4,522), melanocytic nevus (12,875), basal cell carcinoma (3,323), actinic keratosis (876), benign keratosis (2,624) dermatofibroma (239), vascular lesion (253) and squamous cell carcinoma (628). We randomly split 80%, 5% and 15% as training data, meta-information data and testing data, respectively. In order to have enough data to train the decision unit and base classifiers, data augmentation was applied to enlarge the training data and meta-information data by performing different rotation degrees (i.e., 90, 180 and 270), horizontal flipping and combinations of both. Thus, the training data and meta-information data became 81,020 and 10,400 in total, respectively. Regarding onychomycosis dataset, which contains 53,794 region-of-interest extracted abnormal (34,014) and normal (19,780) fingernail images. We split it into training/meta-information/testing by the ratio of 70%/10%/20%, respectively. Due to enough number of samples, we only performed horizontal flipping on the meta-information dataset. Note that our embedded AI and networked AI were trained on both original and augmented images.

3.3 Embedded/Networked AI and Decision Unit Preparation

Embedded AI. In the preparation of the Embedded AI, for each dataset, we trained three conventional machine learning classifiers (i.e., SVM, RF and DART). To figure out the optimal set of hyperparameters for each classifier, we performed 5-fold cross validation for each classifiers. Table 1 shows the performance (i.e., accuracy) of three classifiers on two datasets, where with the optimal settings, DART classifier outperforms SVM and RF classifiers on both datasets. Furthermore, while applying One-vs-All strategy for training SVM and RF classifiers in skin lesion dataset that has 8 classes, using SVM and RF result in considerably much larger model size than using DART, that consumes more storage space of the mobile and IoT devices. Also, since DART uses softmax function as the objective function, DART classifier directly provides the soft predicted probabilities of each sample, which would be taken as the meta-information to train our decision unit (as described in Section 2.5). Therefore, we decided to deploy the DART classifier as the embedded AI for both datasets in the rest of our experiments.

Table 1: The performance (accuracy in %) of conventional machine learning classifiers (i.e., the embedded AI) on Skin Lesion and Onychomycosis datasets.

Embedded-AI Models	Skin Lesion	Onychomycosis
DART [37]	75.89	78.61
SVM [36]	68.90	72.77
RF [35]	65.04	70.07

Networked AI We evaluated twelve different CNN architectures (as shown in Table 2) on the server side with pre-trained weights on ImageNet [50]. Different networks expect different input sizes: 331 $\times$ 331 for PNASNet-5-Large and NASNet-A-Large; 320 $\times$ 320 for ResNeXt101-32 $\times$ 16d; 299 $\times$ 299 for InceptionResNet-V2, Xception, Inception-V4 and Inception-V3; 224 $\times$ 224 for SENet154, SE-ResneXt101-32, EfficientNet-B7, Dual Path Net-107 $\times$ 4d and ResNet152. All the networks were fine-tuned in Pytorch, using SGD optimizer with learning rate 0.001 (degraded after 20 epochs by 0.1) and momentum 0.9. We stopped the training process either in 40 epochs or the validation accuracy failed to improve for over 7 consecutive epochs. To keep the same batch size 32 in each evaluation, and due to the memory constraint of single GPU, certain networks were trained parallelly with multiple GPUs: PNASNet-5-Large (4), NASNet-A-Large (4), ResNext101-32 $\times$ 16d (4), SENet154 (2), EfficientNet-B7 (2) and Dual Path Net-107 (2). The performance result of each base CNN classifier on each dataset has been shown in Table 2. As described in Section 2.4, we utilize the multi-classifier fusion of those twelve advanced CNN architectures as our networked AI aiming to provide the SOTA performance of each dataset/application.

Table 2: The performance (accuracy in %) of the base classifiers of 12 CNN architectures on Skin Lesion and Onychomycosis datasets.

Networked-AI Models	Skin Lesion	Onychomycosis
SENet154 [51]	88.00	92.06
PNASNet-5-Large [52]	87.87	92.36
NASNet-A-Large [53]	87.79	91.82
ResNeXt101-32 $\times$ 16d [54]	87.76	91.69
SE-ResneXt101-32 $\times$ 4d [51]	87.55	91.99
InceptionResNet-V2 [55]	87.53	91.59
Xception [56]	87.18	91.65
EfficientNet-B7 [57]	86.78	92.36
Dual Path Net-107 [58]	86.23	91.79
Inception-V4 [55]	85.99	92.02
Inception-V3 [59]	85.41	91.68
ResNet152 [60]	84.00	92.28

Decision Unit As presented in Section 2.5, we prepared the decision unit for both datasets accordingly. To evaluate the effectiveness of the tuning hyperparameter $\epsilon$ , we use a discrete set of integers ranging from $0$ (i.e., meaning no decision unit deployed) to $100$ for $\epsilon$ .

3.4 Effectiveness Analysis

In this section, we compare the effectiveness of our proposed SAIA framework with three baselines: only using the embedded AI, only using the networked AI, and SAIA but with a randomized decision unit, that randomly determines whether sending a given sample to the server-side or not. All the experiments that utilized the randomized decision unit are performed 100 times and evaluated using the averaged results. Fig. 3a and Fig. 4a show that as we increase the value of $\epsilon$ , more and more data would be sent to the networked AI. Also, the curves quickly converges as the $\epsilon$ increasing. Thus, one can tune $\epsilon$ based on the communication resource available to adjust how much data to be sent to the server-side for processing.

Fig. 3b and Fig. 4b illustrate that as we increase the value of $\epsilon$ , the accuracies obtained by SAIA for both datasets are also increasing and quickly converged to the SOTA accuracy achieved by the networked AI. For instance, as shown in Fig. 3b, while $\epsilon=25$ , SAIA achieves nearly the same accuracy as the networked AI (i.e., 90% vs. 90.6%), but only sends 70% of the samples to the networked AI (Fig. 3a). In Fig. 4b, while $\epsilon=17$ , SAIA achieves exactly the same accuracy as the networked AI (i.e., 93.2%), but only sends around 80% of the samples to the networked AI (Fig. 4a).

Fig. 3c and Fig. 4c present the comparison among our proposed SAIA, SAIA with a randomized decision unit, only using the embedded AI and only using the networked AI. We observed that (i) our proposed SAIA consistently outperforms the SAIA with a randomized decision unit (other than while keeping all the data at the embedded AI or the network AI); (ii) as sending more samples to the server, compared with the SAIA with a randomized decision unit, the accuracy of our proposed SAIA framework quickly converges to the accuracy of the networked AI (i.e., while sending 75% samples of the skin lesion dataset, and while sending 80% samples of the onychomycosis dataset); (iii) while sending nearly half of the samples to the server, our proposed SAIA obtains the highest accuracy advantage over the SAIA with a randomized decision unit, e.g., as shown in Fig. 3c, while $\epsilon=10$ , almost over half of the skin lesion samples (51%) were sent to the server, and the difference of the accuracies is 88.87% (ours) vs. 83.95% (randomized), and as shown in Fig. 4c, while $\epsilon=5$ , 53% of the onychomycosis samples were sent to the server, and the difference of the accuracies is 92% (ours) vs. 86% (randomized). To summarize, our proposed SAIA could control how much data to be sent to the server based on the environment and accuracy requirement. Our framework could also achieve the same accuracy as processing on the server side, while sending much less data to the server side.

3.5 The Performance of Decision Unit and the Effectiveness of the Hyperparameter $\epsilon$

In this section, we evaluate the performance (i.e., accuracy) of our proposed decision unit (i.e., a lightweight binary classifier), and the effectiveness of the hyperparameter $\epsilon$ while influencing the performance of the decision unit. Fig. 5 and Fig. 6 illustrate the confusion matrices of the decision units while applying different $\epsilon$ for the skin lesion and onychomycosis datasets respectively. As we increase $\epsilon$ , more samples that suppose to be sent to the server are determined by the decision unit to be sent to the server (i.e. the true positive rate increases). For instance, in Fig. 5, as $\epsilon$ increasing from $3$ to $10$ , the ratio of data supposed to be sent to the server being sent to the server changed from 0.63 to 0.88. On the other hand, as $\epsilon$ increasing, more samples that suppose to be kept on the client are also determined by the decision unit to be sent to the server (i.e. the false positive rate also increases). For instance, in Fig. 5, as $\epsilon$ increasing from $3$ to $10$ , the ratio of data supposed to be kept on the client being sent to the server changed from 0.26 to 0.44. Fig. 6 presents the same patterns. However, as illustrated in Fig. 8 and Fig. 9, as we increase $\epsilon$ , even though both TPR and FPR are increasing, the TPR is always much higher than the FPR, and the average increasing speed of the TPR is always higher than that of the FPR.

3.6 Efficiency Analysis

In this section, we evaluate the efficiency of our proposed SAIA regarding to the elapsed time averaged over each sample. As illustrated in Fig. 8 and Fig 9, in our experiments on the skin lesion dataset, the elapsed time (second per sample) of the embedded AI and the networked AI are 0.308s and 2.51s respectively, and that of the onychomycosis dataset are 0.3s and 2.5s respectively. Fig. 8a and Fig 9a show that the elapsed time of SAIA system is linearly dependent on the percentage of data sent from the client to the server. Fig. 8b and Fig 9b illustrate that as we increase $\epsilon$ , the elapsed time (second per sample) of SAIA would quickly converge to a constant value (e.g., for the skin lesion dataset, while $\epsilon=50$ , the elapsed time of SAIA converges to around 1.89s; for the onychomycosis dataset, while $\epsilon=17$ , the elapsed time of SAIA converges to around 2.11s). Furthermore, as presented in Fig. 8c and Fig 9c, while SAIA reach the same accuracy as the networked AI, SAIA has less elapsed times on both datasets (i.e., 1.89s vs. 2.51s on the skin lesion dataset, and 2.11s vs. 2.5 on the onychomycosis dataset). To summarize, even with enough communication resource, by applying the decision unit, our system does not have to send all the data to the server-side, while obtaining the same accuracy as sending all the data to the server-side, and much less processing time compared with the networked AI.

4 Related Work

4.1 Compact Deep Neural Networks

Many real-world applications (e.g., mobile healthcare, smart home, wearable technologies) require to collect and analyze the data on mobile and IoT devices. Hence, compact DNNs have been proposed to conduct inference on such devices. For instance, SqueezeNet [61] obtains AlexNet [62] level of accuracy with 50x fewer parameters and less than 0.5MB model size, by downsampling the data using $1\times 1$ convolution filters. MobileNet [63, 64, 65] proposes a useful building block, “inverted residual block” into its design of DNNs, that significantly reduces computation complexity without accuracy loss, compared with traditional DNN models. YOLO, a state-of-the-art, real-time object detection system, is designed by using customized architecture, that only has one forth operations of VGG16 [66]. EfficientNet [57] is one of the state-of-the-art DNN models recently proposed for execution on mobile and IoT devices, that uniformly scales each dimension (e.g., width, depth and resolution) of DNN models with a fixed set of scaling coefficients. Although the compact DNNs could dramatically reduce it computation complexity, the overall performance of compact DNN model still would not be as good as the more advanced models deployed on the server side, that could also be the ensemble/fusion of several well-trained DNN models

4.2 Compressed Deep Neural Networks

DNN model compression techniques [11, 12, 13, 67, 68, 69, 70, 71] have been proposed to reduce the size and computation workload of DNN models running on the mobile and IoT devices. For instance, Knowledge distillation [11, 12, 13] has been proposed to compress a model by teaching a simplified student DNN model, step by step, exactly what to do using a complex pre-trained teacher DNN model, and then deploy the student DNN model on the mobile devices. Network pruning [72] has been proposed to trim the network connections within DNNs that have less influence on the inference accuracy. Data quantization [69] has been proposed to educe the number of bits to represent each weight value of DNN models. However, certain recent DNN models, such as MobileNet [65] and EfficientNet [57] are already very compact and hard to compress significantly. Deploying compressed DNN models on the mobile and IoT devices also cannot take advantage of the more advanced models deployed on the server side.

4.3 Split Deep Neural Networks

Split-DNN architectures [15, 16, 17, 73] have been proposed to offload the execution of complex DNN models to compute-capable servers from the mobile or IoT devices, where a DNN is split into head and tail sections, deployed at the client side and the server side, respectively. For instance, Osia et al. [73] proposes a hybrid architecture where a DNN model, that has previously been trained and fine-tuned on the cloud, would be split into two smaller neural networks: a feature extraction network that runs on the mobile or IoT devices, and a classification network that runs on the cloud system, and both neural networks on the local device and the cloud system would collaborate on running the original complex DNN model. Matsubara et al. [18] proposes a KD-based Split-DNN framework to reduce the communication cost between the client and the server. However, such approaches usually cannot fully rely on the client-side model, thus unable to work if the communication is impeded. However, none of such approaches directly address the communication bottleneck between the client and the server. Also, the existing approaches cannot adjust the AI usage on between the client and the serve depending on the device’s condition (e.g., storage size, power consumption and communication bandwidth).

5 Conclusion

In this paper, we propose SAIA, a novel, effective and efficient split artificial intelligence architecture for mobile healthcare systems, where we design four components: the data pre-processing interface (including objection detection, semantic segmentation and feature extraction), the embedded AI that contains a lightweight classification classifier(s), the networked AI that trains a multi-classifier fusion of several advanced DNN classifiers, and the core component, the decision unit that is another lightweight ML classifier that trains on a set of labeled meta data. A comprehensive experimental evaluation on two large scale healthcare datasets has been conducted. Our results show that SAIA consistently outperforms its baselines in terms of both effectiveness and efficiency. Our proposed decision unit with hyperparameter $\epsilon$ could effectively tune whether a sample captured by the client should be operated by the embedded AI or the networked AI, under different conditions. In our future work, we plan to design and implement fully-fledged split AI architecture that considers more factors, such as energy consumption, communication bandwidth and accuracy requirements.

Acknowledgments

Effort sponsored in whole or in part by United States Special Operations Command (USSOCOM), under Partnership Intermediary Agreement No. H92222-15-3-0001-01. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. ¹¹1The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the United States Special Operations Command.

References

[1] B. Farahani, F. Firouzi, K. Chakrabarty, Healthcare iot, in: Intelligent Internet of Things, Springer, 2020, pp. 515–545.
[2] F. Perez, S. Avila, E. Valle, Solo or ensemble? choosing a cnn architecture for melanoma classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 0–0.
[3] P.-Y. Wu, C.-C. Fang, J. M. Chang, S.-Y. Kung, Cost-effective kernel ridge regression implementation for keystroke-based active authentication system, IEEE transactions on cybernetics 47 (11) (2016) 3916–3927.
[4] H. Nguyen, D. Zhuang, P.-Y. Wu, M. Chang, Autogan-based dimension reduction for privacy preservation, Neurocomputing (2019).
[5] D. Zhuang, S. Wang, J. M. Chang, Fripal: Face recognition in privacy abstraction layer, in: 2017 IEEE Conference on Dependable and Secure Computing, IEEE, 2017, pp. 441–448.
[6] D. Zhuang, J. M. Chang, Peerhunter: Detecting peer-to-peer botnets through community behavior analysis, in: 2017 IEEE Conference on Dependable and Secure Computing, IEEE, 2017, pp. 493–500.
[7] D. Zhuang, J. M. Chang, Enhanced peerhunter: Detecting peer-to-peer botnets through network-flow level community behavior analysis, IEEE Transactions on Information Forensics and Security 14 (6) (2018) 1485–1500.
[8] D. Zhuang, M. J. Chang, M. Li, Dynamo: Dynamic community detection by incrementally maximizing modularity, IEEE Transactions on Knowledge and Data Engineering (2019).
[9] Google’s cloud automl, https://cloud.google.com/automl/, accessed: 2018-11-09.
[10] Amazon sagemaker, https://aws.amazon.com/sagemaker/, accessed: 2018-11-09.
[11] G. Hinton, O. Vinyals, J. Dean, Distilling the knowledge in a neural network, arXiv preprint arXiv:1503.02531 (2015).
[12] J. Ba, R. Caruana, Do deep nets really need to be deep?, in: Advances in neural information processing systems, 2014, pp. 2654–2662.
[13] A. Polino, R. Pascanu, D. Alistarh, Model compression via distillation and quantization, arXiv preprint arXiv:1802.05668 (2018).
[14] J. Wang, W. Bao, L. Sun, X. Zhu, B. Cao, S. Y. Philip, Private model compression via knowledge distillation, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 1190–1197.
[15] H.-J. Jeong, I. Jeong, H.-J. Lee, S.-M. Moon, Computation offloading for machine learning web apps in the edge server environment, in: 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), IEEE, 2018, pp. 1492–1499.
[16] Y. Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, L. Tang, Neurosurgeon: Collaborative intelligence between the cloud and mobile edge, ACM SIGARCH Computer Architecture News 45 (1) (2017) 615–629.
[17] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, F. Kawsar, Deepx: A software accelerator for low-power deep learning inference on mobile devices, in: 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), IEEE, 2016, pp. 1–12.
[18] Y. Matsubara, S. Baidya, D. Callegaro, M. Levorato, S. Singh, Distilled split deep neural networks for edge-assisted real-time systems, in: Proceedings of the 2019 Workshop on Hot Topics in Video Analytics and Intelligent Edges, 2019, pp. 21–26.
[19] D. Gutman, N. C. Codella, E. Celebi, B. Helba, M. Marchetti, N. Mishra, A. Halpern, Skin lesion analysis toward melanoma detection: A challenge at the international symposium on biomedical imaging (isbi) 2016, hosted by the international skin imaging collaboration (isic), arXiv preprint arXiv:1605.01397 (2016).
[20] N. C. Codella, D. Gutman, M. E. Celebi, B. Helba, M. A. Marchetti, S. W. Dusza, A. Kalloo, K. Liopyris, N. Mishra, H. Kittler, et al., Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic), in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, 2018, pp. 168–172.
[21] N. Codella, V. Rotemberg, P. Tschandl, M. E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti, et al., Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic), arXiv preprint arXiv:1902.03368 (2019).
[22] S. S. Han, G. H. Park, W. Lim, M. S. Kim, J. Im Na, I. Park, S. E. Chang, Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: Automatic construction of onychomycosis datasets by region-based convolutional deep neural network, PloS one 13 (1) (2018).
[23] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, in: Advances in neural information processing systems, 2015, pp. 91–99.
[24] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot multibox detector, in: European conference on computer vision, Springer, 2016, pp. 21–37.
[25] J. Zhang, J. Hu, Image segmentation based on 2d otsu method with histogram analysis, in: 2008 International Conference on Computer Science and Software Engineering, Vol. 6, IEEE, 2008, pp. 105–108.
[26] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241.
[27] X. Xiao, S. Lian, Z. Luo, S. Li, Weighted res-unet for high-quality retina vessel segmentation, in: 2018 9th International Conference on Information Technology in Medicine and Education (ITME), IEEE, 2018, pp. 327–331.
[28] T. Zhao, D. Gao, J. Wang, Z. Tin, Lung segmentation in ct images using a fully convolutional neural network with multi-instance and conditional adversary loss, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, 2018, pp. 505–509.
[29] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, P.-A. Heng, H-denseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes, IEEE transactions on medical imaging 37 (12) (2018) 2663–2674.
[30] O. Abuzaghleh, B. D. Barkana, M. Faezipour, Automated skin lesion analysis based on color and shape geometry feature set for melanoma early detection and prevention, in: IEEE Long Island Systems, Applications and Technology (LISAT) Conference 2014, IEEE, 2014, pp. 1–6.
[31] A. Sancen-Plaza, R. Santiago-Montero, H. Sossa, F. J. Perez-Pinal, J. J. Martinez-Nolasco, J. A. Padilla-Medina, Quantitative evaluation of binary digital region asymmetry with application to skin lesion detection, BMC medical informatics and decision making 18 (1) (2018) 50.
[32] R. Seeja, A. Suresh, Deep learning based skin lesion segmentation and classification of melanoma using support vector machine (svm), Asian Pacific Journal of Cancer Prevention: APJCP 20 (5) (2019) 1555.
[33] T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: Application to face recognition, IEEE transactions on pattern analysis and machine intelligence 28 (12) (2006) 2037–2041.
[34] J. R. Quinlan, Induction of decision trees, Machine learning 1 (1) (1986) 81–106.
[35] A. Liaw, M. Wiener, et al., Classification and regression by randomforest, R news 2 (3) (2002) 18–22.
[36] J. A. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural processing letters 9 (3) (1999) 293–300.
[37] K. V. Rashmi, R. Gilad-Bachrach, Dart: Dropouts meet multiple additive regression trees., in: AISTATS, 2015, pp. 489–497.
[38] T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794.
[39] D. Nielsen, Tree boosting with xgboost-why does xgboost win” every” machine learning competition?, Master’s thesis, NTNU (2016).
[40] G. Tutz, H. Binder, Boosting ridge regression, Computational Statistics & Data Analysis 51 (12) (2007) 6044–6059.
[41] N. Fenske, T. Kneib, T. Hothorn, Identifying risk factors for severe childhood malnutrition by boosting additive quantile regression, Journal of the American Statistical Association 106 (494) (2011) 494–510.
[42] P. Bühlmann, T. Hothorn, et al., Boosting algorithms: Regularization, prediction and model fitting, Statistical Science 22 (4) (2007) 477–505.
[43] N. P. Nguyen, Gradient boosting for survival analysis with applications in oncology (2020).
[44] J. H. Friedman, Stochastic gradient boosting, Computational statistics & data analysis 38 (4) (2002) 367–378.
[45] T. Hastie, S. Rosset, J. Zhu, H. Zou, Multi-class adaboost, Statistics and its Interface 2 (3) (2009) 349–360.
[46] P. Tschandl, C. Rosendahl, H. Kittler, The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, Scientific data 5 (2018) 180161.
[47] M. Combalia, N. C. Codella, V. Rotemberg, B. Helba, V. Vilaplana, O. Reiter, A. C. Halpern, S. Puig, J. Malvehy, Bcn20000: Dermoscopic lesions in the wild, arXiv preprint arXiv:1908.02288 (2019).
[48] T. Chen, C. Guestrin, Xgboost package - https://xgboost.readthedocs.io/en/latest/index.html (2016).
[49] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.-Y. Liu, Lightgbm: A highly efficient gradient boosting decision tree, in: Advances in neural information processing systems, 2017, pp. 3146–3154.
[50] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database, in: CVPR09, 2009.
[51] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141.
[52] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, K. Murphy, Progressive neural architecture search, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 19–34.
[53] B. Zoph, Q. V. Le, Neural architecture search with reinforcement learning, arXiv preprint arXiv:1611.01578 (2016).
[54] S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He, Aggregated residual transformations for deep neural networks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1492–1500.
[55] C. Szegedy, S. Ioffe, V. Vanhoucke, A. A. Alemi, Inception-v4, inception-resnet and the impact of residual connections on learning, in: Thirty-first AAAI conference on artificial intelligence, 2017.
[56] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
[57] M. Tan, Q. V. Le, Efficientnet: Rethinking model scaling for convolutional neural networks, arXiv preprint arXiv:1905.11946 (2019).
[58] Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, J. Feng, Dual path networks, in: Advances in neural information processing systems, 2017, pp. 4467–4475.
[59] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
[60] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[61] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, K. Keutzer, Squeezenet: Alexnet-level accuracy with 50x fewer parameters and¡ 0.5 mb model size, arXiv preprint arXiv:1602.07360 (2016).
[62] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105.
[63] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017).
[64] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, Mobilenetv2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520.
[65] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., Searching for mobilenetv3, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1314–1324.
[66] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).
[67] L. N. Huynh, Y. Lee, R. K. Balan, Deepmon: Mobile gpu-based deep learning framework for continuous vision applications, in: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, 2017, pp. 82–95.
[68] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, W. J. Dally, Eie: efficient inference engine on compressed deep neural network, ACM SIGARCH Computer Architecture News 44 (3) (2016) 243–254.
[69] S. Han, H. Mao, W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv preprint arXiv:1510.00149 (2015).
[70] S. Liu, Y. Lin, Z. Zhou, K. Nan, H. Liu, J. Du, On-demand deep model compression for mobile devices: A usage-driven model selection framework.(2018) (2018).
[71] Z. Zhao, K. M. Barijough, A. Gerstlauer, Deepthings: Distributed adaptive deep learning inference on resource-constrained iot edge clusters, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37 (11) (2018) 2348–2359.
[72] J.-H. Luo, J. Wu, W. Lin, Thinet: A filter level pruning method for deep neural network compression, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 5058–5066.
[73] S. A. Osia, A. S. Shamsabadi, S. Sajadmanesh, A. Taheri, K. Katevas, H. R. Rabiee, N. D. Lane, H. Haddadi, A hybrid deep learning architecture for privacy-preserving mobile analytics, IEEE Internet of Things Journal (2020).