
SR2CNN: Zero-Shot Learning for Signal Recognition

Yihong Dong, Xiaohan Jiang, Huaji Zhou, Yun Lin, and Qingjiang Shi. This work was supported in part by the National Key Research and Development Project under grant 2017YFE0119300, and in part by the NSFC under Grants 61731018 and U1709219. (Corresponding author: Qingjiang Shi.) Y. Dong, X. Jiang and Q. Shi are with the School of Software Engineering, Tongji University, Shanghai 201804, China. Q. Shi is also with the Shenzhen Research Institute of Big Data, Shenzhen 518172, China (e-mail: [email protected]; [email protected]; [email protected]). H. Zhou is with the School of Artificial Intelligence, Xidian University, Xi'an 710071, China (e-mail: [email protected]). Y. Lin is with the College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China (e-mail: [email protected]).
Abstract

Signal recognition is one of the significant and challenging tasks in the signal processing and communications field. It is common that no training data is accessible for some signal classes in a recognition task. Hence, zero-shot learning (ZSL), which is widely used in the image processing field, is also very important for signal recognition. Unfortunately, ZSL in this field has hardly been studied due to inexplicable signal semantics. This paper proposes a ZSL framework, signal recognition and reconstruction convolutional neural networks (SR2CNN), to address relevant problems in this situation. The key idea behind SR2CNN is to learn a representation of the signal semantic feature space by introducing a proper combination of cross entropy loss, center loss and reconstruction loss, as well as adopting a suitable distance metric space, such that semantic features have greater minimal inter-class distance than maximal intra-class distance. The proposed SR2CNN can discriminate signals even if no training data is available for some signal class. Moreover, SR2CNN can gradually improve itself with the aid of signal detection, because the class center vectors in the semantic feature space are constantly refined. These merits are all verified by extensive experiments with ablation studies.

Index Terms:
Zero-Shot Learning, Signal Recognition, CNN, Autoencoder, Deep Learning.

I Introduction

Nowadays, developments in deep convolutional neural networks (CNNs) have brought remarkable achievements in the area of signal recognition, improving the state of the art significantly, e.g., [1, 2, 3, 4, 5]. Generally, a vast majority of existing learning methods follow a closed-set assumption [6], that is, all of the test classes are assumed to be the same as the training classes. However, in real-world applications, new signal categories often appear while the model is only trained on the current dataset with some limited known classes. Open-set learning [7, 8, 9, 10] was proposed to partially tackle this issue (i.e., test samples could be from unknown classes). The goal of an open-set recognition system is to reject test samples from unknown classes while maintaining the performance on known classes. However, in some cases, the learned model should be able not only to differentiate the unknown classes from known classes, but also to distinguish among different unknown classes. Zero-shot learning (ZSL) [11, 12, 13] is one way to address the above challenges and has been applied in image tasks. For images, it is easy to extract some human-specified high-level descriptions as semantic attributes. For example, from a picture of a zebra, we can extract the following semantic attributes: 1) color: white and black, 2) stripes: yes, 3) size: medium, 4) shape: horse, 5) land: yes. However, for a real-world signal it is almost impossible to have a high-level description due to obscure signal semantics. Therefore, although ZSL has been widely used in image tasks, to the best of our knowledge it has not yet been studied for signal recognition.¹

¹A closely related work is [14], which proposed a ZSL method for fault diagnosis based on vibration signals. Notice that fault diagnosis is a binary classification problem, which is different from multi-class signal recognition. More importantly, the ZSL definition in this paper is standard and quite different from that of [14], where ZSL refers to fault diagnosis with unknown motor loads and speeds, which is essentially domain adaptation; in our paper, ZSL refers to the recognition of unknown signal classes.



Figure 1: Overview of SR2CNN. In SR2CNN, a pre-processing step (top left) transforms signal data into the input x. A deep net (right) is trained to provide the semantic feature z for known classes while maintaining the performance of the decoder and classifier according to the reconstruction x̃ and the prediction y. A zero-shot learning classifier, which consists of a known classifier and an unknown classifier, exploits z for discrimination.


Figure 2: The architecture of the feature extractor (F), classifier (C) and decoder (D). F takes any input signal x and produces a latent semantic feature z, which is used by C and D to predict the class label and to reconstruct the signal x̃, respectively. The losses L_ce, L_ct and L_r are calculated when training these networks.

In this paper, unlike the conventional signal recognition task where a classifier is learned to distinguish only known classes (i.e., the labels of test data and training data are all within the same set of classes), we aim to propose a learning framework that can classify not only known classes but also unknown classes without annotations. To do so, a key issue that needs to be addressed is to automatically learn a representation of the semantic attribute space of signals. In our scheme, a CNN combined with an autoencoder is exploited to extract the semantic attribute features. Afterwards, semantic attribute features are well classified using a suitably defined distance metric. An overview of the proposed scheme is illustrated in Fig. 1.

In addition, to make a self-evolving learning model, incremental learning needs to be considered when the algorithm is executed continuously. The goal of incremental learning is to dynamically adapt the model to new knowledge from newly arriving data without forgetting what has already been learned. Based on incremental learning, the obtained model will gradually improve its performance over time.

In summary, the main contributions of this paper are threefold:

  • First, we propose a deep CNN-based zero-shot learning framework, called SR2CNN, for open-set signal recognition. SR2CNN is trained to extract the semantic feature z while maintaining the performance of the decoder and classifier. Afterwards, the semantic feature z is exploited to discriminate signal classes.

  • Second, extensive experiments on various signal datasets show that the proposed SR2CNN can discriminate not only known classes but also unknown classes and it can gradually improve itself.

  • Last but not least, we provide a new signal dataset SIGNAL-202002 including eight digital and three analog modulation classes.

II Related Work

In recent years, signal recognition via deep learning has achieved a series of successes. O'Shea et al. [15] proposed convolutional radio modulation recognition networks, which can adapt to the complex temporal radio signal domain and also work well at low signal-to-noise ratios (SNRs). The work [1] used a residual neural network [16] to perform signal recognition tasks across a range of configurations and channel impairments, offering referable statistics. Peng et al. [3] used two convolutional neural networks, AlexNet and GoogLeNet, to address modulation classification tasks, demonstrating the significant advantage of deep learning based approaches in this field. The authors in [17] presented a deep learning based big data processing architecture for end-to-end signal processing tasks, seeking to obtain important information from radio signals. The work presented in [18] evaluated adversarial evasion attacks that cause misclassification in the context of wireless communications. In [19], the authors proposed an automatic classification method for multiple multicarrier waveforms and used principal component analysis to suppress the additive white Gaussian noise and reduce the input dimensions of CNNs. Additionally, the work [20] proposed a specific emitter identification method using CNN-based inphase/quadrature (I/Q) imbalance estimators. The work [21] proposed a compressive convolutional neural network for automatic modulation classification. In [22], the authors used unsynchronized off-the-shelf software-defined radios to build a complete communications system composed solely of deep neural networks, demonstrating that over-the-air transmissions are possible.

Moreover, the work [23] proposed an LPI radar waveform recognition technique based on a single-shot multi-box detector and a supplementary classifier. The work [24] proposed a more flexible network architecture with an augmented hierarchical-leveled training technique to decently classify a total of 29 signals. O'Shea et al. [25] used both an auto-encoder-based communications system and a feature learning-based radio signal sensor to emulate the optimization procedure directly on real-world data samples and distributions. Baldini et al. [26] utilized various techniques to transform the time series derived from the radio frequency into images, then applied a deep CNN to conduct the identification task, finally outperforming conventional dissimilarity-based methods. The work [27] trained a convolutional neural network on time and stockwell channeled images for radio modulation classification tasks, performing superior to networks trained on just raw I/Q time series samples or time-frequency images. The authors of [28] demonstrated the general effectiveness of deep learning at the interference source identification task, while using band selection, SNR selection and sample selection to optimize training time. The work [29] presented a novel system based on CNNs to “fingerprint” a unique radio from a large pool of devices by deep-learning the fine-grained hardware impairments imposed by radio circuitry on physical-layer I/Q samples. The work [30] proposed a DNN-based power control method that aims at solving the non-convex optimization problem of maximizing the sum rate of a fading multi-user interference channel. Chen et al. [31] proposed an adaptive transmission scheme and a generalized data representation scheme to address the limited data rate issue. In [32], the authors proposed a radio frequency (RF) adversarial learning framework for building a robust system to identify rogue RF transmitters by designing and implementing a generative adversarial net. The work [33] presented an intelligent duty-cycle medium access control protocol to realize effective and fair spectrum sharing between LTE and WiFi systems without requiring signalling exchanges.

For semi-supervised learning, the work [34] proposed a generative adversarial networks-based automatic modulation recognition method for cognitive radio networks. When it comes to unsupervised learning, the authors in [35] provided a comprehensive survey of the applications of unsupervised learning in the domain of networking, offering certain instructions. The work [36] built an automatic modulation recognition architecture based on a stacked convolutional autoencoder, using the reconfigurability of field-programmable gate arrays. These works basically follow the closed-set assumption, namely, their deep models are expected to, and are only capable of, distinguishing among already-known signal classes.

All the above works cannot handle the case with unknown signal classes. When considering the recognition task for unknown signal classes, some traditional machine learning methods such as anomaly (also called outlier or novelty) detection can more or less provide some guidance. Isolation Forest [37] constructs a binary search tree to preferentially isolate anomalies. Elliptic Envelope [38] fits an ellipse enveloping the central data points, while rejecting the outsiders. One-class SVM [39], an extension of SVM, finds a decision hyperplane to separate the positive samples and the outliers. Local Outlier Factor [40] uses distance and density to determine whether a data point is abnormal or not. The work [41] proposed a classification-reconstruction learning method for open-set recognition that utilizes latent representations for reconstruction and enables robust unknown detection without harming the known-class classification accuracy. Geng et al. [42] provided a comprehensive survey of existing open-set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, and evaluation criteria to algorithm comparisons. The work [43] proposed a multitask deep learning method that simultaneously conducts classification and reconstruction in the open world where unknown classes may exist. Moreover, the work [44] proposed a generative adversarial networks based technique to address an open-set problem, namely to identify rogue RF transmitters and classify trusted ones. The work [45] presented a spectrum anomaly detector with interpretable features, an adversarial autoencoder based unsupervised model for wireless spectrum anomaly detection. The above open-set learning methods can indeed identify known samples (positive samples) and detect unknown ones (outliers). However, a common and inevitable defect of these methods is that they can never carry out any further classification among the unknown signal classes.

Zero-shot learning is well known to be able to classify unknown classes, and it has already been widely used in image tasks. For example, the work [11] proposed a ZSL framework that can predict unknown classes omitted from a training set by leveraging a semantic knowledge base. Another paper [12] proposed a novel model for jointly doing standard and ZSL classification based on deeply learned word and image representations. The efficiency of ZSL in the image processing field profits mainly from perspicuous semantic attributes which can be manually defined by high-level descriptions. However, it is almost impossible to give any high-level description of a signal, and thus the corresponding semantic attributes cannot be easily acquired beforehand. This may be the main reason why ZSL has not yet been studied in signal recognition.

Figure 3: Diagrams of (a) max unpooling, (b) average unpooling and (c) deconvolution. (a) Max unpooling with a 2×2 grid, where the stride and padding are 2 and 0, respectively. (b) Average unpooling with a 2×2 grid, where the stride and padding are 2 and 0, respectively. (c) Deconvolution with a 3×3 kernel, where the stride and padding are 1 and 0, respectively.

III Problem Definition

We begin by formalizing the problem. Let X and Y be the signal input space and output space, respectively. The set Y is partitioned into K and U, denoting the collections of known class labels and unknown class labels, respectively.

Given training data {(x_1, y_1), …, (x_n, y_n)} ⊂ X × K, the task is to extrapolate and recognize signal classes that belong to Y. Specifically, given the signal input data x ∈ X, the proposed learning framework, elaborated in the sequel, should rightly predict the label y. Notice that our learning framework differs from open-set learning in that we not only classify x into either K or U, but also predict the label y ∈ Y, where Y includes both the known classes K and the unknown classes U.

We restrict our attention to ZSL that uses semantic knowledge to recognize K and extrapolate to U. To this end, we first map X into the semantic space Z, and then map this semantic encoding to a class label. Mathematically, we can describe our scheme by a nonlinear mapping H, which is the composition of two functions F and P defined below, such that:

H = P(F(\cdot)) (1)
F: X \rightarrow Z
P: Z \rightarrow Y

Therefore, our task is to find proper F and P to build up a learning framework that can identify both known signal classes and unknown signal classes.

IV Proposed Approach

This section formally presents an annotation-free zero-shot learning framework for signal recognition. Overall, the proposed framework is mainly composed of the following four modules:

  1. Feature Extractor (F)

  2. Classifier (C)

  3. Decoder (D)

  4. Discriminator (P)

Our approach consists of two main steps. In the first step, we build a semantic space for signals through F, C and D. Fig. 2 shows the architecture of F, C and D. F is modeled by a CNN architecture that projects the input signal onto a latent semantic space representation. C, modeled by a fully-connected neural network, takes the latent semantic space representation as input and determines the label of the data. D, modeled by another CNN architecture, aims to produce the reconstructed signal, which is expected to be as similar as possible to the input signal. In the second step, we find a proper distance metric for the trained semantic space and use the distance to discriminate the signal classes. P is devised to discriminate among all classes, both known and unknown.

IV-A Feature Extractor, Classifier and Decoder

Signals are a special data type, very different from images. While it is easy to give a description of the semantic attributes of images in terms of visual information, extracting semantic features of signals without relying on any computation is almost impossible. Hence, a natural way to automatically extract the semantic information of signal data is to use a feature extractor network F. Considering the unique features of signals, the input shape of F should be a rectangular matrix with 2 rows rather than a square matrix. In our scheme, F consists of four convolutional layers and two fully connected layers.
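As a concrete reference, the following PyTorch sketch shows one possible shape of F and of the classifier C described below, under the stated structure (four convolutional layers and two fully connected layers for F, fully connected layers for C, input of shape 2×128 as in Table I). The layer widths, kernel sizes and the semantic dimension are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """F: maps a 2 x 128 I/Q frame to a latent semantic feature z."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=(2, 3), padding=(0, 1)), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(32 * 1 * 128, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),            # semantic feature z
        )

    def forward(self, x):                        # x: (N, 1, 2, 128)
        return self.fc(self.conv(x).flatten(1))

class Classifier(nn.Module):
    """C: fully connected layers mapping z to logits over the known classes."""
    def __init__(self, feat_dim=64, n_known=9):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                nn.Linear(64, n_known))

    def forward(self, z):
        return self.fc(z)
```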

Generally, F can be represented by a mapping from the input space X to the latent semantic space Z. In order to minimize the intra-class variations in Z while keeping the inter-class semantic features well separated, the center loss [46] is used. Let x_i ∈ X and y_i be the label of x_i; then z_i = F(x_i) ∈ Z. Assuming that the batch size is N, the center loss is expressed as follows:

L_{ct}=\frac{1}{2N}\sum_{i=1}^{N}||F(x_{i})-c_{y_{i}}||^{2}_{2} (2)

where c_{y_i} denotes the semantic center vector of class y_i in Z; c_{y_i} needs to be updated as the semantic features of class y_i change. Ideally, the entire training dataset should be taken into account and the features of each class should be averaged in every iteration. In practice, c_{y_i} can be updated for each batch according to c_{y_i} ← c_{y_i} − αΔ_{c_{y_i}}, where α is the learning rate and Δ_{c_{y_i}} is computed via

\left\{\begin{aligned} &\Delta_{c_{y_{i}}}=0, &&\text{if}\,\,\sum_{j=1}^{N}\delta(y_{j}=y_{i})=0,\\ &\Delta_{c_{y_{i}}}=\frac{\sum_{j=1}^{N}\delta(y_{j}=y_{i})(c_{y_{i}}-F(x_{j}))}{\sum_{j=1}^{N}\delta(y_{j}=y_{i})}, &&\text{otherwise}.\end{aligned}\right. (3)

where δ(·) = 1 if the condition inside the parentheses holds true, and δ(·) = 0 otherwise.
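In code, the center loss of Eq. (2) and the batch-wise center update of Eq. (3) amount to a few lines. The sketch below assumes `centers` is a (number of classes × feature dimension) tensor; the value of α is an illustrative assumption.

```python
import torch

def center_loss(z, y, centers):
    # Eq. (2): mean halved squared distance between features and their class centers.
    return 0.5 * ((z - centers[y]) ** 2).sum(dim=1).mean()

@torch.no_grad()
def update_centers(z, y, centers, alpha=0.5):
    # Eq. (3): per-batch update c_j <- c_j - alpha * delta_j.
    # Classes absent from the batch keep delta_j = 0, i.e. are left untouched.
    for j in y.unique():
        mask = (y == j)
        delta = (centers[j] - z[mask]).mean(dim=0)
        centers[j] -= alpha * delta
```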

The classifier C discriminates the label of samples based on their semantic features. It consists of several fully connected layers. Furthermore, the cross entropy loss L_ce is utilized to control the error of the classifier C, which is defined as

L_{ce}=-\frac{1}{N}\sum_{i=1}^{N}y_{i}\log(C(F(x_{i}))) (4)

where C(F(x_i)) is the prediction for x_i.

Further, an auto-encoder [47, 48, 49] is used in order to retain the effective semantic information in Z. As shown in the right part of Fig. 2, the decoder D is used to reconstruct X from Z. It is made up of deconvolution, unpooling and fully connected layers. Unpooling is the reverse of pooling and deconvolution is the reverse of convolution. Specifically, max unpooling keeps the maximum-position information recorded during max pooling, restores the maximum values to the corresponding positions and sets the remaining positions to zero, as shown in Fig. 3(a). Analogously, average unpooling expands the feature map by copying values, as shown in Fig. 3(b).

Deconvolution, also called transposed convolution, recovers the shape of the input from the output, as shown in Fig. 3(c). See Appendix A for the detailed convolution and deconvolution operations, as well as toy examples.
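These building blocks of D map directly onto standard PyTorch layers, as the short sketch below shows; the specific shapes are toy assumptions chosen to match Fig. 3.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 2, 128)                    # a toy I/Q frame

# Max pooling / unpooling pair (Fig. 3(a)): unpooling restores each recorded
# maximum to its original position and fills the rest with zeros.
pool = nn.MaxPool2d(kernel_size=(1, 2), stride=(1, 2), return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=(1, 2), stride=(1, 2))
p, idx = pool(x)                                 # (1, 1, 2, 64) plus max positions
u = unpool(p, idx)                               # back to (1, 1, 2, 128)

# Average unpooling (Fig. 3(b)) expands the map by copying each value;
# nearest-neighbour upsampling realizes exactly this copy.
avg_unpool = nn.Upsample(scale_factor=(1, 2), mode='nearest')

# Deconvolution (Fig. 3(c)): a 2x2 map with a 3x3 kernel, stride 1, padding 0
# grows back to 4x4, the input shape of the corresponding convolution.
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=1, padding=0)
a = deconv(torch.randn(1, 1, 2, 2))              # shape (1, 1, 4, 4)
```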

In addition, a reconstruction loss is utilized to evaluate the difference between the original signal data and the reconstructed signal data:

L_{r}=\frac{1}{2N}\sum_{i=1}^{N}||D(F(x_{i}))-x_{i}||_{2}^{2} (5)

where D(F(x_i)) is the reconstruction of signal x_i. Intuitively, the more completely the signal is reconstructed, the more valid information is carried within Z. Thus, the auto-encoder greatly helps the model to generate appropriate semantic features.

As a result, the total loss function combines cross entropy loss, center loss and reconstruction loss as

L_{t}=L_{ce}+\lambda_{ct}L_{ct}+\lambda_{r}L_{r} (6)

where the weights λ_ct and λ_r are used to balance the three loss functions. We have carefully designed the total loss function. The cross entropy loss is used to learn information from the labels. The center loss minimizes the intra-class variations in the semantic space while keeping the inter-class semantic features well separated, which also helps separate the unknown classes. The reconstruction loss makes the model learn more information about the signal data, since the data must be well reconstructed. The ablation study in Section V also validates these points. The whole learning process with loss L_t is summarized in Algorithm 1, where θ_F, θ_C and θ_D denote the model parameters of the feature extractor F, the classifier C and the decoder D, respectively.

Algorithm 1 Pseudocode for SR2CNN Update
Require: Labeled input and output set {(x_i, y_i)} and hyperparameters N, η, α, λ_ct, λ_r.
Ensure: Parameters θ_F, θ_C, θ_D and {c_j}.
  Initialize parameters θ_F, θ_C, θ_D.
  Initialize parameters {c_j | j ∈ K}.
  repeat
     for each batch with size N do
        Update c_j for each j: c_j ← c_j − αΔ_{c_j}.
        Calculate L_ct via Eq. (2).
        Calculate L_ce via Eq. (4).
        Calculate L_r via Eq. (5).
        L_t = L_ce + λ_ct L_ct + λ_r L_r.
        Update θ_F: θ_F ← θ_F − η∇_{θ_F} L_t.
        Update θ_C: θ_C ← θ_C − η∇_{θ_C} L_t.
        Update θ_D: θ_D ← θ_D − η∇_{θ_D} L_t.
     end for
  until convergence
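One iteration of Algorithm 1 translates into the following training step. It assumes the `FeatureExtractor`/`Classifier` sketches above plus a decoder module `d` taking z, and reuses the `center_loss`/`update_centers` helpers; the default loss weights are placeholders, since the paper tunes the hyperparameters by grid search (Section V).

```python
import torch.nn.functional as F_nn

def train_step(f, c, d, centers, batch_x, batch_y, optimizer,
               alpha=0.5, lambda_ct=0.1, lambda_r=0.1):
    # Center update of Eq. (3), detached so it stays out of the gradient graph.
    update_centers(f(batch_x).detach(), batch_y, centers, alpha)

    z = f(batch_x)
    loss_ce = F_nn.cross_entropy(c(z), batch_y)                       # Eq. (4)
    loss_ct = center_loss(z, batch_y, centers)                        # Eq. (2)
    loss_r = 0.5 * ((d(z) - batch_x) ** 2).sum(dim=(1, 2, 3)).mean()  # Eq. (5)
    loss_t = loss_ce + lambda_ct * loss_ct + lambda_r * loss_r        # Eq. (6)

    optimizer.zero_grad()
    loss_t.backward()            # one gradient step on theta_F, theta_C, theta_D
    optimizer.step()
    return loss_t.item()
```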

IV-B Discriminator

The discriminator P is the last module of the proposed framework, but also its core. It discriminates among known and unknown classes based on the latent semantic space Z. For each known class k, the feature extractor F extracts the features and computes the corresponding semantic center vector S_k as:

S_{k}=\frac{\sum_{j=1}^{m}\delta(y_{j}=k)F(x_{j})}{\sum_{j=1}^{m}\delta(y_{j}=k)} (7)

where m is the number of all training samples. When a test signal I appears and F(I) is obtained, the difference between the vector F(I) and S_k can be measured for each k. Specifically, the generalized distance between F(I) and S_k is used, which is defined as follows:

d(F(\mathcal{I}),S_{k})=\sqrt{(F(\mathcal{I})-S_{k})^{T}A_{k}^{-1}(F(\mathcal{I})-S_{k})} (8)

where A_k is the transformation matrix associated with class k and A_k^{-1} denotes the inverse of A_k. When A_k is the covariance matrix Σ of the semantic features of the signals of class k, d(·,·) is the Mahalanobis distance. When A_k is the identity matrix I (the only possible choice when the covariance matrix Σ is not available, which happens for example when the signal set of some class is a singleton), d(·,·) reduces to the Euclidean distance. A_k can also be Λ or σ²I, where Λ is the diagonal matrix formed by the diagonal elements of Σ and σ² ≜ trace(Σ)/t, with t being the dimension of S_k. The distances based on A_k = Λ and A_k = σ²I are called the second distance and the third distance, respectively. Note that when the Mahalanobis, second or third distance is applied, the covariance matrix of each known class needs to be computed in advance.
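The center of Eq. (7), the generalized distance of Eq. (8) and the four choices of A_k can be sketched as follows; the feature dimension and sample counts are toy values.

```python
import numpy as np

def class_center(feats):
    # Eq. (7): mean of the semantic features of one known class.
    return feats.mean(axis=0)

def generalized_distance(z, S_k, A_k):
    # Eq. (8): sqrt((z - S_k)^T A_k^{-1} (z - S_k)).
    diff = z - S_k
    return float(np.sqrt(diff @ np.linalg.inv(A_k) @ diff))

feats = np.random.randn(100, 64)                  # toy features of one class
t = feats.shape[1]
Sigma = np.cov(feats, rowvar=False)               # A_k = Sigma: Mahalanobis distance
Lambda = np.diag(np.diag(Sigma))                  # A_k = Lambda: "second distance"
sigma2_I = (np.trace(Sigma) / t) * np.eye(t)      # A_k = sigma^2 I: "third distance"
I = np.eye(t)                                     # A_k = I: Euclidean distance
```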

With the above distance metric, we can establish our discriminant model, which consists of two steps: first, distinguish between known and unknown classes; second, discriminate which known class or unknown class the test signal belongs to. The first step is done by comparing the threshold Θ_1 with the minimal distance d_1 given by

d_{1}=\min_{S_{k}\in S}d(F(\mathcal{I}),S_{k}) (9)

where S is the set of known semantic center vectors. Let us denote by y_I the prediction for I. If d_1 < Θ_1, then y_I ∈ K; otherwise y_I ∈ U. Owing to the use of the center loss in training, the semantic features of the signals of class k are assumed to obey a multivariate Gaussian distribution. Inspired by the three-sigma rule [50], we set Θ_1 as follows:

\Theta_{1}=\lambda_{1}\times 3\sqrt{t} (10)

where λ_1 is a control parameter referred to as the discrimination coefficient.

Two remarks are made as follows to explain the Gaussian distribution assumption and the choice of Θ1\Theta_{1}, respectively.

Remark 1

In our loss function, we have the center loss component which aims to minimize (2) with respect to the semantic layer. It is not difficult to show that

\begin{split}\arg\min_{\theta_{F}}L_{ct}&=\arg\max_{\theta_{F}}-L_{ct}\\ &=\arg\max_{\theta_{F}}-\frac{1}{2N}\sum_{i=1}^{N}||F(x_{i})-c_{y_{i}}||^{2}_{2}\\ &=\arg\max_{\theta_{F}}-\frac{1}{2N}\sum_{i=1}^{N}(F(x_{i})-c_{y_{i}})^{T}(F(x_{i})-c_{y_{i}})\end{split} (11)

Because of the monotonicity of the exponential function (and since the argmax is unaffected by raising to the power 1/N or by a positive constant factor), we have

\begin{split}&\arg\max_{\theta_{F}}-\frac{1}{2N}\sum_{i=1}^{N}(F(x_{i})-c_{y_{i}})^{T}(F(x_{i})-c_{y_{i}})\\ &=\arg\max_{\theta_{F}}\prod_{i=1}^{N}e^{-\frac{(F(x_{i})-c_{y_{i}})^{T}(F(x_{i})-c_{y_{i}})}{2}}\\ &=\arg\max_{\theta_{F}}\prod_{i=1}^{N}e^{-\frac{(F(x_{i})-c_{y_{i}})^{T}I^{-1}(F(x_{i})-c_{y_{i}})}{2}}\\ &=\arg\max_{\theta_{F}}(2\pi)^{\frac{Nt}{2}}|I|^{\frac{N}{2}}\prod_{i=1}^{N}\frac{1}{(2\pi)^{\frac{t}{2}}|I|^{\frac{1}{2}}}e^{-\frac{(F(x_{i})-c_{y_{i}})^{T}I^{-1}(F(x_{i})-c_{y_{i}})}{2}}\end{split} (12)

where t denotes the dimension of the Gaussian distribution and I denotes the identity matrix. Letting β ≜ (2π)^{Nt/2}|I|^{N/2} and P(F(x_{i})|y_{i})=\frac{1}{(2\pi)^{\frac{t}{2}}|I|^{\frac{1}{2}}}e^{-\frac{(F(x_{i})-c_{y_{i}})^{T}I^{-1}(F(x_{i})-c_{y_{i}})}{2}}, the above equation can be equivalently written as

\arg\max_{\theta_{F}}\beta\prod_{i=1}^{N}P(F(x_{i})|y_{i}) (13)

where P(F(x_i)|y_i) = 𝒩(c_{y_i}, I). This indicates that the output of the semantic layer very likely follows a Gaussian distribution. (Note, however, that due to the existence of the other two component loss functions, we propose using a general covariance matrix to describe the output of the semantic layer, as shown in (8).)

Remark 2

The choice of Θ_1 in (10) is made based on the following two considerations. First, the well-known three-sigma rule of thumb is often used for the identification of outliers [51]. It is shown in [51] that this rule should be properly generalized due to the impact of the dimension in the multi-dimensional case. We here present a natural generalization to the t-dimensional case by scaling with √t, so as to remove the impact of the dimension on the choice of Θ_1. This explains the term 3√t in (10). Second, a control parameter λ_1 is incorporated to make the choice of Θ_1 more flexible so that it can work well for complex recognition tasks. Our numerical experiments later validate the effectiveness of this choice of Θ_1.

The second step is more complicated. If I belongs to the known classes, its label y_I can be easily obtained via

y_{\mathcal{I}}=\arg\min_{k}d(F(\mathcal{I}),S_{k}). (14)

Obviously, the main difficulty lies in dealing with the case when I is classified as unknown in the first step. To illustrate, let us denote by R the set of recorded unknown classes and define S_R to be the set of the semantic center vectors of R. In the case R = ∅, a new signal label R_1 is added to R and F(I) is set to be the semantic center vector S_{R_1}. The unknown signal I is saved in the set G_{R_1} and we let y_I = R_1. In the case R ≠ ∅, the threshold Θ_2 is compared with the minimal distance d_2, which is defined by

d_{2}=\min_{S_{R_{u}}\in S_{R}}d(F(\mathcal{I}),S_{R_{u}}) (15)
TABLE I: Standard metadata of dataset 2016.10A. For a larger version, 2016.10B, the class "AM-SSB" is removed, while the number of samples for each class is sixfold (120000). For a smaller one, 2016.04C, all 11 classes are included, but the number of samples per class is disparate (ranging from 4120 to 24940).

total samples: 220000
# of samples per class: 20000
# of samples per SNR: 1000
feature dimension: 2×128
classes (modulations): 11
modulation types: 8PSK, AM-DSB, AM-SSB, BPSK, CPFSK, GFSK, PAM4, QAM16, QAM64, QPSK, WBFM
# of SNR values: 20
SNR values: -20, -18, -16, -14, -12, -10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18

Intuitively, a good choice of Θ_2 may be made based on the distances between F(x) and the S_k's. The minimum distance d_1 was the first candidate in our tests of the choice of Θ_2. In fact, we test a set of choices of Θ_2 and numerically find that unknown classes can often be correctly identified when Θ_2 is set between d_1 and d_med, where d_med is the median distance between F(x) and the S_k's. Therefore, the threshold Θ_2 is finally set as

\Theta_{2}=\frac{d_{1}+\lambda_{2}\times d_{med}}{1+\lambda_{2}} (16)

where λ_2 is used to balance the two distances d_1 and d_med.

To proceed, let n_R denote the number of recorded signal labels in R. If d_2 > Θ_2, a new signal label R_{n_R+1} is added to R and we set y_I = R_{n_R+1}. Note that we do not impose any prior restriction on the value of n_R (the size of the set R), i.e., our model never needs to know the number of unknown classes pending discrimination. If instead d_2 ≤ Θ_2, we set

y_{\mathcal{I}}=\arg\min_{R_{u}}d(F(\mathcal{I}),S_{R_{u}}) (17)

and save the signal I in G_{y_I}. Accordingly, S_{y_I} is updated via

S_{y_{\mathcal{I}}}=\frac{\sum_{k\in G_{y_{\mathcal{I}}}}F(k)}{\#(G_{y_{\mathcal{I}}})} (18)

where #(G_{y_I}) denotes the number of signals in the set G_{y_I}. As a result, as the number of predictions for unknown signals increases, the model gradually improves itself by refining the S_{R_u}'s.

Algorithm 2 Pseudocode for Discriminator PP
Require: Test input I, transformation matrices {A_k, A_{R_u}}, sets S, R, S_R, {G_j} and hyperparameters Θ_1, Θ_2.
Ensure: y_I.
  Calculate F(I).
  Calculate d_1 via Eq. (9).
  Calculate d_2 via Eq. (15).
  if d_1 < Θ_1 then
     Calculate y_I via Eq. (14).
  else if d_1 ≥ Θ_1 and R = ∅ then
     Add R_1 to R.
     y_I = R_1.
  else if d_1 ≥ Θ_1, R ≠ ∅ and d_2 > Θ_2 then
     Add R_{n_R+1} to R.
     y_I = R_{n_R+1}.
  else
     Calculate y_I via Eq. (17).
  end if
  Save I in G_{y_I}.
  Update S_{y_I} via Eq. (18).
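Algorithm 2 can be condensed into the following sketch (Euclidean variant, i.e., A_k = I, for brevity). `known_centers` holds the S_k's, `unknown_centers` the S_{R_u}'s and `unknown_sets` the G sets; the default λ values mirror settings discussed in Section V but are otherwise assumptions.

```python
import numpy as np

def discriminate(z, known_centers, unknown_centers, unknown_sets,
                 lambda1=0.4, lambda2=1.0):
    t = z.shape[0]
    dists = {k: np.linalg.norm(z - S) for k, S in known_centers.items()}
    d1 = min(dists.values())                              # Eq. (9)
    theta1 = lambda1 * 3 * np.sqrt(t)                     # Eq. (10)
    if d1 < theta1:                                       # known class, Eq. (14)
        return min(dists, key=dists.get)

    d_med = np.median(list(dists.values()))
    theta2 = (d1 + lambda2 * d_med) / (1 + lambda2)       # Eq. (16)
    u_dists = {u: np.linalg.norm(z - S) for u, S in unknown_centers.items()}
    d2 = min(u_dists.values()) if u_dists else np.inf     # Eq. (15); inf if R is empty

    if d2 > theta2:                                       # open a new unknown class
        label = f"R{len(unknown_centers) + 1}"
        unknown_sets[label] = []
    else:                                                 # existing unknown, Eq. (17)
        label = min(u_dists, key=u_dists.get)
    unknown_sets[label].append(z)
    unknown_centers[label] = np.mean(unknown_sets[label], axis=0)   # Eq. (18)
    return label
```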


Figure 4: In-training statistics on three datasets. The accuracy is based on the known test set.

To summarize, we present the whole procedure of the discriminator in Algorithm 2. We emphasize that SR2CNN is different from common open-set recognition methods. Assuming that there are n known classes and an uncertain number of unknown classes, a traditional open-set recognition method will only distinguish the test samples into n+1 classes, while SR2CNN distinguishes them into n+n_R classes via Algorithm 2, where n_R is the number of unknown classes recognized by the discriminator. Specifically, when a test sample belongs to an unknown class, we determine whether it belongs to an existing unknown class or a new unknown class by comparing d_2 with the threshold Θ_2. Hence, the notable advantage of SR2CNN over common open-set recognition methods lies in that SR2CNN can roughly distinguish how many unknown classes there are in the test set, not just label test samples as unknown.

V Experiments and Results

TABLE II: Contrast between supervised learning and our ZSL scenario on three datasets. Dashed lines in the ZSL columns mark the boundary between known and unknown classes. Bold: accuracy for a certain unknown class. Italic: accuracy computed only to help draw a transverse comparison. Items split by a slash "/", like "75.9%/8.4%", denote the accuracies of two isotopic classes, respectively. "-" denotes that there is no corresponding result for such a case.

indicator | class | 2016.10A supervised | 2016.10A ZSL | 2016.10B supervised | 2016.10B ZSL | 2016.04C supervised | 2016.04C ZSL
accuracy | 8PSK (1) | 85.0% | 85.5% | 95.5% | 86.7% | 74.9% | 69.3%
accuracy | AM-DSB (2) | 100.0% | 73.5% | 100.0% | 41.3% | 100.0% | 91.1%
accuracy | BPSK (4) | 99.0% | 95.0% | 99.8% | 96.5% | 99.8% | 97.6%
accuracy | PAM4 (7) | 98.5% | 94.5% | 97.6% | 93.4% | 99.6% | 96.8%
accuracy | QAM16 (8) | 41.6% | 49.3% | 56.8% | 40.0% | 97.6% | 98.4%
accuracy | QAM64 (9) | 60.6% | 44.0% | 47.5% | 49.6% | 94.0% | 97.6%
accuracy | QPSK (10) | 95.0% | 90.5% | 98.9% | 90.6% | 86.8% | 81.5%
accuracy | WBFM (11) | 38.2% | 32.0% | 39.6% | 50.4% | 88.8% | 86.9%
accuracy | CPFSK (5) | 100.0% | 99.0% | 100.0% | 75.9%/8.4% | 100.0% | 96.2%
accuracy | GFSK (6) | 100.0% | 99.0% | 100.0% | 95.6%/2.3% | 100.0% | 82.0%
accuracy | AM-SSB (3) | 100.0% | 100.0% | - | - | 100.0% | 100.0%
total accuracy | | 83.5% | 78.4% | 83.6% | 72.0% | 94.7% | 91.5%
average known accuracy | | 79.8% | 73.7% | 79.5% | 68.5% | 93.5% | 91.6%
true known rate | | - | 95.9% | - | 86.9% | - | 97.0%
true unknown rate | | - | 99.5% | - | 91.1% | - | 90.0%


Figure 5: Correlation between true known/unknown accuracy and the discrimination coefficient λ_1 on three datasets.

In this section, we demonstrate the effectiveness of the proposed SR2CNN approach by conducting extensive experiments on the dataset 2016.10A, as well as its two counterparts, 2016.10B and 2016.04C [15]. The data description is presented in Table I. All 11 modulation types are numbered with class labels from left to right.

Sieve samples. Samples with SNR less than 16 dB are first filtered out, leaving only a purer and higher-quality portion (one-tenth of the original) to serve as the overall datasets in our experiments.

Choose unknown classes. Empirically, a class whose features are hard to learn is an arduous challenge for a standard supervised learning model, let alone when it plays an unknown role in our ZSL scenario (especially when no prior knowledge about the number of unknown classes is given, as mentioned in Section IV-B). Hence, a completely supervised learning stage is necessarily carried out beforehand to help us nominate suitable unknown classes. If the prediction accuracy of the fully supervised method is rather low for a certain class, it is reasonable to exclude this class from the unknown candidates in ZSL, because ZSL will certainly not yield a good performance for it. In our experiments, unknown classes are randomly selected from the set of classes for which the accuracy of full supervision is higher than 50%. As shown in Table II, the ultimate candidates are AM-SSB (3) and GFSK (6) for 2016.10A and 2016.04C, and CPFSK (5) and GFSK (6) for 2016.10B.

Split training, validation and test data. 70% of the samples from the known classes make up the overall training set, while 15% make up the known validation set and the remaining 15% make up the known test set. For the unknown classes, only a test set is needed, which consists of 15% of the unknown samples.

After the three preprocessing steps, we get a small copy of, e.g., dataset 2016.10A, which contains a training set of 12600 samples, a known validation set of 2700 samples, a known test set of 2700 samples and an unknown test set of 600 samples.

All of the networks in SR2CNN are run on a single GTX Titan X graphics processor, implemented in Python, and trained using the Adam optimizer with learning rate η = 0.001 and batch size N = 256. Generally, we allow our model to learn and update itself for at most 250 epochs. In addition, grid search on the validation set is applied to determine the hyperparameters.

V-A In-training Views

Basically, the average softmax accuracy on the known test set converges roughly to 80% on both 2016.10A and 2016.10B, and to 94% on 2016.04C, as indicated in Fig. 4. Note that there is almost no perceptible loss in accuracy when predicting with the clustering approach (i.e., the distance measure-based classification method described in Section IV) instead of softmax, meaning that the semantic feature space established by our SR2CNN functions very well. For ease of exposition, we will refer to the known cluster accuracy as the upbound (UB).

During the training course, the cross entropy loss undergoes sharp and violent oscillations. This phenomenon makes sense, since the extra center loss and reconstruction loss intermittently shift the learning focus of SR2CNN.

TABLE III: Ablation study of the discrimination task via P on 2016.10A in test. Bold: performance of the original SR2CNN model. The F1 score denotes 2 × accuracy × precision / (accuracy + precision).

indicator | SR2CNN | without Cross Entropy Loss | without Center Loss | without Reconstruction Loss | L1 Loss
accuracy AM-SSB(3) | 100.0% | 100.0% | 99.5% | 100.0% | 100.0%
accuracy GFSK(6) | 99.5% | 98.5% | 61.0% | 94.8% | 95.8%
average known accuracy | 73.7% | 72.1% | 69.0% | 72.3% | 70.4%
precision known | 76.8% | 75.3% | 79.1% | 74.5% | 82.8%
precision unknown | 96.1% | 95.2% | 82.4% | 94.5% | 86.1%
F1 score known | 75.3% | 73.6% | 73.7% | 73.3% | 76.1%
F1 score unknown | 98.0% | 97.2% | 81.3% | 95.9% | 91.6%

V-B Critical Results

The most critical results are presented in Table II. To better illustrate them, we first make a few definitions in analogy to the binary classification problem. By replacing the binary conditions positive and negative with known and unknown, respectively, we can similarly elicit true known (TK), true unknown (TU), false known (FK) and false unknown (FU). Subsequently, we get two important indicators as follows:

true~known~rate~(TKR)=\frac{TK}{K}=\frac{TK}{TK+FU}
true~unknown~rate~(TUR)=\frac{TU}{U}=\frac{TU}{TU+FK}

Furthermore, we define precision likewise as follows:

known~precision~(KP)=\frac{S_{correct}}{TK+FK}
unknown~precision~(UP)=\frac{U_{dominantly\_correct}}{TU+FU}

where S_correct denotes the total number of known samples that are correctly classified to their exact known classes, and U_dominantly_correct denotes the total number of unknown samples that are correctly classified to their exact newly-identified unknown classes. For evaluation, the real label of a certain newly-recorded unknown class is determined as the label of the majority of signal samples in that class. Note that sometimes, unexpectedly, our SR2CNN may classify a small portion of signals into different unknown classes whose real labels are actually identical and correspond to one certain unknown class (we name these unknown classes isotopic classes). In this rare case, we only count the identified unknown class with the highest accuracy when calculating U_dominantly_correct.
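These indicators, together with the weighted true rate used below, are straightforward to compute from the tallied counts; a plain transcription follows (the variable names are ours):

```python
def tkr(tk, fu):
    return tk / (tk + fu)                  # true known rate

def tur(tu, fk):
    return tu / (tu + fk)                  # true unknown rate

def known_precision(s_correct, tk, fk):
    return s_correct / (tk + fk)

def unknown_precision(u_dom_correct, tu, fu):
    return u_dom_correct / (tu + fu)

def weighted_true_rate(tkr_val, tur_val):
    return 0.4 * tkr_val + 0.6 * tur_val   # WTR, as used below

def f1(accuracy, precision):
    # F1 score as defined in Table III
    return 2 * accuracy * precision / (accuracy + precision)
```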

For ZSL, we test SR2CNN with several different combinations of the aforementioned parameters λ_1 and λ_2, hoping to obtain a satisfying result out of multiple trials. Fixing λ_2 to 1 simply leads to fair performance; still, we adjust λ_1 in a range between 0.05 and 1.0. Here, the pre-defined indicators above play an indispensable part in helping us sift the results. Generally, a well-chosen result is supposed to meet the following requirements: 1. the weighted true rate (WTR), 0.4×TKR + 0.6×TUR, is as great as possible; 2. KP > 0.95×UB, where UB is the upbound defined as the known cluster accuracy; 3. #_isotopic^j ≤ 2 for all possible j, where #_isotopic^j denotes the number of isotopic classes corresponding to a certain unknown class j.

Figure 6: Ablation study of the known accuracy on 2016.10A in training.
Figure 7: Effect of center loss. The presence of center loss is distinguished by line shape (solid or dashed). The known and unknown accuracies are distinguished by line color (blue or green).
TABLE IV: Performance among different sets of chosen unknown classes on 2016.10A. Bold: recall rate. Items split by a slash "/", like "87.8%/9.0%", and "-" have the same meanings as in Table II.

indicator \ unknown classes | AM-SSB and GFSK | CPFSK and GFSK | AM-SSB and CPFSK | AM-SSB, CPFSK and GFSK
accuracy AM-SSB(3) | 100.0% | - | 100.0% | 100.0%
accuracy CPFSK(5) | - | 71.0% | 87.8%/9.0% | 65.5%
accuracy GFSK(6) | 99.5% | 100.0% | - | 90.5%
average known accuracy | 73.7% | 68.3% | 75.6% | 69.6%
true known rate | 95.9% | 89.6% | 96.2% | 90.9%
true unknown rate | 99.8% | 85.5% | 98.4% | 85.4%
precision known | 76.8% | 73.6% | 78.3% | 74.0%
precision unknown | 96.1% | 89.2% | 91.9% | 90.4%

In order to better make a transverse comparison, we compute two extra indicators, the average total accuracy in the ZSL scenario and the average known accuracy in completely supervised learning, shown in italics in Table II. On the whole, the results are promising. However, we have to admit that ZSL incurs a small performance loss compared with the fully supervised model. Looking vertically, among all modulations the performance loss especially occurs in the class AM-DSB; looking horizontally, among all datasets it especially occurs in dataset 2016.10B. After all, when losing sight of the two unknown classes, SR2CNN can only acquire a segment of the intact knowledge that would be fully learned in the supervised case. It is this imperfection that presumably leads to the apparent variation in each class's accuracy compared with supervised learning. Among these classes, the poorest victim is always AM-DSB, with a considerable portion of its samples rejected as unknown. Besides, the features, especially those of the unknown classes, of these three datasets are not exactly at the same difficulty level of learning. Some unknown features may even be similar to known ones, which can consequently cause confusion in the discrimination tasks. There is no doubt that these uncertainties and differences in the feature domain matter a lot. Take 2016.10B: compared with its two counterparts, it suffers the greatest loss (more than 10%) in average accuracy (both total and known), and also a pair of inferior true rates. Moreover, it is indeed the only case where both unknown classes are separately identified into two isotopic classes.

It is obvious that the average accuracy strongly depends on the weighted true rate (WTR), since the clearer the discrimination between known and unknown, the more accurate the further classification and identification. Therefore, to better study this discrimination ability, we depict Fig. 5 to elucidate its variation trends with respect to the discrimination coefficient λ_1. At the same time, we introduce a new concept, the discrimination interval, defined as an interval of λ_1 where the weighted true rate is always greater than 80%. The width of this interval is used to help quantify the discrimination ability.

TABLE V: Comparison between our SR2CNN model and several traditional open-set models and outlier detectors on 2016.10A. Bold: performance of the dominant SR2CNN model. Italic: performance of the traditional methods when their true known rates reach the highest. The vertical bar "||" splits the standard results from the italic ones.

indicator | SR2CNN | IsolationForest [37] | EllipticEnvelope [38] | OneClassSVM [39] | LocalOutlierFactor [40] | OpenMax [8] | MDL4OW [43]
AM-SSB(3) | 100.0% | 72.3% || 00.0% | 100.0% || 100.0% | 96.3% || 26.0% | 100.0% | 100.0% | 99.3%
GFSK(6) | 99.5% | 01.3% || 00.0% | 90.0% || 00.0% | 00.0% || 00.0% | 00.0% | 00.0% | 26.5%
true known rate | 95.9% | 81.3% || 99.9% | 46.1% || 97.6% | 85.5% || 92.0% | 96.7% | 98.1% | 79.4%
true unknown rate | 99.8% | 36.8% || 00.0% | 95.0% || 50.0% | 48.1% || 13.0% | 50.0% | 50.0% | 62.9%

TABLE VI: Contrast between supervised learning and our ZSL scenario on dataset SIGNAL-202002. The dashed line in the ZSL column marks the boundary between known and unknown classes. Bold: accuracy for a certain unknown class. Italic: accuracy computed only to help draw a transverse comparison. "-" has the same meaning as in Table II.

indicator | class | supervised learning | zero-shot learning
accuracy | BPSK (1) | 84.3% | 70.8%
accuracy | QPSK (2) | 86.5% | 67.8%
accuracy | 8PSK (3) | 67.8% | 70.3%
accuracy | 16QAM (4) | 99.5% | 96.8%
accuracy | 64QAM (5) | 95.5% | 84.8%
accuracy | PAM4 (6) | 97.0% | 89.0%
accuracy | GFSK (7) | 56.3% | 38.3%
accuracy | AM-DSB (10) | 63.8% | 67.3%
accuracy | AM-SSB (11) | 44.3% | 62.0%
accuracy | CPFSK (8) | 100.0% | 81.0%
accuracy | B-FM (9) | 93.5% | 74.5%
average total accuracy | | 80.8% | 73.0%
average known accuracy | | 77.3% | 71.9%
true known rate | | - | 82.3%
true unknown rate | | - | 84.9%
precision known | | - | 87.4%
precision unknown | | - | 91.6%

Apparently, the curves for the two primary kinds of true rate are monotonic, increasing for the known and decreasing for the unknown. The maximum points of the weighted true rate curves for the three datasets are at about 0.4, 0.2 and 0.4, respectively. These points exactly correspond to the results shown in Table II. Besides, the width of the discrimination interval of 2016.10B is only approximately one third of those of 2016.10A and 2016.04C. This implies that the features of 2016.10B are more difficult to learn, which accounts for its relatively poor performance.

V-C Ablation Study

In this subsection, we explain the necessity of each of the three loss functions. Relevant experiments are mainly based on 2016.10A.

Fig. 6 presents the known accuracy during training in the absence of the cross entropy loss, center loss and reconstruction loss, respectively. In general, we find that the best training performance degrades when any one of these three loss functions is excluded. It can be observed that both the cross entropy loss and the reconstruction loss make a positive impact on the known accuracy, boosting it by about 3% to 5%, while the effect of the center loss seems slightly weaker.

Analyzing Table III, we can easily discern the effects of these three loss functions in the test course, especially that of the center loss. The results show that the F1 score in the absence of cross entropy loss, center loss and reconstruction loss decreases by 1.8%, 1.7% and 2.0%, respectively, for the known classes. For the unknown classes, the minimum degradation in F1 score is 0.8% after removing the cross entropy loss, while the maximum degradation is 16.7% after removing the center loss. In fact, Fig. 7 indicates that the usage of the center loss on 2016.10A indeed helps our model discriminate more distinctly, resulting in a notably broader discrimination interval. Besides, we have also attempted to apply the L1 loss [52] to calculate the center loss (Eq. (2) in Section IV) and the reconstruction loss (Eq. (5) in Section IV). The related results are presented in the last column of Table III. It is seen that the L1 loss can indeed slightly increase the F1 score of the known classes by 0.8%, however, at the cost of a 6.4% decrease in the F1 score of the unknown classes.

In sum, the three loss functions, though not promoting SR2CNN in exactly the same way or in the same respects, are all indeed useful.

V-D Other Extensions

We tentatively change the unknown classes on 2016.10A, seeking to excavate more in the feature domain of the data. As shown in Table IV, both the known precision (KP) and the unknown precision (UP) are insensitive to the change of unknown classes, showing that the classification ability of SR2CNN is consistent and well-preserved for the considered dataset. Nevertheless, the unknown class CPFSK is obviously always the hardest obstacle in the course of discrimination, since the accuracy of CPFSK is always the lowest and some isotopic classes are observed in this case. Especially, when the classes CPFSK and GFSK simultaneously take the unknown roles, the performance loss (on both TKR and TUR) is quite striking. We speculate that the unknown CPFSK and GFSK may share a considerable number of similarities with some known classes, which misleads SR2CNN in the further discrimination task.

To justify SR2CNN's superiority, we compare it with a couple of traditional methods prevalent in the field of outlier detection, as well as two open-set recognition methods, i.e., OpenMax [8] and MDL4OW [43]. For the outlier detection methods, a detected outlier is regarded as an unknown sample. For OpenMax, an extra dimension is appended to the output vector to indicate the probability of the current sample being unknown. For MDL4OW, extreme value theory is adopted to detect the unknown classes by modeling the distribution of the loss. The results are presented in Table V. It is found that our SR2CNN significantly outperforms both the outlier detection methods and the open-set recognition methods in terms of the true unknown rate. Furthermore, we find that most of the aforementioned methods cannot correctly identify GFSK as unknown. For example, in our experiment, OpenMax wrongly classifies all GFSK samples as known. As for MDL4OW, it identifies a small percentage of GFSK samples at the cost of the true known rate. However, the experimental results show that our SR2CNN still works very well for this open-set recognition task.

Note that no unknown-class identification tasks are launched here; only discrimination tasks are considered. Hence, for a certain unknown class j, we compute its unknown rate, instead of accuracy, as #_unknown^j / N_j, where N_j denotes the number of samples from unknown class j and #_unknown^j denotes the number of samples from unknown class j that are discriminated as unknown.

Figure 8: ROC curves of SR2CNN, OpenMax and some other outlier detectors on 2016.10A.

In addition, the ROC curves for the above comparison experiments are depicted in Fig. 8. It is observed that SR2CNN has the largest AUC, indicating its superiority over the other methods. Besides, there is notably a steep ‘cliff’ where the false known rate approximately equals 0.5, particularly for EllipticEnvelope, LocalOutlierFactor and OpenMax. This means that almost half of the samples of unknown classes are not easy to discriminate correctly. Correspondingly, according to Table V, we can speculate that these ‘hard’ samples all come from the unknown class GFSK.

VI Dataset SIGNAL-202002

We newly synthesize a dataset, denominated SIGNAL-202002, which we hope will be of great use for further research in the signal recognition field. The dataset consists of 11 modulation types: BPSK, QPSK, 8PSK, 16QAM, 64QAM, PAM4, GFSK, CPFSK, B-FM, AM-DSB and AM-SSB. Each type is composed of 20000 frames. Data is modulated at a rate of 8 samples per symbol, with 128 samples per frame. The channel impairments are modeled by a combination of additive white Gaussian noise, Rayleigh fading, multipath channel and clock offset. We pass each frame of our synthetic signals independently through the above channel model, seeking to emulate the real-world case, which must consider translation, dilation, impulsive noise, etc. The configuration is set as follows:

  • 20000 samples per modulation type

  • 2×128 feature dimension

  • 20 different SNRs, even values between 2 dB and 40 dB

The complete dataset is stored as a Python pickle file, about 450 MBytes in complex 32-bit floating point type. Related code for the generation process is implemented in MATLAB, and the SIGNAL-202002 dataset is available at the link: https://drive.google.com/file/d/1EDfKRNIk_txxyAyPCR7BEGs0BvEk3Bof/view.
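For readers who want to experiment with the file, a minimal loading sketch is shown below. It assumes the pickle follows the layout of the RadioML 2016 datasets (a dict keyed by (modulation, snr) tuples, each value an array of frames of shape (n_frames, 2, 128)); this layout, as well as the file name, is an assumption rather than a documented fact.

```python
import pickle

import numpy as np

# encoding='latin1' guards against the file having been pickled under Python 2.
with open('SIGNAL-202002.pkl', 'rb') as fh:
    data = pickle.load(fh, encoding='latin1')

mods = sorted({mod for mod, _ in data})          # the 11 modulation types
X = np.vstack(list(data.values()))               # all frames, stacked
y = np.concatenate([[mods.index(mod)] * len(frames)
                    for (mod, _), frames in data.items()])
print(X.shape, y.shape)                          # e.g. (220000, 2, 128) (220000,)
```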

We conduct zero-shot learning experiments on our newly-generated dataset and report the results here. As mentioned above, a supervised learning trial is similarly carried out to help us get an overview of the regular performance for each class of SIGNAL-202002. Unfortunately, as Table VI shows, the original two candidates of 2016.10A, AM-SSB and GFSK, both fail to stay on top. Therefore, we relocate the unknown roles to two other modulations: CPFSK, which has the highest accuracy overall, and B-FM, which stands out among the three analog modulation types (B-FM, AM-SSB and AM-DSB).

According to Table VI, an apparent loss of discrimination ability is observed, as both the TKR and the TUR just barely pass 80%. However, our SR2CNN still maintains its classification ability, as the accuracy for each class remains encouraging compared with the completely supervised model. A notable fact is that the known precision (KP) is remarkably high, even exceeding the KPs on 2016.10A shown in Table IV by almost 10%. To account for this, we speculate that the absence of two unknown classes may unintentionally allow SR2CNN to better focus on the features of the known ones, which consequently leads to superior performance on the known classification task.

VII Conclusion

In this paper, we have proposed a ZSL framework, SR2CNN, which can successfully extract precise semantic features of signals and discriminate both known and unknown classes. SR2CNN can work very well in situations where we have insufficient training data for certain classes. Moreover, SR2CNN can generally improve itself by updating the semantic center vectors. Extensive experiments demonstrate the effectiveness of SR2CNN. In addition, we provide a new signal dataset, SIGNAL-202002, including eight digital and three analog modulation classes for further research. Finally, we would like to point out that, because we often have I/Q signals, a possible direction for future research is to use complex neural networks [53] to establish the semantic space.

Appendix A Convolution and Deconvolution Operation

Let $\bm{a}\in\mathbb{R}^{n}$ and $\bm{b}\in\mathbb{R}^{m}$ denote the vectorized input and output matrices, respectively. Then the convolution operation can be expressed as

\bm{b}=\mathbf{M}\bm{a} \qquad (19)

where $\mathbf{M}\in\mathbb{R}^{m\times n}$ denotes the convolutional matrix, which is sparse. In back propagation through the convolution, $\frac{\partial Loss}{\partial\bm{b}}$ is given, and thus

\frac{\partial Loss}{\partial a_{j}}=\sum_{i}\frac{\partial Loss}{\partial b_{i}}\frac{\partial b_{i}}{\partial a_{j}}=\sum_{i}\frac{\partial Loss}{\partial b_{i}}\mathbf{M}_{i,j}=\mathbf{M}_{*,j}^{T}\frac{\partial Loss}{\partial\bm{b}} \qquad (20)

where $a_{j}$ denotes the $j$-th element of $\bm{a}$, $b_{i}$ denotes the $i$-th element of $\bm{b}$, $\mathbf{M}_{i,j}$ denotes the element in the $i$-th row and $j$-th column of $\mathbf{M}$, and $\mathbf{M}_{*,j}$ denotes the $j$-th column of $\mathbf{M}$. Hence,

\frac{\partial Loss}{\partial\bm{a}}=\left[\begin{matrix}\frac{\partial Loss}{\partial a_{1}}\\ \frac{\partial Loss}{\partial a_{2}}\\ \vdots\\ \frac{\partial Loss}{\partial a_{n}}\end{matrix}\right]=\left[\begin{matrix}\mathbf{M}_{*,1}^{T}\frac{\partial Loss}{\partial\bm{b}}\\ \mathbf{M}_{*,2}^{T}\frac{\partial Loss}{\partial\bm{b}}\\ \vdots\\ \mathbf{M}_{*,n}^{T}\frac{\partial Loss}{\partial\bm{b}}\end{matrix}\right]=\mathbf{M}^{T}\frac{\partial Loss}{\partial\bm{b}}. \qquad (21)

Similarly, the deconvolution operation can be written as

\bm{a}=\widetilde{\mathbf{M}}\bm{b} \qquad (22)

where $\widetilde{\mathbf{M}}$ denotes a convolutional matrix that has the same shape as $\mathbf{M}^{T}$ and whose entries are learned. The back propagation of the deconvolution can then be formulated as follows:

\frac{\partial Loss}{\partial\bm{b}}=\widetilde{\mathbf{M}}^{T}\frac{\partial Loss}{\partial\bm{a}}. \qquad (23)

For example, suppose the input and output matrices are of size $4\times 4$ and $2\times 2$, respectively, as shown in Fig. 3(c). Then $\bm{a}$ is a 16-dimensional vector and $\bm{b}$ is a 4-dimensional vector. Define the convolutional kernel $\mathbf{K}$ as

\mathbf{K}=\left[\begin{matrix}w_{00}&w_{01}&w_{02}\\ w_{10}&w_{11}&w_{12}\\ w_{20}&w_{21}&w_{22}\end{matrix}\right]. \qquad (24)

It is then easy to see that $\mathbf{M}$ is a $4\times 16$ matrix, which can be represented as follows:

\left[\begin{matrix}w_{00}&w_{01}&w_{02}&0&\ldots&0&0&0&0\\ 0&w_{00}&w_{01}&w_{02}&\ldots&0&0&0&0\\ 0&0&0&0&\ldots&w_{20}&w_{21}&w_{22}&0\\ 0&0&0&0&\ldots&0&w_{20}&w_{21}&w_{22}\end{matrix}\right]. \qquad (25)

Hence, deconvolution amounts to left-multiplying by $\widetilde{\mathbf{M}}$ in forward propagation and by $\widetilde{\mathbf{M}}^{T}$ in back propagation.
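As a sanity check of the appendix, the following NumPy sketch (an illustrative reconstruction, not the paper's code) builds the sparse matrix $\mathbf{M}$ of (25) for the stride-1 ‘valid’ convolution in the example, and numerically verifies the forward mapping (19), the transpose back-propagation rule (21), and the shape of the deconvolution matrix in (22).

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.standard_normal((3, 3))   # kernel of Eq. (24)
A = rng.standard_normal((4, 4))   # 4x4 input of the example

# Build the sparse 4x16 matrix M of Eq. (25): output pixel (p, q)
# places K over A at offset (p, q) (stride 1, 'valid' padding).
M = np.zeros((4, 16))
for p in range(2):
    for q in range(2):
        for u in range(3):
            for v in range(3):
                M[p * 2 + q, (p + u) * 4 + (q + v)] = K[u, v]

a = A.reshape(-1)   # vectorized input
b = M @ a           # forward convolution, Eq. (19)

# Compare with the direct sliding-window computation.
direct = np.array([[np.sum(K * A[p:p + 3, q:q + 3]) for q in range(2)]
                   for p in range(2)])
print(np.allclose(b.reshape(2, 2), direct))   # True

# Check Eq. (21): with Loss = 0.5*||b||^2 we have dLoss/db = b, so
# dLoss/da should equal M.T @ b; verify by finite differences.
eps = 1e-6
grad_fd = np.array([
    (0.5 * np.sum((M @ (a + eps * np.eye(16)[j])) ** 2)
     - 0.5 * np.sum(b ** 2)) / eps
    for j in range(16)
])
print(np.allclose(M.T @ b, grad_fd, atol=1e-4))   # True

# Deconvolution (Eq. (22)) left-multiplies a matrix shaped like M.T,
# mapping the 4-vector b back to a 16-vector; in practice it is learned.
M_tilde = rng.standard_normal(M.T.shape)   # 16 x 4
print((M_tilde @ b).shape)                 # (16,)
```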

References

  • [1] T. J. O’Shea, T. Roy, and T. C. Clancy, “Over-the-air deep learning based radio signal classification,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 168–179, 2018.
  • [2] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, “Convolutional neural network architectures for signals supported on graphs,” IEEE Transactions on Signal Processing, vol. 67, no. 4, pp. 1034–1049, 2018.
  • [3] S. Peng, H. Jiang, H. Wang, H. Alwageed, Y. Zhou, M. M. Sebdani, and Y.-D. Yao, “Modulation classification based on signal constellation diagrams and deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 3, pp. 718–727, 2018.
  • [4] T. J. O’Shea, J. Corgan, and T. C. Clancy, “Unsupervised representation learning of structured radio communication signals,” in 2016 First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE).   IEEE, 2016, pp. 1–5.
  • [5] L. Du, H. Liu, P. Wang, B. Feng, M. Pan, and Z. Bao, “Noise robust radar hrrp target recognition based on multitask factor analysis with small training data size,” IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3546–3559, 2012.
  • [6] G. C. Garriga, P. Kralj, and N. Lavrač, “Closed sets for labeled data,” Journal of Machine Learning Research, vol. 9, pp. 559–580, 2008.
  • [7] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult, “Toward open set recognition,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 7, pp. 1757–1772, 2012.
  • [8] A. Bendale and T. E. Boult, “Towards open set deep networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1563–1572.
  • [9] H. Liu, Z. Cao, M. Long, J. Wang, and Q. Yang, “Separate to adapt: Open set domain adaptation via progressive separation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2927–2936.
  • [10] C. Geng and S. Chen, “Collective decision for open set recognition,” IEEE Transactions on Knowledge and Data Engineering, 2020.
  • [11] M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, “Zero-shot learning with semantic output codes,” in Advances in neural information processing systems, 2009, pp. 1410–1418.
  • [12] R. Socher, M. Ganjoo, C. D. Manning, and A. Ng, “Zero-shot learning through cross-modal transfer,” in Advances in neural information processing systems, 2013, pp. 935–943.
  • [13] Z. Wang, X. Ye, C. Wang, J. Cui, and P. Yu, “Network embedding with completely-imbalanced labels,” IEEE Transactions on Knowledge and Data Engineering, 2020.
  • [14] Y. Gao, L. Gao, X. Li, and Y. Zheng, “A zero-shot learning method for fault diagnosis under unknown working loads,” Journal of Intelligent Manufacturing, pp. 1–11, 2019.
  • [15] T. J. O’Shea, J. Corgan, and T. C. Clancy, “Convolutional radio modulation recognition networks,” in International conference on engineering applications of neural networks.   Springer, 2016, pp. 213–226.
  • [16] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • [17] S. Zheng, S. Chen, L. Yang, J. Zhu, Z. Luo, J. Hu, and X. Yang, “Big data processing architecture for radio signals empowered by deep learning: Concept, experiment, applications and challenges,” IEEE Access, vol. 6, pp. 55 907–55 922, 2018.
  • [18] B. Flowers, R. M. Buehrer, and W. C. Headley, “Evaluating adversarial evasion attacks in the context of wireless communications,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1102–1113, 2019.
  • [19] S. Duan, K. Chen, X. Yu, and M. Qian, “Automatic multicarrier waveform classification via pca and convolutional neural networks,” IEEE Access, vol. 6, pp. 51 365–51 373, 2018.
  • [20] L. J. Wong, W. C. Headley, and A. J. Michaels, “Specific emitter identification using convolutional neural network-based iq imbalance estimators,” IEEE Access, vol. 7, pp. 33 544–33 555, 2019.
  • [21] S. Huang, L. Chai, Z. Li, D. Zhang, Y. Yao, Y. Zhang, and Z. Feng, “Automatic modulation classification using compressive convolutional neural network,” IEEE Access, vol. 7, pp. 79 636–79 643, 2019.
  • [22] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep learning based communication over the air,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 132–143, 2017.
  • [23] L. M. Hoang, M. Kim, and S.-H. Kong, “Automatic recognition of general lpi radar waveform using ssd and supplementary classifier,” IEEE Transactions on Signal Processing, vol. 67, no. 13, pp. 3516–3530, 2019.
  • [24] G. Vanhoy, N. Thurston, A. Burger, J. Breckenridge, and T. Bose, “Hierarchical modulation classification using deep learning,” in MILCOM 2018-2018 IEEE Military Communications Conference (MILCOM).   IEEE, 2018, pp. 20–25.
  • [25] T. J. O’Shea, T. Roy, N. West, and B. C. Hilburn, “Demonstrating deep learning based communications systems over the air in practice,” in 2018 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN).   IEEE, 2018, pp. 1–2.
  • [26] G. Baldini, C. Gentile, R. Giuliani, and G. Steri, “Comparison of techniques for radiometric identification based on deep convolutional neural networks,” Electronics Letters, vol. 55, no. 2, pp. 90–92, 2018.
  • [27] S. M. Hiremath, S. Behura, S. Kedia, S. Deshmukh, and S. K. Patra, “Deep learning-based modulation classification using time and stockwell domain channeling,” in 2019 National Conference on Communications (NCC).   IEEE, 2019, pp. 1–6.
  • [28] X. Zhang, T. Seyfi, S. Ju, S. Ramjee, A. El Gamal, and Y. C. Eldar, “Deep learning for interference identification: Band, training snr, and sample selection,” in 2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC).   IEEE, 2019, pp. 1–5.
  • [29] K. Sankhe, M. Belgiovine, F. Zhou, L. Angioloni, F. Restuccia, S. D’Oro, T. Melodia, S. Ioannidis, and K. Chowdhury, “No radio left behind: Radio fingerprinting through deep learning of physical-layer hardware impairments,” IEEE Transactions on Cognitive Communications and Networking, vol. 6, no. 1, pp. 165–178, 2019.
  • [30] F. Liang, C. Shen, W. Yu, and F. Wu, “Towards optimal power control via ensembling deep neural networks,” IEEE Transactions on Communications, vol. 68, no. 3, pp. 1760–1776, 2019.
  • [31] X. Chen, J. Cheng, Z. Zhang, L. Wu, J. Dang, and J. Wang, “Data-rate driven transmission strategies for deep learning-based communication systems,” IEEE Transactions on Communications, vol. 68, no. 4, pp. 2129–2142, 2020.
  • [32] D. Roy, T. Mukherjee, M. Chatterjee, E. Blasch, and E. Pasiliao, “Rfal: Adversarial learning for rf transmitter identification and classification,” IEEE Transactions on Cognitive Communications and Networking, 2019.
  • [33] J. Tan, L. Zhang, Y.-C. Liang, and D. Niyato, “Intelligent sharing for lte and wifi systems in unlicensed bands: A deep reinforcement learning approach,” IEEE Transactions on Communications, vol. 68, no. 5, pp. 2793–2808, 2020.
  • [34] M. Li, O. Li, G. Liu, and C. Zhang, “Generative adversarial networks-based semi-supervised automatic modulation recognition for cognitive radio networks,” Sensors, vol. 18, no. 11, p. 3913, 2018.
  • [35] M. Usama, J. Qadir, A. Raza, H. Arif, K.-L. A. Yau, Y. Elkhatib, A. Hussain, and A. Al-Fuqaha, “Unsupervised machine learning for networking: Techniques, applications and research challenges,” IEEE Access, vol. 7, pp. 65 579–65 615, 2019.
  • [36] Z.-L. Tang, S.-M. Li, and L.-J. Yu, “Implementation of deep learning-based automatic modulation classifier on FPGA SDR platform,” Electronics, vol. 7, no. 7, p. 122, 2018.
  • [37] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining.   IEEE, 2008, pp. 413–422.
  • [38] P. J. Rousseeuw and K. V. Driessen, “A fast algorithm for the minimum covariance determinant estimator,” Technometrics, vol. 41, no. 3, pp. 212–223, 1999.
  • [39] Y. Chen, X. S. Zhou, and T. S. Huang, “One-class SVM for learning in image retrieval,” in Proceedings 2001 International Conference on Image Processing (Cat. No. 01CH37205), vol. 1.   IEEE, 2001, pp. 34–37.
  • [40] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “Lof: identifying density-based local outliers,” in Proceedings of the 2000 ACM SIGMOD international conference on Management of data, 2000, pp. 93–104.
  • [41] R. Yoshihashi, W. Shao, R. Kawakami, S. You, M. Iida, and T. Naemura, “Classification-reconstruction learning for open-set recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4016–4025.
  • [42] C. Geng, S.-j. Huang, and S. Chen, “Recent advances in open set recognition: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
  • [43] S. Liu, Q. Shi, and L. Zhang, “Few-shot hyperspectral image classification with unknown classes using multitask deep learning,” IEEE Transactions on Geoscience and Remote Sensing, 2020.
  • [44] D. Roy, T. Mukherjee, M. Chatterjee, and E. Pasiliao, “Detection of rogue RF transmitters using generative adversarial nets,” in 2019 IEEE Wireless Communications and Networking Conference (WCNC).   IEEE, 2019, pp. 1–7.
  • [45] S. Rajendran, W. Meert, V. Lenders, and S. Pollin, “Saife: Unsupervised wireless spectrum anomaly detection with interpretable features,” in 2018 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN).   IEEE, 2018, pp. 1–9.
  • [46] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, “A discriminative feature learning approach for deep face recognition,” in European conference on computer vision.   Springer, 2016, pp. 499–515.
  • [47] X. Chen, D. P. Kingma, T. Salimans, Y. Duan, P. Dhariwal, J. Schulman, I. Sutskever, and P. Abbeel, “Variational lossy autoencoder,” arXiv preprint arXiv:1611.02731, 2016.
  • [48] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “beta-vae: Learning basic visual concepts with a constrained variational framework.” ICLR, vol. 2, no. 5, p. 6, 2017.
  • [49] A. Ng et al., “Sparse autoencoder,” CS294A Lecture notes, vol. 72, no. 2011, pp. 1–19, 2011.
  • [50] F. Pukelsheim, “The three sigma rule,” The American Statistician, vol. 48, no. 2, pp. 88–91, 1994.
  • [51] P. Bajorski, Statistics for imaging, optics, and photonics.   John Wiley & Sons, 2011, vol. 808.
  • [52] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, “Loss functions for image restoration with neural networks,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 47–57, 2017.
  • [53] A. Hirose, Complex-valued neural networks: Advances and applications.   John Wiley & Sons, 2013, vol. 18.
Yihong Dong received his B.S. degree in Computer Science from Shanghai University, China, in 2019. He is currently a graduate student with the School of Software Engineering, Tongji University. His research interests include machine learning with applications in signal processing.
Xiaohan Jiang is currently an undergraduate with the School of Software Engineering, Tongji University. His research interests include deep learning and computer vision.
Huaji Zhou was born in Jinhua, Zhejiang, China, in 1988. He received the B.S. degree in Navigation, Guidance, and Control Technology and the M.S. degree in Pattern Recognition and Intelligent System from Xidian University, Xi’an, China, in 2010 and 2013, respectively. He is currently a Research Assistant with the Science and Technology on Communication Information Security Control Laboratory, Jiaxing, China, and is pursuing a Ph.D. degree in Electronics and Information at Xidian University, Xi’an, China. His current research interests include machine learning and electromagnetic signal processing.
Yun Lin received the B.S. degree from Dalian Maritime University, Dalian, China, in 2003, the M.S. degree from the Harbin Institute of Technology, Harbin, China, in 2005, and the Ph.D. degree from Harbin Engineering University, Harbin, China, in 2010. He was a research scholar with Wright State University, USA, from 2014 to 2015. He is currently a full professor in the College of Information and Communication Engineering, Harbin Engineering University, China. His current research interests include machine learning and data analytics over wireless networks, signal processing and analysis, cognitive radio and software defined radio, and artificial intelligence and pattern recognition. He has published more than 150 international peer-reviewed journal/conference papers in venues such as IEEE IoT, TII, TVT, TCCN, TR, Access, INFOCOM, GLOBECOM, ICC, VTC, and ICNC. He has four highly cited papers and several best conference papers. He serves as an editor for the IEEE TRANSACTIONS ON RELIABILITY, KSII Transactions on Internet and Information Systems, and the International Journal of Performability Engineering. In addition, he served as General Chair of ADHIP 2020, TPC Chair of MOBIMEDIA 2020, ICEICT 2019 and ADHIP 2017, and TPC member of GLOBECOM, ICC, ICNC and VTC. He has successfully organized several international workshops and symposia with top-ranked IEEE conferences, including INFOCOM, GLOBECOM, DSP, and ICNC, among others.
Qingjiang Shi received his Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 2011. From September 2009 to September 2010, he visited Prof. Z.-Q. (Tom) Luo’s research group at the University of Minnesota, Twin Cities. In 2011, he worked as a Research Scientist at Bell Labs China. From 2012, he was with the School of Information and Science Technology at Zhejiang Sci-Tech University. From Feb. 2016 to Mar. 2017, he worked as a research fellow at Iowa State University, USA. Since Mar. 2018, he has been a full professor with the School of Software Engineering at Tongji University. He is also with the Shenzhen Research Institute of Big Data. His interests lie in algorithm design and analysis with applications in machine learning, signal processing and wireless networks. So far, he has published more than 60 IEEE journal papers and filed about 30 national patents. Dr. Shi was an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING. He was awarded the Gold Medal at the 46th International Exhibition of Inventions of Geneva in 2018, and was also the recipient of the First Prize of Science and Technology Award from the China Institute of Communications in 2018, the National Excellent Doctoral Dissertation Nomination Award in 2013, the Shanghai Excellent Doctoral Dissertation Award in 2012, and the Best Paper Award from the IEEE PIMRC’09 conference.