
A Novel RL-assisted Deep Learning Framework
for Task-informative Signals Selection and Classification for Spontaneous BCIs

Wonjun Ko, Eunjin Jeon, and Heung-Il Suk

W. Ko and E. Jeon are with the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea. H.-I. Suk is with the Department of Artificial Intelligence and the Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea.
Abstract

In this work, we formulate the problem of estimating and selecting task-relevant temporal signal segments from a single EEG trial as a Markov decision process and propose a novel reinforcement-learning mechanism that can be combined with existing deep-learning based BCI methods. To be specific, we devise an actor-critic network such that an agent can determine which timepoints should be used (informative) or discarded (uninformative) in composing the intention-related features in a given trial, thereby enhancing intention-identification performance. To validate the effectiveness of our proposed method, we conducted experiments on a large, publicly available MI dataset and applied our novel mechanism to various recent deep-learning architectures designed for MI classification. Based on exhaustive experiments, we observed that our proposed method helped achieve statistically significant improvements in performance.

Index Terms:
Brain–Computer Interface; Electroencephalogram; Motor Imagery; Deep Learning; Reinforcement Learning; Subject-independent

I Introduction

Brain–computer interface (BCI) is an emerging technology that provides a communication pathway between a brain and an external device, e.g., a robotic arm, by measuring and identifying intention-reflecting brain activities [1]. Generally, non-invasive BCI systems, which commonly use electroencephalography (EEG), are categorized into two types, evoked and spontaneous BCIs. While evoked BCIs exploit evoked potentials such as P300, mostly induced by an external stimulus, spontaneous BCIs focus on internal cognitive processes such as event-related (de)synchronization (ERD/ERS). In this work, we focus on motor imagery (MI) induced brain signals [2].

Since MI-EEGs are voluntarily inducible, MI-based BCIs are of great value from clinical and practical standpoints. However, because of this self-inducing property and the difficulty of consistently inducing spontaneous EEG signals over a period of time, MI-EEG trials are highly likely to contain not only MI-relevant information but also irrelevant information [3], which we regard as unreliable EEG segments in the following description. Generally, in an MI-EEG acquisition protocol, self-induced MI-EEG data is obtained by presenting a cue signal (e.g., a left-arrow sign to imagine the movement of the left hand, a right-arrow sign to imagine the movement of the right hand, etc.) [4]. Therefore, the acquired EEG data can contain unreliable segments when the subject does not fully concentrate during acquisition, because of a lack of familiarity with BCIs or uncomfortable conditions, e.g., a long calibration time. Further, the MI-EEG can also contain various physiological noise, e.g., heartbeats, eyeball movements, etc. [3]. Thus, it is not reasonable to place complete trust in the acquired EEG trials.

Figure 1: Comparison of the power spectrograms (by short-time Fourier transform) of the C4 channel in left-hand motor imagery trials from two subjects: (a) Subject #28 and (b) Subject #11. There is a clear and lasting pattern in the $\mu$-band for Subject #28 (a). However, there is no evident and lasting activation pattern in either the $\mu$- or $\beta$-band for Subject #11 (b).

As an example, Fig. 1 compares the power spectrograms of the C4 channel in left-hand MI trials from two subjects. Many neurophysiological studies on physical or imagined movements [5, 6] have consistently reported that MI-induced signal patterns are observed in the $\mu$ (8-12 Hz) and/or $\beta$ (12-30 Hz) bands, even though there is no generic frequency range applicable to all subjects, and signal patterns vary highly among subjects and even among sessions of the same subject. In the spectrogram of Subject #28, a high-power pattern near the $\mu$-band is observed and lasts for a period of time. However, no such evident pattern is observable in the spectrogram of Subject #11. Thus, typical machine-learning algorithms, including recent deep-learning methods [7, 6, 8, 9, 10, 11], that exploit the whole signals of trials for model training and intention identification may not be equally applicable to those subjects.
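For reference, a spectrogram like those in Fig. 1 can be computed with a short-time Fourier transform. The following is a minimal sketch using SciPy; the sampling rate, window length, and channel index are illustrative assumptions rather than values fixed by the paper.

```python
import numpy as np
from scipy.signal import stft
import matplotlib.pyplot as plt

fs = 100                            # sampling rate in Hz (assumed, after downsampling)
trial = np.random.randn(20, 250)    # placeholder (channels x timepoints) EEG trial
c4 = trial[9]                       # hypothetical row index of the C4 channel

f, t, Z = stft(c4, fs=fs, nperseg=50, noverlap=40)
power = np.abs(Z) ** 2              # power spectrogram

plt.pcolormesh(t, f, power, shading='gouraud')
plt.ylim(4, 40)                     # covers the mu (8-12 Hz) and beta (12-30 Hz) bands
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()
```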

There have been recent studies that considered the unreliability of information in features or raw data when training predictive models [3, 12, 13]. Among them, Li et al. [3] suggested that training predictive models with the full EEG signals of BCI trials is not necessarily helpful for enhancing classification performance in MI-BCIs. Inspired by their work, we performed a preliminary study comparing the performance of two models trained and tested with (1) full signals (FM) and (2) signals randomly masked out in time, thus discarding the respective features (RM), for individual subjects; a sketch of the masking is given below. The resulting plot is shown in Fig. 2. Interestingly, we observed that for many subjects, the performance with RM was almost the same as or higher than that with FM. Based on this result, we hypothesize that rather than extracting features from the full signals in a trial, it would be effective to select intention-related signal segments, i.e., to discard intention-unrelated or noisy signals, and to use only those segments for feature representation and the ensuing classifier learning.
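For illustration, the RM baseline of this preliminary study can be realized by zeroing out a random subset of timepoints so that the corresponding temporal features are discarded; the masking ratio below is an assumption, not a value reported in the paper.

```python
import numpy as np

def random_mask(trial, ratio=0.5, rng=None):
    """Zero out a random subset of timepoints in a (channels x timepoints) trial."""
    rng = rng or np.random.default_rng()
    _, T = trial.shape
    drop = rng.choice(T, size=int(ratio * T), replace=False)
    masked = trial.copy()
    masked[:, drop] = 0.0           # discard the features at the masked timepoints
    return masked

trial = np.random.randn(20, 250)    # placeholder EEG trial
rm_trial = random_mask(trial)       # input for the RM model
```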

In the meantime, although MI-EEGs are acquired under the general protocol, there is no way of knowing whether a given temporal segment is MI-relevant or not. In other words, we cannot explicitly obtain any information about the MI-relevance of the acquired EEG signals. Thus, we formulate the problem of selecting MI-relevant signal segments without any supervision as a Markov decision process and tackle it systematically via reinforcement learning (RL) [14]. To the best of our knowledge, this is the first work that proposes RL-based selection of intention-related signal segments while jointly learning the feature representation and a classifier in a unified framework.

The main contributions of our work are as follows:

  • First, we tackle the problem of estimating and selecting reliable signals in MI-EEG, which is an important issue for the practical usage of BCIs, by formulating it in an RL framework.

  • Second, we devise an actor-critic model for MI-based BCI and define a novel reward function.

  • Third, as our proposed RL-based selection of feature vectors over time is modular, it is easy to plug into existing deep-learning architectures with minor modifications, thus helping to enhance classification performance.

  • Finally, in our experiments on a large MI dataset, we achieved statistically significant performance improvements with our proposed method injected into various deep networks, outperforming other competing methods in the literature.

Figure 2: Performance comparison of a predictive model trained with either the full signals (mean accuracy: 74.39$\pm$15.59%) or randomly selected signal segments (mean accuracy: 73.43$\pm$15.92%).

This paper is organized as follows: Section II reviews the previous studies on EEG decoding methods including deep learning approaches and MI-relevant EEG trials selection. In Section III, we propose an MI-relevant EEG signal segments selection method in an actor-critic framework [14] and describe our objective optimization strategy with a novel reward function. Section IV describes the EEG dataset, experimental settings, and quantitative results by comparing with the existing methods in the literature. We then analyze the results to further validate the effectiveness of our method in Section V, and finally summarize our work in Section VI.

II Related Work

Over the past decades, the common spatial pattern (CSP) algorithm [10] and its variants [11, 15] have been the most actively studied methods for MI-EEG decoding, focusing on learning spatial filters that transform and dimension-reduce the signals to make them more discriminative. In particular, Ang et al. [11] band-pass filtered MI signals before applying CSP, thereby representing spatio-spectral features of EEG signals. Suk and Lee [15] proposed a Bayesian framework that jointly optimizes spectral and spatial filters in a unified framework by defining frequency bands as random variables.

Meanwhile, deep-learning methods have achieved promising results in EEG signal decoding studies [16, 17]. For instance, Schirrmeister et al. [7] proposed various convolutional neural networks (CNNs) for MI classification, e.g., Shallow ConvNet and Deep ConvNet. Ko et al. [18] proposed an interesting recurrent spatio-temporal CNN architecture. Lawhern et al. [6] proposed EEGNet, which exploits depthwise convolutional layers and separable convolutional layers [19] to reduce the number of tunable parameters, making it learnable with a limited number of EEG samples. Zhang et al. [20] proposed Parallel CRN and Cascade CRN, which combine a recurrent neural network (RNN) and a CNN to extract spatio-spectral features of MI-EEG. Further, Kwon et al. [9] proposed a multi spectral-spatial feature representation (SSFR) using spectral filtering and a CNN for MI decoding in both subject-dependent and subject-independent manners. More recently, Ko et al. [8] devised a multi-scale neural network (MSNN), which learns multi-scale (in frequency) feature representations of EEG signals, and presented its applicability to various EEG-based applications.

Unlike most existing methods, which focused on spatial or spatio-spectral feature extraction with no attempt to find task-relevant EEG trials or signals within trials, Fruitet et al. [21] focused on task-related trial selection by formulating it as a multi-armed bandit problem [22]. In particular, given an EEG trial, their method estimates the confidence that it contains task-relevant information, compared to idle-state EEG signals. Recently, Li et al. [3] proposed spectral component CSP (SCCSP) to select MI-relevant EEG trials. Specifically, they conducted independent component analysis [23] on band-pass filtered signals to extract MI-relevant and MI-irrelevant components for each class independently. The extracted components were then used to select MI-relevant EEG trials from the training dataset, on which they ran CSP for feature extraction and trained a classifier.

Figure 3: A graphical overview of the proposed framework and the internal mechanism of the proposed RL module for task-related informative feature vector selection over time. (a) Illustration of our proposed framework, which is composed of three basic modules: an input-signal embedding module, an agent module for signal segment selection, and a classification module. The agent interacts with an environment in the process depicted in Fig. 3(b). (b) A schematic representation of the interaction between the agent and the environment for the task of signal segment selection. For the definitions and details of the state $s_t$, action $a_t$, and reward $r_t$, refer to the main body.

Our method is comparable to theirs in the sense that it concerns MI-relevant signal selection within a framework. First, we consider signal-segment selection within each trial, rather than selecting trials from a dataset; that is, we can still use all trials in a training set, maximally utilizing the available samples. Second, compared with Fruitet et al.'s work [21], our method does not require idle-state EEG trials, which would otherwise be a great limitation, as acquiring them requires additional time and thus a longer calibration. Further, unlike Li et al.'s work [3], which separately learns the baseline components used to determine the MI-relevance of EEG signals, the feature extraction, and the classifier, we devise a systematically integrated framework in which feature representation learning, estimation and selection of MI-relevant feature vectors of signal segments, and classifier learning are unified. It is also noteworthy that these modules are jointly optimized in an end-to-end manner. Throughout the paper, we use the terms signal segments and temporal feature vectors of EEG signals interchangeably.

III Methods

In this section, we define the MI-relevant EEG signal segment selection problem and formulate it in a novel framework in which a reinforcement-learning module plays a vital role in performance enhancement. The proposed framework has three main modules, as schematized in Fig. 3(a). Given a sequence of signals in a trial $\mathbf{x}=\{x_1,\dots,x_T\}\in\mathbb{R}^{C\times T}$, where $C$ and $T$ denote the number of channels and timepoints, respectively, the input first passes through an embedding network for feature representation. The represented feature vectors are then fed into our novel agent module, which estimates their task-relevance and selects the informative signal segments for the target task. Finally, a classifier makes a decision for the task, i.e., MI classification, using the selected feature vectors over time.

III-A Embedding Network

Notably, this module is flexible with respect to the network architecture, which can vary from existing ones in the literature to newly customized networks. In our experiments, we exploit the existing CNN architectures of Shallow ConvNet [7], Deep ConvNet [7], EEGNet [6], and MSNN [8]. These architectures were proposed by different research groups, which demonstrated their superiority or validity in their respective experiments over various datasets. In the following, we denote an embedding network for feature representation as $\phi(\cdot;\theta_\phi)$ with tunable parameters $\theta_\phi$.

III-B Agent Network

We introduce a learnable agent that adaptively and automatically selects task-relevant feature vectors of EEG signals over time in a trial without supervision, as there is no explicit way of observing such information in a trial. For the feature vectors $\bm{\phi}=\{\phi_1,\dots,\phi_{T'}\}\in\mathbb{R}^{D\times T'}$ of the input signals, where $D$ is the dimension of the feature vectors, we devise a method for the automatic selection of signal segments over time $t\in\mathcal{S}$, $\mathcal{S}\subset\{1,\dots,T'\}$, such that the selected feature vectors $\{\phi_t\}_{t\in\mathcal{S}}$ carry the most information related to the user's intention induced by means of MI. However, as MI involves an internal cognitive process in the brain, there are no clear labels, i.e., informative or non-informative, indicating at which timepoints the signals actually include intention-related information.

Here, we formulate the problem of selecting informative feature vectors of signals as a Markov decision process [22] and devise an RL-assisted module to enhance MI-EEG classification performance. Specifically, an agent interacts with the environment defined by a given MI-EEG trial via a sequence of states (defined by the set of feature vectors represented by an embedding network $\phi$), actions (selection or rejection), and rewards (the effects of taking specific actions, i.e., decisions) over time, as illustrated in Fig. 3(b).

To describe our method concretely, we define the states, actions, and rewards as follows:

III-B1 State

A state $s_t$ $(t=1,\dots,T')$ in our work is represented as a continuous vector constructed by concatenating the aggregated feature vector of those selected up to the previous timepoint, i.e., $\mathrm{AGG}(\{\phi_i\}_{i\in\mathcal{S}_{t-1}})$, and the same aggregate further including the feature vector of the current time $t$, i.e., $\phi_t$, as follows:

s_t = \mathrm{Concat}\left(\mathrm{AGG}\left(\{\phi_i\}_{i\in\mathcal{S}_{t-1}}\right),\ \mathrm{AGG}\left(\{\phi_i\}_{i\in\mathcal{S}_{t-1}}\cup\{\phi_t\}\right)\right)   (1)

where $\mathcal{S}_{t-1}$ is the index set of the feature vectors selected up to time $t-1$. The operators $\mathrm{Concat}$ and $\mathrm{AGG}$ denote a vector concatenation operator and an aggregation operator, respectively. In our work, we use a mean aggregator defined as

\mathrm{AGG}\left(\{\phi_i\}_{i\in\mathcal{S}_t}\right) = \frac{1}{|\mathcal{S}_t|} \sum_{i\in\mathcal{S}_t} \phi_i   (2)

where $|\mathcal{S}_t|$ is the cardinality of the set $\mathcal{S}_t$.
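To make Eqs. (1) and (2) concrete, the state can be built from two running means over the selected feature vectors. The sketch below is a minimal NumPy rendering under the assumption that the embedded features are stored as a (D x T') array and that an empty selection aggregates to a zero vector (a convention the paper does not specify).

```python
import numpy as np

def aggregate(features, idx):
    """Mean aggregator of Eq. (2) over the selected feature vectors."""
    if len(idx) == 0:               # assumed convention: empty set -> zero vector
        return np.zeros(features.shape[0])
    return features[:, idx].mean(axis=1)

def make_state(features, selected, t):
    """State of Eq. (1): concatenation of the aggregates without and with phi_t."""
    without_t = aggregate(features, selected)
    with_t = aggregate(features, selected + [t])
    return np.concatenate([without_t, with_t])   # length-2D state vector

phi = np.random.randn(64, 30)       # placeholder (D x T') embedded features
s_3 = make_state(phi, selected=[0, 2], t=3)
```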

III-B2 Action

An action space $\mathcal{A}$ is defined so that the agent can select (1) or reject (0) the sequence of feature vectors over time, and we are interested in finding an optimal action sequence that maximizes the expected reward. Concretely, referring to the current state $s_t$, which carries the comparative information of both aggregating and not aggregating the feature vector of the current time $t$ with the previously selected features, the agent estimates the effect of the current feature vector on the resulting expected reward. Based on the agent's action, the set $\mathcal{S}_t$ is updated as follows:

\mathcal{S}_t = \begin{cases} \mathcal{S}_{t-1} \cup \{t\} & \text{if } a_t = 1 \text{ (selection)} \\ \mathcal{S}_{t-1} & \text{otherwise (rejection)} \end{cases}   (3)

III-B3 Reward

To define the rewards with respect to the actions made by the agent, we first define the base information by taking a global average pooling (GAP) [24] over all the feature vectors over time in a trial as follows:

\mathbf{f}_{\mathrm{GAP}} = \mathrm{AGG}\left(\{\phi_t\}_{t\in\{1,\dots,T'\}}\right)   (4)

and calculate the classification loss $\mathcal{L}_{\mathrm{GAP}}$ as a criterion. Then, the reward $r_t$ with respect to the current action $a_t$ and the corresponding feature vector $\mathrm{AGG}(\{\phi_i\}_{i\in\mathcal{S}_t})$ is defined to measure the relative improvement over the base feature vector of Eq. (4) in terms of the loss as follows:

r_t = \mathcal{L}_t - \mathcal{L}_{\mathrm{GAP}}   (5)

where $\mathcal{L}_t$ is the classification loss of $\mathrm{AGG}(\{\phi_i\}_{i\in\mathcal{S}_t})$. With the reward given in Eq. (5), we then define the total return $R_t$ as

R_t = \sum_{k=0}^{T'-t} \gamma^k r_{t+k}   (6)

where $\gamma$ denotes a discount factor to deal with delayed rewards [22].
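As an illustrative sketch of Eqs. (4)-(6), the reward at each step compares the classification loss of the currently selected aggregate against the GAP baseline, and the return discounts future rewards; the cross-entropy head below is a stand-in assumption for the actual classifier.

```python
import numpy as np

def cross_entropy(probs, label):
    """Classification loss of an aggregated feature vector (assumed CE head)."""
    return -np.log(probs[label] + 1e-12)

def step_reward(loss_t, loss_gap):
    """Eq. (5): reward of the current selection relative to the GAP baseline."""
    return loss_t - loss_gap

def returns(rewards, gamma=0.95):
    """Eq. (6): discounted return R_t for every timestep t."""
    R = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        R[t] = running
    return R
```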

III-B4 Actor-Critic Network

Technically speaking, of the various RL approaches, we exploit an actor-critic model [14], owing to its popularity and its fitness to our problem. That is, our agent maintains a policy network $\pi(a_t|s_t;\theta_\pi)$ as an actor and a value estimation function $V(s_t;\theta_v)$ as a critic. At the $t$th timepoint, the agent receives a state $s_t$ and decides its action $a_t$ from the set of possible actions $\mathcal{A}$ based on the policy $\pi$. Then, the reward $r_t$ and the next state $s_{t+1}$ are obtained from the environment as in Eq. (3).

In our work, we utilize a synchronized parallel actor-critic network. Specifically, two distinct deep neural networks are used, one for policy estimation and one for expected-return (value) estimation. The output neurons of our policy network $\pi(a_t|s_t;\theta_\pi)$ correspond to the probabilities of taking the selection or rejection action with respect to the current feature vector under the state $s_t$, i.e., $a_t\sim\pi(a_t|s_t;\theta_\pi)$. Meanwhile, the value estimation network $V(s_t;\theta_v)$ has a single output neuron, which produces the expected return under the current state $s_t$.
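A minimal sketch of such a pair of networks in TensorFlow 2 (the library the paper reports using) might look as follows; the state dimension and the single-dense-layer depth are assumptions, since the paper only specifies densely connected layers with softmax (actor) and sigmoid (critic) output activations.

```python
import tensorflow as tf

state_dim = 128   # twice an assumed feature dimension D = 64 (see Eq. (1))

# Actor: probabilities over {reject (0), select (1)} with a softmax output.
actor = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(state_dim,)),
    tf.keras.layers.Dense(2, activation='softmax'),
])

# Critic: a single output neuron with a sigmoid activation for the value estimate.
critic = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(state_dim,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

s_t = tf.random.normal((1, state_dim))                    # placeholder state
a_t = tf.random.categorical(tf.math.log(actor(s_t)), 1)   # a_t ~ pi(a_t|s_t)
v_t = critic(s_t)                                         # V(s_t)
```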

III-C Classifier

After our agent selects the informative feature vectors over time in a trial, their aggregated vector representation is fed into a densely connected layer $\rho(\cdot;\theta_\rho)$ for decision making. For the aggregation, we again use the mean of the feature vectors, as in Eq. (4), also called GAP [24]. From the BCI viewpoint, the GAP layer can be understood as a means of emphasizing an important spectral range and its neighboring region for each feature dimension. Using the aggregated feature vector $\mathrm{AGG}(\{\phi_i\}_{i\in\mathcal{S}_{T'}})$, the classifier outputs a class label $\hat{\mathbf{y}}$ for the input EEG trial.

III-D Optimization and Training Strategy

To jointly optimize the embedding network, the policy and value networks of an agent module, and a classifier, the proposed framework involves two types of learning schemes, i.e., supervised learning and reinforcement learning. We combine these two learning strategies in our network optimization.

First, the embedding network $\phi$ and the classifier $\rho$ are pre-trained in a supervised manner, without the agent module, by minimizing a cross-entropy loss. After pre-training, the actor and critic networks in the agent module are trained to select task-informative features by interacting with the environment. Initially, the agent takes the feature vectors $\bm{\phi}$ represented by the pre-trained embedding network. Thus, the agent starts from a well-trained position in the parameter space rather than a random initial point, making the training of the parameters $\theta_\pi$ and $\theta_v$ faster and more robust.

The updating of the model parameters alternates between (i) the agent module and (ii) the other two modules of feature representation and classification. As the agent is iteratively updated to find more informative features, the embedding network and the classifier can also focus on task-oriented feature learning, and thus generalize more reliably.

To optimize the sequential actions, we update the trainable parameters of the actor network $\theta_\pi$ and the critic network $\theta_v$ by performing gradient ascent to maximize the expected total return $\mathbb{E}[R_t]$ $(t=1,\dots,T')$. Basically, the actor parameters $\theta_\pi$ are learned in the direction of $\nabla_{\theta_\pi}\log\pi(a_t|s_t;\theta_\pi)\cdot R_t$ [22]. However, although this updating direction is an unbiased estimate of $\nabla_{\theta_\pi}\mathbb{E}[R_t]$, we need to reduce its variance, which we do by introducing another quantity called the advantage [14]. The advantage $A_t$ is calculated as follows:

A_t = r_t + \gamma V(s_{t+1};\theta_v) - V(s_t;\theta_v).   (7)

By applying the advantage function to the gradient estimation, we define a loss for an actor network as follows:

\mathcal{L}_t^\pi = \log\pi(a_t|s_t;\theta_\pi) A_t.   (8)

Meanwhile, the value estimation function $V(\cdot;\theta_v)$ approximates the expected return for a given state $s_t$, i.e., $V(s_t;\theta_v)=\mathbb{E}[R_t|s_t]$. Because we cannot directly observe the value of a specific state, the value estimation function is optimized by bootstrapping [22]. By definition, the current state-value estimate $V(s_t;\theta_v)$ should equal the sum of the current reward and the discounted next state-value estimate, $r_t+\gamma V(s_{t+1};\theta_v)$; thus, its training loss is defined as follows:

\mathcal{L}_t^v = \frac{1}{2}\left[V(s_t;\theta_v) - \left(r_t + \gamma V(s_{t+1};\theta_v)\right)\right]^2.   (9)
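The per-step updates derived from Eqs. (7)-(9) can be written compactly. The TensorFlow 2 sketch below assumes the actor and critic models sketched above and treats the learning rate as illustrative; `a_t` is the sampled action index and `r_t` the scalar reward.

```python
import tensorflow as tf

actor_opt = tf.keras.optimizers.RMSprop(learning_rate=3e-3)
critic_opt = tf.keras.optimizers.RMSprop(learning_rate=3e-3)

def update_step(actor, critic, s_t, s_next, a_t, r_t, gamma=0.95):
    """One actor-critic update following Eqs. (7)-(9) (sketch)."""
    with tf.GradientTape(persistent=True) as tape:
        v_t = critic(s_t)[0, 0]
        v_next = tf.stop_gradient(critic(s_next)[0, 0])      # bootstrapped target
        advantage = r_t + gamma * v_next - v_t               # Eq. (7)
        critic_loss = 0.5 * tf.square(advantage)             # Eq. (9)
        log_pi = tf.math.log(actor(s_t)[0, a_t] + 1e-12)
        actor_loss = -log_pi * tf.stop_gradient(advantage)   # gradient ascent on Eq. (8)
    critic_opt.apply_gradients(zip(
        tape.gradient(critic_loss, critic.trainable_variables),
        critic.trainable_variables))
    actor_opt.apply_gradients(zip(
        tape.gradient(actor_loss, actor.trainable_variables),
        actor.trainable_variables))
    del tape
```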

The complete pseudo-algorithm to train all the networks in our framework is presented in Algorithm 1.

Input: Training samples and corresponding labels $\mathbf{x}$, $\mathbf{y}$
Input: Network architectures $\theta_\phi$, $\theta_\pi$, $\theta_v$, and $\theta_\rho$; the number of pre-training epochs $n_{\mathrm{pre}}$; an optimizer $\mathrm{SGD}$; a learning rate $\alpha$; a discount factor $\gamma$
Output: Optimal networks $\theta_\phi^*$, $\theta_\pi^*$, $\theta_v^*$, and $\theta_\rho^*$

for $i = 1, \dots, n_{\mathrm{pre}}$ do
  $\bm{\phi} \leftarrow \phi(\mathbf{x};\theta_\phi)$
  $\mathbf{f}_{\mathrm{GAP}} \leftarrow$ Eq. (4)
  $\hat{\mathbf{y}} \leftarrow \rho(\mathbf{f}_{\mathrm{GAP}};\theta_\rho)$
  Update $\theta_\phi$ and $\theta_\rho$ using $\mathrm{SGD}(\mathrm{BCE}(\mathbf{y},\hat{\mathbf{y}}),\alpha)$
Estimate $\mathcal{L}_{\mathrm{GAP}}$ using $\mathbf{f}_{\mathrm{GAP}}$
while network parameters not converged do
  $\bm{\phi} \leftarrow \phi(\mathbf{x};\theta_\phi)$
  for $t = 1, \dots, T'$ do
    $s_t \leftarrow$ Eq. (1)
    $a_t \sim \pi(a_t|s_t;\theta_\pi)$
    $r_t \leftarrow$ Eq. (5)
    $s_{t+1} \leftarrow$ Eq. (1)
    $\mathcal{L}_t^v \leftarrow$ Eq. (9)
    Update $\theta_v$ using $\mathrm{SGD}(\mathcal{L}_t^v,\alpha)$
    $A_t \leftarrow$ Eq. (7)
    $\mathcal{L}_t^\pi \leftarrow$ Eq. (8)
    Update $\theta_\pi$ using $\mathrm{SGD}(-\mathcal{L}_t^\pi,\alpha)$
  $\hat{\mathbf{y}} \leftarrow \rho(\mathrm{AGG}(\{\phi_i\}_{i\in\mathcal{S}_{T'}});\theta_\rho)$
  Update $\theta_\phi$ and $\theta_\rho$ using $\mathrm{SGD}(\mathrm{BCE}(\mathbf{y},\hat{\mathbf{y}}),\alpha)$

Algorithm 1: Pseudo-code for the proposed method

IV Experiments

In this section, we describe the dataset used for performance evaluation, our experimental scenarios, the experimental settings, and the performance comparison among the competing methods. For the comparison, we considered the mean, median, and max-min accuracies over all subjects.

IV-A Dataset and Preprocessing

We used the publicly available, large KU-MI dataset [4] (available at http://gigadb.org/dataset/100542), which consists of left-hand and right-hand MI tasks. MI samples were acquired over two sessions from 54 healthy subjects, recorded from 62 Ag/AgCl electrodes placed according to the standard 10-20 system, and sampled at 1,000 Hz. Each MI class of the dataset contains 50 trials of 4-second length. For preprocessing, following [9, 4], we downsampled the EEG trials to 100 Hz, applied band-pass filtering between 8 and 30 Hz, covering both the $\mu$ and $\beta$ bands, and segmented each trial from 1 s to 3.5 s (250 timepoints). Finally, we selected 20 electrodes (FC-1/2/3/4/5/6, C-1/2/3/4/5/6/z, and CP-1/2/3/4/5/6/z) over the sensorimotor cortex areas.
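A rough sketch of this preprocessing pipeline with SciPy is given below; the filter order and the use of a Butterworth design are assumptions, as the paper specifies only the band, the downsampling rate, the segment, and the channel subset.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def preprocess(trial_1khz):
    """trial_1khz: (channels x timepoints) EEG of a 4-s MI trial at 1,000 Hz."""
    x = decimate(trial_1khz, q=10, axis=1)                 # 1,000 Hz -> 100 Hz
    b, a = butter(5, [8, 30], btype='bandpass', fs=100)    # mu + beta bands
    x = filtfilt(b, a, x, axis=1)
    return x[:, 100:350]                                   # 1 s to 3.5 s (250 timepoints)

trial = np.random.randn(20, 4000)   # placeholder trial, 20 sensorimotor channels
segment = preprocess(trial)         # shape (20, 250)
```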

IV-B Experimental Scenarios

To empirically validate our proposed method, we compare its performance with existing subject-dependent and subject-independent methods. Following the recent work of [9], we set up the subject-dependent and subject-independent scenarios as follows:

IV-B1 Subject-dependent

For the subject-dependent case, the offline data (training samples) from the second session was used to train the MI classification models. Then, the online data (testing samples) also from the second session was used for the performance validation using the trained models.

IV-B2 Subject-independent

For the subject-independent scenario, we conducted a leave-one-subject-out cross-validation procedure. To be concrete, we trained subject-independent MI classification models using all training subjects’ offline and online data from both sessions. After training, we evaluated the trained models on the target subject’s offline data from the second session.
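The leave-one-subject-out split itself can be expressed as below; the logistic-regression stand-in and random placeholders are only there to make the splitting logic concrete, not to reproduce the deep models of this paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

n_subjects, n_trials = 54, 100
X = np.random.randn(n_subjects * n_trials, 20 * 250)   # placeholder flattened trials
y = np.random.randint(0, 2, size=n_subjects * n_trials)
groups = np.repeat(np.arange(n_subjects), n_trials)    # subject ID of each trial

accs = []
for tr, te in LeaveOneGroupOut().split(X, y, groups):
    clf = LogisticRegression(max_iter=200).fit(X[tr], y[tr])
    accs.append(clf.score(X[te], y[te]))               # accuracy on the held-out subject
print(np.mean(accs))
```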

IV-C Experimental Settings

While training our proposed framework in Fig. 3(a), we used a mini-batch size of 5, an exponentially decreasing learning rate with an initial value of 0.003 and a decrease ratio of 0.001 per epoch, an RMSProp optimizer [25], and a Xavier initializer [26]. For the embedding and classification modules in our framework, we used the existing network architectures of [7, 6, 8]. Briefly, Shallow ConvNet [7] is composed of two convolutional layers, a temporal convolutional layer and a spatial convolutional layer with a square activation function, for embedding into a feature space. Deep ConvNet [7] has a temporal convolutional layer and a spatial convolutional layer, followed by three temporal convolutional layers with an exponential linear unit (eLU) activation function, for feature representation. EEGNet [6] consists of a spectral convolutional layer, a spatial depthwise convolutional layer [19], and a temporal separable convolutional layer [19] with an eLU activation function for spatio-temporal feature representation. Finally, for MSNN [8], a spectral convolution and three residually connected temporal separable convolutional layers and spatial convolutional layers with a leaky ReLU function were used as the embedding part. However, for better integration with our proposed agent module for signal segment selection, we slightly modified the architectures of Shallow ConvNet, Deep ConvNet, and EEGNet by replacing the last feature output layer (i.e., average pooling in Shallow ConvNet and EEGNet, max pooling in Deep ConvNet) with a GAP layer. For this reason, in the following, we distinguish these networks by referring to them as 'original' and 'modified' networks. For the classification module, we utilized the above-mentioned networks' densely connected layers, respectively. As for SSFR [9], because it was designed for energy-map-based feature representation rather than spatio-temporal features, we did not apply it in our framework.

In the pre-training phase for the embedding and classification networks, we set the number of epochs $n_{\mathrm{pre}}$ to 10. For the estimation of the total return $R_t$, a discount factor $\gamma$ of 0.95 was used. For the actor and critic networks, we designed densely connected layers with softmax and sigmoid activation functions for their output layers, respectively. During training, we also applied an elastic-net regularizer with $\ell_1$ and $\ell_2$ coefficients of 0.01 and 0.001, respectively.
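These settings might be configured as follows in TensorFlow 2; the exact decay form is an assumption consistent with one reading of "an initial value of 0.003 and a decreasing ratio of 0.001 per epoch".

```python
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.003,
    decay_steps=1,            # interpreted here as one decay per epoch (assumption)
    decay_rate=1.0 - 0.001,   # a 0.001 decrease ratio per step
)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=schedule)

# Elastic-net regularization with the reported l1/l2 coefficients.
reg = tf.keras.regularizers.L1L2(l1=0.01, l2=0.001)
output_layer = tf.keras.layers.Dense(2, activation='softmax', kernel_regularizer=reg)
```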

We implemented all the models considered in our experiments, except for the linear models and SSFR, whose performances were taken from [9], in TensorFlow 2 [27], and trained them on a single Titan RTX GPU under Ubuntu 18.04.

TABLE I: Performance comparison among the competing methods under the subject-dependent learning scenario. For the methods marked with *, performance was obtained from [9]. AM denotes temporally informative segment selection by our proposed agent module.
Method | Mean (SD) | Median | Max-Min
CSP* [10] | 68.57 (17.57) | 64.50 | 100.00-42.00
CSSP* [28] | 69.68 (18.53) | 63.00 | 100.00-42.00
FBCSP* [11] | 70.59 (18.56) | 64.00 | 100.00-45.00
SCCSP [3] | 69.13 (16.90) | 64.50 | 100.00-48.00
BSSFO* [15] | 71.02 (18.83) | 63.50 | 100.00-48.00
Shallow ConvNet [7] | 72.39 (16.38) | 68.00 | 100.00-46.00
Deep ConvNet [7] | 62.63 (13.23) | 58.50 | 100.00-50.00
EEGNet [6] | 64.93 (18.04) | 56.50 | 100.00-47.00
SSFR* [9] | 71.32 (15.88) | 66.45 | 99.00-45.90
MSNN [8] | 74.39 (15.59) | 70.50 | 100.00-52.00
Shallow ConvNet + AM | 74.26 (15.76) | 69.00 | 100.00-53.00
Deep ConvNet + AM | 65.02 (15.48) | 58.00 | 100.00-51.00
EEGNet + AM | 67.06 (18.05) | 57.00 | 100.00-50.00
MSNN + AM | 77.26 (13.92) | 74.50 | 100.00-56.00
TABLE II: Performance comparison among the competing methods under the subject-independent learning scenario. For the methods marked with *, performance was obtained from [9]. AM denotes our proposed agent module.
Method | Mean (SD) | Median | Max-Min
Pooled CSP* [29] | 65.65 (16.11) | 58.00 | 100.00-45.00
Fused model* [30] | 67.37 (16.01) | 62.50 | 98.00-41.00
MR FBCSP* [29] | 68.59 (15.28) | 63.00 | 97.00-48.00
SSFR* [9] | 74.15 (15.83) | 75.00 | 100.00-40.00
MSNN [8] | 73.96 (17.95) | 73.00 | 100.00-45.00
MSNN + AM | 75.24 (17.40) | 75.00 | 100.00-45.00

IV-D Experimental Results

IV-D1 Subject-dependent

The classification accuracy for the subject-dependent scenario is summarized in TABLE I. First, our proposed method with the embedding and classification modules modified from MSNN [8] achieved the highest performance, by a large margin compared to most of the other methods. Second, it is remarkable that the deep-learning models integrated into our proposed framework as the embedding and classification modules consistently achieved higher performance than the corresponding original methods. It is also noteworthy that the deep-learning models combined with our proposed agent module improved the median and minimum accuracies compared to their counterparts. This implicitly assures that our proposed framework, especially the agent module, helped boost performance across all subjects.

IV-D2 Subject-independent

TABLE II summarizes the classification accuracy of the competing methods applicable to the subject-independent scenario. For our proposed method, we defined the embedding and classification modules with MSNN owing to its superiority over the other deep models in TABLE I. Again, our proposed method achieved the highest mean accuracy, by a small margin over the second-best performance of SSFR [9]. It is also notable that our proposed agent module helped enhance performance by 1.28% compared to the original MSNN.

V Analyses

In this section, we validate our proposed framework by conducting a statistical test between deep models with and without our agent module. We also conduct a qualitative evaluation of the effect of our proposed agent module by comparing (1) the spectrograms of randomly selected EEG trials against the agent-selected EEG signal segments and (2) the topographic maps estimated from the full EEG signals and from the agent-selected signal segments.

V-A Statistical Analysis

Figure 4: Performance comparison of the MI classification models: 'Original Models,' 'Modified Models,' and their counterparts trained with the proposed agent module, 'Modified Models + AM.'

To quantitatively validate the effectiveness of our proposed framework, we conducted a two-tailed Wilcoxon signed-rank test among the original deep models, their modified versions, and the counterpart agent-involved models. The results are plotted in Fig. 4 and indicate the statistical significance of our proposed agent module, along with its superiority in classification accuracy. In detail, for Shallow ConvNet [7], EEGNet [6], and MSNN [8], the proposed framework showed statistical significance with $p$-values of $<0.05$, $<0.05$, and $<0.01$, respectively. From this statistical comparison, it is reasonable to say that our proposed framework, specifically the agent module, played an important role in enhancing the classification accuracy across all subjects. Additionally, we compared the performance of MSNN [8] and MSNN combined with our agent module (MSNN+AM) in the subject-independent scenario and found that our method was statistically better than the original model in classification accuracy, with $p<0.05$.
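This kind of paired, two-tailed Wilcoxon signed-rank test can be reproduced with SciPy; the per-subject accuracy arrays below are placeholders standing in for the values behind Fig. 4.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
acc_base = rng.uniform(50, 100, size=54)          # baseline model, one value per subject
acc_am = acc_base + rng.uniform(0, 5, size=54)    # same model with the agent module

stat, p = wilcoxon(acc_base, acc_am, alternative='two-sided')
print(f'W = {stat:.1f}, p = {p:.4f}')
```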

V-B Qualitative Analysis

Figure 5: Spectrograms of the signals measured at the C3/C4 channels in randomly selected trials from two subjects: (a) C4-channel signals in a left-hand MI trial from Subject #2 and (b) C3-channel signals in a right-hand MI trial from Subject #39. The bottom graphs represent the sequence of actions taken by the agent module in our framework with MSNN.

In Fig. 5, we visualized the spectrograms (via short-time Fourier transform, STFT) of the C3/C4 channel signals in randomly selected trials from two subjects, together with the respective action sequences made by our agent module plugged into MSNN. Considering the power spectrum and the agent's selection actions jointly, we could observe a positive relation, in the sense that the selected signal segments showed high spectral power in the neighborhood of the $\mu$ and $\beta$ bands. Note that the timepoints in the agent's view in our framework ($1<t<T'$) differ from the original input timepoints ($1<t<T$) because of the series of convolution operations in the embedding module. For an intuitive interpretation of the agent's actions, we estimated and aligned the agent's timepoints to the input timepoints by reversely computing the corresponding points in the input space.

In the meantime, for a more neurophysiological inspection, in Fig. 6 we also visualized topographic maps of the full signal segments in a trial and of the signal segments selected by our proposed framework, for the same trials as in Fig. 5. Remarkably, the topographic maps based on only the selected signal segments showed clearer and more localized ERD/ERS patterns than those from the full signals. In particular, for Subject #2, the selected signal segments exhibit a more prominent ERD pattern around the C4 channel in the $\beta$-rhythm than the full signal segments. Referring to the spectrogram of that subject in Fig. 5(a), there seemed to be no evident spectral power in the $\beta$-range over the full signal segments in a trial. However, after selecting the task-informative signal segments, we could observe a meaningful and distinguishable local pattern at the C4 channel in the $\beta$-range. Similarly, in the spectrogram of the full signals in a trial for Subject #39 in Fig. 5(b), there seemed to be less prominent local activations in the $\mu$-range, and thus no localized ERD/ERS pattern in Fig. 6(b). However, after selecting the task-relevant signal segments and plotting the corresponding topographic map, a localized ERD/ERS pattern became observable around the C3 channel. Based on these results, we empirically conclude that our agent module combined with MSNN in our proposed framework is capable of finding MI-relevant EEG signal segments, thus learning better MI-related feature representations and a classifier that enhances the MI classification accuracy. Note that there was no explicit guide or information for our agent to learn such neurophysiological knowledge.

Figure 6: Topographic maps corresponding to the spectrograms in Fig. 5: (a) a left-hand MI trial from Subject #2 and (b) a right-hand MI trial from Subject #39. The black dots indicate the channel montage, and the colors represent the spectral power of the signals observed at the nearby channels, which can be understood as ERD/ERS activations.

VI Conclusion

In spontaneous BCIs, it is not easy for a user to consistently induce EEG signals over a period of time, particularly for BCI illiterates, who are less capable of inducing task-related brain signals. Furthermore, as spontaneous brain-signal inducement involves unobservable internal cognitive processes in the brain, it is hard to measure the information level of the observed signals with respect to the target task, e.g., MI. Hence, not all signals in a trial necessarily reflect a user's intention.

In this work, we focused on the problem of signal reliability in an MI-EEG trial and proposed a novel framework for task-relevant signal segment selection with an RL-assisted module for better generalization of the trained predictive models. As the components of our proposed framework are modular, it was easy and straightforward to combine them with existing deep models. From our experimental results and analyses on a large, publicly available MI dataset, we confirmed the validity of our proposed method through both quantitative and qualitative comparisons.

Although we achieved state-of-the-art performance in both the subject-dependent and subject-independent scenarios in our experiments, there is still room to further improve our method. In particular, the agent module works on a sequence of feature vectors obtained from a preceding embedding module with the full signals of a trial. This mechanism may not be practical for online BCIs. Thus, the current agent module needs to be improved to better suit real-time BCIs, which will be our forthcoming research issue.

Acknowledgment

This work was supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (No. 2017-0-00451, Development of BCI based Brain and Cognitive Computing Technology for Recognizing User’s Intentions using Deep Learning).

This work was also supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00079, Department of Artificial Intelligence (Korea University)).

References

  • [1] M. H. Bhatti, J. Khan, M. U. G. Khan, R. Iqbal, M. Aloqaily, Y. Jararweh, and B. Gupta, “Soft Computing-based EEG Classification by Optimal Feature Selection and Neural Networks,” IEEE Trans. Ind. Informat., vol. 15, no. 10, pp. 5747–5754, 2019.
  • [2] G. Pfurtscheller and C. Neuper, “Motor Imagery and Direct Brain–Computer Communication,” Proc. IEEE, vol. 89, no. 7, pp. 1123–1134, 2001.
  • [3] L. Li, G. Xu, F. Zhang, J. Xie, and M. Li, “Relevant Feature Integration and Extraction for Single-Trial Motor Imagery Classification,” Front. Neurosci., vol. 11, p. 371, 2017.
  • [4] M.-H. Lee, O.-Y. Kwon, Y.-J. Kim, H.-K. Kim, Y.-E. Lee, J. Williamson, S. Fazli, and S.-W. Lee, “EEG Dataset and OpenBMI Toolbox for Three BCI Paradigms: An Investigation into BCI Illiteracy,” GigaScience, vol. 8, no. 5, p. giz002, 2019.
  • [5] D. J. McFarland, L. A. Miner, T. M. Vaughan, and J. R. Wolpaw, “Mu and Beta Rhythm Topographies during Motor Imagery and Actual Movements,” Brain Topogr., vol. 12, no. 3, pp. 177–186, 2000.
  • [6] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: A Compact Convolutional Neural Network for EEG-based Brain–Computer Interfaces,” J. Neural Eng., vol. 15, no. 5, p. 056013, 2018.
  • [7] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep Learning with Convolutional Neural Networks for EEG Decoding and Visualization,” Hum. Brain Mapp., vol. 38, no. 11, pp. 5391–5420, 2017.
  • [8] W. Ko, E. Jeon, S. Jeong, and H.-I. Suk, “Multi-Scale Neural Network for EEG Representation Learning in BCI,” arXiv preprint arXiv:2003.02657, 2020.
  • [9] O.-Y. Kwon, M.-H. Lee, C. Guan, and S.-W. Lee, “Subject-Independent Brain-Computer Interfaces Based on Deep Convolutional Neural Networks,” IEEE Trans. Neural Netw. Learn. Syst., 2019.
  • [10] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K.-R. Muller, “Optimizing Spatial Filters for Robust EEG Single-trial Analysis,” IEEE Signal Process. Mag., vol. 25, no. 1, pp. 41–56, 2008.
  • [11] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan, “Filter Bank Common Spatial Pattern (FBCSP) in Brain-Computer Interface,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2008, pp. 2390–2397.
  • [12] W. Zhang, X. He, W. Lu, H. Qiao, and Y. Li, “Feature Aggregation with Reinforcement Learning for Video-Based Person Re-Identification,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 12, pp. 3847–3852, 2019.
  • [13] Y. Liu, J. Yan, and W. Ouyang, “Quality Aware Network for Set to Set Recognition,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 5790–5799.
  • [14] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods for Deep Reinforcement Learning,” in Proc. 33rd Int. Conf. Mach. Learn. (ICML), 2016, pp. 1928–1937.
  • [15] H.-I. Suk and S.-W. Lee, “A Novel Bayesian Framework for Discriminative Feature Extraction in Brain-Computer Interfaces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 2, pp. 286–299, 2012.
  • [16] X. Zhang, L. Yao, X. Wang, J. Monaghan, and D. Mcalpine, “A Survey on Deep Learning based Brain Computer Interface: Recent Advances and New Frontiers,” arXiv preprint arXiv:1905.04149, 2019.
  • [17] X. Gu, Z. Cao, A. Jolfaei, P. Xu, D. Wu, T.-P. Jung, and C.-T. Lin, “EEG-based Brain-Computer Interfaces (BCIs): A Survey of Recent Studies on Signal Sensing Technologies and Computational Intelligence Approaches and their Applications,” arXiv preprint arXiv:2001.11337, 2020.
  • [18] W. Ko, J. Yoon, E. Kang, E. Jun, J.-S. Choi, and H.-I. Suk, “Deep Recurrent Spatio-Temporal Neural Network for Motor Imagery based BCI,” in Proc. 6th Int. Winter Conf. Brain-Comput. Interface (BCI), 2018, pp. 1–3.
  • [19] F. Chollet, “Xception: Deep Learning with Depthwise Separable Convolutions,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1251–1258.
  • [20] D. Zhang, L. Yao, X. Zhang, S. Wang, W. Chen, R. Boots, and B. Benatallah, “Cascade and Parallel Convolutional Recurrent Neural Networks on EEG-based Intention Recognition for Brain Computer Interface,” in Proc. 32nd AAAI Conf. Artif. Intell. (AAAI), 2018.
  • [21] J. Fruitet, A. Carpentier, M. Clerc, and R. Munos, “Bandit Algorithms Boost Brain Computer Interfaces for Motor-Task Selection of A Brain-Controlled Button,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 449–457.
  • [22] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.   MIT press, 2018.
  • [23] A. Hyvärinen and E. Oja, “Independent Component Analysis: Algorithms and Applications,” Neural Netw., vol. 13, no. 4-5, pp. 411–430, 2000.
  • [24] M. Lin, Q. Chen, and S. Yan, “Network in Network,” arXiv preprint arXiv:1312.4400, 2013.
  • [25] S. Ruder, “An Overview of Gradient Descent Optimization Algorithms,” arXiv preprint arXiv:1609.04747, 2016.
  • [26] X. Glorot and Y. Bengio, “Understanding the Difficulty of Training Deep Feedforward Neural Networks,” in Proc. 13th Int. Conf. Artif. Intell. Statist. (AISTATS), 2010, pp. 249–256.
  • [27] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale Machine Learning on Heterogeneous Distributed Systems,” arXiv preprint arXiv:1603.04467, 2016.
  • [28] S. Lemm, B. Blankertz, G. Curio, and K.-R. Muller, “Spatio-Spectral Filters for Improving the Classification of Single Trial EEG,” IEEE Trans. Biomed. Eng., vol. 52, no. 9, pp. 1541–1548, 2005.
  • [29] F. Lotte, C. Guan, and K. K. Ang, “Comparison of Designs Towards a Subject-Independent Brain-Computer Interface based on Motor Imagery,” in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), 2009, pp. 4543–4546.
  • [30] A. M. Ray, R. Sitaram, M. Rana, E. Pasqualotto, K. Buyukturkoglu, C. Guan, K.-K. Ang, C. Tejos, F. Zamorano, F. Aboitiz et al., “A Subject-Independent Pattern-based Brain-Computer Interface,” Front. Behav. Neurosci., vol. 9, p. 269, 2015.