Factorization Approach for Sparse Spatio-Temporal Brain-Computer Interface
Abstract
Recently, advanced technologies have shown great potential in solving various problems involving large amounts of data. However, these technologies have yet to show competitive performance in brain-computer interfaces (BCIs), which deal with brain signals. Brain signals are inherently difficult to collect in large quantities; in particular, the amount of information is sparse in spontaneous BCIs. In addition, we conjecture that high spatial and temporal similarity between tasks increases the prediction difficulty. We define this problem as the sparse condition. To solve it, we introduce a factorization approach that allows the model to obtain distinct representations from latent space. To this end, we propose two feature extractors: a class-common module, which is trained through adversarial learning and acts as a generator, and a class-specific module, which is trained with the classification loss so that features are extracted as in traditional methods. To minimize the latent space shared by the class-common and class-specific features, the model is trained under an orthogonal constraint. As a result, EEG signals are factorized into two separate latent spaces. Evaluations were conducted on a single-arm motor imagery dataset. The results demonstrate that factorizing the EEG signal allows the model to extract rich and decisive features under the sparse condition.
I Introduction
The human brain has an incredible problem-solving capability and infinite potential. Inspired by the brain, deep neural networks have shown outstanding performance in pattern recognition tasks such as image processing [1, 2, 3, 4], speech processing [5, 6, 7, 8, 9], and language processing [10, 11, 12]. Recently, they have also shown remarkable performance in detecting human intentions from brain signals [13, 14]. In particular, the brain-computer interface (BCI) utilizes deep neural networks to develop a communication pathway between the brain and external devices using brain signals [15, 16, 17, 18, 19]. BCIs collect brain signals in invasive and non-invasive ways. In invasive BCI, brain signals are obtained from electrodes implanted directly into the brain; they have relatively high quality but require brain surgery [20, 21]. Non-invasive BCI mainly uses electrodes placed on the scalp to collect brain signals called the electroencephalogram (EEG). EEG signals are the most commonly used brain signals because they can be obtained without a surgical procedure. However, EEG signals have poor spatial resolution and a low signal-to-noise ratio, which are the main obstacles of non-invasive BCI.
These obstacles are especially prominent in spontaneous BCI, in which the user voluntarily generates control signals: the amplitude of the signals is low, and the information takes the form of harmonic neural population firing, which is not explicit. Many studies have developed paradigms to induce less noisy, higher-quality EEG signals [22, 23, 24, 15, 25, 26]. Spontaneous BCI induces the user to produce valid control signals following paradigms such as motor imagery (MI), visual imagery (VI), and speech imagery (SI). MI is a dynamic state in which movements are rehearsed internally in the mind without actual movements [27, 28, 29]; participants are asked to imagine specific muscle movements according to the tasks. VI utilizes EEG signals generated during visual imagination, in which participants consistently imagine specific images to generate control signals [30]. SI refers to speaking in the mind without actual speech [31]. Several datasets based on these paradigms have been collected and released publicly for decoding intentions from EEG signals [32, 33, 34, 30, 35]. Over the past decades, numerous machine learning methods have been developed [36, 37, 38, 39]. They focus on extracting spatial and temporal features to obtain implicit representations of EEG signals. In particular, since EEG signals have high temporal resolution, recent studies have concentrated on extracting rich temporal features [40, 41, 42]. For datasets with distinct regional differences, they achieved remarkable performance improvements. On the other hand, strategies for extracting spatial features are necessary when the dataset involves only small regions of the brain, such as single-arm MI tasks [33] and SI [32]. Imagery tasks within a single arm are therefore difficult to distinguish with existing methods because the EEG signals contain only sparse spatio-temporal features.
In this paper, a factorization approach is proposed to acquire implicit representations of EEG signals. We conjecture that EEG signals generated in small regions of the brain carry sparse information, which we define as the sparse condition. Strategic feature extraction is therefore required because there is little room for spatio-temporal features. Our approach is to explicitly factorize EEG signals into common and specific features in order to obtain discriminative representations on datasets under the sparse condition. Accordingly, we designed two modules to extract class-common and class-specific features, respectively. The class-common module learns features that are common across classes through adversarial learning. Unlike other studies, we did not include a generator, since the goal is to extract explicitly different types of features and hence no explicit or implicit modeling of the underlying input data distribution is required. The features from the two modules are concatenated and fed into the classifier for prediction. We conducted ablation studies to confirm the effectiveness of each design choice.
In summary, the main contributions of this paper are as follows: 1) We demonstrate that factorization is effective for classifying EEG signals under the sparse condition. To the best of our knowledge, this is the first attempt to explicitly factorize EEG signals for decoding user intentions. 2) An adversarial learning regime without a generator is introduced to obtain common features of EEG signals. Through this, the model obtains separate latent spaces that enable the classifier to consider distinct representations of EEG signals. 3) We demonstrate that the class-common and class-specific features are of limited use individually, but their joint use improves classification performance.
II Related Works
II-A MI Classification
Several studies have contributed to tackling unsatisfactory classification performance. To obtain representations of EEG signals, Lu et al. [43] proposed a restricted Boltzmann machine-based network that accounts for the non-stationary properties of EEG signals. Ang et al. [44] developed a filter bank to consider representations from different frequency ranges; the pipeline of this study has inspired subsequent deep learning work. Sakhavi et al. [45] presented a CNN architecture that extracts diverse temporal representations based on [44], exploiting the high temporal resolution of EEG signals [46]. Schirrmeister et al. [37] proposed CNNs of different depths to explore multi-view classification and described how convolution operates on EEG signals by providing visualizations; one of their contributions was to reveal that band power features are efficient for MI classification. With the development of CNN-based networks, a study on controlling the number of parameters was also conducted by Lawhern et al. [36]. They adopted depth-wise and separable convolutions to show that a small number of parameters can achieve performance similar to existing methods. Amin et al. [42] designed multiple CNN architectures for multi-view classification using an MI dataset; the different depths allow the classifier to consider multi-level features.
II-B SI Classification
DaSalla et al. [24] introduced the common spatial pattern (CSP) to obtain spatial representations for single-trial SI classification. Channel selection is efficient in extracting spatial features, as demonstrated by Torres-García et al. [47]. Nguyen et al. [32] combined Riemannian manifold features with a support vector machine [48] for SI classification.
However, these studies used datasets that involve relatively distinct brain regions, which is an advantage for classification. In this study, we used a single-arm MI dataset [33] that contains movement imagery tasks of a single arm. The classes therefore share only a narrow brain region, which is an obstacle to improving classification performance. To the best of our knowledge, no prior work has attempted to solve this problem, and the proposed method achieved a performance improvement on [33].
III Method
III-A Overview
The goal of this study is to divide features into two groups through factorization so that distinct features can be extracted under the sparse condition. The sparse condition is defined as the absence of distinct spatial or temporal features across the different motor imagery classes. We designed two modules, a class-common module E_com and a class-specific module E_spe, to explicitly factorize EEG signals into class-common features f_com and class-specific features f_spe. The class-common features f_com refer to features of the EEG signals that are shared regardless of class. Adversarial learning is applied to train E_com. Both f_com and f_spe are concatenated and fed into the classifier for the final prediction.
III-B Adversarial Learning
Adversarial learning trains models by solving an optimization problem to improve their robustness and has been applied in several domains [49, 50, 51, 52, 53]. Here, we use adversarial learning so that f_com contains the common features of EEG signals but not class-specific features. Unlike other studies [54, 55], no generator is required because the input data distribution is not modeled in this study. The training objective is to train E_com to extract common features. E_com maps the EEG signals x to class-common features, f_com = E_com(x). Simultaneously, E_com is trained to generate f_com that can fool a discriminator D, while D is trained to distinguish the features according to their labels (real or fake). To generate fake features f_fake, we used the resting-state class x_rest rather than random noise z, because f_com and f_fake should be similar, and resting-state EEG is more similar to the input EEG signals than random noise. We define the loss function as follows:
\min_{E_{com}} \max_{D} \; \mathcal{L}_{adv} = \mathbb{E}_{x}\big[\log D(E_{com}(x))\big] + \mathbb{E}_{x_{rest}}\big[\log\big(1 - D(E_{com}(x_{rest}))\big)\big]   (1)
Here, the class label is replaced by the real/fake label; thus, E_spe and the classifier are not associated with this training. Accordingly, D learns to maximize the probability of assigning the correct label, while E_com learns to generate f_com and f_fake similarly so that D is confused and cannot distinguish the two sets of features.
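The following PyTorch sketch illustrates one such adversarial update. It is a minimal illustration, not the exact implementation: the function and variable names (e_com, disc, x_task, x_rest) are hypothetical, and the discriminator is assumed to output two logits (real/fake).

```python
import torch
import torch.nn.functional as F

def adversarial_step(e_com, disc, x_task, x_rest, opt_disc, opt_ecom):
    """One adversarial update following Eq. (1): the discriminator learns to
    separate class-common features of imagery trials ("real") from features of
    resting-state trials ("fake"), while E_com is updated so the two feature
    sets become indistinguishable."""
    real = torch.ones(x_task.size(0), dtype=torch.long, device=x_task.device)
    fake = torch.zeros(x_rest.size(0), dtype=torch.long, device=x_rest.device)

    # Discriminator update: distinguish real (task) from fake (resting-state) features.
    f_task = e_com(x_task).detach()   # class-common features of imagery trials
    f_rest = e_com(x_rest).detach()   # "fake" features from the resting state
    loss_d = F.cross_entropy(disc(f_task), real) + F.cross_entropy(disc(f_rest), fake)
    opt_disc.zero_grad()
    loss_d.backward()
    opt_disc.step()

    # Extractor update: push resting-state features toward the "real" decision
    # so that the discriminator can no longer tell the two feature sets apart.
    loss_adv = F.cross_entropy(
        disc(e_com(x_rest)),
        torch.ones(x_rest.size(0), dtype=torch.long, device=x_rest.device))
    opt_ecom.zero_grad()
    loss_adv.backward()
    opt_ecom.step()
    return loss_d.item(), loss_adv.item()
```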
TABLE I: Architecture details of the proposed modules
 | Class-common module E_com | Class-specific module E_spe | Classifier | Discriminator D
Input | x or x_rest | x | f_com ⊕ f_spe | f_com or f_fake
Hidden layers | Conv. (1, 40) [1, 48] | Conv. (1, 40) [1, 48] | FC (5120, 2560) | FC (5120, 2560)
 | Conv. (40, 40) [24, 1] | Conv. (40, 40) [24, 1] | FC (2560, 1280) | FC (2560, 1280)
 | Pool. (1, 68), st. = (1, 14) | Pool. (1, 68), st. = (1, 14) | FC (1280, 640) | FC (1280, 640)
 | Flatten | Flatten | FC (640, C) | FC (640, 2)
Output | f_com | f_spe | 1 × C | 1 × 2
Activation function | Exponential linear unit [56]
TABLE II: Comparison of classification performance on the single-arm MI dataset [33]
 | Session 1 | Session 2 | Session 3
Model | Acc | std | Acc | std | Acc | std
CSP+LDA [44] | 0.21 | 0.02 | 0.26 | 0.03 | 0.20 | 0.01 |
CSP+RF [57] | 0.21 | 0.02 | 0.24 | 0.02 | 0.19 | 0.04 |
CSP+SVM [44] | 0.23 | 0.01 | 0.25 | 0.04 | 0.21 | 0.02 |
FBCSP [44] | 0.26 | 0.03 | 0.28 | 0.05 | 0.23 | 0.03 |
EEGNet [36] | 0.45 | 0.04 | 0.43 | 0.07 | 0.37 | 0.05 |
Shallow ConvNet [37] | 0.47 | 0.05 | 0.44 | 0.04 | 0.40 | 0.02 |
Deep ConvNet [37] | 0.45 | 0.02 | 0.42 | 0.05 | 0.38 | 0.03 |
MCNN [42] | 0.48 | 0.06 | 0.45 | 0.04 | 0.39 | 0.02 |
Proposed Method | 0.52 | 0.04 | 0.48 | 0.06 | 0.45 | 0.04 |
III-C Architecture Configuration
We designed E_com and E_spe using convolution and pooling layers. Neither module has a prediction layer because they perform only feature extraction. The classifier and the discriminator D were designed as multi-layer perceptrons. The class-common features f_com produced by E_com are fed into both the classifier and D as inputs. In the classifier, the features f_com and f_spe are concatenated along the temporal dimension, which expands the input dimension for the subsequent layers. The last layer outputs probabilities for each class using the softmax function. The discriminator D, on the other hand, takes the flattened f_com (or f_fake) and, through its regression layers, predicts the probability that the features are f_com or f_fake. All layers include the exponential linear unit [56] and dropout. Details of the design choices are described in Table I.
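As an illustration only, the following PyTorch sketch mirrors the layer specification in Table I: a temporal convolution, a spatial convolution over the electrode axis, average pooling, and flattening for the two extractors, and an MLP head reused for the classifier and the discriminator. The electrode count (24), dropout rate, and the parameterized input dimension of the MLP are assumptions, not values confirmed by the paper.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Layout shared by the class-common and class-specific modules (Table I):
    temporal conv -> spatial conv over electrodes -> average pooling -> flatten."""
    def __init__(self, n_electrodes=24, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 40, kernel_size=(1, 48)),             # Conv. (1, 40) [1, 48]
            nn.Conv2d(40, 40, kernel_size=(n_electrodes, 1)),  # Conv. (40, 40) [24, 1]
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 68), stride=(1, 14)), # Pool. (1, 68), st. = (1, 14)
            nn.Dropout(dropout),
            nn.Flatten(),                                      # feature vector f_com or f_spe
        )

    def forward(self, x):  # x: (batch, 1, electrodes, time)
        return self.net(x)


class MLPHead(nn.Module):
    """MLP used for both the classifier (out_dim = number of classes) and the
    discriminator (out_dim = 2, real/fake), following the Table I layer sizes."""
    def __init__(self, in_dim, out_dim, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 2560), nn.ELU(), nn.Dropout(dropout),
            nn.Linear(2560, 1280), nn.ELU(), nn.Dropout(dropout),
            nn.Linear(1280, 640), nn.ELU(), nn.Dropout(dropout),
            nn.Linear(640, out_dim),
        )

    def forward(self, f):
        return self.net(f)
```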
III-D Training Scheme
The class-specific features f_spe are obtained by E_spe according to two objectives: f_spe is extracted in the same manner as conventional feature extraction, and f_spe contains classification-related representations. To this end, we use the classification loss produced by the classifier. The proposed method is designed to learn the mapping from EEG signals to class labels using both f_com and f_spe. Specifically, the classifier learns this mapping by receiving the features from E_com and E_spe. This is achieved through the cross-entropy loss defined as
\mathcal{L}_{cls} = -\sum_{i=1}^{C} y_{i} \log \hat{y}_{i}   (2)
where y denotes the class label and \hat{y} the predicted probability over the C classes. Through this loss, E_com, E_spe, and the classifier share the gradient of L_cls. Thus, E_com considers both L_cls and L_adv, whereas E_spe considers only L_cls during training. The concept of the proposed training scheme is depicted in Fig. 2.
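A minimal sketch of this classification path, assuming the classifier receives the concatenation of f_com and f_spe so that the cross-entropy gradient of Eq. (2) reaches both extractors (module and variable names are hypothetical):

```python
import torch
import torch.nn.functional as F

def classification_step(e_com, e_spe, clf, x, y, optimizer):
    """Joint classification update: the cross-entropy gradient reaches the
    classifier and both feature extractors, while the adversarial loss of
    Eq. (1) touches only the class-common extractor."""
    f_com = e_com(x)                                  # class-common features
    f_spe = e_spe(x)                                  # class-specific features
    logits = clf(torch.cat([f_com, f_spe], dim=1))    # concatenated feature vector
    loss_cls = F.cross_entropy(logits, y)             # Eq. (2)
    optimizer.zero_grad()
    loss_cls.backward()
    optimizer.step()
    return loss_cls.item()
```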
III-E Factorization
To achieve the objective of factorization, f_com and f_spe should have an orthogonal relationship. We introduce the difference loss [58] in order to divide the latent space into two separate spaces. We consider f_com and f_spe to be features from different domains; hence, the loss function is defined as
\mathcal{L}_{diff} = \left\lVert F_{com}^{\top} F_{spe} \right\rVert_{F}^{2}   (3)
where F_com and F_spe are matrices whose rows are the class-common and class-specific feature vectors of a batch, and ‖·‖_F denotes the Frobenius norm, as used in [59]. Finally, the complete loss function is as follows:
\mathcal{L}_{total} = \mathcal{L}_{cls} + \mathcal{L}_{adv} + \lambda \mathcal{L}_{diff}   (4)
where λ is a regularization parameter that modulates the effect of L_diff. The flowchart of the factorization is depicted in Fig. 2. Additional parameters used in this study are described in the Experiments section.
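The soft orthogonality constraint of Eq. (3) and the combined objective of Eq. (4) can be sketched as follows. This is a hedged illustration: the optional zero-centering of the feature matrices and the exact way the terms are combined into one objective are assumptions, with `lam` corresponding to λ.

```python
import torch

def difference_loss(f_com, f_spe):
    """Eq. (3): squared Frobenius norm of the correlation between the
    class-common and class-specific feature matrices (batch x feature_dim).
    The loss is zero when the two latent spaces are orthogonal."""
    # Optional zero-centering of each feature matrix before measuring overlap.
    f_com = f_com - f_com.mean(dim=0, keepdim=True)
    f_spe = f_spe - f_spe.mean(dim=0, keepdim=True)
    return torch.norm(f_com.t() @ f_spe, p='fro') ** 2

def total_loss(loss_cls, loss_adv, loss_diff, lam=1.0):
    """Eq. (4): complete objective, with lam (lambda) weighting the difference loss."""
    return loss_cls + loss_adv + lam * loss_diff
```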
IV Experiments
We evaluated the proposed model on the single-arm movement imagery dataset [33]. We chose 7 classes (“left”, “right”, “up”, “down”, “forward”, “backward”, and “resting state”) among the 12 classes and used the 6 classes excluding the resting state for evaluation. Note that randomly selected resting-state trials were used for adversarial learning only. Each class contains 50 trials; thus, the organized dataset yields 300 trials per subject. The dataset comprises 3 recording sessions, and all sessions were evaluated in a subject-dependent manner. Performance was measured as the average over all folds. We applied 5-fold cross-validation, yielding, for each class, 30 training trials and 10 trials each for the validation and test sets in every fold. As a data augmentation technique, data cropping with a 100 ms sliding window was applied, as introduced in [37]. To avoid multiple outputs from the prediction layer, all crops were averaged to make a single prediction. The model was trained for 400 epochs, and the weights with the lowest validation loss after epoch 200 were stored. We applied the AdamW optimizer [60] with a learning rate of 0.001 and a weight decay of 0.01. The value of λ was 1. The experiment was conducted on an Intel Core i7-9700K CPU @ 3.60 GHz with 64 GB of RAM, two NVIDIA TITAN V GPUs, and Python 3.9 with PyTorch 1.9.
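A minimal sketch of the cropped evaluation described above is given below. Whether the 100 ms sliding window refers to the crop length or to the stride is not specified here, so both are exposed as parameters (in samples); the sampling rate, the helper names, and the optimizer construction line are assumptions.

```python
import torch

def sliding_crops(x, crop_len, stride):
    """Split a trial tensor (batch, 1, electrodes, time) into overlapping temporal crops."""
    time = x.shape[-1]
    return [x[..., s:s + crop_len] for s in range(0, time - crop_len + 1, stride)]

@torch.no_grad()
def predict_trial(e_com, e_spe, clf, x, crop_len, stride):
    """Average the class probabilities over all crops to obtain a single
    prediction per trial, avoiding multiple outputs from the prediction layer."""
    probs = []
    for crop in sliding_crops(x, crop_len, stride):
        logits = clf(torch.cat([e_com(crop), e_spe(crop)], dim=1))
        probs.append(torch.softmax(logits, dim=1))
    return torch.stack(probs).mean(dim=0).argmax(dim=1)

# Optimizer settings reported in the Experiments section (for all modules):
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
```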
TABLE III: Ablation study of the class-common and class-specific modules
 | Session 1 | Session 2 | Session 3
Model | Acc | std | Acc | std | Acc | std
W/o class-specific module (f_com only) | 0.25 | 0.01 | 0.26 | 0.03 | 0.22 | 0.02
W/o class-common module (f_spe only) | 0.45 | 0.03 | 0.44 | 0.02 | 0.39 | 0.02
Both modules (proposed) | 0.52 | 0.04 | 0.48 | 0.03 | 0.45 | 0.04
TABLE IV: Classification performance according to the value of λ
 | λ = 0 | λ = 0.5 | λ = 1
Avg. | 0.47 | 0.50 | 0.52 |
std. | 0.02 | 0.04 | 0.04 |
IV-A Results
According to Table II, the performances of all models on [33] were lower than those reported in the original studies [36, 37, 42, 44]. From this, it can be inferred that the dataset contains imagery tasks in which distinct features are sparse, as we assumed. Specifically, the CSP-based methods focusing on spatial patterns showed accuracies of approximately 0.23. Since the dataset consists of movements within a single arm, the EEG signals originate from a small brain region according to [28, 61]. It is therefore difficult to extract distinct spatial features, and CSP is at a disadvantage for this type of dataset. In contrast, the CNN-based methods were relatively robust to the sparse condition, with an average accuracy of about 0.43. EEGNet showed efficient performance even with a small number of parameters. Shallow ConvNet recorded competitive performance, as it is known to be specialized for MI classification; it achieved the highest accuracy among the single-architecture methods. MCNN consists of multiple CNN architectures for multi-view feature extraction and, according to Table II, was robust to the sparse condition. The proposed method showed the highest performance in this experiment. We infer that the factorization approach enables the model to extract distinct features for learning under the sparse condition.
IV-B Ablation Study
IV-B1 Comparison of f_com and f_spe
Firstly, an experiment was conducted to confirm whether E_com extracts common features regardless of class. According to Table III, using only f_com yielded roughly half of the reported performance. Although the classifier was trained for classification using f_com, adversarial learning rather interferes with training efficiency: E_com was designed to extract features regardless of class, so f_com does not contain enough class-discriminative features. Furthermore, the features obtained by E_com showed no distinct patterns or clusters, as depicted in Fig. 3(a). On the other hand, using only f_spe gave performance similar to the existing methods because E_spe follows the same training scheme as those methods. The features extracted by E_spe form relatively clear clusters for each class, as shown in Fig. 3(b), from which it can be inferred that distinct features were selected for each class. However, the proposed method showed the highest accuracy when the two modules were used jointly, meaning that the concatenation of the features induces features that are significant for classification. All results are described in Table III.
IV-B2 Comparison of λ values
We evaluated the model performance while changing the value of λ, and the results are shown in Table IV. The difference loss L_diff was introduced to disjoint f_com and f_spe by establishing an orthogonal relationship. According to Table IV, performance improved as the influence of L_diff increased, demonstrating that the orthogonal relationship can improve classification performance. This can also be confirmed through visualization. Fig. 4 visualizes the features used by the classifier. When L_diff had no influence (λ = 0), the features showed a distribution similar to Fig. 3(b) rather than forming distinguishable clusters. In Fig. 4(b), clearer clusters were formed; in particular, the features of the “left” class gathered around their cluster. However, numerous features still shared the latent space, and the “up” class was not distinguishable at all. Finally, in Fig. 4(c), the features formed dense class-wise clusters around the cluster centers. Some features still share the space, but this setting shows the clearest clusters. Therefore, the results and visualizations confirm that L_diff enables the model to extract distinct features that are decisive for performance. In summary, we demonstrated that the designed modules were trained to obtain two separate latent spaces through factorization, and that jointly training the separate latent spaces led to performance improvements under the sparse condition.
V Discussion
V-A Sparse Condition
The goal of this study is to improve classification performance by sufficiently extracting decisive features under the sparse condition. First, we considered the definition of the sparse condition: the brain region in which the EEG signals are generated is small, so the spatial characteristics of the imagery tasks are similar. Publicly available MI BCI datasets [34, 62] are mainly composed of imagery tasks that involve separate brain regions (e.g., foot, tongue, left and right hand) [28, 61]. Jeong et al. [33] proposed single-arm movement imagery tasks such as arm reaching in 6 directions and 3 different grasps. Since all subjects are right-handed, the mainly activated brain region is the left sensorimotor cortex. Specifically, according to functional brain mapping, meaningful EEG signals are generated intensively only in the middle region of the sensorimotor cortex [63]. To introduce the sparse condition, we selected the 6 arm-reaching classes because they share the same brain region. Based on the provided experimental protocol, we conjecture that there are few distinct temporal and spatial features between classes. Through the experiments, we confirmed that existing methods show unsatisfactory performance; hence, conventional feature extraction is not efficient under the sparse condition. However, further studies are needed to explore the sparse condition in various ways, because ground truth for EEG signals is lacking.
V-B Class-common Features
We designed f_com to include features unrelated to the classes; therefore, it may include the common representation of the EEG signals. It is not known whether this common information is spatial, temporal, or both. In the spatial aspect, the features would include information from the middle region of the left sensorimotor cortex. In the temporal aspect, the features may contain information about “reaching the arm” over time, excluding the direction, according to the experimental protocol. The ablation study showed that using only f_com is not efficient for classification because it contains class-unrelated features, as shown in Table III and Fig. 3(a). Nevertheless, it helps the model extract more distinct features when used jointly with f_spe. This means that f_com broadens the view of the classifier to consider implicit representations in the latent space. Indeed, with f_com, the model acquired features that formed clearer clusters for each class, as depicted in Fig. 4.
V-C Class-specific Features
Class-specific features in this study are the same as those produced by commonly used training. In Fig. 3(b), the features appear to cluster in the latent space relative to Fig. 3(a), although some features are apparent outliers. In addition, when only f_spe was used for classification, the performance was similar to that of the existing methods. However, when the classifier considered both f_com and f_spe, it acquired the most distinctive representations and showed the highest performance. As a result, we demonstrated that adding f_com helps acquire more decisive features for classification than f_spe alone under the sparse condition.
V-D Orthogonality in EEG Features
The L_diff constrains f_com and f_spe to be orthogonal to each other. Our approach is to make f_com and f_spe disjoint so that the classifier can consider implicit representations in a wider latent space through concatenation. We thus regard f_com and f_spe as features from different domains, and L_diff was introduced to minimize the latent space shared by the two features. As shown in Fig. 3, without the constraint the features share the latent space and contain numerous outliers. As the influence of L_diff increases, the classification accuracy improves and the features form clearer clusters. Through this, we confirmed that the orthogonal constraint allows the classifier to use more abundant features from separate latent spaces.
VI Conclusions
In this paper, we defined the sparse condition and proposed a factorization approach that integrates the latent spaces of class-common and class-specific features. Under the sparse condition, existing feature extraction methods have difficulty extracting distinct features, and their performance decreases accordingly. To solve this problem, we introduced a training strategy that factorizes EEG signals in order to obtain distinct features under the sparse condition. To this end, we applied an adversarial learning regime to extract class-common features: the proposed method is trained to deceive the discriminator by extracting features regardless of class. The training strategy factorizes EEG signals into two types of features, allowing the classifier to consider discriminative features from two different latent spaces. To minimize the latent space shared by the features, an orthogonal constraint based on the difference loss was introduced. Experimental results demonstrated that the factorization improves single-arm MI classification accuracy by integrating class-common and class-specific features. Although the proposed method achieved a performance improvement, it remains to be investigated what information the class-common features contain and what effect the combination of latent spaces has on obtaining distinct features. Our future work will therefore explore these issues.
VII Acknowledgements
This work was partly supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-00432, Development of Non-Invasive Integrated BCI SW Platform to Control Home Appliances and External Devices by User’s Thought via AR/VR Interface; No. 2017-0-00451, Development of BCI based Brain and Cognitive Computing Technology for Recognizing User’s Intentions using Deep Learning; No. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University).
References
- [1] L. Chen, T. Yang, X. Zhang, W. Zhang, and J. Sun, “Points as queries: Weakly semi-supervised object detection by points,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 8823–8832.
- [2] S.-W. Lee and H.-H. Song, “A new recurrent neural-network architecture for visual pattern recognition,” IEEE Trans. Neural Netw., vol. 8, no. 2, pp. 331–340, 1997.
- [3] P. Y. Kao, Y.-J. Lei, C.-H. Chang, C.-S. Chen, M.-S. Lee, and Y.-P. Hung, “Activity recognition using first-person-view cameras based on sparse optical flows,” in Proc. Int. Conf. Pattern Recognit. (ICPR). IEEE, 2021, pp. 81–86.
- [4] D.-G. Lee and S.-W. Lee, “Human interaction recognition framework based on interacting body part attention,” Pattern Recognit., vol. 128, p. 108645, 2022.
- [5] S.-H. Lee, J.-H. Kim, H. Chung, and S.-W. Lee, “Voicemixer: Adversarial Voice Style Mixup,” in Thirty-Fifth Conf. Adv. Neural Inf. Process Syst., Dec, 2021.
- [6] H. Liu, W. Xu, and B. Yang, “Audio-visual speech recognition using a two-step feature fusion strategy,” in Proc. Int. Conf. Pattern Recognit. (ICPR). IEEE, 2021, pp. 1896–1903.
- [7] S.-H. Lee, H.-R. Noh, W.-J. Nam, and S.-W. Lee, “Duration controllable voice conversion via phoneme-based information bottleneck,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 30, pp. 1173–1183, 2022.
- [8] H. Liu, Y. Wang, and B. Yang, “Mutual alignment between audiovisual features for end-to-end audiovisual speech recognition,” in Proc. Int. Conf. Pattern Recognit. (ICPR). IEEE, 2021, pp. 5348–5353.
- [9] S.-H. Lee, H.-W. Yoon, H.-R. Noh, J.-H. Kim, and S.-W. Lee, “Multi-spectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis,” in Proc. AAAI Conf. Artif. Intell., vol. 35, no. 14, New York, USA, Feb, 2021, pp. 13 198–13 206.
- [10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
- [11] H. Lee, J. Yoon, B. Hwang, S. Joe, S. Min, and Y. Gwon, “Korealbert: Pretraining a lite bert model for korean language understanding,” in Proc. Int. Conf. Pattern Recognit. (ICPR). IEEE, 2021, pp. 5551–5557.
- [12] Z. Yang, A. Kay, Y. Li, W. Cross, and J. Luo, “Pose-based body language recognition for emotion and psychiatric symptom interpretation,” in Proc. Int. Conf. Pattern Recognit. (ICPR). IEEE, 2021, pp. 294–301.
- [13] M. Lee, L. R. Sanz, A. Barra, A. Wolff, J. O. Nieminen, M. Boly, M. Rosanova, S. Casarotto, O. Bodart, J. Annen et al., “Quantifying arousal and awareness in altered states of consciousness using interpretable deep learning,” Nat. Commun., vol. 13, no. 1, pp. 1–14, 2022.
- [14] J.-H. Cho, J.-H. Jeong, and S.-W. Lee, “Neurograsp: Real-time eeg classification of high-level motor imagery tasks using a dual-stage deep learning framework,” IEEE Trans. Cybern., 2021.
- [15] D.-O. Won, H.-J. Hwang, D.-M. Kim, K.-R. Müller, and S.-W. Lee, “Motion-based rapid serial visual presentation for gaze-independent brain-computer interfaces,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 26, no. 2, pp. 334–343, 2017.
- [16] K.-T. Kim, H.-I. Suk, and S.-W. Lee, “Commanding a brain-controlled wheelchair using steady-state somatosensory evoked potentials,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 26, no. 3, pp. 654–665, 2016.
- [17] Y. Chen, A. D. Atnafu, I. Schlattner, W. T. a. Weldtsadik, and S. Fazli, “A high-security EEG-based login system with RSVP stimuli and dry electrodes,” IEEE Trans. Inf. Forensics Secur., vol. 11, no. 12, pp. 2635–2647, 2016.
- [18] H.-I. Suk, S. Fazli, J. Mehnert, K.-R. Müller, and S.-W. Lee, “Predicting bci subject performance using probabilistic spatio-temporal filters,” PloS one, vol. 9, no. 2, p. e87056, 2014.
- [19] D.-Y. Lee, J.-H. Jeong, B.-H. Lee, and S.-W. Lee, “Motor imagery classification using inter-task transfer learning via a channel-wise variational autoencoder-based convolutional neural network,” IEEE Trans. Neural Syst. Rehab. Eng., vol. 30, pp. 226–237, 2022.
- [20] N. J. Hill, D. Gupta, P. Brunner, A. Gunduz, M. A. Adamo, A. Ritaccio, and G. Schalk, “Recording human electrocorticographic (ecog) signals for neuroscientific research and real-time functional cortical mapping,” J. Vis. Exp., no. 64, p. e3993, 2012.
- [21] M.-H. Lee, S. Fazli, J. Mehnert, and S.-W. Lee, “Subject-dependent classification for robust idle state detection using multi-modal neuroimaging and data-fusion techniques in BCI,” Pattern Recognit., vol. 48, no. 8, pp. 2725–2737, 2015.
- [22] L. P. McAvinue and I. H. Robertson, “Measuring motor imagery ability: a review,” Eur. J. Cogn. Psychol., vol. 20, no. 2, pp. 232–251, 2008.
- [23] T. Sousa, C. Amaral, J. Andrade, G. Pires, U. J. Nunes, and M. Castelo-Branco, “Pure visual imagery as a potential approach to achieve three classes of control for implementation of BCI in non-motor disorders,” J. Neural Eng., vol. 14, no. 4, p. 046026, 2017.
- [24] C. S. DaSalla, H. Kambara, M. Sato, and Y. Koike, “Single-trial classification of vowel speech imagery using common spatial patterns,” Neural Netw., vol. 22, no. 9, pp. 1334–1339, Nov. 2009.
- [25] Y. Zhang, H. Zhang, X. Chen, S.-W. Lee, and D. Shen, “Hybrid High-Order Functional Connectivity Networks Using Resting-state Functional MRI for Mild Cognitive Impairment Diagnosis,” Sci. Rep., vol. 7, no. 1, pp. 1–15, 2017.
- [26] B.-H. Lee, J.-H. Jeong, K.-H. Shim, and S.-W. Lee, “Classification of high-dimensional motor imagery tasks based on an end-to-end role assigned convolutional neural network,” in IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP). IEEE, 2020, pp. 1359–1363.
- [27] N.-S. Kwak, K.-R. Müller, and S.-W. Lee, “A lower limb exoskeleton control system based on steady state visual evoked potentials,” J. Neural. Eng., vol. 12, no. 5, p. 056009, 2015.
- [28] J. Decety, “The neurophysiological basis of motor imagery,” Behav. Brain Res., vol. 77, no. 1-2, pp. 45–52, 1996.
- [29] J.-H. Jeong, N.-S. Kwak, C. Guan, and S.-W. Lee, “Decoding movement-related cortical potentials based on subject-dependent and section-wise spectral filtering,” IEEE Trans. Neural Syst. Rehab. Eng., vol. 28, no. 3, pp. 687–698, 2020.
- [30] B.-H. Kwon, J.-H. Jeong, J.-H. Cho, and S.-W. Lee, “Decoding of Intuitive Visual Motion Imagery Using Convolutional Neural Network under 3D-BCI Training Environment,” in Int. Conf. Sys. Man, and Cybern. (SMC), Toronto, Canada, Oct, 2020, pp. 2966–2971.
- [31] C. S. DaSalla, H. Kambara, M. Sato, and Y. Koike, “Single-trial classification of vowel speech imagery using common spatial patterns,” Neural Netw., vol. 22, no. 9, pp. 1334–1339, 2009.
- [32] C. H. Nguyen, G. K. Karavas, and P. Artemiadis, “Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features,” J. Neural Eng., vol. 15, no. 1, p. 016002, Dec. 2017.
- [33] J.-H. Jeong, J.-H. Cho, K.-H. Shim, B.-H. Kwon, B.-H. Lee, D.-Y. Lee, D.-H. Lee, and S.-W. Lee, “Multimodal signal dataset for 11 intuitive movement tasks from single upper extremity during multiple recording sessions,” GigaScience, vol. 9, no. 10, p. giaa098, Oct. 2020.
- [34] M. Tangermann, K.-R. Müller, A. Aertsen, N. Birbaumer, C. Braun, C. Brunner, R. Leeb, C. Mehring, K. J. Miller, G. Mueller-Putz et al., “Review of the BCI competition IV,” Front. Neurosci., vol. 6, p. 55, 2012.
- [35] D.-Y. Lee, M. Lee, and S.-W. Lee, “Decoding imagined speech based on deep metric learning for intuitive bci communication,” IEEE Trans. Neural Syst. Rehab. Eng., vol. 29, pp. 1363–1374, 2021.
- [36] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: a compact convolutional neural network for eeg-based brain–computer interfaces,” J. Neural Eng., vol. 15, no. 5, p. 056013, 2018.
- [37] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for EEG decoding and visualization,” Hum. Brain Mapp., vol. 38, no. 11, pp. 5391–5420, 2017.
- [38] A. M. Azab, L. Mihaylova, K. K. Ang, and M. Arvaneh, “Weighted transfer learning for improving motor imagery-based brain–computer interface,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 27, no. 7, pp. 1352–1359, 2019.
- [39] O.-Y. Kwon, M.-H. Lee, C. Guan, and S.-W. Lee, “Subject-independent brain-computer interfaces based on deep convolutional neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 10, pp. 3839–3852, 2020.
- [40] K. Venkatachalam, A. Devipriya, J. Maniraj, M. Sivaram, A. Ambikapathy, and S. A. Iraj, “A novel method of motor imagery classification using EEG signal,” Artif. Intell. Med., vol. 103, p. 101787, 2020.
- [41] J.-H. Jeong, K.-H. Shim, D.-J. Kim, and S.-W. Lee, “Brain-controlled robotic arm system based on multi-directional CNN-BiLSTM network using EEG signals,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 28, no. 5, pp. 1226–1238, 2020.
- [42] S. U. Amin, M. Alsulaiman, G. Muhammad, M. A. Bencherif, and M. S. Hossain, “Multilevel weighted feature fusion using convolutional neural networks for EEG motor imagery classification,” IEEE Access, vol. 7, pp. 18 940–18 950, 2019.
- [43] N. Lu, T. Li, X. Ren, and H. Miao, “A deep learning scheme for motor imagery classification based on restricted Boltzmann machines,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 25, no. 6, pp. 566–576, 2016.
- [44] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan, “Filter bank common spatial pattern (FBCSP) in brain-computer interface,” in Proc. Int. Jt. Conf. Neural Netw. IEEE, 2008, pp. 2390–2397.
- [45] S. Sakhavi, C. Guan, and S. Yan, “Learning temporal information for brain-computer interface using convolutional neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 11, pp. 5619–5629, 2018.
- [46] B. Burle, L. Spieser, C. Roger, L. Casini, T. Hasbroucq, and F. Vidal, “Spatial and temporal resolutions of EEG: Is it really black and white? A scalp current density view,” Int. J. Psychophysiol., vol. 97, no. 3, pp. 210–220, 2015.
- [47] A. A. Torres-García, C. A. Reyes-García, L. Villaseñor-Pineda, and G. García-Aguilar, “Implementing a fuzzy inference system in a multi-objective EEG channel selection model for imagined speech classification,” Expert Syst. Appl., vol. 59, pp. 1–12, Oct. 2016.
- [48] S.-W. Lee and A. Verri, Pattern Recognition with Support Vector Machines: First International Workshop, SVM 2002, Niagara Falls, Canada, August 10, 2002. Proceedings. Springer, 2003, vol. 2388.
- [49] F. Fahimi, S. Dosen, K. K. Ang, N. Mrachacz-Kersting, and C. Guan, “Generative adversarial networks-based data augmentation for brain-computer interface,” IEEE Trans. Neural Netw. Learn. Syst., 2020.
- [50] M.-Y. Liu and O. Tuzel, “Coupled generative adversarial networks,” Adv. Neural Inf. Process. Syst. (NIPS), vol. 29, pp. 469–477, 2016.
- [51] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, “Adversarial discriminative domain adaptation,” in Proc. IEEE Comput. Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 7167–7176.
- [52] P. Zhong, D. Wang, and C. Miao, “Eeg-based emotion recognition using regularized graph neural networks,” IEEE Trans. Affect. Comput., 2020.
- [53] Y. Li, L. Wang, W. Zheng, Y. Zong, L. Qi, Z. Cui, T. Zhang, and T. Song, “A novel bi-hemispheric discrepancy model for eeg emotion recognition,” IEEE Trans. Cogn. Devel. Syst., vol. 13, no. 2, pp. 354–367, 2020.
- [54] A. Ahmetoğlu and E. Alpaydın, “Hierarchical mixtures of generators for adversarial learning,” in Proc. Int. Conf. Pattern Recognit. (ICPR). IEEE, 2021, pp. 316–323.
- [55] T. Uelwer, A. Oberstra, and S. Harmeling, “Phase retrieval using conditional generative adversarial networks,” in Proc. Int. Conf. Pattern Recognit. (ICPR). IEEE, 2021, pp. 731–738.
- [56] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (elus),” arXiv preprint arXiv:1511.07289, 2015.
- [57] Y. Qi, “Random forest for bioinformatics,” in Ensem. mach. learn., 2012, pp. 307–323.
- [58] K. Bousmalis, G. Trigeorgis, N. Silberman, D. Krishnan, and D. Erhan, “Domain separation networks,” Adv. Neural Inf. Process. Syst., vol. 29, pp. 343–351, 2016.
- [59] M. Salzmann, C. H. Ek, R. Urtasun, and T. Darrell, “Factorized orthogonal latent spaces,” in Proc. Thirteenth Int. Conf. Artif. Intell. Stat. JMLR Workshop and Conference Proceedings, 2010, pp. 701–708.
- [60] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
- [61] C. M. Michel and D. Brunet, “EEG source imaging: a practical review of the analysis steps,” Front. Neurol., vol. 10, p. 325, 2019.
- [62] P. Ofner, A. Schwarz, J. Pereira, and G. R. Müller-Putz, “Upper limb movements can be decoded from the time-domain of low-frequency eeg,” PloS one, vol. 12, no. 8, p. e0182578, 2017.
- [63] N. E. Crone, D. L. Miglioretti, B. Gordon, and R. P. Lesser, “Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. ii. event-related synchronization in the gamma band.” Brain, vol. 121, no. 12, pp. 2301–2315, 1998.