Ultra Efficient Transfer Learning with Meta Update for Cross Subject EEG Classification

Tiehang Duan, Mihir Chauhan, Mohammad Abuzar Shaikh, Jun Chu and Sargur Srihari

The authors are with the Department of Computer Science and Engineering, University at Buffalo, NY 14260, USA (e-mail: {tiehangd, mihirhem, mshaikh2, jchu6, srihari}@buffalo.edu).

Abstract

Electroencephalogram (EEG) signal patterns differ significantly across subjects, which poses two challenges for EEG classifiers: 1) effectively adapting a learned classifier to a new subject, and 2) retaining knowledge of known subjects after the adaptation. We propose an efficient transfer learning method, named Meta UPdate Strategy (MUPS-EEG), for continuous EEG classification across different subjects. The model learns effective representations with meta update, which accelerates adaptation to a new subject and at the same time mitigates forgetting of knowledge about previous subjects. The proposed mechanism originates from meta learning and works to 1) find a feature representation that is broadly suitable for different subjects, and 2) maximize the sensitivity of the loss function for fast adaptation to a new subject. The method can be applied to all deep learning oriented models. Extensive experiments on two public datasets demonstrate the effectiveness of the proposed model, which outperforms the current state of the art by a large margin in both adapting to a new subject and retaining knowledge of learned subjects.

Our code is publicly available at https://github.com/tiehangd/MUPS-EEG.

1 Introduction

Electroencephalogram (EEG) signals are widely used to analyze the activity of the human brain. The signal is recorded by placing electrodes on different regions of the scalp while the subject performs executive or imaginary tasks or perceives external stimuli [1]. EEG signals have proved effective for restoring motion capabilities of disabled people [2], interpreting human intention [3], and enhancing the experience of gaming control [5].

EEG signals exhibit significant pattern variability across subjects, resulting in two major challenges for EEG classifiers: 1) achieving good performance on previously unseen users, and 2) retaining knowledge of previously learnt subjects after the adaptation. We propose to tackle both challenges simultaneously with the Meta UPdate Strategy (MUPS-EEG), which involves two steps: (1) extracting versatile features that are effective across different subjects with meta learned representations, and (2) performing meta update for fast adaptation to a new subject. The meta update mechanism significantly reduces the amount of labeled target data needed to adapt to the target subject, and the meta learned representations help preserve learned knowledge of previous subjects. This facilitates the use of brain-computer interface (BCI) systems in real world scenarios with constant shifts between different subjects.

For the extraction of versatile and subject invariant features, previous works adopt either signal processing techniques or deep learning models. For example, [7] used filter banks (FB) and common spatial patterns (CSP) to extract features, which are then sent to a Fisher linear discriminant (FLD). [9] extracted features from the power spectral density (PSD) of EEG signals and used support vector machines (SVM) as the classifier. Models based on deep learning have emerged as a promising approach, as they alleviate the need for manual feature engineering and achieve state of the art performance. EEGNet [10] is a compact convolutional neural network (CNN) that can be applied to different BCI paradigms. [12] introduced cascade and parallel structures on CNNs for improved performance. The recently proposed CRAM [6] adopts an LSTM with an attention mechanism to help the model focus on the most discriminative temporal features, and achieved promising results.

Transfer learning techniques are used in EEG classifiers to transfer models to a target subject for improved performance. Previous works involve both classic transfer learning [13][14] and domain adaptation [16][17] to transfer knowledge across subjects. [4] proposed an inter-subject transfer learning framework built on top of a CNN model. [16] and [17] explored the performance of multiple domain adaptation methods, including transfer component analysis (TCA-EEG), maximum independence domain adaptation (MIDA-EEG) and information theoretical learning (ITL), for emotion recognition. Deep-Transfer [23] is a transfer learning framework built on a deep CNN-LSTM network to transfer knowledge across subjects. RA-MDRM [24] uses covariance matrices from different subjects to form a calibration-less system suitable for low resource scenarios.

In this work, we propose a simple and computationally efficient meta updating strategy to tackle cross subject EEG classification, which is applicable to all deep learning oriented classifiers. It allows the EEG classifier to adapt to a new subject using only a small amount of target data. Furthermore, the model mitigates the forgetting that often occurs when transferring a deep learning model to a new context. This Meta UPdate Strategy (MUPS-EEG) originates from meta learning [21][22]. It involves a meta representation learning phase followed by meta adaptation to the target subject. Meta representation learning is performed on the known source subjects and extracts versatile features that are effective across different subjects, while meta adaptation fits the model to a new subject through a small number of gradient steps without losing knowledge of known subjects. A desirable property of the model is that it does not overfit even if target data is very limited, allowing it to function properly in low target-resource scenarios.

2 Methodology

MUPS-EEG allows efficient adaptation to a new subject while simultaneously retaining knowledge of known subjects. Its meta learnt representations are broadly effective across different subjects, and meta adaptation fits the model to a new subject with efficient use of target data. MUPS-EEG differs from classic transfer learning in both the optimization process and the training mechanism.

For traditional optimization, weights are sequentially updated after each time step, seeking sensible parameters with

$$\hat{\Theta}=\operatorname*{argmax}_{\Theta}\log p(\Theta|\mathcal{D}_{s},\mathcal{D}_{t}) \tag{1}$$

where $\Theta$ is the collection of model parameters, $\mathcal{D}_{s}$ is training data from source subjects, and $\mathcal{D}_{t}$ is the small amount of data from target subject.

MUPS-EEG decomposes the problem into two steps by introducing meta parameters $\Phi$. Given

$$\log p(\Theta|\mathcal{D}_{s},\mathcal{D}_{t})=\log\int_{\Phi}p(\Theta|\mathcal{D}_{t},\Phi)\,p(\Phi|\mathcal{D}_{s})\,d\Phi \tag{2}$$

maximizing this objective is approximated by first finding the meta parameters that maximize $\log p(\Phi|\mathcal{D}_{s})$,

$$\hat{\Phi}=\operatorname*{argmax}_{\Phi}\log p(\Phi|\mathcal{D}_{s}) \tag{3}$$

and then approximating Eq. 1 as

$$\operatorname*{argmax}_{\Theta}\log p(\Theta|\mathcal{D}_{s},\mathcal{D}_{t})\approx\operatorname*{argmax}_{\Theta}\log p(\Theta|\mathcal{D}_{t},\hat{\Phi}) \tag{4}$$

The mechanism can thus be interpreted as helping the model learn a prior of transferable knowledge about the subjects. This prior is later used to infer the posterior parameters of the network after the model sees a small amount of data from the new subject. The prior learned during meta training acts as an inductive bias for minimizing the generalization error during evaluation, which allows the EEG classifier to function properly on the new subject.
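One way to read the step from Eq. 2 to Eq. 4, which the text does not spell out, is as a point-estimate approximation. Under the assumption (ours, for illustration) that the posterior over meta parameters is sharply peaked around $\hat{\Phi}$, so that $p(\Phi|\mathcal{D}_{s})\approx\delta(\Phi-\hat{\Phi})$, the integral collapses to a single term:

$$\int_{\Phi}p(\Theta|\mathcal{D}_{t},\Phi)\,p(\Phi|\mathcal{D}_{s})\,d\Phi\;\approx\;\int_{\Phi}p(\Theta|\mathcal{D}_{t},\Phi)\,\delta(\Phi-\hat{\Phi})\,d\Phi\;=\;p(\Theta|\mathcal{D}_{t},\hat{\Phi})$$

Taking the logarithm of both sides and maximizing over $\Theta$ then yields the right-hand side of Eq. 4. The approximation is only as good as the peakedness assumption, which is more plausible when the source data $\mathcal{D}_{s}$ is large.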

During meta training, MUPS-EEG involves interaction between a base learner and a meta learner, each formed of a representation learning network and a prediction learning network. The representation learning network extracts effective features from the raw EEG signal, which are then fed to the prediction learning network for classification. Both the representation learning network and the prediction learning network can be arbitrary deep learning models.

The workflow of MUPS-EEG is as follows:

An ensemble of $M$ meta tasks $\mathcal{E}_{meta}=\{\mathcal{T}_{1},\mathcal{T}_{2},...,\mathcal{T}_{M}\}$ is created from source dataset $\mathcal{D}_{s}=\{(x_{1},y_{1}),...,(x_{N},y_{N})\}$ with a total of $L$ known subjects. Each meta task $\mathcal{T}_{i}=\{(x_{1}^{i},y_{1}^{i}),...,(x_{m}^{i},y_{m}^{i})\}$ contains $m$ data points from $l$ subjects, where $m\ll N$ and $l<L$ .
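The construction of the task pool can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the text does not say how subjects or data points are drawn, so uniform sampling without replacement is an assumption, and `build_meta_tasks` with all of its parameter names is hypothetical.

```python
import random
from collections import defaultdict


def build_meta_tasks(samples, num_tasks, subjects_per_task, points_per_task, seed=0):
    """Build M meta tasks, each holding m points drawn from l of the L subjects.

    `samples` is a list of (x, y, subject_id) triples. Uniform sampling
    without replacement is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for x, y, subject_id in samples:
        by_subject[subject_id].append((x, y))
    subject_ids = sorted(by_subject)
    if subjects_per_task >= len(subject_ids):
        raise ValueError("each task must use fewer subjects than are known (l < L)")

    tasks = []
    for _ in range(num_tasks):
        chosen = rng.sample(subject_ids, subjects_per_task)
        pool = [pair for sid in chosen for pair in by_subject[sid]]
        if points_per_task > len(pool):
            raise ValueError("not enough data in the chosen subjects for one task")
        tasks.append(rng.sample(pool, points_per_task))
    return tasks


# Toy source data: 8 subjects, 50 labelled segments each, 4 classes.
# Each input x is a (subject, index) placeholder standing in for an EEG segment.
toy_source = [((s, i), i % 4, s) for s in range(8) for i in range(50)]
meta_tasks = build_meta_tasks(toy_source, num_tasks=12,
                              subjects_per_task=3, points_per_task=20)
```

Here `meta_tasks` plays the role of $\mathcal{E}_{meta}$ with $M=12$, $l=3$, $L=8$ and $m=20$; the values of $l$ and $L$ are chosen for the toy example only.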

Algorithm 1: MUPS-EEG
Input: data from source subjects $\mathcal{D}_{s}$, data from target subject $\mathcal{D}_{t}$, base learning rate $\alpha$, meta learning rate $\beta$
Output: optimal meta learned model
for samples in $\mathcal{D}_{s}$ do
    pretrain $\phi$ based on $\mathcal{L}_{\mathcal{D}_{s}}(\phi)$
end for
while not done do
    sample a batch of tasks $\{\mathcal{T}_{1\sim K}\}\in\mathcal{E}_{meta}$
    for meta episode $k$ from 1 to $K$ do
        split $\mathcal{T}_{k}$ into $\mathcal{T}_{b}$ and $\mathcal{T}_{m}$
        for number of base updates do
            optimize $\theta$ with $\mathcal{T}_{b}$ by Eq. 5
        end for
        optimize $\{\theta^{*},\phi^{*}\}$ with $\mathcal{T}_{m}$ by Eq. 6
        $\{\theta,\phi\}\leftarrow\{\theta^{*},\phi^{*}\}$
    end for
end while

Each cycle of meta update is called an episode and includes two phases: base learning and meta learning. In each episode, a meta task $\mathcal{T}_{i}$ is sampled from the task pool $\mathcal{E}_{meta}$ and split into $p$ data points for base learning, $\mathcal{T}_{b}$, and $q$ data points for meta learning, $\mathcal{T}_{m}$ (the index $i$ is omitted here for conciseness), with $p+q=m$.
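The split of one meta task into its base and meta portions might look like the sketch below. The text does not state whether the split is random, ordered or stratified by class, so the random shuffle is an assumption, and `split_task` is a hypothetical helper name.

```python
import random


def split_task(task, num_base, seed=0):
    """Split a meta task of m points into p base points and q = m - p meta points."""
    if not 0 < num_base < len(task):
        raise ValueError("need 0 < p < m so that both portions are non-empty")
    rng = random.Random(seed)
    shuffled = list(task)      # copy so the caller's task is left untouched
    rng.shuffle(shuffled)      # assumption: a uniformly random split
    return shuffled[:num_base], shuffled[num_base:]


# Toy task with m = 20 labelled segments, split into p = 10 and q = 10.
toy_task = [(f"segment_{i}", i % 4) for i in range(20)]
task_base, task_meta = split_task(toy_task, num_base=10)
```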

MUPS-EEG adopts a two stage optimization approach with two optimizers, one for the base learner and the other for the meta learner. The base learner includes a representation learning net parameterized by $\phi$ and a prediction learning net parameterized by $\theta$. The meta learner keeps another set of parameters $\{\phi^{*},\theta^{*}\}$. During initialization, $\{\phi,\phi^{*}\}$ are pretrained to provide a warm start, and $\{\theta,\theta^{*}\}$ are randomly initialized. In later episodes, both the base learner and the meta learner inherit parameter values from the meta learner of the previous episode.

In the base learner, the gradient is evaluated with the loss function $\mathcal{L}_{\mathcal{T}_{b}}(\theta,\phi)$, which is the cross entropy for the classification task. When updating the base learner, we only update the parameters of the prediction learning net:

$$\theta\leftarrow\text{Adam}\Big(\theta,\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{b}}(\theta,\phi),\alpha\Big) \tag{5}$$

where $\alpha$ is the learning rate of the base optimizer. Here Adam can be replaced by any optimizer that operates on first order gradients. After the base learning loop ends, the meta portion of the task, $\mathcal{T}_{m}$, is used to compute the meta gradient $\nabla_{\{\theta,\phi\}}\mathcal{L}_{\mathcal{T}_{m}}(\theta,\phi)$, and the parameters of both the representation learning net and the prediction learning net are updated accordingly:

$$\{\theta^{*},\phi^{*}\}\leftarrow\text{Adam}\Big(\{\theta^{*},\phi^{*}\},\nabla_{\{\theta,\phi\}}\mathcal{L}_{\mathcal{T}_{m}}(\theta,\phi),\beta\Big) \tag{6}$$

where $\beta$ is the learning rate of the meta optimizer. Note that this meta optimization is performed over the meta learner, whereas the gradient of the objective is computed from the updated base learner, because its gradient descent direction is broadly effective across different subjects. The meta learner is kept between episodes and is then adapted to the target subject during evaluation. The procedure is outlined in Algorithm 1.
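A single episode of Eq. 5 and Eq. 6 can be illustrated with a deliberately tiny model. The sketch below is not the authors' code: the representation net is reduced to one scalar `phi`, the prediction net to one scalar `theta`, the task is binary, and plain gradient descent stands in for Adam in both updates (a swap made here for brevity). All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grads(theta, phi, batch):
    """Mean binary cross entropy of logit = theta * (phi * x), with its gradients."""
    eps = 1e-12
    loss = grad_theta = grad_phi = 0.0
    for x, y in batch:
        prob = sigmoid(theta * phi * x)
        loss -= y * math.log(prob + eps) + (1 - y) * math.log(1 - prob + eps)
        err = prob - y                      # derivative of the loss w.r.t. the logit
        grad_theta += err * phi * x
        grad_phi += err * theta * x
    n = len(batch)
    return loss / n, grad_theta / n, grad_phi / n


def meta_episode(theta_star, phi_star, task_base, task_meta, alpha, beta, base_steps):
    # The base learner inherits its parameters from the meta learner.
    theta, phi = theta_star, phi_star
    # Eq. 5: only the prediction parameter theta is updated; phi stays frozen.
    for _ in range(base_steps):
        _, grad_theta, _ = loss_and_grads(theta, phi, task_base)
        theta -= alpha * grad_theta
    # Eq. 6: the gradient is evaluated at the adapted base learner on the
    # meta portion of the task, then applied to the meta learner's parameters.
    _, grad_theta, grad_phi = loss_and_grads(theta, phi, task_meta)
    return theta_star - beta * grad_theta, phi_star - beta * grad_phi


# Toy, linearly separable task: the label is 1 when x is positive.
toy_base = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
toy_meta = [(-1.5, 0), (-0.5, 0), (0.5, 1), (1.5, 1)]
theta_star, phi_star = 0.1, 1.0
loss_before, _, _ = loss_and_grads(theta_star, phi_star, toy_meta)
for _ in range(50):
    theta_star, phi_star = meta_episode(theta_star, phi_star, toy_base, toy_meta,
                                        alpha=0.1, beta=0.1, base_steps=10)
loss_after, _, _ = loss_and_grads(theta_star, phi_star, toy_meta)
```

The point of the example is the flow of parameters: the base learner is a temporary copy that is adapted on $\mathcal{T}_{b}$, and only the gradient it produces on $\mathcal{T}_{m}$ reaches the persistent meta learner. In a real model both nets would be deep networks and the gradients would come from automatic differentiation.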

3 Experiments

We compare our method against the current state of the art on two public datasets, with the detailed experimental setting described below.

Dataset: The proposed model is evaluated on two public datasets, namely BCI competition IV dataset 2a (abbreviated as BCI IV-2a below) [25] (http://bnci-horizon-2020.eu/database/data-sets) and the DEAP dataset [26] (https://www.eecs.qmul.ac.uk/mmv/datasets/deap/download.html). BCI IV-2a involves 9 subjects performing 4-class motor imagery tasks. Each subject is tested in two sessions, and each session consists of 288 trials. Signals are recorded with 22 electrodes at a 250 Hz sampling rate. The DEAP dataset is for emotion recognition, with a total of 32 subjects. 40 trials are recorded for each subject as they watch music videos eliciting different types of arousal. The signal comprises 32 channels at a sampling rate of 512 Hz.

Table 1: Comparison of accuracy and ROC-AUC on the target subject for the BCI IV-2a and DEAP datasets. BCI IV-2a has a total of nine subjects; the models are trained on eight subjects and tested on the subject left out. Similarly, for DEAP one subject is left out for testing and the models are trained on the other 31 subjects. Reported results are averaged across all subjects. The first three models are subject independent and do not use any target subject data. For the other transfer learning approaches we used the same amount of target data (1 minute of EEG recording for BCI IV-2a and 5 minutes of recording for DEAP) for a fair comparison. MUPS-EEG outperforms the comparison methods on both datasets with its efficient meta adaptation mechanism.

Method	BCI IV-2a Accuracy	BCI IV-2a ROC-AUC	DEAP Accuracy	DEAP ROC-AUC
EEGNet [10]	$0.557\pm 0.063$	$0.704\pm 0.033$	$0.459\pm 0.073$	$0.627\pm 0.044$
CTCNN [11]	$0.523\pm 0.105$	$0.721\pm 0.061$	$0.396\pm 0.095$	$0.603\pm 0.048$
CRAM [6]	$0.632\pm 0.080$	$0.769\pm 0.043$	$0.565\pm 0.117$	$0.731\pm 0.078$
MIDA-EEG [16]	$0.650\pm 0.056$	$0.793\pm 0.036$	$0.536\pm 0.108$	$0.671\pm 0.07$
TCA-EEG [17]	$0.674\pm 0.073$	$0.817\pm 0.053$	$0.552\pm 0.114$	$0.695\pm 0.067$
Deep-Transfer [23]	$0.712\pm 0.065$	$0.841\pm 0.041$	$0.638\pm 0.131$	$0.767\pm 0.064$
RA-MDRM [24]	$0.741\pm 0.059$	$0.846\pm 0.032$	$0.614\pm 0.096$	$0.758\pm 0.059$
MUPS-EEG	$\boldsymbol{0.763\pm 0.055}$	$\boldsymbol{0.859\pm 0.038}$	$\boldsymbol{0.672\pm 0.063}$	$\boldsymbol{0.782\pm 0.037}$

Implementation Details: The model is implemented with PyTorch. We used a three layer convolutional neural network (CNN) similar to EEGNet [10] as the representation learning network, which is compact and versatile across different BCI paradigms. The prediction learning network consists of two fully connected layers. The representation learning network is pretrained with the SGD optimizer at a learning rate of 0.01. The Adam optimizer is adopted during meta training for the adaptation of both the base learner and the meta learner, with the learning rate set to 0.001. The learning rate is discounted by 0.2 every 5 steps. We run 10 epochs of representation learning pretraining and 20 epochs of meta training. Each meta episode involves ten iterations of base learner update and one meta update. During a meta episode, one data batch containing 12 sampled meta tasks is fed into the model, and each task is made up of 20 data segments: 10 data segments are used for the base update and the other 10 for the meta update.
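The stated hyperparameters can be collected in one place, together with a sketch of the learning rate schedule. Two points are assumptions rather than facts from the text: "discounted by 0.2" is read here as multiplying the rate by 0.2, and a "step" is left as an abstract counter because the text does not say whether it means an epoch or an iteration. The names `CONFIG` and `step_decay` are hypothetical.

```python
# Values quoted from the implementation details above.
CONFIG = {
    "pretrain_lr": 0.01,             # SGD, representation learning pretraining
    "meta_lr": 0.001,                # Adam, base and meta learner
    "pretrain_epochs": 10,
    "meta_epochs": 20,
    "base_updates_per_episode": 10,
    "tasks_per_batch": 12,
    "segments_per_task": 20,         # m
    "base_segments": 10,             # p
    "meta_segments": 10,             # q
}


def step_decay(initial_lr, step, step_size=5, gamma=0.2):
    """Learning rate after `step` steps, multiplied by `gamma` every `step_size` steps.

    Assumption: "discounted by 0.2" means gamma = 0.2. If the intended meaning
    is a 20% reduction instead, gamma would be 0.8.
    """
    return initial_lr * gamma ** (step // step_size)


# Learning rate at steps 0, 5 and 10 under the multiplicative reading.
schedule = [step_decay(CONFIG["meta_lr"], step) for step in (0, 5, 10)]
```

Under this reading the rate falls from 0.001 to 0.0002 after five steps and to 0.00004 after ten, which is a steep decay; the alternative reading with `gamma=0.8` would be considerably gentler.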

We evaluate the model on: 1) its performance on the new target subject, and 2) the knowledge retained from previously learnt subjects. Performance on the new subject is measured with both accuracy and ROC-AUC. The ability to retain knowledge is measured with the averaged accuracy (Avg. Acc) and averaged ROC-AUC (Avg. RA) across previous subjects after adaptation finishes.

Result Analysis: Model performance on the target subject for the BCI IV-2a and DEAP datasets is presented in Table 1. We performed a comprehensive comparison against models that perform well on cross subject classification tasks and have publicly available code. The first three comparison models (EEGNet, CTCNN, CRAM) do not involve a transfer process and use no target data (these three models adopt a more challenging problem setting, which explains their relatively lower performance). For the other transfer learning approaches, we used the same amount of target subject data (1 minute of EEG recording for BCI IV-2a and 5 minutes of EEG recording for DEAP) to ensure a fair comparison. For the BCI IV-2a dataset, MUPS-EEG improves on the other models by at least 2.2 percentage points in accuracy and 1.3 percentage points in ROC-AUC. The classification accuracy varies across individual subjects. MUPS-EEG classified 7 out of 9 subjects with above 70% accuracy, which is generally deemed an acceptable threshold for the application of BCI systems [4]. For the DEAP dataset, MUPS-EEG outperforms the other approaches by at least 3.4 percentage points in accuracy and 1.5 percentage points in ROC-AUC. This performance improvement comes from the ability of MUPS-EEG to rapidly adapt to the target domain with a small amount of target data.
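The margins quoted above follow directly from the mean values in Table 1, as the short check below shows. It compares MUPS-EEG with the strongest baseline in each column and ignores the reported standard deviations; the helper name `margins_over_best_baseline` is hypothetical.

```python
# Mean values copied from Table 1:
# (BCI IV-2a accuracy, BCI IV-2a ROC-AUC, DEAP accuracy, DEAP ROC-AUC)
TABLE_1 = {
    "EEGNet":        (0.557, 0.704, 0.459, 0.627),
    "CTCNN":         (0.523, 0.721, 0.396, 0.603),
    "CRAM":          (0.632, 0.769, 0.565, 0.731),
    "MIDA-EEG":      (0.650, 0.793, 0.536, 0.671),
    "TCA-EEG":       (0.674, 0.817, 0.552, 0.695),
    "Deep-Transfer": (0.712, 0.841, 0.638, 0.767),
    "RA-MDRM":       (0.741, 0.846, 0.614, 0.758),
    "MUPS-EEG":      (0.763, 0.859, 0.672, 0.782),
}
COLUMNS = ("BCI IV-2a accuracy", "BCI IV-2a ROC-AUC", "DEAP accuracy", "DEAP ROC-AUC")


def margins_over_best_baseline(table, ours="MUPS-EEG"):
    """Gap in percentage points between `ours` and the best other method per column."""
    margins = {}
    for idx, column in enumerate(COLUMNS):
        best_baseline = max(scores[idx] for name, scores in table.items() if name != ours)
        margins[column] = round((table[ours][idx] - best_baseline) * 100, 1)
    return margins


table_1_margins = margins_over_best_baseline(TABLE_1)
```

Because the standard deviations in Table 1 are larger than several of these gaps, the margins are best read as differences of means rather than as evidence of statistical significance, which the table alone does not establish.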

Table 2: Comparison of averaged accuracy (Avg. Acc) and averaged ROC-AUC (Avg. RA) on learnt source subjects for the BCI IV-2a and DEAP datasets. The training setting is the same as described in Table 1; Avg. Acc and Avg. RA are evaluated on the source subjects after adaptation finishes. EEGNet, CTCNN and CRAM are subject independent approaches and are not included here, as their performance is the same as reported in Table 1. MUPS-EEG performs consistently better than the comparison baselines in retaining knowledge of learned subjects.

Method	BCI IV-2a Avg. Acc	BCI IV-2a Avg. RA	DEAP Avg. Acc	DEAP Avg. RA
MIDA-EEG [16]	$0.752\pm 0.045$	$0.843\pm 0.020$	$0.631\pm 0.059$	$0.743\pm 0.037$
TCA-EEG [17]	$0.746\pm 0.069$	$0.852\pm 0.038$	$0.639\pm 0.063$	$0.746\pm 0.041$
Deep-Transfer [23]	$0.703\pm 0.032$	$0.829\pm 0.017$	$0.611\pm 0.084$	$0.732\pm 0.056$
RA-MDRM [24]	$0.765\pm 0.044$	$0.857\pm 0.028$	$0.636\pm 0.073$	$0.751\pm 0.045$
MUPS-EEG	$\boldsymbol{0.781\pm 0.036}$	$\boldsymbol{0.862\pm 0.024}$	$\boldsymbol{0.665\pm 0.047}$	$\boldsymbol{0.758\pm 0.029}$

Table 2 shows the model's capability to retain knowledge of previously learnt subjects after adaptation finishes. MUPS-EEG outperforms the other models by a margin of at least 1.6 percentage points on Avg. Acc and 0.5 percentage points on Avg. RA for the BCI IV-2a dataset. For the DEAP dataset, the model achieves a gain of 2.6 percentage points on Avg. Acc and 0.7 percentage points on Avg. RA.
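For the retention results, it is also informative to see which baseline is the runner-up in each column, since it is not the same method throughout. The sketch below derives this from the mean values in Table 2; `runner_up_per_column` is a hypothetical helper name.

```python
# Mean values copied from Table 2:
# (BCI IV-2a Avg. Acc, BCI IV-2a Avg. RA, DEAP Avg. Acc, DEAP Avg. RA)
TABLE_2 = {
    "MIDA-EEG":      (0.752, 0.843, 0.631, 0.743),
    "TCA-EEG":       (0.746, 0.852, 0.639, 0.746),
    "Deep-Transfer": (0.703, 0.829, 0.611, 0.732),
    "RA-MDRM":       (0.765, 0.857, 0.636, 0.751),
    "MUPS-EEG":      (0.781, 0.862, 0.665, 0.758),
}
RETENTION_COLUMNS = ("BCI IV-2a Avg. Acc", "BCI IV-2a Avg. RA",
                     "DEAP Avg. Acc", "DEAP Avg. RA")


def runner_up_per_column(table, ours="MUPS-EEG"):
    """Map each column to (best baseline name, gap to `ours` in percentage points)."""
    result = {}
    for idx, column in enumerate(RETENTION_COLUMNS):
        baselines = {name: scores[idx] for name, scores in table.items() if name != ours}
        best_name = max(baselines, key=baselines.get)
        gap = round((table[ours][idx] - baselines[best_name]) * 100, 1)
        result[column] = (best_name, gap)
    return result


retention_gaps = runner_up_per_column(TABLE_2)
```

RA-MDRM is the closest baseline in three of the four columns, while TCA-EEG is closest on DEAP Avg. Acc.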

We further explored the influence of the amount of target subject data on model performance, shown in Fig. 1. Performance is positively correlated with the amount of target data, and we observed that both accuracy and ROC-AUC converge with 2 minutes of EEG recording from the target subject on the BCI IV-2a task, while 5 minutes of recording are needed for the DEAP dataset.

Comparing model performance on the two datasets, DEAP proves more challenging than BCI IV-2a for the cross subject classification task: only 3 out of 8 models reach above 60% accuracy in Table 1, where the theoretical chance level for random guessing is 33.3%. With the current model performing below 70% accuracy, which is generally deemed an acceptable threshold for the application of BCI systems [4], further performance improvement is needed for the DEAP dataset.

Figure 1: (a) BCI IV-2a dataset

4 Conclusion

Pattern variability of EEG signals across different subjects is a major challenge for cross subject EEG classification. We propose an efficient transfer learning model built on a meta update mechanism for the task. The two step meta update approach operating on meta tasks enables the model to rapidly adapt to a new subject and retain knowledge of known subjects at the same time. The model is efficient in its use of target data thanks to its tailored optimization process for target adaptation. We evaluate the model on two public datasets, where it outperforms the current state of the art by a large margin.

References

[1] G. Buzsáki, C. A. Anastassiou, and C. Koch, “The origin of extracellular fields and currents–EEG, ECoG, LFP and spikes,” Nature Reviews Neuroscience, vol. 13, no. 6, pp. 407–420, May 2012. [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/22595786
[2] M. Tariq, P. M. Trivailo, and M. Simic, “Eeg-based bci control schemes for lower-limb assistive-robots,” Frontiers in Human Neuroscience, vol. 12, p. 312, 2018. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnhum.2018.00312
[3] N. Padfield, J. Zabalza, H. Zhao, V. Masero, and J. Ren, “Eeg-based brain-computer interfaces using motor-imagery: Techniques and challenges,” Sensors, vol. 19, no. 6, 2019. [Online]. Available: https://www.mdpi.com/1424-8220/19/6/1423
[4] F. Fahimi, Z. Zhang, W. B. Goh, T.-S. Lee, K. K. Ang, and C. Guan, “Inter-subject transfer learning with an end-to-end deep convolutional neural network for EEG-based BCI,” Journal of Neural Engineering, vol. 16, no. 2, p. 026007, jan 2019. [Online]. Available: https://doi.org/10.1088%2F1741-2552%2Faaf3f6
[5] L.-D. Liao, C.-Y. Chen, I.-J. Wang, S.-F. Chen, S.-Y. Li, B.-W. Chen, J.-Y. Chang, and C.-T. Lin, “Gaming control using a wearable and wireless eeg-based brain-computer interface device with novel dry foam-based sensors,” Journal of NeuroEngineering and Rehabilitation, vol. 9, no. 1, p. 5, 2012. [Online]. Available: https://doi.org/10.1186/1743-0003-9-5
[6] D. Zhang, L. Yao, K. Chen, and J. Monaghan, “A convolutional recurrent attention model for subject-independent eeg signal analysis,” IEEE Signal Processing Letters, vol. 26, no. 5, pp. 715–719, May 2019.
[7] K. K. Ang, Z. Y. Chin, H. Zhang, and C. Guan, “Filter bank common spatial pattern (fbcsp) in brain-computer interface,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), June 2008, pp. 2390–2397.
[8] S. Fazli, C. Grozea, M. Danoczy, B. Blankertz, F. Popescu, and K.-R. Müller, “Subject independent eeg-based bci decoding,” in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, Eds. Curran Associates, Inc., 2009, pp. 513–521. [Online]. Available: http://papers.nips.cc/paper/3671-subject-independent-eeg-based-bci-decoding.pdf
[9] N. Jatupaiboon, S. Pan-ngum, and P. Israsena, “Real-time eeg-based happiness detection system,” The Scientific World Journal, vol. 2013, p. 618649, Aug 2013.
[10] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces,” Journal of Neural Engineering, vol. 15, no. 5, p. 056013, jul 2018.
[11] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for eeg decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.23730
[12] D. Zhang, L. Yao, X. Zhang, S. Wang, W. Chen, R. Boots, and B. Benatallah, “Cascade and parallel convolutional recurrent neural networks on eeg-based intention recognition for brain computer interface,” in AAAI, 2018.
[13] D. Wu, V. J. Lawhern, W. D. Hairston, and B. J. Lance, “Switching eeg headsets made easy: Reducing offline calibration effort using active weighted adaptation regularization,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 24, no. 11, pp. 1125–1137, Nov 2016.
[14] D. Wu, V. J. Lawhern, S. Gordon, B. J. Lance, and C. Lin, “Driver drowsiness estimation from eeg signals using online weighted adaptation regularization for regression (owarr),” IEEE Transactions on Fuzzy Systems, vol. 25, no. 6, pp. 1522–1535, Dec 2017.
[15] D. Wu, “Online and offline domain adaptation for reducing bci calibration effort,” IEEE Transactions on Human-Machine Systems, vol. 47, no. 4, pp. 550–563, Aug 2017.
[16] W.-L. Zheng and B.-L. Lu, “Personalizing eeg-based affective models with transfer learning,” in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, ser. IJCAI’16. AAAI Press, 2016, pp. 2732–2738. [Online]. Available: http://dl.acm.org/citation.cfm?id=3060832.3061003
[17] Z. Lan, O. Sourina, L. Wang, R. Scherer, and G. R. Müller-Putz, “Domain adaptation techniques for eeg-based emotion recognition: A comparative study on two public datasets,” IEEE Transactions on Cognitive and Developmental Systems, vol. 11, no. 1, pp. 85–94, March 2019.
[18] V. Jayaram, M. Alamgir, Y. Altun, B. Scholkopf, and M. Grosse-Wentrup, “Transfer learning in brain-computer interfaces,” IEEE Computational Intelligence Magazine, vol. 11, no. 1, pp. 20–31, Feb 2016.
[19] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra, “Matching networks for one shot learning,” in Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 3630–3638. [Online]. Available: http://papers.nips.cc/paper/6385-matching-networks-for-one-shot-learning.pdf
[20] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 4077–4087. [Online]. Available: http://papers.nips.cc/paper/6996-prototypical-networks-for-few-shot-learning.pdf
[21] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, pp. 1126–1135. [Online]. Available: http://dl.acm.org/citation.cfm?id=3305381.3305498
[22] Q. Sun, Y. Liu, T.-S. Chua, and B. Schiele, “Meta-transfer learning for few-shot learning,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 403–412.
[23] C. Tan, F. Sun, and W. Zhang, “Deep transfer learning for eeg-based brain computer interface,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2018, pp. 916–920.
[24] P. Zanini, M. Congedo, C. Jutten, S. Said, and Y. Berthoumieu, “Transfer learning: A riemannian geometry framework with applications to brain–computer interfaces,” IEEE Transactions on Biomedical Engineering, vol. 65, no. 5, pp. 1107–1116, May 2018.
[25] M. Tangermann, K.-R. Müller, A. Aertsen, N. Birbaumer, C. Braun, C. Brunner, R. Leeb, C. Mehring, K. Miller, G. Mueller-Putz, G. Nolte, G. Pfurtscheller, H. Preissl, G. Schalk, A. Schlögl, C. Vidaurre, S. Waldert, and B. Blankertz, “Review of the bci competition iv,” Frontiers in Neuroscience, vol. 6, p. 55, 2012. [Online]. Available: https://www.frontiersin.org/article/10.3389/fnins.2012.00055
[26] S. Koelstra, C. Muhl, M. Soleymani, J. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, “Deap: A database for emotion analysis; using physiological signals,” IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, Jan 2012.