A Spontaneous Driver Emotion Facial Expression (DEFE) Dataset for Intelligent Vehicles
Abstract
Abstract—In this paper, we introduce a new dataset, the driver emotion facial expression (DEFE) dataset, for the analysis of spontaneous driver emotions. The dataset includes facial expression recordings from 60 participants during driving. After watching a selected video-audio clip to elicit a specific emotion, each participant completed the driving tasks in the same driving scenario and rated their emotional responses during the driving process in terms of both dimensional and discrete emotion. We also conducted classification experiments to recognize the scales of arousal, valence, and dominance, as well as emotion category and intensity, to establish baseline results for the proposed dataset. In addition, this paper compares and discusses the differences in facial expressions between driving and non-driving scenarios. The results show significant differences in the presence of facial action units (AUs) between driving and non-driving scenarios, indicating that human emotional expressions in driving scenarios differ from those in other life scenarios. Therefore, publishing a human emotion dataset specifically for drivers is necessary for traffic safety improvement. The proposed dataset will be made publicly available so that researchers worldwide can use it to develop and examine their driver emotion analysis methods. To the best of our knowledge, this is currently the only public driver facial expression dataset.
Index Terms:
Driving safety, Driver emotion, Facial expression dataset, Spontaneous expression, Affective computing, Intelligent vehicles
I Background and Related Work
Driver emotion plays a vital role in driving because it affects driving safety and comfort. Among the 20-50 million non-fatal injuries and 1.24 million road traffic fatalities occurring worldwide every year [1], drivers' inability to control their emotions has been regarded as one of the critical factors degrading driving safety [2][3]. The rapid development of intelligent vehicles also creates an emerging demand for integrating driver-automation interaction and collaboration to enhance driving comfort, in which driver emotion is one of the critical driver states [4]. Therefore, recognizing driver emotions is essential for improving the driving safety and comfort of intelligent vehicles [5].
To describe human emotion, psychological researchers have provided two methodologies to classify emotions: discrete emotions and dimensional emotions [6]. Because humans use discrete words to describe emotions, discrete models are well-established and widely accepted, such as the basic emotions of Ekman et al. [7] and the emotion tree structure of Parrott [8]. Specifically, Ekman et al. categorized discrete emotions into six basic emotions (happiness, sadness, anger, fear, surprise, and disgust) [7], which are supported by cross-cultural research showing that humans perceive these basic emotions in a similar form regardless of cultural differences [9]. Dimensional emotion models propose that an emotional state can be accurately expressed as a combination of several psychological dimensions, such as the 2D "circumplex model" proposed by Russell [10] and the 3D dimensional model of Mehrabian et al. [11]. In the widely adopted model proposed by Russell [10], the valence dimension measures whether humans feel negative or positive, and the arousal dimension measures whether humans are bored or excited. Mehrabian et al. [11] extended the emotional model from 2D to 3D by adding a dominance dimension, which measures submissive or empowered feelings.
The discrete emotion method is intuitive and widely used in people's daily lives. However, it fails to cover the whole range of emotions exhibited by humans. The dimensional emotion method is less intuitive and often requires training participants to use the dimensional labelling system. Nevertheless, it is a more pragmatic and context-dependent approach to describing emotions [6]. In this study, considering the primary emotions of drivers during driving, we combine the discrete and dimensional emotion methods to quantitatively describe drivers' negative emotions (e.g., anger) and positive emotions (e.g., happiness) by employing the well-known differential emotion scale (DES) [12] and self-assessment manikin (SAM) [13].
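For concreteness, the two labelling schemes used in this study can be represented together in a single annotation record. The following is a minimal, hypothetical Python sketch (the field names are ours and do not correspond to any released file format) showing how one recording could carry both SAM dimensions and a DES category with its intensity, each on the 9-point scales described later in this paper.

```python
from dataclasses import dataclass

@dataclass
class EmotionAnnotation:
    """Hypothetical per-clip label combining dimensional (SAM) and discrete (DES) ratings."""
    participant_id: int
    valence: int    # SAM valence, 1 (negative) .. 9 (positive)
    arousal: int    # SAM arousal, 1 (calm) .. 9 (excited)
    dominance: int  # SAM dominance, 1 (submissive) .. 9 (empowered)
    category: str   # discrete label, e.g. "anger", "happiness", "neutral"
    intensity: int  # DES intensity of the reported category, 1 .. 9

    def __post_init__(self) -> None:
        # All ratings are expected to lie on the 9-point scales.
        for name in ("valence", "arousal", "dominance", "intensity"):
            value = getattr(self, name)
            if not 1 <= value <= 9:
                raise ValueError(f"{name} must be on the 9-point scale, got {value}")

# Example: an angry-driving recording rated as low valence and high arousal.
label = EmotionAnnotation(participant_id=7, valence=2, arousal=8, dominance=4,
                          category="anger", intensity=7)
```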
[Table 1: Summary of representative publicly available facial expression datasets [14]-[32], compared in terms of the elicited emotions, number of participants, recording condition (laboratory or wild setting), and emotion model/labels.]
Driver emotion recognition is often conducted by analyzing drivers' emotional expressions. Human emotional expressions consist of facial expressions, speech, body posture, and physiological changes. So far, different behavioural measurements (e.g., facial expression analysis, speech analysis, driving behaviour) [33][34][35], physiological signal measurements (e.g., skin electrical activity, respiration) [36][37], and self-reported scales (e.g., the self-assessment manikin) [38] have been applied to driver emotion recognition. Comparatively, physiological measurements are more objective and can be recorded continuously. However, they are highly invasive and may affect driving performance. Self-reported measurements capture the subjective experience of drivers when applied correctly, but they cannot be collected during the study without interruption. For studying driver emotion in the driving environment, it is crucial to use non-invasive and non-contact measurement methods; high intrusiveness significantly affects both the driver's emotional expression and the actual emotional experience, and should therefore be avoided [39]. To this end, this study employed facial expressions to recognize driver emotions while ensuring the continuity of data collection.
Facial expression is a powerful channel for drivers to express emotions [40]. Recent advances in facial expression-based emotion recognition have motivated the creation of multiple facial expression datasets, and publicly available datasets are fundamental for accelerating facial expression research. As shown in Table 1, we summarize representative, up-to-date publicly available facial expression datasets. These datasets have been used for emotion recognition with different levels of success. A common aspect of these datasets is that participants' facial expression data were collected in static life scenarios or wild settings. Although such data can be used to recognize emotions with various algorithms, this restricts the application of these algorithms to static life scenarios.
However, driving a car is a complex cognitive process [41] that requires the driver to dynamically respond to driving tasks such as processing visual cues, hazard assessment, decision-making, and strategic planning [42]. Consequently, driving occupies a large portion of the driver's cognitive resources [43], and cognitive processing is needed to elicit emotional responses [44]. Driving therefore affects drivers' emotional expressions, which differ from expressions in static life scenarios. As a result, if the above-mentioned algorithms are applied to dynamic driving scenarios, reliable recognition results may not be obtained. Thus, it is necessary to collect drivers' facial expression data specifically for driver emotion recognition in dynamic driving scenarios and to analyze the differences in human facial expressions between dynamic driving scenarios and static life scenarios.
II DEFE Data Collection Framework
To address the above-mentioned limitations, we introduce the driver emotion facial expression (DEFE) dataset in this study for driver emotion research in intelligent vehicles. Table 2 presents the details of the experimental design in terms of stimulus material selection, data collection, experiment protocol, and emotion labels. The performance of different emotion recognition algorithms is analyzed, and the differences in human facial expressions between dynamic driving scenarios and static life scenarios are also examined.
Video-audio stimulus selection |
---|---
Number of stimuli | 18
Stimulus duration | 30 s - 120 s
Initial stimulus selection | Manually selected
No. of ratings per stimulus | 35
Rating scales | SAM (valence, arousal, dominance); DES (emotion category and intensity)
Rating values | Discrete scale of 1-9
Selection method | K-means clustering of SAM ratings; DES success index
Driver facial expression data collection |
Number of participants | 60 (17 females, 43 males)
Number of stimuli | 3
Number of driving tasks | 3
Rating scales | SAM; DES
Rating values | Discrete scale of 1-9
Recorded signals | Driver facial expression videos
DEFE dataset content |
Number of video clips | 164
Emotions elicited | Anger, happiness, neutral
Clip duration | 30 s
Video clip format | MP4
Image resolution | 1920*1080, 648*480
Self-report of emotion | Yes
Emotion category labels | Anger, happiness, neutral
Emotion intensity labels | Discrete scale of 1-9 (DES)
In our DEFE dataset, video-audio clips were used as stimuli to induce different emotions. To this end, a large number of video-audio clips were collected using a manual selection method, and subjective annotation was then performed to select the most appropriate stimulus materials. Each stimulus material was rated at least 35 times using the SAM and DES scales, and the three most effective video-audio clips were selected to induce specific driver emotions in the subsequent data collection experiments. Then, 60 drivers participated in the data collection experiment. After watching each of the three randomly sequenced video-audio clips selected to elicit a specific emotion, each participant completed the driving tasks in the same driving scenario and rated their emotional responses during the driving process in terms of dimensional and discrete emotion. We also conducted classification experiments for the scales of arousal, valence, and dominance, as well as emotion category and intensity, to establish baseline results for our dataset in terms of classification accuracy and F1 score. Furthermore, we discuss the differences in facial expressions between dynamic driving scenarios and static life scenarios under similar cultural backgrounds by comparing the responses of different action units (AUs) in our DEFE dataset and the JAFFE dataset.
The main contributions of this paper can be described as:
- We provide a new, publicly available dataset, DEFE, for spontaneous driver emotion analysis. The dataset contains frontal facial videos from 60 drivers, together with their biographic information (gender, age, years of driving experience) and subjective ratings of driver emotion (the arousal, valence, and dominance scales, as well as emotion category and intensity). To the best of our knowledge, this is currently the only public dataset of driver facial expressions.
- We compared driver emotion classification results on the DEFE dataset using mainstream classification algorithms. The DEFE dataset supports driver emotion classification from two aspects: dimensional emotion (arousal, valence, and dominance) and discrete emotion (emotion category and intensity). These comparisons established baseline results for the introduced dataset in terms of classification accuracy and F1 score.
- The differences in human facial expressions between dynamic driving scenarios and static life scenarios were examined by analyzing drivers' AU presence, and the results showed significant differences between the two types of scenarios. Therefore, previous human emotion datasets cannot be directly used for driver emotion analysis, and our DEFE dataset fills this research gap.
The structure of this paper is as follows: Section III presents the selection of stimulus materials. Section IV introduces the DEFE data collection details, and the data processing, classification methods and results are described in Section V. Section VI compares human facial expression differences in dynamic driving scenarios and static life scenarios. The final conclusions are shown in Section VII.
III Video-audio Stimulus Selection
Stimuli are necessary to elicit target emotions, and emotion datasets generally rely on standardized stimuli to evoke emotions, such as the international affective picture system (IAPS) [45] and the international affective digitized sound system (IADS) [46]. Compared with images and music, video-audio clips usually evoke stronger emotional responses, and existing research has confirmed that video-audio clips can reliably elicit emotions in subjects [12, 47]; hence, video-audio clips were used in our experiments. Eighteen initial video-audio clips were manually selected, and we then recruited participants for a subjective rating experiment on these clips. Finally, three video-audio clips were selected based on the subjective rating results. Each of these steps is explained in detail as follows.
III-A Initial Video-Audio Clips Selection
To select the most effective video-audio clips, two research assistants (1 male and 1 female) reviewed more than 500 video-audio clips and conducted a preliminary screening. They were asked to select clips that lasted 30-120 seconds and contained content eliciting a single target emotion: a negative emotion (anger), a positive emotion (happiness), or a neutral state. Two additional experts (1 male and 1 female) with rich experience in driver emotion analysis then evaluated each selected clip, and the final clips were chosen by consensus of the two experts.
The selected video-audio clips are mainly based on Chinese real-life scenarios and events, such as aggressive driving and chatting. Other selection criteria included: 1) the video background should not be too dark, 2) the clip should contain complete speech segments, and 3) the clip should express only one target emotion. Accordingly, we selected 18 video-audio clips and examined them further in the subjective annotation session.
Target Emotion | Duration (s) | Clip Content
---|---|---
Happiness | 62 | -
Anger | 45 | -
Neutral | 48 | -
III-B Subjective Annotation
A web-based subjective emotion annotation experiment was conducted to evaluate the video-audio clips. For each participant, the 18 clips were displayed in random order, with a relatively long break (3 minutes) between every two clips to avoid interference from the previous one. After watching each clip, participants completed two questionnaires based on their true feelings, namely the self-assessment manikin (SAM) [13] and the differential emotion scale (DES) [12]. SAM uses non-verbal graphical representations to assess valence, arousal, and dominance levels, and its effectiveness has been demonstrated in [13]. We adopted a 9-point SAM scale (1 = "not at all", 9 = "extremely") [13] for evaluation. DES assesses the different components of emotion and consists of ten basic emotions. In this study, we used a 9-point DES scale (1 = "not at all", 9 = "extremely") [12] to assess the intensity of each self-reported emotional dimension. No clip was evaluated twice by the same participant, and at least 35 assessments were collected for each clip.

III-C Video-Audio Clips Selection
Three video-audio clips were selected by comprehensively considering the SAM and DES results. First, we normalized the variables by calculating Z-scores and then conducted a cluster analysis using the K-means algorithm to identify clusters of emotions based on the SAM data. The clustering produced three emotion categories, corresponding to a positive emotion (happiness), a negative emotion (anger), and neutral, respectively. The video-audio clip whose rating was closest to the extreme corner of each quadrant was selected and marked as the representative clip of that cluster [21].
Moreover, we selected video-audio clips for each emotion category based on the following scores computed from the DES data: 1) the hit rate, defined as the proportion of participants who chose the target emotion; 2) the intensity value, defined as the average score of the target emotion; and 3) the success index, the sum of the two Z-scores obtained by normalizing the hit rate and intensity values. The video-audio clip with the highest success index was selected from each emotion category, and the representative clips selected according to the SAM data were used for verification, as sketched below. Note that the neutral clip was selected based on the clustering results only. Eventually, as shown in Table 3, the three most effective clips were selected for the driver facial expression data collection experiment.
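A minimal sketch of this selection procedure is given below, assuming the per-clip mean SAM ratings, DES hit rates, and DES intensities have already been tabulated; the array contents are random placeholders rather than the ratings actually collected, and scikit-learn/SciPy are assumed. For brevity the sketch applies the success index to every cluster, whereas in our procedure the neutral clip was chosen from the clustering result alone.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.cluster import KMeans

# Placeholder per-clip statistics for the 18 candidate clips (not the real ratings).
sam = np.random.default_rng(0).uniform(1, 9, size=(18, 3))       # mean valence, arousal, dominance
hit_rate = np.random.default_rng(1).uniform(0.3, 1.0, size=18)   # share of raters choosing the target emotion
intensity = np.random.default_rng(2).uniform(1, 9, size=18)      # mean DES intensity of the target emotion

# 1) Cluster the z-scored SAM ratings into three emotion groups (happy / angry / neutral).
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(zscore(sam, axis=0))

# 2) Success index: sum of the z-scored hit rate and the z-scored intensity.
success_index = zscore(hit_rate) + zscore(intensity)

# 3) Within each cluster, keep the clip with the highest success index.
for c in range(3):
    members = np.where(clusters == c)[0]
    best = members[np.argmax(success_index[members])]
    print(f"cluster {c}: selected clip #{best}, success index {success_index[best]:.2f}")
```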
IV Driver Facial Expression Data Collection
IV-A Ethics Statement
The experimental procedure and the video content shown to the participants were approved by the Chongqing University Cancer Hospital Ethics Committee, China. Participants and their data were treated according to the Declaration of Helsinki. The participants were informed that they had the right to quit the experiment at any time. The video recordings of the participants were included in the dataset only after they gave written consent for the use of their videos for research purposes. A few participants also agreed to the use of their face images in research articles.
IV-B Participants
Sixty Chinese participants (47 males and 13 females), aged 19 to 56 years (mean [M] = 27.3 years, standard deviation [SD] = 7.7 years), were recruited from Shapingba District, Chongqing, China. Each participant held a valid driving license with at least one year of driving experience (M = 5.5 years, SD = 5.8 years, range = 1-30 years). All participants had normal or corrected-to-normal vision (36 participants wore glasses) and normal hearing. Since occlusions such as glasses are a significant research challenge in facial expression recognition, participants wearing glasses were included to evaluate the robustness of emotion recognition. All participants signed a consent form and received 100 CNY as financial reimbursement for their participation.
IV-C Experiment Setup
The experiments were carried out in a fixed-base, illumination-controlled driving simulator (Figure 1(b)) (RDS2000, Realtime Technologies SimCreator, Ann Arbor, Michigan, USA). As shown in Figure 1(d), the front view was presented using three projectors, and the rear view was displayed using three LCD screens (one for the in-vehicle rearview and two for the left and right rear views). Another two LCD screens were used to display the dashboard and the central stack. Ambient noise and engine sound were presented through two speakers, and vehicle vibration was simulated through a woofer under the driver's seat. To present the stimuli without changing the internal environment of the driving simulator, as shown in Figure 1(e), we used a 20-inch central stack screen (60 Hz) to display the video-audio stimulus materials. A stereo Bluetooth speaker (Xiaomi) was used to play the audio, and the volume was set to a relatively loud level; each participant was asked before the experiment whether the volume was comfortable, and it was adjusted when necessary for clear hearing. During the experiment, as shown in Figure 1(a), the participants' faces were continuously recorded with a visual camera, an HD Pro Webcam C920 (Logitech, Newark, CA) collecting data at a frame rate of 30 fps. An iPad (Apple) was used for participants' self-reported emotions. Figure 1(c) shows the overall data collection setup.

IV-D Driving Scenarios and Tasks
Two highway driving scenarios were realized in the simulator; these relatively simple scenarios were chosen to minimize the impact of complex driving situations on driver performance. The first was a practice driving (PD) scenario to help participants familiarize themselves with the simulator before the experiment. As shown in Figure 2(a), the practice scenario was an 8 km straight section of a four-lane highway with two lanes in each direction, and participants were asked to drive in the right lane with the speed changing in the sequence 80 km/h - 50 km/h - 100 km/h. The second was an emotional driving (ED) scenario. As shown in Figure 2(b), it was a 3 km straight section of the same highway with a posted speed limit of 80 km/h, and participants were asked to drive in the right lane at a speed of around 80 km/h.
IV-E Experiment Protocol
To obtain drivers' ED data, we designed an experimental protocol comprising about 45 minutes of driving. The protocol was composed of one PD session followed by three ED sessions: angry driving (AD), happy driving (HD), and neutral driving (ND). Figure 3 presents the details of the protocol. Before the experiment, each participant signed a consent form and filled out a basic information questionnaire (gender, age, years of driving experience). Next, they were given a set of instructions describing the experimental protocol and the definitions of the different scales used for self-reported emotions. The participants then drove a 10-minute PD session to become familiar with the operation and motion performance of the driving simulator. After a short break following the PD, the participants started the three EDs. At the beginning of each ED, the corresponding emotion was induced by watching the selected video-audio clip, followed by driving with the induced emotion. At the end of each ED session, the participant reported his/her self-evaluated emotion level using SAM and DES. There was a 3-minute break between every two EDs. Throughout the experiment, participants could withdraw at any time if they felt any discomfort.
IV-F Self-Reported Emotion
To identify the emotions experienced by participants, we employed self-reported scales for subjective assessment. After each driving task, participants assessed their emotional experience while driving using SAM and DES, which were presented on an iPad. In SAM, the valence scale ranges from unhappy to happy, the arousal scale from calm to stimulated, and the dominance scale from submissive (or "without control") to dominant (or "in control, empowered"). In DES, there are ten emotion dimensions, and each dimension rates the intensity of an emotion from "not at all" to "extremely". Each dimension of the SAM and DES scales is rated on a 9-point Likert scale. If a participant's self-assessment was not consistent with the induced target emotion, the participant's self-reported data were used as the ground truth to label the facial video data.

V Data Processing and Evaluation
V-A Data Processing
In this section, we describe the processing of the driver facial expression data. First, we labelled the facial expression data of the 60 drivers according to their self-reported emotions and removed the ED data for which the target emotion was not successfully induced. Second, we describe how the data were split for driver emotion recognition, including clipping the effective video segments from the original recordings and extracting the drivers' facial expressions.

During data collection, each participant completed three ED sessions, with an average recording length of 405 s. We also compiled the self-reported data for each participant. As shown in Figure 4, the numbers of drivers whose target emotions were successfully induced were 52, 56, and 56 for angry, happy, and neutral driving, respectively. Participants' self-reported data were used as the ground truth to label the driver facial expression data.

Following [48] and [49], the facial expression video sequences recorded 15 s after the drivers started driving were clipped as the most effective data. Face detection and alignment in driving environments are challenging due to varying poses, illumination, and occlusions (glasses). MTCNN (Multi-task Cascaded Convolutional Networks) is a deep learning-based cascade structure that detects faces relatively accurately across multiple pose angles and in unconstrained scenes [50]. Hence, we used MTCNN to track and extract driver face data from each video frame, as sketched below. After extracting the driver face data, we obtained a total of 17,310 image frames of driver faces at 64×64 pixels.
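The face extraction step can be sketched as follows, using the open-source `mtcnn` Python package and OpenCV as stand-ins for the MTCNN implementation actually used; the video file name is hypothetical, and only the 64×64 output size follows the description above. Keeping the largest detected box is a simplifying assumption for single-driver footage.

```python
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def extract_faces(video_path: str, size: int = 64):
    """Detect the largest face in each frame and return size x size crops."""
    crops = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)          # MTCNN expects RGB input
        detections = detector.detect_faces(rgb)
        if not detections:
            continue
        # Keep the largest detected box, assumed to be the driver.
        x, y, w, h = max(detections, key=lambda d: d["box"][2] * d["box"][3])["box"]
        x, y = max(x, 0), max(y, 0)
        face = rgb[y:y + h, x:x + w]
        crops.append(cv2.resize(face, (size, size)))
    cap.release()
    return crops

faces = extract_faces("DEFE_participant01_angry.mp4")  # hypothetical file name
print(f"extracted {len(faces)} face crops")
```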
Therefore, the created dataset contains facial expression videos and images from 60 drivers with ground-truth labels for dimensional emotion (valence, arousal, and dominance) and discrete emotion (emotion category and intensity). A few example images from the dataset are provided in Figure 5, which shows that drivers' facial expressions varied with the type of emotion, but the variation was weak in some cases during driving; for example, the difference between AD and ND was tiny. Peak expressions were difficult to observe in most video clips, and the change of expression over the driving duration was also weak, probably because the facial expression of emotion was affected by the driving tasks.
V-B Classification Protocol
In this section, we introduce two protocols for driver emotion recognition based on the facial expression data. (1) To investigate driver emotion classification based on the dimensional emotion model, we propose three nine-class classification problems: valence, arousal, and dominance. The SAM scores of the participants were used as the ground truth, and each scale (valence, arousal, dominance) was divided into nine levels (1 = "not at all", 9 = "extremely"). (2) To study driver emotion classification based on the discrete emotion model, we propose a three-class emotion classification protocol covering anger, happiness, and neutral. In addition, we consider intensity recognition for the anger and happiness emotions, with the DES scores taken as the ground truth; the intensity of each emotion (anger and happiness) was divided into five levels (5 = "no emotion", 9 = "maximum intensity").
It should be noted that the above approach can lead to unbalanced classes for some participants and scales. In light of this, we included F1 scores in order to report reliable results. The F1 score is a commonly used metric in classification tasks that considers both the precision (P) and recall (R) of the model and quantifies how well the positive samples are predicted; when categories are unbalanced, the F1 score is attenuated [51], so it reflects class imbalance that plain accuracy can hide. We additionally used accuracy as a second metric, which quantifies the overall proportion of correctly classified samples.
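The two metrics can be computed as in the following minimal sketch with dummy labels, assuming scikit-learn is available; treating the reported "average F1 score" as a macro average over the classes is our interpretation.

```python
from sklearn.metrics import accuracy_score, f1_score

# Dummy ground-truth SAM levels (1-9) and predictions, for illustration only.
y_true = [1, 5, 9, 5, 3, 7, 9, 1]
y_pred = [1, 5, 8, 5, 3, 7, 9, 2]

acc = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean over the classes present
print(f"accuracy = {acc:.3f}, macro F1 = {f1_macro:.3f}")
```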
Both traditional and deep learning methods were included for the emotion recognition tasks. As one of the most effective traditional methods in many classification tasks [19], SVM (Support Vector Machine) with a linear kernel was implemented using the scikit-learn toolbox. For the deep learning-based classification, Xception [52] was applied; the Xception network has been widely adopted in emotion recognition tasks, and many state-of-the-art emotion recognition networks are built on it [53][54]. For the network, the loss function can be expressed as:
$L = -\sum_{c=1}^{C} y_c \log(\hat{y}_c)$   (1)

where $\hat{y}_c$ is the predicted probability for class $c$, $y_c$ is the corresponding ground truth, and $C$ is the number of classes. The deep learning method used the following training strategy. First, it employed the Adam optimizer [55] with a fixed learning rate and weight decay. Second, image augmentations, including random horizontal flips, random crops, and random rotations, were applied on-the-fly to effectively increase the amount of training images. SVM was run on an Intel Core i5 dual-core CPU, and Xception was trained on a TITAN Xp GPU.
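A compact sketch of both baselines is shown below, assuming face crops and integer labels are already in memory; the data are random placeholders, and the resize target, learning rate, batch size, and epoch count are illustrative choices rather than the settings used in the paper.

```python
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Placeholder data: N face crops (64x64x3 in [0, 1]) with 9-level valence labels (0..8).
X = np.random.rand(200, 64, 64, 3).astype("float32")
y = np.random.randint(0, 9, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# --- Linear-kernel SVM baseline on flattened pixels ---
svm = SVC(kernel="linear")
svm.fit(X_tr.reshape(len(X_tr), -1), y_tr)
print("SVM accuracy:", svm.score(X_te.reshape(len(X_te), -1), y_te))

# --- Xception baseline trained with cross-entropy and Adam ---
# Keras' Xception needs inputs of at least 71x71, so crops are resized to 96x96 here.
inputs = tf.keras.Input(shape=(64, 64, 3))
x = tf.keras.layers.Resizing(96, 96)(inputs)
x = tf.keras.layers.RandomFlip("horizontal")(x)   # on-the-fly augmentation (training only)
x = tf.keras.layers.RandomRotation(0.05)(x)
backbone = tf.keras.applications.Xception(include_top=False, weights=None,
                                          input_shape=(96, 96, 3), pooling="avg")
outputs = tf.keras.layers.Dense(9, activation="softmax")(backbone(x))
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X_tr, y_tr, validation_data=(X_te, y_te), batch_size=32, epochs=2)
```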
V-C Evaluation Results
In addition to reporting emotion recognition results on the proposed dataset, we selected the DEAP [21] and CK+ [19] datasets, which were collected in static life scenarios, as comparison datasets. The DEAP dataset consists of 32 participants, each of whom watched forty one-minute video-audio clips as emotional stimuli while facial videos and physiological signals were recorded; there are 40 trials per participant, each corresponding to one emotion elicited by one clip. After watching each clip, the participants assessed their emotions on five dimensions: valence, arousal, dominance, liking, and familiarity. The ratings range from 1 (weakest) to 9 (strongest), except for liking and familiarity, which are rated from 1 to 5. Facial videos were recorded for 22 of the participants; this paper adopts these 22 facial videos to compare emotion classification results based on the dimensional emotion model. The CK+ dataset consists of 123 participants and contains both posed and spontaneous expressions, with each sequence starting from a neutral face and ending at the peak expression. In CK+, 327 sequences have discrete emotion labels, including neutral, sadness, surprise, happiness, fear, anger, contempt, and disgust. This paper selected the neutral, anger, and happiness sequences from this dataset to compare emotion classification results based on the discrete emotion model.
Dataset | Method | Valence | Arousal | Dominance | |||
ACC | F1 | ACC | F1 | ACC | F1 | ||
DEFE | SVM | 53.39 | 54.79 | 59.49 | 63.04 | 59.49 | 63.04 |
Xception | 86.00 | 83.73 | 91.54 | 91.76 | 88.17 | 79.55 | |
DEAP | SVM | 27.88 | 23.24 | 29.82 | 23.25 | 28.12 | 24.14 |
Xception | 24.10 | 21.41 | 35.06 | 31.80 | 31.00 | 24.24 |
Table 4 shows the average accuracies and F1 scores (average F1 scores over the nine classes) for each rating scale (valence, arousal, and dominance) when using protocol one on DEFE. We compared the performance of SVM and Xception on the DEFE dataset. In general, the accuracies obtained with Xception were at least 28 percentage points higher than those obtained with SVM. The highest classification accuracies for valence, arousal, and dominance were 86.00%, 91.54%, and 88.17%, respectively, all achieved with Xception. In terms of F1 scores, the highest scores for valence, arousal, and dominance were 83.73, 91.76, and 79.55, respectively, also with Xception. In addition to the results on the DEFE dataset, Table 4 shows the comparison results on the DEAP dataset using the same recognition algorithms. The DEFE dataset yielded higher recognition accuracies and F1 scores than the DEAP dataset, possibly because the participants' faces in DEAP were affixed with electrode pads for physiological signal collection, which affected the facial expression recognition results.
Dataset | Method | Emotion category | Angry intensity | Happy intensity | |||
ACC | F1 | ACC | F1 | ACC | F1 | ||
DEFE | SVM | 53.08 | 52.93 | 86.01 | 87.42 | 85.41 | 85.57 |
Xception | 90.34 | 90.21 | 97.60 | 97.12 | 97.88 | 97.59 | |
CK+ | SVM | 82.70 | 71.45 | - | - | - | - |
Xception | 94.31 | 93.25 | - | - | - | - |
Similarly, Table 5 shows the average accuracies and F1 scores for the emotion categories (anger, happiness, and neutral) when using protocol two, comparing the classification results of SVM and Xception. Both the highest classification accuracy (90.34%) and the highest F1 score (90.21) were obtained with Xception. Apart from the results on DEFE, Table 5 also presents the comparison results on the CK+ dataset using the same recognition algorithms; the recognition results on CK+ were higher than those on DEFE.
Moreover, Table 5 shows the average accuracies and F1 scores of the intensity classification results for the anger and happiness emotions when using protocol two with different algorithms. Five intensity classes of anger and happiness were classified based on the facial expression data. The highest classification accuracies for angry and happy driving intensity were 97.60% and 97.88%, respectively, and the highest F1 scores were 97.12 and 97.59, respectively. It should be noted that we did not compare the emotion intensity recognition results with other datasets, because there is currently no other spontaneous facial expression dataset with emotion intensity labels.
The comparison results in this section show that there is a difference in human facial expressions between DEFE and CK+. Due to the influence of driving tasks in driving scenarios, drivers' facial expressions may be suppressed when they experience emotional states. Hence, it is necessary to further examine the difference between human facial expressions in dynamic driving scenarios and static life scenarios.
VI The Facial Expression Difference between Dynamic Driving and Static Life Conditions
VI-A Dataset Selection for Comparison
In this section, we conducted a differential analysis of facial expressions between dynamic driving and static life conditions by comparing the DEFE and JAFFE datasets. The static life dataset, the Japanese Female Facial Expression (JAFFE) dataset [14], was selected as the baseline. It was posed by 10 East-Asian females with seven expressions (happiness, anger, disgust, fear, sadness, surprise, and neutral), and each female provided two to four examples of each emotion. In total, the dataset contains 213 grayscale facial expression images.
Given the small differences within East-Asian cultural backgrounds, the JAFFE dataset was the most suitable control group for our DEFE dataset because it excludes most cultural bias [56]. Since DEFE includes only two emotional expressions (anger and happiness), we also selected the anger and happiness expressions from JAFFE for analysis. Meanwhile, because gender differences may affect the results, we removed the male drivers from the DEFE data used in this comparison.

VI-B Differential Analysis Protocol
Each participant's facial expressions were evaluated by observing subtle changes in facial features. The Facial Action Coding System (FACS) [58] is a systematic approach for describing what a face looks like when facial muscle movements occur; it defines 44 coded facial muscle movements, namely Action Units (AUs), according to the presence and intensity of facial movements. Ekman et al. further proposed that facial emotion expressions can be coded as combinations of several AUs. Figures 6(a) and (b) display the common FACS [57] codes for anger and happiness, respectively, and Figure 6(c) presents the AU descriptions for anger and happiness. In this study, the AU codes for anger (AU 4, 5, 7, and 23) and happiness (AU 6 and 12) were used as the basic units for the differential analysis.
We utilized OpenFace [59], a facial expression analysis toolkit, to detect the presence of AUs; when an AU was detected, we coded it as 1, and otherwise as 0. Because video captures much richer data, DEFE contained far more facial expression observations than JAFFE. In the end, the number of observations of happiness and anger expressions in JAFFE was 61, whereas DEFE, as a video dataset, had 10,020 and 6,660 observations of happiness and anger expressions, respectively.
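OpenFace's FeatureExtraction tool writes per-frame CSV files that include binary AU presence columns (named like `AU04_c`); the sketch below shows one way such output can be turned into the 0/1 presence codes used here. The CSV path is hypothetical, and the exact column names should be checked against the OpenFace version in use.

```python
import pandas as pd

# AUs of interest: anger (4, 5, 7, 23) and happiness (6, 12), following FACS.
AU_COLUMNS = ["AU04_c", "AU05_c", "AU06_c", "AU07_c", "AU12_c", "AU23_c"]

def au_presence(openface_csv: str) -> pd.DataFrame:
    """Return per-frame binary AU presence codes from an OpenFace output CSV."""
    df = pd.read_csv(openface_csv)
    df.columns = [c.strip() for c in df.columns]   # some OpenFace versions pad column names with spaces
    return df[AU_COLUMNS].astype(int)              # the *_c columns are already 0/1 presence flags

presence = au_presence("openface_output/participant01_angry.csv")  # hypothetical path
print(presence.mean())   # per-AU presence frequency, comparable to the averages in Table 6
```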
To analyze the differences in AU presence between dynamic driving and static life conditions, we conducted a statistical analysis of the presence of AUs in the two datasets. Given that the same emotions appear in both datasets, we should not observe a statistical difference if the facial expressions were similar between dynamic driving and static life conditions. Meanwhile, the average difference between the two datasets may not fully reflect emotional changes; it may instead stem from baseline differences between the two datasets.
Hence, to study the relationship of these AUs to anger and happiness in the two datasets, a logit regression was performed on each dataset separately, with happiness coded as 1 and anger as 0. If the coefficients of the AUs differed between the two datasets, it could be concluded that some AUs behave differently between dynamic driving and static life scenarios. Note that a positive coefficient means the AU is related to happiness, and a negative coefficient means the AU is related to anger.
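A minimal sketch of this statistical analysis is given below with synthetic placeholder data: `scipy.stats.ttest_ind` stands in for the between-dataset comparisons of AU presence (Table 6), and a `statsmodels` logit regression stands in for the per-dataset regressions (Table 7); neither call is claimed to reproduce the exact tests used in the paper.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Synthetic binary AU4 presence codes for anger samples in the two datasets (placeholders).
au4_defe = rng.binomial(1, 0.07, size=6660)    # anger frames in DEFE
au4_jaffe = rng.binomial(1, 0.43, size=30)     # anger images in JAFFE
print(ttest_ind(au4_defe, au4_jaffe, equal_var=False))   # between-dataset difference in AU4 presence

# Logit regression within one dataset: happiness coded 1, anger coded 0,
# predicted from the six AU presence codes (AU4, AU5, AU6, AU7, AU12, AU23).
n = 500
X = rng.binomial(1, 0.3, size=(n, 6))
y = rng.binomial(1, 0.5, size=n)
model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(model.summary())   # positive coefficients point toward happiness, negative toward anger
```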
The presence of AUs in anger | AU 4 | AU 5 | AU 7 | AU 23 | ||
Anger | JAFFE | Average | 0.433 | 0.683 | 0 | 0.05 |
STD | 0.5 | 0.469 | 0 | 0.22 | ||
DEFE | Average | 0.066 | 0.351 | 0.467 | 0.157 | |
STD | 0.248 | 0.477 | 0.499 | 0.364 | ||
T-test | 5.689*** | 5.464*** | -76.378*** | -3.733*** | ||
The presence of AUs in Happiness | AU 6 | AU 12 | - | - | ||
Happiness | JAFFE | Average | 0.361 | 0.475 | - | - |
STD | 0.484 | 0.504 | - | - | ||
DEFE | Average | 0.177 | 0.18 | - | - | |
STD | 0.382 | 0.384 | - | - | ||
T-test | 2.950*** | 4.578*** | - | - | ||
Note: p<0.01: ***, 0.01<p<0.05: ** |
Dataset | AUs | AU 4 | AU 5 | AU 6 | AU 7 | AU 12 | AU 23
JAFFE | Coefficient | -1.156*** | -0.415*** | 0.084 | 0.571*** | 1.743*** | -0.450*** |
S.E. | 0.1 | 0.039 | 0.066 | 0.038 | 0.101 | 0.055 | |
DEFE | Coefficient | -1.6** | 0.373 | 33.442 | 31.959 | -15.207 | 0.978 |
S.E. | 0.631 | 0.729 | 3961.164 | 3826.095 | 2037.702 | 1.516 | |
Note: p<0.01: ***, 0.01<p<0.05: ** |
VI-C Results and Discussion

The statistical analysis results for AU presence are shown in Table 6. For happiness, AU6 and AU12 movements were observed in both JAFFE and DEFE; however, compared with JAFFE, the presence frequencies of AU6 and AU12 in DEFE were significantly lower (p<0.01). For anger, AU4, AU5, and AU23 movements were observed in both JAFFE and DEFE, with significant differences between the datasets (p<0.01). In addition, we found that AU7, which was related to anger in DEFE, did not appear in the anger expressions in JAFFE. Sample images of facial expressions in JAFFE and DEFE with labelled AUs are shown in Figure 7.
Compared with JAFFE, DEFE had lower presence frequencies of AU4, AU5, AU6, and AU12; in particular, AU4, which is highly related to anger, appeared only rarely in DEFE. This may be caused by the primary driving task, which requires concentration during driving; such concentration may decrease the presence of AUs near the eyes. On the other hand, the presence frequencies of AU7 and AU23 were lower in JAFFE, which may be because negative emotions are difficult to express in Japanese culture [60].
The logit regression results are shown in Table 7. In JAFFE, for happiness, the coefficients of AU6 and AU12 were consistent with FACS [57], meaning that AU6 and AU12 were related to happiness; however, only the result for AU12 is significant (p<0.01). For anger, the coefficients of AU4, AU5, and AU23 were consistent with FACS [57], meaning that AU4, AU5, and AU23 were related to anger, and all three results are significant (p<0.01). Interestingly, the presence of AU7 (lid tightener) was related to happiness, which differs from previous research [57]. In DEFE, only the result for AU4 was significant (0.01<p<0.05), and its coefficient was consistent with FACS, indicating that AU4 had significant predictive ability for anger; no other AUs showed significant results.
Overall, in terms of AU presence, AU4 (brow lowerer), AU5 (upper lid raiser), AU6 (cheek raiser), AU7 (lid tightener), AU12 (lip corner puller), and AU23 (lip tightener) showed significant differences between dynamic driving and static life scenarios. The presence of AU4, AU5, AU6, and AU12 was higher in static life scenarios, indicating that in dynamic driving scenarios these AUs were affected by the primary driving tasks, which suppress the facial expression of the driver's emotions. Meanwhile, the presence of AU7 and AU23 was higher in dynamic driving scenarios, which may be because Japanese culture suppresses the expression of negative emotions [60]. The logit regression results also show significant differences between dynamic driving and static life scenarios. For anger, only AU4 is significantly related to anger in dynamic driving scenarios, whereas in static life scenarios AU4, AU5, and AU23 are all significantly related to anger. For happiness, the logit regression results in dynamic driving scenarios show no significant correlation between AUs and happiness, whereas in static life scenarios AU12 is significantly related to happiness. These differences were most likely due to the primary driving tasks, which reduced the frequency and amplitude of facial muscle movements. Due to the limited amount of JAFFE data, these results may require further investigation.
VII Conclusion and Future Work
In this work, a dataset for the analysis of spontaneous driver emotions elicited by video-audio stimuli is presented. The dataset includes facial expression recordings of 60 participants during driving. After watching each of the three video-audio clips selected to elicit specific emotions, each participant completed the driving tasks in the same driving scenario and rated their emotional responses during the driving process in terms of dimensional and discrete emotion. These self-reported emotions include the scales of arousal, valence, and dominance, as well as emotion category and intensity. We selected the three video-audio clips using the SAM and DES scales, which ensured the effectiveness of these stimulus materials for the Chinese cultural background. In addition, we conducted classification experiments for the scales of arousal, valence, and dominance, as well as emotion category and intensity, to establish baseline results for the proposed dataset in terms of accuracy and F1 score; these results were significantly higher than random classification.
Moreover, we compared the classification results (accuracy and F1 score) of the DEFE dataset with the DEAP and CK+ datasets; the recognition results on the DEFE dataset were lower than those on the CK+ dataset. Furthermore, we examined the differences in facial expressions between driving and non-driving scenarios by comparing AU presence in the DEFE and JAFFE datasets. The results show significant differences in AU presence between driving and non-driving scenarios, and these differences affect facial emotion prediction, indicating that human emotional expressions in driving scenarios differ from those in other life scenarios. Therefore, publishing a human emotion dataset specifically for drivers is necessary for traffic safety improvement.
The DEFE dataset will be made publicly available after this work is published, allowing researchers to evaluate their algorithms on an off-the-shelf driver facial expression dataset and to investigate the possibility of applying them to practical applications. The DEFE dataset makes it possible to study emotion recognition from different emotion models simultaneously, and it can also be used to analyze the differences between driving and non-driving conditions. In addition, DEFE contains facial occlusions, such as glasses and hands, which increase the complexity of facial expression recognition and constitute a significant research challenge.
Acknowledgment
The authors would like to thank Peizhi Wang, Qianjing Hu, Mingqing Tang, Bingbing Zhang, Guanzhong Zeng and Mengna Liao for their assistance.
References
- [1] World Health Organization, Global status report on road safety 2015. World Health Organization, 2015.
- [2] L. James, Road rage and aggressive driving: Steering clear of highway warfare. Prometheus Books, 2000.
- [3] G. Li, W. Lai, X. Sui, X. Li, X. Qu, T. Zhang, and Y. Li, “Influence of traffic congestion on driver behavior in post-congestion driving,” Accident Analysis and Prevention, vol. 141, 2020.
- [4] G. Li, S. E. Li, R. Zou, Y. Liao, and B. Cheng, “Detection of road traffic participants using cost-effective arrayed ultrasonic sensors in low-speed traffic situations,” Mechanical Systems and Signal Processing, vol. 132, pp. 535–545, 2019.
- [5] F. Eyben, M. Wöllmer, T. Poitschke, B. Schuller, C. Blaschke, B. Färber, and N. Nguyen-Thien, “Emotion on the road—necessity, acceptance, and feasibility of affective computing in the car,” Advances in human-computer interaction, vol. 2010, 2010.
- [6] Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” IEEE transactions on pattern analysis and machine intelligence, vol. 31, no. 1, pp. 39–58, 2008.
- [7] P. Ekman, W. V. Friesen, M. O’sullivan, A. Chan, I. Diacoyanni-Tarlatzis, K. Heider, R. Krause, W. A. LeCompte, T. Pitcairn, P. E. Ricci-Bitti, et al., “Universals and cultural differences in the judgments of facial expressions of emotion.,” Journal of personality and social psychology, vol. 53, no. 4, p. 712, 1987.
- [8] W. G. Parrott, Emotions in social psychology: Essential readings. Psychology Press, 2001.
- [9] P. Ekman and W. V. Friesen, “Constants across cultures in the face and emotion.,” Journal of personality and social psychology, vol. 17, no. 2, p. 124, 1971.
- [10] J. A. Russell, “A circumplex model of affect.,” Journal of personality and social psychology, vol. 39, no. 6, p. 1161, 1980.
- [11] A. Mehrabian, “Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament,” Current Psychology, vol. 14, no. 4, pp. 261–292, 1996.
- [12] J. J. Gross and R. W. Levenson, “Emotion elicitation using films,” Cognition & emotion, vol. 9, no. 1, pp. 87–108, 1995.
- [13] M. M. Bradley and P. J. Lang, “Measuring emotion: the self-assessment manikin and the semantic differential,” Journal of behavior therapy and experimental psychiatry, vol. 25, no. 1, pp. 49–59, 1994.
- [14] M. J. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, and J. Budynek, “The japanese female facial expression (jaffe) database,” in Proceedings of third international conference on automatic face and gesture recognition, pp. 14–16, 1998.
- [15] D. Lundqvist, A. Flykt, and A. Öhman, “The karolinska directed emotional faces (kdef),” CD ROM from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet, vol. 91, no. 630, pp. 2–2, 1998.
- [16] M. Pantic, M. Valstar, R. Rademaker, and L. Maat, “Web-based database for facial expression analysis,” in 2005 IEEE international conference on multimedia and Expo, pp. 5–pp, IEEE, 2005.
- [17] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato, “A 3d facial expression database for facial behavior research,” in 7th international conference on automatic face and gesture recognition (FGR06), pp. 211–216, IEEE, 2006.
- [18] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, “Multi-pie,” Image and Vision Computing, vol. 28, no. 5, pp. 807–813, 2010.
- [19] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, “The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression,” in 2010 ieee computer society conference on computer vision and pattern recognition-workshops, pp. 94–101, IEEE, 2010.
- [20] O. Langner, R. Dotsch, G. Bijlstra, D. H. Wigboldus, S. T. Hawk, and A. Van Knippenberg, “Presentation and validation of the radboud faces database,” Cognition and emotion, vol. 24, no. 8, pp. 1377–1388, 2010.
- [21] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, “Deap: A database for emotion analysis; using physiological signals,” IEEE transactions on affective computing, vol. 3, no. 1, pp. 18–31, 2011.
- [22] I. Sneddon, M. McRorie, G. McKeown, and J. Hanratty, “The belfast induced natural emotion database,” IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 32–41, 2011.
- [23] S. M. Mavadati, M. H. Mahoor, K. Bartlett, P. Trinh, and J. F. Cohn, “Disfa: A spontaneous facial action intensity database,” IEEE Transactions on Affective Computing, vol. 4, no. 2, pp. 151–160, 2013.
- [24] F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne, “Introducing the recola multimodal corpus of remote collaborative and affective interactions,” in 2013 10th IEEE international conference and workshops on automatic face and gesture recognition (FG), pp. 1–8, IEEE, 2013.
- [25] S. Du, Y. Tao, and A. M. Martinez, “Compound facial expressions of emotion,” Proceedings of the National Academy of Sciences, vol. 111, no. 15, pp. E1454–E1462, 2014.
- [26] X. Zhang, L. Yin, J. F. Cohn, S. Canavan, M. Reale, A. Horowitz, P. Liu, and J. M. Girard, “Bp4d-spontaneous: a high-resolution spontaneous 3d dynamic facial expression database,” Image and Vision Computing, vol. 32, no. 10, pp. 692–706, 2014.
- [27] S. Happy, P. Patnaik, A. Routray, and R. Guha, “The indian spontaneous expression database for emotion recognition,” IEEE Transactions on Affective Computing, vol. 8, no. 1, pp. 131–142, 2015.
- [28] E. Barsoum, C. Zhang, C. C. Ferrer, and Z. Zhang, “Training deep networks for facial expression recognition with crowd-sourced label distribution,” in Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 279–283, 2016.
- [29] C. Fabian Benitez-Quiroz, R. Srinivasan, and A. M. Martinez, “Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5562–5570, 2016.
- [30] S. Zafeiriou, D. Kollias, M. A. Nicolaou, A. Papaioannou, G. Zhao, and I. Kotsia, “Aff-wild: Valence and arousal ‘in-the-wild’ challenge,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 34–41, 2017.
- [31] S. R. Livingstone and F. A. Russo, “The ryerson audio-visual database of emotional speech and song (ravdess): A dynamic, multimodal set of facial and vocal expressions in north american english,” PloS one, vol. 13, no. 5, 2018.
- [32] A. Mollahosseini, B. Hasani, and M. H. Mahoor, “Affectnet: A database for facial expression, valence, and arousal computing in the wild,” IEEE Transactions on Affective Computing, vol. 10, no. 1, pp. 18–31, 2017.
- [33] X. Wang, Y. Liu, F. Wang, J. Wang, L. Liu, and J. Wang, “Feature extraction and dynamic identification of drivers’ emotions,” Transportation research part F: traffic psychology and behaviour, vol. 62, pp. 175–191, 2019.
- [34] H. Gao, A. Yüce, and J.-P. Thiran, “Detecting emotional stress from facial expressions for driving safety,” in 2014 IEEE International Conference on Image Processing (ICIP), pp. 5961–5965, IEEE, 2014.
- [35] G. Li, S. E. Li, B. Cheng, and P. Green, “Estimation of driving style in naturalistic highway traffic using maneuver transition probabilities,” Transportation Research Part C: Emerging Technologies, vol. 74, pp. 113–125, 2017.
- [36] P. Wan, C. Wu, Y. Lin, and X. Ma, “On-road experimental study on driving anger identification model based on physiological features by roc curve analysis,” IET Intelligent Transport Systems, vol. 11, no. 5, pp. 290–298, 2017.
- [37] B. G. Lee, T. W. Chong, B. L. Lee, H. J. Park, Y. N. Kim, and B. Kim, “Wearable mobile-based emotional response-monitoring system for drivers,” IEEE Transactions on Human-Machine Systems, vol. 47, no. 5, pp. 636–649, 2017.
- [38] L. Malta, C. Miyajima, N. Kitaoka, and K. Takeda, “Analysis of real-world driver’s frustration,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 1, pp. 109–118, 2010.
- [39] C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan, “Analysis of emotion recognition using facial expressions, speech and multimodal information,” in Proceedings of the 6th international conference on Multimodal interfaces, pp. 205–211, 2004.
- [40] L. Yang, I. O. Ertugrul, J. F. Cohn, Z. Hammal, D. Jiang, and H. Sahli, “Facs3d-net: 3d convolution based spatiotemporal representation for action unit detection,” in 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 538–544, IEEE, 2019.
- [41] J. A. Groeger, Understanding driving: Applying cognitive psychology to a complex everyday task. Psychology Press, 2000.
- [42] G. Li, Y. Wang, F. Zhu, X. Sui, N. Wang, X. Qu, and P. Green, “Drivers’ visual scanning behavior at signalized and unsignalized intersections: A naturalistic driving study in china,” Journal of safety research, vol. 71, pp. 219–229, 2019.
- [43] T. Lajunen, D. Parker, and H. Summala, “The manchester driver behaviour questionnaire: a cross-cultural study,” Accident Analysis & Prevention, vol. 36, no. 2, pp. 231–238, 2004.
- [44] T. Brosch, K. R. Scherer, D. M. Grandjean, and D. Sander, “The impact of emotion on perception, attention, memory, and decision-making,” Swiss medical weekly, vol. 143, p. w13786, 2013.
- [45] P. J. Lang, M. M. Bradley, B. N. Cuthbert, et al., “International affective picture system (iaps): Technical manual and affective ratings,” NIMH Center for the Study of Emotion and Attention, vol. 1, pp. 39–58, 1997.
- [46] M. M. Bradley and P. J. Lang, “The international affective digitized sounds (IADS-2): Affective ratings of sounds and instruction manual,” University of Florida, Gainesville, FL, Tech. Rep. B-3, 2007.
- [47] A. Schaefer, F. Nils, X. Sanchez, and P. Philippot, “Assessing the effectiveness of a large database of emotion-eliciting films: A new tool for emotion researchers,” Cognition and Emotion, vol. 24, no. 7, pp. 1153–1172, 2010.
- [48] D. O. Bos et al., “Eeg-based emotion recognition,” The Influence of Visual and Auditory Stimuli, vol. 56, no. 3, pp. 1–17, 2006.
- [49] R. W. Levenson, L. L. Carstensen, W. V. Friesen, and P. Ekman, “Emotion, physiology, and expression in old age.,” Psychology and aging, vol. 6, no. 1, p. 28, 1991.
- [50] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1499–1503, 2016.
- [51] L. A. Jeni, J. F. Cohn, and F. De La Torre, “Facing imbalanced data–recommendations for the use of performance metrics,” in 2013 Humaine association conference on affective computing and intelligent interaction, pp. 245–251, IEEE, 2013.
- [52] O. Arriaga, M. Valdenegro-Toro, and P. Plöger, “Real-time convolutional neural networks for emotion and gender classification,” arXiv preprint arXiv:1710.07557, 2017.
- [53] C. Pramerdorfer and M. Kampel, “Facial expression recognition using convolutional neural networks: state of the art,” arXiv preprint arXiv:1612.02903, 2016.
- [54] S. Li and W. Deng, “Deep facial expression recognition: A survey,” arXiv preprint arXiv:1804.08348, 2018.
- [55] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- [56] R. E. Jack, O. G. B. Garrod, H. Yu, R. Caldara, and P. G. Schyns, “Facial expressions of emotion are not culturally universal,” Proceedings of the National Academy of Sciences, vol. 109, no. 19, pp. 7241–7244, 2012.
- [57] L. F. Barrett, R. Adolphs, S. Marsella, A. M. Martinez, and S. D. Pollak, “Emotional expressions reconsidered: challenges to inferring emotion from human facial movements,” Psychological Science in the Public Interest, vol. 20, no. 1, pp. 1–68, 2019.
- [58] P. Ekman, W. V. Friesen, and J. C. Hager, Facial action coding system: the manual. Salt Lake City, Utah: Research Nexus, 2002.
- [59] T. Baltrušaitis, P. Robinson, and L.-P. Morency, “Openface: an open source facial behavior analysis toolkit,” in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10, IEEE, 2016.
- [60] D. Matsumoto and P. Ekman, “American-japanese cultural differences in intensity ratings of facial expressions of emotion,” Motivation and Emotion, vol. 13, no. 2, pp. 143–157, 1989.
Wenbo Li received the B.S. and M.Sc. degrees in automotive engineering from Chongqing University, Chongqing, China, in 2014 and 2017, respectively. He is currently working toward the Ph.D. degree with the Advanced Manufacturing and Information Technology Laboratory, Department of Automotive Engineering, Chongqing University, Chongqing, China. He is also a visiting Ph.D. student at the Waterloo Cognitive Autonomous Driving (CogDrive) Lab at the University of Waterloo, Canada. His research interests include intelligent vehicles, human emotion, driver emotion detection, emotion regulation, human-machine interaction, and brain-computer interfaces.
Yaodong Cui received the B.S. degree in automation from Chang'an University, Xi'an, China, in 2017, and the M.Sc. degree in Systems, Control and Signal Processing from the University of Southampton, Southampton, UK, in 2019. He is currently working toward the Ph.D. degree with the Waterloo Cognitive Autonomous Driving (CogDrive) Lab, Department of Mechanical Engineering, University of Waterloo, Waterloo, Canada. His research interests include sensor fusion, perception for intelligent vehicles, and driver emotion detection.
Yintao Ma received the B.Sc. degree in Engineering Mechanics from the University of Illinois at Urbana-Champaign, USA, in 2018. She is currently working toward the M.Sc. degree with the Cognitive Autonomous Driving Laboratory, Department of Mechanical and Mechatronics Engineering, University of Waterloo, ON, Canada. Her research interests include machine learning, image processing, and facial expression recognition.
Xingxin Chen received the B.Sc. degree from Nanjing University, Nanjing, China, in 2018. He is a Master of Applied Science (MASc) student in the Waterloo Cognitive Autonomous Driving (CogDrive) Laboratory, Department of Mechanical and Mechatronics Engineering, University of Waterloo, Canada. His research interests include domain adaptation, transfer learning, and computer vision.
Guofa Li (M’18) received the Ph.D. degree in Mechanical Engineering from Tsinghua University, Beijing, China, in 2016. He is currently an Assistant Professor in mechanical engineering and automation with the College of Mechatronics and Control Engineering, Shenzhen University, Guangdong, China. His research interests include driving safety in autonomous vehicles, driver behavior and decision making, computer vision, machine learning, and human factors in automotive and transportation engineering. He is the recipient of the Young Elite Scientists Sponsorship Program by SAE-China (2018), the Excellent Young Engineer Innovation Award from SAE-China (2017), and the NSK Sino-Japan Outstanding Paper Prize from NSK Ltd. (2014).
Gang Guo received the B.S., M.S., and Ph.D. degrees in automotive engineering from Chongqing University, Chongqing, China, in 1982, 1984, and 1994, respectively. He is currently the Chair and a Professor at the Department of Automotive Engineering, Chongqing University. He also serves as the Associate Director of the Chongqing Automotive Collaborative Innovation Center. He has authored and co-authored over 100 refereed journal and conference publications. His research interests include intelligent vehicles, multi-sense perception, human-machine interaction, brain-computer interfaces, intelligent manufacturing, and user experience. Dr. Guo is a senior member of the China Mechanical Engineering Society and the Director of the China Automotive Engineering Society. He is also a member of the China User Experience Alliance Committee.
Dongpu Cao (M’08) received the Ph.D. degree from Concordia University, Canada, in 2008. He is the Canada Research Chair in Driver Cognition and Automated Driving, and currently an Associate Professor and Director of the Waterloo Cognitive Autonomous Driving (CogDrive) Lab at the University of Waterloo, Canada. His current research focuses on driver cognition, automated driving and cognitive autonomous driving. He has contributed more than 200 papers and 3 books. He received the SAE Arch T. Colwell Merit Award in 2012, and three Best Paper Awards from the ASME and IEEE conferences. Dr. Cao serves as an Associate Editor for IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, IEEE/ASME TRANSACTIONS ON MECHATRONICS, IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, IEEE/CAA JOURNAL OF AUTOMATICA SINICA and ASME JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT AND CONTROL. He was a Guest Editor for VEHICLE SYSTEM DYNAMICS and IEEE TRANSACTIONS ON SMC: SYSTEMS. He serves on the SAE Vehicle Dynamics Standards Committee and acts as the Co-Chair of the IEEE ITSS Technical Committee on Cooperative Driving.