
Measuring Happiness Around the World Through Artificial Intelligence

Rustem Ozakar
Department of Computer Engineering
Erzurum Technical University
Erzurum, Turkey
[email protected]

Rafet Efe Gazanfer
Department of Computer Engineering
Erzurum Technical University
Erzurum, Turkey
[email protected]

Y. Sinan Hanay
Department of Computer Engineering
Erzurum Technical University
Erzurum, Turkey
[email protected]
Abstract

In this work, we analyze the happiness levels of countries using an unbiased emotion detector: artificial intelligence (AI). To date, researchers have proposed many factors that may affect happiness, such as wealth, health and safety. Even though these factors all seem relevant, there is no clear consensus among sociologists on how to interpret them, and the models used to estimate the value of these utilities rest on assumptions. Researchers in the social sciences have worked on determining the happiness levels of societies and exploring the factors correlated with them through polls and various statistical methods. In our work, we use artificial intelligence to introduce a different and relatively unbiased approach to this problem. By using AI, we make no assumption about what makes a person happy, and leave it to the AI to detect emotions from the faces of people collected from publicly available street footage. We analyzed the happiness levels of eight cities around the world through footage available on the Internet and found no statistically significant difference between the cities in terms of happiness.

Index Terms:
human happiness, artificial intelligence, machine learning, facial emotion recognition, happiness index

I Introduction

Emotions are a distinctive part of human nature. They are an essential element of life and shape our interactions with other people. Our intelligence attaches great importance to how we feel and distinguishes between different emotions. It is not possible to define or measure emotions exactly, since they are abstract and subjective. However, social scientists have used various indicators of how one feels and what affects one's feelings. Thus, measuring happiness is possible to some extent.

Happiness is a complex, multidimensional phenomenon [1]. It is a driving force and a natural goal for most people. Researchers in the social sciences have developed various methods for measuring happiness and for identifying the variables that affect it.

Analyzing the happiness levels or life satisfaction of countries has attracted wide interest in the social sciences and the media. Social scientists generally use polls to measure the happiness level of a society. Acquiring accurate measurements of happiness levels is a goal shared by several research areas. Machine learning can help with this problem by adding an unbiased approach to measuring happiness, giving researchers in the social sciences an extra dimension. With this in mind, our work examines the emotions in facial images collected from publicly available street footage of eight cities around the world. Facial expressions from raw city footage are categorized into seven basic emotions: Anger, Sad, Neutral, Disgust, Surprise, Fear and Happy.

Our approach is fundamentally different from the indicators used by social scientists. It is relatively unbiased because it relies on machine learning to determine people's emotions. Machine learning has been a powerful tool for solving problems in various domains such as medicine, economics and robotics, and it can accurately detect emotions by analyzing facial images.

The organization of the paper is as follows: related work in this area is discussed in Section II, Section III describes the proposed method in detail, results are given in Section IV and Section V concludes the paper.

II Related Work

The factors that affect happiness have been studied extensively. Veenhoven and Ehrhardt [2] investigated whether the available data on happiness are in accordance with three theories of happiness: comparison, folklore and livability. An earlier study concluded that people's happiness is adversely affected when their neighboring countries become wealthier [3]. Another study [1] investigated whether social capital has a significant effect on happiness. In [4], researchers revisited the Easterlin Paradox, the idea that happiness is proportional to income, which had previously been considered valid only for developed countries; they concluded that the paradox also holds for less developed countries. In [5], the effects of different variables (migration, health issues, income, etc.) on happiness are investigated at both national and global scales. HSBC's expat survey [6] measured the satisfaction of expats in various countries across different aspects of life.

There are various approaches to emotion recognition in the literature, ranging from examining brain signals to hybrid approaches that analyze audio and text input together with videos. Machine learning algorithms such as Support Vector Machines (SVM) [7], Neural Networks and k-Nearest Neighbors (k-NN) can be used to predict emotions from face images.

Some of the notable works in this field are as follows: Ekman and Friesen [8] developed FACS (the Facial Action Coding System), in which basic universal facial expressions are represented as combinations of different action units. Black and Yacoob [9] used polynomials to represent optical flow and extract the motion of facial features, then applied a rule-based classifier to recognize basic emotions. Yacoob and Davis [10] used the optical flow of detected facial features within rectangular regions, combined with a rule-based classifier, for recognition. Essa and Pentland [11] transformed face images into mesh models and calculated optical flow from them to recognize facial expressions. Donato et al. [12] compared different techniques for extracting and classifying facial expressions. Cowie et al. [13] published a survey on emotion recognition containing detailed information about both audio and visual approaches. Cohen et al. [14] used Naive Bayes classifiers with a Cauchy distribution together with HMMs for facial expression recognition. Ioannou et al. [15] used a combination of SVMs, morphological operators, neural networks and a neuro-fuzzy system to classify facial expressions. Gunes and Piccardi [16] combined face and hand gestures to determine emotional expression. Mansoorizadeh and Charkari [17] introduced a hybrid feature space built from features extracted from speech and video and performed emotion recognition on it. Dahmane and Meunier [18] used HOG features and SVMs to classify emotions in videos containing faces. Halder et al. [19] used different combinations of fuzzy sets to classify emotions from faces. Poria et al. [20] used Convolutional (CNN) and Recurrent Neural Networks (RNN) with Multiple Kernel Learning to analyze sentiment from audio, video and text. Zhang et al. [21] introduced a fusion approach using a Part-based Hierarchical Bidirectional RNN (PHRNN) and a Multi-Signal CNN (MSCNN) on images to recognize emotions. Jain et al. [22] also proposed a CNN-RNN hybrid model on face images for recognition. Hossain and Muhammad [23] worked on audio and image inputs together, using CNNs and SVMs for emotion recognition.

III Methodology

In our work, we present an unbiased emotion detector built with machine learning, using publicly available videos of crowded streets in various countries. We aimed to gather unbiased data from different cities around the world in order to analyze the psychology of the general population. Footage from the different cities is converted into frames that serve as our dataset. All methods described here are implemented in Python.

To recognize facial emotions, we first need to detect the faces in each frame of the dataset. For this purpose, we used Adam Geitgey's [24] face recognition model, which locates faces in a frame so that the face regions can be cropped for further processing.
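A minimal sketch of this detection step using the face_recognition library is shown below; the frame path is a placeholder, and the cropping logic is our own illustration rather than code taken from [24].

```python
import face_recognition

# Load one extracted video frame ("frame.jpg" is a placeholder path).
frame = face_recognition.load_image_file("frame.jpg")

# model="cnn" detects faces from different angles; model="hog" is faster
# but works best on frontal faces (see Section III-B).
locations = face_recognition.face_locations(frame, model="cnn")

# Each location is a (top, right, bottom, left) tuple; crop each face region.
faces = [frame[top:bottom, left:right] for (top, right, bottom, left) in locations]
```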

For each city, we traverse all frames; each detected face is cropped, converted to grayscale and resized to 48x48 pixels using the OpenCV library. These images are then fed as inputs to the pre-trained deep learning model of Priya Dwivedi [25]. Each classification result belongs to one of seven emotion categories: Anger, Sad, Neutral, Disgust, Surprise, Fear or Happy. Finally, the classification results are exported to a CSV file organized by city, emotion and count.
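A sketch of this preprocessing and classification step, continuing from the faces list in the previous sketch, is given below. The model filename, the pixel scaling and the label order are assumptions on our part (the label order shown is the standard FER-2013 ordering); the actual trained model comes from [25].

```python
import csv
from collections import Counter

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# "emotion_model.h5" is a placeholder for the pre-trained model file from [25].
model = load_model("emotion_model.h5")

# Assumed label order (standard FER-2013 ordering).
EMOTIONS = ["Anger", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def classify_face(face_rgb):
    """Preprocess one cropped RGB face and return its predicted emotion."""
    gray = cv2.cvtColor(face_rgb, cv2.COLOR_RGB2GRAY)  # convert to grayscale
    resized = cv2.resize(gray, (48, 48))               # model expects 48x48 input
    x = resized.astype("float32") / 255.0              # scaling is an assumption
    x = x.reshape(1, 48, 48, 1)                        # batch of one grayscale image
    return EMOTIONS[int(np.argmax(model.predict(x)))]

# Aggregate per-city counts and append them to the CSV ("Barcelona" is illustrative).
counts = Counter(classify_face(face) for face in faces)
with open("results.csv", "a", newline="") as fh:
    writer = csv.writer(fh)
    for emotion, value in counts.items():
        writer.writerow(["Barcelona", emotion, value])
```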

III-A Dataset

Our dataset covers eight cities: Barcelona, Copenhagen, Istanbul, Kiev, London, New York, Paris and Tokyo. The footage was taken from various publicly available online videos containing crowded street scenes, dated between July 2013 and October 2020. Each city contributes 300 detected face images, each 48x48 pixels in grayscale, so the complete dataset has 2,400 images. We verified that the dataset contains no false face detections and no faces wearing sunglasses or regular glasses.

III-B Face Recognition

For face recognition, we used Adam Geitgey's [24] CNN-based face recognition library. It is a Python adaptation of the face recognition algorithm in Dlib [26], a popular C++ library that offers various machine learning algorithms. Unlike the HOG-based detector in the Dlib library, the CNN-based method can detect faces from different angles.

III-C Emotion Recognition

As previously mentioned, we used Priya Dwivedi's model [25] for emotion recognition. The model was trained on the Facial Expression Recognition 2013 (FER-2013) database, the work of Pierre-Luc Carrier and Aaron Courville [27]. This database contains 28,709 training face images in 48x48 pixel grayscale format.

A graphical representation of the CNN model, prepared with visualkeras [28], can be seen in Fig. 1. The activation function used in every layer is ReLU, except in the last layer, which uses softmax.

Figure 1: CNN architecture for emotion recognition
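As a rough illustration of a CNN with this general shape, a minimal Keras sketch follows. The layer counts and filter sizes here are our assumptions for illustration and do not reproduce the exact architecture of [25] shown in Fig. 1.

```python
from tensorflow.keras import layers, models

# Illustrative sketch only: layer counts and filter sizes are assumed, not
# taken from [25]. Input is a 48x48 grayscale face; output is a softmax over
# the seven emotion classes, with ReLU activations in all earlier layers.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```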

IV Experimental Results

Our results showed that the most common emotions detected by the algorithm were Surprise, Fear, Happy and Anger; Sad and Neutral were not detected in any of the cities. Detailed results covering all emotions are given in Table I. We notice that the Surprise emotion is very common. One explanation lies in the nature of the recordings (e.g. a youtuber filming the street): people might be surprised to notice a camera.

Fig. 2 shows 95% confidence intervals for the proportion of happy faces in each city. The intervals indicate that there is no statistically significant difference between the cities in terms of the proportion of happy people (a sketch of the interval computation is given after Fig. 2). There are several possible explanations. First, even though we collected street footage, the recordings were not made covertly, and this may have biased the observed emotions. Alternatively, the proportion of happy people on the street may indeed be similar across the world. The results are discussed further in the Conclusion.

TABLE I: Emotion results by city

City        Anger  Disgust  Surprise  Fear  Happy
Barcelona       2        0        90   203      5
Istanbul        1        1        64   227      7
Kiev            2        0        79   215      4
London          1        0       115   182      2
New York        1        0        61   230      8
Paris           2        0        31   267      0
Tokyo           3        0        32   260      5
Copenhagen      1        0        69   227      3

Figure 2: Happiness distribution by city
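For reference, a 95% confidence interval for a proportion can be computed with the normal-approximation (Wald) formula below; we do not state in Fig. 2 which estimator was used, so this is a reconstruction under that assumption. With $n = 300$ faces per city and $\hat{p}$ the observed fraction of happy faces,

$$\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad z_{0.975} \approx 1.96.$$

For example, New York has $\hat{p} = 8/300 \approx 0.027$, giving $0.027 \pm 1.96\sqrt{0.027 \cdot 0.973 / 300} \approx 0.027 \pm 0.018$, i.e. roughly 0.9% to 4.5% happy faces. (Where $\hat{p} = 0$, as for Paris, the Wald interval degenerates and an exact or Wilson interval would be preferable.)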

V Conclusion

In this work, we proposed using public footage as a resource for determining happiness levels in society. We collected footage from various cities around the world and then used artificial intelligence to recognize emotions from facial expressions.

As for why surprise and fear are prevalent in the results, we suspect several possible causes: direct sunlight may have made people outdoors grimace, pedestrians may have been rushing to get somewhere, or they may simply have been surprised upon noticing the camera. Another possible cause is that facial structures differ across ethnic groups, which can lead to unintended classifications. The issue could also lie in the neural network architecture or the learning method itself; the model may need training on additional datasets or an updated architecture.

As a future direction, further work is needed to create a dataset specialized for the proposed idea, and different machine learning algorithms need to be explored in terms of performance and accuracy.

References

  • [1] R. Ram, “Social capital and happiness: Additional cross-country evidence,” Journal of Happiness Studies, vol. 11, no. 4, pp. 409–418, 2010.
  • [2] R. Veenhoven and J. Ehrhardt, “The cross-national pattern of happiness: Test of predictions implied in three theories of happiness,” Social indicators research, vol. 34, no. 1, pp. 33–68, 1995.
  • [3] L. Becchetti, S. Castriota, L. Corrado, and E. G. Ricca, “Beyond the joneses: Inter-country income comparisons and happiness,” The Journal of Socio-Economics, vol. 45, pp. 187–195, 2013.
  • [4] R. A. Easterlin, L. A. McVey, M. Switek, O. Sawangfa, and J. S. Zweig, “The happiness–income paradox revisited,” Proceedings of the National Academy of Sciences, vol. 107, no. 52, pp. 22463–22468, 2010.
  • [5] J. D. Sachs, R. Layard, J. F. Helliwell et al., “World happiness report 2018,” Tech. Rep., 2018.
  • [6] HSBC, “Expat Explorer Survey - How Countries Compare,” https://www.expatexplorer.hsbc.com/survey/, 2019, [Online; accessed 16-11-2020].
  • [7] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory.   ACM Press, 1992, pp. 144–152.
  • [8] P. Ekman, W. V. Friesen, and S. Ancoli, “Facial signs of emotional experience,” Journal of Personality and Social Psychology, vol. 39, no. 6, p. 1125, 1980.
  • [9] M. J. Black and Y. Yacoob, “Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion,” in Proceedings of IEEE international conference on computer vision.   IEEE, 1995, pp. 374–381.
  • [10] Y. Yacoob and L. S. Davis, “Recognizing human facial expressions from long image sequences using optical flow,” IEEE Transactions on pattern analysis and machine intelligence, vol. 18, no. 6, pp. 636–642, 1996.
  • [11] I. A. Essa and A. P. Pentland, “Coding, analysis, interpretation, and recognition of facial expressions,” IEEE transactions on pattern analysis and machine intelligence, vol. 19, no. 7, pp. 757–763, 1997.
  • [12] G. Donato, M. S. Bartlett, J. C. Hager, P. Ekman, and T. J. Sejnowski, “Classifying facial actions,” IEEE Transactions on pattern analysis and machine intelligence, vol. 21, no. 10, pp. 974–989, 1999.
  • [13] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor, “Emotion recognition in human-computer interaction,” IEEE Signal processing magazine, vol. 18, no. 1, pp. 32–80, 2001.
  • [14] I. Cohen, N. Sebe, A. Garg, L. S. Chen, and T. S. Huang, “Facial expression recognition from video sequences: temporal and static modeling,” Computer Vision and image understanding, vol. 91, no. 1-2, pp. 160–187, 2003.
  • [15] S. V. Ioannou, A. T. Raouzaiou, V. A. Tzouvaras, T. P. Mailis, K. C. Karpouzis, and S. D. Kollias, “Emotion recognition through facial expression analysis based on a neurofuzzy network,” Neural Networks, vol. 18, no. 4, pp. 423–435, 2005.
  • [16] H. Gunes and M. Piccardi, “Bi-modal emotion recognition from expressive face and body gestures,” Journal of Network and Computer Applications, vol. 30, no. 4, pp. 1334–1345, 2007.
  • [17] M. Mansoorizadeh and N. M. Charkari, “Multimodal information fusion application to human emotion recognition from face and speech,” Multimedia Tools and Applications, vol. 49, no. 2, pp. 277–297, 2010.
  • [18] M. Dahmane and J. Meunier, “Emotion recognition using dynamic grid-based hog features,” in Face and Gesture 2011.   IEEE, 2011, pp. 884–888.
  • [19] A. Halder, A. Konar, R. Mandal, A. Chakraborty, P. Bhowmik, N. R. Pal, and A. K. Nagar, “General and interval type-2 fuzzy face-space approach to emotion recognition,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, no. 3, pp. 587–605, 2013.
  • [20] S. Poria, I. Chaturvedi, E. Cambria, and A. Hussain, “Convolutional mkl based multimodal emotion recognition and sentiment analysis,” in 2016 IEEE 16th international conference on data mining (ICDM).   IEEE, 2016, pp. 439–448.
  • [21] Y.-D. Zhang, Z.-J. Yang, H.-M. Lu, X.-X. Zhou, P. Phillips, Q.-M. Liu, and S.-H. Wang, “Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation,” IEEE Access, vol. 4, pp. 8375–8385, 2016.
  • [22] N. Jain, S. Kumar, A. Kumar, P. Shamsolmoali, and M. Zareapoor, “Hybrid deep neural networks for face emotion recognition,” Pattern Recognition Letters, vol. 115, pp. 101–106, 2018.
  • [23] M. S. Hossain and G. Muhammad, “Emotion recognition using deep learning approach from audio–visual emotional big data,” Information Fusion, vol. 49, pp. 69–78, 2019.
  • [24] A. Geitgey, “Face recognition,” https://github.com/ageitgey/face_recognition, 2017, [Online; accessed 13-11-2020].
  • [25] P. Dwivedi, “Face and Emotion Detection,” https://github.com/priya-dwivedi/face_and_emotion_detection/, 2019, [Online; accessed 13-11-2020].
  • [26] D. E. King, “Dlib-ml: A machine learning toolkit,” The Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
  • [27] I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee et al., “Challenges in representation learning: A report on three machine learning contests,” in International conference on neural information processing.   Springer, 2013, pp. 117–124.
  • [28] P. Gavrikov, “visualkeras,” https://pypi.org/project/visualkeras/, 2020, [Online; accessed 21-11-2020].