An effective facial expression recognition approach for intelligent game systems


Int. J. Computational Vision and Robotics, Vol. 6, No. 3, 2016

Nhan Thi Cao
School of Media, Soongsil University, 511, Sangdo-Dong, Dongjak-Gu, Seoul, 156-743, Korea
Email: ctnhen@yahoo.com

An Hoa Ton-That
University of Information Technology, Vietnam National University, Km 20, Hanoi Highway, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam
Email: an_tth@yahoo.com

Hyung-Il Choi*
School of Media, Soongsil University, 511, Sangdo-Dong, Dongjak-Gu, Seoul, 156-743, Korea
Email: hic@ssu.ac.kr
*Corresponding author

Abstract: This paper presents a novel facial expression recognition approach based on an improved model of the completed local binary pattern and support vector machine classification, proposing a method applicable to intelligent game applications as well as intelligent communication systems. Capturing the emotions of players can be applied in interactive games for various purposes, such as transferring a player's emotions to his or her avatar, or activating suitable actions to communicate with players in order to obtain a positive attitude from the players in educational games. Our experiments on two databases, JAFFE (213 images) and CK (2,040 images), show the effectiveness of the proposed method in comparison with some other methods. The recognition accuracy rate is 96.28% on the JAFFE database and 99.85% on the CK database. The advantages of this technique are simplicity, speed and high accuracy.

Keywords: facial expression recognition; completed local binary pattern; CLBP; intelligence game systems; support vector machine; SVM.

Reference to this paper should be made as follows: Cao, N.T., Ton-That, A.H. and Choi, H-I. (2016) 'An effective facial expression recognition approach for intelligent game systems', Int. J. Computational Vision and Robotics, Vol. 6, No. 3, pp.223–234.

Copyright © 2016 Inderscience Enterprises Ltd.

Biographical notes: Nhan Thi Cao is a PhD candidate at the Computer Vision Lab in the School of Media at Soongsil University. She received her BS (1998) in Information Technology from Dalat University and her MS (2004) in Computer Science from the University of Natural Science, Vietnam National University, Ho Chi Minh City.

An Hoa Ton-That is with the Computer Science Department at the University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam. His research interests include computer vision, pattern recognition, fuzzy systems and artificial intelligence. He received his BS (2005) in Information Technology, his MS (2009) in Computer Science from Vietnam National University, Ho Chi Minh City, Vietnam, and his PhD (2014) in Computer Science from Soongsil University, Korea.

Hyung-Il Choi is a Professor in the School of Media at Soongsil University. His research interests include computer vision, pattern recognition, and artificial intelligence. He received his BS (1979) in Electronic Engineering from Yonsei University, and his MS (1983) and PhD (1987) in Electrical Engineering and Computer Science from the University of Michigan.

This paper is a revised and expanded version of a paper entitled 'A facial expression recognition method for intelligent game applications' presented at the Serious Games & Social Connect Community Conference and the International Symposium on Simulation & Serious Games 2014, Kintex Convention Center, South Korea, 23–24 May 2014.

1 Introduction

In recent years, with the development of intelligent communication systems, data-driven animation and intelligent game applications, facial expression recognition has attracted much attention, as in Ahsan et al. (2013), Cao et al. (2013), Liao et al. (2006), Priya and Banu (2012), Shan et al. (2005, 2009) and Zhao and Zhang (2012), for example. In this paper, we propose a novel method for recognising facial expressions based on an improvement of a completed modelling of local
binary pattern. Our experiments on both the Japanese Female Facial Expression (JAFFE) database, as in Lyons et al. (1999), and the Cohn-Kanade (CK) database, as in Kanade et al. (2000) and Lucey et al. (2010), show the effectiveness of the proposed method. The accuracy rate obtained is high compared with several other methods on both databases with seven classes of facial expression.

In intelligent games, emotion recognition of players through facial expression recognition can be used in many ways. For example, in interactive and multiplayer games, the emotions of players can be transferred to the players' avatars on the screen. Or, in educational games, recognising players' emotions can help the system behave in a better manner: if the player is sleepy, the system may wake him/her up; or if the player is happy after doing something well, the system may cheer him/her on, and so on. Thus, if facial expression recognition of players is applied, intelligent game systems can become more interactive, vivid and attractive.

The rest of the paper is organised as follows: in Section 2, the face region cropping is described; Section 3 presents the completed local binary pattern (CLBP) for facial expression recognition; in Section 4, experiments and results are shown; finally, in Section 5, the conclusions are given.

2 Face region cropping

Face image preprocessing is a process to attain normalised face images from input face images obtained from a camera or a database. The normalised face images are used for extracting facial expression features. This process can be divided into two steps: a basic step and an enhancement step. The basic step is to detect the face region of an input face image and eliminate redundant regions; this step can be carried out manually or by a real-time face detector. The enhancement step is to optimise the face region for extracting facial expression features; this step can be performed by cropping methods, image normalisation or image filtering processes. Then the face images are rescaled and used for feature extraction. Figure 1 shows the process of face image preprocessing.

Figure 1: The process of face image preprocessing (database/camera → input face images → basic processing → enhancement processing → feature extraction).

In this paper, the image preprocessing is implemented as in Cao et al. (2013). It includes two preprocessing steps: a basic process and an enhancement process. Normally, human face images from a camera or a database contain much redundant information, e.g., background or non-face regions. So, to detect the face region in a face image, the robust real-time face detection algorithm developed by Viola and Jones (2004) is applied. However, the face images obtained still contain some redundant areas that can affect the recognition accuracy and processing speed, so in the enhancement step a cropping technique is used, as in Figure 2.

Figure 2: Face region cropped by the cropping method (diagram: origin O(0, 0) at the top-left corner of the detected face image of width w1 and height h; the crop square S of side w2 is placed at P(x, y) with y = h/6 and x = (w1 − w2)/2).

The cropping method can be described as follows:

• First, the size of the square S used for cropping the human face in images is determined. The side w2 of square S will be equal to the width of the human face. The size of square S depends on each database, even on each image. However, based on results of testing some databases with the image preprocessing method as in Cao et al. (2013), the width of the human face accounts for 75% to 85% of the width of the face images obtained from the robust real-time face detector. It means that values of w2 are counted as experimental parameters.

• The next step is to determine the coordinate P(x, y), measured from the top-left corner of the image, at which to crop the square S. Let O(0, 0) be the coordinate at the top-left corner of the human face image obtained from the robust real-time face detector, h the height of the face image, w1 the width of the face image and w2 is the width
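of the square S.

The coordinate computation described here can be sketched in a few lines. This is an illustration under our own naming (not the authors' code), using the 80% width ratio applied later in the experiments.

```python
# Illustrative sketch (our naming, not the authors' code) of the crop
# placement: the square S has side w2, a chosen fraction of the detected
# face width w1 (75-85% in the paper; 80% assumed here), and is cropped
# at P(x, y) with y = h/6 and x = (w1 - w2) / 2.
def crop_square(w1, h, ratio=0.80):
    w2 = int(round(ratio * w1))  # side of square S
    x = (w1 - w2) // 2           # horizontally centred crop
    y = h // 6                   # trims 2/3 of the forehead region
    return x, y, w2

# e.g., a 200 x 240 detected face image:
x, y, w2 = crop_square(200, 240)
# the region to keep would then be image[y:y+w2, x:x+w2]
```

To recap, w2 is the width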
of the square S. So, the coordinates are y = h/6 and x = (w1 − w2)/2. The expression y = h/6 is based on face images having a neutral facial expression. Normally, the forehead region occupies one-fourth of the human face height; thus, the forehead occupies a not-small region of the face, but it does not contain much essential facial expression information. For this reason, two-thirds (2/3) of the upper forehead region is trimmed and one-third (1/3) of the lower forehead region, from the eyebrows, is retained. Finally, the human face image obtained from the robust real-time face detector is cropped by square S at coordinate P(x, y). Figure 3 shows the cropping technique applied to a face image.

Figure 3: Face region cropped by the cropping technique (the small square) within the face region detected by the robust real-time face detector (the large square) (see online version for colours).

This cropping method aims at reducing processing time in the feature extraction and facial expression recognition steps and, most importantly, at improving the facial expression recognition rate. It is suitable for real-time systems such as intelligent human-machine systems or intelligent game applications.

3 The CLBP for facial expression recognition

3.1 Local binary pattern

The local binary pattern (LBP) operator was first introduced as a complementary measure for local image contrast, as in Ojala et al. (1996). A LBP code is computed for a pixel in an image by comparing it with its neighbours as in equation (1):

  LBP_{P,R} = Σ_{p=0}^{P−1} s(g_p − g_c) 2^p,  s(x) = { 1, x ≥ 0; 0, x < 0 }   (1)

where g_c is the grey value of the central pixel, g_p is the grey value of its neighbours, P is the total number of involved neighbours and R is the radius of the neighbourhood. Based on the operator, each pixel of an image is labelled by a LBP code. For facial expression recognition, the uniform LBP code is usually used. A LBP code is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular, as in Ojala et al. (2002). For example, 00000000, 00111000 and 11100001 are uniform patterns. A uniform LBP operator is denoted LBP_{P,R}^{u2}. A histogram of a labelled image f_k(x, y) can be defined as follows:

  H_i = Σ_{x,y} I(f_k(x, y) = i),  i = 0, …, n − 1   (2)

where n is the number of different labels produced by the LBP operator and

  I(A) = { 1, A is true; 0, A is false }   (3)

This histogram contains information about the distribution of local micro-patterns, e.g., spots, edges, corners or flat areas, over the whole image.

3.2 Local difference sign-magnitude transform

According to Guo et al. (2010), given a central pixel g_c and its P circularly and evenly spaced neighbours g_p, p = 0, 1, …, P − 1, the difference between g_c and g_p can be calculated as d_p = g_p − g_c. The local difference vector [d_0, …, d_{P−1}] describes the image local structure at g_c and can be decomposed into two components:

  d_p = s_p * m_p  with  s_p = sign(d_p),  m_p = |d_p|   (4)

where s_p = 1 if d_p ≥ 0 and s_p = −1 otherwise (coded as binary 1 and 0), and m_p is the magnitude of d_p. Equation (4) is called the local difference sign-magnitude transform; it transforms the local difference vector [d_0, …, d_{P−1}] into a sign vector [s_0, …, s_{P−1}] and a magnitude vector [m_0, …, m_{P−1}]. Figure 4 shows an example of the transformation.

Figure 4: An example of the transform: (a) a 3 × 3 sample block; (b) the local differences; (c) the sign component; (d) the magnitude component.

3.3 Completed LBP with CLBP_S and CLBP_M operators

The transformation shows that the original LBP uses only the sign vector to code the local pattern, because it is proved that d_p can be more accurately approximated by using the sign component s_p than by the magnitude component m_p. However, it is also found that the magnitude component may contribute additional discriminative information for pattern recognition if it is properly used. The sign component is the same
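as the original LBP operator defined in equation (1).

To make the decomposition concrete, the following sketch (our illustration, not the authors' code) computes the LBP code and the sign and magnitude vectors for a single 3 × 3 block; the clockwise neighbour ordering starting at the top-left is an assumption made for illustration.

```python
import numpy as np

# Illustrative sketch: LBP code and the local difference sign-magnitude
# transform for one 3x3 block, following equations (1) and (4).
# The sign bit is 1 when d_p >= 0, else 0 (the binary coding of -1).
def sign_magnitude(block):
    gc = int(block[1, 1])
    # clockwise neighbours starting at the top-left (an assumed ordering)
    neighbours = [block[0, 0], block[0, 1], block[0, 2], block[1, 2],
                  block[2, 2], block[2, 1], block[2, 0], block[1, 0]]
    d = [int(gp) - gc for gp in neighbours]        # local differences d_p
    s = [1 if dp >= 0 else 0 for dp in d]          # sign component (= LBP bits)
    m = [abs(dp) for dp in d]                      # magnitude component
    code = sum(sp << p for p, sp in enumerate(s))  # LBP_{8,1} code
    return code, s, m

block = np.array([[25, 48, 76],
                  [41, 32, 87],
                  [44, 55, 13]])
code, s, m = sign_magnitude(block)
# d_p = s_p * m_p holds when s_p is read as +/-1 (binary 0 codes -1)
```

As noted, the sign component is the same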
as the original LBP operator defined in equation (1). In CLBP, this component is denoted the CLBP_S operator, whereas the magnitude component consists of continuous values in place of the binary '1' and '0' values. To code this component in a format consistent with that of the sign component, so as to exploit their additional information, the magnitude component is denoted the CLBP_M operator and defined as in equation (5):

  CLBP_M_{P,R} = Σ_{p=0}^{P−1} t(m_p, c) 2^p,  t(x, c) = { 1, x ≥ c; 0, x < c }   (5)

where the threshold c is determined adaptively and set as the mean value of m_p over the whole image. As with the uniform LBP operator, the uniform CLBP_M_{P,R} operator is denoted CLBP_M_{P,R}^{u2}. The CLBP_S and CLBP_M operators have the same binary string format, so they can be used together for pattern recognition. In the proposed method, to form a CLBP descriptor, the histograms of the CLBP_S and CLBP_M codes of the image are calculated separately and then concatenated. This CLBP scheme can be represented as 'CLBP_S_M'.

3.4 Extracting CLBP features for facial expression recognition

In a facial expression recognition application, in order to represent the face efficiently, the extracted features should retain spatial information. For this reason, the face image can be divided into small regions before extracting features. Methods have been proposed for resizing and dividing the face images: for example, 110 × 150 pixels with the regions shown in Figure 5(a), as in Shan et al. (2005, 2009) and Zhao and Zhang (2012); 256 × 256 pixels with the regions shown in Figure 5(b), as in Ying et al. (2009); or 64 × 64 pixels with eight regions, shown in Figure 5(c), as in Liao et al. (2006).

Figure 5: Proposed methods for resolution and region division: (a), (b), (c).

After the face images are cropped, they are resized to a resolution of 64 × 64 pixels as in Cao et al. (2014); then the resized face images are divided into non-overlapping square regions for feature extraction. Next, the CLBP histogram, or CLBP feature, of each region is calculated, as in Figure 6. The CLBP features extracted from the regions are concatenated from left to right and top to bottom into a single feature vector of the face image.

Figure 6: Calculating the CLBP histogram for a face image.

3.5 Choosing an effective threshold for CLBP_M

Originally, CLBP was developed from LBP in order to obtain better results for texture classification, especially in the case of rotation-invariant texture classification. Both have recently been used effectively for facial expression recognition. The face image is divided into regions before extracting the feature vector in facial expression recognition. Since the textures of the regions in a face image differ, we tested several different thresholds c, as follows:

• the mean value of m_p over the whole image
• the mean value of m_p over the region
• the mean value of m_p over the CLBP_M_{P,R}^{u2} operator.

Experimental results on both the JAFFE and CK databases show that choosing the threshold as in the last case obtains the best accuracy rate in facial expression recognition.

4 Experiments and results

We applied the proposed method on two databases. The first is the JAFFE database, as in Lyons et al. (1999), which includes 213 grey images of ten Japanese female subjects. The original images from the database have a resolution of 256 × 256 pixels. In our experiments, we selected all 213 images as experiment samples. The second is the CK database, as in Kanade et al. (2000) and Lucey et al. (2010). The CK database consists of 100 university students aged from 18 to 30 years, of which 65% were female, 15% were African-American, and 3% were Asian or Latino. Subjects were instructed to perform a series of 23 facial displays, six of which were based on descriptions of basic emotions (anger, disgust, fear, joy, sadness, and surprise). Image sequences from neutral to target display were digitised
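into 640 × 490 pixel arrays with eight-bit precision for greyscale values.

The feature extraction pipeline of Sections 3.3–3.4 can be sketched as follows. This is our simplified illustration, not the authors' implementation: plain 256-bin codes are used instead of the uniform-pattern variant for brevity, and the global-mean threshold is only one of the three options tested in Section 3.5.

```python
import numpy as np

# Simplified CLBP_S_M descriptor sketch (our illustration): per region,
# build CLBP_S and CLBP_M histograms and concatenate them all.
def clbp_s_m_descriptor(img, region_size=16):
    h, w = img.shape
    gc = img[1:-1, 1:-1].astype(np.int32)
    # 8-neighbour differences d_p = g_p - g_c for interior pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    d = np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int32) - gc
                  for dy, dx in offsets])
    s_codes = sum(((d[p] >= 0).astype(np.int32)) << p for p in range(8))
    m = np.abs(d)
    c = m.mean()  # adaptive threshold: global mean of the magnitudes
    m_codes = sum(((m[p] >= c).astype(np.int32)) << p for p in range(8))
    feats = []
    for y in range(0, s_codes.shape[0] - region_size + 1, region_size):
        for x in range(0, s_codes.shape[1] - region_size + 1, region_size):
            for codes in (s_codes, m_codes):
                hist, _ = np.histogram(codes[y:y + region_size, x:x + region_size],
                                       bins=256, range=(0, 256))
                feats.append(hist)
    return np.concatenate(feats)

img = ((np.arange(64 * 64).reshape(64, 64) * 7) % 256).astype(np.uint8)
vec = clbp_s_m_descriptor(img)
```

The concatenated vector would then be fed to the SVM classifier described in the experiments.

As mentioned, image sequences from neutral to target display were digitised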
into 640 × 490 pixel arrays with eight-bit precision for greyscale values.

In the CK database, many subjects do not express all six primary emotions. For our experiments, we chose subjects who expressed at least three emotions (including the neutral state). So, 86 subjects (56 females and 30 males) from the database were selected. Each primary emotion of a subject includes six images with expression degrees from less to more, and the neutral emotion is selected from some of the first images of the sequences. In total, 2,040 images (234 anger, 276 disgust, 150 fear, 390 joy, 474 neutral, 156 sadness, and 360 surprise images) were selected for the experiments.

In the classification step, a support vector machine (SVM) classifier is applied, since many applications have confirmed that SVMs obtain high results for classifying facial expressions, as in Priya and Banu (2012) and Ahsan et al. (2013). We used the SVM functions with the radial basis function kernel of OpenCV 2.1. In order to choose optimal parameters, we carried out a grid-search approach as in Hsu et al. (2010). Three-fold cross-validation is applied for the experiments, on a C++ platform, for both databases. The confusion matrices of the JAFFE and CK databases are shown in Tables 1 and 2, respectively. In these experiments, the percentage w2/w1 of the cropped image in the preprocessing step is 80%, and the threshold c for the CLBP_M_{P,R}^{u2} operator is the mean value of m_p over the CLBP_M_{P,R}^{u2} operator.

Table 1: Confusion matrix of the JAFFE database at 80% of percentage w2/w1

            Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Neutral (%)  Sadness (%)  Surprise (%)
  Anger       96.67       0.00       0.00      0.00       0.00         3.33         0.00
  Disgust      3.33      96.67       0.00      0.00       0.00         0.00         0.00
  Fear         0.00       0.00      93.94      0.00       0.00         3.03         3.03
  Joy          0.00       0.00       0.00    100.00       0.00         0.00         0.00
  Neutral      0.00       0.00       0.00      0.00     100.00         0.00         0.00
  Sadness      0.00       0.00       6.67      0.00       3.33        90.00         0.00
  Surprise     0.00       0.00       0.00      3.33       0.00         0.00        96.67
  Average: 96.28

Table 2: Confusion matrix of the CK database at 80% of percentage w2/w1

            Anger (%)  Disgust (%)  Fear (%)  Joy (%)  Neutral (%)  Sadness (%)  Surprise (%)
  Anger      100.00       0.00       0.00      0.00       0.00         0.00         0.00
  Disgust      0.00     100.00       0.00      0.00       0.00         0.00         0.00
  Fear         0.00       0.00     100.00      0.00       0.00         0.00         0.00
  Joy          0.00       0.00       0.00    100.00       0.00         0.00         0.00
  Neutral      0.00       0.00       0.00      0.21      99.79         0.00         0.00
  Sadness      0.00       0.00       0.00      0.00       0.00       100.00         0.00
  Surprise     0.00       0.00       0.00      0.00       0.83         0.00        99.17
  Average: 99.85

As presented in Section 3.5, there are several ways to choose the threshold c for the CLBP_M operator; our experiments show that choosing this value as the mean value of m_p over the CLBP_M_{P,R}^{u2} operator gets the best results on both the JAFFE and CK databases. Table 3 presents the recognition rates on the two databases using the various thresholds, and Figure 7 illustrates the results in a chart.

Table 3: Recognition rate using various thresholds on the CK and JAFFE databases

  Threshold                                                    CK (%)   JAFFE (%)
  The mean value of m_p over the whole image                    99.72     95.32
  The mean value of m_p over the region                         99.68     93.85
  The mean value of m_p over the CLBP_M^{u2} operator           99.85     96.28

Figure 7: Chart comparing the results for the various thresholds (see online version for colours).

It is almost impossible to cover all of the published works. However, for comparison, we present several typical papers that represent state-of-the-art methods of facial expression recognition, whereby an overview of the existing methods is given. The comparison of a number of state-of-the-art methods with the proposed approach on the JAFFE and CK databases is presented in Tables 4 and 5, respectively.

Table 4: Comparison of state-of-the-art methods with the proposed method on the JAFFE database

  Method                  Classifier  Feature     Expressions  Images  Cross-validation  Rate (%)
  Feng et al. (2007)      LPT(a)      LBP         7            213     10-fold           93.80
  Shih et al. (2008)      SVM         2DPCA       7            213     10-fold           94.13
  Lina and Pan (2009)     SVM         2D-LDA(b)   7            211     10-fold           87.90
  Zhao and Zhang (2012)   1-NN(c)     DKLLE(d)    7            213     10-fold           84.06
  Proposed method         SVM         CLBP        7            213     3-fold            96.28

  Notes: (a) LPT: linear programming technique; (b) 2D-LDA: 2D linear discriminant analysis; (c) 1-NN: 1-nearest-neighbour; (d) DKLLE: discriminant kernel locally linear embedding.

Table 5: Comparison of state-of-the-art methods with the proposed method on the CK database

  Method                  Classifier  Feature                   Expressions  Images          Cross-validation  Rate (%)
  Ahsan et al. (2013)     SVM         Gabor wavelet and LTP(a)  7            1,632           7-fold            96.90
  Shan et al. (2009)      SVM         BLBP(b)                   7            1,280           10-fold           91.40
  Zhao and Zhang (2012)   1-NN(c)     DKLLE(d)                  7            1,409           10-fold           95.85
  Khan et al. (2013)      SVM         PLBP(e)                   7            309 sequences   10-fold           96.70
  Proposed method         SVM         CLBP                      7            2,040           3-fold            99.85

  Notes: (a) LTP: local transitional pattern; (b) BLBP: boosted LBP; (c) 1-NN: 1-nearest-neighbour; (d) DKLLE: discriminant kernel locally linear embedding; (e) PLBP: pyramid of LBP.

5 Conclusions

We presented a novel experimental method of facial expression recognition based on the proposed image preprocessing technique and an improvement of the CLBP model. Our experiments showed that a suitably selected threshold in the CLBP computation can obtain a better recognition rate in facial expression recognition applications. Based on the experiments, the recognition accuracy rate is 96.28% on the JAFFE database and 99.85% on the CK database. Moreover, since the proposed method is very simple and fast and obtains high accuracy even at smaller resolutions (e.g., 48 × 48 pixels), it is suitable for real-time systems such as data-driven animation, intelligent game applications and intelligent human-machine interface systems.

Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2013R1A1A2012012). We would like to thank Professor Michael J. Lyons for
the use of the JAFFE database and Professor Jeffery Cohn for authorising us to use the CK database in this work.

References

Ahsan, T., Jabid, T. and Chong, U.P. (2013) 'Facial expression recognition using local transitional pattern on Gabor filtered facial images', IETE Technical Review, Vol. 30, No. 1, pp.47–52.

Cao, N.T., Ton-That, A.H. and Choi, H.I. (2013) 'An efficient method of face image preprocess for facial expression recognition', International Journal of Engineering Associates, Vol. 2, No. 4, pp.10–16.

Cao, N.T., Ton-That, A.H. and Choi, H.I. (2014) 'Facial expression recognition based on local binary pattern features and support vector machine', International Journal of Pattern Recognition and Artificial Intelligence, Vol. 28, No. 6, pp.1456012-1–1456012-24.

Feng, X., Pietikäinen, M. and Hadid, A. (2007) 'Facial expression recognition based on local binary patterns', International Journal Pattern Recognition and Image Analysis, Vol. 17, No. 4, pp.592–598.

Guo, Z., Zhang, L. and Zhang, D. (2010) 'A completed modeling of local binary pattern operator for texture classification', IEEE Transactions on Image Processing, Vol. 19, No. 6, pp.1657–1663.

Hsu, C.W., Chang, C.C. and Lin, C.J. (2010) A Practical Guide to Support Vector Classification, Tech. Rep., Taipei.

Kanade, T., Cohn, J.F. and Tian, Y. (2000) 'Comprehensive database for facial expression analysis', Proceedings 4th IEEE International Conference on Automatic Face and Gesture Recognition, France, pp.46–53.

Khan, R.A., Meyer, A., Konik, H. and Bouakaz, S. (2013) 'Framework for reliable, real-time facial expression recognition for low resolution images', Pattern Recognition Letters, Vol. 34, No. 10, pp.1159–1168.

Liao, S., Fan, W., Chung, A.C.S. and Yeung, D.Y. (2006) 'Facial expression recognition using advanced local binary patterns, Tsallis entropies and global appearance features', IEEE International Conference on Image Processing, pp.665–668.

Lina, D.T. and Pan, D.C. (2009) 'Integrating a mixed-feature model and multiclass support vector machine for facial expression recognition', Integrated Computer-Aided Engineering, Vol. 16, No. 1, pp.61–74.

Lucey, P., Cohn, J.F., Kanade, T., Saragih, J. and Ambadar, Z. (2010) 'The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression', IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp.94–101.

Lyons, M.J., Budynek, J. and Akamatsu, S. (1999) 'Automatic classification of single facial images', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 12, pp.1357–1362.

Ojala, T., Pietikainen, M. and Harwood, D. (1996) 'A comparative study of texture measures with classification based on featured distribution', Pattern Recognition, Vol. 29, No. 1, pp.51–59.

Ojala, T., Pietikainen, M. and Maenpaa, T. (2002) 'Multiresolution gray-scale and rotation invariant texture classification with local binary patterns', IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp.971–987.

Priya, G.N. and Banu, R.S.D.W. (2012) 'Person independent facial expression detection using MBWM and multiclass SVM', International Journal of Computer Applications, Vol. 55, No. 17, pp.52–58.

Shan, C., Gong, S. and McOwan, P.W. (2005) 'Robust facial expression recognition using local binary patterns', IEEE International Conference on Image Processing, Vol. 2, pp.370–373.

Shan, C., Gong, S. and McOwan, P.W. (2009) 'Facial expression recognition based on local binary patterns: a comprehensive study', Image and Vision Computing, Vol. 27, No. 6, pp.803–816.

Shih, F.Y., Chuang, C.F. and Wang, P.S.P. (2008) 'Performance comparisons of facial expression recognition in JAFFE database', International Journal of Pattern Recognition and Artificial Intelligence, Vol. 22, No. 3, pp.445–459.

Viola, P. and Jones, M.J. (2004) 'Robust real-time face detection', International Journal of Computer Vision, Vol. 57, No. 2, pp.137–154.

Ying, Z., Cai, L., Gan, J. and He, S. (2009) 'Facial expression recognition with local binary pattern and Laplacian
eigenmaps', in Huang, D.S. et al. (Eds.): International Conference on Intelligent Computing, pp.228–235, Springer, Berlin, Heidelberg.

Zhao, X. and Zhang, S. (2012) 'Facial expression recognition using local binary patterns and discriminant kernel locally linear embedding', EURASIP Journal on Advances in Signal Processing, Vol. 2012, No. 1, pp.1–9.
