Hand Gesture Recognition for Intelligent Presentation


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

TRAN NGUYEN LE

HAND GESTURE RECOGNITION FOR INTELLIGENT PRESENTATION

Major: Computer Science (Code: 60480101)

MASTER THESIS OF COMPUTER SCIENCE

SUPERVISOR: Dr. Le Thanh Ha

Hanoi - 2015

AUTHORSHIP

"I hereby declare that the work contained in this thesis is my own and has not been previously submitted for a degree or diploma at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no materials previously published or written by another person except where due reference or acknowledgement is made."

Signature: ………………………………………………

SUPERVISOR'S APPROVAL

"I hereby approve that the thesis in its current form is ready for committee examination as a requirement for the Master of Computer Science degree at the University of Engineering and Technology."

Signature: ………………………………………………

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to my advisor, Dr. Le Thanh Ha, University of Engineering and Technology, Vietnam National University, Hanoi, for his enthusiastic guidance, warm encouragement and useful research experience. I am grateful to all the teachers of the University of Engineering and Technology, VNU, for the extremely valuable knowledge they gave me during my master course. I would like to thank all my friends and lab mates in the Human Machine Interaction Laboratory for their helpful discussions about my research topic. I sincerely acknowledge the 2012 basic research project in natural science of the National Foundation for Science & Technology Development (Nafosted), Vietnam (102.01-2012.36, Coding and communication of multiview video plus depth for 3D Television Systems) for financially supporting my master study. Last, but not least, my family is the biggest motivation behind me. I would like to thank my parents and my brother for supporting me spiritually throughout the writing of this thesis, and I send them my gratefulness and love.

Hanoi, October 20th, 2015
Tran Nguyen Le

HAND GESTURE RECOGNITION FOR INTELLIGENT PRESENTATION
Tran Nguyen Le
Computer Science

Abstract: This research presents a contour-based hand gesture recognition solution for presentation control using depth image data. In this work, a motion-based algorithm is used to detect and track the human hand. The hand contours are then extracted and described by illumination-, rotation- and scale-invariant feature vectors. After that, logistic regression and multilayer perceptron classifiers are employed for hand posture and dynamic hand gesture recognition, respectively. Finally, in the presentation control application, two recognized gestures are used to move forward to the next slide or backward to the preceding slide. The experimental results exhibit the high recognition accuracy and efficiency of our approach, and our prototype application can control PowerPoint slides in real time.

Keywords: hand gesture, recognition, intelligent presentation, depth image

Table of Contents

Abbreviations
List of Figures
List of Tables
Chapter 1. INTRODUCTION
1.1 Motivation
1.2 Objectives
1.3 Methodology
1.4 Thesis's outline
Chapter 2. RELATED WORK
2.1 Infrared laser tracking devices for presentation
2.2 Distance transform based hand gesture recognition
2.3 Body tracking-based hand gesture recognition using Microsoft Kinect
Chapter 3. HAND GESTURE RECOGNITION FOR INTELLIGENT PRESENTATION
3.1 Image sequence preprocess
3.1.1 Motion extraction from depth images
3.1.2 Noise reduction
3.1.3 Initial hand detection
3.2 Hand localization
3.2.1 Hand tracking
3.2.2 Hand region segmentation
3.2.2.1 Updating the hand point position
3.2.2.2 Using depth threshold from the depth value of the hand point
3.2.2.3 Using blob detection to detect hand region from others
3.2.2.4 Reducing noise from hand area using hand point position
3.2.3 Hand contour extraction
3.3 Hand gesture recognition
3.3.1 Sample gesture definition
3.3.2 Feature vector selection
3.3.2.1 Hand posture
3.3.2.2 Dynamic hand gesture
3.3.3 Training and classifying
3.3.3.1 Hand posture
3.3.3.2 Dynamic hand gesture
3.4 Presentation controller
3.4.1 System requirements
3.4.2 Workflow of controlling presentation
3.4.3 Presentation controller interface
Chapter 4. EXPERIMENTAL RESULTS
4.1 Data collection
4.1.1 Hand posture database
4.1.2 Dynamic hand gesture database
4.2 Test-bed system and results
4.2.1 Accuracy of hand posture recognition
4.2.2 Accuracy of dynamic hand gesture recognition
4.2.3 Presentation controller performance
Chapter 5. CONCLUSION
References

Abbreviations

TV: Television
RGB: Red Green Blue
SDK: Software Development Kit
PC: Personal Computer

List of Figures

Figure 2.1: Tracked skeleton joints of the user's body [9]
Figure 3.1: Abstract layered view of proposed system
Figure 3.2: The process of generating the motion image
Figure 3.3: (a) The opening operation, (b) The erosion operation, (c) The dilation operation
Figure 3.4: (a) The original motion image, (b) The reduced-noise motion image
Figure 3.5: Motion clustering with hand size: (a) Before applying the threshold of hand size, (b) After applying the threshold of hand size
Figure 3.6: Motion history image and motion template procedure. Motion history at time (a) t, (b) t+1, (c) t+2, (d) Depth motion history image
Figure 3.7: The direction of cluster
Figure 3.8: Result of the initial hand detection
Figure 3.9: Hand tracking using Kalman filter
Figure 3.10: The result of hand region extraction using blob detection: (a) Detected blobs, (b) Extracted blob including hand point
Figure 3.11: The result of hand segmentation
Figure 3.12: Binary image including hand area
Figure 3.13: Contour tracing using Moore-Neighbor tracing algorithm
Figure 3.14: Hand contour extraction using Moore-Neighbor tracing algorithm
Figure 3.15: Hand postures definition
Figure 3.16: Dynamic hand gesture definition
Figure 3.17: Computation of angle relation
Figure 3.18: Workflow of gesture recognition process
Figure 3.19: Workflow of controlling presentation
Figure 3.20: Presentation controller interface
Figure 4.1: Result with Logistic Regression classifier

3.3.2.1 Hand posture

The central moment \mu_{ij} is defined as:

\mu_{ij} = \sum_x \sum_y (x - \bar{x})^i (y - \bar{y})^j I(x, y)    (3.3)

where I(x, y) is the intensity at pixel (x, y), with I(x, y) = 1 if (x, y) is on the contour and I(x, y) = 0 otherwise; i and j are non-negative integers; and \bar{x} = M_{10} / M_{00}, \bar{y} = M_{01} / M_{00}. The total area of the object is given by M_{00}.

Scale-invariant features can also be found in the scaled central moments. The normalized central moment \eta_{ij} of order (i + j) is given by:

\eta_{ij} = \frac{\mu_{ij}}{\mu_{00}^{\left(1 + \frac{i+j}{2}\right)}}    (3.4)

Based on these definitions, our study computes the Hu invariant moments by equations (3.5)-(3.12):

H_1 = \eta_{20} + \eta_{02}    (3.5)

H_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2    (3.6)

H_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2    (3.7)

H_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2    (3.8)

H_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]    (3.9)

H_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})    (3.10)

H_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]    (3.11)

H_8 = \frac{S}{C}    (3.12)

where S represents the area of the hand region and C represents the length of its contour.

H_1 is analogous to the moment of inertia around the image's centroid, where the pixels' intensities are analogous to physical density. H_7 is skew-invariant, which enables it to distinguish mirror images of otherwise identical shapes. H_3 is not very useful, as it depends on the others. H_8 is invariant to scale, rotation and translation, and it slightly increases the accuracy of the proposed method.
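As a concrete illustration, a minimal Python sketch of this eight-dimensional posture descriptor is given below. It assumes the segmentation of Section 3.2 has already produced a binary hand image, and it leans on OpenCV's built-in moment routines rather than the thesis's own implementation; the function name and input format are illustrative, not from the thesis.

```python
import cv2
import numpy as np

def posture_feature_vector(binary_hand: np.ndarray) -> np.ndarray:
    """8-D posture descriptor: Hu moments H1..H7 plus H8 = S / C.

    binary_hand: uint8 image that is non-zero on the hand region.
    """
    contours, _ = cv2.findContours(binary_hand, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea)  # assume largest blob is the hand

    m = cv2.moments(contour)             # raw, central and normalized moments
    hu = cv2.HuMoments(m).flatten()      # H1..H7, equations (3.5)-(3.11)

    S = cv2.contourArea(contour)         # hand area
    C = cv2.arcLength(contour, True)     # contour length
    return np.append(hu, S / C)          # append H8, equation (3.12)
```

Note that cv2.moments of a contour treats it as a polygon, which differs subtly from summing over contour pixels as in (3.3); for a scale-, rotation- and translation-invariant descriptor the effect is negligible.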
3.3.2.2 Dynamic hand gesture

Unlike a hand posture, which is presented in a single depth frame, a dynamic hand gesture is distributed over a sequence of consecutive depth frames. Therefore, the construction of feature descriptors for dynamic hand gestures is much more complex than for hand postures. This research proposes three types of feature descriptors, based on the angle relation, the distance relation and the area-circumference relation, described in the following sections. These descriptors were chosen based on observation of the sample gestures.

Suppose a dynamic hand gesture is distributed over a sequence of n depth frames (states). The contour in each frame is represented in a two-dimensional array. Then the coordinates (x_top, y_top, x_bottom, y_bottom) of the smallest rectangle that covers the hand region in the array are picked, and the hand region position (x_center, y_center) is computed by:

x_{center} = \frac{|x_{top} - x_{bottom}|}{2} + x_{top}    (3.13)

y_{center} = \frac{|y_{top} - y_{bottom}|}{2} + y_{top}    (3.14)

Therefore, from a sequence of n depth frames, a dynamic hand gesture is now distributed over n pairs P_i(x_center, y_center). The technical features of a dynamic hand gesture can then be described as follows.

- Angle relation. From each depth image in the hand posture series, a pair P_i(x_center, y_center) is extracted; thus we have P_1, P_2, ..., P_n. Then the relative angle between \vec{P_i P_{i+1}} and \vec{P_{i+1} P_{i+2}} is computed:

\cos(\alpha) = \frac{\vec{P_i P_{i+1}} \cdot \vec{P_{i+1} P_{i+2}}}{|\vec{P_i P_{i+1}}| \times |\vec{P_{i+1} P_{i+2}}|}    (3.15)

Figure 3.17 illustrates how the angle is constructed. The figure represents the states of the previously defined hand gesture, in which the right hand moves from left to right; each red dot represents the position of the contour at one state. Eventually, we obtain n-2 relative angles from the n positions of the hand posture series, so this type of feature vector has (n-2) dimensions.

Figure 3.17: Computation of angle relation

- Distance relation. From the list of pairs P_i, we compute the length of the vector \vec{P_i P_{i+1}} using the Euclidean distance:

d_i = |\vec{P_i P_{i+1}}| = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}    (3.16)

So, from n pairs P_i we have n-1 relative distances, and a feature vector of the distance relation has (n-1) dimensions.

- Area-circumference relation. Lastly, we propose another type of feature descriptor, which uses the proportion between the area of the hand region and the length of the hand contour. It is computed by:

f = \frac{S}{C}    (3.17)

where S represents the area of the hand region and C represents the length of its contour. A dynamic hand gesture includes n attributes of this relation, so a feature vector of this type has n dimensions.
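The three descriptors above can be sketched in a few lines of numpy. This is a hedged illustration, assuming each frame's contour arrives as a (k, 2) array of (x, y) points, a representation the thesis does not specify; all function names are hypothetical.

```python
import numpy as np

def center(contour: np.ndarray) -> np.ndarray:
    """Bounding-box center of one frame's contour, equations (3.13)-(3.14)."""
    top, bottom = contour.min(axis=0), contour.max(axis=0)
    return (bottom - top) / 2.0 + top

def angle_relation(centers: np.ndarray) -> np.ndarray:
    """n-2 cosines of relative angles between consecutive moves, eq. (3.15)."""
    v = np.diff(centers, axis=0)                     # vectors P_i -> P_{i+1}
    dots = (v[:-1] * v[1:]).sum(axis=1)
    norms = np.linalg.norm(v[:-1], axis=1) * np.linalg.norm(v[1:], axis=1)
    return dots / np.maximum(norms, 1e-9)            # guard a motionless hand

def distance_relation(centers: np.ndarray) -> np.ndarray:
    """n-1 Euclidean distances between consecutive centers, eq. (3.16)."""
    return np.linalg.norm(np.diff(centers, axis=0), axis=1)

def area_circumference(contour: np.ndarray) -> float:
    """f = S / C for one frame, eq. (3.17): shoelace area over polygon perimeter."""
    x, y = contour[:, 0], contour[:, 1]
    S = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
    closed = np.vstack([contour, contour[:1]])       # close the polygon
    C = np.linalg.norm(np.diff(closed, axis=0), axis=1).sum()
    return S / C

# A gesture of n frames then yields an (n-2) + (n-1) + n dimensional vector:
# centers = np.array([center(c) for c in contour_sequence])
# feats = np.concatenate([angle_relation(centers), distance_relation(centers),
#                         [area_circumference(c) for c in contour_sequence]])
```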
3.3.3 Training and classifying

3.3.3.1 Hand posture

In this research, the Logistic classifier [16] uses the training data to generate a logistic regression model with seven labels linked to the seven poses. Logistic regression can be binomial, multinomial or ordinal. Binomial (binary) logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types. Multinomial logistic regression deals with situations where the outcome can have three or more possible types that are not ordered. Ordinal logistic regression deals with dependent variables whose types are ordered. In our work, the seven poses of hand posture are not ordered, so we chose multinomial logistic regression for classification.

First, each frame's data is transformed into logistic regression model data. We define the training data as \{(y_i, x_i)\}_{i=1}^{N}, where y_i \in \{1, ..., 7\} and x_i \in \mathbb{R}^m is the feature vector containing the attributes described in the previous section. In this case, the logistic regression model has K classes (K = 7); we basically create K-1 binary logistic regression models, choosing one class as the reference or pivot. Usually the last class, K, is selected as the reference. Thus, the probability of the reference class can be calculated by:

P(y_i = K \mid x_i) = 1 - \sum_{k=1}^{K-1} P(y_i = k \mid x_i)    (3.18)

The general form of the probability is:

P(y_i = k \mid x_i) = \frac{\exp(\theta_k^T x_i)}{\sum_{j=1}^{K} \exp(\theta_j^T x_i)}    (3.19)

As the K-th class is the reference, \theta_K = (0, 0, ..., 0)^T, and therefore:

\sum_{j=1}^{K} \exp(\theta_j^T x_i) = \exp(0) + \sum_{j=1}^{K-1} \exp(\theta_j^T x_i) = 1 + \sum_{j=1}^{K-1} \exp(\theta_j^T x_i)    (3.20)

In the end, we get the following formula for all k = 1, ..., K-1:

P(y_i = k \mid x_i) = \frac{\exp(\theta_k^T x_i)}{1 + \sum_{j=1}^{K-1} \exp(\theta_j^T x_i)}    (3.21)
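For comparison, a minimal training sketch using scikit-learn's LogisticRegression is shown below; it is not the thesis's setup, and the feature files and train/test split are placeholders. Note that scikit-learn parameterizes the multinomial model as a softmax over all K classes rather than the K-1 pivot form of (3.18)-(3.21); the two are equivalent up to reparameterization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X: (N, m) posture feature vectors, y: labels 1..7 -- hypothetical files.
X, y = np.load("posture_features.npy"), np.load("posture_labels.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# With the lbfgs solver, recent scikit-learn fits the multinomial
# (softmax) model by default for multiclass targets.
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(X_tr, y_tr)

print("posture accuracy:", clf.score(X_te, y_te))
print(clf.predict_proba(X_te[:1]))  # class probabilities as in (3.19)/(3.21)
```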
