Interactive mixed reality media with real time 3d human capture

INTERACTIVE MIXED REALITY MEDIA WITH REAL TIME 3D HUMAN CAPTURE

TRAN CONG THIEN QUI
(B.Eng.(Hons.), Ho Chi Minh University of Technology)

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF ENGINEERING
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2005

Abstract

A real-time system for capturing humans in 3D and placing them into a mixed reality environment is presented in this thesis. The subject is captured by nine FireWire cameras surrounding her. Looking through a head-mounted display with a front-mounted camera pointing at a marker, the user can see the 3D image of this subject overlaid onto a mixed reality scene. The 3D images of the subject viewed from this viewpoint are constructed using a robust and fast shape-from-silhouette algorithm. The thesis also presents several techniques to improve output quality and speed up the whole system. The frame rate of this system is around 25 fps using only standard Intel processor based personal computers.

Besides a remote live 3D conferencing system, this thesis also describes an application of the system in art and entertainment, named Magic Land, which is a mixed reality environment where captured avatars of humans and 3D virtual characters can form an interactive story and play with each other. This system also demonstrates many technologies in human computer interaction: mixed reality, tangible interaction, and 3D communication. The results of the user study not only emphasize the benefits, but also address some issues of these technologies.

Acknowledgement

I would like to express my heartfelt thanks to the following people for their invaluable guidance and assistance during the course of my work.

• Dr. Adrian David Cheok
• Mr. Ta Huynh Duy Nguyen
• Mr. Lee Shangping
• Mr. Teo Sze Lee
• Mr. Teo Hui Siang, Jason
• Ms. Xu Ke
• Ms. Liu Wei
• Mr. Asitha Mallawaarachchi
• Mr. Le Nam Thang
• All others from the Mixed Reality Laboratory (Singapore) who have helped me in one way or another

Contents

Abstract
Acknowledgement
List of Figures
List of Tables
1 Introduction
  1.1 Background and Motivation
  1.2 Contributions
  1.3 Thesis Organization
  1.4 List of Publications
2 Background and Related Work
  2.1 Model-based Approaches
    2.1.1 Stereo-based approaches
    2.1.2 Volume Intersection approaches
  2.2 Image-based Approaches
3 3D-Live System Overview and Design
  3.1 Hardware and System Description
    3.1.1 Hardware
    3.1.2 System Setup
  3.2 Software Components
    3.2.1 Overview
    3.2.2 Image Processing Module
    3.2.3 Synchronization
    3.2.4 Rendering
4 Image based Novel View Generation
  4.1 Overview of the 3D Human Rendering Algorithm
    4.1.1 Determining Pixel Depth
    4.1.2 Finding Corresponding Pixels in Real Images
    4.1.3 Determining Virtual Pixel Color
  4.2 New Algorithm Methods for Speed and Quality
    4.2.1 Occlusion Problem
    4.2.2 New method for blending color
5 Model Based Novel View Generation
  5.1 Motivation
  5.2 Problem Formulation
  5.3 3D Model Generation Algorithm
    5.3.1 Capturing a 3D Point Cloud
    5.3.2 Surface Construction
    5.3.3 Combining Several Surfaces with OpenGL
  5.4 Result and Discussion
    5.4.1 Capturing and Storing the Depth Points
    5.4.2 Creating the Polygon List and Rendering
    5.4.3 Composite Surfaces and Implications
  5.5 Conclusion
6 Magic Land: an Application of the Live Mixed Reality 3D Capture System for Art and Entertainment
  6.1 System Concept and Hardware Components
  6.2 Software Components
  6.3 Artistic Intention
  6.4 Future Work
  6.5 Magic Land's Relationship with Mixed Reality Games
  6.6 User Study of Magic Land 3D-Live system
    6.6.1 Aim of this User Study
    6.6.2 Design and Procedures
    6.6.3 Results of this User Study
    6.6.4 Conclusion of the User Study
7 Conclusion
  7.1 Summary
  7.2 Future Developments
  7.3 Conference and Exhibition Experience

List of Figures

2.1 Correlation methods. Credit: E. Trucco and A. Verri [1]
2.2 Visual hull reconstruction. Credit: G. Slabaugh et al. [2]
2.3 Color consistency. Credit: Slabaugh et al. [2]
2.4 Using occlusion bitmaps. Credit: Slabaugh et al. [2]
2.5 Output of Space-Carving Algorithm implemented by Kutulakos and Seitz [3]
2.6 Results of different methods to test color consistency, implemented by Slabaugh et al. [4]
2.7 A line-based geometry. Credit: Y. H. Fang, H. L. Chou, and Z. Chen [5]
2.8 Reconstruction process of line-based models. Credit: Y. H. Fang, H. L. Chou, and Z. Chen [5]
2.9 Some results of Fang's system [5]
2.10 A single silhouette cone face is shown, defined by the edge in the center silhouette. Its projection in two other silhouettes is also shown. Credit: Matusik et al. [6]
2.11 One output of Matusik's algorithm [6]
3.1 Hardware Architecture
3.2 Software Architecture
3.3 Color model
3.4 Results of Background subtraction: before and after filtering
3.5 Data Transferred From Image Processing To Synchronization
4.1 Novel View Point is generated by Visual Hull
4.2 Example of Occlusion. In this figure, A is occluded from camera O
4.3 Visibility Computation: since the projection Q is occluded from the epipole E, 3D point P is considered to be invisible from camera K
4.4 Rendering Results: In the left image, we use geometrical information to compute visibility while in the right, we use our new visibility computing algorithm. One can see the false hands appear in the upper image
4.5 Example of Blending Color
4.6 Original Images and Their Corresponding Pixel Weights
4.7 Rendering Results: The right is with the pixel weights algorithm while the left is not. The right image shows a much better result especially near the edges of the figure
5.1 Construction of a Polygon List
5.2 Illustration of the model creation process
5.3 Four reference views
5.4 Reducing Sampling Rate
5.5 Constructing a surface from sampled depth points
5.6 An un-filled polygon rendering of the object
5.7 Rendering of composite surfaces
5.8 Rendering of composite surfaces
6.1 Tangible interaction on the Main Table: (Left) Tangibly picking up the virtual object from the table. (Right) The trigger of the volcano by placing a cup with virtual boy physically near to the volcano
6.2 Menu Table: (Left) A user using a cup to pick up a virtual object. (Right) Augmented View seen by users
6.3 Main Table: The Witch turns the 3D-Live human which comes close to it into a stone
6.4 System Setup of Magic Land
6.5 Main Table: The bird's eye views of the Magic Land. One can see live captured humans together with VRML objects
6.6 Graph results for multiple choice questions
7.1 Exhibition at Singapore Science Center
7.2 Demonstration at SIGCHI 2005
7.3 Demonstration at Wired NextFest 2005

Chapter 7

Conclusion

7.1 Summary

This thesis has introduced a complete system for capturing and rendering humans and objects in full 3D. The ultimate goal of this project is to achieve real-time 3D communication that is closer in spirit to the kind of perfect tele-presence made popular by the Star Wars movies. This allows people to communicate with each other without restriction, as if the other person were really standing in front of them. It would be a more complete experience because various body gestures and other nonverbal cues that were suppressed by other communication media could then be fully expressed.

The whole 3D-Live system has been presented in detail in Chapter 3. This is a complete and robust real-time, live 3D human recording system, from capturing images and performing background subtraction to rendering novel viewpoints. Many issues in designing and implementing this system have been described and addressed in this chapter.

In Chapter 4, the thesis has gone through different methods to improve the image-based novel view generation algorithm. It introduces new ways to compute visibility and blend color in generating images for novel viewpoints. These
contributions have significantly improved the quality and performance of the system, and should be useful to mixed reality researchers.

After that, in Chapter 5, the early stages of the development of a model-based novel view generation approach have shown a lot of promise. The feasibility and potential advantages of this method have now been revealed, and future work in this area could take 3D-Live closer to achieving real-time 3D communication.

Going beyond communication, Magic Land has demonstrated the potential of 3D-Live technology in interactive art and entertainment. The unique combination of mixed reality, tangible interaction and digital art creates a novel experience that has been shown at many conferences and exhibitions around the world and is now a permanent exhibit at the Singapore Science Center. Results of the survey of Magic Land's users reveal some important issues and emphasize the effectiveness of 3D-Live, mixed reality, and tangible interaction in Human Computer Interaction.

7.2 Future Developments

3D-Live will provide a framework for many pioneering human computer communication and interaction technologies. The following enhancements are foreseeable.

The image-based algorithm described in this thesis does not utilize the color information from the images captured by the cameras. Thus, the generated result is only a visual hull, not a photo hull. In the future, we will check the color consistency among captured images to acquire better rendering results.

The model-based algorithm presented in Chapter 5 is based on the assumption "Nearby 3D points project to nearby image points". This assumption could be violated by objects with large depth discontinuities, or by self-occlusion. One possible solution to this problem is to check the differences in the depths of vertices: if two nearby vertices are too far from each other, they will not be connected. This solution will be explored in
the future.

Moreover, the described algorithm currently computes and throws away a different mesh for each frame of video. For some applications, it might be useful to derive the mesh of the next frame as a transformation of the mesh in the original frame, and to store the original mesh plus the transformation function. Temporal processing such as this would also enable us to accumulate the texture (radiance) of the model as it is seen from different viewpoints. Such accumulated texture information could be used to fill in parts that are invisible in one frame with information from other frames. This will significantly increase the speed of the algorithm.

For Magic Land, the current limitation of the system is that human 3D avatars do not really interact with virtual objects. This is because we are using pre-recorded data and lack active feedback from the captured persons. So, in the next step, we will implement a real-time capture system where players at the table can interact with the real-time avatar of the player being captured inside the recording room. At the same time, the captured person would receive feedback from the system about her/his location in Magic Land while other users move her/him around with the cups. What we intend to do is replace the green wall with high-frequency screens. These screens will frequently switch between displaying only a green screen and showing the virtual environment where her/his avatar is placed. In this way, we will provide the captured person with a real-time egocentric view, and she/he will be totally immersed in the virtual world (VR) and will be able to feel all interactions in the most realistic way. For example, when the cup is placed in front of a dragon, the person inside the room will see this dragon standing right in front of her/him, and maybe blowing fire toward her/him as well. Furthermore, the captured user could also affect the VR by reacting appropriately to her/his current status in the virtual environment. She/he could
actually interact with the user and other subjects through body position and movement. For example, the position and movement of the captured person will decide how virtual objects interact on the table.

7.3 Conference and Exhibition Experience

Up to now, Magic Land has been shown to both the academic research community and the public at different conferences and exhibitions. In Singapore, it was first shown to the public during the Planet Game exhibition at the Singapore Science Center from September 2004 to February 2005. Currently, Magic Land is a permanent exhibition at this science center. It has also been shown at the Interactivity Chamber of SIGCHI 2005, held in Portland, USA, in April 2005. Most recently, in June 2005, Magic Land was demonstrated to around 30,000 attendees during the WIRED NextFest exhibition in Chicago, USA. This is a huge exhibition of around 120 projects which have been selected through a worldwide search for cutting-edge prototypes, installations, proof-of-concepts and other emerging technologies.

Figure 7.1: Exhibition at Singapore Science Center
Figure 7.2: Demonstration at SIGCHI 2005
Figure 7.3: Demonstration at Wired NextFest 2005

Bibliography

[1] E. Trucco and A. Verri. Introductory Techniques for 3D Computer Vision. Prentice Hall, 1998.
[2] G. Slabaugh, W. B. Culbertson, T. Malzbender, and R. Schafer. A survey of volumetric scene reconstruction methods from photographs. In Proc. International Workshop on Volume Graphics, 2001.
[3] K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving. In International Journal of Computer Vision, Vol. 38, No. 3, 2000.
[4] G. G. Slabaugh, W. B. Culbertson, T. Malzbender, M. R. Stevens, and R. W. Schafer. Methods for volumetric reconstruction of visual scenes. In The International Journal of Computer Vision, 2004.
[5] Y. H. Fang, H. L. Chou, and Z. Chen. 3d shape recovery of complex objects from multiple silhouette images. Pattern Recognition Letters, 24:1279–1293, 2003.
[6] W. Matusik, C. Buehler, and L. McMillan. Polyhedral visual hulls for real-time rendering. In Proceedings of the 12th Eurographics Workshop on Rendering Techniques, 2001.
[7] R. Azuma. A survey of augmented reality. In Presence, 1997.
[8] R. Azuma, Y. Baillot, R. Behringer, S. Feiner, S. Julier, and B. MacIntyre. Recent advances in augmented reality. In IEEE Computer Graphics and Applications, Nov./Dec. 2001.
[9] G. Markus, W. Stephan, N. Martin, L. Edouard, S. Christian, K. Andreas, K. M. Esther, S. Tomas, V. G. Luc, L. Silke, S. Kai, V. M. Andrew, and S. Oliver. blue-c: A spatially immersive display and 3d video portal for telepresence. In Proceedings of ACM SIGGRAPH, 2003.
[10] J. M. Hasenfratz, M. Lapierre, and F. Sillion. A real-time system for full body interaction. Virtual Environments, pages 147–156, 2004.
[11] S. J. D. Prince, A. D. Cheok, F. Farbiz, T. Williamson, N. Johnson, M. Billinghurst, and H. Kato. 3d live: real time captured content for mixed reality. In International Symposium on Mixed and Augmented Reality (ISMAR), 2002.
[12] M. Billinghurst and H. Kato. Real world teleconferencing. In Proc. of the conference on HFCS (CHI 99), 1999.
[13] http://www.hitl.washington.edu/artoolkit/
[14] M. Billinghurst, A. D. Cheok, S. J. D. Prince, and H. Kato. Projects in VR. Real World Teleconferencing. In IEEE Computer Graphics and Applications, Volume 22, 2003.
[15] MXRToolkit. [Online]. Available at: http://sourceforge.net/projects/mxrtoolkit/
[16] P. Fua. From multiple stereo views to multiple 3-d surfaces. International Journal of Computer Vision, 24(1):19–35, 1997.
[17] T. S. Huang and A. N. Netravali. Motion and structure from feature correspondences: A review. Proceedings of the IEEE, 82(2):252–268, 1994.
[18] C. R. Dyer. Volumetric scene reconstruction from multiple views. In Foundations of Image Understanding, 2001.
[19] H. H. Chen and T. S. Huang. A survey of construction and manipulation of octrees. Computer Vision, Graphics, and Image Processing, 43(3):409–431, 1988.
[20] C. H. Chien and J. K. Aggarwal. Volume/surface octrees for the representation of three-dimensional objects. Computer Vision, Graphics, and Image Processing, 36(1):100–113, 1986.
[21] S. Srivastava and N. Ahuja. Octree generation from object silhouettes in perspective views. Computer Vision, Graphics and Image Processing, 49(1):68–84, Jan. 1990.
[22] M. Potmesil. Generating octree models of 3d objects from their silhouettes in a sequence of images. Computer Vision, Graphics and Image Processing, 40(1):1–29, Oct. 1987.
[23] R. Szeliski. Rapid octree construction from image sequences. Computer Vision, Graphics and Image Processing: Image Understanding, 58(1):23–32, July 1993.
[24] S. Seitz and C. Dyer. Photorealistic scene reconstruction by voxel coloring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 1997.
[25] W. Martin and J. K. Aggarwal. Volumetric descriptions of objects from multiple views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2):150–158, 1983.
[26] W. Matusik, C. Buehler, R. Raskar, S. J. Gortler, and L. McMillan. Image-based visual hulls. Proc. SIGGRAPH, pages 369–374, 2000.
[27] Point Grey Research Inc. [Online]. Available at: http://www.ptgrey.com
[28] ARToolKit. [Online]. Available at: http://www.hitl.washington.edu/artoolkit/
[29] OpenCV. [Online]. Available at: http://sourceforge.net/projects/opencvlibrary/
[30] S. J. D. Prince, A. D. Cheok, F. Farbiz, T. Williamson, N. Johnson, M. Billinghurst, and H. Kato. Live 3-dimensional content for augmented reality. In IEEE Transactions on Multimedia (submitted).
[31] T. Horprasert et al. A statistical approach for robust background subtraction and shadow detection. In Proc. IEEE ICCV'99 Frame Rate Workshop, Greece, 1999.
[32] M. Seki, H. Fujiwara, and K. Sumi. A robust background subtraction method for changing background. In Proceedings of the Fifth IEEE International Workshop on Applications of Computer Vision, 2000.
[33] R. Mester, T. Aach, and L. Dümbgen. Illumination-invariant change detection using statistical colinearity criterion. In DAGM2001, number 2191 in LNCS. Springer, pages 170–177.
[34] RGB "Bayer" Color and MicroLenses. [Online]. http://www.siliconimaging.com/RGB Bayer.htm
[35] G. Slabaugh, R. Schafer, and M. Hans. Image-based photo hulls. In Proceedings of the 1st International Symposium on 3D Data Processing, Visualization, and Transmission, 2002.
[36] R. Szeliski. Video mosaics for virtual environments. In IEEE Computer Graphics and Applications, March 1996.
[37] Remondino Fabio. From point cloud to surface: the modeling and visualization problem. In International Workshop on Visualization and Animation of Reality-based 3D Models, volume XXXIV-5, Tarasp-Vulpera, Switzerland, February 2003. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences.
[38] Peter Savadjiev, Frank P. Ferrie, and Kaleem Siddiqi. Surface recovery from 3d point data using a combined parametric and geometric flow approach. Technical report, Centre for Intelligent Machines, McGill University, 3480 University Street, Montréal, Québec H3A 2A7, Canada.
[39] H. Kato, K. Tachibana, M. Tanabe, T. Nakajima, and Y. Fukuda. Magiccup: A tangible interface for virtual objects manipulation in table-top augmented reality. Proceedings of Augmented Reality Toolkit Workshop (ART03), pages 85–86, 2003.
[40] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. Rtp: A transport protocol for real-time applications. Internet Engineering Task Force, Audio-Video Transport Working Group, 1996.
[41] S. P. Lee, F. Farbiz, and A. D. Cheok. Touchy internet: A cybernetic system for human-pet interaction through the internet. SIGGRAPH 2003, Sketches and Application, 2003.
[42] A. Amory, K. Naicker, J. Vincent, and C. Adams. The use of computer games as an educational tool: identification of appropriate game types and elements. 30(4):311–321, 1999.
[43] T. W. Malone. Toward a theory of intrinsically motivating instruction. 5:333–369, 1981.
[44] CAVE Quake II. [Online]. Available at:
http://brighton.ncsa.uiuc.edu/ prajlich/caveQuake/
[45] T. Oshima, K. Satoh, H. Yamamoto, and H. Tamura. Ar2 hockey system: A collaboration mixed reality system. 3(2):55–60, 1998.
[46] H. Tamura, H. Yamamoto, and A. Katayama. Mixed reality: Future dreams seen at the border between real and virtual worlds. 21(6):64–70, 2001.
[47] A. D. Cheok, X. Yang, Z. Zhou, M. Billinghurst, and H. Kato. Touch-space: Mixed reality game space based on ubiquitous, tangible, and social computing. Journal of Personal and Ubiquitous Computing, 6(5/6):430–442, 2002.
[48] S. Bjork, J. Falk, R. Hansson, and P. Ljungstrand. Pirates! - using the physical world as a game board. In Interact 2001, IFIP TC 13 Conference on Human-Computer Interaction, Tokyo, Japan, 2001.
[49] A. D. Cheok, S. W. Fong, K. H. Goh, X. Yang, W. Liu, and F. Farzbiz. Human pacman: A sensing-based mobile entertainment system with ubiquitous computing and tangible interaction. 2003.
[50] R. L. Mandryk and K. M. Inkpen. Supporting free play in ubiquitous computer games. 2001.

[...]

... for further developments.
• The real application, Mixed Reality Magic Land, is the cross-section where art and technology meet. It not only combines the latest advances in human-computer interaction and human-human communication (mixed reality, tangible interaction, and 3D-Live technology), but also introduces artists of any discipline to intuitive approaches for dealing with mixed reality content. Moreover, future...

... realizing a robust real-time capturing and rendering system which at the same time provides a platform for mixed reality based tele-collaboration and provides multi-sensory, multi-user interaction with the digital world. The motivation for this thesis stems from here. 3D-Live technology is developed to capture and generate realistic novel 3D views of humans at interactive frame rates in real time to facilitate multi-user, spatially immersed collaboration in a mixed reality environment. Besides, this thesis also presents an application, named "Magic Land", a tangible interaction system with fast recording and rendering of 3D human avatars in a mixed reality scene, which brings users a new kind of human interaction and self-reflection experience. Although the Magic Land system...

... heralded mixed reality as an exciting and useful technology for the future of computer human interaction, and it has generated interest in a number of areas including computer entertainment, art, architecture, medicine and communication. Mixed reality refers to the real-time insertion of computer-generated graphical content into a real scene (see [7], [8] for reviews). More recently, mixed reality systems have...

... Adrian David Cheok, Hirokazu Kato, "Magic Land: Live 3d Human Capture Mixed Reality Interactive System", In CHI'05 Extended Abstracts on Human Factors in Computing Systems (Portland, OR, USA, April 02 - 07, 2005). ACM Press, New York, NY, 1142-1143.
• Ta Huynh Duy Nguyen, Tran Cong Thien Qui, Ke Xu, Adrian David Cheok, Sze Lee Teo, ZhiYing Zhou, Asitha Mallawaarachchi, Shang Ping Lee, Wei Liu, Hui Siang Teo, Le Nam Thang, Yu Li, Hirokazu Kato, "Real Time 3D Human Capture System for Mixed-Reality Art and Entertainment", IEEE Transactions on Visualization and Computer Graphics (TVCG), 11, 6 (Nov - Dec 2005), 706 - 721.
• Tran Cong Thien Qui, Ta Huynh Duy Nguyen, Adrian David Cheok, Sze Lee Teo, Ke Xu, ZhiYing Zhou, Asitha Mallawaarachchi, Shang Ping Lee, Wei Liu, Hui Siang Teo, Le Nam Thang, Yu Li, Hirokazu Kato, "Magic Land: Live 3D Human Capture Mixed...

... algorithm for generating an arbitrary viewpoint of a collaborator at interactive speeds, which was sufficiently robust and fast for a tangible augmented reality setting. 3D-Live is a complete system for live capture of 3D content and simultaneous presentation in mixed reality. The user sees the real world from his viewpoint, but modified so that the image of a...

... manipulate her own avatar to interact with other virtual objects or even with avatars of other players. Furthermore, in this mixed reality system, these interactions occur as if they were in the real-world physical environment. Another capture system was also presented in [10]. In this paper, the authors demonstrate a complete system architecture allowing the real-time acquisition and full-body reconstruction...

... detailed design and implementation of the Magic Land system, a typical mixed reality application of the 3D-Live system in art and entertainment. The hardware and software design of this system is presented. This chapter also discusses some modern well-known mixed reality games, and makes a detailed comparison of Magic Land with these games. Results of a user study conducted for Magic...

... self reflection and interaction with one's own 3D avatar), the system can be quite simply extended for live capture and live viewing. Up to now, the idea of capturing human beings for virtual reality has been studied and discussed in quite a few research articles. In [9], Markus et al. presented "blue-c", a system combining simultaneous acquisition of video streams with 3D projection technology in a CAVE-like ...