The rectification and recognition of document images with perspective and geometric distortions

Lu Shijian

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY AT ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT

NATIONAL UNIVERSITY OF SINGAPORE

2005

Table of Contents

Acknowledgments
Abstract
List of Figures
List of Tables

1 Introduction
  1.1 Introduction
  1.2 Investigated Approach
    1.2.1 Introduction
    1.2.2 Document Image Rectification
    1.2.3 Document Image Recognition
  1.3 Main Contributions
  1.4 Organization of the Thesis

2 Related Work
  2.1 Introduction
  2.2 Document Image Rectification
    2.2.1 Skew Detection and Correction
    2.2.2 Perspective Distortion Detection and Correction
    2.2.3 Geometric Distortion Detection and Correction
  2.3 Document Image Recognition

3 The Rectification of Document Skew
  3.1 Introduction
  3.2 Overview
  3.3 Preprocessing
  3.4 Text Line Segmentation
    3.4.1 Introduction
    3.4.2 Character Centroid Tracing Algorithm
    3.4.3 Document Block Segmentation
  3.5 Character Orientation Determination
    3.5.1 Character Eigen-points Determination
    3.5.2 Character Orientation Determination
  3.6 Skew Estimation and Correction
    3.6.1 Skew Determination
    3.6.2 Skew Correction
    3.6.3 Experiment Results
    3.6.4 Discussion
  3.7 Summary

4 Perspective Document Rectification
  4.1 Introduction
  4.2 Overview
  4.3 Vertical Stroke Boundary Identification
    4.3.1 Introduction
    4.3.2 The Extraction of Stroke Boundaries
    4.3.3 Fuzzy Set Construction
    4.3.4 Fuzzy Aggregation Operators
    4.3.5 Vertical Stroke Boundary Identification
  4.4 Text Line Segmentation
  4.5 Perspective Distortion Rectification
    4.5.1 Introduction
    4.5.2 Source Quadrilateral Construction
    4.5.3 Target Quadrilateral Construction
    4.5.4 Rectification Homography Estimation
    4.5.5 Perspective Rectification
    4.5.6 Discussions
  4.6 Summary

5 Geometric Rectification of Document Images
  5.1 Introduction
  5.2 Overview
  5.3 Vertical Stroke Boundary Identification
  5.4 Text Line Segmentation
  5.5 Document Image Segmentation
  5.6 Target Rectangle Construction
    5.6.1 Introduction
    5.6.2 Rough Character Classification
    5.6.3 Target Rectangle Construction
  5.7 Perspective and Geometric Distortion Rectification
  5.8 Experiment Results
  5.9 Summary

6 Document Image Recognition
  6.1 Introduction
  6.2 Overview
  6.3 Text Line Segmentation
  6.4 Vertical Stroke Boundary Identification
  6.5 Character Recognition
    6.5.1 Introduction
    6.5.2 Perspective Invariant Extraction
      6.5.2.1 Character Ascendant and Descendant Classification
      6.5.2.2 Character Euler Number Classification
      6.5.2.3 Character Span Classification
      6.5.2.4 Character Intersection Classification
      6.5.2.5 Character Vertical Stroke Boundary Classification
    6.5.3 Character Classification based on Perspective Invariants
    6.5.4 Post-processing
  6.6 Discussion
  6.7 Summary

7 Software Tools
  7.1 Introduction
  7.2 Overview of Software Tools
  7.3 Layout Analysis
  7.4 Document Image Rectification Module
    7.4.1 Distortion Type Determination
    7.4.2 Distortion Correction
  7.5 Document Image Recognition Module
  7.6 Summary

8 Conclusion
  8.1 Summary of Achievements
  8.2 Possible Extensions

Bibliography

Acknowledgments

On the completion of this thesis there are a number of people I wish to thank. First and foremost, I'm indebted to my supervisor, Professor Ben M. Chen, for his continuous guidance, insightful suggestions and enthusiastic inspiration.
He advised me in various ways to improve my research acumen and shape my research capability. He made my four years of research a most nourishing experience. I would also like to thank Professor C. C. Ko for his guidance. I am particularly grateful to Mr. Zhiying Zhou, Dr. Liang Dong, and Xu Xiang for their assistance with questions relating to computer vision and image processing. They gave me many valuable suggestions. Moving beyond the DSA lab, I would like to thank my friends Dr. Kemao Peng, Guoyang Cheng, Yingjie He and Xinmin Liu for their assistance. Last but not least, I would like to thank my beloved parents and my wife for their endless love, forever.

Abstract

As sensor resolution has increased in recent years, high-speed non-contact text capture through a digital camera is opening up a new channel for document capture and processing. This thesis presents a new technique, based on fuzzy sets and morphological operations, that is capable of rectifying and recognizing document images with perspective and geometric distortions. The proposed technique corrects document distortion based on the identified vertical character stroke boundaries and the fitted top line and base line of text lines, using fuzzy sets and morphological operations. The recognition algorithm classifies captured document text by exploiting perspective invariants such as the Euler number and intersection numbers. Experimental results show that the proposed document rectification algorithm is accurate, fast, and much easier to implement than the existing approaches reported in the literature. Recognition experiments over 150 distorted document images show that the recognition rate of the proposed technique exceeds 93%.

List of Figures

1.1 Document images with perspective and geometric distortions: (a) document images with perspective distortion; (b) document images with geometric distortion
3.1 The definition of features of text lines
3.2 Overview of the proposed skew detection and correction algorithm
3.3 Skewed document image scanned using a document scanner
3.4 The classification of character centroids based on distance constraints
3.5 Text line orientation estimation based on classified character centroids
3.6 Detected character eigen-points
3.7 The detection of character ascendant and descendant through eigen-point classification
3.8 Estimation of top line and base line of text line based on classified character eigen-points
3.9 Corrected document image
3.10 Skewed document image with multiple local skews
3.11 Corrected document image corresponding to the one given in Figure 3.10
3.12 Skewed document image with handwritten text
3.13 Corrected document image corresponding to the one given in Figure 3.12
3.14 Skewed document image with figure
3.15 Corrected document image corresponding to the one given in Figure 3.14

[...] sample images show that the implemented system is able to rectify and recognize document images with skew, perspective, and geometric distortions efficiently. The software system is currently implemented in Matlab, so the rectification and recognition processes are somewhat slow. The processing speed could be greatly improved by reimplementing the system in C or C++.

Chapter 8 Conclusion

8.1 Summary of Achievements

This chapter concludes the dissertation by summarizing the main developments and achievements of this work. This thesis presents a set of document image rectification and recognition algorithms that are able to rectify and recognize document images with skew, perspective, and geometric distortions.
Different from the approaches reported in the literature, the proposed rectification algorithms model the various distortions by analyzing the shape of character stroke boundaries. Accordingly, they do not need to assume image features such as HDB and VPM, which do not always exist within captured document images. Nor do they need special hardware: all they require is a single document image captured using a document scanner or digital camera. In addition, a distortion-tolerant recognition algorithm is designed that is able to recognize distorted document text with no rectification.

The concepts of document rectification and recognition are described in Chapter 2. The work reported in the literature on the detection and correction of document skew, perspective, and geometric distortions is reviewed, as are the related machine-printed and handwritten text recognition techniques.

In Chapter 3, a document skew detection and correction algorithm is presented. The proposed algorithm detects the skew distortion based on the orientations of characters and text lines. The orientation of text lines is estimated through a novel character classification technique, whereas the rough orientation of characters is determined through the detection and classification of character eigen-points. The advantage of the proposed algorithm is that it is able to detect skew angles ranging from 0 to 360 degrees, and the skew estimation speed is independent of the magnitude of the skew angle. Furthermore, the proposed algorithm is able to detect multiple local skews.

Chapter 4 describes an algorithm that is able to detect and correct the perspective distortion that accompanies document images captured using a digital camera. The proposed algorithm is based on the rectification homography, which is estimated using two sets of orthogonal straight lines.
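
As an illustration of the homography idea summarized above, the following Python sketch estimates a 3x3 homography from four corner correspondences, mapping a distorted source quadrilateral onto an upright target rectangle. It is a minimal sketch under stated assumptions, not the thesis implementation: the four corners are assumed to be already known (in the thesis they would be derived from the two sets of straight lines), the last homography entry is fixed to 1, and the function names `solve_linear`, `homography_from_quads`, and `apply_homography` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_quads(src, dst):
    """Return the 3x3 homography mapping four src corners to four dst corners.

    Assumption: the bottom-right entry is normalized to 1, which holds
    whenever the source origin is not mapped to infinity.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical example: a trapezoid (perspective view) and its target rectangle.
source_quad = [(10, 0), (90, 0), (100, 60), (0, 60)]
target_rect = [(0, 0), (100, 0), (100, 60), (0, 60)]
H = homography_from_quads(source_quad, target_rect)
```

In practice the rectified image would usually be produced by mapping each target pixel back to the source image with the inverse homography and interpolating, rather than by pushing source pixels forward.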
One set of straight lines, representing the orientation of text lines, is determined based on the character classification result as described in Chapter 3. The other set is estimated from the vertical stroke boundaries, which are identified among the character stroke boundaries using a few fuzzy sets and aggregation operators. Compared with the reported methods, the proposed algorithm is much more robust, as it requires only text information.

In Chapter 5, a geometric distortion rectification algorithm is presented that is able to correct document images in which text lies on a curved rather than planar surface. I propose to remove geometric distortion through image segmentation, which partitions distorted document images into multiple small patches within which text can be approximated as lying on a planar surface. The segmentation of document images is implemented using the two sets of orthogonal straight lines described in Chapters 3 and 4. The global geometric distortion is then corrected by rectifying the partitioned image patches locally, one by one. Compared with the reported methods, which require special hardware or complicated three-dimensional reconstruction from multiple document images captured from different viewpoints, the advantage of the proposed geometric distortion correction algorithm is that it needs only a single document image captured using a generic digital camera.

Chapter 6 describes a document text recognition algorithm. The proposed algorithm aims to recognize distorted document text using several perspective invariants that are detected based on the shape of character strokes. The utilized invariant features include character ascender and descender information, character Euler number information, character span information, character intersection number information, and vertical stroke boundary information.
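
To make the Euler number invariant concrete: for a binary character image it can be taken as the number of connected ink regions minus the number of holes, a topological quantity that an ideal perspective view does not change. The Python sketch below is illustrative only and is not the thesis implementation; the glyph encoding ('#' for ink), the connectivity choice (8-connected ink, 4-connected background), and the names `count_components` and `euler_number` are assumptions.

```python
from collections import deque

FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, neighbours):
    """Count connected regions of cells equal to `value` (breadth-first fill)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in neighbours:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == value and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
    return count


def euler_number(glyph_rows):
    """Euler number of a glyph given as strings, with '#' marking ink."""
    width = max(len(row) for row in glyph_rows)
    # Pad with a background border so the outer background forms one region.
    grid = [[0] * (width + 2)]
    for row in glyph_rows:
        cells = [1 if ch == "#" else 0 for ch in row.ljust(width)]
        grid.append([0] + cells + [0])
    grid.append([0] * (width + 2))
    objects = count_components(grid, 1, EIGHT)
    # Every background region except the outer one is a hole.
    holes = count_components(grid, 0, FOUR) - 1
    return objects - holes
```

Under these assumptions a glyph shaped like "O" yields 0, one shaped like "B" yields -1, and one shaped like "C" yields 1, which suggests how such an invariant could sort characters into coarse groups before finer classification.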
With multiple invariants determined, distorted characters are classified through a character categorization process, or through a vector distance minimization method when categorization fails. Compared with the reported recognition techniques, the proposed recognition algorithm is able to recognize document text with various distortions without rectifying it first, and so greatly speeds up the recognition process.

8.2 Future Work

The work presented in this thesis may be extended in many directions. Some directions for further work are proposed below.

Based on the identified vertical stroke boundaries and estimated text line orientations, some camera parameters could be estimated. A 3D model of the curved document surface may then be reconstructed from a single document image using the estimated camera parameters and the shading information.

With the identified vertical stroke boundaries and estimated text line orientations, the layout of document images with perspective and geometric distortions may be analyzed.

At the present stage, the proposed rectification techniques cannot handle handwritten text well, as they rely heavily on vertical stroke boundaries. New techniques for handwritten text rectification may be investigated in future work.

The proposed geometric rectification technique assumes that text lies on a smoothly curved document surface where the orientation of text lines can be modeled with a quadratic. Therefore, it cannot deal with document images with arbitrary distortion, such as a crumpled paper sheet. New geometric models may be explored for the estimation of text line orientation.

The proposed rectification techniques assume that the captured document text is typed with Roman letters. New techniques that are able to handle multiple languages, including Chinese, Kanji, and Arabic, would be valuable.

Currently, the proposed rectification techniques work on a single document image captured using a digital camera.
Based on the intersections between identified vertical stroke boundaries and horizontal lines, the proposed algorithms may be extended to video systems that are able to track and rectify documents in real time through feature point matching.

The performance of the proposed recognition technique relies heavily on the character segmentation result. Novel character segmentation methods that are able to segment document text with perspective and geometric distortions may be investigated in future work.

At the present stage, the proposed recognition technique works with character-level classification based on a few perspective invariants that characterize character shape. The recognition algorithm may be improved by incorporating word knowledge. Similar perspective invariants that describe word shape, such as ascender and descender, Euler number, and horizontal intersection, may be very helpful.

The proposed techniques have great potential to be commercialized for document text capture and understanding. It would be very meaningful to implement them in portable devices such as digital cameras, mobile phones, or PDAs.
... so they provide an alternative channel for document capture and understanding.

Figure 1.1: Document images with perspective and geometric distortions: (a) document images with perspective distortion; (b) document images with geometric distortion.

1.2 Investigated Approaches

1.2.1 Introduction

This thesis presents a set of algorithms designed for the rectification and recognition of distorted document images ...

... a single document image captured by a digital camera.
• Development of a new rectification-recognition framework that is able to rectify and recognize document text with perspective and geometric distortions.
• Design of a document text recognition system that is able to recognize document text with perspective and geometric distortions without rectification.
• Establishment of a fuzzy ...

... [68, 69]. Character classification techniques that tolerate perspective and geometric distortions will be strongly preferred, even at a slightly lower recognition rate.

The work presented in this thesis mainly addresses the rectification and recognition of document images captured using a digital camera. Several document image rectification models are proposed, and they are able to rectify document ...

... for text conversion. The second approach skips the rectification process and attempts to recognize the distorted document text directly.

1.2.2 Document Image Rectification

In this thesis, three types of document distortion are studied: rotation-induced skew, perspective distortion, and geometric distortion. I propose to detect and correct these three types of distortion using identified ...
... into two phases: document image analysis and document image understanding [53-56]. Document analysis normally interprets the logical structure and physical layout of document images. It is usually regarded as a preprocessing step before document image understanding, which handles the final recognition of the captured document text based on the analysis results from the first stage ...

... rectify and recognize document images with rotation-induced skew, perspective, and geometric distortions. The basic concepts of skew, perspective, and geometric distortions are described. Different rectification and recognition techniques are then reviewed. In Chapter 3, the rotation-induced skew is detected and corrected. Characters that belong to different text lines are first classified based on the ...

... document images captured using a document scanner or digital camera.

Two techniques are proposed to convert the captured document images to electronic text that can be edited and retrieved through a computer. With the first approach, captured document images with skew, perspective, and geometric distortions are first rectified, and the rectified document images are then fed to existing generic OCR systems ...

... considered. As the research work presented in this thesis mainly focuses on the rectification and recognition of document text captured using a digital camera, the review is divided into two parts, which cover rectification and recognition separately.

2.2 Document Image Rectification

A large number of document distortion detection and correction techniques have been reported in the literature. Most of early ...
... surface geometries of the bound document captured using a digital camera. The mathematical relation between points on the three-dimensional document surface and points on the two-dimensional image plane is first determined based on the geometry of camera imaging. Then, the baselines of the horizontal text lines are extracted and the bending extent of the distorted document surface is estimated. The problem of the proposed algorithm ...

... process. With the straight part of the text lines as a reference, the curved part of each text line is modeled with a quadratic based on connected component analysis and polynomial regression [10]. The proposed algorithm assumes that document text is scanned horizontally, so that the straight part of the text lines lies on a horizontal straight line. Therefore, it cannot handle document images with perspective and ...

... distorted document images show the recognition rate with the proposed technique reaches over 93%.
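The quadratic text-line model described in the excerpt on [10] can be illustrated with a minimal sketch. It assumes that connected component analysis has already produced one (x, y) centroid per character along a single curved text line. The function names `fit_quadratic` and `straighten` are hypothetical, and the code is a generic least-squares polynomial regression, not the implementation used in [10] or in this thesis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_quadratic(points: Sequence[Point]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x^2 + b*x + c to character centroids."""
    if len(points) < 3:
        raise ValueError("need at least three points")
    # Power sums that make up the 3x3 normal equations.
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum(y * x ** k for x, y in points) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def straighten(points: Sequence[Point],
               coeffs: Tuple[float, float, float],
               baseline_y: float) -> List[Point]:
    """Shift each point vertically so the fitted curve maps onto y = baseline_y."""
    a, b, c = coeffs
    return [(x, y - (a * x * x + b * x + c) + baseline_y) for x, y in points]
```

Calling `fit_quadratic` on the centroids of one text line gives the curve coefficients, and `straighten` then moves every centroid onto a horizontal reference line. The vertical-only shift mirrors the assumption noted above that the straight part of the line is horizontal, which is also why this kind of model does not cover perspective distortion.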
