Experimental Analysis of Neural Network Based Feature Extractors for Cursive Handwriting Recognition

Ling Gang, Brijesh Verma and Siddhi Kulkarni
School of Information Technology, Griffith University - Gold Coast Campus
PMB 50, GCMC, Qld 9726, Australia
E-mail: B.Verma@mailbox.gu.edu.au, S.Kulkarni@mailbox.gu.edu.au
Web: http://intsun.int.gu.edu.au

ABSTRACT

Artificial neural networks have been widely used in many real-world applications, including the classification of segmented cursive handwritten characters. However, the feature extraction ability of MLP based neural networks has not been investigated thoroughly. In this paper, a new MLP based approach, an auto-associator for feature extraction from segmented handwritten characters, is proposed. The performance of the Auto-Associator (AA), the Multilayer Perceptron (MLP) and the Multi-MLP as feature extractors has been investigated and compared. The results and a detailed analysis of our investigation are presented in the paper.

1. INTRODUCTION

1.1 Motivations and aims of the research

A number of classification techniques are widely used by researchers in many real-world applications. However, very few researchers have tried MLP based neural networks as feature extractors. The need for research to further improve current character recognition techniques has been widely recognised, and it is also recognised that the type of feature extractor used contributes to some of the errors. There is therefore a need to find a new feature extractor and to investigate NN-based feature extraction techniques, in order to show which are the best and most efficient techniques to use.

1.2 Background

Only a few empirical comparative studies of NN-based feature extraction paradigms have been made. The paradigms in Mao and Jain [1] are compared only for exploratory data projection and two-dimensional classification, and in Lerner et al. [2] only for one database. In the research carried out by Lerner, Guterman and Aladjem [3], architectures of more than two layers were not considered as candidates for the classifier and the number of output units was only three, which is quite small. Comparative studies of different MLP-based feature extractors have not yet been conducted, so more work is needed on this issue. The primary aim of this research is to investigate the feature extraction ability of the Auto-Associator, the MLP and the Multi-MLP, to determine which is more suitable and reliable for use in real-world handwritten character recognition systems.

The origins of character recognition [4-6] can be traced back as early as 1870. It first appeared as an aid for the visually handicapped, and the first successful attempt was made by the Russian scientist Tyurin in 1900 [7]. Since then, many papers on neural networks [8-15] and their applications in pattern recognition have been published. The modern version of character recognition appeared in the middle of the 1940s with the development of digital computers, and it was thereafter treated as a data processing approach with applications in the business world. The principal motivation for the development of character recognition is the need to cope with the enormous quantities of paper, such as bank cheques, commercial forms, government records, credit card imprints and mail, generated by an expanding technological society.
Presently, the methodologies in character recognition have advanced from the earlier use of primitive techniques for the recognition of machine-printed numerals and a limited number of English letters to sophisticated techniques for the recognition of a wide variety of complex handwritten characters, symbols and words/scripts.

1.3 Organization of the paper

This paper consists of five sections. Section 1 presents the motivations and background. Section 2 details the research methodology, proposing and describing the methods employed in this research. Section 3 presents the results obtained during the experiments. Section 4 provides a discussion and analysis of the experimental results and compares the three techniques that have been investigated. Section 5 presents the conclusions drawn from this research.

2. PROPOSED RESEARCH METHODOLOGY

Figure 1 outlines the proposed research methodology, which is described in the sections below.

Figure 1. Block diagram of research methodology

2.1 Character acquisition and preprocessing

Before the experiments could be carried out, the original images had to be processed. The techniques employed to prepare input files for the various methods are discussed in the following sections.

2.1.1 Character database acquisition

The training and test characters/words used in this research came from the following directories on the CEDAR CD-ROM (benchmark database):

TRAIN/BINANUMS/BD/*
TEST/BINANUMS/BD/*
TRAIN/CITIES/BD/*
TEST/CITIES/BD/*

All images were black-and-white lowercase characters stored in PBM format, and all superfluous white space around the images was removed.

2.1.2 Character resizing

Resizing was the first technique used to process the images. The resizing process partially employed an existing C program written by R. Crane [16], which we modified; all images were resized to 30 rows by 40 columns.

2.1.3 Chain code feature extraction

For all training and test characters, the character images were first reduced to their boundaries: all pixels of each image were changed to the background colour except the outermost ones. The images were then processed using a chain code technique with 8 directions. After chain coding, each image was divided into small sub-images of 10 rows by 10 columns. The counts for each direction within a single sub-window were accumulated and recorded for later use. After all the images had been chain coded, all counts were divided by the largest count among them, so that the inputs ranged between 0 and 1; each character therefore had 12 * 8 = 96 inputs.
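The feature extraction step can be sketched as follows. This is a simplified illustration in C (the implementation language used in this research), not the exact extractor employed: it approximates the 8-direction chain coding by counting, for each boundary pixel, the directions in which a neighbouring boundary pixel lies, and it normalises per character rather than by the largest count over the whole data set; all function and constant names are illustrative.

```c
#define ROWS 30
#define COLS 40
#define SUB  10                              /* 10 x 10 sub-windows          */
#define NSUB ((ROWS / SUB) * (COLS / SUB))   /* 12 sub-windows per character */
#define NDIR 8                               /* 8 chain-code directions      */
#define NFEAT (NSUB * NDIR)                  /* 96 features per character    */

/* 8-neighbourhood offsets, one per direction (E, NE, N, NW, W, SW, S, SE). */
static const int dr[NDIR] = { 0, -1, -1, -1,  0,  1, 1, 1 };
static const int dc[NDIR] = { 1,  1,  0, -1, -1, -1, 0, 1 };

/* img holds the boundary image: 1 = boundary pixel, 0 = background.
 * feat receives 96 values scaled into [0, 1] by the maximum count.        */
void chain_code_features(unsigned char img[ROWS][COLS], double feat[NFEAT])
{
    int count[NFEAT] = { 0 };
    int r, c, d, max = 1;                    /* 1 avoids division by zero    */

    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++) {
            if (!img[r][c]) continue;
            int win = (r / SUB) * (COLS / SUB) + (c / SUB);
            for (d = 0; d < NDIR; d++) {
                int nr = r + dr[d], nc = c + dc[d];
                if (nr >= 0 && nr < ROWS && nc >= 0 && nc < COLS && img[nr][nc])
                    count[win * NDIR + d]++; /* direction d seen in this window */
            }
        }

    for (d = 0; d < NFEAT; d++)
        if (count[d] > max) max = count[d];
    for (d = 0; d < NFEAT; d++)
        feat[d] = (double)count[d] / max;    /* largest count maps to 1      */
}
```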
3. EXPERIMENTAL METHOD

A total of three character feature extraction/recognition techniques were investigated in this research: the AA, the MLP and the Multi-MLP. The back-propagation (BP) algorithm was employed as the common training algorithm. The networks used were feed-forward neural networks, each with a single hidden layer. The number of neurons in the input layer of all these extractors was governed by the size of the sub-windows of the training characters: the character matrices had 30 rows by 40 columns and each sub-window was 10 rows by 10 columns, so each image had 12 sub-windows; since each sub-window contributed 8 elements, the number of units in the input layer was 96.

3.1 Auto-associator (AA) feature extractor

An AA, as its name implies, is a network that learns an input-output mapping in which the output is the same as the input; in other words, the target data set is identical to the input data set. Hence an AA has a d:m:d configuration, with d units in both the input and output layers and m < d units in the hidden layer. The dimensionality of the input and output is therefore the same, and the network is trained using the error back-propagation algorithm to generate output vectors o as close as possible to the input vectors x by minimising the squared error over all patterns in the training set:

E = \frac{1}{2} \sum_{p=1}^{N} \sum_{k=1}^{d} \left( o_k^p - x_k^p \right)^2

where $o_k^p$ is the $k$th output for the $p$th input vector $x^p = (x_1^p, \ldots, x_k^p, \ldots, x_d^p)$ and $N$ is the number of training patterns. The key aspect of the auto-associative MLP is that the number of hidden units at the centre of the network is usually chosen to be much smaller than the input/output dimensionality. As a result of this bottleneck, the hidden units extract a low-dimensional representation of the input data, and such a network can therefore be used for feature extraction.

To be consistent across all the neural networks compared in this research, the training files were set to the same format; they all consisted of two parts. The first part was obtained by chain coding as described previously. For the AA, the second part was exactly the same as the first part.

3.2 MLP

The values of the MLP's learning rate and momentum were both set to 0.1, as for the AA. The numbers of inputs and outputs were set to 96 and 24, respectively, and the number of hidden units was set to 26. The input vector of the MLP was obtained by employing the chain code feature extractor described previously; the second part of the training file was the output vector, which indicates to the network the class to which the current character belongs.

3.3 Multi-MLP

For the Multi-MLP feature extractor the situation was different: the possible outputs governed the number of neural networks rather than the number of output units. Each neural network had only 2 units in the output layer; in other words, each neural network was dedicated to recognising one letter. The number of hidden units of these networks was set to 26 in every case. The input vector to the Multi-MLP was the same as for the AA and the MLP, but the desired output of each Multi-MLP network had only two classes.

3.4 Criteria for training termination

When using the back-propagation algorithm, the usual criterion for termination is the reduction of the Root Mean Squared (RMS) error to an acceptable level. There is no general threshold for the RMS error; in general, the smaller, the better. It was found in the experiments that the convergence of the RMS error was very slow, so a second termination criterion was also used: the network was set to stop training after a fixed number of iterations.
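To make the bottleneck idea of Section 3.1 concrete, the following is a minimal sketch in C of a single forward pass through a d:m:d auto-associator and of the per-pattern squared error defined above; the hidden activations h are what would be handed to the classifier as extracted features. The sigmoid activation, the weight layout and all identifiers are illustrative assumptions rather than the exact implementation; weight initialisation and the back-propagation updates themselves are omitted.

```c
#include <math.h>

#define D 96    /* input/output dimensionality (chain-code features)        */
#define M 26    /* bottleneck hidden units; a 96-hidden-unit AA was also run */

static double sigmoid(double a) { return 1.0 / (1.0 + exp(-a)); }

/* One forward pass through the d:m:d auto-associator.
 * w1[M][D+1] and w2[D][M+1] hold the weights, with the last column used as
 * the bias.  h receives the M hidden activations (the extracted features),
 * o receives the D reconstructed outputs.                                   */
void aa_forward(const double x[D], double w1[M][D + 1], double w2[D][M + 1],
                double h[M], double o[D])
{
    int i, j;
    for (j = 0; j < M; j++) {
        double a = w1[j][D];                 /* bias term                    */
        for (i = 0; i < D; i++) a += w1[j][i] * x[i];
        h[j] = sigmoid(a);
    }
    for (i = 0; i < D; i++) {
        double a = w2[i][M];                 /* bias term                    */
        for (j = 0; j < M; j++) a += w2[i][j] * h[j];
        o[i] = sigmoid(a);
    }
}

/* Squared reconstruction error for one pattern: 1/2 * sum_k (o_k - x_k)^2.
 * Summing this over all N training patterns gives the E minimised by BP.   */
double aa_pattern_error(const double x[D], const double o[D])
{
    double e = 0.0;
    int k;
    for (k = 0; k < D; k++) e += (o[k] - x[k]) * (o[k] - x[k]);
    return 0.5 * e;
}
```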
4. EXPERIMENTAL RESULTS

The proposed approaches were all implemented and run on the SP2 supercomputer and on the NIAS (UNIX) machine at Griffith University. The SP2 is an IBM product consisting of eight RS/6000 390 machines and 14 RS/6000 590 machines connected by a high-speed switch; its operating system is UNIX. The programming language used for the implementation was C. This section presents the results obtained using the three neural network based character feature extraction techniques.

In this research, the training data set consists of 16 lowercase letters: a, b, c, d, e, h, i, l, m, n, o, r, s, t, u, x. After training, the three techniques were assessed by their classification rates on both the training data and the test data. Many experiments were conducted, all of them very time-consuming; some took more than several days. Only the most relevant results are shown in this and the following sections.

4.1 Preliminary results

Before training these networks with 16 letters, some preliminary experiments were conducted. In these experiments, only the AA and the MLP were trained and, instead of 16 letters, only four letters were used: a, c, d and e. All training characters were hand-printed characters, and there were 96 training pairs. Both the MLP and the AA were trained for 2000 iterations, and the number of test characters was 181. The results are displayed in Table 1 below. Two AAs were trained: one with 26 hidden units and one with 96 hidden units. As can be seen from Table 1, the classification rates on the training data were 100%, and the classification rates on the test data were quite high as well, all between 82% and 87%. Of course, the classification rates would be lower if the number of classes increased, but these figures were promising enough to justify further research. Subsequent experiments used more classes and larger data sets, including more and more images segmented from handwritten words, which are very hard to recognise, so that the networks could be trained with a more diverse and challenging training set.

TABLE 1. PRELIMINARY RESULTS (#inputs: 96)

  Network   Hidden units   Outputs   RMS       Training set (%)   Test set (%)
  MLP       26             4         0.01252   100                82.3
  AA        26             96        0.03259   100                86.7
  AA        96             96        0.02989   100                83.9

4.2 Classification rates for the MLP

After the preliminary experiments, the number of letters in the training and test databases was increased from 4 to 16, and the numbers of characters in the training and test databases were increased as well. The MLP is a very popular character recognition technique and has been widely used in many fields, so the main comparison was conducted between the MLP and the AA. The results in Table 2 (rows 1 and 2) were obtained by training the MLP with 352 hand-printed characters; the test data set presented to the MLP contained 280 hand-printed characters. The MLP was trained for 2000 and 5000 iterations, respectively. As can be seen from Table 2, the classification rate on the training set was 99.7% for both MLPs, close to 100%, and both obtained a relatively high test set rate of 78.2%.

TABLE 2. CLASSIFICATION RATES USING MLP (#inputs: 96, #outputs: 24, #hidden units: 26)

  Training pairs   Iterations   RMS      Training set (%)   Test set (%)
  352              2000         0.0141   99.7               78.2
  352              5000         0.0118   99.7               78.2
  506              5000         0.0184   94.4               52.8
  656              5000         0.0604   91.5               55.1
  951              5000         0.0678   94.2               60.0

The results in Table 2 (rows 3, 4 and 5) were obtained by adding more and more cursive characters to the previous training data set. At the beginning there were no cursive characters, only 352 hand-printed characters. The number of cursive characters was then increased from 0 to 144, 304 and 599; accordingly, the number of training characters increased from 352 to 506, 656 and 951.

TABLE 3. CLASSIFICATION RATES FOR TWO MLPS

  Training pairs   Iterations   RMS      Training set (%)   Test set (%)   Test, top 5 (%)
  656              5,000        0.0604   92.3               60.2           79.8
  614              5,000        0.0505   93.9               60.8           86.1
The above table (Table 3) contains the classification rates for two MLPs. The first row corresponds to an MLP trained on all 16 letters, including l, but tested without l; the second row corresponds to an MLP trained and tested without l.

4.3 Classification rates for the AA

The comparison of performance between the AA and the MLP was the major aim of this research. The results in Table 4 (rows 1-4) were obtained by training the AA with 352 characters; the test data set presented to the AA contained 280 characters. To achieve better results, two AAs with different numbers of hidden units were trained. One AA had 26 hidden units, so the structure of its classifier was 26 input units, 26 hidden units and 24 output units. The second AA had 96 hidden units; accordingly, the structure of its classifier was 96 input units, 26 hidden units and 24 output units. The two AAs were trained for 2000 and 5000 iterations, respectively.

TABLE 4. CLASSIFICATION RATES FOR AA (#inputs: 96, #outputs: 96)

  Training chars   Hidden units   Iterations   RMS      Training set (%)   Test set (%)
  352              26             2000         0.0240   97.4               75.0
  352              26             5000         0.0317   98.9               76.1
  352              96             2000         0.0240   99.4               76.8
  352              96             5000         0.0180   99.7               78.2
  506              96             5000         0.0482   95.1               53.3
  656              96             5000         0.0592   94.4               56.7
  951              96             5000         0.0650   94.3               61.9

As can be seen from the above table, the AA with 96 hidden units outperforms the AA with 26 hidden units, so the subsequent experiments were conducted only with an AA with 96 hidden units. The results listed in Table 4 (last 3 rows) were obtained by adding cursive characters from data set B, as in the MLP experiments, to the previous training and test data sets, which contained only hand-printed characters. The number of iterations for all experiments was 5000, the AA had 96 hidden units and the classifier had 26 hidden units. The number of training pairs was increased from 352 to 506, 656 and 951, and the test set contained 1,056 characters. As can be seen from the table, the classification rates on the test set increased as the number of training pairs increased. Of course, since the number of training iterations for each AA was fixed at 5000, the classification rate on the training set decreased as the number of training pairs increased: as the RMS error increased, the training set rate fell from 95.1% to 94.3%. Compared with its MLP counterpart, some letters obtained higher classification rates with the AA. For example, the letter l obtained a 17.9% increase in classification rate, whereas the letter a decreased by 11.9%, and the rate for the letter x did not change. In total, 8 letters obtained higher classification rates with the AA than with the MLP, while 7 letters (a, b, d, e, h, i and m) obtained lower rates.

4.4 Classification rates for the Multi-MLP

The Multi-MLP consisted of 16 neural networks, each of which had only two classes and its own training file. Each network was designed to respond to a particular letter: the first class corresponded to the given letter and the second class to all the remaining letters. Since training was very time-consuming, only one experiment was conducted so far, with 951 training pairs for each network, the same number as for the MLP and the AA.
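The preceding description does not specify how the outputs of the 16 dedicated networks are combined into a single character decision. One plausible scheme, shown below purely as an illustrative sketch in C and not taken from the experiments themselves, is to pick the letter whose dedicated network produces the strongest "this is my letter" response.

```c
#define NLETTERS 16   /* a, b, c, d, e, h, i, l, m, n, o, r, s, t, u, x */

/* yes[k] is the "this is my letter" output of the k-th dedicated network
 * for the current character (the other output unit, "not my letter", is
 * ignored here).  Returns the index of the winning letter.              */
int multi_mlp_decide(const double yes[NLETTERS])
{
    int k, best = 0;
    for (k = 1; k < NLETTERS; k++)
        if (yes[k] > yes[best]) best = k;
    return best;
}
```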
Some of the Multi-MLP's test results are listed in Table 5.

TABLE 5. CLASSIFICATION RATES OF MULTI-MLP (#inputs: 96, #outputs: 2, #hidden units: 26)

  Network   RMS        Training set (%)   Test set (%)   Test, top 5 (%)
  A         0.072687   93.5               56.3           81.0
  B         0.065042   92.9               50.0           75.0
  C         0.056367   96.7               65.4           94.3
  D         0.046068   96.7               81.3           100
  E         0.072770   90.3               57.6           79.7
  H         0.117057   80.6               52.8           88.9
  I         0.08604    90.3               57.9           86.3
  L         0.12983    74.2               31.9           76.8
  M         0.032646   98.4               72.7           95.5
  N         0.102719   83.9               46.3           73.1
  O         0.010263   83.9               67.0           91.3
  R         0.072746   93.5               56.0           93.3
  S         0.079608   90.3               56.9           81.5
  T         0.046014   96.8               75.7           98.6
  U         0.056317   95.2               64.7           100
  X         0.056279   85.2               66.7           83.3

These results were obtained by training the Multi-MLP with the same training characters as those used to train the MLP and the AA, and the test characters were also the same; however, the number of classes of each network was only 2 rather than 24. The number of iterations for all networks was 5000. As can be seen, the RMS errors of the different networks varied dramatically, and their classification rates also differed: the highest test rate was 81.3% and the lowest was 31.9%.

5. DISCUSSION

5.1 Classification rate

As can be seen from the tables in the previous section, among the three feature extraction techniques the AA provided the best results for character recognition: 61.9% on the test set and 94.3% on the training set. The second best was the MLP, followed by the Multi-MLP, whose classification rate was 94.1% on the training set and 57.1% on the test set.

From the experiments we can observe that increasing the number of training pairs for the AA or the MLP is a good way to increase the classification rates, but the characters added were all cursive characters. Does that also increase the classification rate on the hand-printed portion of the test set, which accounts for 280 of its 1,056 characters? To answer this, the classification rates were calculated for only the 280 hand-printed characters rather than for the whole test set. When the number of cursive characters in the MLP's training data set increased from 0 to 599, the classification rate for printed characters increased by only 0.7%, while the classification rate for cursive characters increased dramatically, from 30.4% to 53.2%. Similarly, when the number of cursive characters in the AA's training data set increased from 0 to 599, the classification rate for printed characters increased by only 1.9%; even so, this is better than the MLP's 0.7% increase. Meanwhile, the AA's classification rate for cursive characters increased dramatically from 26.2% to 55.9%, an increase of 29.7 percentage points, compared with 22.8 for the MLP.

5.2 General problems with classification rates

It was found that when the number of training pairs was increased to 951 and the number of iterations was set to 5,000, the best classification rate for handwritten characters using the three feature extraction techniques was less than 62%, and increasing the number of iterations did little to improve it. It was deduced that four main factors influenced the classification rates obtained by these techniques: the small training data set, the difficulty of the training and test data, resizing problems, and the similarity of characters.

5.2.1 Small training data set

As can be seen from the previous experiments, the classification rate increased when the number of training pairs increased. For example, when the number of training pairs was increased by 150, from 506 to 656,
the classification rate increased by 3.4 percent. When 295 more training pairs were added, giving a total of 951, the classification rate increased by a further 5.2 percent. A classification rate of around 60 percent is quite high considering that a maximum of only 951 training pairs was used; due to time constraints, more training pairs were not employed.

5.2.2 Difficulty of training and test data

The other factor that influenced the recognition rate was the nature of the handwritten data. As the characters sampled were real-world characters, it was found that the writing styles of two different people could be extremely diverse, and misclassification could easily occur for both human readers and automated systems. The diversity of characters was evident not only between people, but also between samples from the same person, which increased the difficulty of the training and test data dramatically. Figure 2 shows some samples of lowercase b.

Figure 2. Examples of lowercase b

Another important reason was that some training and test samples were segmented from handwritten words; such characters can become very hard to recognise (Figure 3), even for a human.

Figure 3. Examples of segmented characters. (a) Lowercase r, (b) Lowercase s, (c) Lowercase x.

5.2.3 Resizing problems

Because the sizes of the training and test images differed from each other, all characters were resized to the same size before being chain coded, in order to obtain more comparable features. However, one of the major disadvantages of resizing is that it can cause some of a character's characteristics to be lost, which may be critical for feature extraction.

5.2.4 Similarity of characters

After analysing the tables of target and actual outputs for each class, we found that some classes had very high classification rates, whereas others were as low as 40%. Further analysis showed that some letters were easily recognised as certain other letters. For example, the letter l was easily recognised as the letter e or the letter i; about 20 percent of some letters were recognised as other letters, for instance 19.6 percent of l's were recognised as e's.

6. CONCLUSIONS

We have investigated three neural network based feature extraction techniques. The Auto-Associator feature extractor proposed by us achieved the highest recognition rates: the highest rate for difficult handwritten characters from the CEDAR benchmark database was approximately 61.9%. This classification rate is quite high considering that only 951 training pairs and 5,000 iterations were used. The classification rates of the MLP were lower than those of the AA; its best classification result for handwritten characters was 60.0% (1.9% less than the AA). The recognition rates and overall performance of the Multi-MLP were the lowest of the three techniques tested: its highest classification rate was 57.1% (4.8% lower than the AA and 2.9% lower than the MLP), and this method took the longest training time.
References

[1] J. Mao and A. K. Jain, "Artificial neural networks for feature extraction and multivariate data projection", IEEE Trans. Neural Networks, Vol. 6, pp. 296-317, 1995.
[2] B. Lerner, "Toward a completely automatic neural network based human chromosome analysis", IEEE Trans. Systems, Man and Cybernetics, Part B, Vol. 28 (special issue on artificial neural networks), pp. 544-552, 1998.
[3] B. Lerner, H. Guterman and M. Aladjem, "A comparative study of neural network based feature extraction paradigms", Pattern Recognition, Vol. 20, 1999.
[4] M. E. Stevens, "Introduction to the special issue on optical character recognition (OCR)", Pattern Recognition, Vol. 2, pp. 147-150, 1970.
[5] J. Rabinow, "Whither OCR and whence", Datamation, pp. 38-42, July 1969.
[6] P. L. Andersson, "Optical character recognition - a survey", Datamation, pp. 43-48, July 1969.
[7] J. Mantas, "An overview of character recognition methodologies", Pattern Recognition, Vol. 19, pp. 425-430, 1986.
[8] R. Davis and J. L. Yall, "Recognition of handwritten characters - a review", Image and Vision Computing, Vol. 4, pp. 208-218.
[9] Y. Cheng and C. H. Leung, "Chain-code transform for Chinese character recognition", Proc. IEEE Int. Conf. on Cybernetics and Society, Tucson, AZ, USA, pp. 42-45, 1985.
[10] H. I. Avi-Itzhak, T. A. Diep and H. Garland, "High accuracy optical character recognition using neural networks with centroid dithering", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 17, pp. 218-224, 1995.
[11] S.-W. Lee, "Off-line recognition of totally unconstrained handwritten numerals using multilayer cluster neural network", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, pp. 648-652, 1996.
[12] S.-B. Cho, "Neural-network classifiers for recognizing totally unconstrained handwritten numerals", IEEE Trans. Neural Networks, Vol. 8, pp. 43-53, 1997.
[13] N. W. Strathy, C. Y. Suen and A. Krzyzak, "Segmentation of handwritten digits using contour features", Proc. ICDAR '93, pp. 577-580, 1993.
[14] B. A. Yanikoglu and P. A. Sandon, "Off-line cursive handwriting recognition using style parameters", Tech. Report PCS-TR93-192, Dartmouth College, NH, 1993.
[15] J.-H. Chiang, "A hybrid neural model in handwritten word recognition", Neural Networks, Vol. 11, pp. 337-346, 1998.
[16] R. Crane, A Simplified Approach to Image Processing: Classical and Modern Techniques in C, Prentice Hall, 1996.
