Recurrent Neural Networks for Prediction
Authored by Danilo P. Mandic, Jonathon A. Chambers
Copyright © 2001 John Wiley & Sons Ltd
ISBNs: 0-471-49517-4 (Hardback); 0-470-84535-X (Electronic)

1 Introduction

Artificial neural network (ANN) models have been extensively studied with the aim of achieving human-like performance, especially in the field of pattern recognition. These networks are composed of a number of nonlinear computational elements which operate in parallel and are arranged in a manner reminiscent of biological neural interconnections. ANNs are known by many names, such as connectionist models, parallel distributed processing models and neuromorphic systems (Lippmann 1987). The origin of connectionist ideas can be traced back to the Greek philosopher Aristotle and his ideas of mental associations. He proposed some of the basic concepts, such as that memory is composed of simple elements connected to each other via a number of different mechanisms (Medler 1998).

While early work in ANNs used anthropomorphic arguments to introduce the methods and models used, today neural networks used in engineering are related to algorithms and computation and do not question how brains might work (Hunt et al. 1992). For instance, recurrent neural networks have been attractive to physicists due to their isomorphism to spin glass systems (Ermentrout 1998). The following properties of neural networks make them important in signal processing (Hunt et al. 1992): they are nonlinear systems; they enable parallel distributed processing; they can be implemented in VLSI technology; they provide learning, adaptation and data fusion of both qualitative (symbolic data from artificial intelligence) and quantitative (from engineering) data; and they realise multivariable systems.

The area of neural networks is nowadays considered from two main perspectives. The first perspective is cognitive science, which is an interdisciplinary study of the mind. The second perspective is connectionism, which is a theory of information processing (Medler 1998). The neural networks in this work are approached from an engineering perspective, i.e. to make networks efficient in terms of topology, learning algorithms, ability to approximate functions and to capture the dynamics of time-varying systems. From the perspective of connection patterns, neural networks can be grouped into two categories: feedforward networks, in which graphs have no loops, and recurrent networks, where loops occur because of feedback connections. Feedforward networks are static, that is, a given input can produce only one set of outputs, and hence carry no memory. In contrast, recurrent network architectures enable the information to be temporally memorised in the networks (Kung and Hwang 1998). Based on training by example, with strong support of statistical and optimisation theories (Cichocki and Unbehauen 1993; Zhang and Constantinides 1992), neural networks are becoming one of the most powerful and appealing nonlinear signal processors for a variety of signal processing applications. As such, neural networks expand signal processing horizons (Chen 1997; Haykin 1996b), and can be considered as massively interconnected nonlinear adaptive filters. Our emphasis will be on the dynamics of recurrent architectures and algorithms for prediction.
1.1 Some Important Dates in the History of Connectionism

In the early 1940s the pioneers of the field, McCulloch and Pitts, studied the potential of the interconnection of a model of a neuron. They proposed a computational model based on a simple neuron-like element (McCulloch and Pitts 1943). Others, like Hebb, were concerned with the adaptation laws involved in neural systems. In 1949 Donald Hebb devised a learning rule for adapting the connections within artificial neurons (Hebb 1949). A period of early activity extends up to the 1960s with the work of Rosenblatt (1962) and Widrow and Hoff (1960). In 1958, Rosenblatt coined the name 'perceptron'. Based upon the perceptron (Rosenblatt 1958), he developed the theory of statistical separability. The next major development was the new formulation of learning rules by Widrow and Hoff in their Adaline (Widrow and Hoff 1960). In 1969, Minsky and Papert (1969) provided a rigorous analysis of the perceptron.

The work of Grossberg in 1976 was based on biological and psychological evidence. He proposed several new architectures of nonlinear dynamical systems (Grossberg 1974) and introduced adaptive resonance theory (ART), which is a real-time ANN that performs supervised and unsupervised learning of categories, pattern classification and prediction. In 1982 Hopfield pointed out that neural networks with certain symmetries are analogues of spin glasses. A seminal book on ANNs is by Rumelhart et al. (1986). Fukushima explored competitive learning in his biologically inspired Cognitron and Neocognitron (Fukushima 1975; Widrow and Lehr 1990). In 1971 Werbos developed a backpropagation learning algorithm, which he published in his doctoral thesis (Werbos 1974). Rumelhart et al. rediscovered this technique in 1986 (Rumelhart et al. 1986). Kohonen (1982) introduced self-organised maps for pattern recognition (Burr 1993).

1.2 The Structure of Neural Networks

In neural networks, computational models or nodes are connected through weights that are adapted during use to improve performance. The main idea is to achieve good performance via dense interconnection of simple computational elements. The simplest node provides a linear combination of N weights w_1, ..., w_N and N inputs x_1, ..., x_N, and passes the result through a nonlinearity Φ, as shown in Figure 1.1.

[Figure 1.1 Connections within a node: the inputs x_1, ..., x_N and a bias input fixed to +1 are weighted by w_1, ..., w_N and w_0, summed, and passed through the nonlinearity Φ to produce the output y.]

Models of neural networks are specified by the net topology, node characteristics and training or learning rules. From the perspective of connection patterns, neural networks can be grouped into two categories: feedforward networks, in which graphs have no loops, and recurrent networks, where loops occur because of feedback connections. Neural networks are specified by (Tsoi and Back 1997)

• Node: typically a sigmoid function;
• Layer: a set of nodes at the same hierarchical level;
• Connection: constant weights or weights as a linear dynamical system, feedforward or recurrent;
• Architecture: an arrangement of interconnected neurons;
• Mode of operation: analogue or digital.

Massively interconnected neural nets provide a greater degree of robustness or fault tolerance than sequential machines. By robustness we mean that small perturbations in parameters will also result in small deviations of the values of the signals from their nominal values. In our work, hence, the term neuron will refer to an operator which performs the mapping

\mathrm{Neuron}\colon \mathbb{R}^{N+1} \to \mathbb{R}    (1.1)

as shown in Figure 1.1.
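To make the neuron operator of (1.1) and Figure 1.1 concrete before its formal description in (1.2) below, a minimal Python sketch follows. It is our own illustration rather than code from the book: the logistic function is used as one common choice for the nonlinearity Φ, and all names and numerical values are arbitrary.

```python
import numpy as np

def sigmoid(v):
    # Logistic function: one common choice for the nonlinearity Phi,
    # mapping any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-v))

def neuron(x, w):
    # x = [x_1, ..., x_N, 1] is the augmented input vector (bias input set to unity),
    # w = [w_1, ..., w_N, w_0] is the weight vector, with w_0 the bias weight.
    # Returns y = Phi(sum_i w_i * x_i + w_0), i.e. the mapping R^(N+1) -> R of (1.1).
    return sigmoid(np.dot(w, x))

# Example with N = 3 inputs (values chosen arbitrarily for illustration).
x = np.array([0.5, -1.2, 0.3, 1.0])   # three inputs plus the bias input +1
w = np.array([0.8, 0.1, -0.4, 0.2])   # three weights plus the bias weight w_0
print(neuron(x, w))                   # a single output in (0, 1)
```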
The equation

y = \Phi\biggl(\sum_{i=1}^{N} w_i x_i + w_0\biggr)    (1.2)

represents a mathematical description of a neuron. The input vector is given by x = [x_1, ..., x_N, 1]^T, whereas w = [w_1, ..., w_N, w_0]^T is referred to as the weight vector of a neuron. The weight w_0 corresponds to the bias input, which is typically set to unity. The function Φ: \mathbb{R} → (0, 1) is monotone and continuous, most commonly of a sigmoid shape. A set of interconnected neurons is a neural network (NN). If there are N input elements to an NN and M output elements of an NN, then the NN defines a continuous mapping

\mathrm{NN}\colon \mathbb{R}^{N} \to \mathbb{R}^{M}.    (1.3)

1.3 Perspective

Before the 1920s, prediction was undertaken by simply extrapolating the time series through a global fit procedure. The beginning of modern time series prediction was in 1927, when Yule introduced the autoregressive model in order to predict the annual number of sunspots. For the next half century the models considered were linear, typically driven by white noise. In the 1980s, the state-space representation and machine learning, typically by neural networks, emerged as new potential models for the prediction of highly complex, nonlinear and nonstationary phenomena. This was the shift from rule-based models to data-driven methods (Gershenfeld and Weigend 1993).

Time series prediction has traditionally been performed by the use of linear parametric autoregressive (AR), moving-average (MA) or autoregressive moving-average (ARMA) models (Box and Jenkins 1976; Ljung and Soderstrom 1983; Makhoul 1975), the parameters of which are estimated either in a block or a sequential manner with the least mean square (LMS) or recursive least-squares (RLS) algorithms (Haykin 1994). An obvious problem is that these processors are linear and are not able to cope with certain nonstationary signals, and signals whose mathematical model is not linear. On the other hand, neural networks are powerful when applied to problems whose solutions require knowledge which is difficult to specify, but for which there is an abundance of examples (Dillon and Manikopoulos 1991; Gent and Sheppard 1992; Townshend 1991). As time series prediction is conventionally performed entirely by inference of future behaviour from examples of past behaviour, it is a suitable application for a neural network predictor.

The neural network approach to time series prediction is non-parametric, in the sense that it does not need any information regarding the process that generates the signal. For instance, the order and parameters of an AR or ARMA process are not needed in order to carry out the prediction. This task is carried out by a process of learning from examples presented to the network and changing network weights in response to the output error. Li (1992) has shown that a recurrent neural network (RNN) with a sufficiently large number of neurons is a realisation of the nonlinear ARMA (NARMA) process. RNNs performing NARMA prediction have traditionally been trained by the real-time recurrent learning (RTRL) algorithm (Williams and Zipser 1989a), which provides training of the RNN 'on the run'.
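As an illustration of such NARMA-type prediction, the following minimal Python sketch is our own construction and not code from the book: it shows only the forward pass of a small fully connected recurrent network used as a one-step-ahead predictor of a scalar time series. The weight matrices, function names and toy signal are arbitrary, and the training step (e.g. by RTRL) is omitted.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def rnn_one_step_prediction(signal, W_rec, w_in, w_out):
    # Forward pass of a small fully connected recurrent network used as a
    # one-step-ahead predictor of a scalar time series:
    #   state update:  s(k) = Phi(W_rec s(k-1) + w_in * x(k))
    #   prediction:    x_hat(k+1) = w_out^T s(k)
    # The weights are taken as given; in practice they would be adapted
    # online, e.g. by the RTRL algorithm, which is not shown here.
    n_neurons = W_rec.shape[0]
    s = np.zeros(n_neurons)                  # internal (feedback) state
    predictions = np.empty(len(signal))
    for k, x_k in enumerate(signal):
        s = sigmoid(W_rec @ s + w_in * x_k)  # feedback of the state plus the new sample
        predictions[k] = w_out @ s           # estimate of signal[k + 1]
    return predictions

# Example with arbitrary (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
N = 5                                        # number of neurons in the RNN
W_rec = 0.1 * rng.standard_normal((N, N))    # recurrent (feedback) weights
w_in = 0.1 * rng.standard_normal(N)          # input weights
w_out = 0.1 * rng.standard_normal(N)         # linear read-out weights
x = np.sin(0.05 * np.arange(200))            # a toy signal to be predicted
x_hat = rnn_one_step_prediction(x, W_rec, w_in, w_out)
```

In a trained network, the output error between x(k+1) and the prediction would drive the weight updates, which is precisely the role of the learning algorithms discussed in the following chapters.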
However, for a complex physical process, a number of difficulties encountered by RNNs, such as the high degree of approximation involved in the RTRL algorithm for a high-order MA part of the underlying NARMA process, the high computational complexity of O(N^4) (with N being the number of neurons in the RNN), the insufficient degree of nonlinearity involved, and the relatively low robustness, induced a search for other, more suitable schemes for RNN-based predictors.

In addition, in time series prediction of nonlinear and nonstationary signals, there is a need to learn long-time temporal dependencies. This is rather difficult with conventional RNNs because of the problem of the vanishing gradient (Bengio et al. 1994). A solution to that problem might be NARMA models and nonlinear autoregressive moving average models with exogenous inputs (NARMAX) (Siegelmann et al. 1997) realised by recurrent neural networks. However, the quality of performance is highly dependent on the order of the AR and MA parts in the NARMAX model.

The main reasons for using neural networks for prediction rather than classical time series analysis are (Wu 1995):

• they are computationally at least as fast, if not faster, than most available statistical techniques;
• they are self-monitoring (i.e. they learn how to make accurate predictions);
• they are as accurate, if not more accurate, than most of the available statistical techniques;
• they provide iterative forecasts;
• they are able to cope with nonlinearity and nonstationarity of input processes;
• they offer both parametric and nonparametric prediction.

1.4 Neural Networks for Prediction: Perspective

Many signals are generated from an inherently nonlinear physical mechanism and have statistically non-stationary properties, a classic example of which is speech. Linear structure adaptive filters are suitable for the nonstationary characteristics of such signals, but they do not account for nonlinearity and associated higher-order statistics (Shynk 1989). Adaptive techniques which recognise the nonlinear nature of the signal should therefore outperform traditional linear adaptive filtering techniques (Haykin 1996a; Kay 1993).

The classic approach to time series prediction is to undertake an analysis of the time series data, which includes modelling, identification of the model and model parameter estimation phases (Makhoul 1975). The design may be iterated by measuring the closeness of the model to the real data. This can be a long process, often involving the derivation, implementation and refinement of a number of models before one with appropriate characteristics is found. In particular, the most difficult systems to predict are:

• those with non-stationary dynamics, where the underlying behaviour varies with time, a typical example of which is speech production;
• those which deal with physical data which are subject to noise and experimentation error, such as biomedical signals;
• those which deal with short time series, providing few data points on which to conduct the analysis, such as heart rate signals, chaotic signals and meteorological signals.

In all these situations, traditional techniques are severely limited and alternative techniques must be found (Bengio 1995; Haykin and Li 1995; Li and Haykin 1993; Niranjan and Kadirkamanathan 1991).
On the other hand, neural networks are powerful when applied to problems whose solutions require knowledge which is difficult to specify, but for which there is an abundance of examples (Dillon and Manikopoulos 1991; Gent and Sheppard 1992; Townshend 1991). From a system-theoretic point of view, neural networks can be considered as a conveniently parametrised class of nonlinear maps (Narendra 1996).

There has been a recent resurgence in the field of ANNs, caused by new net topologies, VLSI computational algorithms and the introduction of massive parallelism into neural networks. As such, they are both universal function approximators (Cybenko 1989; Hornik et al. 1989) and arbitrary pattern classifiers. From the Weierstrass theorem, it is known that polynomials, and many other approximation schemes, can approximate a continuous function arbitrarily well. Kolmogorov's theorem (a negative solution of Hilbert's 13th problem (Lorentz 1976)) states that any continuous function can be approximated using only linear summations and nonlinear but continuously increasing functions of only one variable. This makes neural networks suitable for universal approximation, and hence prediction. Although sometimes computationally demanding (Williams and Zipser 1995), neural networks have found their place in the area of nonlinear autoregressive moving average (NARMA) prediction applications (Bailer-Jones et al. 1998; Connor et al. 1992; Lin et al. 1996). Comprehensive survey papers on the use and role of ANNs can be found in Widrow and Lehr (1990), Lippmann (1987), Medler (1998), Ermentrout (1998), Hunt et al. (1992) and Billings (1980).

Only recently have neural networks been considered for prediction. A recent competition by the Santa Fe Institute for Studies in the Science of Complexity (1991–1993) (Weigend and Gershenfeld 1994) showed that neural networks can outperform conventional linear predictors in a number of applications (Waibel et al. 1989). In journals, there has been an ever increasing interest in applying neural networks. A most comprehensive issue on recurrent neural networks is the issue of the IEEE Transactions on Neural Networks, vol. 5, no. 2, March 1994. In the signal processing community, there has been a recent special issue 'Neural Networks for Signal Processing' of the IEEE Transactions on Signal Processing, vol. 45, no. 11, November 1997, and also the issue 'Intelligent Signal Processing' of the Proceedings of the IEEE, vol. 86, no. 11, November 1998, both dedicated to the use of neural networks in signal processing applications.

Figure 1.2 shows the frequency of appearance of articles on recurrent neural networks in common citation index databases. Figure 1.2(a) shows the number of journal and conference articles on recurrent neural networks in IEE/IEEE publications between 1988 and 1999. The data were gathered using the IEL Online service, and these publications are mainly periodicals and conferences in electronics engineering. Figure 1.2(b) shows the frequency of appearance for the BIDS/ATHENS database, between 1988 and 2000 (at the time of writing, only the months up to September 2000 were covered), which also includes non-engineering publications. From Figure 1.2, there is a clear growing trend in the frequency of appearance of articles on recurrent neural networks. Therefore, we felt that there was a need for a research monograph that would cover a part of the area with up-to-date ideas and results.

[Figure 1.2 Appearance of articles on RNNs in major citation databases: (a) articles on recurrent neural networks in IEE/IEEE publications (via IEL) in the period 1988–1999; (b) articles on recurrent neural networks in the BIDS database in the period 1988–2000.]

1.5 Structure of the Book

The book is divided into 12 chapters and 10 appendices. An introduction to connectionism and the notion of neural networks for prediction is included in Chapter 1. The fundamentals of adaptive signal processing and learning theory are detailed in Chapter 2. An initial overview of network architectures for prediction is given in Chapter 3. Chapter 4 contains a detailed discussion of activation functions, and new insights are provided by the consideration of neural networks within the framework of modular groups from number theory. The material in Chapter 5 builds upon that within Chapter 3 and provides more comprehensive coverage of recurrent neural network architectures together with concepts from nonlinear system modelling. In Chapter 6, neural networks are considered as nonlinear adaptive filters, whereby the necessary learning strategies for recurrent neural networks are developed. The stability issues for certain recurrent neural network architectures are considered in Chapter 7 through the exploitation of fixed point theory, and bounds for global asymptotic stability are derived. A posteriori adaptive learning algorithms are introduced in Chapter 8 and the synergy with data-reusing algorithms is highlighted. In Chapter 9, a new class of normalised algorithms for online training of recurrent neural networks is derived. The convergence of online learning algorithms for neural networks is addressed in Chapter 10. Experimental results for the prediction of nonlinear and nonstationary signals with recurrent neural networks are presented in Chapter 11. In Chapter 12, the exploitation of inherent relationships between parameters within recurrent neural networks is described. Appendices A to J provide background to the main chapters and cover key concepts from linear algebra, approximation theory, complex sigmoid activation functions, a precedent learning algorithm for recurrent neural networks, terminology in neural networks, a posteriori techniques in science and engineering, contraction mapping theory, linear relaxation and stability, stability of general nonlinear systems and deseasonalising of time series. The book concludes with a comprehensive bibliography.

1.6 Readership

This book is targeted at graduate students and research engineers active in the areas of communications, neural networks, nonlinear control, signal processing and time series analysis. It will also be useful for engineers and scientists working in diverse application areas, such as artificial intelligence, biomedicine, earth sciences, finance and physics.