Mendel, J.M., "Estimation Theory and Algorithms: From Gauss to Wiener to Kalman," in Digital Signal Processing Handbook, Vijay K. Madisetti and Douglas B. Williams, Eds., Boca Raton: CRC Press LLC, 1999. © 1999 by CRC Press LLC.

15 Estimation Theory and Algorithms: From Gauss to Wiener to Kalman

Jerry M. Mendel, University of Southern California

15.1 Introduction
15.2 Least-Squares Estimation
15.3 Properties of Estimators
15.4 Best Linear Unbiased Estimation
15.5 Maximum-Likelihood Estimation
15.6 Mean-Squared Estimation of Random Parameters
15.7 Maximum A Posteriori Estimation of Random Parameters
15.8 The Basic State-Variable Model
15.9 State Estimation for the Basic State-Variable Model: Prediction • Filtering (the Kalman Filter) • Smoothing
15.10 Digital Wiener Filtering
15.11 Linear Prediction in DSP, and Kalman Filtering
15.12 Iterated Least Squares
15.13 Extended Kalman Filter
Acknowledgment
References
Further Information

15.1 Introduction

Estimation is one of four modeling problems. The other three are representation (how something should be modeled), measurement (which physical quantities should be measured and how they should be measured), and validation (demonstrating confidence in the model). Estimation, which fits in between the problems of measurement and validation, deals with the determination of those physical quantities that cannot be measured from those that can be measured. We shall cover a wide range of estimation techniques including weighted least squares, best linear unbiased, maximum-likelihood, mean-squared, and maximum a posteriori. These techniques are for parameter or state estimation or a combination of the two, as applied to either linear or nonlinear models.

The discrete-time viewpoint is emphasized in this chapter because: (1) much real data is collected in a digitized manner, so it is in a form ready to be processed by discrete-time estimation algorithms; and (2) the mathematics associated with discrete-time estimation theory is simpler than with continuous-time estimation theory. We view (discrete-time) estimation theory as the extension of classical signal processing to the design of discrete-time (digital) filters that process uncertain data in an optimal manner. Estimation theory can, therefore, be viewed as a natural adjunct to digital signal processing theory. Mendel [12] is the primary reference for all the material in this chapter.

Estimation algorithms process data and, as such, must be implemented on a digital computer. Our computation philosophy is, whenever possible, leave it to the experts. Many of our chapter's algorithms can be used with MATLAB™ and appropriate toolboxes (MATLAB is a registered trademark of The MathWorks, Inc.). See [12] for specific connections between MATLAB™ and toolbox M-files and the algorithms of this chapter.

The main model that we shall direct our attention to is linear in the unknown parameters, namely

Z(k) = H(k)θ + V(k) .    (15.1)

In this model, which we refer to as a "generic linear model," Z(k) = col(z(k), z(k − 1), ..., z(k − N + 1)), which is N × 1, is called the measurement vector. Its elements are z(j) = h'(j)θ + v(j). θ, which is n × 1, is called the parameter vector, and contains the unknown deterministic or random parameters that will be estimated using one or more of this chapter's techniques; H(k), which is N × n, is called the observation matrix; and V(k), which is N × 1, is called the measurement noise vector.
By convention, the argument "k" of Z(k), H(k), and V(k) denotes the fact that the last measurement used to construct (15.1) is the kth. Examples of problems that can be cast into the form of the generic linear model are: identifying the impulse response coefficients in the convolutional summation model for a linear time-invariant system from noisy output measurements; identifying the coefficients of a linear time-invariant finite-difference equation model for a dynamical system from noisy output measurements; function approximation; state estimation; estimating parameters of a nonlinear model using a linearized version of that model; deconvolution; and identifying the coefficients in a discretized Volterra series representation of a nonlinear system.

The following estimation notation is used throughout this chapter: θ̂(k) denotes an estimate of θ and θ̃(k) denotes the error in estimation, i.e., θ̃(k) = θ − θ̂(k). The generic linear model is the starting point for the derivation of many classical parameter estimation techniques, and the estimation model for Z(k) is Ẑ(k) = H(k)θ̂(k). In the rest of this chapter we develop specific structures for θ̂(k). These structures are referred to as estimators. Estimates are obtained whenever data are processed by an estimator.

15.2 Least-Squares Estimation

The method of least squares dates back to Karl Gauss around 1795 and is the cornerstone for most estimation theory. The weighted least-squares estimator (WLSE), θ̂_WLS(k), is obtained by minimizing the objective function J[θ̂(k)] = Z̃'(k)W(k)Z̃(k), where [using (15.1)] Z̃(k) = Z(k) − Ẑ(k) = H(k)θ̃(k) + V(k), and weighting matrix W(k) must be symmetric and positive definite. This weighting matrix can be used to weight recent measurements more (or less) heavily than past measurements. If W(k) = cI, so that all measurements are weighted the same, then weighted least squares reduces to least squares, in which case we obtain θ̂_LS(k). Setting dJ[θ̂(k)]/dθ̂(k) = 0, we find that:

θ̂_WLS(k) = [H'(k)W(k)H(k)]^{-1} H'(k)W(k)Z(k)    (15.2)

and, consequently,

θ̂_LS(k) = [H'(k)H(k)]^{-1} H'(k)Z(k)    (15.3)

Note, also, that J[θ̂_WLS(k)] = Z'(k)W(k)Z(k) − θ̂'_WLS(k)H'(k)W(k)H(k)θ̂_WLS(k).

Matrix H'(k)W(k)H(k) must be nonsingular for its inverse in (15.2) to exist. This is true if W(k) is positive definite, as assumed, and H(k) is of maximum rank. We know that θ̂_WLS(k) minimizes J[θ̂(k)] because d²J[θ̂(k)]/dθ̂²(k) = 2H'(k)W(k)H(k) > 0, since H'(k)W(k)H(k) is invertible.

Estimator θ̂_WLS(k) processes the measurements Z(k) linearly; hence, it is referred to as a linear estimator. In practice, we do not compute θ̂_WLS(k) using (15.2), because computing the inverse of H'(k)W(k)H(k) is fraught with numerical difficulties. Instead, the so-called normal equations, [H'(k)W(k)H(k)]θ̂_WLS(k) = H'(k)W(k)Z(k), are solved using stable algorithms from numerical linear algebra (e.g., [3]); one approach is to convert the original least-squares problem into an equivalent, easy-to-solve problem using orthogonal transformations such as Householder or Givens transformations. Note, also, that (15.2) and (15.3) apply to the estimation of either deterministic or random parameters, because nowhere in the derivation of θ̂_WLS(k) did we have to assume that θ was or was not random. Finally, note that WLSEs may not be invariant under changes of scale. One way to circumvent this difficulty is to use normalized data.
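To make (15.2), (15.3), and the remark about the normal equations concrete, here is a minimal Python/NumPy sketch (the chapter itself points to MATLAB toolbox M-files); the FIR system, noise level, and exponential weighting below are made-up illustrations, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: identify a 3-tap FIR impulse response from noisy outputs.
theta_true = np.array([1.0, -0.5, 0.25])        # unknown parameter vector (n = 3)
N, n = 50, theta_true.size
u = rng.standard_normal(N + n)                  # known input sequence

# Observation matrix H(k): each row holds the current and two previous inputs.
H = np.column_stack([u[n - i : n - i + N] for i in range(n)])
Z = H @ theta_true + 0.1 * rng.standard_normal(N)   # Z(k) = H(k) theta + V(k)

# Weighting matrix W(k): here, exponentially discount older measurements.
w = 0.95 ** np.arange(N)[::-1]
W = np.diag(w)

# Solve the normal equations [H'WH] theta = H'WZ without forming an explicit inverse.
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ Z)     # cf. eq. (15.2)

# Unweighted least squares via an orthogonal-transformation-based solver, cf. eq. (15.3).
theta_ls, *_ = np.linalg.lstsq(H, Z, rcond=None)

print(theta_wls, theta_ls)
```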
Least-squares estimates can also be computed using the singular-value decomposition (SVD) of matrix H(k). This computation is valid for both the overdetermined (N > n) and underdetermined (N < n) situations and for the situation when H(k) may or may not be of full rank. The SVD of a K × M matrix A is:

U'AV = [Σ, 0; 0, 0]    (15.4)

where U and V are unitary matrices, Σ = diag(σ_1, σ_2, ..., σ_r), and σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0. The σ_i's are the singular values of A, and r is the rank of A. Let the SVD of H(k) be given by (15.4). Even if H(k) is not of maximum rank, then

θ̂_LS(k) = V [Σ^{-1}, 0; 0, 0] U'Z(k)    (15.5)

where Σ^{-1} = diag(σ_1^{-1}, σ_2^{-1}, ..., σ_r^{-1}) and r is the rank of H(k). Additionally, in the overdetermined case,

θ̂_LS(k) = Σ_{i=1}^{r} [1/σ_i²(k)] v_i(k) v_i'(k) H'(k) Z(k)    (15.6)

Similar formulas exist for computing θ̂_WLS(k).

Equations (15.2) and (15.3) are batch equations, because they process all of the measurements at one time. These formulas can be made recursive in time by using simple vector and matrix partitioning techniques. The information form of the recursive WLSE is:

θ̂_WLS(k + 1) = θ̂_WLS(k) + K_w(k + 1)[z(k + 1) − h'(k + 1)θ̂_WLS(k)]    (15.7)

K_w(k + 1) = P(k + 1)h(k + 1)w(k + 1)    (15.8)

P^{-1}(k + 1) = P^{-1}(k) + h(k + 1)w(k + 1)h'(k + 1)    (15.9)

Equations (15.8) and (15.9) require the inversion of the n × n matrix P. If n is large, then this will be a costly computation. Applying a matrix inversion lemma to (15.9), one obtains the following alternative covariance form of the recursive WLSE: Equation (15.7), and

K_w(k + 1) = P(k)h(k + 1)[h'(k + 1)P(k)h(k + 1) + 1/w(k + 1)]^{-1}    (15.10)

P(k + 1) = [I − K_w(k + 1)h'(k + 1)]P(k)    (15.11)

Equations (15.7)–(15.9), or (15.7), (15.10), and (15.11), are initialized by θ̂_WLS(n) and P^{-1}(n), where P(n) = [H'(n)W(n)H(n)]^{-1}, and are used for k = n, n + 1, ..., N − 1. Equation (15.7) can be expressed as

θ̂_WLS(k + 1) = [I − K_w(k + 1)h'(k + 1)]θ̂_WLS(k) + K_w(k + 1)z(k + 1)    (15.12)

which demonstrates that the recursive WLSE is a time-varying digital filter that is excited by random inputs (i.e., the measurements), one whose plant matrix [I − K_w(k + 1)h'(k + 1)] may itself be random because K_w(k + 1) and h(k + 1) may be random, depending upon the specific application. The random natures of these matrices make the analysis of this filter exceedingly difficult.

Two recursions are present in the recursive WLSE. The first is the vector recursion for θ̂_WLS given by (15.7). Clearly, θ̂_WLS(k + 1) cannot be computed from this expression until measurement z(k + 1) is available. The second is the matrix recursion for either P^{-1} given by (15.9) or P given by (15.11). Observe that values for these matrices can be precomputed before measurements are made. A digital computer implementation of (15.7)–(15.9) is P^{-1}(k + 1) → P(k + 1) → K_w(k + 1) → θ̂_WLS(k + 1), whereas for (15.7), (15.10), and (15.11), it is P(k) → K_w(k + 1) → θ̂_WLS(k + 1) → P(k + 1).

Finally, the recursive WLSEs can even be used for k = 0, 1, ..., N − 1. Often z(0) = 0, or there is no measurement made at k = 0, so that we can set z(0) = 0. In this case we can set w(0) = 0, and the recursive WLSEs can be initialized by setting θ̂_WLS(0) = 0 and P(0) to a diagonal matrix of very large numbers. This is very commonly done in practice.
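A sketch of the covariance form (15.7), (15.10), (15.11), using the start-up just described (θ̂_WLS(0) = 0 and P(0) a diagonal matrix of very large numbers); the regressors, noise level, and weights are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

theta_true = np.array([2.0, -1.0])
n = theta_true.size
N = 200

theta = np.zeros(n)            # theta_hat_WLS(0) = 0
P = 1e6 * np.eye(n)            # P(0): diagonal matrix of very large numbers

for k in range(N):
    h = rng.standard_normal(n)                         # h(k+1): regressor for this measurement
    z = h @ theta_true + 0.2 * rng.standard_normal()   # z(k+1) = h'(k+1) theta + v(k+1)
    w = 1.0                                            # weight w(k+1)

    # Gain, eq. (15.10); for scalar measurements the bracketed inverse is a scalar division.
    K = P @ h / (h @ P @ h + 1.0 / w)
    # Parameter update, eq. (15.7).
    theta = theta + K * (z - h @ theta)
    # Covariance-form update of P, eq. (15.11).
    P = (np.eye(n) - np.outer(K, h)) @ P

print(theta)   # approaches theta_true as measurements accumulate
```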
Fast fixed-order recursive least-squares algorithms that are based on the Givens rotation [3] and can be implemented using systolic arrays are described in [5] and the references therein.

15.3 Properties of Estimators

How do we know whether or not the results obtained from the WLSE, or for that matter any estimator, are good? To answer this question, we must make use of the fact that all estimators represent transformations of random data; hence, θ̂(k) is itself random, so that its properties must be studied from a statistical viewpoint. This fact, and its consequences, which seem so obvious to us today, are due to the eminent statistician R.A. Fisher.

It is common to distinguish between small-sample and large-sample properties of estimators. The term "sample" refers to the number of measurements used to obtain θ̂, i.e., the dimension of Z. The phrase "small sample" means any number of measurements (e.g., 1, 2, 100, 10^4, or even an infinite number), whereas the phrase "large sample" means "an infinite number of measurements." Large-sample properties are also referred to as asymptotic properties. If an estimator possesses a small-sample property, it also possesses the associated large-sample property; but the converse is not always true. Although large sample means an infinite number of measurements, estimators begin to enjoy large-sample properties for much fewer than an infinite number of measurements. How few usually depends on the dimension of θ, n, the memory of the estimators, and in general on the underlying, albeit unknown, probability density function.

A thorough study into θ̂ would mean determining its probability density function p(θ̂). Usually, it is too difficult to obtain p(θ̂) for most estimators (unless θ̂ is multivariate Gaussian); thus, it is customary to emphasize the first- and second-order statistics of θ̂ (or its associated error θ̃ = θ − θ̂), the mean and the covariance.

Small-sample properties of an estimator are unbiasedness and efficiency. An estimator is unbiased if its mean value tracks the unknown parameter at every value of time, i.e., the mean value of the estimation error is zero at every value of time. Dispersion about the mean is measured by error variance. Efficiency is related to how small the error variance will be. Associated with efficiency is the very famous Cramér-Rao inequality (Fisher information matrix, in the case of a vector of parameters), which places a lower bound on the error variance, a bound that does not depend on a particular estimator.

Large-sample properties of an estimator are asymptotic unbiasedness, consistency, asymptotic normality, and asymptotic efficiency. Asymptotic unbiasedness and efficiency are limiting forms of their small-sample counterparts, unbiasedness and efficiency. The importance of an estimator being asymptotically normal (Gaussian) is that its entire probabilistic description is then known, and it can be entirely characterized just by its asymptotic first- and second-order statistics. Consistency is a form of convergence of θ̂(k) to θ; it is synonymous with convergence in probability. One of the reasons for the importance of consistency in estimation theory is that any continuous function of a consistent estimator is itself a consistent estimator, i.e., "consistency carries over." It is also possible to examine other types of stochastic convergence for estimators, such as mean-squared convergence and convergence with probability 1. A general carry-over property does not exist for these two types of convergence; it must be established case-by-case (e.g., [11]).
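Because an estimator is a transformation of random data, its bias and error covariance can be examined empirically by Monte Carlo simulation. A small made-up sketch for the LS estimator of the generic linear model follows; for deterministic H and white noise of variance σ², the theoretical error covariance used for comparison is σ²[H'H]^{-1}.

```python
import numpy as np

rng = np.random.default_rng(2)

theta_true = np.array([1.0, -2.0, 0.5])
N, n, trials = 30, 3, 5000
H = rng.standard_normal((N, n))          # fixed, deterministic observation matrix
sigma = 0.3

estimates = np.empty((trials, n))
for t in range(trials):
    Z = H @ theta_true + sigma * rng.standard_normal(N)
    estimates[t], *_ = np.linalg.lstsq(H, Z, rcond=None)

errors = estimates - theta_true
print("sample bias             :", errors.mean(axis=0))               # ~0: unbiased
print("sample error covariance :", np.cov(errors.T))
print("theory error covariance :", sigma**2 * np.linalg.inv(H.T @ H))
```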
Generally speaking, it is very difficult to establish small-sample or large-sample properties for least-squares estimators, except in the very special case when H(k) and V(k) are statistically independent. While this condition is satisfied in the application of identifying an impulse response, it is violated in the important application of identifying the coefficients in a finite-difference equation, as well as in many other important engineering applications. Many large-sample properties of LSEs are determined by establishing that the LSE is equivalent to another estimator for which it is known that the large-sample property holds true. We pursue this below.

Least-squares estimators require no assumptions about the statistical nature of the generic model. Consequently, the formula for the WLSE is easy to derive. The price paid for not making assumptions about the statistical nature of the generic linear model is great difficulty in establishing small- or large-sample properties of the resulting estimator.

15.4 Best Linear Unbiased Estimation

Our second estimator is both unbiased and efficient by design, and is a linear function of measurements Z(k). It is called a best linear unbiased estimator (BLUE), θ̂_BLU(k). As in the derivation of the WLSE, we begin with our generic linear model; but now we make two assumptions about this model, namely: (1) H(k) must be deterministic, and (2) V(k) must be zero mean with positive definite known covariance matrix R(k). The derivation of the BLUE is more complicated than the derivation of the WLSE because of the design constraints; however, its performance analysis is much easier because we build good performance into its design.

We begin by assuming the following linear structure for θ̂_BLU(k): θ̂_BLU(k) = F(k)Z(k). Matrix F(k) is designed such that: (1) θ̂_BLU(k) is an unbiased estimator of θ, and (2) the error variance for each of the n parameters is minimized. In this way, θ̂_BLU(k) will be unbiased and efficient (within the class of linear estimators) by design. The resulting BLUE is:

θ̂_BLU(k) = [H'(k)R^{-1}(k)H(k)]^{-1} H'(k)R^{-1}(k)Z(k)    (15.13)

A very remarkable connection exists between the BLUE and WLSE, namely, the BLUE of θ is the special case of the WLSE of θ when W(k) = R^{-1}(k). Consequently, all results obtained in our section above for θ̂_WLS(k) can be applied to θ̂_BLU(k) by setting W(k) = R^{-1}(k). Matrix R^{-1}(k) weights the contributions of precise measurements heavily and deemphasizes the contributions of imprecise measurements. The best linear unbiased estimation design technique has led to a weighting matrix that is quite sensible.

If H(k) is deterministic and R(k) = σ²_ν I, then θ̂_BLU(k) = θ̂_LS(k). This result, known as the Gauss-Markov theorem, is important because we have connected two seemingly different estimators, one of which, θ̂_BLU(k), has the properties of unbiasedness and minimum variance by design; hence, in this case θ̂_LS(k) inherits these properties.

In a recursive WLSE, matrix P(k) has no special meaning. In a recursive BLUE [which is obtained by substituting W(k) = R^{-1}(k) into (15.7)–(15.9), or (15.7), (15.10), and (15.11)], matrix P(k) is the covariance matrix for the error between θ and θ̂_BLU(k), i.e., P(k) = [H'(k)R^{-1}(k)H(k)]^{-1} = cov[θ̃_BLU(k)]. Hence, every time P(k) is calculated in the recursive BLUE, we obtain a quantitative measure of how well we are estimating θ.
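A sketch of (15.13) for a made-up example with unequal, known measurement-noise variances; it also shows P(k) = [H'(k)R^{-1}(k)H(k)]^{-1} being used as the error covariance, as just discussed. The BLUE here is simply the WLSE with W(k) = R^{-1}(k).

```python
import numpy as np

rng = np.random.default_rng(3)

theta_true = np.array([0.7, 1.5])
N, n = 40, 2
H = rng.standard_normal((N, n))                   # deterministic, full rank
r = rng.uniform(0.05, 1.0, size=N)                # known measurement-noise variances
Z = H @ theta_true + np.sqrt(r) * rng.standard_normal(N)

Rinv = np.diag(1.0 / r)                           # W(k) = R^{-1}(k)
A = H.T @ Rinv @ H
theta_blu = np.linalg.solve(A, H.T @ Rinv @ Z)    # cf. eq. (15.13)
P = np.linalg.inv(A)                              # cov of the BLUE estimation error

print(theta_blu)
print(np.sqrt(np.diag(P)))    # standard errors of the parameter estimates
```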
Recall that we stated that WLSEs may change in numerical value under changes in scale. BLUEs are invariant under changes in scale. This is accomplished automatically by setting W(k) = R^{-1}(k) in the WLSE. The fact that H(k) must be deterministic severely limits the applicability of BLUEs in engineering applications.

15.5 Maximum-Likelihood Estimation

Probability is associated with a forward experiment in which the probability model, p(Z(k)|θ), is specified, including values for the parameters, θ, in that model (e.g., mean and variance in a Gaussian density function), and data (i.e., realizations) are generated using this model. Likelihood, l(θ|Z(k)), is proportional to probability. In likelihood, the data is given as well as the nature of the probability model; but the parameters of the probability model are not specified. They must be determined from the given data. Likelihood is, therefore, associated with an inverse experiment. The maximum-likelihood method is based on the relatively simple idea that different (statistical) populations generate different samples and that any given sample (i.e., set of data) is more likely to have come from some populations than from others.

In order to determine the maximum-likelihood estimate (MLE) of deterministic θ, θ̂_ML, we need to determine a formula for the likelihood function and then maximize that function. Because likelihood is proportional to probability, we need to know the entire joint probability density function of the measurements in order to determine a formula for the likelihood function. This, of course, is much more information about Z(k) than was required in the derivation of the BLUE. In fact, it is the most information that we can ever expect to know about the measurements. The price we pay for knowing so much information about Z(k) is complexity in maximizing the likelihood function. Generally, mathematical programming must be used in order to determine θ̂_ML.

Maximum-likelihood estimates are very popular and widely used because they enjoy very good large-sample properties. They are consistent, asymptotically Gaussian with mean θ and covariance matrix (1/N)J^{-1}, in which J is the Fisher information matrix, and are asymptotically efficient. Functions of maximum-likelihood estimates are themselves maximum-likelihood estimates, i.e., if g(θ) is a vector function mapping θ into an interval in r-dimensional Euclidean space, then g(θ̂_ML) is a MLE of g(θ). This "invariance" property is usually not enjoyed by WLSEs or BLUEs.

In one special case it is very easy to compute θ̂_ML, i.e., for our generic linear model in which H(k) is deterministic and V(k) is Gaussian. In this case θ̂_ML = θ̂_BLU. These estimators are: unbiased, because θ̂_BLU is unbiased; efficient (within the class of linear estimators), because θ̂_BLU is efficient; consistent, because θ̂_ML is consistent; and Gaussian, because they depend linearly on Z(k), which is Gaussian. If, in addition, R(k) = σ²_ν I, then θ̂_ML(k) = θ̂_BLU(k) = θ̂_LS(k), and these estimators are unbiased, efficient (within the class of linear estimators), consistent, and Gaussian.
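A sketch of maximum-likelihood estimation by numerical optimization for the generic linear model with deterministic H(k) and Gaussian V(k); SciPy's general-purpose minimizer is used only as a stand-in for whatever mathematical-programming tool one prefers, and the data are made up. In this special case the result should agree with the closed-form BLUE, as stated above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

theta_true = np.array([1.0, -0.3])
N, n = 60, 2
H = rng.standard_normal((N, n))               # deterministic observation matrix
r = 0.2 * np.ones(N)                          # known noise variances (R diagonal)
Z = H @ theta_true + np.sqrt(r) * rng.standard_normal(N)

def neg_log_likelihood(theta):
    # -log p(Z | theta) for Gaussian V(k), up to an additive constant.
    e = Z - H @ theta
    return 0.5 * np.sum(e**2 / r)

theta_ml = minimize(neg_log_likelihood, x0=np.zeros(n), method="BFGS").x

# Closed-form BLUE for comparison: identical in this Gaussian, deterministic-H case.
Rinv = np.diag(1.0 / r)
theta_blu = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv @ Z)

print(theta_ml, theta_blu)
```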
The method of maximum likelihood is limited to deterministic parameters. In the case of random parameters, we can still use the WLSE or the BLUE, or, if additional information is available, we can use either a mean-squared or maximum a posteriori estimator, as described below. The former does not use statistical information about the random parameters, whereas the latter does.

15.6 Mean-Squared Estimation of Random Parameters

Given measurements z(1), z(2), ..., z(k), the mean-squared estimator (MSE) of random θ, θ̂_MS(k) = φ[z(i), i = 1, 2, ..., k], minimizes the mean-squared error J[θ̃_MS(k)] = E{θ̃'_MS(k)θ̃_MS(k)} [where θ̃_MS(k) = θ − θ̂_MS(k)]. The function φ[z(i), i = 1, 2, ..., k] may be nonlinear or linear. Its exact structure is determined by minimizing J[θ̃_MS(k)].

The solution to this mean-squared estimation problem, which is known as the fundamental theorem of estimation theory, is:

θ̂_MS(k) = E{θ|Z(k)}    (15.14)

As it stands, (15.14) is not terribly useful for computing θ̂_MS(k). In general, we must first compute p[θ|Z(k)] and then perform the requisite number of integrations of θp[θ|Z(k)] to obtain θ̂_MS(k). It is useful to separate this computation into two major cases: (1) θ and Z(k) are jointly Gaussian — the Gaussian case, and (2) θ and Z(k) are not jointly Gaussian — the non-Gaussian case.

When θ and Z(k) are jointly Gaussian, the estimator that minimizes the mean-squared error is

θ̂_MS(k) = m_θ + P_θz(k)P_z^{-1}(k)[Z(k) − m_z(k)]    (15.15)

where m_θ is the mean of θ, m_z(k) is the mean of Z(k), P_z(k) is the covariance matrix of Z(k), and P_θz(k) is the cross-covariance between θ and Z(k). Of course, to compute θ̂_MS(k) using (15.15), we must somehow know all of these statistics, and we must be sure that θ and Z(k) are jointly Gaussian.

For the generic linear model, Z(k) = H(k)θ + V(k), in which H(k) is deterministic, V(k) is Gaussian noise with known invertible covariance matrix R(k), θ is Gaussian with mean m_θ and covariance matrix P_θ, and θ and V(k) are statistically independent, then θ and Z(k) are jointly Gaussian, and (15.15) becomes

θ̂_MS(k) = m_θ + P_θH'(k)[H(k)P_θH'(k) + R(k)]^{-1}[Z(k) − H(k)m_θ]    (15.16)

where error-covariance matrix P_MS(k), which is associated with θ̂_MS(k), is

P_MS(k) = P_θ − P_θH'(k)[H(k)P_θH'(k) + R(k)]^{-1}H(k)P_θ = [P_θ^{-1} + H'(k)R^{-1}(k)H(k)]^{-1} .    (15.17)

Using (15.17) in (15.16), θ̂_MS(k) can be reexpressed as

θ̂_MS(k) = m_θ + P_MS(k)H'(k)R^{-1}(k)[Z(k) − H(k)m_θ]    (15.18)

Suppose θ and Z(k) are not jointly Gaussian and that we know m_θ, m_z(k), P_z(k), and P_θz(k). In this case, the estimator that is constrained to be an affine transformation of Z(k) and that minimizes the mean-squared error is also given by (15.15).

We now know the answer to the following important question: When is the linear (affine) mean-squared estimator the same as the mean-squared estimator? The answer is when θ and Z(k) are jointly Gaussian. If θ and Z(k) are not jointly Gaussian, then θ̂_MS(k) = E{θ|Z(k)}, which, in general, is a nonlinear function of measurements Z(k), i.e., it is a nonlinear estimator.

Associated with mean-squared estimation theory is the orthogonality principle: Suppose f[Z(k)] is any function of the data Z(k); then the error in the mean-squared estimator is orthogonal to f[Z(k)] in the sense that E{[θ − θ̂_MS(k)]f'[Z(k)]} = 0. A frequently encountered special case of this occurs when f[Z(k)] = θ̂_MS(k), in which case E{θ̃_MS(k)θ̂'_MS(k)} = 0.
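A sketch of (15.16)–(15.18) for a made-up jointly Gaussian example, together with a Monte Carlo check of the orthogonality principle (the estimation error is orthogonal to the estimate); the prior, observation matrix, and noise covariance are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(5)

n, N, trials = 2, 25, 4000
m_theta = np.array([1.0, -1.0])
P_theta = np.diag([0.5, 2.0])                 # prior covariance of random theta
H = rng.standard_normal((N, n))               # deterministic observation matrix
R = 0.1 * np.eye(N)                           # measurement-noise covariance

S = H @ P_theta @ H.T + R                     # covariance of Z(k) about H m_theta
P_ms = P_theta - P_theta @ H.T @ np.linalg.solve(S, H @ P_theta)   # cf. eq. (15.17)

cross = np.zeros((n, n))                      # accumulates E{theta_tilde * theta_hat'}
for _ in range(trials):
    theta = rng.multivariate_normal(m_theta, P_theta)
    Z = H @ theta + rng.multivariate_normal(np.zeros(N), R)
    # Conditional-mean (mean-squared) estimate of theta given Z(k), cf. eq. (15.16).
    theta_ms = m_theta + P_theta @ H.T @ np.linalg.solve(S, Z - H @ m_theta)
    cross += np.outer(theta - theta_ms, theta_ms) / trials

print(P_ms)        # error covariance
print(cross)       # ~0: the estimation error is orthogonal to the estimate
```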
When θ and Z(k) are jointly Gaussian, θ̂_MS(k) in (15.15) has the following properties: (1) it is unbiased; (2) each of its components has the smallest error variance; (3) it is a "linear" (affine) estimator; (4) it is unique; and (5) both θ̂_MS(k) and θ̃_MS(k) are multivariate Gaussian, which means that these quantities are completely characterized by their first- and second-order statistics. Tremendous simplifications occur when θ and Z(k) are jointly Gaussian!

Many of the results presented in this section are applicable to objective functions other than the mean-squared objective function. See the supplementary material at the end of Lesson 13 in [12] for discussions on a wide number of objective functions that lead to E{θ|Z(k)} as the optimal estimator of θ, as well as discussions on a full-blown nonlinear estimator of θ.

There is a connection between the BLUE and the MSE. The connection requires a slightly different BLUE, one that incorporates the a priori statistical information about random θ. To do this, we treat m_θ as an additional measurement that is augmented to Z(k). The additional measurement equation is obtained by adding and subtracting θ in the identity m_θ = m_θ, i.e., m_θ = θ + (m_θ − θ). Quantity (m_θ − θ) is now treated as zero-mean measurement noise with covariance P_θ. The augmented linear model is

[Z(k); m_θ] = [H(k); I] θ + [V(k); m_θ − θ]    (15.19)

Let the BLUE estimator for this augmented model be denoted θ̂^a_BLU(k). Then it is always true that θ̂_MS(k) = θ̂^a_BLU(k). Note that the weighted least-squares objective function that is associated with θ̂^a_BLU(k) is J_a[θ̂_a(k)] = [m_θ − θ̂_a(k)]'P_θ^{-1}[m_θ − θ̂_a(k)] + Z̃'(k)R^{-1}(k)Z̃(k).

15.7 Maximum A Posteriori Estimation of Random Parameters

Maximum a posteriori (MAP) estimation is also known as Bayesian estimation. Recall Bayes's rule: p(θ|Z(k)) = p(Z(k)|θ)p(θ)/p(Z(k)), in which density function p(θ|Z(k)) is known as the a posteriori (or posterior) conditional density function, and p(θ) is the prior density function for θ. Observe that p(θ|Z(k)) is related to likelihood function l(θ|Z(k)), because l(θ|Z(k)) ∝ p(Z(k)|θ). Additionally, because p(Z(k)) does not depend on θ, p(θ|Z(k)) ∝ p(Z(k)|θ)p(θ). In MAP estimation, values of θ are found that maximize p(Z(k)|θ)p(θ). Obtaining a MAP estimate involves specifying both p(Z(k)|θ) and p(θ) and finding the value of θ that maximizes p(θ|Z(k)). It is the knowledge of the a priori probability model for θ, p(θ), that distinguishes the problem formulation for MAP estimation from MS estimation.

If θ_1, θ_2, ..., θ_n are uniformly distributed, then p(θ|Z(k)) ∝ p(Z(k)|θ), and the MAP estimator of θ equals the ML estimator of θ. Generally, MAP estimates are quite different from ML estimates. For example, the invariance property of MLEs usually does not carry over to MAP estimates. One reason for this can be seen from the formula p(θ|Z(k)) ∝ p(Z(k)|θ)p(θ). Suppose, for example, that φ = g(θ) and we want to determine φ̂_MAP by first computing θ̂_MAP. Because p(θ) depends on the Jacobian matrix of g^{-1}(φ), φ̂_MAP ≠ g(θ̂_MAP). Usually θ̂_MAP and θ̂_ML(k) are asymptotically identical to one another, since in the large-sample case the knowledge of the observations tends to swamp the knowledge of the prior distribution [10].

Generally speaking, optimization must be used to compute θ̂_MAP(k). In the special but important case when Z(k) and θ are jointly Gaussian, θ̂_MAP(k) = θ̂_MS(k).
This result is true regardless of the nature of the model relating θ to Z(k). Of course, in order to use it, we must first establish that Z(k) and θ are jointly Gaussian. Except for the generic linear model, this is very difficult to do. When H(k) is deterministic, V(k) is white Gaussian noise with known covariance matrix R(k), and θ is multivariate Gaussian with known mean m_θ and covariance P_θ, θ̂_MAP(k) = θ̂^a_BLU(k); hence, for the generic linear Gaussian model, MS, MAP, and BLUE estimates of θ are all the same, i.e., θ̂_MS(k) = θ̂^a_BLU(k) = θ̂_MAP(k).

15.8 The Basic State-Variable Model

In the rest of this chapter we shall describe a variety of mean-squared state estimators for a linear, (possibly) time-varying, discrete-time, dynamical system, which we refer to as the basic state-variable model. This system is characterized by the n × 1 state vector x(k) and the m × 1 measurement vector z(k), and is:

x(k + 1) = Φ(k + 1, k)x(k) + Γ(k + 1, k)w(k) + Ψ(k + 1, k)u(k)    (15.20)

z(k + 1) = H(k + 1)x(k + 1) + v(k + 1)    (15.21)

where k = 0, 1, .... In this model w(k) and v(k) are p × 1 and m × 1 mutually uncorrelated (possibly nonstationary) jointly Gaussian white noise sequences; i.e., E{w(i)w'(j)} = Q(i)δ_ij, E{v(i)v'(j)} = R(i)δ_ij, and E{w(i)v'(j)} = S = 0, for all i and j. Covariance matrix Q(i) is positive semidefinite and R(i) is positive definite [so that R^{-1}(i) exists]. Additionally, u(k) is an l × 1 vector of known system inputs, and initial state vector x(0) is multivariate Gaussian, with mean m_x(0) and covariance P_x(0), and x(0) is not correlated with w(k) and v(k). The dimensions of matrices Φ, Γ, Ψ, H, Q, and R are n × n, n × p, n × l, m × n, p × p, and m × m, respectively. The double arguments in matrices Φ, Γ, and Ψ may not always be necessary, in which case we replace (k + 1, k) by k.

Disturbance w(k) is often used to model disturbance forces acting on the system, errors in modeling the system, or errors due to actuators in the translation of the known input, u(k), into physical signals. Vector v(k) is often used to model errors in measurements made by sensing instruments, or unavoidable disturbances that act directly on the sensors.

Not all systems are described by this basic model. In general, w(k) and v(k) may be correlated, some measurements may be made so accurately that, for all practical purposes, they are "perfect" (i.e., no measurement noise is associated with them), and either w(k) or v(k), or both, may be nonzero mean or colored noise processes. How to handle these situations is described in Lesson 22 of [12].

When x(0) and {w(k), k = 0, 1, ...} are jointly Gaussian, then {x(k), k = 0, 1, ...} is a Gauss-Markov sequence. Note that if x(0) and w(k) are individually Gaussian and statistically independent, they will be jointly Gaussian. Consequently, the mean and covariance of the state vector completely characterize it. Let m_x(k) denote the mean of x(k). For our basic state-variable model, m_x(k) can be computed from the vector recursive equation

m_x(k + 1) = Φ(k + 1, k)m_x(k) + Ψ(k + 1, k)u(k)    (15.22)

where k = 0, 1, ..., and m_x(0) initializes (15.22). Let P_x(k) denote the covariance matrix of x(k). For our basic state-variable model, P_x(k) can be computed from the matrix recursive equation

P_x(k + 1) = Φ(k + 1, k)P_x(k)Φ'(k + 1, k) + Γ(k + 1, k)Q(k)Γ'(k + 1, k)    (15.23)

where k = 0, 1, ..., and P_x(0) initializes (15.23). Equations (15.22) and (15.23) are easily programmed for a digital computer.
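A minimal sketch of (15.22) and (15.23) for a made-up, time-invariant, asymptotically stable system; all matrices and the input sequence are illustrative only.

```python
import numpy as np

# Made-up time-invariant system matrices for the basic state-variable model.
Phi = np.array([[0.9, 0.2],
                [0.0, 0.7]])          # state-transition matrix
Gamma = np.array([[0.0],
                  [1.0]])             # disturbance input matrix
Psi = np.array([[1.0],
                [0.0]])               # known-input matrix
Q = np.array([[0.04]])                # disturbance covariance

m = np.array([1.0, 0.0])              # m_x(0)
P = np.eye(2)                         # P_x(0)

for k in range(100):
    u = np.array([np.sin(0.1 * k)])                   # known input u(k)
    m = Phi @ m + Psi @ u                             # eq. (15.22)
    P = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T         # eq. (15.23)

print(m)
print(P)   # for a stable, time-invariant model P approaches a steady-state value
```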
For our basic state-variable model, when x(0), w(k), and v(k) are jointly Gaussian, then {z(k), k = 1, 2, ...} is Gaussian, and

m_z(k + 1) = H(k + 1)m_x(k + 1)    (15.24)

and

P_z(k + 1) = H(k + 1)P_x(k + 1)H'(k + 1) + R(k + 1)    (15.25)

where m_x(k + 1) and P_x(k + 1) are computed from (15.22) and (15.23), respectively.

For our basic state-variable model to be stationary, it must be time-invariant, and the probability density functions of w(k) and v(k) must be the same for all values of time. Because w(k) and v(k) are zero mean and Gaussian, this means that Q(k) must equal the constant matrix Q and R(k) must equal the constant matrix R. Additionally, either x(0) = 0 or Φ(k, 0)x(0) ≈ 0 when k > k_0; in both cases x(k) will be in its steady-state regime, so stationarity is possible. If the basic state-variable model is time-invariant and stationary, and if Φ is associated with an asymptotically stable system (i.e., one whose poles all lie within the unit circle), then [1] matrix P_x(k) reaches a limiting (steady-state) solution P̄_x, and P̄_x is the solution of the following steady-state version of (15.23): P̄_x = ΦP̄_xΦ' + ΓQΓ'. This equation is called a discrete-time Lyapunov equation.

[...] state equation. The main advantage of the steady-state filter is a drastic reduction in on-line computations.

15.9.3 Smoothing

Although there are three types of smoothers, the most useful one for digital signal processing is the fixed-interval smoother; hence, we only discuss it here. The fixed-interval smoother is x̂(k|N), k = 0, 1, ..., N − 1, where N is a fixed positive integer. The situation here is [...] we wish to obtain the optimal estimate of the state vector x(k), which is based on all the available measurement data {z(j), j = 1, 2, ..., N}. Fixed-interval smoothing is very useful in signal processing situations, where the processing is done after all the data are collected. It cannot be carried out on-line during an experiment like filtering can. Because all the available data are used, we cannot hope to [...]
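Section 15.9 develops the predictor, the Kalman filter, and the fixed-interval smoother for this model. As a bridge to the digital Wiener filtering discussion that follows, here is a minimal sketch of the standard Kalman filter predict/correct recursions for a made-up, time-invariant instance of (15.20)–(15.21), with the known input u(k) taken as zero; this is a generic textbook form under those assumptions, not a reproduction of the chapter's own equations.

```python
import numpy as np

rng = np.random.default_rng(6)

# Made-up time-invariant instance of the basic state-variable model (15.20)-(15.21).
Phi = np.array([[0.95, 0.1],
                [0.0,  0.8]])
Gamma = np.array([[0.0],
                  [1.0]])
H = np.array([[1.0, 0.0]])
Q = np.array([[0.05]])
R = np.array([[0.1]])

N = 200
x = np.zeros(2)                       # true state
x_hat = np.zeros(2)                   # filtered estimate, initialized at m_x(0)
P = np.eye(2)                         # filtered error covariance, initialized at P_x(0)

for k in range(N):
    # Simulate the true system one step ahead and take a noisy measurement.
    x = Phi @ x + Gamma @ rng.multivariate_normal(np.zeros(1), Q)
    z = H @ x + rng.multivariate_normal(np.zeros(1), R)

    # Prediction: one-step-ahead state estimate and its error covariance.
    x_pred = Phi @ x_hat
    P_pred = Phi @ P @ Phi.T + Gamma @ Q @ Gamma.T

    # Correction: the Kalman gain weights the innovation z - H x_pred.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_hat = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(2) - K @ H) @ P_pred

print(x_hat, x)   # filtered estimate vs. true state at the final time
```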
15.10 Digital Wiener Filtering

The steady-state KF is a recursive digital filter with filter coefficients equal to h_f(j), j = 0, 1, .... Quite often h_f(j) ≈ 0 for j ≥ J, so that the transfer function of this filter, H_f(z), can be truncated, i.e., H_f(z) ≈ h_f(0) + h_f(1)z^{-1} + ... + h_f(J)z^{-J}. The truncated steady-state KF can then be implemented as a finite-impulse response (FIR) digital filter [...] mean-squared error filter, i.e., a digital Wiener filter (WF). Consider the scalar measurement case, in which measurement z(k) is to be processed by a digital filter F(z), whose coefficients, f(0), f(1), ..., f(η), are obtained by minimizing the mean-squared error I(f) = E{[d(k) − y(k)]²} = E{e²(k)}, where y(k) = f(k) ∗ z(k) = Σ_{i=0}^{η} f(i)z(k − i) and d(k) is a desired filter output signal. Using calculus, it [...] first assume a signal-plus-noise model for z(k), because a KF uses a system model, i.e., z(k) = s(k) + ν(k) = h(k) ∗ w(k) + ν(k), where h(k) is the IR of a linear time-invariant system and, as in our basic state-variable model, w(k) and ν(k) are mutually uncorrelated (stationary) white noise sequences with variances q and r, respectively. We must also specify an explicit form for the "desired signal" d(k) [...] output of the FIR digital WF to be as close as possible to signal s(k). The resulting Wiener-Hopf equations are

Σ_{i=0}^{η} f(i)[(q/r)φ_hh(j − i) + δ(j − i)] = (q/r)φ_hh(j),   j = 0, 1, ..., η    (15.44)

where φ_hh(i) = Σ_{l=0}^{∞} h(l)h(l + i). The truncated steady-state KF is a FIR digital WF. For a detailed comparison of Kalman and Wiener filters, see ([12], Lesson 19). To obtain a digital Wiener deconvolution [...] + r    (15.46)

The inverse Fourier transform of (15.46), or spectral factorization, gives {f(j), j = 0, ±1, ±2, ...}.

15.11 Linear Prediction in DSP, and Kalman Filtering

A well-studied problem in digital signal processing (e.g., [5]) is the linear prediction problem, in which the structure of the predictor is fixed ahead of time to be a linear transformation of the data. The "forward" linear prediction [...] time-varying digital filters. Predictor ŷ(k) uses a finite window of past measurements: y(k − 1), y(k − 2), ..., y(k − M). This window of measurements is different for different values of t_k. This use of measurements is quite different than our use of the measurements in state prediction, filtering, and smoothing. The latter are based on an expanding memory, whereas the former is based on a fixed memory. Digital signal-processing [...]

References

[...] [12] Mendel, J.M., Lessons in Estimation Theory for Signal Processing, Communications, and Control, Prentice-Hall PTR, Englewood Cliffs, NJ, 1995.

Further Information

Recent articles about estimation theory appear in many journals, including the following engineering journals: AIAA J., Automatica, IEEE Trans. on Aerospace and Electronic Systems, IEEE Trans. on Automatic Control, IEEE Trans. on Information Theory, IEEE Trans. on Signal Processing, Int. J. Adaptive Control and Signal Processing, Int. J. Control, and Signal Processing. Nonengineering journals that also publish articles about estimation theory include: Annals Inst. Statistical Math., Ann. Math. Statistics, Ann. Statistics, Bull. Inst. Internat. [...]
