Class Notes in Statistics and Econometrics, Part 12

CHAPTER 23

The Mean Squared Error as an Initial Criterion of Precision

The question how "close" two random variables are to each other is a central concern in statistics. The goal of statistics is to find observed random variables which are "close" to the unobserved parameters or random outcomes of interest. These observed random variables are usually called "estimators" if the unobserved magnitude is nonrandom, and "predictors" if it is random. For scalar random variables we will use the mean squared error as a criterion for closeness. Its definition is MSE[φ̂; φ] (read it: mean squared error of φ̂ as an estimator or predictor, whatever the case may be, of φ):

(23.0.1)  MSE[φ̂; φ] = E[(φ̂ − φ)²]

For our purposes, therefore, the estimator (or predictor) φ̂ of the unknown parameter (or unobserved random variable) φ is no worse than the alternative φ̃ if MSE[φ̂; φ] ≤ MSE[φ̃; φ]. This is a criterion which can be applied before any observations are collected and actual estimations are made; it is an "initial" criterion regarding the expected average performance in a series of future trials (even though, in economics, usually only one trial is made).

23.1. Comparison of Two Vector Estimators

If one wants to compare two vector estimators, say φ̂ and φ̃, it is often impossible to say which of the two is better. It may be the case that φ̂₁ is better than φ̃₁ (in terms of MSE or some other criterion) but φ̂₂ is worse than φ̃₂. And even if every component φᵢ is estimated better by φ̂ᵢ than by φ̃ᵢ, certain linear combinations t'φ of the components of φ may be estimated better by t'φ̃ than by t'φ̂.

Problem 294. 2 points. Construct an example of two vector estimators φ̂ and φ̃ of the same random vector φ = [φ₁, φ₂]', so that MSE[φ̂ᵢ; φᵢ] < MSE[φ̃ᵢ; φᵢ] for i = 1, 2 but MSE[φ̂₁ + φ̂₂; φ₁ + φ₂] > MSE[φ̃₁ + φ̃₂; φ₁ + φ₂]. Hint: it is easiest to use an example in which all random variables are constants. Another hint: the geometric analog would be to find two vectors φ̂ and φ̃ in a plane such that in each component (i.e., projection on the axes) φ̂ is closer to the origin than φ̃, but in the projection on the diagonal φ̃ is closer to the origin than φ̂.

Answer. In the simplest counterexample, all variables involved are constants: φ = [0, 0]', φ̂ = [1, 1]', and φ̃ = [−2, 2]'.

One can only say unambiguously that the vector φ̂ is a no worse estimator than φ̃ if its MSE is smaller or equal for every linear combination. Theorem 23.1.1 will show that this is the case if and only if the MSE-matrix of φ̂ is smaller, by a nonnegative definite matrix, than that of φ̃. If this is so, then Theorem 23.1.1 says that not only the MSE of all linear transformations, but also all other nonnegative definite quadratic loss functions involving these vectors (such as the trace of the MSE-matrix, which is an often-used criterion) are minimized. In order to formulate and prove this, we first need a formal definition of the MSE-matrix. We write MSE (in boldface) for the matrix and MSE for the scalar mean squared error. The MSE-matrix of φ̂ as an estimator of φ is defined as

(23.1.1)  MSE[φ̂; φ] = E[(φ̂ − φ)(φ̂ − φ)']
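As a quick numerical check of the counterexample in Problem 294 (a sketch of my own, not part of the original notes; the NumPy usage and the printed labels are mine), the following snippet computes the componentwise MSEs and the MSE of the sum of the components for the two constant estimators:

```python
import numpy as np

# Constant "estimators" of the constant target phi = (0, 0),
# as in the answer to Problem 294.
phi = np.array([0.0, 0.0])
phi_hat = np.array([1.0, 1.0])     # closer to phi in each component
phi_tilde = np.array([-2.0, 2.0])  # farther from phi in each component

# Componentwise MSE: for constants this is just the squared error.
mse_hat_components = (phi_hat - phi) ** 2      # [1, 1]
mse_tilde_components = (phi_tilde - phi) ** 2  # [4, 4]

# MSE of the sum of the components, i.e. the linear combination t = (1, 1).
mse_hat_sum = (phi_hat.sum() - phi.sum()) ** 2      # (1 + 1)^2 = 4
mse_tilde_sum = (phi_tilde.sum() - phi.sum()) ** 2  # (-2 + 2)^2 = 0

print(mse_hat_components, mse_tilde_components)  # componentwise: phi_hat wins
print(mse_hat_sum, mse_tilde_sum)                # for the sum: phi_tilde wins
```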
Problem 295. 2 points. Let θ be a vector of possibly random parameters, and θ̂ an estimator of θ. Show that

(23.1.2)  MSE[θ̂; θ] = V[θ̂ − θ] + (E[θ̂ − θ])(E[θ̂ − θ])'

Do not assume the scalar result but give a proof that is valid for vectors and scalars alike.

Answer. For any random vector x one has

E[xx'] = E[(x − E[x] + E[x])(x − E[x] + E[x])']
= E[(x − E[x])(x − E[x])'] + E[x − E[x]] E[x]' + E[x] E[x − E[x]]' + E[x] E[x]'
= V[x] + O + O + E[x] E[x]'.

Setting x = θ̂ − θ, the statement follows. □

If θ is nonrandom, formula (23.1.2) simplifies slightly, since in this case V[θ̂ − θ] = V[θ̂]. In this case, the MSE-matrix is the covariance matrix plus the squared bias matrix. If θ is nonrandom and in addition θ̂ is unbiased, then the MSE-matrix coincides with the covariance matrix.

Theorem 23.1.1. Assume φ̂ and φ̃ are two estimators of the parameter φ (which is allowed to be random itself). Then conditions (23.1.3), (23.1.4), and (23.1.5) are equivalent:

(23.1.3)  For every constant vector t, MSE[t'φ̂; t'φ] ≤ MSE[t'φ̃; t'φ].
(23.1.4)  MSE[φ̃; φ] − MSE[φ̂; φ] is a nonnegative definite matrix.
(23.1.5)  For every nonnegative definite Θ, E[(φ̂ − φ)'Θ(φ̂ − φ)] ≤ E[(φ̃ − φ)'Θ(φ̃ − φ)].

Proof. Call MSE[φ̃; φ] = σ²Ξ and MSE[φ̂; φ] = σ²Ω. To show that (23.1.3) implies (23.1.4), simply note that MSE[t'φ̂; t'φ] = σ²t'Ωt and likewise MSE[t'φ̃; t'φ] = σ²t'Ξt. Therefore (23.1.3) is equivalent to t'(Ξ − Ω)t ≥ 0 for all t, which is the defining property making Ξ − Ω nonnegative definite.

Here is the proof that (23.1.4) implies (23.1.5):

E[(φ̂ − φ)'Θ(φ̂ − φ)] = E[tr((φ̂ − φ)'Θ(φ̂ − φ))] = E[tr(Θ(φ̂ − φ)(φ̂ − φ)')]
= tr(Θ E[(φ̂ − φ)(φ̂ − φ)']) = σ² tr(ΘΩ),

and in the same way E[(φ̃ − φ)'Θ(φ̃ − φ)] = σ² tr(ΘΞ). The difference of the expected quadratic forms is therefore σ² tr(Θ(Ξ − Ω)). By assumption, Ξ − Ω is nonnegative definite. Therefore, by Theorem A.5.6 in the Mathematical Appendix, or by Problem 296 below, this trace is nonnegative.

To complete the proof, note that (23.1.5) has (23.1.3) as a special case if one sets Θ = tt'. □

Problem 296. Show that if Θ and Σ are symmetric and nonnegative definite, then tr(ΘΣ) ≥ 0. You are allowed to use that tr(AB) = tr(BA), that the trace of a nonnegative definite matrix is ≥ 0, and Problem 129 (which is trivial).

Answer. Write Θ = RR'; then tr(ΘΣ) = tr(RR'Σ) = tr(R'ΣR) ≥ 0. □
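To make the equivalence in Theorem 23.1.1 concrete, here is a small numerical sketch (my own illustration, not from the notes; the particular matrices and the random draw of test vectors are arbitrary choices). It builds two MSE-matrices whose difference is nonnegative definite and confirms that every linear combination, as well as the trace criterion, then favors the same estimator:

```python
import numpy as np

# Two MSE-matrices with Xi - Omega nonnegative definite (condition 23.1.4).
omega = np.array([[1.0, 0.5],
                  [0.5, 1.0]])          # MSE-matrix of phi_hat
xi = omega + np.array([[1.0, 0.0],
                       [0.0, 2.0]])     # MSE-matrix of phi_tilde

diff = xi - omega
print(np.all(np.linalg.eigvalsh(diff) >= -1e-12))  # True: difference is nnd

# Condition (23.1.3): every linear combination t'phi is estimated at least
# as well by phi_hat, i.e. t' Omega t <= t' Xi t for all t.
rng = np.random.default_rng(0)
ts = rng.normal(size=(1000, 2))
print(np.all(np.einsum('ij,jk,ik->i', ts, omega, ts)
             <= np.einsum('ij,jk,ik->i', ts, xi, ts) + 1e-12))  # True

# Condition (23.1.5) with the trace loss (Theta = I): tr(Omega) <= tr(Xi).
print(np.trace(omega) <= np.trace(xi))  # True
```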
Problem 297. Consider two very simple-minded estimators of the unknown nonrandom parameter vector φ = [φ₁, φ₂]'. Neither of these estimators depends on any observations; they are constants. The first estimator is φ̂ = [11, 11]', and the second is φ̃ = [12, 8]'.

• a. 2 points. Compute the MSE-matrices of these two estimators if the true value of the parameter vector is φ = [10, 10]'. For which estimator is the trace of the MSE-matrix smaller?

Answer. φ̂ has the smaller trace of the MSE-matrix. Since φ̂ − φ = [1, 1]',

MSE[φ̂; φ] = E[(φ̂ − φ)(φ̂ − φ)'] = [1 1; 1 1],

and since φ̃ − φ = [2, −2]',

MSE[φ̃; φ] = [4 −4; −4 4].

Note that both MSE-matrices are singular, i.e., both estimators allow an error-free look at certain linear combinations of the parameter vector. □

• b. 1 point. Give two vectors g = [g₁, g₂]' and h = [h₁, h₂]' satisfying MSE[g'φ̂; g'φ] < MSE[g'φ̃; g'φ] and MSE[h'φ̂; h'φ] > MSE[h'φ̃; h'φ] (g and h are not unique; there are many possibilities).

Answer. With g = [1, −1]' and h = [1, 1]', for instance, we get g'φ̂ − g'φ = 0, g'φ̃ − g'φ = 4, h'φ̂ − h'φ = 2, and h'φ̃ − h'φ = 0; therefore MSE[g'φ̂; g'φ] = 0, MSE[g'φ̃; g'φ] = 16, MSE[h'φ̂; h'φ] = 4, and MSE[h'φ̃; h'φ] = 0. An alternative way to compute this is, e.g., MSE[g'φ̃; g'φ] = g' MSE[φ̃; φ] g = [1 −1] [4 −4; −4 4] [1; −1] = 16. □

• c. 1 point. Show that neither MSE[φ̂; φ] − MSE[φ̃; φ] nor MSE[φ̃; φ] − MSE[φ̂; φ] is a nonnegative definite matrix. Hint: you are allowed to use the mathematical fact that if a matrix is nonnegative definite, then its determinant is nonnegative.

Answer.

(23.1.6)  MSE[φ̃; φ] − MSE[φ̂; φ] = [3 −5; −5 3]

Its determinant is negative, and the determinant of its negative is also negative. □

CHAPTER 24

Sampling Properties of the Least Squares Estimator

The estimator β̂ was derived from a geometric argument, and everything we have shown so far concerns what [DM93, p. 3] calls its numerical, as opposed to its statistical, properties. But β̂ also has nice statistical or sampling properties. We are assuming right now the specification given in (18.1.3), in which X is an arbitrary matrix of full column rank, and we are not assuming that the errors must be Normally distributed. The assumption that X is nonrandom means that repeated samples are taken with the same X-matrix. This is often true for experimental data, but not in econometrics. The sampling properties we are really interested in are those where the X-matrix is random as well; we will derive those later. For that later derivation, the properties with fixed X-matrix, which we are going to discuss presently, will be needed as an intermediate step. The assumption of fixed X is therefore a preliminary technical assumption, to be dropped later.

In order to know how good the estimator β̂ is, one needs the statistical properties of its "sampling error" β̂ − β. This sampling error has the following formula:

(24.0.7)  β̂ − β = (X'X)⁻¹X'y − (X'X)⁻¹X'Xβ = (X'X)⁻¹X'(y − Xβ) = (X'X)⁻¹X'ε

From (24.0.7) it follows immediately that β̂ is unbiased, since E[(X'X)⁻¹X'ε] = o. Unbiasedness does not by itself make an estimator better, but many good estimators are unbiased, and it simplifies the math. We will use the MSE-matrix as a criterion for how good an estimator of a vector of unobserved parameters is. Chapter 23 gave some reasons why this is a sensible criterion (compare [DM93, Chapter 5.5]).
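The unbiasedness claim E[(X'X)⁻¹X'ε] = o is easy to verify by simulation. The sketch below is my own illustration (the design matrix, true β, and error scale are arbitrary choices, and Normal errors are used only for convenience, since the result does not require Normality). It draws repeated samples with a fixed X-matrix and compares the average of β̂ with the true β, and the sampling covariance with σ²(X'X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # fixed design matrix
beta = np.array([2.0, -1.0, 0.5])
sigma = 1.5

replications = 20_000
beta_hats = np.empty((replications, k))
for r in range(replications):
    eps = rng.normal(scale=sigma, size=n)               # errors with mean zero
    y = X @ beta + eps
    beta_hats[r] = np.linalg.solve(X.T @ X, X.T @ y)    # OLS: (X'X)^{-1} X'y

# Average sampling error is close to zero, illustrating E[beta_hat] = beta.
print(beta_hats.mean(axis=0) - beta)

# The simulated covariance matrix is close to sigma^2 (X'X)^{-1}.
print(np.cov(beta_hats.T) - sigma**2 * np.linalg.inv(X.T @ X))
```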
[...] Now look at this left-hand side; it is amazing and surprising that it is exactly the population equivalent of the F-test for testing α = 0 in the regression with intercept. It can be estimated by replacing α² with α̂² and σ² with s² (in the regression with intercept). Let us look at this statistic. If α = 0 it has an F-distribution with 1 and n − 2 degrees of freedom [...]

[...] the worst that can happen if one uses φ̂ [...] the following:

Theorem 24.2.2. β̂ is a linear minimax estimator of the parameter vector β in the following sense: for every nonrandom coefficient vector t, t'β̂ is the linear minimax estimator of the scalar φ = t'β with respect to the MSE. I.e., for every other linear estimator φ̃ = a'y of φ one can find a value [...]

[...] about Minimax Estimators. Theorem 24.1.1 is a somewhat puzzling property of the least squares estimator, since there is no reason in the world to restrict one's search for good estimators to unbiased estimators. An alternative and more enlightening characterization of β̂ does not use the concept of unbiasedness but that of a minimax estimator with respect to the MSE. For this I am proposing the following [...]

[...] where n is the number of observations. Hint: That the regression has an intercept term as first column of the X-matrix means that Xe⁽¹⁾ = ι, where e⁽¹⁾ is the unit vector having 1 in the first place and zeros elsewhere, and ι is the vector which has ones everywhere.

Answer. Write both ȳ and β̂ in terms of y, i.e., ȳ = (1/n)ι'y and β̂ = (X'X)⁻¹X'y. Therefore (24.1.3) [...]

[...] not give terribly good estimators (in fact, they are inadmissible; see the section about the Stein rule below). β̂ is therefore simultaneously the solution to two very different minimization problems. We will refer to it as the OLS estimate if we refer to its property of minimizing the sum of squared errors, and as the BLUE estimator if we think of it as the best linear unbiased estimator. Note that even [...]

[...] Linear Unbiased Estimator) of β in the following vector sense: for every nonrandom coefficient vector t, t'β̂ is the scalar BLUE of t'β, i.e., every other linear unbiased estimator φ̃ = a'y of φ = t'β has a bigger MSE than t'β̂.

Proof. Write the alternative linear estimator φ̃ = a'y in the form

(24.1.4)  φ̃ = (t'(X'X)⁻¹X' + c')y;

then the sampling [...]

[...] distribution, and the only thing we needed to know so far was that it was likely to assume larger values than with α = 0. This is why a small value of that statistic supported the hypothesis that α = 0. But in the present case we are not testing whether α = 0 but whether the constrained MSE is better than the unconstrained. This is the case if the above inequality holds, the limiting case being that it [...]

[...] µ²/2. (A noncentral F has a noncentral χ² in the numerator and a central one in the denominator.) The testing principle is therefore: compare the observed value with the upper α point of an F-distribution with noncentrality parameter 1/2. This gives higher critical values than testing for α = 0; i.e., one may reject that α = 0 but not reject that the MSE of the constrained estimator is larger. This is as it should [...]

[...] Since Σ(xᵢ − x̄)xᵢ = Σ(xᵢ − x̄)²,

β̂ = β + Σ(xᵢ − x̄)εᵢ / Σ(xᵢ − x̄)²,

and therefore

E[β̂] = β + E[Σ(xᵢ − x̄)εᵢ / Σ(xᵢ − x̄)²] = β + Σ(xᵢ − x̄) E[εᵢ] / Σ(xᵢ − x̄)² = β,

since E[εᵢ] = 0 for all i, i.e., β̂ is unbiased.

• b. 2 points. Derive the variance of β̂ if (24.3.5) is the true model.

Answer. One can again argue it: since the formula for var[β̂] [...]
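As a numerical complement to the derivation above, here is a simulation sketch of my own. Since equation (24.3.5) is not visible in this excerpt, the data-generating process yᵢ = α + βxᵢ + εᵢ with E[εᵢ] = 0 and var[εᵢ] = σ² is an assumption, as are all the numerical values; the snippet checks the unbiasedness of the slope estimator β̂ = Σ(xᵢ − x̄)yᵢ / Σ(xᵢ − x̄)² and compares its simulated variance with the familiar formula σ² / Σ(xᵢ − x̄)²:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed data-generating process; (24.3.5) itself is not shown in this excerpt.
alpha, beta, sigma = 1.0, 0.7, 2.0
x = rng.uniform(0.0, 10.0, size=40)        # fixed regressor values
xc = x - x.mean()                          # (x_i - xbar)

replications = 20_000
slopes = np.empty(replications)
for r in range(replications):
    eps = rng.normal(scale=sigma, size=x.size)
    y = alpha + beta * x + eps
    slopes[r] = (xc @ y) / (xc @ xc)       # beta_hat = sum (x_i - xbar) y_i / sum (x_i - xbar)^2

print(slopes.mean())                        # close to beta: the estimator is unbiased
print(slopes.var(), sigma**2 / (xc @ xc))   # simulated variance vs sigma^2 / sum (x_i - xbar)^2
```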
Hint: Write A = I − X(X X)−1 X + C What is the ε ε minimum value of this MSE-matrix? ε ε Answer Since AX = O, the prediction . or random outcomes of interest. These ob- served random variables are usually called “estimators” if the unobserved magnitude is nonrandom, and “predictors” if it is random. For scalar random. φ 1 + φ 2 ]. Hint: it is easiest to use an example in which all random variables are constants. Another hint: the geometric analog would be to find two vectors in a plane ˆ φ and ˜ φ. In each component. singular, i.e., both estimators allow an error-free look at cer tain linear combinations of the parameter vector.  • b. 1 point Give two vectors g = [ g 1 g 2 ] and h =  h 1 h 2  satisfying
