Class Notes in Statistics and Econometrics, Part 13
CHAPTER 25

Variance Estimation: Should One Require Unbiasedness?

There is an imperfect analogy between linear estimation of the coefficients and quadratic estimation of the variance in the linear model. This chapter sorts out the principal commonalities and differences, a task obscured by the widespread but unwarranted imposition of the unbiasedness assumption. It is based on an unpublished paper co-authored with Peter Ochshorn.

We will work in the usual regression model

(25.0.12)  y = Xβ + ε,

where y is a vector of n observations, X is nonstochastic with rank r < n, and the disturbance vector ε satisfies E[ε] = o and E[εε′] = σ²I. The nonstochastic vector β and the scalar σ² > 0 are the parameters to be estimated. The usual estimator of σ² is

(25.0.13)  s² = (1/(n−r)) y′My = (1/(n−r)) ∑ᵢ ε̂ᵢ²,

where M = I − X(X′X)⁻X′ and ε̂ = My. If X has full rank, then ε̂ = y − Xβ̂, where β̂ is the least squares estimator of β. Just as β̂ is the best (minimum mean square error) linear unbiased estimator of β, it has been shown in [Ati62], see also [Seb77, pp. 52/3], that under certain additional assumptions, s² is the best unbiased estimator of σ² which can be written in the form y′Ay with a nonnegative definite A. A precise formulation of these additional assumptions will be given below; they are, for instance, satisfied if X = ι, the vector of ones, and the εᵢ are i.i.d. But they are also satisfied for arbitrary X if ε is normally distributed. (In this last case, s² is best in the larger class of all unbiased estimators.)

This suggests an analogy between linear and quadratic estimation which is, however, by no means perfect. The results just cited pose the following puzzles:

• Why is s² not best nonnegative quadratic unbiased for an arbitrary X-matrix whenever the εᵢ are i.i.d. with zero mean? What is the common logic behind the two disparate alternatives, that either restrictions on X or restrictions on the distribution of the εᵢ can make s² optimal?

• It comes as a surprise that, again under the assumption of normality, a very simple modification of s², namely, the Theil–Schweitzer estimator from [TS61]

(25.0.14)  σ̂² = (1/(n−r+2)) y′My = (1/(n−r+2)) ∑ᵢ ε̂ᵢ²,

which is biased, has lower mean square error (MSE) than s².

• It is unclear why it is necessary to require ex ante that A is nonnegative definite. Wouldn't estimators which can yield negative values for σ² be automatically inferior to nonnegative ones?

We will show that these puzzles can be resolved if one replaces the requirement of unbiasedness by that of bounded MSE. (This is particularly satisfying since such a replacement is also called for in the case of linear estimators.) Then puzzle (2) disappears: the Theil–Schweitzer estimator is no longer an oddity but is the best bounded MSE quadratic estimator of σ² when the kurtosis is zero. And puzzle (3) disappears as well: nonnegativity is only necessary because unbiasedness alone does not imply bounded MSE.
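The MSE comparison behind puzzle (2) is easy to reproduce numerically. The following sketch is not part of the original notes; it uses an arbitrary small design matrix and normal disturbances and compares the empirical MSE of s² from (25.0.13) with that of the Theil–Schweitzer estimator (25.0.14).

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, sigma2 = 20, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, r - 1))])  # arbitrary full-rank design
beta = np.array([1.0, -2.0, 0.5])
M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T               # M = I - X(X'X)^- X'

s2_vals, ts_vals = [], []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    ssr = y @ M @ y                            # y'My, the sum of squared residuals
    s2_vals.append(ssr / (n - r))              # unbiased estimator (25.0.13)
    ts_vals.append(ssr / (n - r + 2))          # Theil-Schweitzer estimator (25.0.14)

s2_vals, ts_vals = np.array(s2_vals), np.array(ts_vals)
print("empirical MSE of s^2             :", np.mean((s2_vals - sigma2) ** 2))
print("empirical MSE of Theil-Schweitzer:", np.mean((ts_vals - sigma2) ** 2))
# Under normality the exact values are 2*sigma^4/(n-r) ~ 1.88 and 2*sigma^4/(n-r+2) ~ 1.68,
# so the biased Theil-Schweitzer estimator comes out with the smaller MSE.
```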
Under this approach it becomes evident that there are two important disanalogies between linear and quadratic estimation: whereas the best bounded MSE linear estimator β̂ of β is (a) unbiased and (b) does not depend on the nuisance parameter σ², the best quadratic bounded MSE estimator of σ² is (a) biased and (b) depends on a fourth-order nuisance parameter, the kurtosis of the disturbances. This, again, helps to dispel the false suggestiveness of puzzle (1).

The main assumption is distributional. If the kurtosis is known, then the best nonnegative quadratic unbiased estimator exists. However, it is uninteresting, since the (biased) best bounded MSE quadratic estimator is better. The class of unbiased estimators only becomes interesting when the kurtosis is not known: for certain X-matrices, the best nonnegative quadratic unbiased estimator does not depend on the kurtosis. However, even if the kurtosis is not known, this paper proposes to use as estimate of σ² the maximum value which one gets when one applies the best bounded mean squared error estimator for all possible values of the kurtosis.

25.1. Setting the Framework Straight

The assumption of unbiasedness has often been criticized. Despite its high-sounding name, there are no good reasons that one should confine one's search for good estimators to unbiased ones. Many good estimators are unbiased, but the property of unbiasedness has no bearing on how good an estimator is. In many cases unbiased estimators do not exist or are not desirable. It is indeed surprising that the powerful building of least squares theory seems to rest on such a flimsy assumption as unbiasedness.

G. A. Barnard, in [Bar63], noted this and proposed to replace unbiasedness by bounded MSE, a requirement which can be justified by the researcher following an "insurance strategy": no bad surprises regarding the MSE of the estimator, whatever the value of the true β. Barnard's suggestion has not found entrance into the textbooks, and indeed, since linear estimators in model (25.0.12) are unbiased if and only if they have bounded MSE, it might be considered an academic question.

It is usually not recognized that even in the linear case, the assumption of bounded MSE serves to unify the theory. Christensen's monograph [Chr87] treats, as we do here in chapter 27, best linear prediction on the basis of known first and second moments in parallel with the regression model. Both models have much in common, but there is one result which seems to set them apart: best linear predictors exist in one, but only best linear unbiased predictors in the other [Chr87, p. 226]. If one considers bounded MSE to be one of the basic assumptions, this seeming irregularity is easily explained: if the first and second moments are known, then every linear predictor has bounded MSE, while in the regression model only unbiased linear estimators do.

One might still argue that no real harm is done with the assumption of unbiasedness, because in the linear case, the best bounded MSE estimators or predictors turn out to be unbiased. This last defense of unbiasedness falls if one goes from linear to quadratic estimation. We will show that the best bounded MSE quadratic estimator is biased.
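As an aside, the claim above that linear estimators in model (25.0.12) are unbiased if and only if they have bounded MSE can be illustrated numerically: for a linear estimator g′y of c′β, the MSE is σ²g′g + ((X′g − c)′β)², and any nonzero bias term is unbounded over β. The sketch below is not from the original notes; the design matrix, the target c′β, and the deliberately biased weight vector are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])
c = np.array([0.0, 1.0])                 # target linear combination c'beta (the slope)
sigma2 = 1.0

def exact_mse(g, beta):
    """MSE of the linear estimator g'y of c'beta: variance plus squared bias."""
    bias = (X.T @ g - c) @ beta
    return sigma2 * g @ g + bias ** 2

g_unbiased = X @ np.linalg.solve(X.T @ X, c)   # OLS weights: X'g = c, hence unbiased
g_biased = 0.9 * g_unbiased                    # shrunken weights: X'g = 0.9c, hence biased

for j in (1, 10, 100, 1000):
    beta = j * np.array([0.0, 1.0])            # push beta out along the slope direction
    print(j, exact_mse(g_unbiased, beta), exact_mse(g_biased, beta))
# The unbiased weights give an MSE that does not depend on beta, while the MSE of the
# biased weights grows like j^2: any bias makes the MSE unbounded over beta.
```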
As in the linear case, it is possible to derive these results without fully specifying the distributions involved. In order to compute the MSE of linear estimators, one needs to know the first and second moments of the disturbances, which is reflected in the usual assumption ε ∼ (o, σ²I). For the MSE of quadratic estimators, one also needs information about the third and fourth moments. We will therefore derive optimal quadratic estimators of σ² based on the following assumptions regarding the first four moments, which are satisfied whenever the εᵢ are independently and identically distributed:

Assumption. A vector of n observations y = Xβ + ε is available, where the εᵢ satisfy

(25.1.1)  E[εᵢ] = 0,
(25.1.2)  E[εᵢεⱼ] = σ² > 0 if i = j, and 0 otherwise,
(25.1.3)  E[εᵢεⱼεₖ] = σ³γ₁ if i = j = k, and 0 otherwise,
(25.1.4)  E[εᵢεⱼεₖεₗ] = σ⁴(γ₂ + 3) if i = j = k = l; σ⁴ if i = j ≠ k = l or i = k ≠ j = l or i = l ≠ j = k; and 0 otherwise.

Here γ₁ is the skewness and γ₂ the kurtosis of εᵢ. They are allowed to range within their natural limits

(25.1.5)  0 ≤ γ₁² ≤ γ₂ + 2.

Problem 318. Show that condition (25.1.5), γ₁² ≤ γ₂ + 2, always holds.

Answer.

(25.1.6)  (σ³γ₁)² = (E[ε³])² = (cov[ε, ε²])² ≤ var[ε] var[ε²] = σ⁶(γ₂ + 2).

Dividing by σ⁶ gives (25.1.5).

The concept of bounded MSE which is appropriate here requires the bound to be independent of the true value of β, but it may depend on the "nuisance parameters" σ², γ₁, and γ₂:

Definition 25.1.1. The mean square error E[(θ̂ − θ)²] of the estimator θ̂ of a scalar parameter θ in the linear model (25.0.12) will be said to be bounded (with respect to β) if a finite number b exists with E[(θ̂ − θ)²] ≤ b regardless of the true value of β. This bound b may depend on the known nonstochastic X and the distribution of ε, but not on β.

25.2. Derivation of the Best Bounded MSE Quadratic Estimator of the Variance

Theorem 25.2.1. If the estimator σ̃² of σ² in the regression model (25.0.12) is quadratic, i.e., if it has the form σ̃² = y′Ay with a symmetric A, then its mean square error E[(y′Ay − σ²)²] is bounded (with respect to β) if and only if AX = O.

Proof: Clearly, the condition AX = O is sufficient. It implies y′Ay = ε′Aε, which therefore only depends on the distribution of ε, not on the value of β. To show necessity, note that bounded MSE means both bounded variance and bounded squared bias. The variance depends on skewness and kurtosis; writing a for the vector of diagonal elements of A, it is

(25.2.1)  var[y′Ay] = 4σ²β′X′A²Xβ + 4σ³γ₁β′X′Aa + σ⁴(γ₂a′a + 2 tr(A²)).

This formula can be found e.g. in [Seb77, pp. 14–16 and 52]. If AX ≠ O, then a vector δ exists with δ′X′A²Xδ > 0; therefore, for the sequence β = jδ, the variance is a quadratic polynomial in j, which is unbounded as j → ∞.

The following ingredients are needed for the best bounded MSE quadratic estimator of σ²:

Theorem 25.2.2. We will use the letter τ to denote the vector whose ith component is the square of the ith residual, τᵢ = ε̂ᵢ². Then

(25.2.2)  E[τ] = σ²m,

where m is the diagonal vector of M = I − X(X′X)⁻X′. Furthermore, with Ω = γ₂Q² + 2Q + mm′,

(25.2.3)  V[τ] = σ⁴(Ω − mm′),

where Q is the matrix with qᵢⱼ = mᵢⱼ², i.e., its elements are the squares of the elements of M, and γ₂ is the kurtosis.
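The quantities appearing in Theorem 25.2.2 are easy to compute for a concrete design matrix. The following sketch is not part of the original notes; the design matrix and the kurtosis value γ₂ are arbitrary. It checks the identity m = Qι (used in the proof below) and adds a Monte Carlo check of E[τ] = σ²m.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, r - 1))])
M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T   # M = I - X(X'X)^- X'
m = np.diag(M)                                      # m = diag M
Q = M ** 2                                          # q_ij = m_ij^2, elementwise squares
gamma2 = 1.5                                        # arbitrary kurtosis, for illustration
Omega = gamma2 * Q @ Q + 2 * Q + np.outer(m, m)     # Omega = gamma2*Q^2 + 2Q + mm'

print(np.allclose(Q @ np.ones(n), m))               # m = Q*iota, since sum_j m_ij^2 = m_ii

# Monte Carlo check of E[tau] = sigma^2 * m, using i.i.d. (non-normal) disturbances
sigma2, reps = 2.0, 50000
tau_sum = np.zeros(n)
for _ in range(reps):
    eps = rng.standard_t(df=8, size=n) * np.sqrt(sigma2 * (8 - 2) / 8)  # variance sigma^2
    tau_sum += (M @ eps) ** 2                       # tau_i = eps_hat_i^2 with eps_hat = M eps
print(np.max(np.abs(tau_sum / reps - sigma2 * m)))  # should be close to zero
```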
Here is a proof in tile notation: from (9.2.23) follows (25.2.4). [...]

These m and Ω play an important and, to me, surprising role in the estimator of σ²:

Theorem 25.2.3. The best bounded MSE quadratic estimator of σ² is

(25.2.5)  σ̂² = m′Ω⁻τ,

where m and Ω are defined as in Theorem 25.2.2. Other ways to write it are

(25.2.6)  σ̂² = y′MΛMy = ∑ᵢ λᵢ ε̂ᵢ²,

where λ = Ω⁻m or any other vector satisfying

(25.2.7)  Ωλ = m,

and Λ is the diagonal matrix with the vector λ in its diagonal. [...]
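Theorem 25.2.3 translates directly into a short computation: solve Ωλ = m and form σ̂² = λ′τ. The sketch below is not from the original notes (the design matrix is arbitrary); it also checks that for γ₂ = 0 the weights collapse to 1/(n−r+2), i.e., to the Theil–Schweitzer estimator (25.0.14).

```python
import numpy as np

def best_bounded_mse_sigma2(X, y, gamma2):
    """Quadratic estimator m'Omega^- tau of sigma^2 from Theorem 25.2.3 (sketch)."""
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    m = np.diag(M)
    Q = M ** 2
    Omega = gamma2 * Q @ Q + 2 * Q + np.outer(m, m)
    lam = np.linalg.lstsq(Omega, m, rcond=None)[0]   # a solution of Omega*lambda = m
    tau = (M @ y) ** 2                               # squared residuals
    return lam @ tau, lam

rng = np.random.default_rng(3)
n, r = 15, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, r - 1))])
y = X @ np.ones(r) + rng.normal(size=n)

sigma2_hat, lam = best_bounded_mse_sigma2(X, y, gamma2=0.0)
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
print(sigma2_hat, resid @ resid / (n - r + 2))       # should agree: Theil-Schweitzer for gamma2 = 0
print(np.allclose(lam, 1.0 / (n - r + 2)))           # weights collapse to 1/(n-r+2)
```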
[...] than those familiar from linear estimators. The problem at hand can be reduced to a minimum trace problem as defined in Rao [Rao73, pp. 65–66], and part of the following proof draws on a private communication of C. R. Rao regarding the consistency of equation (1f.3.4) in [Rao73, p. 65].

Proof of Theorem 25.2.3: Take an alternative estimator of the form σ̃² = y′Ay, where A is symmetric with AX = O. Since the MSE is variance plus squared bias, [...] diagonal, and write A = MΛM + D. Then DX = O and D has zeros in the diagonal, therefore tr(MΛMD) = tr(ΛMDM) = tr(ΛD) = 0, since ΛD still has zeros in the diagonal. Therefore tr(A²) = tr(MΛMΛ) + tr(D²) = λ′Qλ + tr(D²). Regarding Q, observe that m = diag M can be written m = Qι, where ι is the vector of ones; therefore tr A = ι′Qλ = m′λ. Using all this in (25.2.10) gives

(25.2.11)  (1/σ⁴) MSE = (m′λ − [...]

25.3. Unbiasedness Revisited

[...] MSE be independent of the value of β. But why should the data analyst be particularly interested in estimates whose MSE is independent of β? The research following up on Hsu tried to get rid of this assumption again. C. R. Rao, in [Rao52], replaced independence of the MSE by the assumption that A be nonnegative definite. We argue that this was unfortunate, for the following two reasons: [...]

[...] the resulting estimator is none other than the unbiased s² defined in (25.0.13). The property of unbiasedness which makes it so popular in the classroom (it is easy to check) gains here objective relevance. For the best nonnegative quadratic unbiased estimator one needs to know Ω only up to a scalar factor, and in some special cases the unknown kurtosis merges into this arbitrary multiplicator.

25.4. Summary

If one replaces the requirement of unbiasedness by that of bounded MSE, one can not only unify some known results in linear estimation and prediction, but one also obtains a far-reaching analogy between linear estimation of β and quadratic estimation of σ². The most important dissimilarity is that, whereas one does not have to know the nuisance parameter σ² in order to write down the best linear bounded MSE estimator of β, the best quadratic bounded [...]

[...] normality for large parts of their output, should therefore provide the Theil–Schweitzer estimate as a matter of course. If the kurtosis is not known, one can always resort to s². It is unbiased and consistent, but does not have any optimality properties in the general case. If the design matrix is "quadratically balanced," s² can be justified better: in this case s² has minimum MSE in the class of nonnegative [...]

CHAPTER 26

[...] scalar, and Ψ is a known positive definite matrix. This is simply the OLS model in disguise. To see this, we need a few more facts about positive definite matrices. Ψ is nonnegative definite if and only if a Q exists with Ψ = QQ′. If Ψ is positive definite, this Q can be chosen square and nonsingular. Then P = Q⁻¹ satisfies P′PΨ = P′PQQ′ = I, i.e., P′P = Ψ⁻¹, and also PΨP′ = PQQ′P′ = I. Premultiplying the GLS [...]

[...] estimate of β in this transformed model is

(26.0.2)  β̂ = (X′P′PX)⁻¹X′P′Py = (X′Ψ⁻¹X)⁻¹X′Ψ⁻¹y.

This β̂ is the BLUE of β in model (26.0.1), and since estimators which are linear in Py are also linear in y and vice versa, β̂ is also the BLUE in the original GLS model.

Problem 319. 2 points. Show that

(26.0.3)  β̂ − β = (X′Ψ⁻¹X)⁻¹X′Ψ⁻¹ε

and derive from this that β̂ is unbiased and that MSE[β̂; [...]

[...] if one relies on the standard computer printouts for significance tests etc. The estimate of σ² generated by this regression is now usually biased. How biased it is depends on the X-matrix, but most often it seems biased upwards. The estimated standard errors in the regression printouts not only use the wrong s, but they also insert this wrong s into the wrong formula σ̂²(X′X)⁻¹ instead of σ̂²(X′Ψ⁻¹X)⁻¹. [...]
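The "OLS model in disguise" argument in the chapter 26 fragment above can be mirrored in a few lines of code. The sketch below is illustrative and not part of the original notes; the AR(1)-type Ψ and all dimensions are arbitrary choices. It factors Ψ = QQ′, premultiplies by P = Q⁻¹, and verifies that OLS on the transformed data reproduces the GLS formula (26.0.2).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
rho = 0.7
Psi = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # AR(1)-type correlation matrix

Q = np.linalg.cholesky(Psi)        # Psi = QQ'
P = np.linalg.inv(Q)               # then P'P = Psi^{-1} and P Psi P' = I

beta = np.array([1.0, 2.0])
y = X @ beta + Q @ rng.normal(size=n)   # disturbances with covariance matrix Psi

# OLS applied to the transformed model Py = PX beta + P eps is the GLS estimator (26.0.2)
Xt, yt = P @ X, P @ y
beta_transformed_ols = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
Psi_inv = np.linalg.inv(Psi)
beta_gls = np.linalg.solve(X.T @ Psi_inv @ X, X.T @ Psi_inv @ y)
print(np.allclose(beta_transformed_ols, beta_gls))   # the two expressions in (26.0.2) agree
```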
