Class Notes in Statistics and Econometrics, Part 5


CHAPTER 9

Random Matrices

The step from random vectors to random matrices (and higher-order random arrays) is not as big as the step from individual random variables to random vectors. We will first give a few quite trivial verifications that the expected value operator is indeed a linear operator, and then make some not quite as trivial observations about the expected values and higher moments of quadratic forms.

9.1. Linearity of Expected Values

Definition 9.1.1. Let Z be a random matrix with elements zᵢⱼ. Then E[Z] is the matrix with elements E[zᵢⱼ].

Theorem 9.1.2. If A, B, and C are constant matrices, then E[AZB + C] = A E[Z] B + C. Proof by multiplying out.

Theorem 9.1.3. E[Z⊤] = (E[Z])⊤; E[tr Z] = tr E[Z].

Theorem 9.1.4. For partitioned matrices, E[ [X; Y] ] = [ E[X]; E[Y] ], where [X; Y] denotes X stacked on top of Y. Special cases: If C is a constant, then E[C] = C; E[AX + BY] = A E[X] + B E[Y]; and E[a·X + b·Y] = a·E[X] + b·E[Y].

If X and Y are random matrices, then the covariance of these two matrices is a four-way array containing the covariances of all elements of X with all elements of Y. Certain conventions are necessary to arrange this four-way array in a two-dimensional scheme that can be written on a sheet of paper. Before we develop those, we will first define the covariance matrix for two random vectors.

Definition 9.1.5. The covariance matrix of two random vectors is defined as

(9.1.1)  C[x, y] = E[(x − E[x])(y − E[y])⊤].

Theorem 9.1.6. C[x, y] = E[xy⊤] − (E[x])(E[y])⊤.

Theorem 9.1.7. C[Ax + b, Cy + d] = A C[x, y] C⊤.

Problem 152. Prove Theorem 9.1.7.

Theorem 9.1.8. C[ [x; y], [u; v] ] = [ C[x, u]  C[x, v]; C[y, u]  C[y, v] ].

Special case: C[Ax + By, Cu + Dv] = A C[x, u] C⊤ + A C[x, v] D⊤ + B C[y, u] C⊤ + B C[y, v] D⊤. To show this, express each of the arguments as a partitioned matrix, then use Theorem 9.1.7.

Definition 9.1.9. V[x] = C[x, x] is called the dispersion matrix. It follows from Theorem 9.1.8 that

(9.1.2)  V[x] = [ var[x₁]      cov[x₁, x₂]  ⋯  cov[x₁, xₙ] ]
                [ cov[x₂, x₁]  var[x₂]      ⋯  cov[x₂, xₙ] ]
                [ ⋮            ⋮            ⋱  ⋮           ]
                [ cov[xₙ, x₁]  cov[xₙ, x₂]  ⋯  var[xₙ]     ]

Theorem 9.1.10. V[Ax] = A V[x] A⊤. From this follows that V[x] is nonnegative definite (or, as it is also called, positive semidefinite).

Problem 153. Assume y is a random vector, and var[yᵢ] exists for every component yᵢ. Then the whole dispersion matrix V[y] exists.

Theorem 9.1.11. V[x] is singular if and only if a vector a ≠ o exists so that a⊤x is almost surely a constant.

Proof: Call V[x] = Σ. Then Σ is singular iff an a ≠ o exists with Σa = o, iff an a ≠ o exists with a⊤Σa = var[a⊤x] = 0, iff an a ≠ o exists so that a⊤x is almost surely a constant.

This means that singular random vectors have a restricted range: their values are contained in a linear subspace. This has relevance for estimators involving singular random variables: two such estimators (i.e., functions of a singular random variable) should still be considered the same if their values coincide on the subspace in which the values of the random variable are concentrated, even if elsewhere their values differ.
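The transformation rule of Theorem 9.1.10 lends itself to a quick numerical check. The following sketch is not part of the original notes; it assumes NumPy is available, and the particular Sigma, A, and b are arbitrary values chosen only for illustration. It simulates a random vector with a known dispersion matrix and compares the sample dispersion of Ax + b with A V[x] A⊤.

```python
import numpy as np

rng = np.random.default_rng(0)

# A known dispersion matrix V[x] = Sigma (values chosen arbitrarily).
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
L = np.linalg.cholesky(Sigma)
x = rng.standard_normal((200_000, 3)) @ L.T       # rows are draws with dispersion Sigma

# An arbitrary constant matrix A and vector b for the transformation Ax + b.
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  2.0, 1.0]])
b = np.array([3.0, -1.0])
y = x @ A.T + b

# Theorem 9.1.10 (the constant b drops out): V[Ax + b] = A V[x] A'.
print(np.round(np.cov(y, rowvar=False), 2))       # sample dispersion of Ax + b
print(np.round(A @ Sigma @ A.T, 2))               # A Sigma A', should match up to noise
```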
Problem 154. [Seb77, exercise 1a–3 on p. 13] Let x = [x₁, …, xₙ]⊤ be a vector of random variables, and let y₁ = x₁ and yᵢ = xᵢ − xᵢ₋₁ for i = 2, 3, …, n. What must the dispersion matrix V[x] be so that the yᵢ are uncorrelated with each other and each have unit variance?

Answer. cov[xᵢ, xⱼ] = min(i, j). To see this, write y = Ax; for n = 5,

  A = [  1   0   0   0   0 ]        A⁻¹ = [ 1  0  0  0  0 ]
      [ −1   1   0   0   0 ]              [ 1  1  0  0  0 ]
      [  0  −1   1   0   0 ]              [ 1  1  1  0  0 ]
      [  0   0  −1   1   0 ]              [ 1  1  1  1  0 ]
      [  0   0   0  −1   1 ]              [ 1  1  1  1  1 ]

and

  A⁻¹(A⁻¹)⊤ = (A⊤A)⁻¹ = [ 1  1  1  1  1 ]
                        [ 1  2  2  2  2 ]
                        [ 1  2  3  3  3 ]
                        [ 1  2  3  4  4 ]
                        [ 1  2  3  4  5 ]

so V[y] = A V[x] A⊤ = I requires V[x] = A⁻¹(A⁻¹)⊤, whose (i, j) element is min(i, j).

9.2. Means and Variances of Quadratic Forms in Random Matrices

9.2.1. Expected Value of Quadratic Form.

Theorem 9.2.1. Assume E[y] = η, V[y] = σ²Ψ, and A is a matrix of constants. Then

(9.2.1)  E[y⊤Ay] = σ² tr(AΨ) + η⊤Aη.

Proof. Write y as the sum of η and ε = y − η; then

(9.2.2)  y⊤Ay = (ε + η)⊤A(ε + η)
(9.2.3)        = ε⊤Aε + ε⊤Aη + η⊤Aε + η⊤Aη.

η⊤Aη is nonstochastic, and since E[ε] = o it follows that

(9.2.4)  E[y⊤Ay] − η⊤Aη = E[ε⊤Aε]
(9.2.5)                  = E[tr(ε⊤Aε)] = E[tr(Aεε⊤)] = tr(A E[εε⊤])
(9.2.6)                  = σ² tr(AΨ).

Here we used that tr(AB) = tr(BA) and that, if c is a scalar, i.e., a 1 × 1 matrix, then tr(c) = c. □

In tile notation (see Appendix B), the proof of Theorem 9.2.1 is much more straightforward and no longer seems to rely on "tricks." From y ∼ (η, Σ), i.e., we are now writing σ²Ψ = Σ, follows E[yy⊤] = ηη⊤ + Σ, which is equation (9.2.7); attaching A to both sides yields (9.2.8), E[y⊤Ay] = η⊤Aη + tr(ΣA). (In the original, (9.2.7) and (9.2.8) are displayed as tile diagrams.)

Problem 155. [Seb77, Exercise 1b–2 on p. 16] If y₁, y₂, …, yₙ are mutually independent random variables with common mean η, and with variances σ₁², σ₂², …, σₙ², respectively, prove that

(9.2.9)  (1/(n(n − 1))) Σᵢ (yᵢ − ȳ)²

is an unbiased estimator of var[ȳ]. It is recommended to use Theorem 9.2.1 for this.

Answer. Write y = [y₁ y₂ … yₙ]⊤ and Σ = diag(σ₁², σ₂², …, σₙ²). Then the vector [y₁ − ȳ, y₂ − ȳ, …, yₙ − ȳ]⊤ can be written as (I − (1/n)ιι⊤)y. (1/n)ιι⊤ is idempotent, therefore D = I − (1/n)ιι⊤ is idempotent too. Our estimator is (1/(n(n − 1))) y⊤Dy, and since the mean vector of y is ιη and D(ιη) = o, Theorem 9.2.1 gives

(9.2.10)  E[y⊤Dy] = tr[DΣ] = tr[Σ] − (1/n) tr[ιι⊤Σ]
(9.2.11)          = (σ₁² + ⋯ + σₙ²) − (1/n) tr[ι⊤Σι]
(9.2.12)          = ((n − 1)/n)(σ₁² + ⋯ + σₙ²).

Divide this by n(n − 1) to get (σ₁² + ⋯ + σₙ²)/n², which is var[ȳ], as claimed. □
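Both Theorem 9.2.1 and the unbiasedness claim of Problem 155 are easy to verify by simulation. The sketch below is not from the original text; it assumes NumPy, and the particular η, Ψ, σ², and variances are arbitrary choices. It compares the simulated mean of y⊤Ay with σ² tr(AΨ) + η⊤Aη, and the simulated mean of the estimator (9.2.9) with var[ȳ].

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Theorem 9.2.1: E[y'Ay] = sigma^2 tr(A Psi) + eta' A eta
eta = np.array([1.0, -2.0, 0.5, 3.0])
Psi = np.array([[1.0, 0.3, 0.0, 0.0],
                [0.3, 2.0, 0.1, 0.0],
                [0.0, 0.1, 1.5, 0.2],
                [0.0, 0.0, 0.2, 1.0]])
sigma2 = 0.7
A = rng.standard_normal((n, n))                    # A need not be symmetric

L = np.linalg.cholesky(sigma2 * Psi)
y = eta + rng.standard_normal((500_000, n)) @ L.T  # y ~ (eta, sigma^2 Psi)
quad = np.einsum('ij,jk,ik->i', y, A, y)           # y'Ay for every draw
print(quad.mean())                                 # simulated E[y'Ay]
print(sigma2 * np.trace(A @ Psi) + eta @ A @ eta)  # right-hand side of (9.2.1)

# Problem 155: (1/(n(n-1))) sum_i (y_i - ybar)^2 is unbiased for var[ybar]
sigmas2 = np.array([1.0, 4.0, 0.25, 2.0])          # unequal variances
y2 = 5.0 + rng.standard_normal((500_000, n)) * np.sqrt(sigmas2)  # common mean 5
ybar = y2.mean(axis=1)
est = ((y2 - ybar[:, None]) ** 2).sum(axis=1) / (n * (n - 1))
print(est.mean())                                  # simulated mean of the estimator
print(sigmas2.sum() / n**2)                        # var[ybar]
```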
For the variances of quadratic forms we need the third and fourth moments of the underlying random variables.

Problem 156. Let µᵢ = E[(y − E[y])ⁱ] be the ith centered moment of y, and let σ = √µ₂ be its standard deviation. Then the skewness is defined as γ₁ = µ₃/σ³, and the kurtosis is γ₂ = (µ₄/σ⁴) − 3. Show that the skewness and kurtosis of ay + b are equal to those of y if a > 0; for a < 0 the skewness changes its sign. Show that skewness γ₁ and kurtosis γ₂ always satisfy

(9.2.13)  γ₁² ≤ γ₂ + 2.

Answer. Define ε = y − E[y], and apply the Cauchy–Schwarz inequality to the variables ε and ε²:

(9.2.14)  (σ³γ₁)² = (E[ε³])² = (cov[ε, ε²])² ≤ var[ε] var[ε²] = σ⁶(γ₂ + 2). □

Problem 157. Show that any real numbers γ₁ and γ₂ satisfying (9.2.13) can be the skewness and kurtosis of a random variable.

Answer. To show that all combinations satisfying this inequality are possible, define

  r = √(γ₂ + 3 − 3γ₁²/4),  a = r + γ₁/2,  b = r − γ₁/2,

and construct a random variable x which assumes the following three values:

(9.2.15)  x = a   with probability 1/(2ar),
          x = 0   with probability 1 − 1/(γ₂ + 3 − γ₁²),
          x = −b  with probability 1/(2br).

(The two nonzero values together carry probability 1/(ab) = 1/(γ₂ + 3 − γ₁²), so the three probabilities sum to 1.) This variable has expected value zero, variance 1, its third moment is γ₁, and its fourth moment γ₂ + 3. □

Theorem 9.2.2. Given a random vector ε of independent variables εᵢ with zero expected value E[εᵢ] = 0, and whose second and third moments are identical across i. Call var[εᵢ] = σ² and E[εᵢ³] = σ³γ₁ (where σ is the positive square root of σ²). Here γ₁ is called the skewness of these variables. Then the following holds for the third mixed moments:

(9.2.16)  E[εᵢεⱼεₖ] = σ³γ₁ if i = j = k, and 0 otherwise,

and from (9.2.16) follows that for any n × 1 vector a and any symmetric n × n matrix C whose vector of diagonal elements is c,

(9.2.17)  E[(a⊤ε)(ε⊤Cε)] = σ³γ₁ a⊤c.

Proof. If i, j, k are pairwise distinct, then E[εᵢεⱼεₖ] = 0 · 0 · 0 = 0; if i = j ≠ k then E[εᵢεⱼεₖ] = σ² · 0 = 0, and the same holds for i = k ≠ j and j = k ≠ i. Therefore only E[εᵢ³] remains, which proves (9.2.16). Now

(9.2.18)  (a⊤ε)(ε⊤Cε) = Σ_{i,j,k} aᵢ c_{jk} εᵢεⱼεₖ,
(9.2.19)  E[(a⊤ε)(ε⊤Cε)] = σ³γ₁ Σᵢ aᵢ cᵢᵢ = σ³γ₁ a⊤c. □

One would like to have a matrix notation for (9.2.16) from which (9.2.17) follows by a trivial operation. This is not easily possible in the usual notation, but it is possible in tile notation: equation (9.2.20) states that the three-way array of mixed third moments of ε is γ₁σ³ times the three-way diagonal array ∆, and attaching a and C to it gives equation (9.2.21). (Both equations are tile diagrams in the original; see Appendix B.) Since ∆ contracted with C is the vector of diagonal elements of C, called c, the last term in equation (9.2.21) is the scalar product a⊤c.
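The three-point construction of Problem 157 and the identity (9.2.17) can be illustrated numerically. The sketch below is not part of the original notes; it assumes NumPy, it uses the probability 1 − 1/(γ₂ + 3 − γ₁²) for the value 0 as derived above, and the particular γ₁, γ₂, a, and C are arbitrary choices made for illustration. It checks the first four moments of the constructed variable and the identity E[(a⊤ε)(ε⊤Cε)] = σ³γ₁ a⊤c.

```python
import numpy as np

rng = np.random.default_rng(3)

# Problem 157: a three-point variable with prescribed skewness and kurtosis
gamma1, gamma2 = 1.0, 1.5                  # any pair with gamma1**2 <= gamma2 + 2
r = np.sqrt(gamma2 + 3 - 3 * gamma1**2 / 4)
a, b = r + gamma1 / 2, r - gamma1 / 2
values = np.array([a, 0.0, -b])
probs = np.array([1 / (2 * a * r),
                  1 - 1 / (gamma2 + 3 - gamma1**2),
                  1 / (2 * b * r)])
print(probs.sum())                         # should be 1

x = rng.choice(values, size=2_000_000, p=probs)
print(x.mean(), (x**2).mean(), (x**3).mean(), (x**4).mean())
# expected: roughly 0, 1, gamma1, gamma2 + 3

# Theorem 9.2.2: E[(a'eps)(eps' C eps)] = sigma^3 gamma1 a'c   (here sigma = 1)
n = 3
eps = rng.choice(values, size=(2_000_000, n), p=probs)   # independent components
avec = np.array([1.0, -2.0, 0.5])
C = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.4],
              [0.1, 0.4, 3.0]])            # symmetric, diagonal c = (2, 1, 3)
lhs = ((eps @ avec) * np.einsum('ij,jk,ik->i', eps, C, eps)).mean()
print(lhs, gamma1 * avec @ np.diag(C))     # the two should agree up to noise
```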
Given a random vector ε of independent variables εᵢ with zero expected value E[εᵢ] = 0 and identical second and fourth [...]

[...] consisting of linear combinations of the same set of multivariate normal variables is again multivariate normal. For simplicity we will go over now to the bivariate Normal distribution.

10.3. Special Case: Bivariate Normal

The following two simple rules allow to obtain all bivariate Normal random variables: (1) If x and y are independent and each of them has a (univariate) normal distribution with mean 0 and [...] the joint density function in its natural form as the product of a marginal and a conditional density function: f_{u,v}(u, v) = f_u(u) · f_{v|u}(v; u). From this decomposition one can draw the following conclusions:
• u ∼ N(0, σᵤ²) is normal and, by symmetry, v is normal as well. Note that u (or v) can be chosen to be any nonzero linear combination of x and y. Any nonzero linear transformation of independent [...]

[...] dr, the inner integral becomes −(1/(2π)) e⁻ᵗ evaluated from 0 to ∞, which is 1/(2π); therefore the whole integral is 1. Therefore the product of the integrals of the marginal densities is 1, and since each such marginal integral is positive and they are equal, each of the marginal integrals is 1 too.

Problem 161. 6 points. The Gamma function can be defined as Γ(r) = ∫₀^∞ x^(r−1) e⁻ˣ dx. Show that Γ(1/2) = √π. (Hint: after substituting r = 1/2, [...])

[...] has to replace u and v in the above equations by u − µᵤ and v − µᵥ; du and dv remain the same, because the Jacobian of the translation u → u − µᵤ, v → v − µᵥ is 1. While the univariate normal was determined by mean and standard deviation, the bivariate normal is determined by the two means µᵤ and µᵥ, the two standard deviations σᵤ and σᵥ, and the correlation coefficient ρ.

10.3.2. Level Lines of the Normal Density

[...]
• a. 3 points. Compute the probability that a random variable

(10.3.24)  [x; y] ∼ N( [1; 1], [ 0.5  −0.25; −0.25  1 ] )

falls into this ellipse. Hint: you should apply equation (10.4.9). Then you will have to look up the values of a χ² distribution in a table, or use your statistics software to get it.
• b. 1 point. Compute the standard deviations of x and y, and the correlation coefficient corr(x, y).
• c. The vertical [...]

[...] easiest if we use matrix notation. Define

(10.3.26)  x = [x; y],  µ = [1; 1],  Ψ = [ 0.5  −0.25; −0.25  1 ],  a = [a; b].

Equation (10.3.23) in matrix notation says: the ellipse contains all the points for which

(10.3.27)  (x − µ)⊤Ψ⁻¹(x − µ) ≤ 6.

Show that the band defined by inequality (10.3.25) contains all the points for which

(10.3.28)  (a⊤(x − µ))² ≤ 6 a⊤Ψa.

• h. 2 points. Inequality (10.3.28) can also be written as:

(10.3.29)  (x − [...]

[...] lies in the ellipse then it also lies in each band, i.e., the ellipse is contained in the intersection of all the bands.
• k. 1 point. Show: If x − µ = Ψaα with some arbitrary scalar α, then (10.3.32) is an equality, and if α = ±√(6/(a⊤Ψa)), then both sides in (10.3.32) have the value 6. I.e., the boundary of the ellipse and the boundary lines of the band intersect. Since the ellipse is completely inside the band, [...]

[...] density which contains 95% of the probability mass.

  Ψ₁ = [ 0.62 −0.56; −0.56 1.04 ],   Ψ₂ = [ 1.85 1.67; 1.67 3.12 ],
  Ψ₃ = [ 0.62 0.56; 0.56 1.04 ],     Ψ₄ = [ 1.85 −1.67; −1.67 3.12 ],
  Ψ₅ = [ 3.12 −1.67; −1.67 1.85 ],   Ψ₆ = [ 1.04 0.56; 0.56 0.62 ],
  Ψ₇ = [ 3.12 1.67; 1.67 1.85 ],     Ψ₈ = [ 0.62 0.81; 0.81 1.04 ],
  Ψ₉ = [ 3.12 1.67; 2.67 1.85 ],     Ψ₁₀ = [ 0.56 0.62; 0.62 −1.04 ].

Which is it? Remember that for a univariate Normal, 95% of the probability [...]

Answer. [...] elements in the diagonal), therefore cross out 10, and a nonnegative determinant, therefore cross out 8. Covariance must be positive, so cross out 1 and 5. Variance in the x-direction is smaller than in the y-direction, therefore cross out 6 and 7. Remains 2 and 3. Of these it is number 3. By comparison with Figure 1 one can say that the vertical band between 0.4 and 2.6 and the horizontal band [...]
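For parts a and b of the ellipse problem above, the hinted computation can be sketched as follows. This snippet is not part of the original notes; it assumes NumPy and SciPy are available, and it relies on the fact that (x − µ)⊤Ψ⁻¹(x − µ) follows a χ² distribution with 2 degrees of freedom when x ∼ N(µ, Ψ), which is what the reference to equation (10.4.9) appears to invoke.

```python
import numpy as np
from scipy import stats

mu = np.array([1.0, 1.0])
Psi = np.array([[0.5, -0.25],
                [-0.25, 1.0]])

# Part a: (x - mu)' Psi^{-1} (x - mu) is chi-square with 2 degrees of freedom,
# so the probability of the ellipse {(x - mu)' Psi^{-1} (x - mu) <= 6} is
print(stats.chi2.cdf(6, df=2))                 # about 0.9502, i.e. 1 - exp(-3)

# Monte Carlo confirmation
rng = np.random.default_rng(2)
d = rng.multivariate_normal(mu, Psi, size=200_000) - mu
quad = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Psi), d)
print((quad <= 6).mean())

# Part b: standard deviations and correlation coefficient
sd_x, sd_y = np.sqrt(np.diag(Psi))
print(sd_x, sd_y, Psi[0, 1] / (sd_x * sd_y))   # about 0.707, 1.0, -0.354
```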
