Class Notes in Statistics and Econometrics, Part 26
CHAPTER 51

Distinguishing Random Variables from Variables Created by a Deterministic Chaotic Process

Dynamical systems are described either by recursive functions (discrete time) or by differential equations. Recursive functions are difference equations, the discrete analog of differential equations, and with discrete time one can easily get chaotic behavior, e.g., from the tent map or the logistic function. The problem is: how does one distinguish the output of such a process from randomly generated output? The same problem can also happen in the continuous case. First-order differential equations can be visualized as vector fields.

An attractor $A$ is a compact set which has a neighborhood $U$ such that $A$ is the limit set of all trajectories starting in $U$. That means, every trajectory starting in $U$ comes arbitrarily close to each point of the attractor. In $\mathbb{R}^2$, there are three different types of attractors: fixed points, limit cycles, and saddle loops. But in $\mathbb{R}^3$ and higher, chaos can occur, i.e., the trajectory can have a "strange attractor"; the Lorenz attractor is an example. There is no commonly accepted definition of a strange attractor; roughly, it is an attractor that is neither a point nor a closed curve, and trajectories attracted by it take vastly different courses after a short time.

Now fractal dimensions. First the Hausdorff dimension
\[
\lim_{\varepsilon\to 0}\frac{\log N(\varepsilon)}{\log(1/\varepsilon)},
\]
indicating the exponent with which the number of covering pieces $N(\varepsilon)$ increases as the diameter $\varepsilon$ of the pieces diminishes.

Examples with integer dimensions: for a point we have $N(\varepsilon)=1$ always, therefore the dimension is $0$. For a straight line of length $L$, $N(\varepsilon)=L/\varepsilon$, therefore we get $\lim_{\varepsilon\to 0}\log(L/\varepsilon)/\log(1/\varepsilon)=1$, and for an area with surface $S$ it is $\lim_{\varepsilon\to 0}\log(S/\varepsilon^2)/\log(1/\varepsilon)=2$.

A famous example of a set with fractal dimension is the Cantor set: start with the unit interval, take the middle third out, then take the middle third of the two remaining segments out, etc. For $\varepsilon=1/3$ one gets $N(\varepsilon)=2$, for $\varepsilon=1/9$ one gets $N(\varepsilon)=4$, and generally, for $\varepsilon=(1/3)^m$ one gets $N(\varepsilon)=2^m$. Therefore the dimension is
\[
\lim_{m\to\infty}\frac{\log 2^m}{\log 3^m}=\frac{\log 2}{\log 3}\approx 0.63.
\]

A concept related to the Hausdorff dimension is the correlation dimension. To compute this one needs $C(\varepsilon)$, the fraction of the total number of points that are within Euclidean distance $\varepsilon$ of a given point. (This $C(\varepsilon)$ is a quotient of two infinite numbers, but in finite samples it is a quotient of two large but finite numbers; this is why it is more tractable than the Hausdorff dimension.) Example again with the straight line and the area, this time using the sup norm: line: $C(\varepsilon)=2\varepsilon/L$; area: $C(\varepsilon)=4\varepsilon^2/S$. Then the correlation dimension is
\[
\lim_{\varepsilon\to 0}\frac{\log C(\varepsilon)}{\log\varepsilon},
\]
again indicating how this count varies with the distance. To compute it, use $\log C_M(\varepsilon)$, the sample analog of $\log C(\varepsilon)$ for a sample of size $M$, and plot it against $\log\varepsilon$. To get this sample analog, look at all pairs of different points, count those which are less than $\varepsilon$ apart, and divide by the total number of pairs of different points, $M(M-1)/2$. Clearly, if $\varepsilon$ is too small, it falls through between the points, and if it is too large, it extends beyond the boundaries of the set. Therefore one cannot look at the slope at the origin but must look at the slope of a straight line segment near the origin. Another reason for not looking at too small an $\varepsilon$ is that there may be measurement error.
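To make this recipe concrete, here is a minimal Python sketch (mine, not part of the notes): it estimates the correlation dimension of a sample from the Cantor set via the slope of $\log C_M(\varepsilon)$ against $\log\varepsilon$. The construction of the sample from random base-3 digits and the range of $\varepsilon$ used for the fit are illustrative choices.

```python
import numpy as np

def corr_sum(points, eps):
    """Sample correlation integral C_M(eps): fraction of distinct pairs
    of points that are less than eps apart (sup norm)."""
    d = np.abs(points[:, None, :] - points[None, :, :]).max(axis=-1)
    iu = np.triu_indices(len(points), k=1)   # each pair counted once
    return (d[iu] < eps).mean()

# Sample from the Cantor set: random base-3 expansions with digits in {0, 2}
rng = np.random.default_rng(0)
digits = rng.choice([0, 2], size=(2000, 25))
points = (digits * 3.0 ** -np.arange(1, 26)).sum(axis=1)[:, None]

# Slope of log C_M(eps) against log eps over a mid-range of eps
eps_grid = 3.0 ** -np.arange(2, 8)
logC = np.log([corr_sum(points, e) for e in eps_grid])
print(np.polyfit(np.log(eps_grid), logC, 1)[0])   # near log 2 / log 3 = 0.63
```

The estimated slope should come out near $\log 2/\log 3 \approx 0.63$; pushing $\varepsilon$ below the spacing of the sample points, or above the diameter of the set, bends the curve, which is exactly why the fit stays in a mid-range of $\varepsilon$, as the text explains.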
It seems the correlation dimension is close to and cannot exceed the Hausdorff dimension. What one really wants is apparently the Hausdorff dimension, but the correlation dimension is a numerically convenient surrogate.

Importance of fractal dimensions: if an attractor has a fractal dimension, then it is likely to be a strange attractor (although strictly speaking a fractal dimension is neither necessary nor sufficient). E.g., it seems to me the precise Hausdorff dimension of the Lorenz attractor is not known, but its correlation dimension is around 2.05.

51.1. Empirical Methods: Grassberger-Procaccia Plots.

With conventional statistical means, it is hard to distinguish chaotic deterministic from random time series. In a time series generated by a tent map, one obtains for almost all initial conditions a time series whose autocorrelation function is zero for all lags. We need sophisticated results from chaos theory to be able to tell them apart.

Here is the first such result. Assume there is a time series of $n$-dimensional vectors $x_t$ which has followed a deterministic chaotic motion for a long time, so that for all practical purposes it has arrived at its strange attractor, but at every time point $t$ you observe only the $j$th component $x_{jt}$. Then an embedding of dimension $m$ is the artificial dynamical system formed by the $m$-histories of this $j$th component. Takens proved that if $x_t$ lies on a strange attractor and the embedding dimension satisfies $m > 2n-1$, then the embedding is topologically equivalent to the original time series. In particular this means that it has the same correlation dimension. This has important implications: if a time series is part of a deterministic system also including other time series, then one can draw certain conclusions about the attractor without knowing the other time series.

Next point: the correlation dimension of this embedding is
\[
\lim_{\varepsilon\to 0}\frac{\log C(\varepsilon,m)}{\log\varepsilon},
\]
where the embedding dimension $m$ is added as a second argument to the function $C$. If the system is deterministic, the correlation dimension settles to a stationary value as the embedding dimension $m$ increases; for a random system it keeps increasing, and in the i.i.d. case it equals $m$. (In the special case that this i.i.d. distribution is the uniform one, the $m$-histories are uniformly distributed on the $m$-dimensional unit cube, and the claim follows immediately, as in our examples above.)

Therefore the Grassberger-Procaccia plots show, for each $m$, one curve plotting $\log C(\varepsilon,m)$ against $\log\varepsilon$. For small $\varepsilon$, i.e., $\log\varepsilon$ going towards $-\infty$, the plots of the true $C$'s asymptotically become straight lines emanating from the origin, whose slope indicates the dimension. Now one cannot make $\varepsilon$ very small, for two reasons: (1) there are only finitely many data points, and (2) there is also measurement error, whose effect disappears once $\varepsilon$ becomes bigger than a few standard deviations of this measurement error. Therefore one looks at the slope for values of $\varepsilon$ that are not too small.

One method to see whether there is a deterministic structure is to compare this sample correlation dimension with that of "scrambled" data, and to check whether the slopes for the original data stop becoming steeper as $m$ grows while those for the scrambled data still become steeper. Scrambling means: fit an autoregression and then draw the residuals randomly. This is a powerful tool for distinguishing random noise from a deterministic system.
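The following hedged sketch (my construction; the series length and $\varepsilon$-grid are arbitrary) imitates a Grassberger-Procaccia comparison: it computes the slope of $\log C_M(\varepsilon, m)$ against $\log\varepsilon$ for increasing embedding dimension $m$, once for a chaotic series and once for i.i.d. noise. One caveat: the tent map collapses to zero in binary floating-point arithmetic (each iteration discards a mantissa bit), so the sketch uses the logistic map instead, which the notes also name as a chaotic example.

```python
import numpy as np

def m_histories(series, m):
    """Rows are the m-histories (s_t, ..., s_{t+m-1}) of a scalar series."""
    return np.lib.stride_tricks.sliding_window_view(series, m)

def corr_slope(points, eps_grid):
    """Slope of log C_M(eps) against log eps (sup norm, distinct pairs)."""
    d = np.abs(points[:, None, :] - points[None, :, :]).max(axis=-1)
    iu = np.triu_indices(len(points), k=1)
    logC = np.log([(d[iu] < e).mean() for e in eps_grid])
    return np.polyfit(np.log(eps_grid), logC, 1)[0]

rng = np.random.default_rng(1)
T = 1_000
x = np.empty(T)
x[0] = rng.uniform()
for t in range(1, T):                    # chaotic but serially uncorrelated
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])
noise = rng.uniform(size=T)              # i.i.d. benchmark

eps_grid = np.array([0.05, 0.10, 0.20])
for m in (1, 2, 3, 4):
    s_det = corr_slope(m_histories(x, m), eps_grid)
    s_iid = corr_slope(m_histories(noise, m), eps_grid)
    print(m, round(s_det, 2), round(s_iid, 2))
# deterministic slopes settle near 1 as m grows; i.i.d. slopes keep rising with m
```

This mirrors the diagnostic in the text: the deterministic series has a correlation dimension that saturates as $m$ increases, while for the i.i.d. series the estimated slope keeps growing roughly like $m$.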
CHAPTER 52

Instrumental Variables

Compare here [DM93, Chapter 7] and [Gre97, Section 6.7.8]. Greene first introduces the simple instrumental variables estimator and then shows that the generalized one picks out the best linear combinations for forming simple instruments. I will follow [DM93] and first introduce the generalized instrumental variables estimator, and then go down to the simple one.

In this chapter, we will discuss a sequence of models $y_n = X_n\beta + \varepsilon_n$, where $\varepsilon_n \sim (o_n, \sigma^2 I_n)$, the $X_n$ are $n\times k$ matrices of random regressors, and the number of observations $n\to\infty$. We do not make the assumption $\operatorname{plim}\frac{1}{n}X_n^\top\varepsilon_n = o$, which would ensure consistency of the OLS estimator (compare Problem 394). Instead, a sequence of $n\times m$ matrices of (random or nonrandom) "instrumental variables" $W_n$ is available which satisfies the following three conditions:
\[
\operatorname{plim}\tfrac{1}{n}W_n^\top\varepsilon_n = o \tag{52.0.1}
\]
\[
\operatorname{plim}\tfrac{1}{n}W_n^\top W_n = Q \ \text{exists, is nonrandom and nonsingular} \tag{52.0.2}
\]
\[
\operatorname{plim}\tfrac{1}{n}W_n^\top X_n = D \ \text{exists, is nonrandom and has full column rank} \tag{52.0.3}
\]
Full column rank in (52.0.3) is only possible if $m \ge k$.

In this situation, regression of $y$ on $X$ is inconsistent. But if one regresses $y$ on the projection of $X$ on $R[W]$, the column space of $W$, one obtains a consistent estimator. This is called the instrumental variables estimator. If $x_i$ is the $i$th column vector of $X$, then $W(W^\top W)^{-1}W^\top x_i$ is the projection of $x_i$ on the space spanned by the columns of $W$. Therefore the matrix $W(W^\top W)^{-1}W^\top X$ consists of the columns of $X$ projected on $R[W]$; this is what we mean by the projection of $X$ on $R[W]$. With these projections as regressors, the vector of regression coefficients becomes the "generalized instrumental variables estimator"
\[
\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top y. \tag{52.0.4}
\]

Problem 460. 3 points. We are in the model $y = X\beta + \varepsilon$, and we have a matrix $W$ of "instrumental variables" which satisfies the following three conditions: $\operatorname{plim}\frac{1}{n}W^\top\varepsilon = o$; $\operatorname{plim}\frac{1}{n}W^\top W = Q$ exists, is nonrandom and positive definite; $\operatorname{plim}\frac{1}{n}W^\top X = D$ exists, is nonrandom and has full column rank. Show that the instrumental variables estimator
\[
\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top y \tag{52.0.5}
\]
is consistent. Hint: write $\tilde\beta_n - \beta = B_n\cdot\frac{1}{n}W^\top\varepsilon$ and show that the sequence of matrices $B_n$ has a plim.

Answer. Write it as
\[
\tilde\beta_n = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}X^\top W(W^\top W)^{-1}W^\top(X\beta+\varepsilon)
= \beta + \Bigl(\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1}\bigl(\tfrac{1}{n}W^\top X\bigr)\Bigr)^{-1}\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1}\tfrac{1}{n}W^\top\varepsilon,
\]
i.e., the $B_n$ and $B$ of the hint are
\[
B_n = \Bigl(\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1}\bigl(\tfrac{1}{n}W^\top X\bigr)\Bigr)^{-1}\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1},
\qquad
B = \operatorname{plim} B_n = (D^\top Q^{-1}D)^{-1}D^\top Q^{-1}. \qquad\square
\]

Problem 461. Assume $\operatorname{plim}\frac{1}{n}X^\top X$ exists, and $\operatorname{plim}\frac{1}{n}X^\top\varepsilon$ exists. (We only need the existence, not that the first is nonsingular and the second zero.) Show that $\sigma^2$ can be estimated consistently by $s^2 = \frac{1}{n}(y - X\tilde\beta)^\top(y - X\tilde\beta)$.

Answer. $y - X\tilde\beta = X\beta + \varepsilon - X\tilde\beta = \varepsilon - X(\tilde\beta-\beta)$. Therefore
\[
\tfrac{1}{n}(y-X\tilde\beta)^\top(y-X\tilde\beta) = \tfrac{1}{n}\varepsilon^\top\varepsilon - \tfrac{2}{n}\varepsilon^\top X(\tilde\beta-\beta) + (\tilde\beta-\beta)^\top\bigl(\tfrac{1}{n}X^\top X\bigr)(\tilde\beta-\beta).
\]
All summands have plims; the plim of the first is $\sigma^2$ and those of the other two are zero. $\square$
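Before the next problem, a small simulation sketch (my own; the coefficient values and error structure are invented for illustration) of the situation just described: the regressor is correlated with the error term, so OLS is inconsistent, while the generalized instrumental variables estimator (52.0.4) recovers $\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = np.array([1.0, 2.0])                       # true coefficients

w = rng.normal(size=n)                            # instrument: correlated with x, not with eps
eps = rng.normal(size=n)                          # error term
x = 0.8 * w + 0.5 * eps + rng.normal(size=n)      # regressor correlated with eps
X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])
y = X @ beta + eps

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
X_proj = W @ np.linalg.solve(W.T @ W, W.T @ X)    # projection of X on R[W]
b_iv = np.linalg.solve(X_proj.T @ X, X_proj.T @ y)  # equation (52.0.4)
print(b_ols)  # slope noticeably above 2: OLS inconsistent
print(b_iv)   # close to (1, 2)
```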
Problem 462. In the situation of Problem 460, add the stronger assumption $\frac{1}{\sqrt n}W^\top\varepsilon \to N(o, \sigma^2 Q)$, and show that $\sqrt n(\tilde\beta_n - \beta) \to N\bigl(o, \sigma^2(D^\top Q^{-1}D)^{-1}\bigr)$.

Answer. $\tilde\beta_n - \beta = B_n\frac{1}{n}W_n^\top\varepsilon_n$, therefore $\sqrt n(\tilde\beta_n-\beta) = B_n n^{-1/2}W_n^\top\varepsilon_n \to B\,N(o,\sigma^2Q) = N(o, \sigma^2 BQB^\top)$. Since $B = (D^\top Q^{-1}D)^{-1}D^\top Q^{-1}$, we get $BQB^\top = (D^\top Q^{-1}D)^{-1}$, and the result follows. $\square$

[...] estimates $\beta$ by $x^\top y/x^\top x$, and is inconsistent. Let $O$ be the origin, $A$ the point on the $x$-vector where $\varepsilon$ branches off (i.e., the end of $x\beta$), furthermore let $B$ be the point on the $x$-vector where the orthogonal projection of $y$ comes down, and $C$ the end of the $x$-vector. Then $x^\top y = \overline{OC}\cdot\overline{OB}$ and $x^\top x = \overline{OC}^2$, therefore $x^\top y/x^\top x = \overline{OB}/\overline{OC}$, which would be the right $\beta$ if the errors were orthogonal. Now introduce a new variable [...]

[...] is inconsistent. If I allow two instruments, the two instruments must be in the horizontal plane perpendicular to the vector $\varepsilon$, which is assumed still vertical. Here we project $x$ on this horizontal plane and then regress the $y$, which stays where it is, on this projected $x$. In this way the residuals have the right direction!

What if there is one instrument, but it does not lie in the same plane as $x$ and $y$? This is the most general case as long as there is only one regressor and one instrument. This instrument $w$ must lie somewhere in the horizontal plane. We have to project $x$ on it, and then regress $y$ on this projection. Look at it this way: take the plane orthogonal to $w$ which goes through point $C$. The projection of $x$ on $w$ is the intersection of the ray generated [...] parallel until it intersects point $A$. Then the intersection with the $w$-ray is the projection of $y$ on $w$. But this latter plane contains $\varepsilon$, since $\varepsilon$ is orthogonal to $w$. This makes sure that the regression gives the right results.

Problem 463. 4 points. The asymptotic MSE matrix of the instrumental variables estimator with $W$ as matrix of instruments is $\sigma^2\operatorname{plim} n\bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}$. Show that if one adds more instruments, this asymptotic MSE matrix can only become smaller. It is sufficient to show that the inequality holds before going over to the plim, i.e., if $W = \begin{pmatrix}U & V\end{pmatrix}$, then
\[
\bigl(X^\top U(U^\top U)^{-1}U^\top X\bigr)^{-1} - \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} \tag{52.0.7}
\]
is nonnegative definite. Hints: (1) Use Theorem A.5.5 in the Appendix (proof is not required). (2) Note that $U = WG$ for some $G$; can you write this $G$ in partitioned matrix form? (3) Show that, whatever $W$ and $G$, the matrix $W(W^\top W)^{-1}W^\top - WG(G^\top W^\top WG)^{-1}G^\top W^\top$ is idempotent.

Answer.
\[
U = \begin{pmatrix}U & V\end{pmatrix}\begin{pmatrix}I\\O\end{pmatrix} = WG \quad\text{where } G = \begin{pmatrix}I\\O\end{pmatrix}. \tag{52.0.8}
\]

Problem 464. 2 points. Show: if a matrix $D$ has full column rank and is square, then it has an inverse.

Answer. Here you need that column rank equals row rank: if $D$ has full column rank it also has full row rank. And to make the proof complete you need: if $A$ has a left inverse $L$ and a right inverse $R$, then $L$ is the only left inverse, $R$ the only right inverse, and $L = R$. Proof: $L = L(AR) = (LA)R = R$. $\square$

Problem 465. 2 points. If $W^\top X$ is square and has full column rank, then it is nonsingular. Show that in this case (52.0.4) simplifies to the "simple" instrumental variables estimator:
\[
\tilde\beta = (W^\top X)^{-1}W^\top y. \tag{52.0.9}
\]

Answer. In this case the big inverse can be split into three:
\[
\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}X^\top W(W^\top W)^{-1}W^\top y
= (W^\top X)^{-1}(W^\top W)(X^\top W)^{-1}\,X^\top W(W^\top W)^{-1}W^\top y \tag{52.0.10}
\]
\[
= (W^\top X)^{-1}W^\top y. \tag{52.0.11}
\]

Problem 466. We are in the simple regression model with intercept, i.e., $X = \begin{pmatrix}\iota & x\end{pmatrix}$, and we have one instrument $w$ for $x$ (while the constant term is its own instrument), i.e., $W = \begin{pmatrix}\iota & w\end{pmatrix}$. Show that the instrumental variables estimators for slope and intercept are
\[
\tilde\beta = \frac{\sum_t (w_t-\bar w)(y_t-\bar y)}{\sum_t (w_t-\bar w)(x_t-\bar x)} \tag{52.0.12}
\]
\[
\tilde\alpha = \bar y - \tilde\beta\,\bar x. \tag{52.0.13}
\]
Hint: the math is identical to that in question 238.
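As a quick numerical cross-check (my own sketch, with invented data), the simple estimator (52.0.9) applied to $X = \begin{pmatrix}\iota & x\end{pmatrix}$ and $W = \begin{pmatrix}\iota & w\end{pmatrix}$ reproduces the deviation-from-mean formulas (52.0.12) and (52.0.13):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
w = rng.normal(size=n)                    # single instrument for x
x = 1.5 * w + rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])      # X = (iota  x)
W = np.column_stack([np.ones(n), w])      # W = (iota  w)
b_simple = np.linalg.solve(W.T @ X, W.T @ y)          # (52.0.9)

beta_tilde = ((w - w.mean()) * (y - y.mean())).sum() / \
             ((w - w.mean()) * (x - x.mean())).sum()  # (52.0.12)
alpha_tilde = y.mean() - beta_tilde * x.mean()        # (52.0.13)
print(b_simple, (alpha_tilde, beta_tilde))            # identical pairs
```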
Problem 467. 2 points. Show that, if there are as many instruments as there are observations, then the instrumental variables estimator (52.0.4) becomes identical to OLS.

Answer. In this case $W$ has an inverse, therefore the projection on $R[W]$ is the identity. Staying in the algebraic paradigm: $(W^\top W)^{-1} = W^{-1}(W^\top)^{-1}$, so that $W(W^\top W)^{-1}W^\top = I$ and (52.0.4) collapses to $(X^\top X)^{-1}X^\top y$. $\square$

An implication of Problem 467 is that one must be careful not to include too many instruments if one has a small sample. [...] The projected regressors $W(W^\top W)^{-1}W^\top X$ can be considered the predicted values if one regresses every column of $X$ on $W$. I.e., instead of regressing $y$ on $X$ one regresses $y$ on those linear combinations of the columns of $W$ which best approximate the columns of $X$. Here is more detail: the matrix of estimated coefficients in the first regression is $\hat\Pi = (W^\top W)^{-1}W^\top X$, and the predicted values in this regression are $\hat X = W\hat\Pi = W(W^\top W)^{-1}W^\top X$. The second regression, of $y$ on $\hat X$, then yields exactly the generalized instrumental variables estimator (52.0.4), since $\hat X^\top\hat X = \hat X^\top X$.
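To illustrate this two-regression reading, here is a hedged sketch (invented data and parameter values, not code from the notes) that forms $\hat\Pi$ and $\hat X$ explicitly and confirms that the second regression reproduces (52.0.4):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
w1, w2 = rng.normal(size=n), rng.normal(size=n)   # two instruments for one regressor
eps = rng.normal(size=n)
x = w1 + 0.5 * w2 + 0.5 * eps + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])              # k = 2 (constant is its own instrument)
W = np.column_stack([np.ones(n), w1, w2])         # m = 3 >= k
y = X @ np.array([1.0, 2.0]) + eps

Pi_hat = np.linalg.solve(W.T @ W, W.T @ X)        # first regression: Pi_hat = (W'W)^{-1} W'X
X_hat = W @ Pi_hat                                # predicted values X_hat = W Pi_hat
b_second = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)  # second regression: y on X_hat
b_give = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)        # (52.0.4) directly
print(b_second, b_give)                           # numerically identical
```

The equality holds because the projection matrix $W(W^\top W)^{-1}W^\top$ is symmetric and idempotent, so $\hat X^\top\hat X = \hat X^\top X$.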
