Chapter 5
Confidence Intervals

Copyright © 1999, Russell Davidson and James G. MacKinnon

5.1 Introduction

Hypothesis testing, which we discussed in the previous chapter, is the foundation for all inference in classical econometrics. It can be used to find out whether restrictions imposed by economic theory are compatible with the data, and whether various aspects of the specification of a model appear to be correct. However, once we are confident that a model is correctly specified and incorporates whatever restrictions are appropriate, we often want to make inferences about the values of some of the parameters that appear in the model. Although this can be done by performing a battery of hypothesis tests, it is usually more convenient to construct confidence intervals for the individual parameters of specific interest. A less frequently used, but sometimes more informative, approach is to construct confidence regions for two or more parameters jointly.

In order to construct a confidence interval, we need a suitable family of tests for a set of point null hypotheses. A different test statistic must be calculated for each different null hypothesis that we consider, but usually there is just one type of statistic that can be used to test all the different null hypotheses. For instance, if we wish to test the hypothesis that a scalar parameter θ in a regression model equals 0, we can use a t test. But we can also use a t test for the hypothesis that θ = θ₀ for any specified real number θ₀. Thus, in this case, we have a family of t statistics indexed by θ₀.

Given a family of tests capable of testing a set of hypotheses about a (scalar) parameter θ of a model, all with the same level α, we can use them to construct a confidence interval for the parameter. By definition, a confidence interval is an interval of the real line that contains all values θ₀ for which the hypothesis that θ = θ₀ is not rejected by the appropriate test in the family. For level α, a confidence interval so obtained is said to be a 1 − α confidence interval, or to be at confidence level 1 − α. In applied work, .95 confidence intervals are particularly popular, followed by .99 and .90 ones.

Unlike the parameters we are trying to make inferences about, confidence intervals are random. Every different sample that we draw from the same DGP will yield a different confidence interval. The probability that the random interval will include, or cover, the true value of the parameter is called the coverage probability, or just the coverage, of the interval. Suppose that all the tests in the family have exactly level α, that is, they reject their corresponding null hypotheses with probability exactly equal to α when the hypothesis is true. Then the coverage of the interval constructed from this family of tests will be precisely 1 − α.

Confidence intervals may be either exact or approximate. When the exact distribution of the test statistics used to construct a confidence interval is known, the coverage will be equal to the confidence level, and the interval will be exact. Otherwise, we have to be content with approximate confidence intervals, which may be based either on asymptotic theory or on the bootstrap. In the next section, we discuss both exact confidence intervals and approximate ones based on asymptotic theory. Then, in Section 5.3, we discuss bootstrap confidence intervals.
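To see what coverage means in practice, here is a small simulation sketch. It is our illustration, not from the text: hypothetical samples are drawn from a known normal DGP, a .95 interval for the mean is built from each, and the fraction of intervals covering the true value is recorded.

```python
# A simulation (not from the text) illustrating coverage: across repeated
# samples from the same DGP, a .95 interval for the mean should cover the
# true parameter value about 95% of the time.
import numpy as np

rng = np.random.default_rng(0)
true_theta, n, n_reps = 2.0, 50, 10_000

covered = 0
for _ in range(n_reps):
    y = true_theta + rng.standard_normal(n)      # hypothetical sample
    theta_hat = y.mean()
    s_theta = y.std(ddof=1) / np.sqrt(n)         # standard error of the mean
    lower = theta_hat - 1.96 * s_theta
    upper = theta_hat + 1.96 * s_theta
    covered += lower <= true_theta <= upper

# Prints a number close to 0.950, slightly below it because 1.96 is the
# asymptotic N(0,1) critical value rather than the exact t(49) one.
print(f"empirical coverage: {covered / n_reps:.3f}")
```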
Like a confidence interval, a 1 − α confidence region for a set of k model parameters, such as the components of a k-vector θ, is a region in a k-dimensional space (often, the region is the k-dimensional analog of an ellipse) constructed in such a way that, for every point represented by the k-vector θ₀ in the confidence region, the joint hypothesis that θ = θ₀ is not rejected by the appropriate member of a family of tests at level α. Thus confidence regions constructed in this way will cover the true values of the parameter vector 100(1 − α)% of the time, either exactly or approximately. In Section 5.4, we show how to construct confidence regions and explain the relationship between confidence regions and confidence intervals.

In previous chapters, we assumed that the error terms in regression models are independently and identically distributed. This assumption yielded a simple form for the covariance matrix of a vector of OLS parameter estimates, expression (3.28), and a simple way of estimating this matrix. In Section 5.5, we show that it is possible to estimate the covariance matrix of a vector of OLS estimates even when we abandon the assumption that the error terms are identically distributed. Finally, in Section 5.6, we discuss a simple and widely used method for obtaining standard errors, covariance matrix estimates, and confidence intervals for nonlinear functions of estimated parameters.

5.2 Exact and Asymptotic Confidence Intervals

A confidence interval for some scalar parameter θ consists of all values θ₀ for which the hypothesis θ = θ₀ cannot be rejected at some specified level α. Thus, as we will see in a moment, we can construct a confidence interval by "inverting" a test statistic. If the finite-sample distribution of the test statistic is known, we will obtain an exact confidence interval. If, as is more commonly the case, only the asymptotic distribution of the test statistic is known, we will obtain an asymptotic confidence interval, which may or may not be reasonably accurate in finite samples. Whenever a test statistic based on asymptotic theory has poor finite-sample properties, a confidence interval based on that statistic will have poor coverage: In other words, the interval will not cover the true parameter value with the specified probability. In such cases, it may well be worthwhile to seek other test statistics that will yield different confidence intervals with better coverage.

To begin with, suppose that we wish to base a confidence interval for the parameter θ on a family of test statistics that have a distribution or asymptotic distribution like the χ² or the F distribution under their respective nulls. Statistics of this type are always positive, and tests based on them reject their null hypotheses when the statistics are sufficiently large. Such tests are often equivalent to two-tailed tests based on statistics distributed as standard normal or Student's t. Let us denote the test statistic for the hypothesis that θ = θ₀ by the random variable τ(y, θ₀). Here y denotes the sample used to compute the particular realization of the statistic. It is the random element in the statistic, since τ(·) is just a deterministic function of its arguments. For each θ₀, the test consists of comparing the realized τ(y, θ₀) with the level α critical value of the distribution of the statistic under the null.
If we write the critical value as c_α, then, for any θ₀, we have by the definition of c_α that

\[
\Pr_{\theta_0}\bigl(\tau(y, \theta_0) \le c_\alpha\bigr) = 1 - \alpha. \qquad (5.01)
\]

Here the subscript θ₀ indicates that the probability is calculated under the hypothesis that θ = θ₀. If c_α is a critical value for the asymptotic distribution of τ(y, θ₀), rather than for the exact distribution, then (5.01) is only approximately true. For θ₀ to belong to the confidence interval obtained by inverting the family of test statistics τ(y, θ₀), it is necessary and sufficient that

\[
\tau(y, \theta_0) \le c_\alpha. \qquad (5.02)
\]

Thus the limits of the confidence interval can be found by solving the equation

\[
\tau(y, \theta) = c_\alpha \qquad (5.03)
\]

for θ. This equation will normally have two solutions. One of these solutions will be the upper limit, θ_u, and the other will be the lower limit, θ_l, of the confidence interval that we are trying to construct.

If c_α is an exact critical value for the test statistic τ(y, θ) at level α, then the confidence interval [θ_l, θ_u] constructed in this way will have coverage 1 − α, as desired. To see this, observe first that, if we can find an exact critical value c_α, the random function τ(y, θ₀) must be pivotal for the model M under consideration. In saying this, we are implicitly generalizing the definition of a pivotal quantity (see Section 4.6) to include random variables that may depend on the model parameters. A random function τ(y, θ) is said to be pivotal for M if, when it is evaluated at the true value θ₀ corresponding to some DGP in M, the result is a random variable whose distribution does not depend on what that DGP is. Pivotal functions of more than one model parameter are defined in exactly the same way. The function is merely asymptotically pivotal if only the asymptotic distribution is invariant to the choice of DGP.

Suppose that τ(y, θ₀) is an exact pivot. Then, for every DGP in the model M, (5.01) holds exactly. Since θ₀ belongs to the confidence interval if and only if (5.02) holds, this means that the confidence interval contains the true parameter value θ₀ with probability exactly equal to 1 − α, whatever the true parameter value may be. Even if it is not an exact pivot, the function τ(y, θ₀) must be asymptotically pivotal, since otherwise the critical value c_α would depend asymptotically on the unknown DGP in M, and we could not construct a confidence interval with the correct coverage, even asymptotically. Of course, if c_α is only approximate, then the coverage of the interval will differ from 1 − α to a greater or lesser extent, in a manner that, in general, depends on the unknown true DGP.

Quantiles

When we speak of critical values, we are implicitly making use of the concept of a quantile of the distribution that the test statistic follows under the null hypothesis. If F(x) denotes the CDF of a random variable X, and if the PDF f(x) ≡ F′(x) exists and is strictly positive on the entire range of possible values for X, then q_α, the α quantile of F, for 0 ≤ α ≤ 1, satisfies the equation F(q_α) = α. The assumption of a strictly positive PDF means that F is strictly increasing over its range. Therefore, the inverse function F⁻¹ exists, and q_α = F⁻¹(α). For this reason, F⁻¹ is sometimes called the quantile function.
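As a concrete illustration (ours, not from the text), the quantile function of the standard normal distribution is available in most statistics libraries as the inverse CDF; scipy calls it ppf:

```python
# The quantile function q_alpha = F^{-1}(alpha) of the N(0, 1)
# distribution, computed as the inverse of the CDF.
from scipy.stats import norm

for alpha in (0.025, 0.10, 0.25, 0.50, 0.75, 0.90, 0.975):
    q = norm.ppf(alpha)                        # q_alpha = F^{-1}(alpha)
    print(f"q_{alpha:<5} = {q:8.4f}")          # e.g. q_0.025 = -1.9600
    assert abs(norm.cdf(q) - alpha) < 1e-12    # check F(q_alpha) = alpha
```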
If F is not strictly increasing, or if the PDF does not exist, which, as we saw in Section 1.2, is the case for a discrete distribution, the α quantile does not necessarily exist, and is not necessarily uniquely defined, for all values of α. The 0.5 quantile of a distribution is often called the median. For α = 0.25, 0.5, and 0.75, the corresponding quantiles are called quartiles; for α = 0.2, 0.4, 0.6, and 0.8, they are called quintiles; for α = i/10 with i an integer between 1 and 9, they are called deciles; for α = i/20 with 1 ≤ i ≤ 19, they are called vigintiles; and, for α = i/100 with 1 ≤ i ≤ 99, they are called centiles. The quantile function of the standard normal distribution is shown in Figure 5.1. All three quartiles, the first and ninth deciles, and the .025 and .975 quantiles are shown in the figure.

[Figure 5.1: The quantile function F⁻¹(α) of the standard normal distribution, marking F⁻¹(.025) = −1.9600, F⁻¹(.10) = −1.2816, F⁻¹(.25) = −0.6745, F⁻¹(.50) = 0.0000, F⁻¹(.75) = 0.6745, F⁻¹(.90) = 1.2816, and F⁻¹(.975) = 1.9600.]

Asymptotic Confidence Intervals

The discussion up to this point has deliberately been rather abstract, because τ(y, θ₀) can, in principle, be any sort of test statistic. To obtain more concrete results, let us suppose that

\[
\tau(y, \theta_0) \equiv \left( \frac{\hat\theta - \theta_0}{s_\theta} \right)^{\!2}, \qquad (5.04)
\]

where θ̂ is an estimate of θ, and s_θ is the corresponding standard error, that is, an estimate of the standard deviation of θ̂. Thus τ(y, θ₀) is the square of the t statistic for the null hypothesis that θ = θ₀. If θ̂ were an OLS estimate of a regression coefficient, then, under conditions that were discussed in Section 4.5, the test statistic defined in (5.04) would be asymptotically distributed as χ²(1) under the null hypothesis. Therefore, the asymptotic critical value c_α would be the 1 − α quantile of the χ²(1) distribution.
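Since equation (5.03) can rarely be solved by hand for more complicated statistics, it is worth seeing the inversion done numerically. The sketch below is ours, with made-up data: it finds the two solutions of τ(y, θ) = c_α for the squared t statistic (5.04) by root-finding on either side of θ̂.

```python
# A minimal sketch of constructing a confidence interval by inverting a
# test statistic, as in equation (5.03).  The statistic is the squared
# t statistic of (5.04); any statistic that is zero at theta_hat and
# increases on either side of it could be inverted the same way.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(42)
y = 2.5 + rng.standard_normal(100)           # hypothetical sample

theta_hat = y.mean()                          # point estimate of theta
s_theta = y.std(ddof=1) / np.sqrt(len(y))     # its standard error

def tau(theta0):
    """Squared t statistic for the null theta = theta0, as in (5.04)."""
    return ((theta_hat - theta0) / s_theta) ** 2

c_alpha = chi2.ppf(0.95, df=1)                # 1 - alpha quantile, alpha = .05

# Solve tau(theta) = c_alpha on each side of theta_hat.  Since
# tau(theta_hat) = 0 < c_alpha, each bracket contains exactly one root.
theta_l = brentq(lambda t: tau(t) - c_alpha, theta_hat - 10 * s_theta, theta_hat)
theta_u = brentq(lambda t: tau(t) - c_alpha, theta_hat, theta_hat + 10 * s_theta)
print(f"95% confidence interval: [{theta_l:.4f}, {theta_u:.4f}]")
```

For the statistic (5.04) this of course reproduces the closed-form interval derived next.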
For the test statistic (5.04), equation (5.03) becomes

\[
\left( \frac{\hat\theta - \theta}{s_\theta} \right)^{\!2} = c_\alpha.
\]

Taking the square root of both sides and multiplying by s_θ then gives

\[
|\hat\theta - \theta| = s_\theta c_\alpha^{1/2}. \qquad (5.05)
\]

As expected, there are two solutions to equation (5.05). These are θ_l = θ̂ − s_θ c_α^{1/2} and θ_u = θ̂ + s_θ c_α^{1/2}, and so the asymptotic 1 − α confidence interval for θ is

\[
\bigl[\hat\theta - s_\theta c_\alpha^{1/2},\; \hat\theta + s_\theta c_\alpha^{1/2}\bigr]. \qquad (5.06)
\]

This means that the interval consists of all values of θ between the lower limit θ̂ − s_θ c_α^{1/2} and the upper limit θ̂ + s_θ c_α^{1/2}.
For α = 0.05, the 1 − α quantile of the χ²(1) distribution is 3.8415, the square root of which is 1.9600. Thus the confidence interval given by (5.06) becomes

\[
\bigl[\hat\theta - 1.96 s_\theta,\; \hat\theta + 1.96 s_\theta\bigr]. \qquad (5.07)
\]

This interval is shown in Figure 5.2, which illustrates the manner in which it is constructed. The value of the test statistic is on the vertical axis of the figure. The upper and lower limits of the interval occur at the values of θ where the test statistic (5.04) is equal to c_α, which in this case is 3.8415.

[Figure 5.2: A symmetric confidence interval. The parabola ((θ̂ − θ)/s_θ)² is plotted against θ; it crosses the horizontal line c_α = 3.8415 at θ_l = θ̂ − 1.96 s_θ and θ_u = θ̂ + 1.96 s_θ.]

We would have obtained the same confidence interval as (5.06) if we had started with the asymptotic t statistic (θ̂ − θ₀)/s_θ and used the N(0, 1) distribution to perform a two-tailed test. For such a test, there are two critical values, one the negative of the other, because the N(0, 1) distribution is symmetric about the origin. These critical values are defined in terms of the quantiles of that distribution. The relevant ones are now the α/2 and the 1 − (α/2) quantiles, since we wish to have the same probability mass in each tail of the distribution. It is conventional to denote these quantiles of the standard normal distribution by z_{α/2} and z_{1−(α/2)}, respectively. Note that z_{α/2} is negative, since α/2 < 1/2, and the median of the N(0, 1) distribution is 0. By symmetry, it is the negative of z_{1−(α/2)}. Equation (5.03), which has two solutions for a χ² test, is replaced by two equations, each with just one solution, as follows:

\[
\tau(y, \theta) = \pm c.
\]

Here τ(y, θ) denotes the (signed) t statistic rather than the χ²(1) statistic used in (5.03), and the positive number c can be defined either as z_{1−(α/2)} or as −z_{α/2}. The resulting confidence interval [θ_l, θ_u] can thus be written in two different ways:

\[
\bigl[\hat\theta + s_\theta z_{\alpha/2},\; \hat\theta - s_\theta z_{\alpha/2}\bigr] \quad\text{and}\quad \bigl[\hat\theta - s_\theta z_{1-(\alpha/2)},\; \hat\theta + s_\theta z_{1-(\alpha/2)}\bigr]. \qquad (5.08)
\]

When α = .05, we once again obtain the interval (5.07), since z_{.025} = −1.96 and z_{.975} = 1.96.

Asymmetric Confidence Intervals

The confidence interval (5.06), which is the same as the interval (5.08), is a symmetric one, because θ_l is as far below θ̂ as θ_u is above it. Although many confidence intervals are symmetric, not all of them share this property. The symmetry of (5.06) is a consequence of the symmetry of the standard normal distribution and of the form of the test statistic (5.04).

It is possible to construct confidence intervals based on two-tailed tests even when the distribution of the test statistic is not symmetric. For a chosen level α, we wish to reject whenever the statistic is too far into either the right-hand or the left-hand tail of the distribution. Unfortunately, there are many ways to interpret "too far" in this context. The simplest is probably to define the rejection region in such a way that there is a probability mass of α/2 in each tail. This is called an equal-tailed confidence interval. Two critical values are needed for each level, a lower one, c⁻_α, which will be the α/2 quantile of the distribution, and an upper one, c⁺_α, which will be the 1 − (α/2) quantile. A realized statistic τ̂ will lead to rejection at level α if either τ̂ < c⁻_α or τ̂ > c⁺_α. This will lead to an asymmetric confidence interval. We will discuss such intervals, where the critical values are obtained by bootstrapping, in the next section.
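As a concrete illustration of the equal-tailed construction (ours, not from the text, with the function name `rejects` being our own), the two critical values for an asymmetric null distribution such as χ²(5) can be read off its quantile function:

```python
# Equal-tailed critical values for an asymmetric null distribution.
# With probability mass alpha/2 in each tail, the lower and upper
# critical values are the alpha/2 and 1 - (alpha/2) quantiles.
from scipy.stats import chi2

alpha = 0.05
dist = chi2(df=5)                        # an asymmetric null distribution

c_minus = dist.ppf(alpha / 2)            # lower critical value: 0.8312
c_plus = dist.ppf(1 - alpha / 2)         # upper critical value: 12.8325

def rejects(tau_hat):
    """Equal-tailed test at level alpha: reject in either tail."""
    return tau_hat < c_minus or tau_hat > c_plus

print(c_minus, c_plus)
print(rejects(0.5), rejects(6.0), rejects(14.0))   # True, False, True
```

Note that, unlike in the normal case, the two critical values are not symmetric about any central point, which is why the resulting confidence interval is asymmetric.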
It is also possible to construct confidence intervals based on one-tailed tests. Such an interval will be open all the way out to infinity in one direction. Suppose that, for each θ₀, the null θ ≤ θ₀ is tested against the alternative θ > θ₀. If the true parameter value is finite, we will never want to reject the null for any θ₀ that substantially exceeds the true value. Consequently, the confidence interval will be open out to plus infinity. Formally, the null is rejected only if the signed t statistic is algebraically greater than the appropriate critical value. For the N(0, 1) distribution, this is z_{1−α} for level α. The null θ ≤ θ₀ will not be rejected if τ(y, θ₀) ≤ z_{1−α}, that is, if θ̂ − θ₀ ≤ s_θ z_{1−α}. The interval over which θ₀ satisfies this inequality is just

\[
\bigl[\hat\theta - s_\theta z_{1-\alpha},\; +\infty\bigr). \qquad (5.09)
\]

P Values and Asymmetric Distributions

The above discussion of asymmetric confidence intervals raises the question of how to calculate P values for two-tailed tests based on statistics with asymmetric distributions. This is a little tricky, but it will turn out to be useful when we discuss bootstrap confidence intervals in the next section.

If the P value is defined, as usual, as the smallest level for which the test rejects, then, if we denote by F the CDF used to calculate critical values or P values, the P value associated with a statistic τ should be 2F(τ) if τ is in the lower tail, and 2(1 − F(τ)) if it is in the upper tail. This can be seen by the same arguments, based on Figure 4.2, that were used for symmetric two-tailed tests. A slight problem arises as to the point of separation between the left and right sides of the distribution. However, it is easy to see that only one of the two possible P values is less than 1, unless F(τ) is exactly equal to 0.5, in which case both are equal to 1, and there is no ambiguity. In complete generality, then, we have that the P value is

\[
p(\tau) = 2 \min\bigl(F(\tau),\, 1 - F(\tau)\bigr). \qquad (5.10)
\]

Thus the point that separates the left and right sides of the distribution is the median, q_{.50}, since F(q_{.50}) = .50 by definition. Any τ greater than the median is in the right-hand tail of the distribution, and any τ less than the median is in the left-hand tail.
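Equation (5.10) translates directly into code. The sketch below is our illustration, not from the text; it computes the equal-tail P value for any CDF, here the asymmetric χ²(3) distribution:

```python
# The equal-tail P value of equation (5.10) for a statistic tau whose
# null distribution has CDF F.  For asymmetric distributions, 2*F(tau)
# and 2*(1 - F(tau)) differ, and only the smaller one is below 1.
from scipy.stats import chi2

def equal_tail_p_value(tau, cdf):
    """p(tau) = 2 * min(F(tau), 1 - F(tau)), equation (5.10)."""
    F = cdf(tau)
    return 2.0 * min(F, 1.0 - F)

# Example with the chi-squared distribution with 3 degrees of freedom,
# whose median (the point separating the two tails) is about 2.366.
F3 = lambda x: chi2.cdf(x, df=3)
print(equal_tail_p_value(9.35, F3))   # upper tail: 2*(1 - F(9.35)), about 0.050
print(equal_tail_p_value(0.22, F3))   # lower tail: 2*F(0.22), about 0.051
```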
Exact Confidence Intervals for Regression Coefficients

In Section 4.4, we saw that, for the classical normal linear model, exact tests of linear restrictions on the parameters of the regression function are available, based on the t and F distributions. This implies that we can construct exact confidence intervals. Consider the classical normal linear model (4.21), in which the parameter vector β has been partitioned as [β₁ ⋮ β₂], where β₁ is a (k − 1)-vector and β₂ is a scalar. The t statistic for the hypothesis that β₂ = β₂₀ for any particular value β₂₀ can be written as

\[
\frac{\hat\beta_2 - \beta_{20}}{s_2}, \qquad (5.11)
\]

where s₂ is the usual OLS standard error for β̂₂. Any DGP in the model (4.21) satisfies β₂ = β₂₀ for some β₂₀. With the correct value of β₂₀, the t statistic (5.11) has the t(n − k) distribution, and so

\[
\Pr\!\left( t_{\alpha/2} \le \frac{\hat\beta_2 - \beta_{20}}{s_2} \le t_{1-(\alpha/2)} \right) = 1 - \alpha, \qquad (5.12)
\]

where t_{α/2} and t_{1−(α/2)} denote the α/2 and 1 − (α/2) quantiles of the t(n − k) distribution. We can use equation (5.12) to find a 1 − α confidence interval for β₂. The left-hand side of the equation is equal to

\[
\Pr\bigl( s_2 t_{\alpha/2} \le \hat\beta_2 - \beta_{20} \le s_2 t_{1-(\alpha/2)} \bigr)
= \Pr\bigl( -s_2 t_{\alpha/2} \ge \beta_{20} - \hat\beta_2 \ge -s_2 t_{1-(\alpha/2)} \bigr)
= \Pr\bigl( \hat\beta_2 - s_2 t_{\alpha/2} \ge \beta_{20} \ge \hat\beta_2 - s_2 t_{1-(\alpha/2)} \bigr).
\]

Therefore, the confidence interval we are seeking is

\[
\bigl[\hat\beta_2 - s_2 t_{1-(\alpha/2)},\; \hat\beta_2 - s_2 t_{\alpha/2}\bigr]. \qquad (5.13)
\]

At first glance, this interval may look a bit odd, because the upper limit is obtained by subtracting something from β̂₂. What is subtracted is negative, however, because t_{α/2} < 0, since it is in the lower tail of the t distribution. Thus the interval does in fact contain the point estimate β̂₂.

It may still seem strange that the lower and upper limits of (5.13) depend, respectively, on the upper-tail and lower-tail quantiles of the t(n − k) distribution. This actually makes perfect sense, however, as can be seen by looking at the infinite confidence interval (5.09) based on a one-tailed test. There, since the null is that θ ≤ θ₀, the confidence interval must be open out to +∞, and so only the lower limit of the confidence interval is finite. But the null is rejected when the test statistic is in the upper tail of its distribution, and so it must be the upper-tail quantile that determines the only finite limit of the confidence interval, namely, the lower limit. Readers are strongly advised to take some time to think this point through, since most people find it strongly counter-intuitive when they first encounter it, and they can accept it only after a period of reflection.

In the case of (5.13), it is easy to rewrite the confidence interval so that it depends only on the positive, upper-tail, quantile, t_{1−(α/2)}. Because the Student's t distribution is symmetric, the interval (5.13) is the same as the interval

\[
\bigl[\hat\beta_2 - s_2 t_{1-(\alpha/2)},\; \hat\beta_2 + s_2 t_{1-(\alpha/2)}\bigr]; \qquad (5.14)
\]

compare the two ways of writing the confidence interval (5.08). For concreteness, suppose that α = .05 and n − k = 32. In this special case, t_{1−(α/2)} = t_{.975} = 2.037. Thus the .95 confidence interval based on (5.14) extends from 2.037 standard errors below β̂₂ to 2.037 standard errors above it. This interval is slightly wider than the interval (5.07), which is based on asymptotic theory. We obtained the interval (5.14) by starting from the t statistic (5.11) and using the Student's t distribution. As readers are asked to demonstrate in Exercise 5.2, we would have obtained precisely the same interval if we had started instead from the square of (5.11) and used the F distribution.
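As a quick numerical check on the example above (our sketch, with an invented coefficient estimate and standard error), the exact interval (5.14) follows from the quantiles of t(n − k):

```python
# The exact confidence interval (5.14) for a regression coefficient.
# beta2_hat and s2 are invented values standing in for an OLS estimate
# and its standard error; n - k = 32 matches the example in the text.
from scipy.stats import t

beta2_hat, s2, alpha, dof = 0.60, 0.10, 0.05, 32

t_crit = t.ppf(1 - alpha / 2, df=dof)      # t_{.975}(32) = 2.037
ci = (beta2_hat - s2 * t_crit, beta2_hat + s2 * t_crit)
print(f"t critical value: {t_crit:.3f}")   # 2.037, wider than the 1.960
print(f".95 interval: [{ci[0]:.4f}, {ci[1]:.4f}]")   # of the asymptotic case
```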
5.3 Bootstrap Confidence Intervals

When exact confidence intervals are not available, and they generally are not, asymptotic ones are normally used. However, just as asymptotic tests do not always perform well in finite samples, neither do asymptotic confidence intervals. Since bootstrap P values and tests based on them often outperform their asymptotic counterparts, it seems natural to base confidence intervals on bootstrap tests when asymptotic intervals give poor coverage. There are a great many varieties of bootstrap confidence intervals; for a comprehensive discussion, see Davison and Hinkley (1997).

When we construct a bootstrap confidence interval, we wish to treat a family of tests, each corresponding to its own null hypothesis. Since, when we perform a bootstrap test, we must use a bootstrap DGP that satisfies the null hypothesis, it appears that we must use an infinite number of bootstrap DGPs if we are to consider the full family of tests, each with a different null. Fortunately, there is a clever trick that lets us avoid this difficulty completely.

It is, of course, essential for a bootstrap test that the bootstrap DGP should satisfy the null hypothesis under test. However, when the distribution of the test statistic does not depend on precisely which null is being tested, the same bootstrap distribution can be used for a whole family of tests with different nulls. If a family of test statistics is defined in terms of a pivotal random function τ(y, θ₀), then, by definition, the distribution of this function is independent of θ₀. Thus we could choose any value of θ₀ that the model allows for the bootstrap DGP, and the distribution of the test statistic, evaluated at θ₀, would always be the same. The important thing is to make sure that τ(·) is evaluated at the same value of θ₀ as the one used to generate the bootstrap samples. Even if τ(·) is only asymptotically pivotal, the effect of the choice of θ₀ on the distribution of the statistic should be slight if the sample size is reasonably large.

Suppose that we wish to construct a bootstrap confidence interval based on the t statistic t̂(θ₀) ≡ τ(y, θ₀) = (θ̂ − θ₀)/s_θ. The first step is to compute θ̂ and s_θ using the original data y. Then we generate bootstrap samples using a DGP, which may be either parametric or semiparametric, characterized by θ̂ and by any other relevant estimates, such as the error variance, that may be needed. The resulting bootstrap DGP is thus quite independent of θ₀, but it does depend on the estimate θ̂. We can now generate B bootstrap samples, y*_j, j = 1, …, B. For each of these, we compute an estimate θ*_j and its standard error s*_j in exactly the same way that we computed θ̂ and s_θ from the original data, and we then compute the bootstrap "t statistic"

\[
t_j^* \equiv \tau(y_j^*, \hat\theta) = \frac{\theta_j^* - \hat\theta}{s_j^*}. \qquad (5.15)
\]

This is the statistic that tests the null hypothesis that θ = θ̂, because θ̂ is the true value of θ for the bootstrap DGP. If τ(·) is an exact pivot, the change of null from θ₀ to θ̂ makes no difference. If τ(·) is an asymptotic pivot, there should usually be only a slight difference for values of θ₀ close to θ̂.

The limits of the bootstrap confidence interval will depend on the quantiles of the EDF of the t*_j. We can choose to construct either a symmetric confidence interval or an asymmetric one.

[...]
is the median, q .50 , since F (q .50 ) = .50 by definition. Any τ greater than the median is in the right-hand tail of the distribution, and any τ less than the median is in the left-hand tail. Exact. interval (5. 08). For con- creteness, suppose that α = . 05 and n − k = 32. In this special case, t 1−(α/2) = t .9 75 = 2.037. Thus the . 95 confidence interval based on (5. 14) extends from 2.037 standard
