Handbook of Mathematics for Engineers and Scientists, Part 161


21.1.3-3. Sample variances.

The statistic

\sigma^{2*} = \alpha_2^* - (m^*)^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - m^*)^2    (21.1.3.9)

is called the sample variance (the variance of the empirical distribution) of the sample X_1, ..., X_n. Suppose that α_k = E{X_1^k} and μ_k = E{(X_1 − α_1)^k}. If α_4 < ∞, then the sample variance (21.1.3.9) is asymptotically unbiased and consistent.

The statistic

s^{2*} = \frac{n}{n-1}\bigl[\alpha_2^* - (m^*)^2\bigr] = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - m^*)^2 = (s^*)^2    (21.1.3.10)

is called the adjusted sample variance, and s* is called the sample mean-square deviation of the sample X_1, ..., X_n. They satisfy the formulas

E\{\sigma^{2*}\} = \frac{n-1}{n}\,\sigma^2, \qquad E\{s^{2*}\} = \sigma^2, \qquad \operatorname{Var}\{s^{2*}\} = \frac{1}{n}\Bigl(\mu_4 - \frac{n-3}{n-1}\,\sigma^4\Bigr).    (21.1.3.11)

The statistic s^{2*} is an unbiased estimator of the variance μ_2.

21.1.3-4. Characteristics of asymmetry and excess.

The number

\gamma_1^* = \frac{\mu_3^*}{(\mu_2^*)^{3/2}} = \frac{1}{n\sigma_*^3}\sum_{i=1}^{n}(X_i - m^*)^3    (21.1.3.12)

is called the sample asymmetry coefficient of a random sample X_1, ..., X_n. The number

\gamma_2^* = \frac{\mu_4^*}{(\mu_2^*)^2} - 3 = \frac{1}{n\sigma_*^4}\sum_{i=1}^{n}(X_i - m^*)^4 - 3    (21.1.3.13)

is called the sample excess coefficient of a random sample X_1, ..., X_n. The sample excess coefficient is used in criteria for testing the hypothesis γ_2 ≠ 0, which implies that the distribution of the random variable X_i differs from the normal distribution. The statistics γ_1^* and γ_2^* are consistent estimators of the asymmetry and excess coefficients.
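The sample characteristics above can be computed directly from the data. The following Python sketch (the function name and the use of NumPy are assumptions made for illustration; the handbook itself prescribes no software) evaluates formulas (21.1.3.9), (21.1.3.10), (21.1.3.12), and (21.1.3.13) for a numerical sample.

```python
import numpy as np

def sample_characteristics(x):
    """Sample characteristics of Paragraphs 21.1.3-3 and 21.1.3-4 for a sample x = (X_1, ..., X_n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    m_star = x.mean()                                 # sample mean m*
    dev = x - m_star
    var_star = np.mean(dev**2)                        # sample variance sigma^{2*}, Eq. (21.1.3.9)
    s2_star = n / (n - 1) * var_star                  # adjusted sample variance s^{2*}, Eq. (21.1.3.10)
    sigma_star = np.sqrt(var_star)                    # sample mean-square deviation of the empirical law
    gamma1 = np.mean(dev**3) / sigma_star**3          # sample asymmetry coefficient, Eq. (21.1.3.12)
    gamma2 = np.mean(dev**4) / sigma_star**4 - 3.0    # sample excess coefficient, Eq. (21.1.3.13)
    return m_star, var_star, s2_star, gamma1, gamma2

# For a large normal sample, gamma1 and gamma2 should both be close to 0.
rng = np.random.default_rng(0)
print(sample_characteristics(rng.normal(loc=2.0, scale=3.0, size=10_000)))
```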
21.2. Statistical Estimation

21.2.1. Estimators and Their Properties

21.2.1-1. Notion of estimator.

A statistical estimator θ* (or simply an estimator) of an unknown parameter θ for a sample X_1, ..., X_n is a function θ* = θ*(X_1, ..., X_n) depending only on the sample X_1, ..., X_n. An estimator is a random variable and varies from sample to sample. Like any random variable, it has a distribution function F_{θ*}(x). The distribution law of the statistic θ* = θ*(X_1, ..., X_n) can be found by well-known methods of probability theory.

Estimators producing separate points in the space of parameters or parametric functions to be estimated are called point estimators. Estimators producing sets of points in that space are called interval estimators.

21.2.1-2. Unbiased estimators.

The difference E{θ*} − θ is called the bias of the estimator θ* of the parameter θ. The estimator θ* is said to be unbiased if its expectation is equal to the parameter to be estimated, i.e., if E{θ*} = θ; otherwise, the estimator θ* is said to be biased.

Example 1. The sample moments α_k^* are unbiased estimators of the moments α_k.

Example 2. The sample variance σ^{2*} is a biased estimator of the variance σ², because E{σ^{2*}} = ((n − 1)/n)σ² ≠ σ².

If an estimator is not unbiased, then it either overestimates or underestimates θ. In both cases, this results in systematic errors of the same sign in the estimate of the parameter θ. If E{θ_n^*} → θ as n → ∞, then the estimator θ* is said to be asymptotically unbiased.

21.2.1-3. Efficiency of estimators.

An unbiased estimator θ_n^* is said to be efficient if it has the least variance among all possible unbiased estimators of the parameter θ for random samples of the same size.

CRAMÉR–RAO INEQUALITY. Let θ* be an unbiased estimator of a parameter θ. Then (under additional regularity conditions imposed on the family F(x; θ)) the variance of θ* satisfies the inequality

\operatorname{Var}\{\theta^*\} \ge \frac{1}{nI(\theta)},    (21.2.1.1)

where I(θ) is the Fisher information, determined in the continuous case by the formula

I(\theta) = E\Bigl\{\Bigl[\frac{\partial}{\partial\theta}\ln p(X;\theta)\Bigr]^2\Bigr\}

and in the discrete case by the formula

I(\theta) = E\Bigl\{\Bigl[\frac{\partial}{\partial\theta}\ln P(X;\theta)\Bigr]^2\Bigr\}.

The Cramér–Rao inequality determines a lower bound for the variance of an unbiased estimator. The quantity

\varepsilon(\theta) = \frac{1}{nI(\theta)\operatorname{Var}\{\theta^*\}}    (21.2.1.2)

is called the efficiency of an unbiased estimator θ*. An unbiased estimator θ* is said to be (Cramér–Rao) efficient if ε(θ) = 1 for any θ.

Remark. The Cramér–Rao inequality for biased estimators has the form

\operatorname{Var}\{\theta_n^*\} \ge \frac{1}{nI(\theta)}\Bigl[1 + \frac{\partial}{\partial\theta}\bigl(E\{\theta_n^*\} - \theta\bigr)\Bigr]^2.    (21.2.1.3)

21.2.1-4. Consistency of estimators.

An estimator θ* is said to be consistent if it converges to the estimated parameter as the sample size increases. The convergence in question can be of different types: in probability, with probability 1, in mean square, etc. As a rule, convergence in probability is used; i.e., an estimator θ* is said to be consistent if for each ε > 0 and for all possible values of the unknown parameter θ it satisfies the relation

P\{|\theta^* - \theta| > \varepsilon\} \to 0 \quad \text{as } n \to \infty.    (21.2.1.4)

The consistency of an estimator justifies an increase in the size of a random sample, since in this case the probability of a large error in the estimate of the parameter θ decreases.

21.2.1-5. Sufficient estimators.

1°. A statistic S = (S_1(X_1, ..., X_n), ..., S_k(X_1, ..., X_n)) is said to be sufficient for θ if the conditional distribution F_{X_1,...,X_n}(x_1, ..., x_n | S = s) is independent of the parameter θ.

NEYMAN–FISHER THEOREM. A statistic S is sufficient for a parameter θ if and only if the likelihood function (see Paragraph 21.2.2-2) has the form

L(X_1, \dots, X_n; \theta) = A(X_1, \dots, X_n)\,B(S;\theta),    (21.2.1.5)

where A(X_1, ..., X_n) depends only on the sample X_1, ..., X_n and the function B(S; θ) depends only on S and θ.

THEOREM. If S is a sufficient statistic and θ* is an unbiased estimator of a parameter θ, then the conditional expectation θ_S^* = E{θ* | S} is an unbiased estimator of the parameter θ, depends only on the sufficient statistic S, and satisfies the inequality

\operatorname{Var}\{\theta_S^*\} \le \operatorname{Var}\{\theta^*\}    (21.2.1.6)

for all θ.

2°. A statistic S = S(X_1, ..., X_n) is said to be complete for a family of distributions F(x; θ) if the relation

E\{h(S)\} = \int_{-\infty}^{+\infty}\!\cdots\int_{-\infty}^{+\infty} h\bigl(S(x_1, \dots, x_n)\bigr)\,p(x_1;\theta)\cdots p(x_n;\theta)\,dx_1\cdots dx_n = 0,    (21.2.1.7)

holding for all θ, implies that h(s) ≡ 0.

THEOREM (MINIMALITY OF VARIANCE OF AN ESTIMATOR DEPENDING ON A COMPLETE SUFFICIENT STATISTIC). Let S be a complete sufficient statistic, and let θ* be an unbiased estimator of an unknown parameter θ. Then

\theta_S^* = E\{\theta^* \mid S\}    (21.2.1.8)

is the unique unbiased estimator with minimal variance.
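The Cramér–Rao bound (21.2.1.1) and the efficiency (21.2.1.2) can be checked numerically by simulation. The sketch below is a Monte Carlo illustration under assumed parameter values; the Poisson family is chosen only because its Fisher information I(θ) = 1/θ is easy to write down, and it is not singled out by the handbook.

```python
import numpy as np

# Monte Carlo check of the Cramer-Rao bound (21.2.1.1) for the Poisson family
# P(X = k; theta) = theta^k e^{-theta} / k!, whose Fisher information per observation
# is I(theta) = 1/theta.  The sample mean is an unbiased estimator of theta, so its
# efficiency (21.2.1.2) should come out close to 1.  Parameter values are illustrative.

rng = np.random.default_rng(1)
theta, n, trials = 4.0, 50, 20_000

estimates = rng.poisson(theta, size=(trials, n)).mean(axis=1)   # theta* = m* in each trial
var_estimator = estimates.var()
cr_bound = 1.0 / (n * (1.0 / theta))                            # 1 / (n I(theta)) = theta / n

print("Var{theta*} approx:", var_estimator)
print("Cramer-Rao bound  :", cr_bound)
print("efficiency eps(theta) approx:", cr_bound / var_estimator)  # formula (21.2.1.2)
```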
21.2.1-6. Main statistical estimators.

Let X_1, ..., X_n be a random sample from a normal population with parameters (a, σ²). Then:

a) the statistic (X_i − a)/σ has the standard normal distribution;
b) the statistic (m* − a)√n/σ has the standard normal distribution;
c) the statistic (m* − a)√n/√(s²*) = (m* − a)√(n − 1)/√(σ²*), which is called Student's ratio, has the t-distribution with n − 1 degrees of freedom;
d) the statistic (X_i − a)/√(s²*) = [(X_i − a)/√(σ²*)]·√((n − 1)/n) has the t-distribution with n − 1 degrees of freedom;
e) the statistic (n − 1)s²*/σ² = nσ²*/σ² = (1/σ²) Σ_{k=1}^{n}(X_k − m*)² has the chi-square distribution with n − 1 degrees of freedom;
f) the statistic (1/σ²) Σ_{k=1}^{n}(X_k − a)² has the chi-square distribution with n degrees of freedom.

Remark. The quantiles u_α, χ²_α, t_α, and φ_α of the normal distribution, the chi-square distribution, the t-distribution, and the F-distribution can be found in the corresponding tables or computed in EXCEL, MATHCAD, and other software.

Example 3. For the normal distribution, the γ-quantile u_γ can be determined using NORMINV(γ; m; σ) in EXCEL, where, in the examples illustrating the use of software, γ is the confidence level, α = 1 − γ is the significance level, m is the expectation, and σ is the standard deviation. The function qnorm(γ, m, σ) in MATHCAD can also be used.

Example 4. For the chi-square distribution with n degrees of freedom, the γ-quantile χ²_γ(n) can be found using the function CHIINV(α, n) in EXCEL. The function qchisq(γ, n) in MATHCAD can also be used.

Example 5. For the t-distribution with n degrees of freedom, the γ-quantile t_γ can be found using the function TINV(2α, n) in EXCEL. The function qt(γ, n) in MATHCAD can also be used.

Example 6. For the F-distribution with n_1 and n_2 degrees of freedom, the γ-quantile φ_γ can be found using the function FINV(α, n_1, n_2) in EXCEL. The function qF(γ, n_1, n_2) in MATHCAD can also be used.

21.2.2. Estimation Methods for Unknown Parameters

21.2.2-1. Method of moments.

If the theoretical distribution function F(x) of a population belongs to a k-parameter family F(x; θ_1, ..., θ_k) with unknown parameters θ_1, ..., θ_k, then any numerical characteristic is a function of the parameters θ_1, ..., θ_k. For a known distribution function, one can find the first k theoretical moments

\alpha_i = \alpha_i(\theta_1, \dots, \theta_k) \quad (i = 1, 2, \dots, k)    (21.2.2.1)

if these moments exist. The method of moments works as follows: for a large sample size, the theoretical moments α_1, ..., α_k in system (21.2.2.1) are replaced by the sample moments α_1^*, ..., α_k^*; then this system is solved for θ_1, ..., θ_k, and estimates of the unknown parameters are obtained. Thus the estimators θ_1^*, ..., θ_k^* of the unknown parameters θ_1, ..., θ_k in the method of moments are obtained from the system of equations

\alpha_i^* = \alpha_i(\theta_1^*, \dots, \theta_k^*) \quad (i = 1, 2, \dots, k).    (21.2.2.2)

The estimators obtained by the method of moments are, as a rule, consistent.

Example 1. A sample X_1, ..., X_n is selected from a population whose theoretical distribution function has the exponential density p(x) = p(x; θ) = θe^{−θx} (x ≥ 0). Since α_1 = 1/θ for the exponential law, the method of moments gives α_1^* = 1/θ*, which implies that θ* = 1/α_1^* = 1/X̄.

It should be noted that the efficiency of estimators obtained by the method of moments is, as a rule, less than 1, and these estimators may even be biased. Since the estimators obtained by the method of moments are rather simple, they are often used as initial approximations for finding more efficient estimators.
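A minimal sketch of Example 1 follows, under assumed values of the true parameter and the sample size: the first theoretical moment α_1 = 1/θ is equated with the sample mean, which yields the method-of-moments estimator θ* = 1/X̄.

```python
import numpy as np

# Method of moments for the exponential density p(x; theta) = theta * exp(-theta * x):
# alpha_1 = 1/theta, so matching alpha_1 with the sample mean gives theta* = 1/mean(X).
# The true parameter value and the sample size are illustrative assumptions.

rng = np.random.default_rng(2)
theta_true = 0.5
x = rng.exponential(scale=1.0 / theta_true, size=5_000)   # exponential sample

theta_star = 1.0 / x.mean()                                # method-of-moments estimator
print("theta* =", theta_star, "(true value:", theta_true, ")")
```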
21.2.2-2. Maximum likelihood estimation.

Maximum likelihood estimation is the most popular estimation method. It is based on conditions for the extremum of a function of one or several random variables; the likelihood function is usually taken for such a function.

The likelihood function is defined as

L(X_1, \dots, X_n) = L(X_1, \dots, X_n; \theta) =
\begin{cases}
P(X_1;\theta)\,P(X_2;\theta)\cdots P(X_n;\theta) & \text{in the discrete case}, \\
p(X_1;\theta)\,p(X_2;\theta)\cdots p(X_n;\theta) & \text{in the continuous case}.
\end{cases}    (21.2.2.3)

In the likelihood function L(X_1, ..., X_n; θ), the sample elements X_1, ..., X_n are fixed parameters and θ is the argument. The maximum likelihood estimator is a value θ* such that

L(X_1, \dots, X_n; \theta^*) = \max_{\theta} L(X_1, \dots, X_n; \theta).    (21.2.2.4)

Since L and ln L attain their maximum values for the same values of the argument θ, it is convenient to use the logarithm of the likelihood function rather than the function itself in practical implementations of the maximum likelihood method. The equation

\frac{\partial}{\partial\theta}\ln L(X_1, \dots, X_n; \theta) = 0    (21.2.2.5)

is called the likelihood equation. If the theoretical distribution function F(X_1, ..., X_n; θ_1, ..., θ_k) depends on several parameters θ_1, ..., θ_k, then equation (21.2.2.5) should be replaced in the maximum likelihood method by the system of likelihood equations

\frac{\partial}{\partial\theta_j}\ln L(X_1, \dots, X_n; \theta_1, \dots, \theta_k) = 0 \quad (j = 1, 2, \dots, k).    (21.2.2.6)

Properties of estimators obtained by the maximum likelihood method:
1. Coincidence of the efficient estimator with the maximum likelihood estimator. If there exists an efficient estimator (or a set of jointly efficient estimators), then it is the unique solution of the likelihood equation (or of the system of equations).
2. Asymptotic efficiency of the maximum likelihood estimator. Under certain conditions on the family F(x; θ) that guarantee the possibility of differentiating under the integral sign and of expanding ∂/∂θ ln p(x; θ) in a Taylor series up to the first term, the likelihood equation has a solution that is asymptotically normal with parameters (θ, 1/(nI)) as n → ∞, where I is the Fisher information.

21.2.2-3. Least-squares method.

The least-squares method for obtaining an estimator θ* of the parameter θ is based on the minimization of the sum of squared deviations of the sample data from the desired estimator of θ; i.e., it is required to find a value θ* minimizing the sum

S(\theta) = \sum_{i=1}^{n}(X_i - \theta)^2 \to \min.    (21.2.2.7)

The least-squares method is the simplest method for finding estimators of the parameter θ.

Example 2. Let us estimate the parameter θ of the Poisson distribution P(X = k) = θ^k e^{−θ}/k!. The function S(θ) = Σ_{i=1}^{n}(X_i − θ)² has a minimum at the point θ̂ = (1/n)Σ_{i=1}^{n} X_i. Thus, the least-squares estimator of the parameter θ in the Poisson distribution is θ* = (1/n)Σ_{i=1}^{n} X_i.

Remark. Estimators of unknown parameters can also be found by the following methods: the minimal distance method, the method of nomograms, the minimal risk method, the maximum a posteriori (conditional) probability method, etc.
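As an illustration of Paragraph 21.2.2-2, the sketch below numerically maximizes the log-likelihood for the exponential density p(x; θ) = θe^{−θx} of Example 1 and compares the result with the closed-form root θ* = 1/X̄ of the likelihood equation (21.2.2.5). The sample, the parameter value, the search interval, and the use of SciPy are assumptions made for the illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Maximum likelihood estimation for the exponential density p(x; theta) = theta * exp(-theta * x).
# The likelihood equation d/dtheta ln L = n/theta - sum(X_i) = 0 has the root theta* = 1/mean(X);
# here ln L is also maximized numerically to show that both routes agree.

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=2_000)        # sample with true theta = 0.5

def neg_log_likelihood(theta):
    # -ln L(X_1, ..., X_n; theta) = -(n ln theta - theta * sum X_i)
    return -(x.size * np.log(theta) - theta * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print("numerical maximum likelihood estimate:", res.x)
print("root of the likelihood equation 1/mean(X):", 1.0 / x.mean())
```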
21.2.3. Interval Estimators (Confidence Intervals)

21.2.3-1. Confidence interval.

An estimator θ* of a parameter θ of the distribution of an observable random variable X (the estimator is a statistic, i.e., a function of a random sample) is itself a random variable with its own distribution law and numerical characteristics (parameters) of the distribution. For a small number of observations, the following problems may arise:

1. What error can arise from the replacement of the parameter θ by its point estimator θ*?
2. What is the probability that the errors obtained do not exceed prescribed limits?

An interval estimator (confidence interval) is an interval (θ_L^*, θ_R^*), determined by the sample, such that with some probability close to unity this interval contains the value of the population parameter to be estimated; i.e.,

P(\theta_L^* \le \theta \le \theta_R^*) = \gamma,    (21.2.3.1)

where θ_L^* and θ_R^* are the lower and upper (left and right) boundaries of the confidence interval of the parameter θ and γ is the confidence level. The confidence level and the significance level satisfy the relation

\gamma + \alpha = 1.    (21.2.3.2)

Confidence intervals can be constructed from a given estimator θ* and a given confidence level γ by various methods. In practice, the following two types of confidence intervals are used: two-sided and one-sided.

21.2.3-2. Confidence intervals in the case of normal samples.

1°. Confidence interval for the expectation given the variance. Suppose that a random sample X_1, ..., X_n is selected from a population X with a normal distribution law with unknown expectation a and known variance σ². To estimate the expectation, we use the statistic m* (the sample mean), which has the normal distribution with parameters (a, σ²/n). Then the statistic (m* − a)√n/σ has the normal distribution with parameters (0, 1). The confidence interval for the expectation a given the variance σ² has the form

\Bigl(m^* - u_{(1+\gamma)/2}\,\frac{\sigma}{\sqrt{n}}; \; m^* + u_{(1+\gamma)/2}\,\frac{\sigma}{\sqrt{n}}\Bigr),    (21.2.3.3)

where u_{(1+γ)/2} is the (1 + γ)/2-quantile of the normal distribution with parameters (0, 1).

2°. Confidence interval for the expectation if the variance is unknown. Suppose that a random sample X_1, ..., X_n is selected from a population X normally distributed with unknown variance σ² and unknown expectation a. To estimate the expectation, we use the statistic T = √(n/s²*) (m* − a), which has the t-distribution (Student's distribution) with n − 1 degrees of freedom. The confidence interval for the expectation in the case in which the variance is unknown has the form

\Bigl(m^* - \sqrt{\frac{s^{2*}}{n}}\,t_{(1+\gamma)/2}; \; m^* + \sqrt{\frac{s^{2*}}{n}}\,t_{(1+\gamma)/2}\Bigr).    (21.2.3.4)

Remark. For a sufficiently large sample size n, the difference between the confidence intervals obtained by (21.2.3.3) and (21.2.3.4) is small, since Student's distribution tends to the normal distribution as n → ∞.

3°. Confidence interval for the variance of a population with a normal distribution and known expectation. Suppose that a sample X_1, ..., X_n is selected from a normal population with unknown variance and known expectation equal to a. For the estimator of the unknown variance σ², the statistic

\sigma_0^2 = \frac{1}{n}\sum_{k=1}^{n}(X_k - a)^2

is used. In this case, the statistic χ² = nσ₀²/σ² has the chi-square distribution with n degrees of freedom. The confidence interval for the variance, given the expectation, has the form

\Bigl(\frac{n\sigma_0^2}{\chi^2_{(1+\gamma)/2}(n)}; \; \frac{n\sigma_0^2}{\chi^2_{(1-\gamma)/2}(n)}\Bigr).    (21.2.3.5)

4°. Confidence interval for the variance of a population with a normal distribution and unknown expectation. For the estimator of the unknown variance σ², the sample variance s²* is used. In this case, the statistic χ² = (n − 1)s²*/σ² has the chi-square distribution with n − 1 degrees of freedom. The confidence interval for the variance given that the expectation is unknown has the form

\Bigl(\frac{(n-1)s^{2*}}{\chi^2_{(1+\gamma)/2}(n-1)}; \; \frac{(n-1)s^{2*}}{\chi^2_{(1-\gamma)/2}(n-1)}\Bigr).    (21.2.3.6)
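As a hedged illustration of formula (21.2.3.4) (the sample, the confidence level, and the use of SciPy's quantile function are assumptions for the example), the following sketch computes a two-sided confidence interval for the expectation of a normal population when the variance is unknown.

```python
import numpy as np
from scipy import stats

# Two-sided confidence interval (21.2.3.4) for the expectation a of a normal population
# with unknown variance, built from the sample mean m*, the adjusted sample variance s^{2*},
# and the quantile t_{(1+gamma)/2} with n - 1 degrees of freedom.

rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=2.0, size=25)      # normal sample with unknown a and sigma^2

n = x.size
gamma = 0.95
m_star = x.mean()                                  # sample mean m*
s2_star = x.var(ddof=1)                            # adjusted sample variance s^{2*}
t_q = stats.t.ppf((1 + gamma) / 2, df=n - 1)       # quantile t_{(1+gamma)/2}(n - 1)

half_width = t_q * np.sqrt(s2_star / n)
print("confidence interval for a:", (m_star - half_width, m_star + half_width))
```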
21.3. Statistical Hypothesis Testing

21.3.1. Statistical Hypothesis. Test

21.3.1-1. Statistical hypothesis.

Any assumption concerning the form of the population distribution law or the numerical values of the parameters of the distribution law is called a statistical hypothesis. A statistical hypothesis uniquely determining the distribution law is said to be simple; otherwise, it is said to be composite. A statistical hypothesis is said to be parametric if it contains an assumption concerning the range of the unknown parameters. If a hypothesis does not contain any assumption concerning the range of the unknown parameters, then it is said to be nonparametric. The hypothesis H_0 to be tested is called the null hypothesis, and the opposite hypothesis H_1 is called the alternative hypothesis.
