PART III
Statistical inference

CHAPTER 11
The nature of statistical inference

11.1 Introduction

In the discussion of descriptive statistics in Part I it was argued that, in order to be able to go beyond the mere summarisation and description of the observed data under consideration, it was important to develop a mathematical model purporting to provide a generalised description of the data generating process (DGP). Motivated by the various results on frequency curves, a probability model in the form of the parametric family of density functions Φ = {f(x; θ), θ ∈ Θ}, together with its various ramifications, was formulated in Part II, providing such a mathematical model. Along with the formulation of the probability model Φ, various concepts and results were discussed in order to enable us to extend and analyse the model, preparing the way for the statistical inference to be considered in the sequel.

Before we go on to consider that, however, it is important to understand the difference between the descriptive study of data and statistical inference. As suggested above, the concept of a density function, in terms of which the probability model is defined, was motivated by the concept of a frequency curve. It is obvious that any density function f(x; θ) can be used as a frequency curve by reinterpreting it as a non-stochastic function of the observed data. This precludes any suggestion that the main difference between the descriptive study of data and statistical inference proper lies with the use of density functions in describing the observed data. What, then, is the main difference?

In descriptive statistics the aim is to summarise and describe the data under consideration, and frequency curves provide us with a convenient way to do that. The choice of a frequency curve is based entirely on the data in hand. In statistical inference, on the other hand, a probability model Φ is postulated a priori as a generalised description of the underlying DGP giving rise to the observed data (not of the observed data themselves). Indeed, there is nothing stochastic about the set of numbers making up the data; the stochastic element is introduced into the framework in the form of uncertainty relating to the underlying DGP, and the observed data are viewed as one of its many possible realisations. In descriptive statistics we start with the observed data and seek a frequency curve which describes these data as closely as possible. In statistical inference we postulate a probability model Φ a priori, which purports to describe either the DGP giving rise to the data or the population from which the observed data came. These constitute fundamental departures from descriptive statistics, allowing us to make generalisations beyond the numbers in hand. This being the case, the analysis of observed data in statistical inference proper will take a very different form from that of the descriptive statistics briefly considered in Part I.

In order to see this, let us return to the income data discussed in Chapter 2. There we considered the summarisation and description of personal income data on 23 000 households using descriptors such as the mean, median, mode, variance, the histogram and the frequency curve. These enabled us to get some idea about the distribution of incomes among these households. The discussion ended with us speculating about the possibility of finding an appropriate frequency curve, depending on a few parameters, which would enable us to describe the data and analyse them in a much more convenient way.
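The descriptive side of this exercise is mechanical and easy to reproduce. The sketch below is our own illustration, not part of the original text: the data, seed and variable names are all assumptions, with a synthetic batch of incomes standing in for the 23 000 households.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the 23 000 household incomes (illustrative only):
# classical Pareto draws with lower bound 4500 and shape 1.5.
incomes = 4500 * (1 + rng.pareto(1.5, size=23_000))

print("mean:    ", incomes.mean())
print("median:  ", np.median(incomes))
print("variance:", incomes.var(ddof=1))

# Relative-frequency histogram scaled to integrate to one, i.e. the
# empirical counterpart of a frequency curve / density function.
freqs, edges = np.histogram(incomes, bins=50, density=True)
```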
In Section 4.3 we suggested that the parametric family of density functions of the Pareto distribution,

Φ = { f(x; θ) = (θ/4500) (4500/x)^(θ+1), x ≥ 4500, θ ∈ (0, ∞) },   (11.1)

could provide a reasonable probability model for incomes over £4500. As can be seen, there is only one unknown parameter θ, which, once specified, determines f(x; θ) completely. In the context of statistical inference we postulate Φ a priori as a stochastic model, not of the data in hand, but of the distribution of income of the population from which the observed data constitute one realisation, i.e. the UK households.

Clearly, there is nothing wrong with using f(x; θ) as a frequency curve in the context of descriptive statistics: we return to the histogram of these data and, after plotting f(x; θ) for various values of θ, say θ = 1, 1.5, 2, choose the one which comes closest to the frequency polygon. For the sake of the argument let us assume that the curve chosen is the one with θ = 1.5, i.e.

f*(x) = (1.5/4500) (4500/x)^2.5 = 452 804 · x^(−2.5).   (11.2)

This provides us with a very convenient descriptor of these data, as can easily be seen when compared with the cumbersome histogram function

f̂(x) = Σ_{i=1}^m [φᵢ/(xᵢ₊₁ − xᵢ)] · 𝟙_{[xᵢ, xᵢ₊₁)}(x)   (11.3)

(see Chapter 2), where φᵢ denotes the relative frequency of the interval [xᵢ, xᵢ₊₁). But f*(x) is no more than a convenient descriptor of the data in hand. For example, we cannot make any statements about the distribution of personal income in the UK on the basis of the frequency curve f*(x). In order to do that we need to consider the problem in the context of statistical inference proper.
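A minimal sketch of the curve-fitting exercise just described follows. It is our own reconstruction, with synthetic data and hypothetical names, not code from the text: it overlays f(x; θ) for θ = 1, 1.5, 2 on the relative-frequency histogram and reports a crude measure of closeness.

```python
import numpy as np

x0 = 4500.0

def pareto_pdf(x, theta):
    # f(x; theta) = (theta/x0) * (x0/x)**(theta + 1), x >= x0 -- equation (11.1)
    return (theta / x0) * (x0 / x) ** (theta + 1)

rng = np.random.default_rng(0)
incomes = x0 * (1 + rng.pareto(1.5, size=23_000))    # synthetic incomes
freqs, edges = np.histogram(incomes, bins=50, density=True)
mids = 0.5 * (edges[:-1] + edges[1:])                # interval midpoints

for theta in (1.0, 1.5, 2.0):
    gap = np.abs(pareto_pdf(mids, theta) - freqs).max()
    print(f"theta = {theta}: max deviation from histogram = {gap:.2e}")

# With theta = 1.5 the constant of equation (11.2) is recovered:
print(1.5 * x0**1.5)   # ~452 804, so f*(x) = 452 804 * x**(-2.5)
```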
By postulating Φ above as a probability model for the distribution of income in the UK, and interpreting the observed data as a sample from the population under study, we could go on to consider questions about the unknown parameter θ as well as further observations from the probability model; see Section 11.4 below.

In Section 11.2 the important concept of a sampling model is introduced as a way to link the probability model postulated, say Φ = {f(x; θ), θ ∈ Θ}, to the available observed data x ≡ (x₁, ..., xₙ)′. The sampling model provides the second important ingredient needed to define a statistical model, the starting point of any 'parametric' statistical inference. In Section 11.3, armed with the concept of a statistical model, we go on to discuss a particular approach to statistical inference, known as the frequency approach. The frequency approach is briefly contrasted with another important approach to statistical inference, the Bayesian. A brief overview of statistical inference is given in Section 11.4 as a prelude to the discussion of the next three chapters. The most important concept in statistical inference is that of a statistic, which is discussed in Section 11.5. This concept and its distribution provide the cornerstone for estimation, testing and prediction.

11.2 The sampling model

As argued above, the probability model Φ = {f(x; θ), θ ∈ Θ} constitutes a very important component of statistical inference. Another important element in the same context is what we call a sampling model, which provides the link between the probability model and the observed data. It is designed to model the relationship between them and refers to the way the observed data can be viewed in relation to Φ. In order to be able to formulate sampling models we need to define formally the concept of a sample in statistical inference.

Definition 1
A sample is defined to be a set of random variables (r.v.'s) (X₁, X₂, ..., Xₙ) whose density functions coincide with the 'true' density function f(x; θ) as postulated by the probability model.

Note that the term sample has a very precise meaning in this context, and it is not the meaning attributed to it in everyday language. In particular, the term does not refer to any observed data, as the everyday use of the term might suggest. The significance of the concept becomes apparent when we learn that the observed data in this context are considered to be one of the many possible realisations of the sample. In this interpretation lies the inductive argument of statistical inference, which enables us to extend the results based on the observed data in hand to the underlying mechanism giving rise to them. Hence the observed data in this context are no longer just a set of numbers we want to make some sense of; they represent a particular outcome of an experiment, the experiment as defined by the sampling model postulated to complement the probability model Φ = {f(x; θ), θ ∈ Θ}. Given that a sample is a set of r.v.'s related to Φ, it must have a distribution, which we call the distribution of the sample.

Definition 2
The distribution of the sample X ≡ (X₁, ..., Xₙ)′ is defined to be the joint distribution of the r.v.'s X₁, ..., Xₙ, denoted by f(x₁, ..., xₙ; θ) ≡ f(x; θ).

The distribution of the sample incorporates both forms of relevant information, the probability as well as the sample information. It must come as no surprise to learn that f(x; θ) plays a very important role in statistical inference. The form of f(x; θ) depends crucially on the nature of the sampling model as well as on Φ. The simplest, but most widely used, form of a sampling model is the one based on the idea of a random experiment ℰ (see Chapter 3) and is called a random sample.

Definition 3
A set of random variables (X₁, X₂, ..., Xₙ) is called a random sample from f(x; θ) if the r.v.'s X₁, X₂, ..., Xₙ are independent and identically distributed (IID). In this case the distribution of the sample takes the form

f(x₁, x₂, ..., xₙ; θ) = ∏_{i=1}^n f(xᵢ; θ) = [f(x; θ)]ⁿ,

the first equality being due to independence and the second to the fact that the r.v.'s are identically distributed.

One of the important ingredients of a random experiment is that the experiment can be repeated under identical conditions. This enables us to construct a random sample by repeating the experiment n times. Such a procedure for constructing a random sample might suggest that this is feasible only when experimentation is possible. Although there is some truth in this presupposition, the concept of a random sample is also used in cases where the experiment can be repeated under identical conditions only conceptually. In order to see this, let us consider the personal income example where Φ represents a Pareto family of density functions. What is a random sample in this case? If we can ensure that every household in the UK has the same chance of being selected in one performance of a conceptual experiment, then we can interpret the n households selected as representing a random sample (X₁, X₂, ..., Xₙ) and their incomes (the observed data) as being a realisation of the sample. In general we denote the sample by X ≡ (X₁, ..., Xₙ)′ and its realisation by x ≡ (x₁, ..., xₙ)′, where x is assumed to take values in the observation space 𝒳, i.e. x ∈ 𝒳; usually 𝒳 = ℝⁿ.

A less restrictive form of a sampling model is what we call an independent sample, where the identically-distributed condition of the random sample is relaxed.

Definition 4
A set of r.v.'s (X₁, ..., Xₙ) is said to be an independent sample from f(xᵢ; θᵢ), i = 1, 2, ..., n, respectively, if the r.v.'s X₁, ..., Xₙ are independent. In this case the distribution of the sample takes the form

f(x₁, x₂, ..., xₙ; θ) = ∏_{i=1}^n f(xᵢ; θᵢ).   (11.4)
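To make these factorisations concrete, here is a small sketch of our own, with assumed parameter values: for an independent sample the joint log-density is the sum of the individual log-densities, as in (11.4), and the random-sample case is recovered when all the θᵢ coincide.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mus = np.array([0.0, 0.5, 1.0, 1.5])       # theta_i differing across i
x = rng.normal(loc=mus, scale=1.0)         # one realisation of the sample

# Independent sample, equation (11.4):
# log f(x1,...,xn; theta) = sum_i log f(x_i; theta_i).
log_joint_indep = norm.logpdf(x, loc=mus, scale=1.0).sum()

# Random (IID) sample: theta_i = theta for all i, giving [f(x; theta)]^n.
log_joint_iid = norm.logpdf(x, loc=0.0, scale=1.0).sum()

print(log_joint_indep, log_joint_iid)
```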
Usually the density functions f(xᵢ; θᵢ), i = 1, 2, ..., n, belong to the same family, but their numerical characteristics (moments, etc.) may differ. If we relax the independence assumption as well, we have what we can call a non-random sample.

Definition 5
A set of r.v.'s (X₁, ..., Xₙ) is said to be a non-random sample from f(x₁, ..., xₙ; θ) if the r.v.'s X₁, ..., Xₙ are non-IID. In this case the only decomposition of the distribution of the sample possible is

f(x₁, x₂, ..., xₙ; θ) = ∏_{i=1}^n f(xᵢ | x₁, ..., xᵢ₋₁; θᵢ), given x₀,   (11.5)

where f(xᵢ | x₁, ..., xᵢ₋₁; θᵢ), i = 1, 2, ..., n, represent the conditional distributions of Xᵢ given X₁, X₂, ..., Xᵢ₋₁.

A non-random sample is clearly the most general of the sampling models considered above and includes the independent and random samples as special cases, given that

f(xᵢ | x₁, ..., xᵢ₋₁; θᵢ) = f(xᵢ; θᵢ), i = 1, 2, ..., n,   (11.6)

when X₁, ..., Xₙ are independent r.v.'s. Its generality, however, renders the concept non-operational unless certain restrictions are imposed on the heterogeneity and dependence among the Xᵢ's. Such restrictions have been extensively discussed in Sections 8.2–8.3. In Part IV the restrictions often used are stationarity and asymptotic independence.

In the context of statistical inference we need to postulate both a probability and a sampling model, and thus we define a statistical model as comprising both.

Definition 6
A statistical model is defined as comprising:
(i) a probability model Φ = {f(x; θ), θ ∈ Θ}; and
(ii) a sampling model X ≡ (X₁, X₂, ..., Xₙ)′.

The concept of a statistical model provides the starting point of all forms of statistical inference to be considered in the sequel. To be more precise, the concept of a statistical model forms the basis of what is known as parametric inference. There is also a branch of statistical inference, known as non-parametric inference, where no Φ is assumed a priori (see Gibbons (1971)); non-parametric statistical inference is beyond the scope of this book.

It must be emphasised at the outset that the two important components of a statistical model, the probability and sampling models, are clearly interrelated. For example, we cannot postulate the probability model Φ = {f(x; θ), θ ∈ Θ} if the sample X is non-random. This is because, if the r.v.'s X₁, ..., Xₙ are not independent, the probability model must be defined in terms of their joint distribution, i.e. Φ = {f(x₁, ..., xₙ; θ), θ ∈ Θ}. Moreover, in the case of an independent but not identically distributed sample we need to specify the individual density functions for each r.v. in the sample, i.e. Φ = {f(xₖ; θₖ), θₖ ∈ Θ, k = 1, 2, ..., n}. The most important implication of this interrelationship is that when the sampling model postulated is found to be inappropriate, the probability model has to be respecified as well. Several examples of this are encountered in Chapters 21 to 23.
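The sequential decomposition (11.5) can also be illustrated numerically. The sketch below is our own construction, with assumed parameter values: it uses a Gaussian AR(1) process, a convenient non-random sample in which the conditioning simplifies to f(xᵢ | x₁, ..., xᵢ₋₁) = f(xᵢ | xᵢ₋₁).

```python
import numpy as np
from scipy.stats import norm

a, n = 0.7, 6
rng = np.random.default_rng(1)

# Simulate x_i = a * x_{i-1} + e_i with e_i ~ N(0, 1), started from the
# stationary distribution N(0, 1/(1 - a^2)).
x = np.empty(n)
x[0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - a**2)))
for i in range(1, n):
    x[i] = a * x[i - 1] + rng.normal()

# Sequential factorisation (11.5): log f(x_1) + sum_i log f(x_i | x_{i-1}).
log_joint = norm.logpdf(x[0], scale=np.sqrt(1.0 / (1.0 - a**2)))
for i in range(1, n):
    log_joint += norm.logpdf(x[i], loc=a * x[i - 1], scale=1.0)

print(log_joint)
```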
11.3 The frequency approach

In developing the concept of a probability model in Part II it was argued that no interpretation of probability was needed. The whole structure was built upon the axiomatic approach, which defined probability as a set function P(·): ℱ → [0, 1] satisfying various axioms and devoid of any interpretation (see Section 3.2). In statistical inference, however, the interpretation of the notion of probability is indispensable. The discerning reader will have noted that in the above introductory discussion we have already adopted a particular attitude towards the meaning of probability. In interpreting the observed data as one of many possible realisations of the DGP as represented by the probability model, we have committed ourselves to the frequency interpretation of probability. This is because we implicitly assumed that if we were to repeat the experiment under identical conditions indefinitely (i.e. with the number of observations going to infinity) we would be able to reconstruct the probability model Φ. In the case of the income example discussed above, this amounts to assuming that if we were to observe everybody's income and plot the relative frequency curve for incomes over £4500 we would get a Pareto density function. This suggests that the frequency approach to statistical inference can be viewed as a natural extension of the descriptive study of data, with the introduction of the concept of a probability model.

In practice we never have an infinity of observations with which to recover the probability model completely, and hence caution should be exercised in interpreting the results of the frequency-approach-based statistical methods which we consider in the sequel. These results depend crucially on the probability model, which we interpret as referring to a situation where we keep on repeating the experiment to infinity. This suggests that the results should be interpreted as holding under the same circumstances, i.e. 'in the long run' or 'on average'. Adopting such an interpretation implies that we should propose statistical procedures which give rise to 'optimum results' according to criteria related to this 'long-run' interpretation. Hence, it is important to keep this in mind when reading the following chapters on criteria for optimal estimators, tests and predictors.

(Fig. 11.1, 'The frequency approach to statistical inference': the probability model Φ = {f(x; θ), θ ∈ Θ} and the sampling model X ≡ (X₁, X₂, ..., Xₙ)′ together determine the distribution of the sample f(x₁, x₂, ..., xₙ; θ), which links them to the observed data x ≡ (x₁, x₂, ..., xₙ)′.)

The various approaches to statistical inference based on alternative interpretations of the notion of probability differ mainly in relation to what constitutes relevant information for statistical inference and how it should be processed. In the case of the frequency approach (sometimes called the classical approach) the relevant information comes in the form of a probability model Φ = {f(x; θ), θ ∈ Θ} and a sampling model X ≡ (X₁, X₂, ..., Xₙ)′, the latter providing the link between Φ and the observed data x ≡ (x₁, x₂, ..., xₙ)′. The observed data are in effect interpreted as a realisation of the sampling model, i.e. X = x. This relevant information is then processed via the distribution of the sample f(x₁, x₂, ..., xₙ; θ) (see Fig. 11.1).

The 'subjective' interpretation of probability, on the other hand, leads to a different approach to statistical inference. This is commonly known as the Bayesian approach, because the discussion is based on revising prior beliefs about the unknown parameters θ in the light of the observed data using Bayes' formula. The prior information about θ comes in the form of a probability distribution f(θ); that is, θ is assumed to be a random variable. The revision of the prior f(θ) comes in the form of the posterior distribution f(θ | x) via Bayes' formula:

f(θ | x) = f(x | θ) f(θ) / ∫ f(x | θ) f(θ) dθ,   (11.7)

f(x | θ) being the distribution of the sample and the denominator f(x) being constant for X = x. For more details and an excellent discussion of the frequency and Bayesian approaches to statistical inference see Barnett (1973). In what follows we concentrate exclusively on the frequency approach.
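A minimal sketch of Bayes' formula (11.7) in action follows, applied to the Pareto parameter θ on a grid. This is our own illustration: the flat prior, synthetic data and all names are assumptions, not material from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
x0 = 4500.0
data = x0 * (1 + rng.pareto(1.5, size=50))       # synthetic incomes

thetas = np.linspace(0.5, 3.0, 500)              # grid over the parameter space
prior = np.ones_like(thetas)                     # flat prior f(theta)

# Pareto log-likelihood: sum_i log[(theta/x0) * (x0/x_i)**(theta + 1)].
loglik = (np.log(thetas / x0)[:, None]
          + (thetas[:, None] + 1) * np.log(x0 / data[None, :])).sum(axis=1)

post = np.exp(loglik - loglik.max()) * prior     # f(x|theta) f(theta), rescaled
dtheta = thetas[1] - thetas[0]
post /= post.sum() * dtheta                      # normalise: f(theta|x), eq (11.7)

print("posterior mode:", thetas[np.argmax(post)])
```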
11.4 An overview of statistical inference

As defined above, the simplest form of a statistical model comprises:
(i) a probability model Φ = {f(x; θ), θ ∈ Θ}; and
(ii) a sampling model X ≡ (X₁, X₂, ..., Xₙ)′, a random sample.

Using this simple statistical model, let us attempt a brief overview of statistical inference before we consider the various topics individually, so as to keep the discussion which follows in perspective. The statistical model, in conjunction with the observed data, enables us to consider the following questions:

(1) Are the observed data consistent with the postulated statistical model? (misspecification)
(2) Assuming that the statistical model postulated is consistent with the observed data, what can we infer about the unknown parameters θ ∈ Θ?
  (a) Can we decrease the uncertainty about θ by reducing the parameter space from Θ to Θ₀, where Θ₀ is a subset of Θ? (confidence estimation)
  (b) Can we decrease the uncertainty about θ by choosing a particular value in Θ, say θ̂, as providing the most representative value of θ? (point estimation)
  (c) Can we consider the question of whether θ belongs to some subset Θ₀ of Θ? (hypothesis testing)
(3) Assuming that a particular representative value θ̂ of θ has been chosen, what can we infer about further observations from the DGP as described by the postulated statistical model? (prediction)

The above questions describe the main areas of statistical inference. Comparing these questions with the ones we could ask in the context of descriptive statistics, we can easily appreciate the role of probability theory in statistical inference. The second question posed above (the first question is considered in the appendix below) assumes that the statistical model postulated is 'valid' and considers various forms of inference relating to the unknown parameters θ.

Point estimation (or just estimation) refers to our attempt to give a numerical value to θ. This entails constructing a mapping h(·): 𝒳 → Θ (see Fig. 11.2, 'Point estimation'). We call the function h(X) an estimator of θ and its value h(x) an estimate of θ. Chapters 12 and 13 on point estimation deal with the issues of defining and constructing 'optimal' estimators, respectively.

Confidence estimation refers to the construction of a numerical region for θ in the form of a subset Θ₀ of Θ (see Fig. 11.3, 'Interval estimation'). Again, confidence estimation comes in the form of a multivalued (one-to-many) function g(·): 𝒳 → Θ.

Hypothesis testing, on the other hand, relates to deciding whether an a priori statement about θ, of the form H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁, can be considered 'valid' in the light of the observed data.

11.5 Statistics and their distributions

The most important concept in statistical inference is that of a statistic, i.e. a (Borel) function g(X) of the sample which does not involve the unknown parameters θ. Being a function of the r.v.'s X₁, ..., Xₙ, a statistic is itself a random variable, and its distribution is derived from the distribution of the sample. Leading examples are the sample mean X̄ₙ = (1/n) Σ_{i=1}^n Xᵢ, the sample variance, and, more generally, the sample raw moments mᵣ = (1/n) Σ_{i=1}^n Xᵢʳ, r ≥ 1. For a random sample it can be shown that

√n (mᵣ − μ′ᵣ)/√(μ′₂ᵣ − (μ′ᵣ)²) →(D) N(0, 1), r ≥ 1,   (11.25)

assuming that μ′₂ᵣ < ∞, where μ′ᵣ ≡ E(Xʳ) denotes the r-th raw moment. It turns out that in practice the statistics g(X) of interest are often functions of these sample moments. Examples of such continuous functions of the sample raw moments are the sample central moments, defined by

m̂ᵣ = (1/n) Σ_{i=1}^n (Xᵢ − X̄ₙ)ʳ, r > 1.   (11.26)

These provide us with a direct extension of the sample variance, and they represent the sample equivalents of the central moments

μᵣ = ∫ (x − E(X))ʳ f(x; θ) dx.   (11.27)

With the help of asymptotic theory we can generalise the above asymptotic results for mᵣ, r ≥ 1, to those of Yₙ = q(X), where q(·) is a Borel function. For example, we can show that under the same conditions

(i) m̂ᵣ →(a.s.) μᵣ;
(ii) m̂ᵣ →(P) μᵣ;
(iii) √n (m̂ᵣ − μᵣ)/σᵣ →(D) N(0, 1),   (11.28)

where

σᵣ² = μ₂ᵣ₊₂ − μ²ᵣ₊₁ − 2(r + 1) μᵣ μᵣ₊₂ + (r + 1)² μ₂ μᵣ²,   (11.29)

assuming that μ₂ᵣ₊₂ < ∞; see exercise 1. Asymptotic results related to Yₙ = q(X) can be used when the distribution of Yₙ is not available (or is very difficult to use).
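As a quick sanity check on a result of this type, the following Monte Carlo sketch (our own, with an assumed Exp(1) model for which the raw moments are μ′ₖ = k!) standardises the second sample raw moment as in (11.25) and verifies that it is approximately N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(3)
r, n, reps = 2, 500, 2000
mu_r, mu_2r = 2.0, 24.0          # E[X^2] = 2!, E[X^4] = 4! for Exp(1)

z = np.empty(reps)
for j in range(reps):
    x = rng.exponential(1.0, size=n)
    m_r = np.mean(x ** r)        # sample raw moment m_r
    z[j] = np.sqrt(n) * (m_r - mu_r) / np.sqrt(mu_2r - mu_r ** 2)

# Should be close to 0 and 1 respectively if (11.25) is at work.
print(z.mean(), z.var())
```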
Although there are many ways to obtain asymptotic results in particular cases, it is often natural to proceed by following the pattern suggested by the limit theorems in Chapter 9:

Step 1. Under certain conditions Yₙ = q(X) can be shown to converge in probability (or almost surely) to some function h(θ) of θ, i.e.

Yₙ →(P) h(θ) or Yₙ →(a.s.) h(θ).   (11.30)

Step 2. Construct two sequences {hₙ(θ), cₙ(θ), n ≥ 1} such that

Yₙ* = (Yₙ − hₙ(θ))/cₙ(θ) →(D) Z ~ N(0, 1).   (11.31)

Let F∞(y*) denote the asymptotic distribution of Yₙ*; then for large n

Fₙ(y*) ≈ F∞(y*),   (11.32)

and F∞(y*) can be used as the basis of any inference relating to Yₙ = q(X).

A question which naturally comes to mind is how large n should be to justify the use of these results. Commonly no answer is available, because an answer would involve the derivation of Fₙ(·), whose unavailability was the very reason we resorted to asymptotic theory in the first place. In certain cases higher-order approximations based on asymptotic expansions can throw some light on this question (see Chapter 10). In general, caution should be exercised when asymptotic results are used for relatively small values of n, say n < 100.

Appendix 11.1 The empirical distribution function

The first question posed in Section 11.4 relates to the validity of the probability and sampling models postulated. One way to consider the validity of the probability model postulated is via the empirical distribution function F*ₙ(x), defined by

F*ₙ(x) = (1/n) · [number of xᵢ's ≤ x].

Comparing F*ₙ(x) with the cumulative distribution function F(x; θ) implied by the postulated probability model, via the Kolmogorov statistic Dₙ = sup_x |F*ₙ(x) − F(x; θ)|, it can be shown that

lim_{n→∞} P(√n Dₙ ≤ y) = 1 − 2 Σ_{k=1}^∞ (−1)^(k−1) exp(−2k²y²), y ∈ ℝ₊.

This asymptotic distribution of √n Dₙ can be used to test the validity of Φ; see Section 21.2.
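A compact sketch of this check follows. It is our own implementation, with an assumed exponential null and synthetic data: it computes F*ₙ, the statistic Dₙ, and an approximate p-value from the asymptotic distribution above.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.exponential(1.0, size=200))     # synthetic data, sorted

F0 = 1.0 - np.exp(-x)                           # postulated CDF F(x; theta)
i = np.arange(1, x.size + 1)
# D_n = sup_x |F*_n(x) - F(x)|, checked just after and just before each x_i.
D_n = max(np.max(i / x.size - F0), np.max(F0 - (i - 1) / x.size))

y = np.sqrt(x.size) * D_n
# Tail probability P(sqrt(n) D_n > y) from the series given in the appendix.
p_value = 2.0 * sum((-1) ** (k - 1) * np.exp(-2.0 * k**2 * y**2)
                    for k in range(1, 101))
print(D_n, p_value)
```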
Important concepts

Sample, the distribution of the sample, sampling model, random sample, independent sample, non-random sample, observation space, statistical model, empirical distribution function, point estimation, confidence estimation, hypothesis testing, a statistic, sample mean, sample variance, sample raw moments, sample central moments, the distribution of a statistic, the asymptotic distribution of a statistic.

Questions

1. Discuss the difference between descriptive statistics and statistical inference.
2. Contrast f(x; θ) as a descriptor of observed data with f(x; θ) as a member of a parametric family of density functions.
3. Explain the concept of a sampling model and discuss its relationship to the probability model and the observed data.
4. Compare the sampling models: (i) random sample; (ii) independent sample; (iii) non-random sample; and explain the form of the distribution of the sample in each case.
5. Explain the concept of the empirical distribution function.
6. 'Estimation and hypothesis testing is largely a matter of constructing mappings of the form g(·): 𝒳 → Θ.' Discuss.
7. Explain why a statistic is a random variable.
8. Ensure that you understand the results (11.15)–(11.21) (see Appendix 6.1).
9. 'Being able to derive the distribution of statistics of interest is largely what statistical inference is all about.' Discuss.
10. Discuss the concept of a statistical model.

Exercises

1.* Using the results (22)–(29), show that for a random sample X from a distribution whose first four moments exist, …

Additional references

Barnett (1973); Bickel and Doksum (1977); Cramér (1946); Dudewicz (1976).

Ngày đăng: 17/12/2013, 15:19

Tài liệu cùng người dùng

  • Đang cập nhật ...

Tài liệu liên quan