Applied Economics Materials: Lecture 1: Normal Distribution (pptx)


Applied Econometrics
Lecture 1: Normal Distribution
Written by Nguyen Hoang Bao, May 17, 2004

For many random variables, the probability distribution is a specific bell-shaped curve, called the normal curve, or Gaussian curve. This is the most common and useful distribution in statistics.

1) Standard normal distribution

The standard normal distribution has the following probability density function:

$$Y = P(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^2}$$

Features of the curve are:

1) As z moves away from 0 in either direction, $z^2$ increases in the negative exponent. Therefore, P(z) decreases, approaching 0 symmetrically in both tails.
2) The mean, which is zero (μ = 0), is the balancing point or the center of symmetry.
3) The standard deviation is one (σ = 1).

Example 1.1: If z has a standard normal distribution, find P(-2 < z < 2).

Solution: P(-2 < z < 2) = 1 - P(z < -2) - P(z > 2) = 1 - 2(0.023) = 0.954, where 0.023 is the table value of P(z > 2).

Note: If z is continuous, P(z ≥ c) = P(z > c). In other words, ≥ and > can be used interchangeably for any continuous random variable.

2) General normal distribution

The general normal distribution has the following probability density function:

$$Y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}$$

The quantity Y, which is the height of the curve at any point along the scale of X, is known as the probability density of that particular value of the variable X.

Example 2.1: The local authorities in a certain city install 2,000 electric lamps in the streets of the city. If these lamps have an average life of 1,000 burning hours, with a standard deviation of 200 hours, what number of the lamps might be expected to fail in the first 700 burning hours?

Solution: In this case, we want the area under the probability curve below the standardized value t = (700 - 1,000)/200 = -1.5. By symmetry, the area below -1.5 equals the area above 1.5, so we ignore the sign and enter the table at 1.5 to find that the probability of a life shorter than 700 hours is P = 0.067. Hence the expected number of failures is 2,000 x 0.067 = 134.
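The table look-ups in Examples 1.1 and 2.1 can be cross-checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library class `statistics.NormalDist`; the variable names are illustrative and not part of the original notes. The results differ slightly from the worked solutions because the printed table is rounded to three decimals.

```python
from statistics import NormalDist

# Example 1.1: P(-2 < z < 2) for a standard normal variable.
std_normal = NormalDist(mu=0.0, sigma=1.0)
p_within_2 = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

# Example 2.1: lamp life with mean 1,000 hours and standard deviation 200 hours.
lamp_life = NormalDist(mu=1000.0, sigma=200.0)
n_lamps = 2000

# Standardized value t for 700 hours, as used to enter the table.
t_700 = (700.0 - lamp_life.mean) / lamp_life.stdev

# Probability of failing before 700 hours and the expected number of failures.
p_fail_before_700 = lamp_life.cdf(700.0)
expected_failures_700 = n_lamps * p_fail_before_700

print(f"P(-2 < z < 2)              = {p_within_2:.4f}")
print(f"t for 700 hours            = {t_700:.2f}")
print(f"P(life < 700 hours)        = {p_fail_before_700:.4f}")
print(f"Expected failures by 700 h = {expected_failures_700:.1f}")
```

Rounded to whole lamps, the computed expectation agrees with the 134 failures obtained from the table.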
Example 2.2: What number of lamps may be expected to fail between 900 and 1,300 burning hours?

Solution:

• The number of lamps that will fail under 900 hours: the corresponding value of t = (900 - 1,000)/200 = -0.5. Entering the table with this value of t, we find for the probability of failure below 900 hours: P = 0.309.
• The number of lamps that will fail over 1,300 hours: the corresponding value of t = (1,300 - 1,000)/200 = 1.5. Entering the table with this value of t, we find for the probability of failure over 1,300 hours: P = 0.067.
• Hence the probability of failure outside the limits 900 to 1,300 hours is 0.309 + 0.067 = 0.376. It follows that the number of lamps we may expect to fail outside these limits is 2,000 x 0.376 = 752. But we were asked to find the number likely to fail inside the stated limits. This is 2,000 - 752 = 1,248.

Example 2.3: After what period of burning hours would you expect that 10% of the lamps would have failed?

Solution: What we want here is the value of t corresponding to a probability P = 0.1. Looking along our table, we find that when t = 1.25 the probability is P = 0.106. This is near enough for our purpose of prediction. Hence we may take it that 10% of the lamps will have failed at 1.25 standard deviations below the mean life. Since one standard deviation is equal to 200 hours, it follows that 10% of the lamps will fail before 1,000 - 1.25 x 200 = 1,000 - 250 = 750 hours.
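Examples 2.2 and 2.3 can be checked the same way. This is a sketch under the same assumptions as the examples (mean life 1,000 hours, standard deviation 200 hours, 2,000 lamps), again using `statistics.NormalDist` from the Python standard library with illustrative variable names. Because it uses exact normal probabilities rather than a rounded table, it gives about 1,249 lamps instead of 1,248, and a 10% failure point near 744 hours (about 1.28 standard deviations below the mean) instead of the rounded 750 hours.

```python
from statistics import NormalDist

lamp_life = NormalDist(mu=1000.0, sigma=200.0)
n_lamps = 2000

# Example 2.2: probability of failing between 900 and 1,300 hours,
# computed directly as the area between the two limits.
p_between = lamp_life.cdf(1300.0) - lamp_life.cdf(900.0)
expected_between = n_lamps * p_between

# Example 2.3: burning time by which 10% of the lamps have failed,
# i.e. the 10th percentile of the lamp-life distribution.
hours_10pct = lamp_life.inv_cdf(0.10)
t_10pct = (hours_10pct - lamp_life.mean) / lamp_life.stdev

print(f"P(900 < life < 1300)      = {p_between:.4f}")
print(f"Expected failures between = {expected_between:.1f}")
print(f"10% failure point         = {hours_10pct:.1f} hours (t = {t_10pct:.3f})")
```

Computing the area between the limits directly avoids the detour through the two tail areas, but both routes describe the same probability.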
3) Moment-based characteristics of a distribution

First moment

• Mean > Median: the distribution is skewed to the right
• Mean ≅ Median ≅ Mode: the distribution is approximately symmetric
• Mean < Median: the distribution is skewed to the left

Second moment

The spread of a distribution is measured by its standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$

Third moment

Coefficient of skewness:

$$a_3 = \frac{1}{n s^3} \sum_{i=1}^{n} (X_i - \bar{X})^3$$

• The cubic power preserves the sign of each deviation but inflates the larger deviations proportionally much more than the smaller ones. If the distribution is symmetrical, the negative and positive cubed deviations cancel each other out.
• The cubic power of the standard deviation in the denominator standardizes the measure and so removes the dimension (i.e., it does not depend on the units in which the variable is measured).
• If $a_3 > 0$, the distribution is skewed to the right (its long tail is to the right) and the mean is greater than the median.
• If $a_3 \cong 0$, the distribution is approximately symmetric, as a normal distribution is, and the mean is approximately equal to the median.
• If $a_3 < 0$, the distribution is skewed to the left (its long tail is to the left) and the mean is smaller than the median.

Fourth moment

Coefficient of kurtosis:

$$a_4 = \frac{1}{n s^4} \sum_{i=1}^{n} (X_i - \bar{X})^4$$

• Fourth powers make every deviation positive and inflate larger deviations even more than cubic powers or squares would do.
• The presence of heavy tails will therefore tend to inflate the numerator proportionally more than the denominator. The fatter the tails, the higher the kurtosis.
• The fourth power of the standard deviation in the denominator standardizes the measure and renders it dimensionless.
• If $a_4 > 3$, the distribution has heavier tails than a normal distribution.
• If $a_4 < 3$, the distribution has thinner tails than a normal distribution (for example, a rectangular distribution, which has a body but no tails).

Table 3.1: Moment-based characteristics of a distribution

| Moment | Measure | Population | Sample | X ∼ N(0,1) |
| First moment | Center | $E(X) = \mu$ | $\bar{X} = \frac{1}{n} \sum X_i$ | 0 |
| Second moment | Spread | $E(X-\mu)^2 = \sigma^2$ | $s^2 = \frac{1}{n-1} \sum (X_i - \bar{X})^2$ | 1 |
| Third moment | Skewness | $\frac{1}{\sigma^3} E(X-\mu)^3$ | $a_3 = \frac{1}{n s^3} \sum (X_i - \bar{X})^3$ | 0 |
| Fourth moment | Kurtosis | $\frac{1}{\sigma^4} E(X-\mu)^4$ | $a_4 = \frac{1}{n s^4} \sum (X_i - \bar{X})^4$ | 3 |

4) The skewness-kurtosis (Jarque-Bera) test for normality

The null hypothesis of normality is stated in terms of the population skewness $\alpha_3$ and the population kurtosis $\alpha_4$:

$H_0$: $\alpha_3 = 0$ and $\alpha_4 = 3$

against

$H_1$: $\alpha_3 \neq 0$ or $\alpha_4 \neq 3$, or both.

The relevant test statistic is BJ, which under $H_0$ follows (in large samples) a chi-square distribution with two degrees of freedom:

$$BJ = a_3^2 \left(\frac{n}{6}\right) + (a_4 - 3)^2 \left(\frac{n}{24}\right)$$

If BJ > 5.99 (the 5% critical value of the chi-square distribution with two degrees of freedom), normality is formally rejected. If BJ ≤ 5.99, normality cannot be rejected; this does not prove that the data are normal.

5) Transformations towards normality

If the data are unimodal but skewed, a data transformation is called for to correct the skewness. To do this we rely on the ladder of power transformations, which enables us to correct for differences in the direction of skewness (positive or negative) and in its strength. Often, but not always, a transformation renders the data symmetric and, hopefully, also more normal in shape. If so, the classical model of inference about the population mean, using the sample mean as estimator, can again be used. Table 5.1 illustrates the hierarchy of these power transformations and their impact on the skewness in the data.
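The moment coefficients of section 3, the skewness-kurtosis test of section 4 and the idea of transforming towards normality can be tied together in a short numerical sketch. The function names and the sample are assumptions made purely for illustration: the sample is synthetic (exponentials of evenly spaced standard normal quantiles, so it is positively skewed by construction), not data from the notes. The coefficients follow the formulas given above, with the sample standard deviation computed using n - 1.

```python
from math import exp, log, sqrt
from statistics import NormalDist

CHI2_2DF_5PCT = 5.99  # 5% critical value of chi-square(2), as quoted in the notes


def moment_coefficients(xs):
    """Return (a3, a4) with a_k = sum((x - mean)**k) / (n * s**k),
    where s is the sample standard deviation (n - 1 in the denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    a3 = sum((x - mean) ** 3 for x in xs) / (n * s ** 3)
    a4 = sum((x - mean) ** 4 for x in xs) / (n * s ** 4)
    return a3, a4


def skewness_kurtosis_statistic(xs):
    """Test statistic a3^2 * (n/6) + (a4 - 3)^2 * (n/24)."""
    n = len(xs)
    a3, a4 = moment_coefficients(xs)
    return a3 ** 2 * (n / 6) + (a4 - 3) ** 2 * (n / 24)


# Synthetic sample, assumed for illustration only: a long right tail.
n = 200
raw = [exp(NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]
logged = [log(x) for x in raw]  # log transformation (power p = 0 on the ladder)

results = {}
for label, sample in (("X", raw), ("ln X", logged)):
    a3, a4 = moment_coefficients(sample)
    stat = skewness_kurtosis_statistic(sample)
    results[label] = (a3, a4, stat)
    verdict = "reject normality" if stat > CHI2_2DF_5PCT else "cannot reject normality"
    print(f"{label:>4}: a3 = {a3:7.3f}, a4 = {a4:7.3f}, statistic = {stat:9.2f} -> {verdict}")
```

Because the sample is evenly spaced rather than randomly drawn, the small statistic for ln X shows only how the calculation behaves on bell-shaped data; with real data a transformation may or may not bring the statistic below the critical value.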
Table 5.1: Ladder of Power to Reduce Skewness

| Power p | Transformation | Effect on skewness |
| 3 | $X^3$ | Reduces extreme negative skewness |
| 2 | $X^2$ | Reduces negative skewness |
| 1 | $X$ | Leaves data unchanged |
| 0 | $\ln X$ | Reduces positive skewness |
| -1 | $1/X$ | Reduces extreme positive skewness |

The power used in a transformation need not be an integer; fractional powers can be used as well. The choice of an appropriate transformation often involves a trade-off between one that is ideal for the purposes of data analysis and one that performs reasonably well on this count but also lends itself to a more straightforward interpretation (in substantive terms) of the results.

Workshop 1: Normal Distribution

1) Phil and Kim Bell do not know whether to buy a house now or wait a year, in which case a price increase may put a house beyond their reach. Their best guess is that, if they wait a year, the price increase will be approximately normal, with a mean of 8% and, reflecting the uncertainty of the market, a standard deviation of 10%.

1.1) If the price increase exceeds 25%, they feel they will be unable to afford a house. What is the chance of this?
1.2) On the other hand, if the price drops, they will have won their gamble handsomely. What is the chance of this?
2) Using the data file SOCECON (with the world socioeconomic data for 1990) on the diskette, make histograms and compute means, medians and modes for the following variables:

• GNP (gross national product) per capita
• HDI (human development index)
• FERT (fertility rate)
• LEXPM and LEXPF (male and female life expectancy)
• POPGRWTH (population growth rate)

In each case, discuss the different averages in the light of the shape of the empirical distribution. Would you say that any of the distributions is reasonably symmetrical and bell-shaped?

3) Collect the macroeconomic indicators Y (GDP), I (investment), C (consumption), X (exports) and M (imports) at fixed prices from the World Development Indicators 2003 for 200 countries in the world.

3.1) Make histograms and compute means, medians and modes for the above variables
3.2) Calculate the coefficients of skewness and kurtosis
3.3) Use the Jarque-Bera test for normality of each variable
3.4) Transform each variable towards normality

4) Collect data on life expectancy (LE) and GDP per capita (Y) for 200 countries (WDI 2003).

4.1) Plot the histogram (frequency graph) for each of your two samples (life expectancy and income per capita)
4.2) Calculate the mean, mode and median for each of your two samples
4.3) Calculate the skewness and kurtosis for each of your two samples
4.4) Use the Jarque-Bera test for normality for each of your two samples
4.5) In each case find the most appropriate transformation so that the data are approximately normal
4.6) Calculate the regression coefficients from regressing LE on Y using the following functional forms, and compare their coefficients of determination:

$$LE = a_0 + a_1 Y$$
$$\ln(LE) = b_0 + b_1 Y$$
$$LE = c_0 + c_1 \ln Y$$
$$\ln(LE) = d_0 + d_1 \ln Y$$

4.7) Which of the models you have estimated best fits the data? Discuss your results
4.8) Can a direction of causality be established?

Posted: 27/01/2014, 11:20
