Statistical Tools for Environmental Quality Measurement - Chapter 2

©2004 CRC Press LLC

CHAPTER 2

Basic Tools and Concepts

Description of Data

The goal of statistics is to gain information from data. The first step is to display the data in a graph so that our eyes can take in the overall pattern and spot unusual observations. Next, we often summarize specific aspects of the data, such as the average of a value, by numerical measures. As we study graphs and numerical summaries, we keep firmly in mind where the data come from and what we hope to learn from them. Graphs and numbers are not ends in themselves, but aids to understanding. (Moore and McCabe, 1993)

Every study begins with a sample, a set of measurements that is "representative," in some sense, of some population of possible measurements. For example, if we are concerned with PCB contamination of surfaces in a building where a transformer fire has occurred, our sample might be a set of 20 surface wipe samples chosen to represent the population of possible surface contamination measurements. Similarly, if we are interested in the level of pesticide present in individual apples, our sample might be a set of 50 apples chosen to be representative of all apples (or perhaps all apples treated with pesticide). Our focus here is the set of statistical tools one can use to describe a sample, and the use of these sample statistics to infer the characteristics of the underlying population of measurements.

Central Tendency or Location

The Arithmetic Mean

Perhaps the first question one asks about a sample is what a typical value for the sample is. Usually this is answered by calculating a value that is in the middle of the sample measurements. Here we have a number of choices. We can calculate the arithmetic mean, x̄, whose value is given by:

[2.1]  x̄ = (Σ xᵢ) / N

where the xᵢ's are the individual sample measurements and N is the sample size.
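As a quick illustration, Equation [2.1] can be computed directly. The sketch below uses made-up wipe-sample values, not data from the text:

```python
# Hypothetical surface-wipe measurements in ppm (illustrative values only).
samples = [4.2, 3.8, 5.1, 4.9, 4.5]

# Equation [2.1]: the arithmetic mean is the sum of the x_i divided by N.
n = len(samples)
mean = sum(samples) / n

print(round(mean, 2))  # 4.5
```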
The Geometric Mean

Alternatively, we can calculate the geometric mean, GM(x), given by:

[2.2]  GM(x) = exp[(Σ ln(xᵢ)) / N]

That is, GM(x) is the antilogarithm of the mean of the logarithms of the data values. Note that for the GM to be defined, all x's must be greater than zero. If we calculate ln(GM(x)), this is called the logarithmic mean, LM(x), and is simply the arithmetic mean of the log-transformed x's.

The Median

The median, M, is another estimator of central tendency. It is given by the 50th percentile of the data. If we have a sample of size N, sorted from smallest to largest (e.g., x₁ is the smallest observation and x_N is the largest) and N is odd, the median is given by x_j. Here j is given as:

[2.3]  j = ((N - 1)/2) + 1

That is, if we have 11 observations the median is equal to the 6th largest, and if we have 7 observations, the median is equal to the 4th largest. When N is an even number, the median is given as:

[2.4]  M = (x_j + x_k) / 2

In Equation [2.4], j and k are equal to N/2 and (N/2) + 1, respectively. For example, if we had 12 observations, the median would equal the average of the 6th and 7th largest observations. If we had 22 observations, the median would equal the average of the 11th and 12th largest values.

Discussion

While there are other values, such as the mode of the data (the most frequent value) or the harmonic mean (the reciprocal of the mean of the 1/x values), the arithmetic mean, the geometric mean, and the median are the three measures of central tendency routinely used in environmental quality investigations. The logarithmic mean is not of interest as a measure of central tendency because it is in transformed units (ln(concentration)), but it does arise in considerations of hypothesis tests.
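The geometric mean of Equations [2.2] and the median rules of [2.3]-[2.4] can be sketched as follows; the data values are illustrative only:

```python
import math

data = [1.0, 10.0, 100.0]  # illustrative concentrations (ppm)

# Equation [2.2]: GM(x) = exp(mean of ln(x_i)); requires all x_i > 0.
lm = sum(math.log(x) for x in data) / len(data)   # logarithmic mean, LM(x)
gm = math.exp(lm)                                  # geometric mean, GM(x)

def median(xs):
    # Equations [2.3]-[2.4]: middle value for odd N, average of the
    # two middle values (ranks N/2 and N/2 + 1) for even N.
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n - 1) // 2]               # j = ((N - 1)/2) + 1, 1-based
    return (s[n // 2 - 1] + s[n // 2]) / 2

print(round(gm, 4))          # 10.0
print(median(data))          # 10.0
print(median([1, 2, 3, 4]))  # 2.5
```

Note how strongly the geometric mean is pulled toward the low values relative to the arithmetic mean (37.0 for these data), which is why it is popular for right-skewed concentration data.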
Note also that all of these measures of sample central tendency are expected to represent the corresponding quantities in the population (often termed the "parent" population) from which the sample was drawn. That is, as the sample size becomes large, the difference between, for example, x̄ and µ (the parametric or "true" arithmetic mean) becomes smaller and smaller, and in the limit is zero. In statistical terms these "sample statistics" are unbiased estimators of the corresponding population parameters.

Dispersion

By dispersion we mean how spread out the data are. For example, say we have two areas, both with a median concentration of 5 ppm for some compound of interest. However, in the first area the 95th percentile concentration is 25 ppm, while in the second the 95th percentile concentration is 100 ppm. One might argue that the central tendency or location of the compound of interest is similar in these areas (or not, depending on the purpose of our investigation; see Chapter 3), but the second area clearly has a much greater spread or dispersion of concentrations than the first. The question is, how can this difference be expressed?

The Sample Range

One possibility is the sample range, W, which is given by:

[2.5]  W = x_max - x_min

that is, W is the difference between the largest and smallest sample values. This is certainly a good measure of the dispersion of the sample, but is less useful in describing the underlying population, because its magnitude is a function of both the actual dispersion of the population and the size of the sample. We can show this as follows:

1. The median percentile, mp_max, of the population that the largest value in a sample of N observations will represent is given by:

   mp_max = 0.5^(1/N)

That is, if we have a sample of 10 observations, mp_max equals 0.5^(1/10), or 0.933.
If instead we have a sample of 50 observations, mp_max equals 0.5^(1/50), or 0.986. That is, if the sample size is 10, the largest value in the sample will have a 50-50 chance of being above or below the 93.3rd percentile of the population from which the sample was drawn. However, if the sample size is 50, the largest value in the sample will have a 50-50 chance of being above or below the 98.6th percentile of the population from which the sample was drawn.

2. The median percentile, mp_min, of the population that the smallest value in a sample of N observations will represent is given by:

   mp_min = 1 - 0.5^(1/N)

For a sample of 10 observations, mp_min equals 0.067, and for a sample of 50 observations, mp_min equals 0.014.

3. Thus for a sample of 10 the range will tend to be the difference between the 6.7th and 93.3rd percentiles of the population from which the sample was drawn, while for a sample of 50, the range will tend to be the difference between the 1.4th and 98.6th percentiles. More generally, as the sample becomes larger and larger, the range represents the difference between more and more extreme high and low percentiles of the population.

This is why the sample range is a function of both the dispersion of the population and the sample size. For equal sample sizes the range will tend to be larger for a population with greater dispersion, but for populations with the same dispersion the sample range will be larger for larger N.

The Interquartile Range

One way to fix the problem of the range depending on the sample size is to calculate the difference between fixed percentiles of the data. The first problem encountered is the calculation of percentiles. We will use the following procedure:

1. Sort the N sample observations from smallest to largest.

2. Let the rank of an observation be I, its list index value.
That is, the smallest observation has rank 1, the second smallest has rank 2, and so on, up to the largest value, which has rank N.

3. The cumulative probability, P_I, of rank I is given by:

[2.6]  P_I = (I - 3/8) / (N + 1/4)

This cumulative probability calculation gives excellent agreement with the median probability calculated from the theory of order statistics (Looney and Gulledge, 1995).

To get values for cumulative probabilities not associated with a given rank:

1. Pick the cumulative probability, CP, of interest (e.g., 0.75).

2. Pick the P_I value of the rank just less than CP. The next rank has cumulative probability value P_{I+1}.

3. Let the values associated with these ranks be given by V_I = V_L and V_{I+1} = V_U.

4. Now if we assume probability is uniform between P_I = P_L and P_{I+1} = P_U, it is true that:

[2.7]  (CP - P_L) / (P_U - P_L) = (V_CP - V_L) / (V_U - V_L)

where V_CP is the value at cumulative probability CP (e.g., 0.75), V_L is the value associated with the lower end of the probability interval, P_L, and V_U is the value associated with the upper end of the probability interval, P_U. One can rearrange [2.7] to obtain V_0.75 as follows:

[2.8]  V_0.75 = V_L + (V_U - V_L) × (0.75 - P_L) / (P_U - P_L)

This is general for all cumulative probabilities that we can calculate. Note that one cannot calculate a value for cumulative probabilities less than P_1 or greater than P_N, because in the first case P_L is undefined and in the second P_U is undefined. That is, if we wish to calculate the value associated with a cumulative probability of 0.95 in a sample of 10 observations, we find that we cannot, because P_10 is only about 0.94.

As one might expect from the title of this section, the interquartile range, IQ, given by:

[2.9]  IQ = V_0.75 - V_0.25

is a commonly used measure of dispersion.
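The percentile procedure of Equations [2.6] through [2.9] can be sketched in a few lines; the five data values below are invented for illustration:

```python
def cumulative_probs(n):
    # Equation [2.6]: P_I = (I - 3/8) / (N + 1/4), for 1-based ranks I = 1..N.
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]

def percentile(sorted_xs, cp):
    # Equations [2.7]-[2.8]: interpolate linearly between the two ranks
    # whose cumulative probabilities bracket CP. Undefined outside [P_1, P_N].
    p = cumulative_probs(len(sorted_xs))
    if cp < p[0] or cp > p[-1]:
        raise ValueError("CP outside calculable range")
    for i in range(len(p) - 1):
        if p[i] <= cp <= p[i + 1]:
            pl, pu = p[i], p[i + 1]
            vl, vu = sorted_xs[i], sorted_xs[i + 1]
            return vl + (vu - vl) * (cp - pl) / (pu - pl)

data = sorted([2.0, 4.0, 6.0, 8.0, 10.0])  # illustrative values
iq = percentile(data, 0.75) - percentile(data, 0.25)  # Equation [2.9]
print(round(iq, 4))  # 5.25
```

Requesting, say, the 0.95 percentile of these five points raises an error, which is exactly the P_N limitation described above.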
It has the advantage that its expected width does not vary with sample size, and it is defined (calculable) for samples as small as 3.

The Variance and Standard Deviation

The sample variance, S², is defined as:

[2.10]  S² = Σ (xᵢ - x̄)² / (N - 1)

where the xᵢ's are the individual sample measurements and N is the sample size. Note that one sometimes also sees the formula:

[2.11]  σ² = Σ (xᵢ - x̄)² / N

Here σ² is the population variance. The difference between [2.10] and [2.11] is the denominator. The (N - 1) term is used in [2.10] because using N as in [2.11] with any finite sample will result in an estimate of S² that is too small relative to the true value of σ². Equation [2.11] is offered as an option in some spreadsheet programs and is sometimes mistakenly used in the calculation of sample statistics. This is always wrong; one should always use [2.10] with sample data because it gives a more accurate estimate of the true σ² value.

The sample standard deviation, S, is given by:

[2.12]  S = (S²)^(1/2)

that is, the sample standard deviation is the square root of the sample variance. It is easy to see that S and S² reflect the dispersion of the measurements. The variance is, for large samples, approximately equal to the average squared deviation of the observations from the sample mean, which, as the observations get more and more spread out, will get larger and larger. If we can assume that the observations follow a normal distribution, we can also use x̄ and S to calculate estimates of extreme percentiles. We will consider this at some length in our discussion of the normal distribution.
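The effect of the (N - 1) versus N denominator in Equations [2.10] and [2.11] is easy to see numerically; the data below are illustrative:

```python
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
n = len(data)
mean = sum(data) / n

# Equation [2.10]: sample variance with the (N - 1) denominator.
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)

# Equation [2.12]: the sample standard deviation is the square root of S^2.
s = s2 ** 0.5

# Using N as the denominator (Equation [2.11]) with sample data gives a
# systematically smaller, biased estimate of sigma^2.
biased = sum((x - mean) ** 2 for x in data) / n

print(round(s2, 4))      # 4.5714
print(round(biased, 4))  # 4.0
```

For these eight points the N-denominator estimate is 4.0 against the correct sample variance of 32/7, an underestimate of about 12.5%; the shortfall shrinks as N grows.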
The Logarithmic and Geometric Variance and Standard Deviation

Just as we can calculate the arithmetic mean of the log-transformed observations, LM(x), and its anti-log, GM(x), we can also calculate the variance and standard deviation of these log-transformed measurements, termed the logarithmic variance, LV(x), and logarithmic standard deviation, LSD(x), and their anti-logs, termed the geometric variance, GV(x), and geometric standard deviation, GSD(x), respectively. These measures of dispersion find application when the log-transformed measurements follow a normal distribution, which means that the measurements themselves follow what is termed a log-normal distribution.

The Coefficient of Variation (CV)

The sample CV is defined as:

[2.13]  CV = (S / x̄) × 100

that is, it is the standard deviation expressed as a percentage of the sample mean. Note that S and x̄ have the same units: if our measurements are in units of ppm, then both S and x̄ are in ppm. Thus, the CV is always unitless. The CV is useful because it is a measure of relative variability. For example, if we have a measurement method for a compound, and have done ten replicates each at standard concentrations of 10 and 100 ppm, we might well be interested in relative rather than absolute precision, because a 5% error at 10 ppm is 0.5 ppm, but the same relative error at 100 ppm is 5 ppm. Calculation of the CV would show that while the absolute dispersion at 100 ppm is much larger than that at 10 ppm, the relative dispersion of the two sets of measurements is equivalent.

Discussion

The proper measure of the dispersion of one's data depends on the question one wants to ask. The sample range does not estimate any parameter of the parent population, but it does give a very clear idea of the spread of the sample values.
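The two-standard replicate example can be sketched as below. The replicate values are invented so that the second set is exactly ten times the first, mimicking equal relative precision:

```python
# Hypothetical replicate measurements at 10 ppm and 100 ppm standards.
low = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 10.0]
high = [98.0, 101.0, 103.0, 99.0, 100.0, 102.0, 97.0, 101.0, 99.0, 100.0]

def cv(xs):
    # Equation [2.13]: standard deviation as a percentage of the mean.
    n = len(xs)
    m = sum(xs) / n
    s = (sum((x - m) ** 2 for x in xs) / (n - 1)) ** 0.5
    return 100.0 * s / m  # unitless

# Absolute spread is ~10x larger at 100 ppm, but the CVs agree.
print(round(cv(low), 2), round(cv(high), 2))  # 1.83 1.83
```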
The interquartile range does estimate the population interquartile range and clearly shows the spread between the 25th and 75th percentiles. Moreover, this is the only dispersion estimate that we will discuss that accurately reflects the same dispersion measure of the parent population and that does not depend on any specific assumed distribution for its interpretation. The arithmetic variance and standard deviation are primarily important when the population follows a normal distribution, because these statistics can help us estimate error bounds and conduct hypothesis tests. The situation with the logarithmic and geometric variance and standard deviation is similar: these dispersion estimators are primarily important when the population follows a log-normal distribution.

Some Simple Plots

The preceding sections have discussed some basic measures of location (arithmetic mean, geometric mean, median) and dispersion (range, interquartile range, variance, and standard deviation). However, if one wants to get an idea of what the data "look like," perhaps the best approach is to plot the data (Tufte, 1983; Cleveland, 1993; Tukey, 1977). There are many options for plotting data, but we will discuss only three here.

Box and Whisker Plots

The first, called a "box and whisker plot" (Tukey, 1977), is shown in Figure 2.1. This plot is constructed using the median and the interquartile range (IQR). The IQR defines the height of the box, while the median is shown as a line within the box. The whiskers are drawn from the upper and lower hinges (UH and LH; the top and bottom of the box; the 75th and 25th percentiles) to the largest and smallest observed values within 1.5 times the IQR of the UH and LH, respectively.
Values between 1.5 and 3 times the IQR above or below the UH or LH are plotted as "*" and are termed "outside points." Values beyond 3 times the IQR above or below the UH and LH are plotted as "o" and are termed "far outside values." The value of this plot is that it conveys a great amount of information about the form of one's data in a very simple form. It shows central tendency and dispersion, as well as whether there are any extremely large or small values. In addition, one can assess whether the data are symmetric, in the sense that values seem to be similarly dispersed above and below the median (see Figure 2.2D), or are "skewed," in the sense that there is a long tail toward high or low values (see Figure 2.4).

[Figure 2.1: A Sample Box Plot]

Dot Plots and Histograms

A dot plot (Figure 2.2A) is generated by sorting the data into "bins" of specified width (here about 0.2) and plotting the points in a bin as a stack of dots (hence the name dot plot). Such plots can give a general idea of the shape and spread of a set of data, and are very simple to interpret. Note also that the dot plot is similar in concept to a histogram (Figure 2.2B). A key difference is that when data are sparse, a dot plot will still provide useful information on the location and spread of the data, whereas a histogram may be rather difficult to interpret (Figure 2.2B). When there are a substantial number of data points, histograms can provide a good look at the relative frequency distribution of x. In a histogram the range of the data is divided into a set of intervals of fixed width (e.g., if the data range from 1 to 10, we might pick an interval width of 1, which would yield 10 intervals).
The histogram is constructed by counting up the data points whose values lie in a given interval and drawing a bar whose height corresponds to the number of observations in the interval. In practice the scale for the heights of the bars may be in either absolute or relative units. In the first case the scale is simply the number of observations, k, while in the second, the scale is in relative frequency, which is the fraction of the total sample, N, that is represented by a given bar (relative frequency = k/N). Both views are useful. An absolute scale allows one to see how many points a given interval contains, which can be useful for small- to medium-sized data sets, while the relative scale provides information on the frequency distribution of the data, which can be particularly useful for large data sets.

Empirical Cumulative Distribution Plots

If we sort the observations in a sample from smallest to largest, we can calculate the proportion of the sample less than or equal to a given observation by the simple equation I/N, where N is the sample size and I is the rank of the observation in the sorted sample. We could also calculate the expected cumulative proportion of the population associated with the observation using Equation [2.6]. In either case, we can then plot the x's against their calculated cumulative proportions to produce a plot like that shown in Figure 2.2C. These empirical cumulative distribution plots can show how rapidly data values increase with increasing rank, and are also useful in determining what fraction of the observations are above some value of interest.

[Figure 2.2: Examples of Some Useful Plot Types. A: An Example Dot Plot; B: An Example Histogram; C: An Example Empirical Cumulative Distribution Plot; D: An Example Box and Whisker Plot]
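The I/N calculation behind an empirical cumulative distribution plot can be sketched as follows (the five data values are made up for illustration):

```python
def ecdf_points(xs):
    # Sort the sample; the proportion of the sample less than or equal to
    # the I-th ordered value (1-based rank I) is simply I/N.
    s = sorted(xs)
    n = len(s)
    return [(x, (i + 1) / n) for i, x in enumerate(s)]

pts = ecdf_points([0.9, 0.1, 0.5, 0.3, 0.7])
print(pts[2])  # (0.5, 0.6)

# Fraction of observations at or below a value of interest, e.g., 0.5:
frac = max(p for x, p in pts if x <= 0.5)
print(frac)  # 0.6
```

Plotting the first element of each pair against the second gives the staircase curve of Figure 2.2C; substituting Equation [2.6] for I/N gives the expected population proportions instead.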
Describing the Distribution of Environmental Measurements

Probability distributions are mathematical functions that describe the probability that the value of x will lie in some interval (for continuous distributions) or that x will equal some integer value (for discrete distributions). There are two functional forms that are important in describing these distributions, the probability density function (PDF) and the cumulative distribution function (CDF). The PDF, written as f(X), can be thought of, in the case of continuous distributions, as providing information on the relative frequency or likelihood of different values of x, while for discrete distributions it gives the probability, P, that x equals X; that is:

[2.14]  f(X) = P(x = X)

The CDF, usually written as F(X), gives the probability that x is less than or equal to X; that is:

[2.15]  F(X) = P(x ≤ X)

The two functions are related. For discrete distributions:

[2.16]  F(X) = Σ f(x), summed over x from the minimum value to X

For continuous distributions:

[2.17]  F(X) = ∫ f(x) dx, integrated from the minimum value to X

that is, the CDF is either the sum or the integral of f(x) between the minimum value for the distribution in question and the value of interest, X.
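For a discrete distribution, the sum in Equation [2.16] is all there is to computing the CDF from the PDF. A minimal sketch, using an invented four-point distribution:

```python
# A small discrete distribution, f(X) for X = 0..3 (illustrative values).
pdf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

def cdf(X):
    # Equation [2.16]: F(X) is the sum of f(x) for x from the minimum
    # of the distribution up to and including X.
    return sum(p for x, p in pdf.items() if x <= X)

print(round(cdf(1), 4))  # 0.5
print(round(cdf(3), 4))  # 1.0
```

As required of any CDF, the function is nondecreasing and reaches 1 at the largest value in the distribution's support.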
If one can find a functional form that one is willing to assume describes the underlying probability distribution for the observed set of measurements, then this functional form may be used as a model to assist with decision making based [...]

[Table 2.1: Data Used in Figure 2.2]

[Table 2.4: A Sample Calculation for a Normal Probability Plot and Goodness-of-Fit Test]

[Table: Observed and expected category frequencies for a chi-square goodness-of-fit test]

[Table 2.7: Critical Values of the χ² Distribution]

[Figure 2.5: A Normal Scores plot]
