Extending ABC methods to high dimensions using Gaussian copula

LI JINGJING
NATIONAL UNIVERSITY OF SINGAPORE
2012

ACKNOWLEDGEMENTS

I would like to express my greatest gratitude to my supervisor, Associate Professor David Nott, for his excellent guidance. This thesis would not have been possible without his help and suggestions. He introduced me to this interesting topic, and I learned a great deal about it through discussions with him. I am very impressed by his passion and patience whenever I approached him for explanations. It has been a very pleasant journey and I am very grateful to him. I also want to thank my fellow graduate students from both the mathematics and statistics departments for their helpful discussions. My special thanks go to Lu Jun and Shao Fang for their help when I struggled with LaTeX and R. Finally, I wish to thank my family for their love and support.

CONTENTS

Acknowledgements
Abstract
Chapter 1 Introduction
  1.1 Methods and algorithms
    1.1.1 Standard rejection ABC
    1.1.2 Smooth rejection ABC with regression adjustment
    1.1.3 MCMC-ABC
  1.2 Bayes linear analysis and ABC with regression adjustment
    1.2.1 Bayes linear analysis
    1.2.2 An interpretation of ABC with regression adjustment
  1.3 Summary statistics
    1.3.1 Posterior mean
    1.3.2 Semi-automatic ABC
Chapter 2 A Gaussian copula estimate
  2.1 A marginal adjustment strategy
  2.2 A Gaussian copula estimate
Chapter 3 Examples
  3.1 A simulated example
  3.2 Inference for g-and-k distribution
  3.3 Excursion set model for heather incidence data
Chapter 4 Conclusion
  4.1 Conclusion
Bibliography

ABSTRACT

Approximate Bayesian computation (ABC) refers to a family of likelihood-free inference methods. It caters for problems in which the likelihood is analytically unavailable or computationally intractable, but forward simulation is not difficult. Conventional ABC methods can produce very good approximations to the true posterior when the problem is of low dimension. In practice, however, problems are often of high dimension, and the estimates obtained by conventional ABC methods are unreliable due to the curse of dimensionality. Regression adjustment methods have been suggested to improve the approximation for relatively high-dimensional problems. A marginal adjustment strategy proposed in Nott et al. (2011) combines the advantages of both conventional ABC and regression adjustment methods and extends the applicability of ABC to problems of somewhat higher dimension. Motivated by this marginal adjustment strategy, and in view of the asymptotic normality of the Bayesian posterior, we propose a Gaussian copula method which estimates the bivariate densities for each pair first and then combines them together to estimate the posterior.
The key advantage of this method is that for each pair we are able to obtain very accurate estimates using previous ABC methods. If approximate normality holds, the multivariate dependence structure is completely determined by the dependence structures of the pairs. As such, this Gaussian copula method can further extend ABC to problems of higher dimension by breaking them down into two-dimensional ones.

CHAPTER 1
Introduction

In Bayesian inference, the posterior distribution for parameters θ ∈ Θ is of paramount interest. Specifically, let p(θ) denote the prior distribution of θ and p(y|θ) the likelihood function. Then, given the observation yobs, the posterior can be calculated as p(θ|yobs) ∝ p(yobs|θ)p(θ). Inferences for θ are then based on the posterior distribution. In recent years, there has been interest in performing Bayesian analyses for complex models in which the likelihood function p(y|θ) is either analytically unavailable or computationally intractable. A class of simulation-based approximation methods known as approximate Bayesian computation (ABC), which circumvents explicit evaluation of the likelihood, has been developed.

Loosely, these approaches use simulations from the model for different parameter values, and compare the simulated data with the observed data. Those parameter values which produce data close to the observed data are retained to form an approximate posterior sample. These approximate sample values can then be used for summarization of the posterior or for predictive inference. This thesis first studies a few classical ABC methods in Chapter 1. Different ABC algorithms are presented along with a comparison of strengths and limitations. Chapter 2 describes a marginal adjustment strategy discussed by Nott et al. (2011), and then, as an extension, a Gaussian copula estimate is proposed. The introduction of the Gaussian copula estimate is the main contribution of this thesis. The algorithmic implementation of each method is also discussed. Chapter 3 investigates the performance of the Gaussian copula estimate. Finally, Chapter 4 summarizes the findings of the thesis.

1.1 Methods and algorithms

In this section, standard rejection ABC, smooth rejection ABC with regression adjustment and MCMC-ABC are introduced in turn. The algorithm for each method is also discussed.

1.1.1 Standard rejection ABC

Suppose the set Y of possible data values is a finite or countable set. Then if we simulate from the joint prior distribution of parameters and data, p(θ)p(y|θ), an exact match is possible between the simulated data and the observed data yobs. This is the basis of the most basic ABC rejection sampling algorithm, which works as follows.

Iterate for i = 1, 2, ..., n:
(1) Simulate θ* from p(θ);
(2) Simulate y* from p(y|θ*);
(3) If y* = yobs, set θi = θ*.

It is straightforward to show that the outcome (θ1, θ2, ..., θn) of this algorithm is a sample from the posterior distribution, since the density f(θi) of θi satisfies f(θi) ∝ p(yobs|θi)p(θi) ∝ p(θi|yobs).
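To fix ideas, the exact-match sampler can be written in a few lines of R (the language used throughout this thesis). The following is a minimal sketch only; the Gamma prior and Poisson likelihood are arbitrary illustrative choices, not a model from this thesis.

```r
# Exact-match rejection ABC for a toy discrete model:
# prior theta ~ Gamma(2, 1), likelihood y ~ Poisson(theta), one observed count.
set.seed(1)
y_obs <- 3
n     <- 1000                 # desired accepted sample size
theta <- numeric(n)
i     <- 0
while (i < n) {
  theta_star <- rgamma(1, shape = 2, rate = 1)  # (1) draw from the prior
  y_star     <- rpois(1, theta_star)            # (2) simulate data from the model
  if (y_star == y_obs) {                        # (3) keep only exact matches
    i <- i + 1
    theta[i] <- theta_star
  }
}
# theta now approximates p(theta | y_obs); for this conjugate pair the exact
# posterior Gamma(2 + y_obs, 2) is available for comparison.
```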
However, in most applications the sample spaces are continuous and hence an exact match has zero probability. Pritchard et al. (1999) produced the first genuine ABC algorithm, in which the exact match is relaxed to a match within a small distance h > 0 of the observed data. Closeness is measured using the Euclidean norm, denoted ‖·‖. The first two steps are the same as in the previous algorithm, while the third step is replaced by:

(3) If ‖y* − yobs‖ < h, set θi = θ* and yi = y*.

Observe that the accepted parameter values have density proportional to

∫ p(y*|θ)p(θ) I(‖y* − yobs‖ < h) dy*,

where I(·) denotes the indicator function. As h → 0, one can show that this converges pointwise to the true posterior p(θ|yobs) for each θ. The target distribution is now an approximation to the posterior whose quality depends on h. In practice, the observed data yobs is often of high dimension, and hence the rejection rate can be very high if h is set small enough to ensure the approximation quality. The efficiency of the algorithm can be improved by replacing the full data yobs with a summary statistic sobs = S(yobs) of lower dimension than yobs. If the summary statistic is sufficient, then p(θ|yobs) = p(θ|sobs). However, when the likelihood function is not available, it is challenging to obtain a sufficient statistic for complex models. Thus, a nearly sufficient low-dimensional summary statistic has to be chosen instead of a sufficient statistic, which adds another layer of approximation error. Although some of the available information is lost, this is offset by the increase in the efficiency of the algorithm. The first two steps of the algorithm using summary statistics are the same as before, and the third step becomes:

(3) If ‖S(y*) − sobs‖ < h, set θi = θ* and si = S(y*).

Similarly, the accepted parameters now have density proportional to

∫ p(s*|θ)p(θ) I(‖s* − sobs‖ < h) ds*.

1.1.2 Smooth rejection ABC with regression adjustment

Beaumont et al. (2002) introduced a first improvement of the ABC rejection sampling algorithm, in which the parameters θi are weighted by the values Kh(‖yi − yobs‖), where Kh(‖u‖) = K(‖u‖/h)/h is a standard smoothing kernel. Writing

p(θ, y*|yobs) ∝ p(y*|θ)p(θ)Kh(‖y* − yobs‖),   (1.1)

the approximate posterior given yobs is constructed as

p(θ|yobs) ≈ ∫ p(θ, y*|yobs) dy*.   (1.2)

With a uniform kernel this reduces to the rejection algorithm. In the same manner, if a summary statistic S(·) is utilized in step (3), then by setting s* = S(y*) and writing

p(θ, s*|sobs) ∝ p(s*|θ)p(θ)Kh(‖s* − sobs‖),   (1.3)

the approximate posterior given sobs can be derived as

p(θ|sobs) ≈ ∫ p(θ, s*|sobs) ds*.   (1.4)

For simplicity, assume from now on that the parameter θ = (θ1, ..., θp)ᵀ is of dimension p and the chosen summary statistic s = S(y) = (s1, ..., sd)ᵀ is of dimension d. A second innovation in Beaumont et al. (2002) was the use of regression to weaken the effect of the discrepancy between si and sobs. Based on the sample {(θ1, s1), ..., (θn, sn)}, Beaumont et al. (2002) considered the weighted linear regression model

θi = α + βᵀ(si − sobs) + εi,   (1.5)

where α is a p × 1 vector, β is a d × p matrix of regression coefficients, and the εi are independent identically distributed errors. Instead of assuming the model holds globally, a local linear fit in the vicinity of sobs, which is more plausible, is applied. In particular, Beaumont et al. (2002) adopted the Epanechnikov kernel, which has finite support, to carry out the regression. Regression is a form of conditional density estimation, and so an estimate of the posterior of interest can be constructed from the model (1.5) when si = sobs.
In particular, if the assumptions of (1.5) hold, then (α + ε1, ..., α + εn) is a sample from the posterior distribution p(θ|sobs). The weighted least squares estimate (α̂, β̂) in (1.5) minimizes

Σi ‖θi − (α + βᵀ(si − sobs))‖² Kh(‖si − sobs‖),

where the sum ranges over i = 1, ..., n. Denoting the resulting empirical residuals by ε̂i, the linear regression adjusted vector

θi,a = θi − β̂ᵀ(si − sobs) = α̂ + ε̂i   (1.6)

is approximately a draw from p(θ|sobs). Here the subscript a in θi,a denotes adjustment. The assumptions of linearity and homoscedasticity cannot be satisfied in most problems. A nonlinear conditional heteroscedastic model was therefore proposed in Blum and Francois (2010) to estimate both the location and the scale of θi. Specifically, the new regression model takes the form

θi = m(si) + σ(si) × ζi,

where m(si) denotes the conditional expectation E[θ|si] and σ²(si) denotes the conditional variance Var[θ|si]. In particular, a feed-forward neural network (FFNN) is applied to carry out the nonlinear regression, in view of the possibility of a reduction in dimensionality in the hidden layer. After an estimate m̂(si) of m(si) is obtained with the FFNN, a second regression model for σ(si) takes the form

log(θi − m̂(si))² = log σ²(si) + εi,

where the εi are independent identically distributed errors with mean zero and common variance. A second FFNN run can be performed to obtain the estimate σ̂(si) of σ(si). In a similar way as in (1.6), the parameter after adjustment under this model is

θi,a = m̂(sobs) + (θi − m̂(si)) × σ̂(sobs)/σ̂(si).   (1.7)

If θi = m(si) + σ(si) × ζi describes the true relationship between θi and si, then the θi,a form a random sample from the distribution p(θ|sobs). To improve upon the local linear fit, a slight modification using a quadratic regression adjustment is proposed in Blum (2010), where the relative performances of the different regression adjustments are also analyzed from a non-parametric perspective. More discussion of FFNNs can be found in the monograph of Ripley (1996).

1.1.3 MCMC-ABC

In practice, simulation-based rejection ABC is inefficient when the data or summary statistic is of high dimension, which leads to a high rejection rate with direct simulation from the prior. Moreover, the prior is often not informative about the posterior, which further reduces the efficiency. As an answer to this difficulty, MCMC-ABC has been introduced, so that more simulations are generated in regions of high posterior probability. Instead of taking the state space to be Θ, a Metropolis-Hastings sampler on the joint state space (Θ, S) may be constructed to target the approximate joint posterior (1.3) without directly evaluating the likelihood. With the proposal distribution

q[(θ, s), (θ*, s*)] = q(θ, θ*)p(s*|θ*),

the Metropolis-Hastings ratio can be calculated as

R[(θ, s), (θ*, s*)] = p(θ*, s*|sobs)q[(θ*, s*), (θ, s)] / {p(θ, s|sobs)q[(θ, s), (θ*, s*)]}
= Kh(‖s* − sobs‖)p(s*|θ*)p(θ*)q(θ*, θ)p(s|θ) / {Kh(‖s − sobs‖)p(s|θ)p(θ)q(θ, θ*)p(s*|θ*)}
= Kh(‖s* − sobs‖)p(θ*)q(θ*, θ) / {Kh(‖s − sobs‖)p(θ)q(θ, θ*)}.   (1.8)

Observe that the computation of R[(θ, s), (θ*, s*)] does not involve evaluation of the likelihood, since p(s*|θ*) and p(s|θ) appear in both the numerator and the denominator and hence cancel out.
Starting from (θ0, s0) with s0 = S(y0), the MCMC-ABC algorithm is defined as follows:

(1) At time i, simulate θ* from q(θi, θ);
(2) Simulate y* from p(y|θ*) and compute s* = S(y*);
(3) With probability min{1, R[(θi, si), (θ*, s*)]} set (θi+1, si+1) = (θ*, s*); otherwise set (θi+1, si+1) = (θi, si);
(4) Increment i = i + 1 and return to step (1).

To prove that the Markov chain constructed indeed has stationary distribution p(θ, s*|sobs) as in (1.3), one only needs to check that the detailed balance condition

p(θ, s|sobs)P[(θ, s), (θ*, s*)] = p(θ*, s*|sobs)P[(θ*, s*), (θ, s)],

with transition kernel P[(θ, s), (θ*, s*)] = q[(θ, s), (θ*, s*)] × min{1, R[(θ, s), (θ*, s*)]}, is satisfied. Without loss of generality, assume that R[(θ, s), (θ*, s*)] ≥ 1, so that a move from (θ, s) to (θ*, s*) is accepted with probability one and R[(θ*, s*), (θ, s)] = 1/R[(θ, s), (θ*, s*)]. Then

p(θ*, s*|sobs)P[(θ*, s*), (θ, s)]
= p(θ*, s*|sobs)q[(θ*, s*), (θ, s)]R[(θ*, s*), (θ, s)]
∝ Kh(‖s* − sobs‖)p(s*|θ*)p(θ*)q(θ*, θ)p(s|θ) × Kh(‖s − sobs‖)p(θ)q(θ, θ*) / {Kh(‖s* − sobs‖)p(θ*)q(θ*, θ)}
= Kh(‖s − sobs‖)p(s|θ)p(θ)q(θ, θ*)p(s*|θ*)
∝ p(θ, s|sobs)P[(θ, s), (θ*, s*)].

By symmetry the omitted proportionality constants on the two sides agree, so the first and last lines are equal. As a result, the marginal distribution of θ under the stationary distribution is the target posterior (1.4).

An MCMC marginal sampler on Θ directly targeting (1.4) is constructed in Sisson and Fan (2011). Utilizing a proposal distribution q(θ, θ*), the acceptance probability is min{1, R(θ, θ*)}, with Metropolis-Hastings ratio

R(θ, θ*) = p(θ*|sobs)q(θ*, θ) / {p(θ|sobs)q(θ, θ*)}
≈ [(1/n) Σi Kh(‖s*i − sobs‖)] p(θ*)q(θ*, θ) / {[(1/n) Σi Kh(‖si − sobs‖)] p(θ)q(θ, θ*)},

where si ∼ p(s|θ) and s*i ∼ p(s|θ*). Note that when n = 1, R(θ, θ*) is precisely R[(θ, s), (θ*, s*)] in (1.8). The performance of the marginal sampler is improved over the equivalent joint sampler targeting p(θ, s*|sobs), due to the reduction in the variability of the Metropolis-Hastings ratio. In order to improve the mixing of the sampler while maintaining the approximation quality, Bortot et al. (2007) proposed an error-distribution augmented sampler with target distribution

p(θ, s, ε|sobs) ∝ Kε(‖s − sobs‖)p(s|θ)p(θ)p(ε),

where p(ε) is a pseudo prior which serves only to influence the mixing. On one hand, small values of ε are preferred, so that the approximation quality does not deteriorate. On the other hand, large values can raise the acceptance rate and improve the mixing. An acceptable approximate posterior is then provided by

pε(θ|sobs) = ∫∫ p(θ, s, ε|sobs) ds dε.

More details on the selection of the pseudo prior are given in Bortot et al. (2007). More variations on MCMC-ABC can be found in Sisson and Fan (2011), where some potential alternative MCMC samplers are suggested and a practical guide to MCMC-ABC is provided.
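Continuing the toy Poisson-Gamma illustration from Section 1.1.1, the following sketch implements the joint-space sampler above with a symmetric Gaussian random walk for q(θ, θ*), so the q terms cancel in (1.8). With a uniform kernel, a proposal is accepted only when the simulated summary falls within h of sobs; all settings are illustrative assumptions.

```r
# MCMC-ABC sketch for the toy Poisson model (illustrative settings only).
set.seed(1)
s_obs <- 3; h <- 0.5; n_iter <- 5000
log_prior <- function(th) dgamma(th, shape = 2, rate = 1, log = TRUE)
theta <- numeric(n_iter)
theta[1] <- s_obs           # start near the observed summary; for simplicity
                            # the initial state is treated as already valid
for (i in 2:n_iter) {
  th_star <- rnorm(1, theta[i - 1], 0.5)   # symmetric random-walk proposal
  accept <- FALSE
  if (th_star > 0) {
    s_star <- rpois(1, th_star)            # simulate, never evaluate, the likelihood
    if (abs(s_star - s_obs) < h) {         # uniform kernel K_h in (1.8)
      # q terms cancel, so the ratio (1.8) reduces to the prior ratio
      accept <- log(runif(1)) < log_prior(th_star) - log_prior(theta[i - 1])
    }
  }
  theta[i] <- if (accept) th_star else theta[i - 1]
}
```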
1.2 Bayes linear analysis and ABC with regression adjustment

Although the smooth ABC method with regression adjustment exhibits good performance, the posterior obtained is often hard to interpret. In this section, a link between ABC with regression adjustment and Bayes linear analysis, introduced in Nott et al. (2011), is discussed.

1.2.1 Bayes linear analysis

Consider random quantities (θ, s) with θ = (θ1, ..., θp)ᵀ and s = (s1, ..., sd)ᵀ as before, and assume that the first and second order moments of (θ, s) are known. Bayes linear analysis aims to construct a linear estimator of θ in terms of s under squared error loss. Specifically, an estimator of the form a + Bs is considered, where a is a p-dimensional vector and B is a p × d matrix, and a and B are obtained by minimizing E[(θ − a − Bs)ᵀ(θ − a − Bs)]. One can show that the optimal linear estimator is

Es(θ) = E(θ) + Cov(θ, s)Var(s)⁻¹[s − E(s)].   (1.9)

The estimator Es(θ) is called the adjusted expectation of θ given s. Observe that a full joint distribution p(θ, s) does not have to be specified to obtain Es(θ). From a subjective Bayesian perspective, this is a key advantage of the Bayes linear approach, as only a limited number of judgments about the prior moments need to be made. Moreover, if p(θ, s) is fully specified and the posterior mean is a linear function of s, then the adjusted expectation coincides with the posterior mean. The adjusted variance of θ given s, denoted Vars(θ), is defined as

Vars(θ) = E{[θ − Es(θ)][θ − Es(θ)]ᵀ}.

One can show that

Vars(θ) = Var(θ) − Cov(θ, s)Var(s)⁻¹Cov(s, θ).

Note that Vars(θ) does not depend on s. With p(θ, s) fully specified, it can be shown that the inequality Vars(θ) ≥ E[Var(θ|s)] holds, where A ≥ C means that A − C is non-negative definite, and the outer expectation on the right-hand side is with respect to the marginal distribution p(s). This inequality indicates that Vars(θ) is a conservative upper bound on the posterior variance. If the posterior mean is a linear function of s, then Vars(θ) = E[Var(θ|s)]. More information on Bayes linear analysis can be found in the monograph of Goldstein and Wooff (2007).

1.2.2 An interpretation of ABC with regression adjustment

Nott et al. (2011) drew an interesting connection between the regression adjustment ABC of Beaumont et al. (2002) and Bayes linear analysis. Under the ABC setting, a full probability model p(θ, s) = p(s|θ)p(θ) is available, and hence Bayes linear analysis can be viewed as a computational approximation to a full Bayesian analysis. The first and second moments of the regression adjusted sample (θ1,a, ..., θn,a) were shown to be Monte Carlo approximations to Esobs(θ) and Varsobs(θ) in Bayes linear analysis, respectively. This Bayes linear interpretation may be helpful for motivating an exploratory use of regression adjustment ABC, even in problems of high dimension. The ordinary least squares estimate of β under the model (1.5) is β̂ = Σ̂(s)⁻¹Σ̂(s, θ), where Σ̂(s) is the sample covariance of {(s1i, ..., sdi)ᵀ, i = 1, ..., n} and Σ̂(s, θ)i,j is the sample cross covariance of the pairs {(sli, θlj), l = 1, ..., n}, for i = 1, ..., d and j = 1, ..., p. For large n, β̂ is approximately β = Var(s)⁻¹Cov(s, θ), where Var(s) and Cov(s, θ) are the population counterparts of Σ̂(s) and Σ̂(s, θ). Thus, for large n, the expectation of θi,a in (1.6) is approximately

E(θi,a) ≈ E[θi − βᵀ(si − sobs)] = E(θ) + Cov(θ, s)Var(s)⁻¹[sobs − E(s)] = Esobs(θ).

In a similar way, one can show that

Var(θi,a) ≈ Var[θi − βᵀ(si − sobs)] = Var(θ) + βᵀVar(s)β − 2Cov(θ, s)β = Var(θ) − Cov(θ, s)Var(s)⁻¹Cov(s, θ) = Vars(θ).
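This correspondence is easy to check numerically. The sketch below uses a scalar θ and s with an arbitrary linear-Gaussian relationship (an assumption for illustration only): the moments of the adjusted sample (1.6) match the adjusted expectation (1.9) and the adjusted variance.

```r
# Monte Carlo check that regression adjustment matches the Bayes linear formulas.
set.seed(1)
n     <- 1e5
theta <- rnorm(n, 0, 1)                    # prior draws of a scalar parameter
s     <- 2 * theta + rnorm(n, 0, 1)        # a "summary" with a known linear link
s_obs <- 1.5
# Bayes linear quantities, with sample moments standing in for the population ones:
b   <- cov(theta, s) / var(s)              # Var(s)^{-1} Cov(s, theta), scalar case
E_s <- mean(theta) + b * (s_obs - mean(s)) # adjusted expectation, equation (1.9)
V_s <- var(theta) - b * cov(s, theta)      # adjusted variance
# Regression-adjusted sample as in (1.6): theta_a = theta - b_hat * (s - s_obs)
b_hat   <- coef(lm(theta ~ I(s - s_obs)))[2]
theta_a <- theta - b_hat * (s - s_obs)
c(mean(theta_a), E_s)                      # approximately equal
c(var(theta_a),  V_s)                      # approximately equal
```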
In the same manner, if an initial kernel-based ABC analysis has been done, giving the approximate posterior p(θ, s*|sobs) as in (1.3), and this is taken as a prior to be updated in a Bayes linear analysis with the information sobs, then the update corresponds to the kernel weighted least squares version in Beaumont et al. (2002). A link between the heteroscedastic adjustment and Bayes linear analysis, through an appropriate basis expansion involving functions of s, is discussed in Nott et al. (2011), where further discussion of the connection can also be found.

1.3 Summary statistics

There are three sources of approximation error in an ABC analysis: Monte Carlo error, loss of information due to non-sufficient summary statistics S(·), and the error in the target density due to h > 0. Among these, the summary statistics play a crucial role in determining the approximation quality. If a nearly sufficient statistic, which is often of high dimension, is chosen, then the Monte Carlo error will be large due to a low convergence rate, and h needs to be set larger in order to improve the efficiency, which also incurs a large error. As such, an ideal summary statistic should be low-dimensional yet sufficiently representative. However, little guidance is available on how to choose good summary statistics, and the ABC approximation is only feasible and reliable in the special cases where such a choice exists. In this section, a general method of choosing a proper summary statistic is discussed and a corresponding algorithm is described.

1.3.1 Posterior mean

In Fearnhead and Prangle (2012), the Monte Carlo error is shown to be inversely related to hᵈ, where d is the dimension of the summary statistic. To control the Monte Carlo error, hᵈ cannot be too small. On the other hand, h affects the accuracy of the approximation to the true posterior and cannot be large. Instead of focusing on nearly sufficient statistics, which are often high-dimensional, Fearnhead and Prangle (2012) proposed a different approach, in which the main idea is for the ABC approximation to be good solely in terms of the accuracy of certain estimates of the parameters. Considering snoise_obs = S(yobs) + hx, where x is a realization of a random variable with density K(x), events assigned any probability q > 0 by this noisy ABC posterior occur with true probability q; in this sense the noisy posterior is calibrated. In the limit as h → 0, the ABC posteriors based on snoise_obs and S(yobs) are equivalent. We then seek to maximize the accuracy of estimates based on the ABC posterior. Let θ0 be the true parameter value and θ̂ an estimate. The loss function is defined as

L(θ̂, θ0; A) = (θ̂ − θ0)ᵀA(θ̂ − θ0),

where A is a p × p positive definite matrix. A standard result of Bayesian statistics gives that the minimal quadratic error loss occurs when θ̂ = E(θ|yobs), the true posterior mean. It is also shown that, if S(y) = E(θ|y), then as h → 0 the minimum loss based on the ABC posterior is achieved by θ̂ = EABC(θ|snoise_obs), and the resulting losses of the two methods are the same. These observations indicate that, under quadratic error loss, a good summary statistic is the posterior mean E(θ|y). The dimension of this summary statistic is the same as that of the parameters, and at the same time it maximizes the accuracy of estimating the parameters under quadratic loss.
This result in some sense provides guidance on the choice of a summary statistic when a good one is not otherwise available. More theory underpinning this particular choice of summary statistic can be found in Fearnhead and Prangle (2012) and Prangle (2012).

1.3.2 Semi-automatic ABC

Although the posterior mean is suggested as a summary statistic, it cannot be applied directly, since the posterior mean cannot be evaluated; it has to be estimated through simulation. The semi-automatic ABC approach proposed in Fearnhead and Prangle (2012) proceeds as follows:

(1) Use a pilot run of ABC to obtain a rough posterior;
(2) Simulate parameters and data from the original prior truncated to the region identified in the pilot run;
(3) Use the simulated sets of parameters and data to estimate the posterior means;
(4) Run ABC with the estimated posterior means as the summary statistics.

The pilot run is optional. However, if the prior is uninformative or improper, a pilot run can help to improve the efficiency of the algorithm. Some arbitrarily chosen summary statistics, such as order statistics, can be used in the pilot run. Various approaches can be utilized in step (3). Fearnhead and Prangle (2012) suggested that linear regression is both simple and effective, with appropriate functions of the data g(y) as predictors. The simplest choice is g(y) = y. In practice, it may be beneficial to include other transformations such as higher moments. For example, in one simulation study in Chapter 3, using g(y) = (y, y², y³, y⁴) as predictors turned out to produce a better set of summary statistics. More discussion is available in Fearnhead and Prangle (2012) and Prangle (2012).

CHAPTER 2
A Gaussian copula estimate

In this chapter, a marginal adjustment strategy which combines the merits of both rejection ABC and regression adjustment ABC is discussed. This strategy was introduced in Nott et al. (2011), and motivated by it a new copula estimate is proposed. An algorithm to carry out the copula estimate is provided as well.

2.1 A marginal adjustment strategy

For problems of relatively low dimension, conventional sampler-based ABC methods, such as rejection ABC and MCMC-ABC, can produce good approximations. However, the approximation quality of such ABC algorithms deteriorates very quickly as the dimension of the problem grows. On the other hand, the regression adjustment strategies, which can often be interpreted as Bayes linear adjustments, can be useful in problems with many parameters, although it is hard to validate their accuracy. A marginal adjustment strategy combining the low-dimensional accuracy of conventional sampler ABC with the utility of the regression adjustment approach is suggested for high-dimensional problems in Nott et al. (2011). In essence, the idea is to construct a first rough estimate of the approximate joint posterior using regression adjustment ABC, and to obtain good estimates of each of the marginal posterior distributions separately using rejection ABC. The marginal distributions of the rough posterior are then adjusted to be those of the separately estimated marginals, through an appropriate replacement of order statistics.
The procedure is described as follows:

(1) Simulate a sample {(θi, si), i = 1, ..., N} from p(θ)p(s|θ);
(2) Obtain a regression adjusted sample {θi,a, i = 1, ..., n}, where n < N and the subscript a in θi,a stands for adjustment;
(3) For j = 1, ..., p:
  • Identify s(j) ⊂ s that is marginally informative for θj;
  • Use a conventional ABC method to estimate the posterior for θ|s(j) and extract the jth component {θji,m, i = 1, ..., n}, where the subscript m in θji,m stands for marginal;
  • Replace the kth order statistic of the sample {θji,a} with the kth order statistic of the sample {θji,m}, for k = 1, ..., n.

The marginal density for θj with summary statistic s(j) can often be estimated precisely by rejection ABC, due to the reduction in the dimensionality of the summary statistic. With this marginal adjustment, the marginal densities obtained are the same as those of the separately estimated marginals. At the same time, the marginal adjustment maintains the multivariate dependence structure of the sample {θi,a}. Thus, it is reasonable to expect that the joint posterior is better estimated with this marginal adjustment strategy. More implementation details and further explanation of this strategy can be found in Nott et al. (2011).

2.2 A Gaussian copula estimate

For problems of much higher dimension, the multivariate dependence structure is poorly estimated due to the curse of dimensionality. Even with very accurate marginal estimates, the marginal adjustment strategy will not produce an acceptable joint estimate. As such, the applicability of ABC is still limited to relatively low-dimensional problems. Motivated by the marginal adjustment strategy and the asymptotic normality of the Bayesian posterior, we suggest a Gaussian copula estimate which first estimates each bivariate distribution and then combines them to obtain the joint posterior. In essence, the idea is to approximate each bivariate distribution using a 2-dimensional Gaussian copula, and then combine all the bivariate distributions into a joint distribution given by a p-dimensional Gaussian copula, each element of whose correlation matrix is obtained from the corresponding 2-dimensional Gaussian copula. For a Gaussian copula, the bivariate marginals determine the joint distribution, so the bivariate marginals can be estimated separately in lower-dimensional problems and the estimates then combined.

First, let us have a look at the Gaussian copula. Suppose that θ = (θ1, ..., θp)ᵀ has a continuous multivariate distribution with marginal densities fi(·) and marginal cumulative distribution functions Fi(·) for θi, i = 1, ..., p. Let Φ be the cumulative distribution function of the standard normal and let ηi = Φ⁻¹(Fi(θi)), i = 1, ..., p. Then the ηi are standard normal. Let η = (η1, ..., ηp)ᵀ and let Λ be the correlation matrix of η. If η is multivariate normal, η ∼ N(0, Λ), then θ is said to have a meta-Gaussian distribution, and the density of θ is

q(θ) = |Λ|^(−1/2) exp{ (1/2) η(θ)ᵀ(I − Λ⁻¹)η(θ) } ∏i fi(θi).   (2.1)

In particular, the multivariate normal family is embedded in the family of meta-Gaussian distributions; the meta-Gaussian distributions are precisely the distributions whose copula is that of a multivariate Gaussian density. See Joe (1997) for background on copulas.
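The density (2.1) is straightforward to evaluate once the marginals and Λ are in hand. The following is a minimal R sketch; the helper name dmeta_gauss and the Student-t marginals are purely illustrative assumptions.

```r
# Evaluate the meta-Gaussian density (2.1) at a point theta, given lists of
# marginal density/CDF functions and a copula correlation matrix Lambda.
dmeta_gauss <- function(theta, dmarg, pmarg, Lambda) {
  eta  <- qnorm(mapply(function(f, x) f(x), pmarg, theta))  # eta_i = Phi^{-1}(F_i(theta_i))
  quad <- 0.5 * t(eta) %*% (diag(length(eta)) - solve(Lambda)) %*% eta
  marg <- prod(mapply(function(f, x) f(x), dmarg, theta))   # product of f_i(theta_i)
  as.numeric(exp(quad) * marg / sqrt(det(Lambda)))
}
# Illustration with two t_5 marginals and copula correlation 0.6 (arbitrary):
Lambda <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
dmeta_gauss(c(0.3, -0.2),
            dmarg = list(function(x) dt(x, 5), function(x) dt(x, 5)),
            pmarg = list(function(x) pt(x, 5), function(x) pt(x, 5)),
            Lambda = Lambda)
```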
To see why the multivariate normal family is embedded in this class, note that the 1-dimensional N(µ̂, σ̂²) distribution has density f(x̂) = φ(û)/σ̂, where φ(·) is the standard normal density and û = (x̂ − µ̂)/σ̂. For the p-dimensional normal distribution N(µ, Σ), with ui = (xi − µi)/σi and u = (u1, ..., up)ᵀ, the joint density can be expressed as

f(x) = (2π)^(−p/2)|Σ|^(−1/2) exp{−(1/2)(x − µ)ᵀΣ⁻¹(x − µ)}
     = (2π)^(−p/2)|R|^(−1/2) exp{−(1/2)uᵀR⁻¹u} ∏i (1/σi)
     = |R|^(−1/2) exp{ (1/2)uᵀ(I − R⁻¹)u } ∏i φ(ui)/σi,

where R is the correlation matrix corresponding to Σ. Observe that R here corresponds to Λ in (2.1), meaning that for the Gaussian distribution the original correlation matrix is exactly the correlation matrix of the corresponding Gaussian copula.

Under some mild assumptions, the posterior in Bayesian inference is asymptotically normal; this is the content of the classical Bernstein-von Mises theorem. This encourages us to use structured density estimates for ABC which contain the multivariate normal. In particular, we consider the meta-Gaussian family of distributions. An important feature of this class of densities is that the joint distribution can be reconstructed from the bivariate marginal distributions. This suggests that a high-dimensional joint posterior distribution in ABC could be better estimated by first estimating the bivariate marginal distributions separately and then combining them. More details on the asymptotic normality of the Bayesian posterior can be found in the monograph of O'Hagan and Forster (2004).

Now let us go back to ABC. From the above discussion, we can see that the high-dimensional joint posterior can be built up from simple bivariate estimates. Due to the reduction in the dimension of the summary statistic, the bivariate density of (θi, θj) can be estimated easily and, more importantly, accurately using conventional ABC methods. When the sample size is relatively large, Λi,j can be well estimated from the sample. In light of the asymptotic normality of the posterior, it is reasonable to expect that the density obtained using the 2-dimensional version of (2.1) can mimic well the bivariate density obtained by bivariate rejection ABC. Combining all the Λi,j and the bivariate densities of (θi, θj), we use (2.1) again to obtain a joint posterior estimate which approximates the true posterior asymptotically. In the case of finitely many observations, the Gaussian copula method keeps the same marginal densities previously obtained using conventional ABC methods and models the dependence structure between each pair with a Gaussian copula. Our Gaussian copula method allows the application of rejection ABC to problems of much higher dimension than current ABC practice, and the posterior obtained can be easily interpreted. Precisely, the procedure we use is as follows:

(1) Simulate a sample {(θl, sl), l = 1, ..., N} from p(θ)p(s|θ);
(2) For each pair (i, j), i, j = 1, ..., p:
  • Identify s(i, j) ⊂ s that is marginally informative for (θi, θj). Apply conventional ABC methods with s(i, j) to obtain a sample of size n < N and extract the (i, j) components {(θli, θlj), l = 1, ..., n};
  • Use density estimation methods to obtain the marginal densities fi, fj and the bivariate density fi,j based on {(θli, θlj), l = 1, ..., n};
  • For i, rank {θli, l = 1, ..., n}. If θmi is the kth order statistic, set ηmi = Φ⁻¹(k/(n+1)). Do this for m = 1, ..., n, and do the same for j.
  Calculate the sample correlation of {(ηli, ηlj), l = 1, ..., n}, denoted Λi,j;
  • Use (2.1) with Λi,j, fi and fj to obtain the new bivariate estimate f̃i,j.
(3) Obtain the joint posterior using the Gaussian copula:
  • For i = 1, ..., p, identify s(i) ⊂ s that is marginally informative for θi. Apply conventional ABC methods with s(i) to select a sample and extract the ith component {θli, l = 1, ..., n}. Then use density estimation methods to obtain the marginal density f̃i based on {θli, l = 1, ..., n}.
  • Use (2.1) with Λ and (f̃i, i = 1, ..., p) to obtain the joint posterior.

The goal of step (2) is to compare the bivariate density obtained by bivariate rejection with that obtained by our Gaussian copula estimate. Due to the low dimensionality, the bivariate density fi,j can be estimated accurately with a large sample simulated using bivariate rejection. If the bivariate density f̃i,j achieved using our Gaussian copula estimate imitates fi,j well for all pairs, then it is natural to extend our Gaussian copula to estimate the joint posterior, and reasonable to expect a better estimate. The f̃i,j will mimic the fi,j well when the number of observations is large enough. As in the marginal adjustment strategy, the Gaussian copula method produces accurate marginal densities, each obtained by conventional ABC with a low-dimensional statistic. On the other hand, the dependence structure between each pair is well captured because of the low dimensionality, and can be well estimated by the Gaussian copula asymptotically. Thus, a good estimate of the multivariate dependence structure can be achieved using our copula method in light of the normality. Note that we only generate one sample, and all subsequent experiments are based on this sample. Although many ABC runs may need to be performed in our algorithm, the computational time does not increase much, since once the sample is simulated the subsequent analysis can be carried out quickly. When the number of observations is not large, the dependence structure between each pair may not be well estimated, and neither is the multivariate dependence structure. However, the marginal densities are kept the same as those obtained by conventional ABC with a low-dimensional statistic, and hence they remain reliable for marginal inferences. Besides, the dependence structure estimated by the Gaussian copula is relatively simple. Thus, more work needs to be done to estimate the multivariate dependence structure in this circumstance.

CHAPTER 3
Examples

In this chapter, we present three examples to which our copula method is applied. In what follows, we refer to using rejection ABC with the summary statistics corresponding to each pair of variables as bivariate rejection. Also, by the bivariate rejection copula method we mean bivariate rejection followed by applying our copula approach.

3.1 A simulated example

We first construct a toy example where the likelihood is normal, and hence a gold standard answer is available for comparison. While ABC methods are not needed for the analysis of this model, it is instructive for understanding the properties of our methods. We consider a model with prior θ ∼ N(µ0, Σ0), where θ and µ0 are p × 1 vectors and Σ0 is a p × p symmetric positive definite matrix. The likelihood is y ∼ N(θ, Σ1), where Σ1 is a known p × p symmetric positive definite matrix.
Then, given n observations {yi, i = 1, ..., n}, the posterior distribution of the parameter θ is θ ∼ N(µ2, Σ2), where

µ2 = (Σ0⁻¹ + nΣ1⁻¹)⁻¹(Σ0⁻¹µ0 + nΣ1⁻¹ȳ) and Σ2 = (Σ0⁻¹ + nΣ1⁻¹)⁻¹, with ȳ = (1/n) Σi yi.

For the following analysis we specify µ0 = (0, ..., 0)ᵀ, Σ0 = diag(3, ..., 3), and Σ1 with diagonal elements 1 and off-diagonal elements 0.5. We suppose there is only one observation, yobs = (y1, ..., yp)ᵀ = (0, ..., 0)ᵀ. Computations are performed using 10^5 simulations from p(y|θ)p(θ), with a uniform kernel Kh(‖·‖), where h is chosen to select the 10^3 simulations closest to yobs. We use the full data as the summary statistic when performing standard rejection, and the corresponding components as summary statistics when performing bivariate inferences. As the bivariate densities for each pair are identical, the estimates for (θ1, θ2) are shown. We contrast the results obtained using standard rejection, bivariate rejection, the bivariate rejection copula method, and the standard answer. All inferences are performed using the R package abc. In Figures 3.1-3.4, "standard", "copula", "kde", "loclinear" and "rejection" denote results obtained using exact calculation, the bivariate rejection copula method, bivariate rejection, standard rejection followed by a local linear fit, and standard rejection with the full data, respectively. The pictures in the top row are contour plots, while those in the bottom row are 3d density plots for the various methods.

Figure 3.1 shows contour plots and 3d density plots of (θ1, θ2) for p = 2. Thanks to the low dimensionality, the kde and loclinear estimates approximate the standard answer well. The result obtained with our copula method not only mimics that of rejection, but also matches the standard answer very closely. Figure 3.2 shows contour plots and 3d density plots of (θ1, θ2) for p = 4. As p increases, the acceptance constraint on ‖yi − yobs‖ must be relaxed in order to retain a fixed number of accepted samples. As expected, the rejection result deteriorates compared with its counterpart for p = 2. However, the quality of the kde estimate does not become worse. Our copula method further improves on the kde estimate and exhibits a close match with the standard answer. Figure 3.3 depicts contour plots and 3d density plots of (θ1, θ2) for p = 8. There is a clear deterioration in the quality of the rejection estimate: in this case, neither the marginals nor the dependence structure is well captured, due to the curse of dimensionality. In contrast, bivariate rejection continues to produce a good estimate. A good match between the kde and copula results is demonstrated, and a slight improvement of copula upon kde can also be observed, as the plot approaches the standard answer more closely. Figure 3.4 shows a similar picture for p = 10: our copula method produces a better estimate than kde and rejection, and the result approximates the standard answer well. Although the joint posterior density in high dimensions cannot be directly visualized, a better joint estimate using our copula method is expected, due to the better bivariate estimates and the exact normality of the true joint posterior in this example.
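The rank-based construction of Λi,j in step (2) of the copula procedure is compact in R. In the sketch below, a correlated bivariate normal sample stands in for the draws accepted by bivariate rejection (an illustrative assumption; in the examples these would come from the abc package).

```r
# Step (2) core: normal scores eta = Phi^{-1}(rank/(n+1)) and their correlation.
set.seed(1)
n  <- 1000
z1 <- rnorm(n)                                  # stand-in for accepted draws of theta_i
z2 <- 0.7 * z1 + sqrt(1 - 0.7^2) * rnorm(n)     # ... and of theta_j, correlation 0.7
th <- cbind(z1, z2)
eta <- qnorm(apply(th, 2, rank) / (n + 1))      # eta_m = Phi^{-1}(k/(n+1)) via ranks
Lambda_ij <- cor(eta[, 1], eta[, 2])            # entry Lambda_{i,j} of the copula matrix
Lambda_ij                                       # close to 0.7 here, as the sample is Gaussian
```

The rank transform leaves the estimated marginals untouched, which is why the method inherits the accurate low-dimensional marginal estimates.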
Figure 3.1 p=2, contour plots and 3d plots for (θ1, θ2).
Figure 3.2 p=4, contour plots and 3d plots for (θ1, θ2).
Figure 3.3 p=8, contour plots and 3d plots for (θ1, θ2).
Figure 3.4 p=10, contour plots and 3d plots for (θ1, θ2).

3.2 Inference for g-and-k distribution

The quantile function Qgk of the g-and-k distribution is

Qgk(p; A, B, g, k) = A + B[1 + c(1 − exp(−gz(p)))/(1 + exp(−gz(p)))](1 + z(p)²)^k z(p),   (3.1)

where z(p) is the pth standard normal quantile and the g-and-k parameters A, B, g and k represent location, scale, skewness and kurtosis, respectively. Unlike most common distributions, the g-and-k distribution is defined through the inverse of its distribution function and has no closed form density. The parameter c measures the overall asymmetry, and is fixed at 0.8 following standard practice. Despite having only 4 parameters, the g-and-k distribution admits great flexibility in shape, and hence can be used to model non-standard data. A wide range of common distributions, including the normal, Student's t, logistic and gamma distributions, can also be approximated well by the g-and-k distribution. In particular, the normal distribution is a special case of the g-and-k distribution with g = 0 and k = 0. The only parameter restrictions are B > 0 and k > −1/2. Given θ = (A, B, g, k), draws from a uniform distribution can be transformed into samples from the g-and-k distribution using (3.1).
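This inverse-transform simulation is a direct transcription of (3.1); the function names Qgk and rgk below are illustrative choices.

```r
# g-and-k quantile function (3.1) and inverse-transform simulation.
Qgk <- function(p, A, B, g, k, c = 0.8) {
  z <- qnorm(p)                                  # z(p), the standard normal quantile
  A + B * (1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))) * (1 + z^2)^k * z
}
rgk <- function(n, A, B, g, k) Qgk(runif(n), A, B, g, k)
set.seed(1)
y <- rgk(1e4, A = 3, B = 1, g = 2, k = 0.5)      # the data set used in this section
```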
Allingham et al. (2009) applied ABC to analyse a simulated data set of 10^4 independent draws from the g-and-k distribution with θ = (3, 1, 2, 0.5). A uniform prior on (0, 10)⁴ was used, and the summary statistic was the full set of order statistics. A simple MCMC-ABC sampler was used to estimate the posterior. Fearnhead and Prangle (2012) also considered this example, suggesting estimates of the posterior means as summary statistics; their semi-automatic approach achieved a much better estimate than that of Allingham et al. (2009). In this section, we also use estimates of the posterior means as the summary statistics and apply our copula method to perform the analysis.

We first simulate a set of 10^4 draws from the g-and-k distribution with θ = (3, 1, 2, 0.5), and denote it yobs. Then we carry out a pilot MCMC-ABC run to produce a training region, as the prior is broad and uninformative. As in Allingham et al. (2009), we set h = 15 and use the full set of order statistics as summary statistics. The transition kernel for each parameter is Gaussian with standard deviation 0.4, centred on the current value of the parameter. 10^7 simulations are generated and the marginal densities are given in Figure 3.5.

Figure 3.5 Marginals with MCMC.

Observe that MCMC does not produce a reasonable posterior for the parameter g, while narrow intervals are obtained for the other three parameters. We set the original prior truncated to the region of the MCMC-ABC posterior as the new prior, and then draw 3 × 10^6 simulations (θ, y) independently from the new joint prior. Note that each y is of dimension 10^4. As suggested in Fearnhead and Prangle (2012), we select 100 evenly spaced order statistics ŷ from each y and perform a linear regression of θ on (ŷ, ŷ², ŷ³, ŷ⁴) as predictors, where ŷⁱ denotes the vector of the ith powers of the individual elements of ŷ. The reason we choose these particular predictors is that location, scale, skewness and kurtosis are informally linked to these powers of the data. After this regression, we use the fitted values as summary statistics and choose a uniform kernel Kh(‖·‖), where h is set to select the 1.5 × 10^4 simulations closest to the fitted values of yobs. For each pair, we use bivariate rejection ABC and then apply our copula method. All the ABC analyses used in this section are standard rejection.

Figure 3.6 Marginals with posterior means as summary statistics. The solid lines denote individually estimated marginals. The dotted lines illustrate estimated marginals from the joint posterior using standard rejection.

Figure 3.6 shows the marginal densities obtained with posterior means as summary statistics. Compared with the marginal densities obtained by MCMC, the density becomes more concentrated around the true value for each parameter. In particular, for parameter g, a much better estimate has been produced, with a narrow interval around the true value 2.
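A sketch of the regression-based summary construction just described is given below, with much smaller illustrative sizes than the 3 × 10^6 simulations used here and uniform ranges standing in for the truncated prior; rgk is the simulator sketched earlier.

```r
# Semi-automatic summary statistics: regress each parameter on powers of
# evenly spaced order statistics and use the fitted values as summaries.
set.seed(1)
n_train <- 5000; n_keep <- 20                    # illustrative sizes only
theta <- cbind(A = runif(n_train, 2, 4), B = runif(n_train, 0.5, 1.5),
               g = runif(n_train, 0, 4), k = runif(n_train, 0, 1))
X <- t(apply(theta, 1, function(th) {
  y  <- sort(rgk(1000, th[1], th[2], th[3], th[4]))
  yh <- y[round(seq(1, 1000, length.out = n_keep))]  # evenly spaced order statistics
  c(yh, yh^2, yh^3, yh^4)                            # predictors (y, y^2, y^3, y^4)
}))
fit <- lm(theta ~ X)      # one multivariate linear regression, one response per parameter
S   <- fitted(fit)        # fitted values serve as the four summary statistics
# Applying the same fitted linear combination to the observed data gives its summary.
```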
Figure 3.7 Contour plots and 3d plots for (A, B) and (A, g).
Figure 3.8 Contour plots and 3d plots for (A, k) and (B, g).
Figure 3.9 Contour plots and 3d plots for (B, k) and (g, k).

From the contour plots and 3d density plots for each pair, the results obtained through our copula method mimic well the corresponding results obtained through bivariate rejection ABC; moreover, our copula results are smoother. A strong correlation between B and k is found, while for the other pairs the correlations are much weaker, which is consistent with the results in Allingham et al. (2009). As the bivariate distributions are well estimated, it is reasonable to expect a better joint posterior estimate using our copula method, in view of the asymptotic normality.

3.3 Excursion set model for heather incidence data

Figure 3.10 The heather incidence data.

Figure 3.10 shows a picture illustrating the spatial incidence of heather in a 10 × 20 meter region. For more background on the data see Diggle (1981). We use binary variables to represent presence (1) or absence (0) of heather at a particular spatial location. Without loss of generality, we assume that the data are observed on an integer lattice. A medium resolution version of the heather incidence data is available in the R package spatstat, which is used here.
Excursion sets of Gaussian random fields were used to model these data in Nott and Ryden (1999) and Nott et al. (2011). Let A be a symmetric positive definite matrix. A stationary Gaussian random field Y(t) with mean zero and covariance function

R(s, t) = Cov(Y(s), Y(t)) = exp[−(s − t)ᵀA(s − t)], s, t ∈ R²,

is considered. For an introduction to random fields see Adler and Taylor (2007). The u-level excursion set of Y(t) is defined as

Eu(Y) = {t ∈ R² : Y(t) ≥ u}.

It is natural to use a binary variable to indicate inclusion or exclusion in an excursion set for each spatial location, and to apply this model to the heather data. Let B(i, j) be the binary variable which describes the presence or absence of heather at the (i, j)th lattice point. Then under our model B(i, j) = I((i, j) ∈ Eu(Y)), with I(·) denoting the indicator function. Clearly the distribution of B = {B(i, j) : i = 0, ..., 255, j = 0, ..., 511} is fully determined by u and A. As such, we can parameterize our model through θ = (θ1, θ2, θ3, θ4), where θ1 = u, θ2 = log A11, θ3 = log A22 and θ4 = logit[(A12/√(A11A22) + 1)/2], with Aij denoting the (i, j)th component of A. For this analysis, we adopt the independent prior distributions θ1, θ4 ∼ N(0, 0.5²) and θ2, θ3 ∼ N(−4, 0.5²). Simulations of Gaussian random fields are obtained with the RandomFields package in R.

We choose the summary statistics as follows. Let n11(v), for v ∈ R², denote the number of pairs of variables in B separated by displacement v which are both 1. That is, if v = (a, b) and B(m, n) = B(m + a, n + b) = 1, then this counts as one pair towards n11(v). Similarly, we denote by n00(v) the number of such pairs which are both 0, and by n01(v) the number of such pairs where exactly one member of the pair is 1, regardless of order. Let v1 = (0, 1), v2 = (1, 0), v3 = (1, 1) and v4 = (1, −1). We define the summary statistics s(1), ..., s(4) as follows:

s(1) = Σi,j B(i, j)/(256 × 512),
s(2) = [n11(v1), n00(v1), n01(v1)]ᵀ,
s(3) = [n11(v2), n00(v2), n01(v2)]ᵀ,
s(4) = [n11(v3), n11(v4), n00(v3), n00(v4), n01(v3), n01(v4)]ᵀ.

As suggested, s(1), ..., s(4) are marginally informative for θ1, ..., θ4, respectively. For our copula method, we use s(i, j) = (s(i)ᵀ, s(j)ᵀ)ᵀ as the summary statistic for each pair (θi, θj). For the bivariate regression adjustment, we applied the nonlinear conditional heteroscedastic regression, and used the uniform kernel Kh(‖·‖) with scale parameter specified to select the 10^3 simulations closest to each s(i, j) out of the 2 × 10^3 samples.
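The pair counts above are simple to compute by overlapping a binary matrix with a shifted copy of itself. The sketch below is illustrative only: the helper pair_counts is a hypothetical name, and a small random binary matrix stands in for an excursion-set simulation from the RandomFields package.

```r
# Pair counts n11, n00, n01 at displacement v = (a, b) for a binary matrix B.
pair_counts <- function(B, a, b) {
  nr <- nrow(B); nc <- ncol(B)
  # index ranges where both (m, n) and (m + a, n + b) lie on the lattice:
  i <- max(1, 1 - a):min(nr, nr - a)
  j <- max(1, 1 - b):min(nc, nc - b)
  x <- B[i, j]; y <- B[i + a, j + b]
  c(n11 = sum(x == 1 & y == 1), n00 = sum(x == 0 & y == 0), n01 = sum(x != y))
}
set.seed(1)
B  <- matrix(rbinom(64 * 64, 1, 0.4), 64, 64)   # stand-in binary field
s1 <- mean(B)                                   # s(1): proportion of presences
s2 <- pair_counts(B, 0, 1)                      # s(2): displacement v1 = (0, 1)
s3 <- pair_counts(B, 1, 0)                      # s(3): v2 = (1, 0)
s4 <- c(pair_counts(B, 1, 1), pair_counts(B, 1, -1))  # s(4): v3 and v4
```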
Figure 3.11 Contour plots and 3d plots for (θ1, θ2) and (θ1, θ3).
Figure 3.12 Contour plots and 3d plots for (θ1, θ4) and (θ2, θ3).
Figure 3.13 Contour plots and 3d plots for (θ2, θ4) and (θ3, θ4).

A strong correlation between θ2 and θ3 is demonstrated, while for the other pairs the correlations are very weak. The results using our copula method imitate those of the bivariate rejection with regression adjustment very well. In addition, our copula results appear smoother. With a good estimate from bivariate rejection ABC, our copula method produces a better estimate for each pair, and hence the joint posterior approximation can also be improved.

CHAPTER 4
Conclusion

4.1 Conclusion

For problems of moderate or high dimension, conventional ABC methods which use rejection or importance weighting are of limited use. Regression adjustment methods can be useful in such situations, but it is difficult to validate their accuracy. The marginal adjustment strategy can be applied to combine the advantages of the conventional ABC and regression adjustment methods. However, for high-dimensional problems the multivariate dependence structure can be very poorly estimated due to the curse of dimensionality, so even with the marginal adjustment strategy the joint posterior is still not accurately estimated. In view of the asymptotic normality of the Bayesian posterior, and the more accurate estimates available for low-dimensional problems, we propose the copula method, which can extend the applicability of ABC to high-dimensional problems. The copula method first estimates the distribution of each pair of variables using Gaussian copulas, and then combines them to estimate the joint density using another Gaussian copula. Because of the asymptotic normality, the multivariate dependence structure depends entirely on the bivariate dependence structures of the pairs. With more accurate estimates for each pair of variables, we expect a better approximation to the true posterior with our copula method.

BIBLIOGRAPHY

[1] Adler, R. J. and J. E. Taylor (2007). Random Fields and Geometry. Springer Monographs in Mathematics. Springer.
[2] Allingham, D., R. A. R. King and K. L. Mengersen (2009). Bayesian estimation of quantile distributions. Statistics and Computing 19, 189-201.
[4] Bernardo, J. M. and A. F. M. Smith (1994). Bayesian Theory. Wiley.

[5] Blum, M. G. B. (2010). Approximate Bayesian computation: A nonparametric perspective. Journal of the American Statistical Association 105, 1178-1187.

[6] Blum, M. G. B. and O. Francois (2010). Non-linear regression models for approximate Bayesian computation. Statistics and Computing 20, 63-75.

[7] Bortot, P., S. G. Coles and S. A. Sisson (2007). Inference for stereological extremes. Journal of the American Statistical Association 102, 84-92.

[8] Csillery, K., O. Francois and M. G. B. Blum (2011). ABC: An R package for approximate Bayesian computation. http://arxiv.org/abs/1106.2793.

[9] Diggle, P. J. (1981). Binary mosaics and the spatial pattern of heather. Biometrics 37, 531-539.

[10] Fearnhead, P. and D. Prangle (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. J. R. Statist. Soc. B 74, 1-28.

[11] Givens, G. H. and J. A. Hoeting (2005). Computational Statistics. Wiley.

[12] Goldstein, M. and D. Wooff (2007). Bayes Linear Statistics: Theory and Methods. Wiley.

[13] Joe, H. (1997). Multivariate Models and Dependence Concepts. London: Chapman and Hall.

[14] Marin, J.-M., P. Pudlo, C. P. Robert and R. J. Ryder (2011). Approximate Bayesian computational methods. ArXiv preprint, arXiv:1101.0955.

[15] Marjoram, P., J. Molitor, V. Plagnol and S. Tavare (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 100, 15324-15328.

[16] Nott, D. J. and T. Ryden (1999). Pairwise likelihood methods for inference in image models. Biometrika 86, 661-676.

[17] Nott, D. J., Y. Fan, L. Marshall and S. A. Sisson (2011). Approximate Bayesian computation and Bayes linear analysis: Towards high-dimensional ABC. ArXiv preprint, arXiv:1112.4755.

[18] O'Hagan, A. and J. Forster (2004). Kendall's Advanced Theory of Statistics: Bayesian Inference. Arnold.

[19] Prangle, D. (2012). Summary statistics and sequential methods for approximate Bayesian computation. PhD thesis, Lancaster University, Lancaster.

[20] Pritchard, J., M. Seielstad, A. Perez-Lezaun and M. Feldman (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution 16, 1791-1798.

[21] Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.

[22] Schlather, M. (2011). RandomFields: Simulation and Analysis of Random Fields. R package version 2.0.45.

[23] Sisson, S. A., Y. Fan and M. M. Tanaka (2007). Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 104, 1760-1765. Errata (2009), 106, 16889.

[24] Sisson, S. A. and Y. Fan (2011). Likelihood-free Markov chain Monte Carlo. In Handbook of Markov Chain Monte Carlo, 319-341. Chapman and Hall/CRC Press.

[...] $(\theta_i, \theta_j)$ together, we use (2.1) again to obtain the joint posterior, which will approximate the true posterior asymptotically. In the case of finite observations, the Gaussian copula method will keep the same marginal densities previously obtained using conventional ABC methods and model the dependence structure between each pair with a Gaussian copula. Our Gaussian copula method allows the application of rejection ABC to problems [...]
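To make the construction just described concrete, the following R sketch evaluates a Gaussian copula density estimate of the form (2.1) at a point theta, given estimated marginal cdfs and pdfs and a copula correlation matrix Lambda. This is an illustrative reconstruction, not code from the thesis; the function and argument names are ours.

# Illustrative sketch (not thesis code): evaluate a Gaussian copula
# density estimate of the form (2.1). marg_cdf and marg_pdf are lists of
# p functions (estimated marginal cdfs and pdfs); Lambda is the p x p
# correlation matrix of the Gaussian copula.
copula_density <- function(theta, marg_cdf, marg_pdf, Lambda) {
  p   <- length(theta)
  u   <- mapply(function(Fi, x) Fi(x), marg_cdf, theta)  # u_i = F_i(theta_i)
  eta <- qnorm(pmin(pmax(u, 1e-10), 1 - 1e-10))          # normal scores, kept finite
  quad <- as.numeric(t(eta) %*% (solve(Lambda) - diag(p)) %*% eta)
  det(Lambda)^(-0.5) * exp(-0.5 * quad) *                # copula density times
    prod(mapply(function(fi, x) fi(x), marg_pdf, theta)) # product of marginals
}

In practice the marginal estimates might be built from ABC output th as, say, marg_pdf[[i]] <- approxfun(density(th[, i]), rule = 2) and marg_cdf[[i]] <- ecdf(th[, i]), with each entry of Lambda taken as the sample correlation of the normal scores of the corresponding bivariate ABC sample; that is one natural reading of how the quantities in the text are obtained, though the thesis may construct them differently.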
[...] (2) is to compare the bivariate density obtained by bivariate rejection with that given by our Gaussian copula estimate. Due to the low dimensionality, the bivariate density $f_{i,j}$ can be estimated accurately with a large sample simulated using bivariate rejection. If the bivariate density $\tilde f_{i,j}$ obtained using our Gaussian copula estimate imitates $f_{i,j}$ well for all the pairs, then it is natural to extend our Gaussian copula to [...]

[...] the joint posterior using the Gaussian copula:

• For i = 1, ..., p, identify the subset $s^{(i)} \subset s$ of summary statistics that is marginally informative for $\theta_i$. Apply conventional ABC methods with $s^{(i)}$ to select a sample and extract its ith component $(\theta_l^i)_{l=1}^n$. Then use density estimation methods to obtain the marginal density $\tilde f_i$ based on $(\theta_l^i)_{l=1}^n$.

• Use (2.1) with $\Lambda$ and $(\tilde f_i)_{i=1}^p$ to obtain the joint [...]

[...] the asymptotic normality of the Bayesian posterior, we suggest a Gaussian copula estimate which first estimates each bivariate distribution and then combines them to obtain the joint posterior. In essence, the idea is to approximate each bivariate distribution using a 2-dimensional Gaussian copula, and then to combine all the bivariate distributions to produce the joint distribution using a p-dimensional Gaussian copula, of which each element is obtained from the corresponding 2-dimensional Gaussian copulas. For a Gaussian copula, the bivariate marginals determine the joint distribution, so the bivariate marginals can be estimated separately in lower-dimensional problems and the estimates then combined. First, let us have a look at the Gaussian copula. Suppose that θ = (θ1, [...]

[...] here corresponds to $\Lambda$ in (2.1), meaning that for the Gaussian distribution the initial correlation matrix is exactly the correlation matrix of the corresponding Gaussian copula. Under some mild assumptions, the posterior in Bayesian inference is asymptotically normal, which is the content of the classical Bernstein-von Mises theorem. This encourages us to use structured [...]

[...] obtained using conventional ABC methods. When the sample size is relatively large, $\Lambda_{i,j}$ can be well estimated from the sample. In light of the asymptotic normality of the posterior, it is reasonable to expect that the density obtained using the 2-dimensional version of (2.1) can mimic the bivariate density obtained by bivariate rejection ABC well. Combining all the $\Lambda_{i,j}$ and bivariate densities of [...]

[...] dimensionality, and can be well estimated by the Gaussian copula asymptotically. Thus, a good estimate of the multivariate dependence structure can be achieved using our copula method, in light of the normality. Note that we generate only one sample, and all subsequent experiments are based on this sample. Although many ABC runs may need to be performed in our algorithm, the [...]

[...] statistics.

1.3.2 Semi-automatic ABC

Although the posterior mean is suggested as a summary statistic, it cannot be applied directly, since the posterior mean itself cannot be evaluated; it has to be estimated through simulation. The procedure of the semi-automatic ABC approach proposed in Fearnhead and Prangle (2012) is defined as follows:

(1) Use a pilot run of ABC to obtain a rough [...]
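The remainder of that procedure is truncated in this extract, but its core is the regression step: simulated parameters are regressed on functions of the simulated data, and the fitted posterior-mean estimates serve as summary statistics. The R sketch below shows this step under illustrative choices; the function name, the toy regressors g(y) = (y, y^2), and the data layout are our assumptions, not details taken from Fearnhead and Prangle (2012) or the thesis.

# Illustrative sketch of the regression step in semi-automatic ABC:
# regress each simulated parameter on functions of the simulated data
# and use the fitted value at the observed data as its summary statistic.
semi_auto_summary <- function(theta_sim, y_sim, y_obs) {
  # theta_sim: n x p matrix of parameters simulated in the pilot region
  # y_sim:     n x d matrix of data sets simulated from those parameters
  # y_obs:     observed data, a vector of length d
  g     <- function(Y) cbind(Y, Y^2)          # toy choice of regressors g(y)
  X     <- cbind(1, g(y_sim))                 # design matrix with intercept
  x_obs <- c(1, g(matrix(y_obs, nrow = 1)))
  sapply(seq_len(ncol(theta_sim)), function(i) {
    fit <- lm.fit(X, theta_sim[, i])          # least squares for theta_i on g(y)
    sum(x_obs * fit$coefficients)             # estimate of E(theta_i | y_obs)
  })
}

In an actual run, the same fitted regressions would also be applied to each data set simulated in the main ABC stage, so that simulated and observed summaries are compared on the same scale.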
2.2 A Gaussian copula estimate

For problems of much higher dimension, the multivariate dependence structure is poorly estimated due to the curse of dimensionality. Even with a very accurate marginal estimate, the marginal adjustment strategy will not produce an acceptable joint estimate. As such, the applicability of ABC is still limited to relatively low-dimensional [...]
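Since the marginal adjustment strategy recurs throughout this discussion, a compact sketch may help. The idea, following the description attributed to Nott et al. (2011) above, is to keep the dependence ranks of a joint ABC sample while replacing each marginal by a more accurately estimated one via quantile mapping. The function below is our illustrative reconstruction of that idea; its name and implementation details are assumptions, not thesis code.

# Illustrative sketch of the marginal adjustment strategy: keep the rank
# (dependence) structure of a joint ABC sample, but map each coordinate
# through its empirical cdf and then through the quantile function of a
# more accurate marginal sample.
marginal_adjust <- function(theta_joint, theta_marg) {
  # theta_joint: n x p sample from a joint, high-dimensional ABC run
  # theta_marg:  list of p vectors, each an accurate marginal ABC sample
  adjusted <- theta_joint
  for (i in seq_len(ncol(theta_joint))) {
    u <- ecdf(theta_joint[, i])(theta_joint[, i])        # ranks in (0, 1]
    adjusted[, i] <- quantile(theta_marg[[i]], probs = u, type = 8)
  }
  adjusted
}

The adjusted sample then carries the typically better-estimated marginals of the low-dimensional runs, while the joint run contributes only the dependence structure; it is precisely this dependence structure that the copula estimate of this section targets when the joint run estimates it poorly.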
