A penalized likelihood approach in covariance graphical model selection

A PENALIZED LIKELIHOOD APPROACH IN COVARIANCE GRAPHICAL MODEL SELECTION

LIN NAN
(B.Sc., National University of Singapore)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2010

Acknowledgements

First of all, I would like to express my deepest gratitude to my supervisor A/P Leng Chenlei and co-supervisor Dr. Chen Ying, who conscientiously led me into the field of statistical research. This thesis would not have been possible without their patient guidance and continuous support. I really appreciate their efforts in helping me overcome all the problems I encountered over the past four years. It is my honor and luck to have these two brilliant young professors as my PhD supervisors.

Special acknowledgement also goes to all the professors and staff in the Department of Statistics and Applied Probability. I have been part of this warm family for almost eight years. With their help I have built up statistical skills that will benefit me for my whole life. I cannot find the exact words to express my gratitude to the department, but I will definitely find a way to reciprocate in the future.

I further express my appreciation to my dear friends Ms. Liu Yan, Ms. Jiang Qian, Mr. Lu Jun, Mr. Liang Xuehua, Mr. Jiang Binyan and Dr. Zhang Rongli, for giving me help, support and encouragement during my PhD study. Thanks to your company, my PhD life became more colorful and enjoyable.

Finally, I am forever indebted to my family. My dear parents gave me the courage to pursue PhD study at the beginning, and have always been my constant source of support, giving me endless love and understanding. My husband, Meng Chuan, is my joy, my pillar and my guiding light. This thesis is also in memory of my dear grandmothers.

Contents

Acknowledgements
Summary
List of Tables
List of Figures
1 Introduction
  1.1 Background
  1.2 Literature review
    1.2.1 Review of penalized approaches
    1.2.2 Review of graphical models
    1.2.3 Organization of the thesis
2 Methodology
  2.1 Main result
  2.2 Theory
    2.2.1 Proof of lemmas
    2.2.2 Proof of theorems
3 Simulation
  3.1 Simulation settings
  3.2 Performance evaluation
  3.3 Simulation results
    3.3.1 Simulation results for different models
    3.3.2 Simulation results for models with different dimensions
4 Real Data Analysis
  4.1 Introduction
  4.2 Call center data
  4.3 Financial stocks vs. education stocks
5 Conclusion and Further Research
  5.1 Conclusion and discussion
  5.2 Future research
Bibliography

Summary

There has recently been rising interest in high-dimensional data from many important fields. One of the major challenges in modern statistics is to investigate the complex relationships and dependencies existing in data, in order to build parsimonious models for inference.
Covariance or correlation matrix estimation, which addresses the relationships among random variables, attracts a lot of attention due to its ubiquity in data analysis. Suppose we have a d-dimensional vector following a multivariate normal distribution with mean zero and a certain covariance matrix that we are interested in estimating. Of particular interest is to identify the zero entries in this covariance matrix, since a zero entry corresponds to marginal independence between two variables. This is referred to as covariance graphical model selection, which arises when the interest is in modeling pairwise correlation. Identifying pairwise independence in this model helps elucidate relations between the variables.

We propose a penalized likelihood approach for covariance graphical model selection and a BIC-type criterion for the selection of the tuning parameter. An attractive feature of a likelihood-based approach is its improved efficiency compared with banding or thresholding. Another attractive feature of the proposed method is that the positive definiteness of the covariance matrix is explicitly ensured. We show that the penalized likelihood estimator converges to the true covariance matrix under the Frobenius norm with an explicit rate. In addition, we show that the zero entries in the true covariance matrix are estimated as zero with probability tending to 1. We also compare the penalized approach with other methods for covariance graphical models, such as the sample covariance matrix, the SIN approach proposed by Drton and Perlman (2004), the thresholding method developed by Bickel and Levina (2008b) and the shrinkage estimator of Ledoit and Wolf (2003), in terms of both simulations and real data analyses. The results show that the penalized method not only provides sparse estimates of the covariance matrix, but also has competitive estimation accuracy.

List of Tables

3.1 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.2 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.3 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.4 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.5 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.6 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.7 Simulations: Model with d=30 and n=100. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.8 Simulations: Model with d=100 and n=100. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.9 Simulations: Model with d=100 and n=100. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
4.1 Average (SE) KL, QL, OL, FL, FP and FN for call center data with d=84, n=164. 4-fold CV on the training data minimizing the BIC.
4.2 Average (SE) KL, QL, OL, FL, FP and FN for financial stock returns vs. education stock returns with d=10, n=49. 4-fold CV on the training data minimizing the BIC.

List of Figures

3.1 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.2 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.3 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.4 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.5 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.6 Simulations: Model with d=10 and n=30. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.7 Simulations: Model with d=30 and n=100. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.8 Simulations: Model with d=100 and n=100. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
3.9 Simulations: Model with d=100 and n=100. Average (SE) KL, QL, OL, FL, FP and FN over 50 replications.
4.1 Call center data
4.2 Financial stock vs. education stock

Chapter 4: Real Data Analysis

Table 4.1: Average (SE) KL, QL, OL, FL, FP and FN for call center data with d=84, n=164. 4-fold CV on the training data minimizing the BIC.

Method               KL              QL                    OL                FL
Penalization         20.87 (3.26)    599.31 (93.60)        537.74 (83.98)    581.52 (90.82)
Group Penalization   30.46 (3.04)    76496.60 (9154.09)    2012.24 (29.65)   2170.23 (32.80)
Ledoit               12.22 (1.04)    3876.10 (100.97)      79.05 (0.63)      166.37 (1.29)
Bickel               60.03 (16.42)   169186.13 (9093.01)   77.53 (1.35)      176.02 (1.33)
SIN                  91.11 (0.35)    4504.06 (101.61)      1933.79 (21.82)   2139.81 (25.48)
Sample               147.77 (1.22)   44507.88 (321.93)     90.02 (2.23)      184.94 (1.71)

Table 4.2: Average (SE) KL, QL, OL, FL, FP and FN for financial stock returns vs. education stock returns with d=10, n=49. 4-fold CV on the training data minimizing the BIC.
Method               KL                QL                    OL              FL
Penalization         2.22 (0.12)       301.42 (60.46)        54.72 (11.89)   63.70 (11.98)
Group Penalization   11.71 (1.58)      222.12 (35.44)        51.36 (8.07)    56.93 (7.97)
Ledoit               8.17 (1.02)       94.65 (14.83)         48.95 (6.28)    57.77 (6.61)
Bickel               18.64* (21.38*)   97265.20 (27532.69)   50.95 (5.73)    62.22 (6.26)
SIN                  13.96 (1.44)      174.70 (27.51)        51.87 (8.11)    58.69 (8.06)
Sample               13.25 (2.03)      678.10 (127.25)       50.49 (5.68)    62.00 (6.23)

* There are NaN results; all entries are calculated without the NaN values.

Figure 4.1: Call center data. Heatmaps of the absolute values of the covariance estimates (axis variable labels omitted). Red is magnitude 0 and yellow is magnitude 1.

Figure 4.2: Financial stock vs. education stock. Heatmaps of the absolute values of the covariance estimates. Red is magnitude 0 and yellow is magnitude 1.

Chapter 5: Conclusion and Further Research

5.1 Conclusion and discussion

High-dimensional data analysis has become possible in many areas due to rapid developments in science and computing. One of the major challenges in modern statistics is to investigate all the complex relationships and dependencies existing in data. Covariance matrix estimation, which addresses the relationships among variables, attracts a lot of attention due to its ubiquity in data analysis. However, the number of parameters in the covariance matrix grows quickly with dimensionality, so high-dimensional data lead to a heavy computational burden.
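To make this parameter growth concrete: a symmetric d × d covariance matrix has d(d+1)/2 free parameters, so the count grows quadratically in the dimension. A quick sketch (the helper name is ours, not from the thesis):

```python
def n_cov_params(d):
    # Free parameters in a symmetric d x d covariance matrix:
    # d diagonal entries plus d*(d-1)/2 distinct off-diagonal entries.
    return d * (d + 1) // 2

for d in (10, 100, 1000):
    print(d, n_cov_params(d))  # 55, 5050, 500500
```

Already at d = 100 there are 5050 parameters, typically far more than the available sample size, which is why sparsity assumptions become essential.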
As a result, sparsity of the covariance matrix is frequently imposed to achieve a balance between bias and variance. In this thesis we proposed a penalized likelihood approach to estimating the covariance matrix in order to achieve parsimony in covariance graphical model selection.

Let x_i = (x_{i1}, ..., x_{id})^T, i = 1, ..., n, be d-dimensional vectors following a multivariate normal distribution N(0, Σ), where we are interested in estimating the covariance matrix Σ. Of particular interest is to identify the zero entries in Σ, since a zero (i, j)-th entry, σ_ij = 0, corresponds to marginal independence of the i-th and j-th variables. This is referred to as covariance graphical model selection, which arises when the interest is in modeling pairwise correlation. Identifying pairwise independence in this model helps elucidate relationships between the variables.

In this thesis we proposed a penalized likelihood approach for covariance graphical model selection and a BIC-type criterion for the selection of the tuning parameter. An attractive feature of a likelihood-based approach is its improved efficiency compared with banding or thresholding. Another attractive feature of the proposed method is that the positive definiteness of the covariance matrix is explicitly ensured. We have shown that the penalized likelihood estimator converges to the true covariance matrix under the Frobenius norm with an explicit rate. In addition, we have demonstrated that the zero entries in the true covariance matrix are estimated as zero with probability tending to 1. We also compared the penalized approach with other methods for covariance graphical model selection, such as the sample covariance matrix, the SIN approach proposed by Drton and Perlman (2004), the thresholding method developed by Bickel and Levina (2008b) and the shrinkage estimator of Ledoit and Wolf (2003), in terms of both simulation and real data analysis.
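As a rough sketch of the kind of criterion involved: a penalized Gaussian likelihood combines the negative log-likelihood with a penalty on the off-diagonal entries of Σ. The function below is an illustrative assumption using a plain lasso penalty, not the thesis' exact objective (which is given in Chapter 2):

```python
import numpy as np

def penalized_neg_loglik(sigma, S, lam):
    """Gaussian negative log-likelihood, log|Sigma| + tr(S Sigma^{-1}),
    plus an L1 penalty on the off-diagonal entries of Sigma.
    Illustrative sketch only; S is the sample covariance matrix."""
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        return float("inf")  # reject candidates that are not positive definite
    fit = logdet + np.trace(S @ np.linalg.inv(sigma))
    off_diag = np.abs(sigma).sum() - np.abs(np.diag(sigma)).sum()
    return fit + lam * off_diag

# Toy check: at Sigma = S = I_3 with lam = 0, the value is log|I| + tr(I) = 3.
print(penalized_neg_loglik(np.eye(3), np.eye(3), lam=0.0))  # 3.0
```

Minimizing such an objective over positive definite matrices, with the tuning parameter lam selected by a BIC-type criterion, shrinks small off-diagonal covariances exactly to zero while keeping the estimate positive definite.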
The results indicated that the penalized method not only provides sparse estimates of the covariance matrix, but also has competitive estimation accuracy.

5.2 Future research

In this thesis, we focused on using a penalized approach to identify zero entries in the covariance matrix. The work can be extended to identify zero block matrices of the covariance matrix. In particular, write

        ( Σ11  ...  Σ1m )
    Σ = ( ...  ...  ... )
        ( Σm1  ...  Σmm ),

where each Σij, i = 1, ..., m and j = 1, ..., m, is a block matrix, and we would like to identify all the zero block matrices, which is equivalent to identifying blocks of variables that are independent of each other. The idea of block thresholding was first proposed by Efromovich (1985) for orthogonal series estimators. More recent works, such as Kerkyacharian and Picard (1996), Cai and Silverman (2001), and Cai and Zhou (2009), have shown that block thresholding has a number of advantages over regular thresholding, since it simultaneously keeps or kills a group of coefficients rather than individual ones. Block thresholding increases estimation accuracy by using information from neighboring coefficients. Nevertheless, the degree of adaptivity depends on the choice of block size and threshold level. Further work on this problem will be done in the future, including derivation of the penalized block estimator, study of the estimator's properties, such as convergence rate and sparsity, and comparison with other approaches that can also provide block estimators.

Bibliography

Rothman, A.J., Levina, E., and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104, 177-186.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proc. 2nd International Symposium on Information Theory (V. Petrov and F. Csáki, eds.),
267-281. Akadémiai Kiadó, Budapest.
Anderson, T.W. (1969). Statistical inference for covariance matrices with linear structure. In Multivariate Analysis II (P. R. Krishnaiah, ed.), pp. 55-56. New York: Academic Press.
Anderson, T.W. (1970). Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices. In Essays in Probability and Statistics (R. C. Bose and S. N. Roy, eds.), pp. 1-24. Chapel Hill: University of North Carolina Press.
Anderson, T.W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. Ann. Statist. 1, 135-141.
Antoniadis, A. (1997). Wavelets in statistics: a review (with discussion). Journal of the Italian Statistical Association 6, 97-144.
Bai, Z. and Silverstein, J.W. (2006). Spectral Analysis of Large Dimensional Random Matrices. Science Press, Beijing.
Bickel, P. and Levina, E. (2008a). Regularized estimation of large covariance matrices. Ann. Statist. 36, 199-227.
Bickel, P. and Levina, E. (2008b). Covariance regularization by thresholding. Ann. Statist. 36, 2577-2604.
Brown, L., Gans, N., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S. and Zhao, L. (2005). Statistical analysis of a telephone call center: a queueing-science perspective. J. Am. Statist. Assoc. 100, 36-50.
Cai, T. and Silverman, B.W. (2001). Incorporating information on neighboring coefficients into wavelet estimation. Sankhyā Ser. B 63, 127-148.
Cai, T., Zhang, C.-H., and Zhou, H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38, 2118-2144.
Cai, T. and Zhou, H. (2009). A data-driven block thresholding approach to wavelet estimation. Ann. Statist. 37, 569-595.
Chaudhuri, S., Drton, M., and Richardson, T. (2007). Estimation of a covariance matrix with zeros. Biometrika 94, 199-216.
Cox, D.R. and Wermuth, N. (1993). Linear dependencies represented by chain graphs (with discussion). Statist. Sci. 8, 204-218, 247-277.
Cox, D.R.
and Wermuth, N. (1996). Multivariate Dependencies: Models, Analysis, and Interpretation. London: Chapman and Hall.
d'Aspremont, A., Banerjee, O., and El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and its Applications 30(1), 56-66.
Dobra, A., Hans, C., Jones, B., Nevins, J.R., Yao, G. and West, M. (2004). Sparse graphical models for exploring gene expression data. J. Multivariate Anal. 90, 196-212.
Drton, M. and Richardson, T.S. (2003). A new algorithm for maximum-likelihood estimation in Gaussian graphical models for marginal independence. In Uncertainty in Artificial Intelligence: Proceedings of the Nineteenth Conference (U. Kjærulff and C. Meek, eds.), pp. 181-191.
Drton, M. and Perlman, M. (2004). Model selection for Gaussian concentration graphs. Biometrika 91, 591-602.
Edwards, D.M. (2000). Introduction to Graphical Modelling, 2nd ed. New York: Springer-Verlag.
El Karoui, N. (2008). Operator norm consistent estimation of large dimensional sparse covariance matrices. Ann. Statist. 36, 2717-2756.
Efromovich, S.Y. (1985). Nonparametric estimation of a density of unknown smoothness. Theor. Probab. Appl. 30, 557-661.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32, 407-499.
Fan, J. (1997). Comments on "Wavelets in statistics: a review" by A. Antoniadis. Journal of the Italian Statistical Association 6, 131-138.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96, 1348-1360.
Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32, 928-961.
Frank, I.E. and Friedman, J.H. (1993). A statistical view of some chemometrics regression tools. Technometrics 35, 109-135.
Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso.
Biostatistics 9, 432-441.
Huang, J., Liu, N., Pourahmadi, M., and Liu, L. (2006). Covariance matrix selection and estimation via penalized normal likelihood. Biometrika 93(1), 85-98.
Johnstone, I.M. (2001). On the distribution of the largest eigenvalue in principal component analysis. Ann. Statist. 29(2), 295-327.
Kauermann, G. (1996). On a dualization of graphical Gaussian models. Scand. J. Statist. 23, 105-116.
Kerkyacharian, G., Picard, D. and Tribouley, K. (1996). Lp adaptive density estimation. Bernoulli 2, 229-247.
Khare, K. and Rajaratnam, B. (2009). Wishart distributions for covariance graph models. Technical report, Stanford Univ.
Lam, C. and Fan, J. (2008). Sparsistency and rates of convergence in large covariance matrix estimation. Technical report, Princeton Univ.
Lauritzen, S.L. (1996). Graphical Models. Oxford: Clarendon Press.
Ledoit, O. and Wolf, M. (2003). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88, 365-411.
Leng, C., Lin, Y. and Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statist. Sinica 16, 1273-1284.
Li, H. and Gui, J. (2006). Gradient directed regularization for sparse Gaussian concentration graphs, with applications to inference of genetic networks. Biostatistics 7, 302-317.
Marcenko, V.A. and Pastur, L.A. (1967). Distributions of eigenvalues of some sets of random matrices. Math. USSR-Sb. 1, 507-536.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34, 1436-1462.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika 86, 677-690.
Rothman, A.J., Bickel, P.J., Levina, E. and Zhu, J. (2007). Sparse permutation invariant covariance estimation. Technical report No. 467, Dept. of Statistics, Univ. of Michigan.
Rothman, A.J., Levina, E., and Zhu, J. (2010).
A new approach to Cholesky-based covariance regularization in high dimensions. Biometrika. To appear.
Schäfer, J. and Strimmer, K. (2005). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21, 754-764.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461-464.
Shao, J. (1997). An asymptotic theory for linear model selection. Statist. Sin. 7, 221-264.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58, 267-288.
Wang, H. and Leng, C. (2007). Unified LASSO estimation via least squares approximation. J. Am. Statist. Assoc. 102, 1039-1048.
Wang, H., Li, R. and Tsai, C.L. (2007). On the consistency of SCAD tuning parameter selector. Biometrika 94, 553-558.
Wang, H., Li, B. and Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. J. R. Statist. Soc. B 71, 671-683.
Wermuth, N., Cox, D.R. and Marchetti, G.M. (2006). Covariance chains. Bernoulli 12, 841-862.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Chichester: John Wiley and Sons.
Wong, F., Carter, C.K. and Kohn, R. (2003). Efficient estimation of covariance selection models. Biometrika 90, 809-830.
Wu, W.B. and Pourahmadi, M. (2009). Banding sample covariance matrices of stationary processes. Statist. Sinica 19, 1755-1768.
Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94(1), 19-35.
Yuan, M. and Lin, Y. (2007). On the non-negative garrotte estimator. J. Roy. Statist. Soc. Ser. B 69, 143-161.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67, 301-320.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101, 1418-1429.
Zou, H. and Zhang, H.H. (2009). On the adaptive elastic-net with a diverging number of parameters. Ann. Statist.
37(4), 1733-1751.

[...]

... Generally speaking, there are two broad classes of covariance matrix estimators: those that assume the variables are naturally ordered, so that variables far apart in the ordering are only weakly correlated (e.g., longitudinal data, time series, spatial data or spectroscopy), and those invariant to variable permutations, as in genetics and the social sciences. The first class includes banding or tapering the sample covariance ...

... for covariance and precision matrices under the Frobenius norm. There was no comprehensive theoretical framework for Bayesian inference for covariance graphical models until Khare and Rajaratnam (2009). Due to the limitations of Bayesian theory, Khare and Rajaratnam constructed a family of Wishart distributions as the parameter space for the covariance graphical model, instead of the cone of positive definite matrices ...

... quadratic discriminant analysis (LDA and QDA) and the analysis of independence relations in the context of graphical models all need to estimate the covariance matrix. However, the number of parameters in the covariance matrix grows quickly with dimensionality, so high-dimensional data lead to a heavy computational burden. As a result, the sparsity assumption on the covariance matrix ...
... sparsistency for all the estimators presented in their paper. There has also been considerable interest in bidirected covariance graphical models, where the lack of a bidirected edge between two variables indicates a marginal independence. Covariance matrix estimation is a common statistical problem that arises in many scientific applications, such as financial risk assessment and longitudinal studies. Let X = (X1 ...

... entries of the covariance matrix are exactly zero) is frequently imposed to achieve a balance between biases and variances. In this thesis, we propose a penalized likelihood approach to estimate the covariance matrix in order to achieve parsimony in covariance graphical model selection.

1.2 Literature review

1.2.1 Review of penalized approaches

Consider the linear regression model y = Xβ + ε, where y is an n × 1 ...

... proposed a SIN approach that produces conservative simultaneous 1−α confidence intervals, and used these confidence intervals to do model selection in a single step. Best subset selection and the SIN approach improve OLS estimates by providing interpretable models. Recently many statisticians have proposed various penalization methods, which usually shrink estimates to make trade-offs between bias and variance, ...

... paper. Similar to other banding estimators, its low computational cost is very attractive. However, the author did not provide a convergence rate to support the estimator, due to technical difficulties. Wu and Pourahmadi (2009) established a banded estimator for the covariance matrix by banding the sample autocovariance matrix, which is attractive in time series analysis. Let X1, ..., Xn be a realization of a mean ...
... the following marginal independence structure:

1 ↔ 2 ↔ 3 ↔ 4

Figure 2: An example of a covariance graphical model.

Actually, statistical inference for the covariance graphical model selection problem is not well developed. For model selection, in principle, one can employ backward elimination or forward selection. However, it is now well understood that such a process may suffer ...

... imaging and gene expression arrays. Hence, one of the major challenges in modern statistics is to investigate the complex relationships and dependencies existing in data, in order to build a correct model for inference. Covariance or correlation matrix estimation that addresses these relationships attracts a lot of attention due to its ubiquity in data analysis. Principal component analysis (PCA), linear and quadratic discriminant analysis (LDA and QDA), and the analysis of independence relations in the context of graphical models all need an estimate of the covariance matrix ...
