Sparse Dimensionality Reduction Methods: Algorithms and Applications


SPARSE DIMENSIONALITY REDUCTION METHODS: ALGORITHMS AND APPLICATIONS

ZHANG XIAOWEI
(B.Sc., ECNU, China)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF MATHEMATICS
NATIONAL UNIVERSITY OF SINGAPORE
JULY 2013

To my parents

DECLARATION

I hereby declare that this thesis is my original work and has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

Zhang Xiaowei
July 2013

Acknowledgements

First and foremost, I would like to express my deepest gratitude to my supervisor, Associate Professor Chu Delin, for all his guidance, support, kindness and enthusiasm over the past five years of my graduate study at the National University of Singapore. It has been an invaluable privilege to work with him and to learn many wonderful mathematical insights from him. Back in 2008, when I arrived at the National University of Singapore, I knew little about data mining and machine learning. It was Dr. Chu who guided me into these research areas, encouraged me to explore various ideas, and patiently helped me grow as a researcher. It would not have been possible to complete this doctoral thesis without his support. Beyond being an energetic and insightful researcher, he also taught me a great deal about communicating with other people. I feel very fortunate to have been advised by Dr. Chu.

I would like to thank Professor Li-Zhi Liao and Professor Michael K. Ng, both from Hong Kong Baptist University, for their assistance and support in my research. Interactions with them were very constructive and helped me greatly in writing this thesis.

I am greatly indebted to the National University of Singapore for providing me with a full scholarship and an exceptional study environment. I would also like to thank the Department of Mathematics for providing financial support for my attendance at IMECS 2013 in Hong Kong and ICML 2013 in Atlanta. The Centre for Computational Science and Engineering provided the large-scale computing facilities that enabled the numerical experiments in this thesis.

I am also grateful to all my friends and collaborators. Special thanks go to Wang Xiaoyan and Goh Siong Thye, with whom I worked and collaborated closely. With Xiaoyan, I shared the whole experience of being a graduate student, and it was enjoyable to discuss research problems or simply chat about everyday life. Siong Thye is an optimist who taught me a lot about machine learning, and I am more than happy to see that he is continuing his research at MIT and working to become the next expert in his field.

Last but not least, I want to warmly thank my family, my parents, brother and sister, who encouraged me to pursue my passion and supported my study in every possible way over the past five years.

Contents

Acknowledgements
Summary
1  Introduction
   1.1  Curse of Dimensionality
   1.2  Dimensionality Reduction
   1.3  Sparsity and Motivations
   1.4  Structure of Thesis
2  Sparse Linear Discriminant Analysis
   2.1  Overview of LDA and ULDA
   2.2  Characterization of All Solutions of Generalized ULDA
   2.3  Sparse Uncorrelated Linear Discriminant Analysis
        2.3.1  Proposed Formulation
        2.3.2  Accelerated Linearized Bregman Method
        2.3.3  Algorithm for Sparse ULDA
   2.4  Numerical Experiments and Comparison with Existing Algorithms
        2.4.1  Existing Algorithms
        2.4.2  Experimental Settings
        2.4.3  Simulation Study
        2.4.4  Real-World Data
   2.5  Conclusions
3  Canonical Correlation Analysis
   3.1  Background
        3.1.1  Various Formulae for CCA
        3.1.2  Existing Methods for CCA
   3.2  General Solutions of CCA
        3.2.1  Some Supporting Lemmas
        3.2.2  Main Results
   3.3  Equivalent Relationship between CCA and LDA
   3.4  Conclusions
4  Sparse Canonical Correlation Analysis
   4.1  A New Sparse CCA Algorithm
   4.2  Related Work
        4.2.1  Sparse CCA Based on Penalized Matrix Decomposition
        4.2.2  CCA with Elastic Net Regularization
        4.2.3  Sparse CCA for Primal-Dual Data Representation
        4.2.4  Some Remarks
   4.3  Numerical Results
        4.3.1  Synthetic Data
        4.3.2  Gene Expression Data
        4.3.3  Cross-Language Document Retrieval
   4.4  Conclusions
5  Sparse Kernel Canonical Correlation Analysis
   5.1  An Introduction to Kernel Methods
   5.2  Kernel CCA
   5.3  Kernel CCA Versus Least Squares Problem
   5.4  Sparse Kernel Canonical Correlation Analysis
   5.5  Numerical Results
        5.5.1  Experimental Settings
        5.5.2  Synthetic Data
        5.5.3  Cross-Language Document Retrieval
        5.5.4  Content-Based Image Retrieval
   5.6  Conclusions
6  Conclusions
   6.1  Summary of Contributions
   6.2  Future Work
Bibliography
A  Data Sets

Bibliography

[...] IEEE Workshop on Neural Networks for Signal Processing XII, H. Bourlard, T. Adali, S. Bengio, J. Larsen, and S. Douglas, eds., 2002, pp. 757–766.
[96] G. Kowalski and M. Maybury, Information Storage and Retrieval Systems: Theory and Implementation, Kluwer Academic Publishers, Norwell, MA, USA, 2nd ed., 2000.
[97] M. Kuss and T. Graepel, The geometry of kernel canonical correlation analysis, tech. report, Max Planck Institute for Biological Cybernetics, Germany, 2003.
[98] M. Lai and W. Yin, Augmented ℓ1 and nuclear-norm models with a globally linearly convergent algorithm, tech. report, Rice CAAM, 2012.
[99] P. Lai and C. Fyfe, Kernel and nonlinear canonical correlation analysis, International Journal of Neural Systems, 10 (2001), pp. 365–374.
[100] J. Lee and M. Verleysen, Nonlinear Dimensionality Reduction, Springer, 2007.
[101] Q. Mai, H. Zou, and M. Yuan, A direct approach to sparse discriminant analysis in ultra-high dimensions, Biometrika, 99 (2012), pp. 29–42.
[102] T. Melzer, M. Reiter, and H. Bischof, Nonlinear feature extraction using generalized canonical correlation analysis, in Proceedings of the International Conference on Artificial Neural Networks, 2001, pp. 353–360.
[103] L. Merchante, Y. Grandvalet, and G. Govaert, An efficient approach to sparse linear discriminant analysis, in Proceedings of the 29th International Conference on Machine Learning, 2012.
[104] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, Fisher discriminant analysis with kernels, in Neural Networks for Signal Processing IX, IEEE, 1999, pp. 41–48.
[105] B. Moghaddam, Y. Weiss, and S. Avidan, Generalized spectral bounds for sparse LDA, in Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 641–648.
[106] M. Momma, Efficient computations via scalable sparse kernel partial least squares and boosted latent features, in Proceedings of the 11th ACM International Conference on Knowledge Discovery in Data Mining, 2005, pp. 654–659.
[107] M. Momma and K. P. Bennett, Sparse kernel partial least squares regression, in Proceedings of the 16th International Conference on Computational Learning Theory, B. Schölkopf and M. K. Warmuth, eds., 2003, pp. 216–230.
[108] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks, 12 (2001), pp. 181–202.
[109] B. K. Natarajan, Sparse approximation solutions to linear systems, SIAM Journal on Computing, 24 (1995), pp. 227–234.
[110] Y. Nesterov, A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), Doklady Akademii Nauk SSSR, 269 (1983), pp. 543–547.
[111] M. K. Ng, L. Liao, and L. Zhang, On sparse linear discriminant analysis algorithm for high-dimensional data classification, Numerical Linear Algebra with Applications, 18 (2011), pp. 223–236.
[112] S. Osher, Y. Mao, B. Dong, and W. Yin, Fast linearized Bregman iteration for compressive sensing and sparse denoising, Communications in Mathematical Sciences, (2010), pp. 93–111.
[113] C. H. Park and H. Park, A comparison of generalized linear discriminant analysis algorithms, Pattern Recognition, 41 (2008), pp. 1083–1097.
[114] E. Parkhomenko, D. Tritchler, and J. Beyene, Sparse canonical correlation analysis with application to genomic data integration, Statistical Applications in Genetics and Molecular Biology, (2009). Issue 1, Article 1.
[115] P. Rai and H. Daumé III, Multi-label prediction via sparse infinite CCA, in Proceedings of the Conference on Neural Information Processing Systems (NIPS), 2009.
[116] B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, SIAM Review, 52 (2010), pp. 471–501.
[117] R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, 1961.
[118] R. Rosipal and L. J. Trejo, Kernel partial least squares regression in reproducing kernel Hilbert space, Journal of Machine Learning Research, (2001), pp. 97–123.
[119] S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, 290 (2000), pp. 2323–2326.
[120] G. Salton and M. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1986.
[121] L. K. Saul and S. T. Roweis, Think globally, fit locally: Unsupervised learning of low dimensional manifolds, Journal of Machine Learning Research, (2003), pp. 119–155.
[122] B. Schölkopf, S. Mika, C. J. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. J. Smola, Input space versus feature space in kernel-based methods, IEEE Transactions on Neural Networks, 10 (1999), pp. 1000–1017.
[123] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2002.
[124] B. Schölkopf, A. J. Smola, and K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10 (1998), pp. 1299–1319.
[125] S. Shalev-Shwartz, N. Srebro, and T. Zhang, Trading accuracy for sparsity in optimization problems with sparsity constraints, SIAM Journal on Optimization, 20 (2010), pp. 2807–2832.
[126] J. Shao, Y. Wang, X. Deng, and S. Wang, Sparse linear discriminant analysis by thresholding for high-dimensional data, The Annals of Statistics, 39 (2011), pp. 1241–1265.
[127] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[128] H. Shen and J. Z. Huang, Sparse principal component analysis via regularized low rank matrix approximation, Journal of Multivariate Analysis, 99 (2008), pp. 1015–1034.
[129] B. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall/CRC, 1986.
[130] B. K. Sriperumbudur, D. A. Torres, and G. R. G. Lanckriet, Sparse eigen methods by D.C. programming, in The 24th International Conference on Machine Learning, 2007, pp. 831–838.
[131] B. K. Sriperumbudur, D. A. Torres, and G. R. G. Lanckriet, A majorization-minimization approach to the sparse generalized eigenvalue problem, Machine Learning, 85 (2011), pp. 3–39.
[132] N. Subrahmanya and Y. C. Shin, Sparse multiple kernel learning for signal processing applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32 (2010), pp. 788–798.
[133] M. Sugiyama, Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis, Journal of Machine Learning Research, (2007), pp. 1027–1061.
[134] L. Sun, S. Ji, and J. Ye, Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (2011), pp. 194–200.
[135] D. Swets and J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), pp. 831–836.
[136] L. Tan and C. Fyfe, Sparse kernel canonical correlation analysis, in Proceedings of the 9th European Symposium on Artificial Neural Networks, 2001, pp. 335–340.
[137] J. B. Tenenbaum, Mapping a manifold of perceptual observations, in Advances in Neural Information Processing, vol. 10, MIT Press, 1998, pp. 682–688.
[138] R. Tibshirani, Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Series B, 58 (1996), pp. 267–288.
[139] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, Sparsity and smoothness via the fused Lasso, Journal of the Royal Statistical Society, Series B, (2005), pp. 91–108.
[140] M. E. Tipping, Sparse Bayesian learning and the relevance vector machine, Journal of Machine Learning Research, (2001), pp. 211–244.
[141] D. Torres, B. Sriperumbudur, and G. Lanckriet, Finding musically meaningful words by sparse CCA, in NIPS Workshop on Music, the Brain and Cognition, 2007.
[142] J. A. Tropp, Just relax: Convex programming methods for subset selection and sparse approximation, tech. report, Caltech, 2004.
[143] J.-P. Vert and M. Kanehisa, Graph-driven features extraction from microarray data using diffusion kernels and kernel CCA, in Advances in Neural Information Processing Systems, 2003.
[144] A. Vinokourov, J. Shawe-Taylor, and N. Cristianini, Inferring a semantic representation of text via cross-language correlation analysis, in Advances in Neural Information Processing Systems, 2003.
[145] S. Waaijenborg, P. C. de Witt Hamer, and A. H. Zwinderman, Quantifying the association between gene expressions and DNA-markers by penalized canonical correlation analysis, Statistical Applications in Genetics and Molecular Biology, (2008). Issue 1, Article 3.
[146] G. Wahba, Spline Models for Observational Data, vol. 59 of CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia, 1990.
[147] G. Wahba, Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, in Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999, pp. 69–88.
[148] J. A. Wegelin, A survey of partial least squares (PLS) methods, with emphasis on the two-block case, tech. report, University of Washington, 2000.
[149] Z. Wen, W. Yin, D. Goldfarb, and Y. Zhang, A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation, SIAM Journal on Scientific Computing, 32 (2010), pp. 1832–1857.
[150] A. Wiesel, M. Kliger, and A. O. Hero, A greedy approach to sparse canonical correlation analysis, http://arxiv.org/abs/0801.2748, 2008.
[151] D. M. Witten and R. Tibshirani, Extensions of sparse canonical correlation analysis with applications to genomic data, Statistical Applications in Genetics and Molecular Biology, (2009). Issue 1, Article 28.
[152] D. M. Witten and R. Tibshirani, Penalized classification using Fisher's linear discriminant, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73 (2011), pp. 753–772.
[153] D. M. Witten, R. Tibshirani, and T. Hastie, A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics, 10 (2009), pp. 515–534.
[154] K. J. Worsley, J.-B. Poline, K. J. Friston, and A. C. Evans, Characterizing the response of PET and fMRI data using multivariate linear models, NeuroImage, (1997), pp. 305–319.
[155] S. Wright, R. Nowak, and M. Figueiredo, Sparse reconstruction by separable approximation, IEEE Transactions on Signal Processing, 57 (2009), pp. 2479–2493.
[156] M. Wu, B. Schölkopf, and G. Bakır, A direct method for building sparse kernel learning algorithms, Journal of Machine Learning Research, (2006), pp. 603–624.
[157] M. Wu, L. Zhang, Z. Wang, D. Christiani, and X. Lin, Sparse linear discriminant analysis for simultaneous testing for the significance of a gene set/pathway and gene selection, Bioinformatics, 25 (2009), pp. 1145–1151.
[158] Y. Yamanishi, J. P. Vert, A. Nakaya, and M. Kanehisa, Extraction of correlated gene clusters from multiple genomic data by generalized kernel canonical correlation analysis, Bioinformatics, 19 (2003), pp. i323–i330.
[159] J. Yang and Y. Zhang, Alternating direction algorithms for ℓ1-problems in compressive sensing, SIAM Journal on Scientific Computing, 33 (2011), pp. 250–278.
[160] J. Ye, Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems, Journal of Machine Learning Research, (2005), pp. 483–502.
[161] J. Ye, Least squares linear discriminant analysis, in Proceedings of the 24th International Conference on Machine Learning, 2007, pp. 1087–1094.
[162] J. Ye, R. Janardan, V. Cherkassky, T. Xiong, J. Bi, and C. Kambhamettu, Efficient model selection for regularized linear discriminant analysis, in Proceedings of the 15th ACM International Conference on Information and Knowledge Management, 2006.
[163] J. Ye, R. Janardan, Q. Li, and H. Park, Feature extraction via generalized uncorrelated linear discriminant analysis, IEEE Transactions on Knowledge and Data Engineering, 18 (2006), pp. 1312–1321.
[164] J. Ye and Q. Li, A two-stage linear discriminant analysis via QR-decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (2005), pp. 929–941.
[165] J. Ye, T. Li, T. Xiong, and R. Janardan, Using uncorrelated discriminant analysis for tissue classification with gene expression data, IEEE/ACM Transactions on Computational Biology and Bioinformatics, (2004), pp. 181–190.
[166] J. P. Ye and T. Xiong, Computational and theoretical analysis of null space and orthogonal linear discriminant analysis, Journal of Machine Learning Research, (2006), pp. 1183–1204.
[167] W. Yin, Analysis and generalizations of the linearized Bregman method, SIAM Journal on Imaging Sciences, (2010), pp. 856–877.
[168] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman iterative algorithms for ℓ1-minimization with applications to compressed sensing, SIAM Journal on Imaging Sciences, (2008), pp. 143–168.
[169] H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, 34 (2001), pp. 2067–2070.
[170] M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society, Series B, 68 (2006), pp. 49–67.
[171] S. Yun and K.-C. Toh, A coordinate gradient descent method for ℓ1-regularized convex minimization, Computational Optimization and Applications, 48 (2011), pp. 273–307.
[172] W. M. Zheng, L. Zhao, and C. R. Zou, An efficient algorithm to solve the small sample size problem for LDA, Pattern Recognition, 37 (2004), pp. 1077–1079.
[173] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B, 67 (2005), pp. 301–320.
[174] H. Zou, T. Hastie, and R. Tibshirani, Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15 (2006), pp. 265–286.

Appendix A  Data Sets

In the numerical experiments of this thesis, we have used many different types of data to evaluate our algorithms, including gene expression data and image data.
For convenience and future reference, we describe in this appendix how these data sets were obtained, together with some important information about them.

Gene Expression Data

Colon: The colon cancer data set contains expression measurements of 40 tumor and 22 normal colon tissues (k = 2) for 6,500 human genes measured using Affymetrix technology. A selection of the 2,000 genes with the highest minimal intensity across the samples was made and further processed in [44]; the data set is available at http://stat.ethz.ch/~dettling/bagboost.html.

Leukemia: The leukemia data set consists of samples from patients with either acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML), that is, k = 2. The data set can be downloaded from http://stat.ethz.ch/~dettling/bagboost.html.

Prostate: The prostate cancer raw data comprise the expression of 52 prostate tumors and 50 non-tumor prostate samples, obtained using Affymetrix technology. It was processed in [44] and the data set is available at http://stat.ethz.ch/~dettling/bagboost.html.

Lymphoma: Lymphoma is a data set of the three most prevalent adult lymphoid malignancies. The data set is available at http://stat.ethz.ch/~dettling/bagboost.html.

SRBCT: The SRBCT (Small Round Blue Cell Tumor) data set has 2,308 genes and 63 experimental conditions: 8 Burkitt lymphoma (BL), 23 Ewing sarcoma (EWS), 12 neuroblastoma (NB), and 20 rhabdomyosarcoma (RMS). The data set is available at http://stat.ethz.ch/~dettling/bagboost.html.

Brain: The brain tumor data set contains n = 42 microarray gene expression profiles from k = 5 classes of central nervous system tissue, that is, 10 medulloblastomas, 10 malignant gliomas, 10 atypical teratoid/rhabdoid tumors (AT/RTs), and the remaining 12 profiles from primitive neuro-ectodermal tumors (PNETs) and human cerebella. The raw data were generated using Affymetrix technology and the processed data set is available at http://stat.ethz.ch/~dettling/bagboost.html.

Image Data

Yale: The Yale database contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per facial expression or configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. Each image of the Yale64×64 data set was downsampled to 64 × 64 pixels. The Yale face data are available at http://cvc.yale.edu/projects/yalefaces/yalefaces.html.

UMIST: The UMIST Face Database consists of 575 images of 20 individuals of mixed race, gender and appearance. Each individual is shown in a range of poses from profile to frontal views, and the images are numbered consecutively as they were taken. The files are all in PGM format, approximately 220 × 220 pixels with 256 grey levels. In our experiments, each image in UMIST was downsampled to 112 × 92 pixels. The UMIST face data are available at http://www.sheffield.ac.uk/eee/research/iel/research/face.

ORL: The ORL data set contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The files are in PGM format, and the size of each image is 92 × 112 pixels, with 256 grey levels per pixel. The database can be retrieved from http://www.cl.cam.ac.uk/Research/DTG/attarchive:pub/data/attfaces.tar.Z.
In our experiments, each image was resized to 64 × 64 pixels for the ORL64×64 data set.

Palmprint: The Palmprint database is available at http://www4.comp.polyu.edu.hk/~biometrics/. The PolyU Palmprint Database contains 7,752 grayscale images in BMP format, corresponding to 386 different palms. Around twenty samples were collected from each palm in two sessions, with around 10 samples captured in each session. We selected 100 different palms from this database, with samples from each palm drawn from both sessions, and all images were compressed to 64 × 64 pixels.

[...] recognition and microarray data analysis. Besides avoiding the curse of dimensionality, there are many other motivations for us to consider dimensionality reduction. For example, dimensionality reduction can remove redundant and noisy data and avoid data over-fitting, which improves the quality of data and facilitates further processing tasks such as classification and retrieval. [...]

[...] investigate sparse CCA and propose a new sparse CCA algorithm. Besides that, we also obtain a theoretical result showing that ULDA is a special case of CCA. Numerical results with synthetic and real-world data sets validate the efficiency of the proposed methods, and comparison with existing state-of-the-art algorithms shows that our algorithms are competitive. Beyond linear dimensionality reduction methods, we also investigate sparse [...]

[...] This thesis focuses on sparse dimensionality reduction methods, which aim to find optimal mappings that project high-dimensional data into low-dimensional spaces and at the same time incorporate sparsity into the mappings. These methods have many applications, including bioinformatics, text processing and computer vision. One challenge posed by high dimensionality is that, with increasing dimensionality, many [...]

[...] R^d, f(x) is a low-dimensional representation of x. The subject of dimensionality reduction is vast, and it can be grouped into different categories based on different criteria: for example, linear versus non-linear dimensionality reduction techniques, or unsupervised, supervised and semi-supervised techniques. In linear dimensionality reduction, the function f is linear, that is, x_L = f(x) = W [...]

[...] solution and a sparse solution [24, 25, 28, 48, 49]. In this thesis, we address the problem of incorporating sparsity into the transformation matrices of LDA, CCA and kernel CCA via ℓ1-norm minimization or regularization. Although many sparse LDA algorithms [34, 38, 101, 103, 105, 111, 126, 152, 157] and sparse CCA algorithms [71, 114, 145, 150, 151, 153] have been proposed, they are all sequential algorithms, [...] the sparse transformation matrix in (1.1) is computed one column at a time. These sequential algorithms are usually computationally expensive, especially when there are many columns to compute. Moreover, there is no effective way to determine the number of columns l in sequential algorithms. To deal with these problems, we develop new algorithms for sparse LDA and sparse CCA in Chapters 2 and 4, respectively.
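To make the linear case above concrete, here is a minimal NumPy sketch of a sparse linear dimensionality reduction x_L = W^T x. It is only an illustration: the matrix W, its support, and all sizes below are made up for this sketch and are not taken from the thesis. It simply shows what sparsity in the transformation matrix means in practice, namely that each extracted feature depends on only a few original variables.

```python
import numpy as np

rng = np.random.default_rng(0)
d, l, n = 1000, 3, 50              # ambient dimension, reduced dimension, number of samples
X = rng.standard_normal((n, d))    # rows are samples x in R^d

# A hypothetical sparse transformation matrix W (d x l): most rows are exactly
# zero, so each of the l extracted features combines only a few original variables.
W = np.zeros((d, l))
support = rng.choice(d, size=10, replace=False)   # 10 "active" variables, chosen arbitrarily
W[support, :] = rng.standard_normal((10, l))

X_low = X @ W                      # applies x_L = W^T x to every row of X
print(X_low.shape)                 # (50, 3)
print("fraction of nonzeros in W:", np.count_nonzero(W) / W.size)
```

A sequential sparse method would compute the l columns of W one at a time, whereas the algorithms developed in Chapters 2 and 4 compute all columns of the sparse transformation matrix at once.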
Our methods compute all columns of the sparse solutions at one time, and the computed sparse solutions are exact to the accuracy of a specified tolerance. Recently, more and more attention has been drawn to the subject of sparse kernel approaches [15, 156], such as support vector machines [123], the relevance vector machine [140], sparse kernel partial least squares [46, 107], sparse multiple kernel [...]

[...] Chapter 3. Extensive experiments and comparisons with existing state-of-the-art sparse CCA algorithms have been done to demonstrate the efficiency of our sparse CCA algorithm.

• Chapter 5 focuses on designing an efficient algorithm for sparse kernel CCA. We study sparse kernel CCA by utilizing the results on CCA established in Chapter 3, aiming at computing sparse dual transformations and alleviating the over-fitting problem [...]

[...] in handling non-linear data and can successfully find non-linear relationships between two sets of variables. It has also been shown that solutions of both CCA and kernel CCA can be obtained by solving generalized eigenvalue problems [14]. Applications of CCA and kernel CCA can be found in [1, 4, 42, 59, 63, 72, 92, 93, 102, 134, 143, 144, 158].

1.3 Sparsity and Motivations

One major limitation of dimensionality [...] computationally expensive. Sparsity is a highly desirable property both theoretically and computationally, as it can facilitate interpretation and visualization of the extracted features, and a sparse solution is typically less complicated and hence has better generalization ability. In many applications, such as gene expression analysis and medical diagnostics, one can even tolerate a small degradation in performance [...]

[...] the curse of dimensionality. To deal with this problem, many significant dimensionality reduction methods have been proposed. However, one major limitation of these dimensionality reduction techniques [...]
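Since the sparse ULDA algorithm of Chapter 2 is built on an accelerated linearized Bregman method (Section 2.3.2; see also [112], [167], [168]), a rough sketch of the basic, non-accelerated linearized Bregman iteration for min ||x||_1 subject to Ax = b may help fix ideas. This is only an illustration under assumed parameter choices (the shrinkage parameter mu, the step size delta and the iteration count below are arbitrary), not the accelerated, matrix-valued algorithm used in the thesis.

```python
import numpy as np

def shrink(v, mu):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def linearized_bregman(A, b, mu=5.0, n_iter=5000):
    """Basic linearized Bregman iteration for min ||x||_1 s.t. Ax = b (illustrative sketch)."""
    m, n = A.shape
    delta = 1.0 / np.linalg.norm(A @ A.T, 2)   # conservative step size
    x, v = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        v += A.T @ (b - A @ x)       # accumulate the residual in the auxiliary variable
        x = delta * shrink(v, mu)    # shrinkage keeps the iterate sparse
    return x

# Toy compressed-sensing style problem: recover a sparse x0 from b = A @ x0.
rng = np.random.default_rng(1)
m, n, k = 60, 200, 8
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
b = A @ x0

x_hat = linearized_bregman(A, b)
print("nonzeros in estimate:", np.count_nonzero(np.abs(x_hat) > 1e-3))
print("relative error:", np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))
```

The plain iteration above can be slow to leave its initial stagnation phase, which is what acceleration techniques such as the one adopted in Section 2.3.2, or the "kicking" strategy of [112], are designed to address.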
