Identifying and characterizing cis-regulatory elements in the human genome


IDENTIFICATION AND CHARACTERIZATION OF CONSERVED CIS-REGULATORY ELEMENTS IN THE HUMAN GENOME

ALISON P. LEE
B. Computing (Computer Science) (Honours)
National University of Singapore, 2004

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
INSTITUTE OF MOLECULAR AND CELL BIOLOGY & NUS GRADUATE SCHOOL FOR INTEGRATIVE SCIENCES AND ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2009

Acknowledgements

There are several people I would like to acknowledge:

Firstly, my advisor Professor Byrappa Venkatesh for being an excellent and patient mentor, and for devoting so much time and energy to helping me improve my reasoning, writing and presentation skills;
Dr Sydney Brenner and Dr Ng Huck Hui (GIS, Singapore), members of my thesis advisory committee, for their scientific advice;
Dr Alice Tay for her constant encouragement and discussions at work and conferences, and her work on the fugu Hox clusters project;
Esther Koh for her work on the fugu Hox clusters project and Yang Yuchen for his work on the TFCONES project;
Gene Yeo, Eddie Loh, Nidhi Dandona for showing me the ropes in bioinformatics;
Luo Ming, Wang Jianli, Kevin Lam for useful software/hardware discussions and troubleshooting;
Krish Jon Mathavan, Elizabeth Yeoh, Tay Boon Hui and Sumanty Tohari for guidance in laboratory techniques;
Patrick Gilligan and Ravi Vydianathan for their valuable comments on manuscripts;
Data Storage and Cluster Computing teams at Bioinformatics Institute, led by Lai Loong Fong and Stephen Wong respectively, for speedy response to my requests for server fixes or software and database updates;
Arun Kumar, Zhao Zhiyang, Peng Huiling, and Chua Yiwen at A*STAR Biological Resource Centre for their skilled work in DNA microinjections and mouse husbandry;
The members of IMCB Histopathology Unit, especially Keith Rogers and Susan Rogers for histology tips, protocols and equipment;
The staff of A*STAR Graduate Academy and NUS Graduate School for Integrative Sciences and Engineering for the handling of administrative affairs;
All present and former members of the Comparative Genomics Laboratory and DNA Sequencing Facility in IMCB, Singapore for making the laboratory a conducive and pleasant place to work in;
Finally, my greatest appreciation goes to my parents and my brother for their strongest support.

Table of contents

Acknowledgements
Table of contents
Summary
List of tables
List of figures
List of abbreviations
Chapter 1: Introduction
1.1 The Human Genome Project
1.2 Cis-regulatory elements
1.3 Methods to identify cis-regulatory elements
1.4 Comparative genomics approach for identifying cis-regulatory elements
1.5 Transcription factors (TFs)
1.6 Objectives of my work
Chapter 2: Materials and methods
2.1 Identifying human, mouse and fugu TF-encoding genes
2.2 Identifying CNEs in TF-encoding gene loci
2.3 Computing CNE statistics of human TF-encoding genes
2.4 Enrichment analysis of CNEs within experimentally defined TFBS
2.5 Gene Ontology enrichment analysis of human TF-encoding genes
2.6 Expression analysis of human TF-encoding genes
2.7 Motif finding
2.8 Predicting TFBS in human-fugu CNEs
2.9 Building and implementing the TFCONES database
2.10 Sequencing and annotating the fugu Hox gene clusters
2.11 Functional assay of CNEs in transgenic mice
Chapter 3: Results – CNEs in transcription factor-encoding genes
3.1 TF-encoding genes in human, mouse and fugu
3.2 Conserved clusters of TF-encoding genes in human, mouse and fugu
3.3 Identification of human-mouse and human-fugu CNEs
3.4 Distribution pattern of human-fugu CNEs in the introns and flanking regions of TF-encoding genes
3.5 Presence of experimentally verified TFBS within human-fugu CNEs
3.6 Functional categories of human TF-encoding genes associated with human-fugu CNEs
3.7 Association between human-fugu CNEs and expression profiles of TF-encoding genes
3.8 Over-represented motifs in the human-fugu CNEs of central nervous system-expressing TF-encoding genes
3.9 Database of TFs and CNEs associated with TF-encoding genes
3.10 Discussion
Chapter 4: Results – CNEs in the Hox gene clusters
4.1 The Hox gene clusters
4.2 Fugu Hox loci and conserved syntenic blocks
4.3 CNEs in the Hox loci
4.4 Distribution of CNEs in fugu Hox loci
4.5 CNEs in the human Hox loci
4.6 Discussion
Chapter 5: Results – Functional assay of CNEs associated with a representative TF-encoding gene
5.1 Introduction
5.2 Functions and expression pattern of Lhx2 gene
5.3 Organization of the Lhx2 locus in vertebrates
5.4 CNEs in the Lhx2 gene locus
5.5 Expression patterns of Lhx2, Crb2 and Dennd1a in developing mouse embryos
5.6 Functional assay of Lhx2 CNEs in E11.5 transgenic mouse embryos
5.7 Site-directed mutagenesis of a predicted motif in Lhx2_CNE2/3
5.8 Discussion
Chapter 6: Discussion
Bibliography
Annexes
List of my publications

Summary

Comparative genomics is a powerful approach for identifying conserved cis-regulatory elements in the human genome. Since functional sequences evolve slower than the surrounding neutrally evolving regions, cis-regulatory elements can be identified as conserved noncoding elements (CNEs) in comparisons of human and other vertebrate genomes. In particular, comparison of human with distantly related vertebrates such as fishes increases the likelihood that most predicted CNEs are functional sequences. The objectives of my project were to identify all the CNEs associated with transcription factor (TF)-encoding genes in the human genome by comparison with pufferfish (fugu) sequences, analyze the characteristics of the CNEs and assay the functions of selected CNEs in transgenic mice.
I started by building a curated database of all TF-encoding genes in human, mouse and fugu, and predicted CNEs (≥ 65% identity over 50 bp) associated with orthologous genes by locus-by-locus global alignments. In total, 1,738 human, 1,495 mouse, and 1,762 fugu TF-encoding genes were identified, with 1,145 genes having orthologs in all three genomes. Further analyses focused on the set of 816 DNA-binding TF-encoding orthologous genes. A total of 2,843 human-fugu CNEs (total length ~388 kb) were found to be associated with these TF-encoding genes. An online database, TFCONES (Transcription Factor Genes & Associated COnserved Noncoding ElementS; http://tfcones.fugu-sg.org/), was constructed to catalog the human, mouse and fugu TF-encoding genes together with their associated CNEs. The TFCONES database would be useful to researchers interested in studying the regulation of TF-encoding genes and understanding gene regulatory networks in vertebrates.

The human-fugu CNEs identified showed a significant overlap with experimentally verified transcription factor binding sites (TFBS) of known transcriptional activators and repressors, confirming that some CNEs function as transcriptional enhancers or silencers. In addition, functional enrichment analyses indicated that the CNEs are significantly associated with TF-encoding genes that are involved in regulating development, particularly the development of the central nervous system. Furthermore, expression profiling based on publicly available expression data showed that genes expressed most highly in central nervous system tissues are enriched with human-fugu CNEs. Motif discovery within the CNEs of TF-encoding genes expressed most highly in the central nervous system revealed four 8-mer motifs that are likely to be involved in transcriptional enhancer activity in the central nervous system.
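The CNE criterion quoted in this summary (≥ 65% identity over 50 bp) can be illustrated with a minimal sliding-window sketch. It is not the pipeline used in the thesis: it assumes a pairwise global alignment is already available as two equal-length gapped strings, the function name `conserved_windows` is hypothetical, and counting gap columns as mismatches is an assumption made only for illustration.

```python
from typing import List, Tuple


def conserved_windows(aln_a: str, aln_b: str,
                      window: int = 50,
                      min_identity: float = 0.65) -> List[Tuple[int, int]]:
    """Return merged [start, end) alignment-column intervals covered by
    windows whose fraction of identical columns is >= min_identity.

    Assumption (illustrative only): a column containing a gap ('-')
    counts as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < window:
        return []

    # 1 where the column holds the same non-gap character in both rows.
    match = [int(a == b and a != "-")
             for a, b in zip(aln_a.upper(), aln_b.upper())]

    intervals: List[Tuple[int, int]] = []
    count = sum(match[:window])  # identical columns in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide one column: add the new right end, drop the old left end.
            count += match[start + window - 1] - match[start - 1]
        if count >= min_identity * window:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals
```

With the default parameters, two identical 60-column rows give the single interval `(0, 60)`, whereas a 50-column alignment with only 32 identical columns (64%) gives no interval. In practice the resulting intervals would still need to be filtered against coding exons and repeats before being called noncoding elements; that step is outside this sketch.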
To verify the functions of CNEs and the motifs, I assayed the CNEs of a representative TF-encoding gene, the LIM homeobox gene Lhx2, in transgenic mice. Four out of eight CNEs tested demonstrated enhancer activity by recapitulating Lhx2 expression in the midbrain and hindbrain at embryonic day 11.5. Mutagenesis of a predicted motif in a selected CNE abolished gene expression in the neural tube and dorsal root ganglia, demonstrating that the motif is indeed critical for enhancer activity.

List of tables

Table 1. Fugu scaffolds (assembly v3.0) for the seven Hox loci
Table 2. Primer sequences of the eight Lhx2 constructs
Table 3. TF-encoding genes in human, mouse and fugu genomes
Table 4. Conserved clusters of human, mouse and fugu TF-encoding genes
Table 5. Top twenty TF-encoding genes associated with the highest density of human-fugu CNEs in human
Table 6. Top twenty TF-encoding genes associated with the highest density of human-fugu CNEs in fugu
Table 7. Top twenty TF-encoding genes associated with the highest number and total length of human-fugu CNEs
Table 8. Location of human-fugu CNEs in relation to the protein-coding sequence of nearest human TF-encoding genes
Table 9. Significantly over-represented and under-represented Gene Ontology (GO) terms (P < 0.01) of CNE-associated human TF-encoding genes
Table 10. Over-represented 8-mer motifs in human-fugu CNEs of cluster #3 human TF-encoding genes
Table 11. Conserved syntenic fragments at the fugu and human Hox loci
Table 12. Human-fugu CNEs in the fugu Hox loci
Table 13. Human-fugu CNEs in the human Hox loci
Table 14. Details of Lhx2 CNE constructs tested in transgenic mice
Table 15. Summary of the expression patterns directed by the CNEs tested

List of figures

Figure 1. A flowchart of protocols used for identifying human, mouse and fugu TFs, and human-mouse, human-fugu CNEs
Figure 2. Distribution of lengths of human-mouse and human-fugu CNEs
Figure 3. Plot of total length of CNEs associated with DNA-binding TF-encoding genes against the length of non-repetitive noncoding sequence (in kilobases; kb) of human or fugu gene locus
Figure 4. Plot of CNE density against the total length of CNEs associated with DNA-binding TF-encoding genes
Figure 5. CNEs in the MEIS2 gene locus
Figure 6. Distribution of UTR-intronic and internal-intronic human-fugu CNEs
Figure 7. TF-encoding genes predominantly expressed in the central nervous system are enriched with CNEs
Figure 8. List of human-fugu CNEs associated with a representative TF-encoding gene FOXA2
Figure 9. An image of the location of CNEs relative to the associated TF-encoding gene FOXA2
Figure 10. Conserved syntenic blocks at the fugu and human Hox loci
Figure 11. VISTA plot of the MLAGAN alignment of fugu HoxAa locus with human HoxA and mouse HoxA loci
Figure 12. VISTA plot of the MLAGAN alignment of fugu HoxDa locus with human HoxD and mouse HoxD loci
Figure 13. Profiles of CNEs at the human HoxA, HoxB, HoxC, and HoxD loci
Figure 14. Lhx2 gene loci in human, mouse, chicken and fugu
Figure 15. Syntenic genes surrounding Lhx2 in human, mouse and fugu
Figure 16. CNEs in the Lhx2 locus
Figure 17. Ten CNEs associated with the human LHX2 gene locus are located within the introns of the upstream gene DENND1A
Figure 18. Expression of Lhx2, Crb2, Dennd1a in E11.5 mouse embryos

Annex 13. Human-mouse-fugu alignment and predicted TFBS of Lhx2_CNE5. TFBS on the forward strand are shown in blue and TFBS on the reverse strand are shown in red below the human-mouse-fugu sequence alignment. The binding TF and position weight matrix similarity score are listed next to each predicted site.
Annex 14. Human-mouse-fugu alignment and predicted TFBS of Lhx2_CNE6. TFBS on the forward strand are shown in blue and TFBS on the reverse strand are shown in red below the human-mouse-fugu sequence alignment. The binding TF and position weight matrix similarity score are listed next to each predicted site. Sequence enclosed in blue box is an instance of a motif over-represented in the CNEs of central nervous system-expressed TF-encoding genes. Motif numbers are listed and described in Table 10. Motif #1
Annex 15. Human-mouse-fugu alignment and predicted TFBS of Lhx2_CNE7. TFBS on the forward strand are shown in blue and TFBS on the reverse strand are shown in red below the human-mouse-fugu sequence alignment. The binding TF and position weight matrix similarity score are listed next to each predicted site.
Annex 16. Human-mouse-fugu alignment and predicted TFBS of Lhx2_CNE8. TFBS on the forward strand are shown in blue and TFBS on the reverse strand are shown in red below the human-mouse-fugu sequence alignment. The binding TF and position weight matrix similarity score are listed next to each predicted site.
Annex 17. Human-mouse-fugu alignment and predicted TFBS of Lhx2_CNE9. TFBS on the forward strand are shown in blue and TFBS on the reverse strand are shown in red below the human-mouse-fugu sequence alignment. The binding TF and position weight matrix similarity score are listed next to each predicted site.
Annex 18. Human-mouse-fugu alignment and predicted TFBS of Lhx2_CNE10. TFBS on the forward strand are shown in blue and TFBS on the reverse strand are shown in red below the human-mouse-fugu sequence alignment. The binding TF and position weight matrix similarity score are listed next to each predicted site.
Annex 19. Lhx2_CNE1 does not act as a transcriptional enhancer at E11.5. Ventral, lateral and dorsal views of three transgenic embryos of Lhx2_CNE1hsp68Prom-lacZ construct.
(A) lacZ expresses almost ubiquitously in the dorsoposterior regions of the embryo, and only in the ventral regions of the forebrain, midbrain and hindbrain. (B) lacZ expression is detected in the hindbrain, dorsal root ganglia (red arrows) and neural tube. (C) Barely noticeable lacZ expression in dorsal root ganglia (red arrows). Scale bar denotes mm in length.
Annex 20. Lhx2_CNE2/3 directs reporter expression in neural tube and dorsal root ganglia at E11.5. Ventral, lateral and dorsal views of three transgenic embryos of Lhx2_CNE2/3hsp68Prom-lacZ construct. (A) lacZ expresses strongly in the hindbrain, neural tube and dorsal root ganglia. (B) lacZ expression in the neural tube, dorsal root ganglia and also the dorsal medial regions of the telencephalon, diencephalon and midbrain. (C) lacZ expression in the neural tube and dorsal root ganglia. Scale bar denotes mm in length.
Annex 21. Lhx2_CNE5/6 directs neural tube expression at E11.5. Ventral, lateral and dorsal views of three transgenic embryos of Lhx2_CNE5/6hsp68Prom-lacZ construct. (A, B) lacZ expression in the neural tube, extending into the ventral region of the hindbrain. (C) lacZ expression occurs throughout the embryo, including the ventral hindbrain and neural tube. (D) Extensive lacZ expression in the entire head and in the dorsal part of the embryo. Scale bar denotes mm in length.
Annex 22. Lhx2_CNE7 directs lacZ expression in the hindbrain and neural tube at E11.5. Ventral, lateral and dorsal views of two transgenic embryos of Lhx2_CNE7hsp68Prom-lacZ construct. (A) lacZ expression in the hindbrain and neural tube. (B) lacZ expression is observed not only in the hindbrain and neural tube, but also ectopically in the liver and stomach. Scale bar denotes mm in length.
Annex 23. Lhx2_CNE8 does not act as an enhancer at E11.5. Ventral, lateral and dorsal views of two transgenic embryos of Lhx2_CNE8hsp68Prom-lacZ construct.
(A, B) Both embryos exhibited ectopic lacZ expression in various anatomical structures with no reproducible similarities. Scale bar denotes mm in length.
Annex 24. Lhx2_CNE9 does not act as an enhancer at E11.5. Ventral, lateral and dorsal views of all four transgenic embryos of Lhx2_CNE9hsp68Prom-lacZ construct that exhibit lacZ expression. (A – D) All embryos exhibited ectopic lacZ expression in various anatomical structures with no reproducible similarities. Scale bar denotes mm in length.
Annex 25. Lhx2_CNE10 directs reporter gene expression in the midbrain, hindbrain and neural tube at E11.5. Ventral, lateral and dorsal views of two transgenic embryos of Lhx2_CNE10hsp68Prom-lacZ construct that exhibit lacZ expression. (A, B) lacZ expression was observed in the midbrain, hindbrain and neural tube for both embryos with additional ectopic expression in the eye for (A) and additional expression in the heart for (B). Scale bar denotes mm in length.

List of my publications

In addition to the work presented in this thesis, I was also involved in the identification and characterization of CNEs associated with the human and elephant shark genomes, which are described in more detail in publications #2 and #3.

1. Lee, A.P., Koh, E.G., Tay, A., Brenner, S., and Venkatesh, B. 2006. Highly conserved syntenic blocks at the vertebrate Hox loci and conserved regulatory elements within and outside Hox gene clusters. Proc Natl Acad Sci U S A 103: 6994-6999.
2. Venkatesh, B., Kirkness, E.F., Loh, Y.H., Halpern, A.L., Lee, A.P., Johnson, J., Dandona, N., Viswanathan, L.D., Tay, A., Venter, J.C. et al. 2006. Ancient noncoding elements conserved in the human genome. Science 314: 1892.
3. Venkatesh, B., Kirkness, E.F., Loh, Y.H., Halpern, A.L., Lee, A.P., Johnson, J., Dandona, N., Viswanathan, L.D., Tay, A., Venter, J.C. et al. 2007. Survey Sequencing and Comparative Analysis of the Elephant Shark (Callorhinchus milii) Genome. PLoS Biol 5: e101.
4. Lee, A.P., Yang, Y., Brenner, S., and Venkatesh, B. 2007. TFCONES: a database of vertebrate transcription factor-encoding genes and their associated conserved noncoding elements. BMC Genomics 8: 441. [“Highly Accessed”]
5. Lee, A.P. and Venkatesh, B. 2008. Ultraconserved DNA Sequence Elements in the Human Genome. In Encyclopedia of Life Sciences. John Wiley & Sons, Ltd; Chichester.
6. Wang, J.*, Lee, A.P.*, Kodzius, R., Brenner, S., and Venkatesh, B. 2009. Large number of ultraconserved elements were already present in the jawed vertebrate ancestor. Mol Biol Evol (MBE Advance Access published online on December 3, 2008) (* Co-first author)

[...]... vertebrate genome for identifying genes and other functional elements in the human genome (Brenner et al. 1993) and it was the second vertebrate genome to be fully sequenced (Aparicio et al. 2002), the first being the human genome. Fugu is a particularly attractive model for identifying conserved cis-regulatory elements in the human genome due to its compact genome size (reducing noise in in silico predictions) ...

... elements. Hence, their identification and verification remain a challenge. The main focus of my project is to identify and verify functional cis-regulatory elements in the human genome.

1.2 Cis-regulatory elements

Cis-regulatory elements are sequences that regulate the precise spatial and temporal expression of the target gene. While the genomic content in every cell in the body is largely the same, important ...

... carried out at the one-cell stage so that ideally, all the cells in the resulting organism contain a copy of the transgene in their genomes. However, mosaicism arises if cell divisions begin before the transgene can be integrated into the genome, especially in the rapidly dividing zebrafish embryo. This means that insertion of the transgene into the host genome occurs in only a subset of all cells and reporter ...
... chromatin position effects and to ensure sustained expression of transgenes introduced into the human genome during gene therapy (Recillas-Targa et al. 2004). Hence, cis-regulatory elements have become useful in understanding and correcting genetic disorders. Furthermore, the identification of cis-regulatory elements and mutations that may occur in them has implications for the evolutionary processes ...

... conserved in human, mouse and several other mammals, and one CNE was determined to regulate the expression of cytokine genes interleukin-4, interleukin-13 and interleukin-5 located in the same genomic interval. While human-mouse sequence comparisons can be particularly useful for finding mammalian-specific cis-regulatory elements, this approach tends to identify many false positives due to the relatively ...

... cis-regulatory elements are required to cover extensive stretches of the genome and both strands of the DNA. The challenge is not only to identify a cis-regulatory element, but also to dissect the core sequences of a cis-regulatory element that are crucial for its function and to determine their mode of action. The methods that have been used to date in identifying and characterizing cis-regulatory elements ...

... et al. 2008). The role of FRG1 in FSHD is still unknown. The above examples of cis-regulatory mutations that are associated with genetic diseases underscore the importance of cis-regulatory elements in maintaining the correct spatio-temporal expression of crucial genes. Besides gleaning new insights into genetic diseases from cis-regulatory elements, clinicians have begun to use cis-regulatory elements to ...
... identify cis-regulatory elements

Cis-regulatory elements such as transcriptional enhancers lack a well-defined structure similar to that of protein-coding genes. They are typically composed of clusters of binding sites for several different transcription factors and there is very limited information on how these binding sites are arranged within the cis-regulatory element. Transcription factor binding sites ...

... noncoding functional elements poses a major challenge. Comparisons of the human and mouse genomes have revealed that approximately 5% of the human genome is under purifying selection and represents functional sequences (Waterston et al. 2002). Of this 5%, 1.5% is accounted for by protein-coding genes and the remaining 3.5% is functional noncoding sequence. Functional noncoding elements in the human genome include noncoding RNA genes, cis-regulatory elements involved in transcriptional ...

... polymerase chain reaction
TF    transcription factor
TFBS  transcription factor binding site
TSS   transcription start site
UTR   untranslated region (of an mRNA)

Chapter 1: Introduction

1.1 The Human Genome Project

A major quest in biology is to understand the function and regulation of all the human genes and their role in human biology and diseases. The Human Genome Project launched in 1990 was the first ...

Uploaded: 14/09/2015, 14:08
