REVIEW ARTICLE

From functional genomics to systems biology
Meeting report based on the presentations at the 3rd EMBL Biennial Symposium 2006 (Heidelberg, Germany)

Sergii Ivakhno
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, UK

Keywords: DNA microarray medical applications; functional genomics; genetic interaction networks; networks biology; signalling networks; systems biology

Correspondence: S. Ivakhno, Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, E4, 5 Forrest Hill, Edinburgh EH1 2QL, UK
Fax: +44 (0) 131 6506899
Tel: +44 (0) 131 6676000, ext. 0131 6684266
E-mail: s0567096@sms.ed.ac.uk

(Received 30 January 2007, revised 1 March 2007, accepted 12 March 2007)
doi:10.1111/j.1742-4658.2007.05794.x
FEBS Journal 274 (2007) 2439–2448 © 2007 The Author. Journal compilation © 2007 FEBS

This review discusses the talks presented at the third EMBL Biennial Symposium, From functional genomics to systems biology, held in Heidelberg, Germany, 14–17 October 2006. Current issues and trends in various subfields of functional genomics and systems biology are considered, including analysis of regulatory elements, signalling networks, transcription networks, protein–protein interaction networks, genetic interaction networks, medical applications of DNA microarrays, and metagenomics. Several technological advances in the fields of DNA microarrays, identification of regulatory elements in the genomes of higher eukaryotes, and MS for detection of protein interactions are introduced. Major directions of future systems biology research are also discussed.

Abbreviations: RNAi, RNA interference; SGA, synthetic genetic array; TF, transcription factor; Y1H, yeast one-hybrid; Y2H, yeast two-hybrid.

Introduction

The third EMBL Biennial Symposium, From functional genomics to systems biology, was held in Heidelberg, Germany, 14–17 October 2006. The title of the conference clearly states the major challenges and issues that were addressed by the speakers: how to combine different ‘omics’ technologies and bioinformatics/computational methodologies to address increasingly complex biological questions. The main conference was divided into five separate sessions, which discussed different functional genomic approaches in systems biology: (a) A global view of transcriptional regulation, (b) Genomics of development and disease, (c) Protein–protein interaction networks and beyond, (d) Towards functional interaction networks, (e) Systems level analysis: from organisms to communities. Table 1 gives a broad overview of topics presented at the meeting according to the systems biology applications, types of high-throughput techniques, and biological networks.

From the sheer number of various high-throughput genomic approaches described at the meeting, it becomes clear that ‘postgenome’ science has already entered the most exciting period of analyzing biological functions at the systems-wide level. Chromatin immunoprecipitation arrays (chip-on-chip), tiling arrays, DNA microarrays, synthetic genetic arrays, high-content fluorescent microscopy, protein microarrays, RNA interference (RNAi) screens, and high-throughput metagenomic sequencing are some of the technologies discussed by the speakers. Computational methods and algorithms were also an integral part of the conference, with various systems biology applications of machine learning, algorithmic network theory, differential equation modelling, and simulation being introduced. In the following, I will discuss some of the talks representing different areas of functional genomics, networks and systems biology.

Analysis of regulatory elements in the genomes of higher eukaryotes

The first session of the conference began with a talk by E.
Birney from the European Bioinformatics Institute (Hinxton, Cambridgeshire, UK). Birney described recent efforts of the ENCODE project (Encyclopaedia of DNA Elements), a multi-institutional collaboration supported by the NIH and the Wellcome Trust that attempts to map all functional elements in the human genome: promoters, enhancers, repressors/silencers, exons, origins of replication, sites of replication termination, transcription factor (TF)-binding sites, methylation sites, deoxyribonuclease I-hypersensitive sites, chromatin modifications, and multispecies conserved sequences of as yet unknown function [1]. The pilot phase, which began in September 2003, is less ambitious and targets 44 uniformly distributed regions that comprise 1% of the genome. Birney’s talk emphasized the problem of mapping TF-binding sites and other elements that regulate transcription. Standardization of the protocols and comparison of different techniques was one of the major challenges encountered in the pilot phase. Another big problem concerned the annotation of the transcription regulatory elements. In contrast with genomes of simple eukaryotes such as yeast, in which the regulatory elements occur upstream of the genes that they regulate, in the human genome they are widely dispersed and occur between genes and within introns, making them very hard to map. This will probably be the next big challenge for computational biologists, who need to develop new algorithms for detecting regulatory elements with varying position in the genome. Comparative genomic approaches previously gave the best results for finding regulatory elements in eukaryotes [2]; however, additional developments will be required to detect elements dispersed throughout the genome.

L.
Steinmetz from EMBL (Heidelberg, Germany) described the application of tiling arrays for detection of new transcripts and refinement of boundary, structure, and expression level of coding and noncoding transcripts in the yeast genome [3]. Although the concept of using tiling arrays and gene expression to find functional transcribed elements is not new (a review of the topic can be found in [4]), Steinmetz’s group and collaborators developed a new and more sensitive oligonucleotide array which contains 6.5 million probes and interrogates both strands of the full genomic sequence (achieving 8-nucleotide resolution for double-stranded targets). Significant expression above background was detected for 5104 ORFs (90%) during exponential growth in rich medium. Remarkably, 16% of the transcribed base pairs had not been annotated before, which is rather surprising considering more than 10 years of intensive analysis of the yeast genome.

Table 1. Overview of the topics covered in the meeting report according to systems biology applications, types of high-throughput techniques, and biological networks.

| Type of biological network/area of functional genomics | High-throughput functional genomic techniques | Systems biology applications |
| --- | --- | --- |
| Cancer diagnosis | DNA microarray | Head and neck cancer [9,10], leukaemia AmpliChip |
| Analysis of regulatory elements | Tiling arrays, chromosome conformation capture | ENCODE project [1], expression in the yeast genome [3], analysis of globin locus enhancers [5] |
| Transcription regulatory networks | Chip-on-chip, Y1H system, DNA microarray | Transcription regulatory network during muscle development in Drosophila [19] |
| Epistasis gene networks | High-throughput RNAi screens [23,24] | Wnt pathway [25] |
| Genetic interaction networks | Yeast SGA [29,31], epistatic mini-array profiles [35] | Yeast genetic interaction network |
| Chemical–genetic interaction networks | Yeast SGA [31,32] | Yeast chemical–genetic interaction networks |
| Protein interaction networks | Y2H screens, MS-based analysis of protein complexes [39] | Coverage and false positives in protein interaction networks [37] |
| Signalling networks | ODE modelling | Epigenetic inheritance of gene-expression dynamics in single cells [48] |
| Networks of networks: metagenomics | High-throughput DNA sequencing | Bacterial communities [50] |

As already mentioned, in many cases regulatory elements are located at distances up to several megabases from their target genes, in which case control of gene expression cannot be mediated through direct physical interaction between genes and their regulatory elements. The development of techniques to detect long-distance interactions was the topic of J. Dekker’s talk from the University of Massachusetts Medical School (Boston, MA, USA). He described the chromosome conformation capture methodology which uses formaldehyde cross-linking to covalently link interacting chromatin segments in intact cells [5]. Cross-linked chromatin is then solubilized and digested with an appropriate restriction enzyme, which is then followed by intramolecular ligation of cross-linked fragments. The resulting template therefore contains a large collection of ligation products that reflect interaction between two genomic loci and can be detected by quantitative PCR using specific primers. The abundance of each ligation product can be used in a quantitative manner to measure the frequency with which the two loci in the genome interact with each other. Dekker’s group applied this technique to the analysis of globin locus enhancers and showed that chromosome conformation capture has a similar or better sensitivity than the chip-on-chip approach.
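To make the quantitative step concrete, the sketch below turns qPCR signals for ligation products into relative interaction frequencies. It is only an illustration under stated assumptions: the normalization against a control template in which all ligation products are taken to be equally abundant, the function name `interaction_frequency`, and all signal values are hypothetical and are not taken from the talk or from [5].

```python
def interaction_frequency(signal_3c, signal_control):
    """Relative interaction frequency for one pair of loci.

    signal_3c: qPCR signal of the ligation product in the cross-linked template.
    signal_control: signal of the same product in a control template in which
    all ligation products are assumed to be present in equal amounts.
    """
    if signal_control <= 0:
        raise ValueError("control signal must be positive")
    return signal_3c / signal_control


# Hypothetical qPCR signals (arbitrary units): fragment -> (3C template, control template).
measurements = {
    "fragment_A": (12.0, 40.0),
    "fragment_B": (30.0, 50.0),
    "fragment_C": (4.0, 40.0),
}

# Normalize every ligation product, then pick the fragment that contacts
# the element of interest most frequently.
frequencies = {
    name: interaction_frequency(signal, control)
    for name, (signal, control) in measurements.items()
}
strongest = max(frequencies, key=frequencies.get)
```

Dividing by a control signal is one simple way to correct for differences in primer efficiency between ligation products; other normalization schemes are possible.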
The advantage of chromosome conformation capture is that it can detect regulatory elements that are active only in a particular cellular state, developmental stage or cell type.

Functional genomics approaches to diagnosis of diseases

One of the goals of systems biology and high-throughput functional genomics is to develop better diagnostic tools that would allow adoption of personalized medicine approaches in clinical settings [6]. Medical applications of systems biology and functional genomics were widely discussed at the conference, with several talks devoted to the use of DNA microarrays for cancer diagnosis and prognosis. For instance, leukaemia comprises more than 20 subgroups, which may require different approaches for successful treatment. Currently, the diagnosis and classification of leukaemia rely on the simultaneous application of multiple techniques, such as cytomorphology, histomorphology, cytochemistry and multiparameter flow cytometry, often supplemented by fluorescence in situ hybridization and molecular techniques, such as PCR. These high-cost and time-consuming approaches have encouraged the development of more effective diagnostic techniques. The use of DNA microarrays for cancer diagnosis was proposed more than 10 years ago, yet not a single microarray diagnostic kit has been approved by the FDA.

One of the key challenges in using DNA microarrays for cancer diagnosis is the reproducibility of signature genes characterized by different groups [7,8]. This issue was addressed by F. Holstege from the Genomics Laboratory (UMC Utrecht, The Netherlands) in his talk on signatures for detection of lymph node metastasis in patients with head and neck cancer. It can often be very difficult to detect lymph node metastases reliably, but their early detection is crucial for the appropriate treatment.
Using DNA microarrays, Holstege’s group and collaborators built a 102-gene classifier from 82 tumours, which outperformed current clinical diagnosis techniques in its predictive accuracy when independently validated [9]. However, further examination revealed that, when the oldest tumour samples were excluded, the predictive accuracy remained high but the overlap between the two signature gene sets was limited to 49 genes [10]. This is a typical example of the kind of result that has led many researchers to question the validity of DNA microarray approaches for cancer classification [11]. Holstege proposed an alternative explanation for such a discrepancy: incomplete overlap may be caused by the presence of a large number of genes with similar patterns of expression across samples. This suggests that many predictive genes can be interchanged without influencing the predictive outcome and that multiple, different gene sets can be used for accurate prediction [10]. Holstege described how, through repetitive sampling, they found that 3000 different signature gene sets (comprising 825 unique genes, each occurring in at least one set) can classify tumour samples with similarly high accuracy. Holstege concluded that there is no single set of genes with optimal predictive accuracy and that various signatures can be identified by different institutes or simply by using different samples. This study also exposes the flaw behind common attempts to make signature gene lists as small as possible, the argument being that molecular signatures based on more genes will be less prone to biases towards specific samples.

Next, T. Haferlach from Ludwig-Maximilians-University (Munich, Germany) described the progress in building the first commercial AmpliChip DNA microarray for testing leukaemia which will be released by Roche.
The major challenge facing clinical trials is the large number of tumour samples that must be analyzed to ensure the high accuracy of signature gene lists, which often results in high costs and time delays. Reporting on the preliminary screens, Haferlach described a DNA microarray study of 937 bone marrow and peripheral blood samples from 892 patients with all clinically relevant leukaemia subtypes. They were used to build a classifier with overall prediction accuracy of 95.1%. In the follow-up round of clinical trials carried out by Microarray Innovations in Leukaemia (MILE, an international initiative with 11 centres from Europe, USA and Singapore), DNA microarrays are being used to analyze samples from more than 2000 patients. The results from this and other studies will help to restrict the number of genes on the AmpliChip to about 500 of the most predictive ones. Although the AmpliChip is not the first array to enter the market (the MammaPrint 70-gene signature for diagnosis of breast cancer based on a study by van ’t Veer et al. [12] is already available through Agendia), it could be the first one to obtain FDA approval for clinical tests. Haferlach estimates that, once the AmpliChip is available, it will provide a more accurate, faster and cost-saving strategy for diagnosis of leukaemia.

From systems to networks biology

Talks related to networks biology covered a large portion of the meeting. Among many different types of biological networks discussed were gene regulatory networks, protein interaction networks, genetic networks, signalling networks and networks of bacterial communities. For interested readers, comprehensive surveys of networks biology principles can be found in [13,14]. One of the reasons why networks biology receives such close attention is that the network-based representation of high-throughput biological data can serve as a core around which more comprehensive information about biological models can be arranged.
It also provides a natural method for integration of different biological data.

Transcription regulatory networks

I begin by describing talks that addressed analysis of transcription regulatory networks. Transcription regulatory networks, first described for Escherichia coli [15] and yeast [16], consist of physical and functional interactions between TFs and their target genes represented as a graph [17]. The systematic mapping of TF–target gene interactions has been very successful in unicellular systems using ‘TF-centred’ approaches, such as the combination of chromatin immunoprecipitation (ChIP) with promoter DNA microarrays (known as chip-on-chip), which identifies a list of direct target genes for a particular transcription factor under a given set of conditions. However, as suggested by M. Walhout from the University of Massachusetts Medical School, metazoan systems are less amenable to application of chip-on-chip methods. First, TFs that are expressed at low levels, in a few cells, or during a narrow developmental interval are not suitable for ‘TF-centred’ experiments. Secondly, antibodies are only available for a very limited number of metazoan TFs, restricting the applicability of chip-on-chip.

Walhout described an alternative ‘gene-centred’ approach for elucidating transcription regulatory networks, which uses a high-throughput Gateway-compatible yeast one-hybrid (Y1H) system [18]. Y1H is a genetic system based on reporter gene expression in yeast that detects interactions between a ‘DNA bait’ (e.g. cis-regulatory DNA elements or gene promoters) and a ‘protein prey’ (e.g. TFs). When a prey protein binds to the DNA bait, the heterologous activation domain activates reporter gene expression. Thus, physical interactions between repressors/activators and their DNA targets can be identified.
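The graph representation of such bait–prey data can be sketched in a few lines. The example below is a minimal illustration only: the interaction list, the names `promoter_1`, `TF_a` and so on, and the helper `build_network` are all hypothetical and do not come from the Y1H data set discussed in the talk.

```python
from collections import defaultdict

# Hypothetical Y1H results: (DNA bait promoter, TF prey) pairs.
interactions = [
    ("promoter_1", "TF_a"),
    ("promoter_1", "TF_b"),
    ("promoter_2", "TF_a"),
    ("promoter_3", "TF_a"),
    ("promoter_3", "TF_c"),
]


def build_network(pairs):
    """Return two adjacency maps: TF -> bound promoters, promoter -> bound TFs."""
    targets_of = defaultdict(set)
    regulators_of = defaultdict(set)
    for promoter, tf in pairs:
        targets_of[tf].add(promoter)
        regulators_of[promoter].add(tf)
    return targets_of, regulators_of


targets_of, regulators_of = build_network(interactions)

# Out-degree of a TF = number of promoters it binds;
# in-degree of a promoter = number of TFs that bind it.
out_degree = {tf: len(promoters) for tf, promoters in targets_of.items()}
in_degree = {promoter: len(tfs) for promoter, tfs in regulators_of.items()}
most_connected_tf = max(out_degree, key=out_degree.get)
```

Keeping both adjacency maps makes it cheap to ask the two questions that matter for a gene-centred screen: which TFs bind a given promoter, and how many promoters a given TF binds.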
Walhout described an application of the Y1H system in Caenorhabditis elegans in which her group identified 283 interactions between 72 digestive tract genes and 117 proteins, providing the first set of putative target genes for nearly 10% of all predicted worm TFs. Detailed analysis found that more than 70% of the promoters are bound by at least one of the top 10% most highly connected TFs. In addition, 82% of the promoters are bound by at least one of the other less-well-connected interactors, and more than half of the target promoters bind both. Summarizing these observations, Walhout described a model of the transcription regulatory network in C. elegans, where genes are subjected to three or more layers of transcriptional control. The first layer consists of global regulators which control the expression of many genes in many different systems. The second layer involves ‘master regulators’ which control the expression of multiple genes involved in specific cellular processes. Finally, the third layer constitutes ‘specifiers’ which fine-tune the expression of a relatively small number of genes. The description of the layered architecture for the C. elegans transcription regulatory network provides an additional level of network hierarchy to previously described network motifs. Quite interestingly, the layered architecture of the C. elegans network resembles dense overlapping regions of the E. coli transcription network [15], although in the latter case such a coherent division into different levels of global regulation was not observed.

E. Furlong from EMBL devoted her talk to the recent study of the transcription regulatory network during muscle development in Drosophila. The main approach adopted by Furlong’s group is a combination of chip-on-chip arrays with DNA microarrays and immunohistochemistry.
Using a combination of these techniques, they obtained a temporal regulatory network of Mef2 activity, the key myogenesis regulator during Drosophila embryonic development [19]. Two novel ideas behind this approach are worth mentioning. First, they used gene expression profiling of Mef2 mutant embryos during the time course and found genes requiring Mef2 for their correct expression at various stages of development. This provided functional validation of the chip-on-chip results and distinguished between direct and indirect regulation. Second, the chip-on-chip was itself performed over the time course, which identified temporal patterns of Mef2 target gene regulation. Although most of the reported transcription networks based on chip-on-chip data are static, Furlong described one of the first examples of a dynamic transcription network which is relevant in the context of developmental biology [20]. This example also reveals other crucial themes in biological networks analysis: integration of different data types for reconstruction of temporal and spatial relations in the networks.

As different high-throughput techniques become more established and widespread, we can expect much wider utilization of data integration approaches for building more complex biological networks. Ultimately, this will lead to fusion of different biological networks, such as signalling, transcription and metabolic, into the cellular super network. Several early attempts in this direction have already produced interesting results. For example, Zhang et al. [21] assembled an integrated yeast network in which nodes represent genes (or their protein products) and edges represent various biological interactions, such as protein–protein interactions, genetic interactions, transcriptional regulation, sequence homology, and expression correlation.
A search for significantly enriched motifs in this integrated network found specific ‘network themes’, higher-order network structures that correspond to various biological phenomena, such as ‘compensatory complexes’. Another similar study found that ‘action’ networks (metabolic, co-expression, and interaction) share the same scaffolding of hubs, whereas the regulatory network uses different regulatory hubs [22].

Networks derived from synthetic genetic interactions and RNAi screens

Other approaches to the construction of biological networks focus on functional relations between different genes. RNAi and synthetic lethal screens that are used for building epistatic and genetic networks were also covered at the meeting. N. Perrimon from Harvard Medical School (Boston, MA, USA) described how high-throughput RNAi screens can be used to analyze information flow in Drosophila signal-transduction pathways. One of the key considerations in such screens is the choice of appropriate read-out assays that can accurately assess the effect of gene knockdown on the pathway of interest [23]. Whereas more proximal assays that measure activity near receptors would identify fewer regulators and may miss components of input branches from other receptors, distal readouts (e.g. transcriptional reporters or morphological outputs through ‘high-content screening’ microscopy [24]) may integrate more pathways than is desirable. Therefore, for the comprehensive analysis of a particular signalling pathway, several approaches should be combined to accurately identify corresponding phenotypes. Perrimon described one example where 22 000 duplex RNAs were used for identification of new Wnt pathway targets [25]. The screening method relied on sensitive reporter genes containing T-cell factor-binding sites fused to a minimal promoter upstream of the luciferase gene. This set-up led to the identification of 238 potential Wnt pathway genes.
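As an illustration of how candidate genes might be called from a reporter-based screen of this kind, the sketch below normalizes each well and flags outliers by z-score. It is a generic example and not the analysis used in [25]: the use of a second, constitutive control reporter for normalization, the |z| ≥ 2 cut-off, the well names and all readings are assumptions made purely for demonstration.

```python
from statistics import mean, stdev

# Hypothetical plate: dsRNA well -> (pathway reporter signal, control reporter signal).
wells = {
    "dsRNA_1": (100.0, 100.0),
    "dsRNA_2": (110.0, 100.0),
    "dsRNA_3": (90.0, 100.0),
    "dsRNA_4": (100.0, 100.0),
    "dsRNA_5": (105.0, 100.0),
    "dsRNA_6": (95.0, 100.0),
    "dsRNA_7": (100.0, 100.0),
    "dsRNA_8": (20.0, 100.0),
}


def normalized_activity(reporter, control):
    """Pathway reporter signal relative to the control reporter in the same well."""
    return reporter / control


def z_scores(values):
    """Standard scores computed against the plate mean and sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(value - mu) / sigma for value in values]


names = list(wells)
activities = [normalized_activity(*wells[name]) for name in names]
scores = dict(zip(names, z_scores(activities)))

# Wells whose normalized activity deviates strongly from the plate are candidate hits.
THRESHOLD = 2.0  # assumed cut-off
hits = [name for name in names if abs(scores[name]) >= THRESHOLD]
```

With so few wells the plate mean and standard deviation are themselves distorted by the outlier; real screens contain far more wells per plate and often use median-based statistics for that reason.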
In the other RNAi screen, DNA microarrays were used as phenotypes to infer epistatic interactions or epistasis gene networks [26,27]. Interestingly, similar approaches were independently developed for the analysis of signalling networks, where kinase inhibitors and multiparameter flow cytometry are used in place of RNAi and DNA microarrays [28]. In this case, availability of the single-cell data from flow cytometry allows accurate de novo reconstruction of signalling networks using machine learning algorithms. However, disadvantages of this approach are the limited availability of phospho-specific antibodies and the difficulty in scaling up the flow cytometry for simultaneous analysis of multiple kinases.

C. Boone from The University of Toronto, Canada described two recent extensions to the synthetic genetic array (SGA) technology developed at his laboratory, which are based on detecting synthetic genetic interactions of essential genes and chemical–genetic interactions. The idea behind the original technique is that most yeast genes are nonessential and therefore their knockdowns do not produce any observable phenotypic defects [29]. However, a combination of mutations in two genes that causes cell death or reduced fitness provides a means of mapping genetic interactions. Genetic interactions among essential genes had not been examined systematically because of the inherent difficulty in creating and working with hypomorphic (reduced-function) alleles. Boone described the use of temperature-sensitive conditional alleles based on the tetracycline (tet) promoter that overcomes this challenge.
A mutation in a particular query gene is first crossed to an input array of single mutants, and then a series of robotic pinning steps generates the array of double mutants, which is then scored for fitness defects relative to either of the single mutants. With this approach, Boone’s laboratory conducted 30 SGA screens of 575 essential genes and built the corresponding genetic network [30]. This network resembles the genetic network of nonessential genes: both have a scale-free topology and most of the interactions do not overlap with protein–protein interactions. However, the most notable property of the essential gene genetic network is its density (median frequency of interactions is 3%), which is five times higher than the network density for nonessential genes. These results indicate that essential genes are well connected hubs on the genetic interaction network, and that essential pathways are also highly buffered compared with the network of nonessential genes. Interestingly, analogous results were recently reported for the yeast transcription network [31]. Similar results obtained from the analysis of different biological networks suggest that scale-free architecture is not the only way to produce biological robustness and that distributed architecture may also contribute to the robustness in the same network (although it may apply to different nodes in the network, e.g. TFs versus housekeeping genes).

SGA can also be used in combination with chemical treatments to identify genes involved in mediating the response to drug compounds [32]. The approach is based on the premise that, if a small molecule disrupts the function of its target protein, then cells with a smaller amount of that target protein would be more sensitive to the compound. In the second part of the talk, Boone described a new screen with 82 compounds against the Saccharomyces cerevisiae viable deletion set to generate chemical–genetic interaction profiles [32].
The clustering of the resulting data matrix identified sets of compounds with similar biological effects and genes that show sensitivity to similar compounds [33].

Several other talks also discussed analysis of genetic networks. For instance, one limitation of SGA is that only negative interactions can be identified. Consequently, interactions that are detected generally involve genes that have unrelated functions, which obscures the biological relevance and interpretation. To overcome this limitation, N. Krogan (University of Toronto, Canada) described a new technique, epistatic mini-array profiles, which consists of arrays of all double-mutant combinations for the genes involved in a specific process [34]. This approach involves measuring quantitative effects on colony growth, which, unlike looking for viability, can detect both positive and negative interactions.

Protein interaction networks

Protein interaction networks were also extensively discussed at the meeting. These networks usually represent either direct or indirect (a part of a protein complex) physical interactions between proteins and are typically derived from yeast two-hybrid (Y2H) screens or MS-based analysis of protein complexes (co-AP/MS) [35]. In most cases, protein interaction networks are static and represent only a small subset of the true biological interactions. M. Vidal from Harvard Medical School devoted his talk to the issues of network coverage and the effect of false negatives on the accuracy of the protein interaction network. The small overlap between different Y2H maps is often attributed to low data accuracy. However, Vidal argued that each map covers only 3–9% of the total interactome, so limited overlap should be expected. To test this assumption, Vidal’s group developed a sampling algorithm for generation of many low coverage networks with properties similar to the current Y2H maps.
In almost 23 000 such comparisons, the interactome that was common to each pair comprised only 2.1%, which suggests that it is possible to observe perfectly accurate samples (without false positives) that have very limited overlap solely because of the low coverage of their maps [36]. Drawing from examples in the genome sequencing community, Vidal proposed a solution to this problem. As any single study cannot possibly cover all the protein interactions, he suggested that individual research groups should continually contribute small subnetworks to the global interactome repository in the way it was done during sequencing of the human genome.

The incompleteness of protein interaction networks might raise concerns about such well-established concepts as scale-free architecture, as it becomes unclear whether extrapolation of network topology from the currently limited data to the whole network can be achieved accurately and with high confidence. Current interactome networks are often described as having a power-law degree distribution, in which most proteins interact with a few partners, whereas a few proteins, ‘hubs’, interact with many partners [37]. In the biological context, power-law topology might relate to the generic robustness of protein interaction networks, and the hubs may be considered the most suitable targets for drugs. Vidal described a recent study by his group that attempted to relate interactome network coverage to the observable degree distribution [36]. By sampling from random networks with different degree distributions, they created multiple subnetworks of different size (relative to the original random networks). For instance, at 10% coverage, random networks that did not have a power-law distribution started exhibiting scale-free behaviour.
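The coverage argument for limited overlap is easy to reproduce in a toy simulation. The sketch below is not the sampling algorithm of [36]; it assumes a hypothetical ‘true’ interactome of 10 000 random protein pairs, draws error-free maps that each cover 5% of it, and measures the shared fraction of each pair of maps as intersection over union. All names and parameter values are illustrative.

```python
import random
from itertools import combinations

rng = random.Random(0)  # fixed seed so that the sketch is reproducible

# Hypothetical "true" interactome: 10 000 interactions among 150 proteins.
all_pairs = list(combinations(range(150), 2))
true_edges = rng.sample(all_pairs, 10_000)


def sample_map(edges, coverage, rng):
    """Draw a map without false positives that covers a fraction of the true edges."""
    return set(rng.sample(edges, round(coverage * len(edges))))


def shared_fraction(map_a, map_b):
    """Interactions found in both maps, relative to all interactions found in either."""
    return len(map_a & map_b) / len(map_a | map_b)


# Twenty independent low-coverage maps and all pairwise comparisons between them.
maps = [sample_map(true_edges, 0.05, rng) for _ in range(20)]
fractions = [shared_fraction(a, b) for a, b in combinations(maps, 2)]
mean_shared = sum(fractions) / len(fractions)
```

For two independent samples at coverage c the expected shared fraction is about c / (2 - c), that is roughly 2.6% at c = 0.05, so perfectly accurate maps are expected to agree on only a few per cent of their interactions.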
Although more detailed comparison with real Y2H and co-AP/MS networks suggested that a complete protein interactome map is still more likely to be scale-free, other possibilities cannot be ruled out, especially considering that many technical false positives are auto-activators or sticky proteins (creating nodes of artificially high degree).

Affinity purification methods allow macromolecules physically associated with a tagged bait to be retrieved and identified by MS. These methods have been used as large-scale screens in prokaryotic and eukaryotic cells, leading to the construction of many protein interaction maps [38]. However, without genome-wide coverage, assignment of a protein to a particular complex relies heavily on experimental stringency and arbitrary thresholds. A.-C. Gavin from EMBL described the first genome-wide screen for protein complexes in budding yeast based on tandem affinity purification coupled to MS [39]. This method identified 491 complexes, of which 257 were novel. Commenting on the data analysis, Gavin pointed out that complexes can be partitioned into the core and attachment proteins, which provide diversity to the core and allow execution of functions under different conditions. Using the ‘guilt by association’ principle, Gavin and collaborators also identified functions for several novel modules involved in ribosome biogenesis and RNA metabolism. The functional association was further aided by integration of protein interaction data with data on gene expression, localization, function, evolutionary conservation, protein structure and binary interactions. Finally, Gavin reported the development of a new scoring system for measuring the potency of proteins for forming associations, the ‘socioaffinity index’. The socioaffinity index represents the tendency of proteins to associate under different conditions and therefore could be used to analyze the yeast interactome network from the dynamic perspective.
The socioaffinity index is similar to several methods developed for detecting community structures in social networks and could therefore be extended with algorithms proposed in that context [40,41]. It would be interesting to compare the ‘community structures’ in protein interaction networks obtained by different algorithms.

Signalling networks

The issue of modelling signalling networks was also discussed at the meeting. Signalling networks differ from protein interaction or transcription networks in that they are by nature temporal and therefore amenable to modelling of signal propagation in the network. Stochastic and deterministic differential equations (e.g. ODEs), process algebra and Boolean kinetics have been used to analyze signalling networks [42]. These approaches attain the highest level of modelling accuracy by incorporating kinetic parameters directly into the network. However, they require complete information about the structure of the signalling network and the values of the kinetic parameters. Unfortunately, this is not available in many cases, especially when large cascades of 30 or more proteins are considered. Excellent reviews on mechanistic modelling of signalling networks can be found in Kholodenko [43] and Mogilner et al. [42]; the motif-based/dynamic systems approach is covered in [44].

At least two distinct levels of modelling signalling networks can be described (although many examples lie between these two extremes). In one approach, comprehensive ODE modelling of all the species deemed to participate in a particular signal-transduction cascade is attempted with numerical methods (an example of this approach can be found in [45]). An alternative ‘hypothesis-driven’ approach starts by introducing some prior assumptions into the model to simplify it to a few equations that can then be solved analytically.
Although the resulting model becomes a highly abstract representation of the signalling network, it can be very powerful in addressing specific questions ([46] contains typical examples).

A. van Oudenaarden from the Massachusetts Institute of Technology devoted his talk to the epigenetic inheritance of gene-expression dynamics in single cells using a ‘hypothesis-driven’ modelling approach. Van Oudenaarden described how, on induction of cell differentiation, distinct cell phenotypes can be encoded by complex signalling networks that prevent phenotype reversion even in the presence of significant environmental fluctuations [47]. To explore the key parameters that determine the stability of cellular memory, the galactose network of yeast was used as a model system. One of the advantages of this system over the networks of prokaryotes is that it contains multiple nested feedback loops that bring different functionalities to the complete network. Using fluorescent microscopy and computational energy landscape approaches [48], the van Oudenaarden group revealed intricate combinations of signalling circuits. One of the findings was that the core positive-feedback loop through GAL3 is necessary for this cellular memory, whereas a negative-feedback loop through GAL80 competes with the positive GAL3 loop and reduces the potential for memory storage. Consistently, when the negative-feedback loop is opened and Gal80p levels are controlled constitutively, the memory persistence can be tuned from hours to months. Such observations provide a quantitative understanding of the stability and reversibility of cellular differentiation states.
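Why a positive-feedback loop can store memory at all is easiest to see in a one-variable toy model, sketched below in Python. This is not the galactose-network model of [47]; the equation, the parameter values and the function names are assumptions chosen for illustration. A single activator x induces its own production through a sigmoidal (Hill) term and is degraded linearly. With strong feedback the system has two stable states, so its final level depends on where it started; with weak feedback only one state remains and the initial condition is forgotten.

```python
def hill(x, k, n):
    """Sigmoidal activation: fraction of maximal induction at activator level x."""
    return x**n / (k**n + x**n)


def simulate(x0, vmax=1.0, basal=0.05, k=0.5, n=4, deg=1.0, dt=0.01, t_end=50.0):
    """Integrate dx/dt = basal + vmax*hill(x) - deg*x with forward Euler steps."""
    x = x0
    for _ in range(round(t_end / dt)):
        x += dt * (basal + vmax * hill(x, k, n) - deg * x)
    return x


if __name__ == "__main__":
    # Strong positive feedback: low and high starting levels end in different states.
    print("strong feedback:", simulate(0.1), simulate(0.8))
    # Weak positive feedback (hypothetical vmax): both starting levels converge.
    print("weak feedback:  ", simulate(0.1, vmax=0.3), simulate(0.8, vmax=0.3))
```

In this caricature, lowering `vmax` plays the role of anything that weakens the net positive feedback. Relating such a parameter to the measured effects of the GAL3 and GAL80 loops would require the full model described in the talk.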
It should be noted that the definition of epigenetic inheritance in this talk was not restricted to nonmutational changes in the chromatin, but comprised all possible sources of inheritance unrelated to DNA sequence, such as the distribution and concentration of key regulatory proteins in the cytoplasm.

Networks of networks: metagenomics applications in systems biology

Several other avenues in systems and networks biology were also briefly introduced. For instance, in the talk ‘Metagenomics of organisms and the air’, E. Rubin from Lawrence Berkeley National Laboratory (Berkeley, CA, USA) showed how high-throughput DNA sequencing approaches can be used to study and characterize organisms that cannot be grown in a controlled laboratory environment [49]. Metagenomic approaches rely on sequencing as a tool to characterize microbial communities. Rubin described a study that investigated the composition of organisms in the air harvested from two densely populated urban buildings. Comparison of air samples with each other and with nearby terrestrial and aquatic environments suggested that indoor air microbes are not random transients from surrounding environments, but rather originate from indoor niches, including human occupants.

In another study described by Rubin, an approach called ‘reverse genomics’ was used to characterize a symbiotic microbial community in the worm Olavius algarvensis, which lacks a mouth, gut and nephridia [50]. This worm lives in several sediment layers and forms species-specific associations with extracellular bacterial endosymbionts located just below the worm cuticle. As the symbionts have not been grown in culture, their phylogeny has only been accessible through 16S ribosomal RNA analysis and fluorescence in situ hybridization; reverse genomics instead deciphers the organisms’ functions from their sequence.
By shotgun sequencing, Rubin’s group was able to reconstruct the symbiotic relationship between the worm and four different microbes that accounts for the loss of digestive and excretory systems in O. algarvensis. In one plausible model, the selective advantage of harbouring multiple symbionts lies in their ability to supply the worm with energy from the diverse range of reducing and oxidizing compounds, which the worm needs to survive in the differently oxidized and reduced sediment layers.

The third EMBL Biennial Symposium brought together researchers from several fields to discuss current issues and trends in various subfields of functional genomics and systems biology. The overall meeting and the talks of the individual speakers outlined several important directions in which systems biology may progress significantly over the next few years. First, analysis of regulatory elements in the human genome could yield novel results with the availability of new technologies, such as the chromosome conformation capture technique discussed in this report, supplemented by new computational algorithms for detection of functional elements. In the latter respect, advanced probabilistic graphical modelling approaches that extend hidden Markov models might produce the best results. Secondly, the networks biology paradigm will probably gain a more central role in systems biology research and produce many interesting research directions in the areas of algorithmic network theory (e.g. various topological and clustering measures), flow of biological information (e.g. maximum flow in biological networks), ODE-based modelling of signalling networks, and network integration through algorithmic and machine learning approaches. Finally, systems biology should progress from its promise to direct examples of medically relevant research projects.
DNA microarrays may be the first successful systems biology/functional genomics application for diagnosis and treatment of patients with cancer.

Another important trend noticeable at the symposium concerned the methodology and scope of systems and networks biology research. The meeting was no longer a place for computer scientists, physicists and biologists who wanted to apply their individual expertise to solving complex systems-wide biological problems. It was a meeting of systems biologists who understand the methodologies and paradigms of computer science, physics and biology and recognize the limitations of each individual discipline and its role in systems biology research. More significantly, there was a clear trend towards a general understanding of what constitutes an important problem in systems biology and how it should be resolved by application of relevant methods and techniques.

References

1 Encode Project Consortium (2004) The ENCODE (ENCyclopedia of DNA Elements) Project. Science 306, 636–640.
2 Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES & Kellis M (2005) Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434, 338–345.
3 David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, Jones T, Davis RW & Steinmetz LM (2006) A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci USA 103, 5320–5325.
4 Royce TE, Rozowsky JS, Bertone P, Samanta M, Stolc V, Weissman S, Snyder M & Gerstein M (2005) Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet 21, 466–475.
5 Dekker J (2006) The three ‘C’s of chromosome conformation capture: controls, controls, controls. Nat Methods 3, 17–21.
6 Hood L, Heath JR, Phelps ME & Lin B (2004) Systems biology and new technologies enable predictive and preventative medicine. Science 306, 640–643.
7 Ein-Dor L, Kela I, Getz G, Givol D & Domany E (2005) Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21, 171–178.
8 Novak K (2006) News feature: where the chips fall. Nat Med 12, 158–159.
9 Roepman P, Wessels LFA, Kettelarij N, Kemmeren P, Miles AJ, Lijnzaad P, Tilanus MGJ, Koole R, Hordijk G-J, van der Vliet PC et al. (2005) An expression profile for diagnosis of lymph node metastases from primary head and neck squamous cell carcinomas. Nat Genet 37, 182–186.
10 Roepman P, Kemmeren P, Wessels LFA, Slootweg PJ & Holstege FCP (2006) Multiple robust signatures for detecting lymph node metastasis in head and neck cancer. Cancer Res 66, 2361–2366.
11 Michiels S, Koscielny S & Hill C (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488–492.
12 van ’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536.
13 Dobrin R, Beg Q, Barabasi A-L & Oltvai Z (2004) Aggregation of topological motifs in the E. coli transcriptional regulatory network. BMC Bioinformatics 5, 10.
14 Tong AHY, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M et al. (2004) Global mapping of the yeast genetic interaction network. Science 303, 808–813.
15 Shen-Orr SS, Milo R, Mangan S & Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31, 64–68.
16 Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.
17 Blais A & Dynlacht BD (2005) Constructing transcriptional regulatory networks. Genes Dev 19, 1499–1511.
18 Deplancke B, Mukhopadhyay A, Ao W, Elewa AM, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Reece-Hoyes JS & Hope IA (2006) A gene-centered C. elegans protein–DNA interaction network. Cell 125, 1193–1205.
19 Sandmann T, Jensen LJ, Jakobsen JS, Karzynski MM, Eichenlaub MP, Bork P & Furlong EEM (2006) A temporal map of transcription factor activity: Mef2 directly regulates target genes at all stages of muscle development. Dev Cell 10, 797–807.
20 Furlong EE (2004) Integrating transcriptional and signalling networks during muscle development. Curr Opin Genet Dev 14, 343–350.
21 Zhang L, King O, Wong S, Goldberg D, Tong A, Lesage G, Andrews B, Bussey H, Boone C & Roth F (2005) Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol 4, 6.
22 Qi Y & Ge H (2006) Modularity and dynamics of cellular networks. PLoS Comput Biol 2, 174.
23 Friedman A & Perrimon N (2006) High-throughput approaches to dissecting MAPK signaling pathways. Methods 40, 262–271.
24 Pepperkok R & Ellenberg J (2006) High-throughput fluorescence microscopy for systems biology. Nat Rev Mol Cell Biol 7, 690–696.
25 DasGupta R, Kaykas A, Moon RT & Perrimon N (2005) Functional genomic analysis of the Wnt-Wingless signaling pathway. Science 308, 826–833.
26 Boutros M, Agaisse H & Perrimon N (2002) Sequential activation of signaling pathways during innate immune responses in Drosophila. Dev Cell 3, 711–722.
27 Markowetz F, Bloch J & Spang R (2005) Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 21, 4026–4032.
28 Sachs K, Perez O, Pe’er D, Lauffenburger DA & Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529.
29 Tong AHY, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CWV, Bussey H et al. (2001) Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368.
30 Mnaimneh S (2004) Exploration of essential gene functions via titratable promoter alleles. Cell 118, 31–44.
31 Balaji S, Iyer LM, Aravind L & Babu MM (2006) Uncovering a hidden distributed architecture behind scale-free transcriptional regulatory networks. J Mol Biol 360, 204–212.
32 Parsons AB, Brost RL, Ding H, Li Z, Zhang C, Sheikh B, Brown GW, Kane PM, Hughes TR & Boone C (2004) Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways. Nat Biotech 22, 62–69.
33 Dueck D, Morris QD & Frey BJ (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics 21, i144–151.
34 Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF et al. (2005) Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123, 507–519.
35 Cusick ME, Klitgord N, Vidal M & Hill DE (2005) Interactome: gateway into systems biology. Hum Mol Genet 14 Spec No. 2, R171–81.
36 Han J-DJ, Dupuy D, Bertin N, Cusick ME & Vidal M (2005) Effect of sampling on topology predictions of protein–protein interaction networks. Nat Biotechnol 23, 839–844.
37 Roverato A (2005) A unified approach to the characterization of equivalence classes of DAGs, chain graphs with no flags and chain graphs. Scand J Stat 32, 295–312.
38 Bouwmeester T, Bauch A, Ruffner H, Angrand P-O, Bergamini G, Croughton K, Cruciat C, Eberhard D, Gagneur J, Ghidelli S et al. (2004) A physical and functional map of the human TNF-α/NF-κB signal transduction pathway. Nat Cell Biol 6, 97–105.
39 Gavin A-C, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B et al. (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636.
40 Girvan M & Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99, 7821–7826.
41 Newman MEJ (2006) From the cover: modularity and community structure in networks. Proc Natl Acad Sci USA 103, 8577–8582.
42 Mogilner A, Wollman R & Marshall WF (2006) Quantitative modeling in cell biology: what is it good for? Dev Cell 11, 279–287.
43 Kholodenko BN (2006) Cell-signalling dynamics in time and space. Nat Rev Mol Cell Biol 7, 165–176.
44 Tyson JJ, Chen KC & Novak B (2003) Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr Opin Cell Biol 15, 221–231.
45 Elowitz MB & Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338.
46 Amonlirdviman K, Khare NA, Tree DRP, Chen W-S, Axelrod JD & Tomlin CJ (2005) Mathematical modeling of planar cell polarity to understand domineering nonautonomy. Science 307, 423–426.
47 Acar M, Becskei A & van Oudenaarden A (2005) Enhancement of cellular memory by reducing stochastic transitions. Nature 435, 228–232.
48 Becskei A, Kaufmann BB & van Oudenaarden A (2005) Contributions of low molecule number and chromosomal positioning to stochastic gene expression. Nat Genet 37, 937–944.
49 Tringe SG & Rubin EM (2005) Metagenomics: DNA sequencing of environmental samples. Nat Rev Genet 6, 805–814.
50 Woyke T, Teeling H, Ivanova NN, Huntemann M, Richter M, Gloeckner FO, Boffelli D, Anderson IJ, Barry KW, Shapiro HJ et al. (2006) Symbiosis insights through metagenomic analysis of a microbial consortium. Nature 443, 950–955.
