Scientific report: Universal positions in globular proteins: From observation to simulation


Nikolaos Papandreou¹, Igor N. Berezovsky²,³, Anne Lopes⁴, Elias Eliopoulos¹ and Jacques Chomilier⁴

¹Laboratory of Genetics, Agricultural University of Athens, Greece; ²Department of Structural Biology, The Weizmann Institute of Science, Rehovot, Israel; ³Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA; ⁴Equipe Biologie Structurale, LMCP, Universités Paris 6 and Paris 7, Paris, France

The description of globular protein structures as an ensemble of contiguous 'closed loops' or 'tightened end fragments' reveals fold elements crucial for the formation of stable structures and for navigating the very process of protein folding. These are the ends of the loops, which are spatially close to each other but are situated apart in the polypeptide chain by 25–30 residues. They also correlate with the locations of highly conserved hydrophobic residues (referred to as topohydrophobic), in a structural alignment of the members of a protein family. This study analysed these positions in 111 representatives of different protein folds, and then carried out dynamic Monte Carlo simulations of the first steps of the folding process, aimed at predicting the origins of the assembling folds. The simulations demonstrated that there is an obvious trend for certain sets of residues, named 'mostly interacting residues', to be buried at the early stages of the folding process. Location of these residues at the loop ends and correlation with topohydrophobic positions are demonstrated, thereby giving a route to simulations of the protein folding process.

Keywords: folding nucleus; hydrophobic core; lattice simulation; protein folding.

Despite the continuously increasing number of experimentally determined protein structures, many new folds are still to be discovered.
This was illustrated clearly in a recent study [1], where a plot of the number of protein families vs. the number of resolved complete genomes resulted in a quasi-linearly increasing function. Elucidating the evolutionary mechanisms leading to the emergence of a finite number of protein folds [2,3] from the vast number of protein sequences [4,5], as well as the mechanisms of the formation of mature protein globules [6], remains a topic of both great challenge and interest. The latter mechanisms are related to the physical basis of protein structure formation and stability [7], and thus can point to possible evolutionary routes [8].

This study is based on universal structural units of protein folds, named 'closed loops' [9] or 'tightened-end fragments' (TEFs) [10]. These major elements are universally present in all types of protein folds and have the following features in common: (a) they usually start and end in the hydrophobic core [11]; (b) they form loop-like structures of nearly standard size (25–30 amino acid residues); (c) they serve as universal units of protein domain structure [12]; (d) the ends of these elements (the so-called locks [13]) mainly correspond to clusters of hydrophobic amino acids in general (WIMVYLF), and to highly conserved ones, the topohydrophobic (TH) positions [14,15], in particular. Determination of the TH positions is based on the analysis of multiple structural alignments of members of a protein family, restricted to pairwise sequence identities of at most 30%. TH positions are of particular importance for the formation and stability of the protein core [16]. From a dynamic point of view, the early formation of a nucleus composed of TH positions would favor the formation of closed loops and considerably speed up the folding process [17].
The coupled concepts of TH and closed loops/TEFs therefore offer a simple and general scenario for the folding mechanism of globular proteins [11,15] and provide a set of critical positions in the protein core [10,11,13]. The loop structure of globular proteins is a general concept, independent of secondary structure, as well as of the particular folding mechanism of each protein [9,10,13].

This study addresses the question of predicting these critical positions from the sequence, a task of major importance for approaching the structure of a protein of unknown fold. To successfully build such a structure, numerous pieces of information have to be collected by combining various methods. An initial calculation of critical positions could be a first step, providing a frame of structural restraints, as TEF limits and TH residues are located mainly inside the protein core. The notion of topohydrophobic positions suggests that the forces that bury these residues and lead to a stable core do not rely on the details of the amino acid side chain structure, but rather on an adequate succession of hydrophobic and polar amino acid residues along the polypeptide chain. Thus simplified protein models, such as lattice ones, are adequate tools for calculations aimed at locating critical residues.

Correspondence to N. Papandreou, Laboratory of Genetics, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece. Fax: +30 2105294322, Tel.: +30 2105294372, E-mail: papandre@aua.gr
Abbreviations: MIR, mostly interacting residues; PDB, protein data bank; SCOP, structural classification of proteins; TEF, tightened end fragment; TH, topohydrophobic.
(Received 29 June 2004, revised 22 September 2004, accepted 15 October 2004)
Eur. J. Biochem. 271, 4762–4768 (2004) © FEBS 2004 doi:10.1111/j.1432-1033.2004.04440.x

This study was carried out on a dataset of 111 globular proteins with well-defined structures in the Protein Data Bank (PDB), that were representative of various folds, and for which the TEFs were available. For a subset of 73 proteins of the above database, the TH positions have also been determined. The initial stages of folding were simulated using a simplified model, which consists of an alpha-carbon reduced representation of the polypeptide chain on a 24-first neighbour lattice. A standard Monte Carlo algorithm dynamically simulated the folding process and a statistical mean force potential was used to describe the interactions between noncontiguous residues. A commonly accepted lattice model [18] was used and the analysis focused on the first stages of the folding process, by measuring the tendency of amino acids to be packed inside the hydrophobic core, depending on the peculiarities of the polypeptide chain sequence. Starting from random conformations, the Monte Carlo simulations revealed that a subset of hydrophobic residues had a strong tendency to be buried. These residues, named 'mostly interacting residues' (MIR), were found to statistically match TEF limits and TH positions. These results are in agreement with the hydrophobic collapse mechanism, which can be further generalized to the nucleation–condensation mechanism, a hybrid of the hierarchical and hydrophobic collapse mechanisms [23,24].

Materials and methods

The protein database consisted of 111 globular protein chains, representing 78 different folds, according to the structural classification of proteins (SCOP classification) [22]. In detail, there are 26 α class proteins, 23 β class, 26 α + β class, 18 α/β class and 18 of the small proteins class, providing a balanced representation of the major known folds. The polypeptide chain lengths vary between 50 and 250 residues.
Simulations were carried out using a Cα representation of the polypeptide chains, with the lattice geometry (Fig. 1) as in [18]. On an underlying cubic lattice (Fig. 1, dotted lines) with edges of unit length, contiguous alpha carbons are connected by vectors of the form (±2, ±1, 0), including the permutations of these components (Fig. 1, solid lines). The length of such a vector is $\sqrt{5}$ lattice units and is equivalent to 3.8 Å, the typical distance between contiguous alpha carbons in proteins. In this geometry, for residue i, there are 24 possible positions for residue i + 1 to occupy. This kind of polypeptide chain projection allows for a more realistic representation of the polypeptide chain [18]. Two spatial constraints are implemented. First, the distance between noncontiguous alpha carbons cannot be less than 3.8 Å ($\sqrt{5}$ lattice units). Second, contrary to the cubic lattice, where only angles of 90° and 180° are possible, the limit angles here are 66° and 143° (seven possible values), approximating the range of pseudo-angles in natural proteins [19].

The different nature of amino acids is taken into account in the force field used to attribute an energy value to each chain conformation. The distance-independent 20 × 20 residue pair energy matrix of Miyazawa and Jernigan was used [20]. In detail, if two noncontiguous residues i and j are found within a distance smaller than or equal to 5.88 Å, a term $E_{ij}$ that depends on their nature is added to the total energy. The maximum interaction range of 5.88 Å corresponds to $\sqrt{12}$ lattice units and seems a reasonable estimate for the mean noncovalent interaction range between amino acid residues.

For each protein, 100 different initial conformations were randomly generated and used as starting points for 100 simulation runs, to avoid dependence on the initial state.
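
The lattice geometry described above can be checked numerically. The following Python sketch enumerates the bond vectors, converts the contact range from lattice units to ångströms, and lists the pseudo-bond angles that fall between the stated limits. The function names and the 66° to 144° acceptance window are illustrative assumptions and are not taken from the original implementation [18].

```python
import itertools
import math

CA_CA_ANGSTROM = 3.8  # distance between contiguous alpha carbons, from the text


def bond_vectors():
    """All lattice vectors obtained by permuting and signing (2, 1, 0)."""
    vectors = set()
    for perm in itertools.permutations((2, 1, 0)):
        for signs in itertools.product((1, -1), repeat=3):
            vectors.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vectors)


VECTORS = bond_vectors()
# One lattice unit in angstroms, fixed by sqrt(5) lattice units = 3.8 A.
UNIT = CA_CA_ANGSTROM / math.sqrt(5)
# Interaction range of sqrt(12) lattice units, converted to angstroms.
CONTACT_RANGE = math.sqrt(12) * UNIT


def pseudo_bond_angles(min_deg=66.0, max_deg=144.0):
    """Distinct angles at residue i between the bonds to i - 1 and i + 1.

    The [min_deg, max_deg] window is an assumed reading of the 66 and
    143 degree limits quoted in the text.
    """
    angles = set()
    for v1 in VECTORS:
        for v2 in VECTORS:
            dot = sum(a * b for a, b in zip(v1, v2))
            # Angle between -v1 (towards i - 1) and v2 (towards i + 1);
            # both vectors have squared length 5.
            cos_theta = max(-1.0, min(1.0, -dot / 5.0))
            theta = round(math.degrees(math.acos(cos_theta)), 1)
            if min_deg <= theta <= max_deg:
                angles.add(theta)
    return sorted(angles)


if __name__ == "__main__":
    print(len(VECTORS))          # possible positions for residue i + 1
    print(CONTACT_RANGE)         # contact range in angstroms
    print(pseudo_bond_angles())  # allowed pseudo-bond angles in degrees
```

Under these assumptions the enumeration gives 24 bond vectors and seven distinct angles inside the window, consistent with the counts stated in the text.
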
The only constraint placed on the initial states is their noncompactness, in the sense that amino acid residues far apart in the sequence were not allowed to be close in space, to avoid clustering caused by a particular initial conformation. Quantitatively, this constraint introduces a minimum spatial distance, $d_{\min}$, according to the separation $\Delta = |i - j|$ between residues i and j: (1) Δ = 6–10, $d_{\min}$ = 7 Å; (2) Δ = 11–15, $d_{\min}$ = 11 Å; (3) Δ = 16–20, $d_{\min}$ = 19 Å; (4) Δ more than 20, $d_{\min}$ = 27 Å.

The single residue movements [18] are of two kinds: end flip movements for the N- and C-terminal residues and corner movements for the others. The choice of the move set is more or less arbitrary, as the elementary one-residue moves are sufficient to bring the protein to a folded state. In this case, the restriction to elementary moves only, apart from its simplicity, permits a sequential analysis of the tendency of the chain to form compact fragments around particular amino acid residues from the beginning of the simulation. After each move, the calculated conformational energy was subjected to a standard Metropolis criterion, at constant temperature.

Because the goal was to analyse the propensity of residues to be buried from the start of folding, we ensured that the maximum number of Monte Carlo steps was sufficient to allow formation of compact chain fragments. Due to the serial nature of the algorithm, this time limit is correlated to the protein chain length L. It was empirically determined that for small proteins of about 50 residues, the value $t_{\max}$ is around 10⁶ Monte Carlo steps. Thus, the following linear relation was adopted to generalize $t_{\max}$ to proteins of any length L:

$$t_{\max} = \mathrm{INT}\left(10^{6} L / 50\right)$$

where INT is the integer part, because $t_{\max}$ is an integer by definition (Monte Carlo steps).

Fig. 1. The lattice model. The solid line represents the backbone from Cα to Cα positions, while the dotted line is the underlying cubic lattice.

For each simulation, 10⁴ records of intermediate conformations were taken at regular time intervals. As the number of simulations per protein is 100 (one for each initial state), the end result is a set of 10⁶ records per protein. For every recorded conformation, and for each amino acid residue, the number of residues with which it is in noncovalent interaction was calculated. In spatial terms, these noncovalent neighbours are the amino acid residues lying within a distance of 5.88 Å, or $\sqrt{12}$ lattice units. For a given protein and for residue i, at the r-th record, the number of noncovalent neighbours is nc(i,r). The time mean of this quantity is

$$\mathrm{NC}(i) = \frac{1}{10^{6}} \sum_{r=1}^{10^{6}} nc(i,r)$$

NC(i) values are rounded to the nearest integer. This mean number of noncovalent neighbours is a quantitative measure of the tendency of a residue to be buried from solvent. The higher the NC(i), the stronger this tendency. If $\overline{\mathrm{NC}}$ is the mean value of NC(i) over the sequence for a given protein, the residues for which NC(i) is significantly higher than $\overline{\mathrm{NC}}$ are of particular interest and are called mostly interacting residues (MIRs). Their selection requires fixing a cut-off value above the mean value $\overline{\mathrm{NC}}$. It was found that NC(i) varies between 1 and 8 and that $\overline{\mathrm{NC}}$ = 4 for all studied proteins. Figure 2 presents the distribution of the different values of NC(i) over the amino acid residues of all 111 proteins. The most probable value is four, which coincides with the mean sequence value, which is also four for all proteins as stated above. From this distribution, it appears that 13% of residues have a number of noncovalent neighbours equal to or higher than six, which was adopted as the lowest NC(i) value for considering residue i as a MIR.
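
As a minimal sketch of the bookkeeping described in this section, the Python code below computes the step limit, applies a Metropolis test, and derives NC(i) and the MIR list from recorded conformations. It is not the authors' program: the function names, the `kt` temperature parameter, and the choice to exclude only residues i ± 1 from the neighbour count are assumptions made for illustration.

```python
import math
import random

CONTACT_RANGE = 5.88  # angstroms, interaction range quoted in the text
MIR_CUTOFF = 6        # lowest rounded NC(i) for a residue to count as a MIR


def t_max(length):
    """Maximum number of Monte Carlo steps: INT(10^6 * L / 50)."""
    return (10**6 * length) // 50


def metropolis_accept(delta_e, kt, rng=random):
    """Standard Metropolis test for a move changing the energy by delta_e.

    `kt` is a hypothetical parameter; the text only states that the
    temperature is kept constant.
    """
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def neighbour_counts(coords, contact_range=CONTACT_RANGE):
    """nc(i, r) for one recorded conformation (list of (x, y, z) in angstroms).

    Residues i and j count as noncovalent neighbours when |i - j| > 1
    (an assumed reading of "noncontiguous") and they lie within range.
    """
    n = len(coords)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 2, n):
            if math.dist(coords[i], coords[j]) <= contact_range:
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_neighbours(records):
    """NC(i): mean of nc(i, r) over all records, rounded to an integer.

    Python's round() sends exact halves to the nearest even integer.
    """
    totals = [0] * len(records[0])
    for coords in records:
        for i, count in enumerate(neighbour_counts(coords)):
            totals[i] += count
    return [round(total / len(records)) for total in totals]


def select_mir(nc, cutoff=MIR_CUTOFF):
    """0-based indices of residues whose NC(i) reaches the cutoff."""
    return [i for i, value in enumerate(nc) if value >= cutoff]
```

For a real run, `records` would hold the 10⁶ recorded conformations of one protein; the quadratic neighbour loop is kept for readability rather than speed.
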
In order to validate this model, once the positions of MIRs were determined they were compared to TEF limits and to topohydrophobic positions. The comparison with TEFs was performed on the complete database of 111 proteins. The comparison with TH positions was performed on a 73-protein subset of this database, where these positions were determined. For the remaining 38 proteins, the calculation of TH positions was not possible, because it requires at least four 3D structures of members of the same family, with a pairwise identity not exceeding 30% [14,15]. This criterion was not fulfilled for these 38 cases. The PDB codes [21] of the database are given in Table 1.

Results

The Monte Carlo algorithm for folding simulation has been applied to the entire protein dataset and the histograms NC(i), containing the distribution of noncovalent neighbours along the amino acid sequence, have been obtained for each protein. Fig. 3 illustrates the positions of TEFs, TH and MIR for 10 proteins of the database, representative of the various classes as determined by SCOP [22].

Among the 1920 calculated MIRs, 92% were hydrophobic, following the definition of topohydrophobic residues (i.e. they belonged to the set 'VIMWYLF'). Also, the total numbers of MIRs and TH positions, in the 73-protein subset where they are compared, are relatively close (1299 MIRs vs. 1011 TH). In the same subset, the total number of TEFs was 309; thus the number of TEF limits was 618, about half the number of MIRs.

To assess the overall quality of agreement between predicted critical positions (MIR) and structure-defined ones (TH and TEF limits), a statistical analysis is required. This has been carried out over the whole database, i.e. over all 111 proteins for the comparison between MIR and TEF limits and over the subset of 73 proteins for the comparison between MIR and TH. The results are presented in two histograms in Figs 4 and 5.

The histogram of Fig. 4 gives the comparison between MIR and TH positions and is constructed as follows. Each TH position is placed at the origin of the abscissa. Then, the neighbouring MIRs that are closer to this central TH than to any other TH are located. Their number is plotted as a function of their sequence distance with respect to the central TH. This is repeated for all THs along all the 73 proteins of the data set. Thus Fig. 4 shows a histogram of the separation between TH and the closest MIR. The plotted distances range from −20 to +20, and MIRs lying at distances greater than ±20 residues from the closest TH are added to the histogram at the ±20 positions. The second histogram (Fig. 5) follows the same rules and concerns the comparison of MIR to TEF limits. It is constructed using the whole database of 111 proteins.

From observation of Figs 4 and 5 it is evident that the comparison of MIR with TH and TEF limits clearly presents a peak at the origin. This is an indication that the residues predicted to be MIRs actually do correspond to TH positions. They also statistically correlate with TEF limits, which are mostly hydrophobic [13], as it was already shown that most TH positions are located in or in the vicinity of TEF ends [10]. The agreement between MIR and TH is very clear and 63% of MIR were found within ±5 positions from a TH residue. The TEF histogram presents two main secondary maxima at positions ±3, and 57% of MIR were found within ±5 positions from a TEF limit. This good agreement between prediction and analysis [13] is of great interest in the prediction of elements of the protein core from the sequence.

Fig. 2. Distribution of the mean number of noncovalent neighbours over all 111 sequences of the dataset.

Table 1. A list of the PDB codes, names and SCOP classes of the proteins studied. The TEFs are known for all these proteins. Proteins with known TH positions are in bold (bold type is not reproduced in this text version). The uppercase letters at the end of the code correspond to the chain.

PDB code  Name

SCOP class α
1aep   Apolipophorin-III
1utg   Uteroglobin
2mhr   Myohemerythrin
256bA  Cytochrome b562
1aa0   Fibritin
1occD  Cytochrome c oxidase
1poc   Phospholipase A2
1lis   Lysin
1lbd   Retinoid-X receptor α
2cy3   Cytochrome c3
2ilk   Interleukin-10
1rro   Oncomodulin
2sas   Calcium-binding protein
4cpv   Parvalbumin
1bvd   Myoglobin
1hbg   Glycera globin
2lhb   Lamprey globin
2mhbA  Hemoglobin (horse)
1dkeA  Hemoglobin (human)
1eca   Erythrocruorin
1lki   Leukemia inhibitory factor
3cytO  Mitochondrial cytochrome c
3c2c   Cytochrome c2
1bp2   Phospholipase A2
1enh   DNA-binding protein
2erl   Pheromone

SCOP class β
1pht   Phosphatidylinositol 3-kinase
1pwt   α-Spectrin, SH3 domain
1semA  Signal transduction protein
1cauB  Seed storage protein 7s vicillin
1reiA  Immunoglobulin
1cdcA  CD2, first domain
2 lm   Macromycin
1anu   Cohesin-2 domain
1f3g   Glucose-specific factor III
1sno   Staphylococcal nuclease
1gpc   DNA-binding protein
2sns   Staphylococcal nuclease
1yna   Xylanase II
2ayh   Bacillus 1–3, 1–4-β-glucanase
1lcl   Serine esterase
2pelA  Legume lectin
1knb   Adenovirus fibre
2stv   STNV coat protein
1pmy   Pseudoazurin
1qabA  Transthyretin
2plv3  Picornavirus
1cbs   Cellular retinoic-acid-binding protein
1ivpA  2 (HIV-2) protease

SCOP class α + β
1ptf   Histidine-containing phosphocarrier
1ubi   Ubiquitin
1frd   2Fe-2S ferredoxin
153L   Lysozyme, Goose
1lsg   Lysozyme, Chicken
1acf   Profilin
1ctf   Ribosomal protein L7/12
1aihA  Integrase
1apyA  Glycosylasparaginase
1ast   Astacin
1dtp   Diphtheria toxin
1nox   NADH oxidase
2pii   Signal transduction protein
1durA  Ferredoxin II, Peptostreptococcus
1fxd   Ferredoxin II, Desulfovibrio gigas
1c0bA  Ribonuclease A
1shaA  c-src Tyrosine kinase
1ag2   Prion protein domain
1abrA  Abrin A-chain
1plfB  Platelet factor 4
1mgsA  Chemokine (growth factor)
1hucB  (Pro)cathepsin B
2act   Actinidin
2ci2   Chymotrypsin inhibitor CI-2
1fkb   FK-506 binding
1gmpA  RNase Sa

SCOP class α/β
1aba   Glutaredoxin
1opr   Orotate phosphoribosyltransferase
1ble   Fructose permease, subunit IIb
3cla   Chloramphenicol acetyltransferase
5nll   Flavodoxin
3chy   Signal transduction protein
1cls   Cutinase
1dhr   Dihydropteridine reductase
5p21   cH-p21 Ras
1asu   Retroviral integrase, catalytic domain
1lbbA  Glutamate receptor ligand binding core
1tml   Cellulase E2
1tpfB  Triosephosphate isomerase
1brsA  Endonuclease
1akz   Uracil-DNA glycosylase
1rvvA  Lumazine synthase
1ns5A  Hypothetical protein YbeA
1jkeB  D-Tyr tRNAtyr deacylase

SCOP class small proteins
1iodG  Coagulation factor X
1dtdB  Carboxypeptidase inhibitor
1icfI  MHC class II p41 invariant chain fragment
2bbkL  Methylamine dehydrogenase
1sgpI  Ovomucoid III domain
1ajj   LDL receptor
1i8nA  Anti-platelet protein
1ejgA  Crambin
1ehs   Heat-stable enterotoxin B
1tgj   TGF-β3
4rxn   Rubredoxin, Clostridium pasteurianum
1caa   Rubredoxin, Archaeon Pyrococcus furiosus
1fas   Fasciculin
1pk4   Plasminogen
1hpi   HIPIP, Ectothiorhodospira vacuolata
1hip   HIPIP, Allochromatium vinosum
1knt   Collagen type VI
1edmB  Factor IX

Discussion

The existence of critical positions in protein structures, punctuated by TH positions and/or TEF limits, is of great importance for protein folding and stability. Consecutive formation of the globule core [10,11,17] composed essentially of these residues [13] leads to tremendous optimization of the folding process, by reducing the conformational space to be explored.
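
The histogram construction described in the Results for Figs 4 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the sign convention (MIR position minus reference position) is assumed, and a MIR equidistant from two reference positions is assigned to whichever comes first in the list.

```python
from collections import Counter


def separation_histogram(reference_positions, mir_positions, max_sep=20):
    """Count MIRs by signed sequence distance to their nearest reference.

    `reference_positions` holds the TH positions (Fig. 4) or TEF limits
    (Fig. 5) of one protein. Separations beyond +/- max_sep are added to
    the outermost bins, as described in the text.
    """
    hist = Counter()
    if not reference_positions:
        return hist
    for mir in mir_positions:
        nearest = min(reference_positions, key=lambda ref: abs(mir - ref))
        sep = max(-max_sep, min(max_sep, mir - nearest))
        hist[sep] += 1
    return hist


def fraction_within(hist, window=5):
    """Fraction of MIRs lying within +/- window residues of a reference."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    close = sum(n for sep, n in hist.items() if abs(sep) <= window)
    return close / total


if __name__ == "__main__":
    # Toy positions for illustration only; not data from the study.
    toy = separation_histogram([10, 40], [10, 13, 8, 38, 75])
    print(sorted(toy.items()))
    print(fraction_within(toy))
```

Because `Counter` objects can be summed, per-protein histograms can be accumulated over the whole dataset before computing the fraction within ±5 positions.
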
Thus, the prediction of these 'hot' residues becomes an important step in approaching the native three-dimensional structure. A first approach to this goal was undertaken in this study. The guiding hypothesis was that, in order to achieve fast folding, critical residues should have a tendency to contact each other and thus form the origins of the hydrophobic core. The results confirmed this hypothesis. Using a simple alpha-carbon lattice model, formation of the nucleation sites at initial steps of the folding process was demonstrated.

Fig. 3. Examples of comparison of MIR, TH and TEF for 10 sequences of various folds. In each example, the PDB code (with the chain) is given, followed by the name, the SCOP class and the fold of the protein in parentheses. The following lines represent the sequence and the TEFs. The residues belonging to a TEF are indicated 'I'. In case of TEF overlap, two lines are used for this representation (for example in protein 1shaA). The next line shows TH positions, where the corresponding residues are indicated 'T'. The final line shows MIR residues, indicated by 'M'. For 3chy and 5p21, due to the sequence length, the results appear in two consecutive blocks.

These results suggest that folding initiation can be based on the early formation of a set of nucleation sites around selected hydrophobic residues [10,11,13]. This is essentially the basis of the hydrophobic collapse mechanism [23], which supposes formation of hydrophobic tertiary interactions that initiate secondary structure. It can be extended onto a unified nucleation–condensation mechanism, which is a combination of hierarchical and hydrophobic collapse mechanisms [23,24].
In the latter case, hydrophobic tertiary interactions are consolidated at the same time as elements of secondary structure (with possible variations of the kinetics of the mechanism caused by the different intrinsic stabilities of the secondary structural elements). These models have been developed from experiments and simulations of folding and unfolding of several small proteins [23,24] and particularly from the analysis of the residual structure of denatured states, which are thought to correlate to the nucleation sites. The comparison of MIR predictions with this type of data is being considered for future studies.

The secondary peaks in the histogram representing the correlation between MIR and TEF (Fig. 5) come from proteins belonging mainly to the α class. For these folds, the TEF limits are often located inside α helices and are mainly hydrophobic. Sometimes, the predicted MIR are not exactly these limits but are the nearest hydrophobic residues, which in a helix are located three positions away because of the α-helix periodicity. This observation is in full agreement with the definition of the van der Waals locks, as extended (three to five residues long) segments of polypeptide chains interacting with each other, and thus forming 'loop-n-lock' structures in globular proteins [13].

The main conclusion of this study is that the burial of MIR positions can create anchors for the sequential formation of closed loops. These results remarkably corroborate experimental evidence on the initial stages of the folding process. NMR analysis of folding intermediates of bovine pancreatic trypsin inhibitor [25] revealed loop formation in early, non-native states, stabilized by nonlocal interactions. Also, an NMR study on the folding of lysozyme [26] showed the early formation of hydrophobic clusters, which are linked together by long-range interactions.
These interactions were shown not to occur in the native structure, but they are apparently important for keeping the loop structure and thereby speeding up the folding procedure. The appearance of these essential features in this folding simulation permits an initial estimation of the anchor regions for loop formation. This approach therefore provides a set of structural constraints from first principles for an unknown structure. This information could be incorporated at the early steps of a prediction method for building protein structures from the sequence by producing anchor residues known to belong to the structural core. In a second stage they can be introduced as a set of constraint distances in a more detailed modeling process.

Fig. 4. Histogram of the correspondence between TH positions and MIR from a set of 73 proteins.

Fig. 5. Histogram of the correspondence between TEF ends and MIR from a set of 111 proteins.

Acknowledgements

This project has been funded by a Concerted Action from the European Union, QLG2-CT-2002–01298, and by the Greek-French bilateral PLATO program (grant no 04146WM). I. N. B. was also supported by the Post-Doctoral Fellowship of the Feinberg Graduate School, Weizmann Institute of Science.

References

1. Kunin, V., Cases, I., Enright, A.J., de Lorenzo, V. & Ouzounis, C.A. (2003) Myriads of protein families, and still counting. Genome Biol. 4, 401.
2. Koonin, E.V., Wolf, Y.I. & Karev, G.P. (2002) The structure of the protein universe and genome evolution. Nature 420, 218–223.
3. Xia, Y. & Levitt, M. (2004) Simulating protein evolution in sequence and structure space. Curr. Opin. Struct. Biol. 14, 202–207.
4. Rost, B. (2002) Did evolution leap to create the protein universe? Curr. Opin. Struct. Biol. 12, 409–416.
5. Liu, J. & Rost, B. (2003) Domains, motifs and clusters in the protein universe. Curr. Opin. Chem. Biol. 7, 5–11.
6. Daggett, V. & Fersht, A. (2003) The present view of the mechanism of protein folding. Nat. Rev. Mol. Cell. Biol. 4, 497–502.
7. Shakhnovich, E.I. (1997) Theoretical studies of protein-folding thermodynamics and kinetics. Curr. Opin. Struct. Biol. 7, 29–40.
8. Tiana, G., Shakhnovich, B.E., Dokholyan, N.V. & Shakhnovich, E.I. (2004) Imprint of evolution on protein structures. Proc. Natl Acad. Sci. USA 101, 2846–2851.
9. Berezovsky, I.N., Grosberg, A.Y. & Trifonov, E.N. (2000) Closed loops of nearly standard size: common basic element of protein structure. FEBS Lett. 466, 283–286.
10. Lamarine, M., Mornon, J.P., Berezovsky, I.N. & Chomilier, J. (2001) Distribution of tightened end fragments of globular proteins statistically match that of topohydrophobic positions: towards an efficient punctuation of protein folding? Cell. Mol. Life Sci. 58, 492–498.
11. Berezovsky, I.N., Kirzhner, V., Kirzhner, A. & Trifonov, E.N. (2001) Protein folding: looping from hydrophobic nuclei. Proteins 45, 346–350.
12. Berezovsky, I.N. (2003) Discrete structure of van der Waals domains in globular proteins. Protein Engineering 16, 161–167.
13. Berezovsky, I.N. & Trifonov, E.N. (2001) Van der Waals locks: loop-n-lock structure of globular proteins. J. Mol. Biol. 307, 1419–1426.
14. Poupon, A. & Mornon, J.P. (1998) Populations of hydrophobic amino acids within protein globular domains; identification of conserved 'topohydrophobic' positions. Proteins 33, 329–342.
15. Poupon, A. & Mornon, J.P. (1999) 'Topohydrophobic positions' as key markers of globular protein folds. Theoret. Chem. Accounts 101, 2–8.
16. Poupon, A. & Mornon, J.P. (1999) Predicting the protein folding nucleus from sequences. FEBS Lett. 452, 283–289.
17. Berezovsky, I.N. & Trifonov, E.N. (2002) Loop fold structure of proteins: resolution of Levinthal's paradox. J. Biomol. Struct. Dynamics 20, 5–6.
18. Skolnick, J. & Kolinski, A. (1991) Dynamic Monte Carlo simulations of a new lattice model of globular protein folding, structure and dynamics. J. Mol. Biol. 221, 499–531.
19. Labesse, G., Colloc'h, N., Pothier, J. & Mornon, J.P. (1997) P-SEA: a new efficient assignment of secondary structure from C alpha trace of proteins. Comput. Appl. Biosci. 13, 291–295.
20. Miyazawa, S. & Jernigan, R.L. (1996) Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term for simulation and threading. J. Mol. Biol. 256, 623–644.
21. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. & Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
22. Murzin, A.G., Brenner, S.E., Hubbard, T. & Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
23. Fersht, A. & Daggett, V. (2002) Protein folding at atomic resolution. Cell 108, 573–582.
24. Fersht, A. (1997) Nucleation mechanisms in protein folding. Curr. Opin. Struct. Biol. 7, 3–9.
25. Ittah, V. & Haas, E. (1995) Nonlocal interactions stabilize long range loops in the initial folding intermediates of reduced bovine pancreatic trypsin inhibitor. Biochemistry 34, 4493–4506.
26. Klein-Seetharaman, J., Oikawa, M., Grimshaw, S.B., Wirmer, J., Duchardt, E., Ueda, T., Imoto, T., Smith, L.J., Dobson, C.M. & Schwalbe, H. (2002) Long-range interactions within a non-native protein. Science 295, 1719–1722.

Uploaded: 07/03/2014, 16:20
