TOMOCOMD-CAMPS and protein bilinear indices – novel bio-macromolecular descriptors for protein research: I. Predicting protein stability effects of a complete set of alanine substitutions in the Arc repressor

Sadiel E. Ortega-Broche (1), Yovani Marrero-Ponce (1,2,3), Yunaimy E. Díaz (1), Francisco Torrens (2) and Facundo Pérez-Giménez (3)

1 Unit of Computer-Aided Molecular ‘Biosilico’ Discovery and Bioinformatics Research (CAMD-BIR Unit), Faculty of Chemistry–Pharmacy, Central University of Las Villas, Santa Clara, Villa Clara, Cuba
2 Institut Universitari de Ciència Molecular, Universitat de València, Edifici d’Instituts de Paterna, Spain
3 Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain

Keywords: Arc repressor; bilinear indices; linear discriminant analysis; linear multiple regression; protein stability

Correspondence: Y. Marrero-Ponce, Unit of Computer-Aided Molecular ‘Biosilico’ Discovery and Bioinformatics Research (CAMD-BIR Unit), Faculty of Chemistry–Pharmacy, Central University of Las Villas, Santa Clara, 54830, Villa Clara, Cuba
Fax: +53 42 281130; +53 42 281455; +34 96354 3156
Tel: +53 42 281192; +53 42 281473; +34 96354 3156
E-mail: ymarrero77@yahoo.es; ymponce@gmail.com; yovanimp@uclv.edu.cu
Website: http://www.uv.es/yoma/

(Received 3 March 2009, revised 15 April 2010, accepted 14 May 2010)

doi:10.1111/j.1742-4658.2010.07711.x

Descriptors calculated from a specific representation scheme encode only one part of the chemical information. For this reason, there is a need to construct novel graphical representations of proteins and novel protein descriptors that can provide new information about the structure of proteins. Here, a new set of protein descriptors based on the computation of bilinear maps is presented. This novel approach to biomacromolecular design is relevant for QSPR studies on proteins.
Protein bilinear indices are calculated from the kth power of the nonstochastic and stochastic graph–theoretic electronic-contact matrices, $\mathbf{M}_m^k$ and ${}^{s}\mathbf{M}_m^k$, respectively. That is to say, the kth nonstochastic and stochastic protein bilinear indices are calculated using $\mathbf{M}_m^k$ and ${}^{s}\mathbf{M}_m^k$ as matrix operators of bilinear transformations. Moreover, biochemical information is codified by using different pair combinations of amino acid properties as weightings. Classification models based on a protein bilinear descriptor were developed that discriminate between Arc mutants whose stability is similar to, or lower than, that of the wild-type form. These equations correctly classified more than 90% of the mutants in both the training and test sets. To predict $t_m$ and $\Delta\Delta G^{\circ}_{f}$ values for Arc mutants, multiple linear regression and piecewise linear regression models were developed. The multiple linear regression models obtained accounted for 83% of the variance of the experimental $t_m$. Statistics calculated from internal and external validation procedures demonstrated robustness, stability and adequate predictive power for all models. The results demonstrate the ability of protein bilinear indices to encode biochemical information related to those structural changes that significantly influence the stability of the Arc repressor when point mutations are introduced.

Abbreviations: BOOT, bootstrapping; ECI, electronic charge index; HPI, hydropathy index; ISA, isotropic surface area; LDA, linear discriminant analysis; LOO, leave-one-out; MCC, Matthews correlation coefficient; QSAR, quantitative structure–activity relationship; QSPR, quantitative structure–property relationship; SDEC, standard error in calculation.
FEBS Journal 277 (2010) 3118–3146 © 2010 The Authors. Journal compilation © 2010 FEBS

Introduction

The advent of automated sequencing techniques and the fast-growing number of DNA and protein sequences available from diverse organisms have motivated the development of graphical representations of biopolymers as a method for the analysis and comparison of sequences [1]. Initially, this approach was applied to the inspection and visual analysis of nucleic acid sequences [2,3]. Subsequently, its usefulness for numerically characterizing the degree of similarity/dissimilarity among nucleotide sequences was demonstrated, and it then became an alternative to alignment-based comparison methods [4].

Numerical characterizations of biopolymer structure are also known as biomacromolecular descriptors. Combined with machine-learning techniques, they have proved to be effective in the prediction of physical–chemical and biological features [5–12], the interpretation of properties in structural terms, and the study of similarity/dissimilarity among biomolecules [13–17], amongst others.

A general strategy adopted in the design of biomacromolecular descriptors is the association of mathematical objects with diverse graphical representations of biopolymers [4]. One such strategy represents the biomacromolecular structure by means of a graph and then calculates the invariants of the associated matrices. For example, Randić and Basak used the principal eigenvalues of matrices as invariants in an analysis of the degree of similarity among DNA sequences [18]; Raychaudhury and Nandy considered graph mean-moments as descriptors of polynucleotide sequences [19]; Benedetti and Morosetti [16], Shu et al. [20], Bermúdez et al. [15] and Galindo et al. [21] also applied graph–theoretical invariants to numerically describe the structure of RNA molecules for different purposes.
When a mathematical invariant is calculated from a specific representation scheme, only a partial characterization of the chemical structure can be achieved, because only a part of the chemical information can be encoded [22]. This can be overcome either by developing diverse graphical representations, because each of them captures different information from the biomolecular structures, or by calculating several mathematical invariants from the same representation scheme [22]. The construction of novel representation forms for biomolecules and the design of new descriptors that provide new information and better characterization is therefore necessary [22].

Marrero-Ponce et al. [23–25] have recently applied linear and quadratic forms on $\mathbb{R}^n$ to calculate graph–theoretical invariants of organic compound structures. These descriptors were successfully applied in the prediction of physical–chemical properties and in rational drug design. Subsequently, the use of linear and quadratic forms was extended to obtain numerical characterizations of proteins and nucleic acids. Such descriptors were effectively applied in modelling the interaction between RNA and drugs [26,27] and in predicting the stability of proteins [6,28]. Bilinear forms have also been used in the definition of molecular descriptors [29], which have been applied in molecular modelling [30].

The successful application of linear and quadratic forms to obtain graph–theoretical invariants of biopolymer structure has encouraged us to explore the use of bilinear forms on $\mathbb{R}^n$ as a logical–mathematical procedure for designing novel protein descriptors. More precisely, we used bilinear forms to transform the chemical information encoded by a graph-based representation of proteins, similar to that proposed by Marrero-Ponce et al. [6,28].
To validate the utility of these descriptors, we applied them in combination with multivariate analysis methods to predict the effects of a set of alanine substitutions on the stability of the Arc repressor. Arc is a small, homodimeric repressor of 53 amino acids encoded by P22, a temperate bacteriophage of Salmonella typhimurium [31]. This homodimer has been widely studied by Milla et al. [32], who determined the contribution of specific residues to the stability of the native structure by means of alanine substitutions. The set of Arc mutants obtained in these experiments was used in subsequent studies to validate the usefulness of diverse schemes for the numerical characterization of proteins [5,28,33–35].

Numerical characterization of polypeptide chains

Here, we describe our strategy for numerically characterizing the structure of peptides and proteins by means of bilinear transformations of their structural information. This information is encoded through elements of the $\mathbb{R}^n$ vector space and graph–theoretic representations of polypeptide chains. Accordingly, a background on amino acid-based macromolecular vectors and on nonstochastic and stochastic graph–theoretic electronic-contact matrices is given first, followed by an outline of the mathematical definition of bilinear maps and a definition of our procedures.

Macromolecular vectors for representing amino acid sequences

In analogy to the molecular vector $\bar{x}$ used to represent organic molecules [23,36–47], we introduce here the macromolecular vector ($\bar{x}_m$). The components of this vector are numeric values that represent a certain side-chain amino acid property. These properties characterize each kind of amino acid (R group) within a protein.
Such properties can be z-values [48], the side-chain isotropic surface area (ISA) and atomic charges (electronic charge index; ECI) of the amino acid [49], and the hydropathy index (Kyte–Doolittle scale; HPI) [50], as well as other hydrophobicity scales such as Hopp–Woods [51]. For example, the $z_1(\mathrm{AA})$ scale of the amino acid AA takes the values $z_1(\mathrm{V}) = -2.69$ for valine, $z_1(\mathrm{A}) = 0.07$ for alanine, $z_1(\mathrm{M}) = -2.49$ for methionine, and so on [48,49]. Table 1 lists several side-chain descriptors for the natural amino acids [48–50].

Thus, a peptide (or protein) having 5, 10, 15, …, n amino acids can be represented by means of vectors with 5, 10, 15, …, n components, belonging to the spaces $\mathbb{R}^{5}, \mathbb{R}^{10}, \mathbb{R}^{15}, \ldots, \mathbb{R}^{n}$, respectively, where n is the dimension of the real space $\mathbb{R}^{n}$. This approach allows us to encode a peptide such as SKEERN through the macromolecular vector $\bar{x}_m = [1.96\;\; 2.84\;\; 3.08\;\; 3.08\;\; 2.88\;\; 3.22]$ in the $z_1$ scale (Table 1). This vector belongs to the product space $\mathbb{R}^{6}$. The use of other scales defines alternative macromolecular vectors.

If we are interested in codifying the chemical information by means of two different macromolecular vectors, for example $\bar{x}_m = [x_{m1}, \ldots, x_{mn}]$ and $\bar{y}_m = [y_{m1}, \ldots, y_{mn}]$, then different combinations of macromolecular vectors ($\bar{x}_m \neq \bar{y}_m$) are possible when a weighting scheme is used. In the present study, we characterized each amino acid with the biochemical parameters shown in Table 1. From this weighting scheme, fifteen (or thirty if $\bar{x}_{mw}$–$\bar{y}_{mz}$ $\neq$ $\bar{x}_{mz}$–$\bar{y}_{mw}$) combinations (pairs) of macromolecular vectors ($\bar{x}_m$, $\bar{y}_m$; $\bar{x}_m \neq \bar{y}_m$) can be computed: $\bar{x}_{mz_1}$–$\bar{y}_{mz_2}$, $\bar{x}_{mz_1}$–$\bar{y}_{mz_3}$, $\bar{x}_{mz_1}$–$\bar{y}_{m\mathrm{HPI}}$, $\bar{x}_{mz_1}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{mz_1}$–$\bar{y}_{m\mathrm{ECI}}$, $\bar{x}_{mz_2}$–$\bar{y}_{mz_3}$, $\bar{x}_{mz_2}$–$\bar{y}_{m\mathrm{HPI}}$, $\bar{x}_{mz_2}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{mz_2}$–$\bar{y}_{m\mathrm{ECI}}$, $\bar{x}_{mz_3}$–$\bar{y}_{m\mathrm{HPI}}$, $\bar{x}_{mz_3}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{mz_3}$–$\bar{y}_{m\mathrm{ECI}}$, $\bar{x}_{m\mathrm{HPI}}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{m\mathrm{HPI}}$–$\bar{y}_{m\mathrm{ECI}}$ and $\bar{x}_{m\mathrm{ISA}}$–$\bar{y}_{m\mathrm{ECI}}$.
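The encoding and the pair count described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `macromolecular_vector` and the dictionaries `Z1` and `Z2` are hypothetical helpers that are not part of the TOMOCOMD-CAMPS software, and only the $z_1$ and $z_2$ values of the five residue types occurring in SKEERN are copied from Table 1.

```python
from itertools import combinations, permutations

# z1 and z2 values copied from Table 1, restricted to the residues of SKEERN.
Z1 = {"S": 1.96, "K": 2.84, "E": 3.08, "R": 2.88, "N": 3.22}
Z2 = {"S": -1.63, "K": 1.41, "E": 0.39, "R": 2.52, "N": 1.45}

# The six side-chain properties that make up the weighting scheme (Table 1).
PROPERTIES = ["z1", "z2", "z3", "HPI", "ISA", "ECI"]


def macromolecular_vector(sequence, scale):
    """Map a one-letter sequence to its vector of side-chain property values."""
    return [scale[residue] for residue in sequence]


x_m = macromolecular_vector("SKEERN", Z1)  # z1-weighted vector
y_m = macromolecular_vector("SKEERN", Z2)  # z2-weighted vector

# Unordered pairs treat w-z and z-w as the same combination;
# ordered pairs treat them as different combinations.
unordered_pairs = list(combinations(PROPERTIES, 2))
ordered_pairs = list(permutations(PROPERTIES, 2))

print(x_m)
print(y_m)
print(len(unordered_pairs), len(ordered_pairs))
```

Six properties give 15 unordered and 30 ordered pairs, which matches the fifteen (or thirty) combinations quoted in the text.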
In the notation $\bar{x}_{mw}$–$\bar{y}_{mz}$ used here, the subscripts w and z represent two amino acid properties from our weighting scheme, and the dash (–) represents the combination (pair) of the two selected amino acid properties. To illustrate this, consider the same peptide as in the example above, SKEERN, and the weighting scheme $z_1$ and $z_2$ ($\bar{x}_{mz_1}$–$\bar{y}_{mz_2}$ = $\bar{x}_{mz_2}$–$\bar{y}_{mz_1}$). The macromolecular vectors $\bar{x}_m = [1.96\;\; 2.84\;\; 3.08\;\; 3.08\;\; 2.88\;\; 3.22]$ and $\bar{y}_m = [-1.63\;\; 1.41\;\; 0.39\;\; 0.39\;\; 2.52\;\; 1.45]$ are obtained when $z_1$ and $z_2$ are used as chemical weights for codifying each amino acid of the example peptide in the $\bar{x}_m$ and $\bar{y}_m$ vectors, respectively (Table 2).

Graph-theoretic representations of polypeptide chains

In molecular topology, molecular structure is generally expressed by the hydrogen-suppressed graph; that is, a molecule is represented by a graph. Informally, a graph G is a collection of vertices (points) and edges (lines or bonds) connecting these vertices [52–54]. In more formal terms, a simple graph G is defined as an ordered pair [V(G), E(G)], which consists of a nonempty set of vertices V(G) and a set E(G) of unordered pairs of elements of V(G), termed edges [52–54]. In this particular case, we are not dealing with a simple graph but with a so-called pseudograph (G). Informally, a pseudograph is a graph that may have multiple edges between the same pair of vertices, or loops joining a vertex to itself. Formally, a pseudograph is a set V of vertices along with a set E of edges and a function f from E to $\{\{u, v\} \mid u, v \in V\}$ (the function f shows which pair of vertices is connected by which edge). An edge e is a loop if $f(e) = \{u\}$ for some vertex u in V [23,55,56].

Table 1. Descriptors for the natural amino acids.

| Amino acid | Code | z1 [48,49] | z2 [48,49] | z3 [48,49] | HPI [50] | ISA [49] | ECI [49] |
|---|---|---|---|---|---|---|---|
| Ala | A | 0.07 | −1.73 | 0.09 | 1.8 | 62.90 | 0.05 |
| Val | V | −2.69 | −2.53 | −1.29 | 4.2 | 120.91 | 0.07 |
| Leu | L | −4.19 | −1.03 | −0.98 | 3.8 | 154.35 | 0.01 |
| Ile | I | −4.44 | −1.68 | −1.03 | 4.5 | 149.77 | 0.09 |
| Pro | P | −1.22 | 0.88 | 2.23 | −1.6 | 122.35 | 0.16 |
| Phe | F | −4.92 | 1.30 | 0.45 | 2.8 | 189.42 | 0.14 |
| Trp | W | −4.75 | 3.65 | 0.85 | −0.9 | 179.16 | 1.08 |
| Met | M | −2.49 | −0.27 | −0.41 | 1.9 | 132.22 | 0.34 |
| Lys | K | 2.84 | 1.41 | −3.14 | −3.9 | 102.78 | 0.53 |
| Arg | R | 2.88 | 2.52 | −3.44 | −4.5 | 52.98 | 1.69 |
| His | H | 2.41 | 1.74 | 1.11 | −3.2 | 87.38 | 0.56 |
| Gly | G | 2.23 | −5.36 | 0.30 | −0.4 | 19.93 | 0.02 |
| Ser | S | 1.96 | −1.63 | 0.57 | −0.8 | 19.75 | 0.56 |
| Thr | T | 0.92 | −2.09 | −1.40 | −0.7 | 59.44 | 0.65 |
| Cys | C | 0.71 | −0.97 | 4.13 | 2.5 | 78.51 | 0.15 |
| Tyr | Y | −1.39 | 2.32 | 0.01 | −1.3 | 132.16 | 0.72 |
| Asn | N | 3.22 | 1.45 | 0.84 | −3.5 | 17.87 | 1.31 |
| Gln | Q | 2.18 | 0.53 | −1.14 | −3.5 | 19.53 | 1.36 |
| Asp | D | 3.64 | 1.13 | 2.36 | −3.5 | 18.46 | 1.25 |
| Glu | E | 3.08 | 0.39 | −0.07 | −3.5 | 30.19 | 1.31 |

On the other hand, Anfinsen’s experiments with small proteins demonstrated that the amino acid sequence of a protein encodes the folding of its peptide backbone. However, at present, mere knowledge of the amino acid sequence of a protein does not provide us with its 3D structure. The primary structure of proteins consists of unbranched amino acid sequences, which are linked by amide bonds between the α-carboxyl group of one residue and the α-amino group of the next. The 3D distribution of all atoms in a protein is referred to as the protein’s tertiary structure. Whereas the term secondary structure refers to the spatial arrangement of amino acid residues that are adjacent in the primary structure, the tertiary structure includes longer-range aspects of the amino acid sequence. Lastly, individual polypeptide chains in multi-subunit proteins are organized in 3D complexes, reaching the quaternary-structure level.
As previously outlined, essential information for protein folding is contained in the amino acid sequence and, more specifically, in the amino acid side-chains of the polypeptide chain.

Taking this into account, in the present study we apply a graph–theoretic model, as developed and applied previously by Marrero-Ponce et al. [33], to represent the molecular structure of proteins. This model is called a macromolecular graph. Here, the graph vertices are the Cα atoms of the polypeptide backbone, and the edges are both covalent interactions between amino acids (peptide bonds) and noncovalent interactions between amino acid side-chains in the same or a different subunit. Noncovalent interactions can also occur between an amino acid side-chain and its own main chain, in which case this amino acid represents a pseudovertex in the macromolecular pseudograph. These interactions can be considered as contacts, which can exist among amino acids that are near or far apart in the polypeptide backbone (i.e. the contacts can be subdivided into short-, medium- and long-range contacts). Table 2 depicts two interacting polypeptide chains by means of a macromolecular pseudograph; a pseudograph is required because the heterodimer (SKEERN) contains an amino acid with a hydrogen bond between its side-chain and its main-chain atoms.

The kth nonstochastic graph–theoretic electronic-contact matrix, $\mathbf{M}_m^k$, is a square, symmetric n × n matrix, where n is the number of amino acids in the protein [6,28]. Its coefficients ${}^{k}m_{ij}$ are the elements of the kth power of $\mathbf{M}_m$, whose elements $m_{ij}$ are defined as:

$$
m_{ij} =
\begin{cases}
1 & \text{if } i \neq j \text{ and } \exists\, e_k \in E(G_m) \\
1 & \text{if } i = j \text{ and amino acid } i \text{ has a hydrogen bond between its side-chain and its main-chain atoms} \\
0 & \text{otherwise}
\end{cases}
\tag{1}
$$

where $E(G_m)$ represents the set of edges of $G_m$ and $e_k$ is understood to be an edge joining the vertices $v_i$ and $v_j$. The matrix $\mathbf{M}_m^k$ provides the number of walks of length k that link every pair of vertices $v_i$ and $v_j$.
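As a minimal sketch of Eqn (1) and of the walk-counting property, the following code builds the first-order matrix of the SKEERN example and squares it. The edge list is an assumption read off the k = 1 nonstochastic matrix printed in Table 3 (the pair (3, 3) is the loop on Glu3); the helper names `adjacency`, `matmul` and `matpow` are hypothetical and are not taken from the authors' software.

```python
def adjacency(n, edges):
    """Symmetric 0/1 matrix of a pseudograph; (i, i) is a loop. Vertices are 1-based."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i - 1][j - 1] = 1
        m[j - 1][i - 1] = 1
    return m


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(m, k):
    """k-th power of a square matrix; k = 0 gives the identity."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, m)
    return result


# Edges assumed from the k = 1 matrix of Table 3 (Ser1 ... Asn6).
EDGES = [(1, 2), (1, 5), (2, 3), (2, 6), (3, 3), (4, 5), (4, 6), (5, 6)]

M1 = adjacency(6, EDGES)
M2 = matpow(M1, 2)

# Entry (1, 1) of M2 counts the closed walks of length 2 from Ser1: 1-2-1 and 1-5-1.
print(M2[0][0])
for row in M2:
    print(row)
```

Under these assumptions the squared matrix agrees with the k = 2 nonstochastic matrix of Table 3, and each entry can be checked by enumerating walks by hand.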
Accordingly, each nonzero off-diagonal entry of $\mathbf{M}_m^1$ represents a peptide bond (covalent bond), or a hydrogen bond or salt-bridge interaction (noncovalent bond), between amino acids i and j.

On the other hand, the kth stochastic graph–theoretic electronic-contact matrix of $G_m$, ${}^{s}\mathbf{M}_m^k$, can be directly obtained from $\mathbf{M}_m^k$. Here, ${}^{s}\mathbf{M}_m^k = [{}^{k}sm_{ij}]$ is a square matrix of order n (n = number of Cα atoms), and the elements ${}^{k}sm_{ij}$ are defined as:

$$
{}^{k}sm_{ij} = \frac{{}^{k}m_{ij}}{{}^{k}SUM_i} = \frac{{}^{k}m_{ij}}{{}^{k}\delta_i}
\tag{2}
$$

where ${}^{k}m_{ij}$ are the elements of the kth power of $\mathbf{M}_m$, and the sum of the ith row of $\mathbf{M}_m^k$ is named the k-order vertex degree of Cα atom i, ${}^{k}\delta_i$. It should be noted that the matrix ${}^{s}\mathbf{M}_m^k$ in Eqn (2) has the property that the sum of the elements in each row is 1. An n × n matrix with nonnegative entries having this property is called a ‘stochastic matrix’ [57]. Table 3 shows the zero, first and second powers of the total nonstochastic and stochastic graph–theoretic electronic-contact matrices of the macromolecular pseudograph depicted in Table 2.

Table 2. Representation of two interacting polypeptide chains and their associated pseudograph and macromolecular vector.

[Diagram not reproduced: the residues Ser1, Lys2, Glu3, Glu4, Arg5 and Asn6 drawn as two chains (chain 1 and chain 2) with their NH2 and COOH termini, together with the corresponding pseudograph of six Cα vertices numbered 1 to 6.]

Macromolecular ‘pseudograph’ ($G_m$) of the α-carbon atoms (polypeptide backbone): both the covalent interactions (peptide bonds between amino acids, shown with solid lines) and the noncovalent interactions (salt bridges and hydrogen bonds, shown with dashed lines) between amino acid side-chains (R-groups) in the same polypeptide chain or in different chains are considered. The loop in the third position (Glu3) indicates a hydrogen bond between the main chain of an amino acid and its side-chain.

Macromolecular vector: $\bar{x}_m = [\mathrm{S\;K\;E\;E\;R\;N}] \in \mathbb{R}^{6}$. In the definition of $\bar{x}_m$ as a macromolecular vector, the one-letter symbol of each amino acid indicates the corresponding side-chain amino acid property, e.g. its $z_1$ value. That is to say, S stands for $z_1(\mathrm{S})$, or for some other amino acid property that characterizes each side-chain in the polypeptide. Therefore, if we use the canonical basis of $\mathbb{R}^{6}$, the coordinates of any vector $\bar{x}_m$ coincide with the components of that macromolecular vector.

$[X_m]^{T} = [\mathrm{S\;K\;E\;E\;R\;N}]$, where $[X_m]^{T}$ is the transpose of $[X_m]$, i.e. the vector of the coordinates of $\bar{x}_m$ in the canonical basis of $\mathbb{R}^{6}$ written as a 1 × 6 matrix, and $[X_m]$ is the vector of coordinates of $\bar{x}_m$ in the canonical basis of $\mathbb{R}^{6}$ (a 6 × 1 matrix).

When the components of $\bar{x}_m$ and $\bar{y}_m$ are $z_1$ and $z_2$ values, respectively:

$\bar{x}_m = [1.96\;\; 2.84\;\; 3.08\;\; 3.08\;\; 2.88\;\; 3.22]$
$\bar{y}_m = [-1.63\;\; 1.41\;\; 0.39\;\; 0.39\;\; 2.52\;\; 1.45]$

Mathematical bilinear forms: a theoretical framework

In mathematics, a bilinear form on a real vector space is a mapping $b: V \times V \to \mathbb{R}$ that is linear in both arguments [58–63]. That is, this function satisfies the following axioms for any scalar $\alpha$ and any choice of vectors $\bar{v}, \bar{w}, \bar{v}_1, \bar{v}_2, \bar{w}_1$ and $\bar{w}_2$:

(1) $b(\alpha\bar{v}, \bar{w}) = b(\bar{v}, \alpha\bar{w}) = \alpha\, b(\bar{v}, \bar{w})$
(2) $b(\bar{v}_1 + \bar{v}_2, \bar{w}) = b(\bar{v}_1, \bar{w}) + b(\bar{v}_2, \bar{w})$
(3) $b(\bar{v}, \bar{w}_1 + \bar{w}_2) = b(\bar{v}, \bar{w}_1) + b(\bar{v}, \bar{w}_2)$

That is, b is bilinear if it is linear in each argument, taken separately.

Let V be a real vector space in $\mathbb{R}^{n}$ ($V \subseteq \mathbb{R}^{n}$) and let the vector set $\{\bar{e}_1, \bar{e}_2, \ldots, \bar{e}_n\}$ be a basis of $\mathbb{R}^{n}$. This basis permits us to write any vectors $\bar{x}$ and $\bar{y}$ of V unambiguously, where $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^{n}$ and $(y_1, y_2, \ldots, y_n) \in \mathbb{R}^{n}$ are the coordinates of the vectors $\bar{x}$ and $\bar{y}$, respectively.
That is to say:

$$
\bar{x} = \sum_{i=1}^{n} x_i\, \bar{e}_i
\tag{3}
$$

and

$$
\bar{y} = \sum_{j=1}^{n} y_j\, \bar{e}_j
\tag{4}
$$

Consequently, by bilinearity,

$$
b(\bar{x}, \bar{y}) = b\Big(\sum_{i=1}^{n} x_i \bar{e}_i,\; \sum_{j=1}^{n} y_j \bar{e}_j\Big) = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\, y_j\, b(\bar{e}_i, \bar{e}_j)
\tag{5}
$$

If we take $a_{ij}$ as the n × n scalars $b(\bar{e}_i, \bar{e}_j)$, that is:

$$
a_{ij} = b(\bar{e}_i, \bar{e}_j), \quad \text{for } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, n
\tag{6}
$$

then:

$$
b(\bar{x}, \bar{y}) = \sum_{i,j}^{n} a_{ij}\, x_i\, y_j = [X]^{T} A\, [Y] =
\begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
\tag{7}
$$

As can be seen, the defining equation for b may be written as the single matrix equation [see Eqn (7)], where [Y] is a column vector (an n × 1 matrix) of the coordinates of $\bar{y}$ in a basis of $\mathbb{R}^{n}$, and $[X]^{T}$ (a 1 × n matrix) is the transpose of [X], where [X] is a column vector (an n × 1 matrix) of the coordinates of $\bar{x}$ in the same basis of $\mathbb{R}^{n}$.

Finally, we introduce the formal definition of a symmetric bilinear form. Let V be a real vector space and b be a bilinear function on V × V. The bilinear function b is called symmetric if $b(\bar{x}, \bar{y}) = b(\bar{y}, \bar{x})$, $\forall\, \bar{x}, \bar{y} \in V$ [58–63]. Then:

$$
b(\bar{x}, \bar{y}) = \sum_{i,j}^{n} a_{ij}\, x_i\, y_j = \sum_{i,j}^{n} a_{ji}\, x_j\, y_i = b(\bar{y}, \bar{x})
\tag{8}
$$

Table 3. The zero (k = 0), first (k = 1) and second (k = 2) powers of the total nonstochastic and stochastic graph–theoretic electronic-contact matrices of $G_m$, respectively.

k = 0:

$$
\mathbf{M}_m^{0} =
\begin{bmatrix}
1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1
\end{bmatrix},
\qquad
{}^{s}\mathbf{M}_m^{0} =
\begin{bmatrix}
1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1
\end{bmatrix}
$$

k = 1:

$$
\mathbf{M}_m^{1} =
\begin{bmatrix}
0&1&0&0&1&0\\ 1&0&1&0&0&1\\ 0&1&1&0&0&0\\ 0&0&0&0&1&1\\ 1&0&0&1&0&1\\ 0&1&0&1&1&0
\end{bmatrix},
\qquad
{}^{s}\mathbf{M}_m^{1} =
\begin{bmatrix}
0&\tfrac{1}{2}&0&0&\tfrac{1}{2}&0\\
\tfrac{1}{3}&0&\tfrac{1}{3}&0&0&\tfrac{1}{3}\\
0&\tfrac{1}{2}&\tfrac{1}{2}&0&0&0\\
0&0&0&0&\tfrac{1}{2}&\tfrac{1}{2}\\
\tfrac{1}{3}&0&0&\tfrac{1}{3}&0&\tfrac{1}{3}\\
0&\tfrac{1}{3}&0&\tfrac{1}{3}&\tfrac{1}{3}&0
\end{bmatrix}
$$

k = 2:

$$
\mathbf{M}_m^{2} =
\begin{bmatrix}
2&0&1&1&0&2\\ 0&3&1&1&2&0\\ 1&1&2&0&0&1\\ 1&1&0&2&1&1\\ 0&2&0&1&3&1\\ 2&0&1&1&1&3
\end{bmatrix},
\qquad
{}^{s}\mathbf{M}_m^{2} =
\begin{bmatrix}
\tfrac{1}{3}&0&\tfrac{1}{6}&\tfrac{1}{6}&0&\tfrac{1}{3}\\
0&\tfrac{3}{7}&\tfrac{1}{7}&\tfrac{1}{7}&\tfrac{2}{7}&0\\
\tfrac{1}{5}&\tfrac{1}{5}&\tfrac{2}{5}&0&0&\tfrac{1}{5}\\
\tfrac{1}{6}&\tfrac{1}{6}&0&\tfrac{1}{3}&\tfrac{1}{6}&\tfrac{1}{6}\\
0&\tfrac{2}{7}&0&\tfrac{1}{7}&\tfrac{3}{7}&\tfrac{1}{7}\\
\tfrac{1}{4}&0&\tfrac{1}{8}&\tfrac{1}{8}&\tfrac{1}{8}&\tfrac{3}{8}
\end{bmatrix}
$$

Nonstochastic and stochastic amino acid-based bilinear indices: total (global) definition

The kth nonstochastic and stochastic bilinear indices for a protein, $b_{mk}(\bar{x}_m, \bar{y}_m)$ and ${}^{s}b_{mk}(\bar{x}_m, \bar{y}_m)$, are computed from the kth nonstochastic and stochastic graph–theoretic electronic-contact matrices, $\mathbf{M}_m^k$ and ${}^{s}\mathbf{M}_m^k$, as shown in Eqns (9) and (10), respectively:

$$
b_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^{k}m_{ij}\; x_m^{i}\, y_m^{j}
\tag{9}
$$

$$
{}^{s}b_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^{k}sm_{ij}\; x_m^{i}\, y_m^{j}
\tag{10}
$$

where n is the number of amino acids (Cα atoms) in the protein, and $x_m^{1}, \ldots, x_m^{n}$ and $y_m^{1}, \ldots, y_m^{n}$ are the coordinates or components of the macromolecular vectors $\bar{x}_m$ and $\bar{y}_m$ in a canonical basis of $\mathbb{R}^{n}$.

Eqns (9) and (10) for $b_{mk}(\bar{x}_m, \bar{y}_m)$ and ${}^{s}b_{mk}(\bar{x}_m, \bar{y}_m)$ may also be written as the single matrix equations:

$$
b_{mk}(\bar{x}_m, \bar{y}_m) = [X_m]^{T}\, \mathbf{M}_m^{k}\, [Y_m]
\tag{11}
$$

$$
{}^{s}b_{mk}(\bar{x}_m, \bar{y}_m) = [X_m]^{T}\; {}^{s}\mathbf{M}_m^{k}\, [Y_m]
\tag{12}
$$

where $[Y_m]$ is a column vector (an n × 1 matrix) of the coordinates of $\bar{y}_m$ in the canonical basis of $\mathbb{R}^{n}$, and $[X_m]^{T}$ is the transpose of $[X_m]$, where $[X_m]$ is a column vector (an n × 1 matrix) of the coordinates of $\bar{x}_m$ in the canonical basis of $\mathbb{R}^{n}$. Therefore, if we use the canonical basis, the coordinates [$(x_m^{1}, \ldots, x_m^{n})$ and $(y_m^{1}, \ldots, y_m^{n})$] of any macromolecular vectors ($\bar{x}_m$ and $\bar{y}_m$) coincide with the components of those vectors [$(x_{m1}, \ldots, x_{mn})$ and $(y_{m1}, \ldots, y_{mn})$].
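Eqns (2) and (9) to (12) can be checked numerically on the SKEERN example. The sketch below is an illustration only, using plain Python lists; the helper names (`matmul`, `matpow`, `row_stochastic`, `bilinear`) are hypothetical and are not the TOMOCOMD-CAMPS implementation. The first-order matrix is copied from Table 3 and the vectors are the $z_1$ and $z_2$ weightings given earlier.

```python
# First-order nonstochastic matrix of the SKEERN pseudograph (Table 3, k = 1).
M1 = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
]
X = [1.96, 2.84, 3.08, 3.08, 2.88, 3.22]    # z1 weights for S, K, E, E, R, N
Y = [-1.63, 1.41, 0.39, 0.39, 2.52, 1.45]   # z2 weights for S, K, E, E, R, N


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(m, k):
    """k-th power of a square matrix; k = 0 gives the identity."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, m)
    return result


def row_stochastic(m):
    """Divide every row by its sum, as in Eqn (2); assumes nonzero row sums."""
    return [[value / sum(row) for value in row] for row in m]


def bilinear(x, m, y):
    """Double sum of Eqns (9)/(10), i.e. x^T M y."""
    n = len(x)
    return sum(x[i] * m[i][j] * y[j] for i in range(n) for j in range(n))


nonstochastic = []
stochastic = []
for k in range(3):
    mk = matpow(M1, k)
    nonstochastic.append(round(bilinear(X, mk, Y), 2))
    stochastic.append(round(bilinear(X, row_stochastic(mk), Y), 2))

print(nonstochastic)  # [15.14, 40.59, 98.84]
print(stochastic)     # [15.14, 17.77, 14.57]

# Swapping the two weightings leaves the nonstochastic index unchanged
# (M1 is symmetric) but changes the stochastic one.
swapped_nonstochastic = bilinear(Y, M1, X)
swapped_stochastic = bilinear(Y, row_stochastic(M1), X)
```

The rounded values agree with those reported in Table 4. Note that the kth stochastic matrix is obtained here by row-normalizing the kth power of the nonstochastic matrix, as Eqn (2) prescribes, rather than by raising the first-order stochastic matrix to the kth power.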
Because the canonical basis is used, those coordinates can be considered as weights (the R-group on each Cα atom, that is to say ‘amino acid labels’) of the vertices of $G_m$, given that the components of the macromolecular vectors are values of some amino acid property that characterizes each kind of R-chain in the protein. The calculation of the first three values of the bilinear indices for the example protein (Tables 2 and 3) is shown in Table 4.

It should be noted that nonstochastic and stochastic bilinear indices are symmetric and nonsymmetric bilinear forms, respectively. Therefore, if two properties W and Z of the weighting scheme are used as amino acid weights to compute the protein bilinear indices, two different sets of stochastic bilinear indices, ${}^{W\text{–}Z}\,{}^{s}b_{mk}(\bar{x}_m, \bar{y}_m)$ and ${}^{Z\text{–}W}\,{}^{s}b_{mk}(\bar{x}_m, \bar{y}_m)$, can be obtained (because $\bar{x}_{mW}$–$\bar{y}_{mZ}$ $\neq$ $\bar{x}_{mZ}$–$\bar{y}_{mW}$), whereas only one set of nonstochastic bilinear indices, ${}^{W\text{–}Z}b_{mk}(\bar{x}_m, \bar{y}_m) = {}^{Z\text{–}W}b_{mk}(\bar{x}_m, \bar{y}_m)$, can be calculated (because, in this case, $\bar{x}_{mW}$–$\bar{y}_{mZ}$ = $\bar{x}_{mZ}$–$\bar{y}_{mW}$).

Nonstochastic and stochastic local bilinear indices: definition of amino acid, amino acid-type and peptide fragment bilinear indices

In the last decade, Randić [64] proposed a list of desirable attributes for a molecular descriptor. This list can be considered as a methodological guide for the development of new topological indices. One of the most important criteria is the possibility of defining the descriptors locally; that is, the index can be calculated for the molecule (protein) as a whole but also over certain fragments of the structure itself.

Therefore, in addition to total bilinear indices computed for the whole protein, a local-fragment (peptide fragment) formalism can be developed. These descriptors are termed local nonstochastic and stochastic bilinear indices, $b_{mkL}(\bar{x}_m, \bar{y}_m)$ and ${}^{s}b_{mkL}(\bar{x}_m, \bar{y}_m)$, respectively.
The definition of these descriptors is:

$$
b_{mkL}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^{k}m_{ijL}\; x_m^{i}\, y_m^{j}
\tag{13}
$$

$$
{}^{s}b_{mkL}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^{k}sm_{ijL}\; x_m^{i}\, y_m^{j}
\tag{14}
$$

where ${}^{k}m_{ijL}$ [${}^{k}sm_{ijL}$] is the element in row i and column j of the kth local matrix $\mathbf{M}_{mL}^{k}$ [${}^{s}\mathbf{M}_{mL}^{k}$]. This matrix is extracted from the $\mathbf{M}_m^{k}$ [${}^{s}\mathbf{M}_m^{k}$] matrix and contains information referring to the vertices of the specific protein fragment ($F_r$) and also to the molecular environment in step k. The matrix $\mathbf{M}_{mL}^{k}$ [${}^{s}\mathbf{M}_{mL}^{k}$] with elements ${}^{k}m_{ijL}$ [${}^{k}sm_{ijL}$] is defined as (Table 5):

$$
{}^{k}m_{ijL}\;[{}^{k}sm_{ijL}] =
\begin{cases}
{}^{k}m_{ij}\;[{}^{k}sm_{ij}] & \text{if both } v_i \text{ and } v_j \text{ are vertices (amino acids) contained within } F_r \\
\tfrac{1}{2}\,{}^{k}m_{ij}\;[\tfrac{1}{2}\,{}^{k}sm_{ij}] & \text{if } v_i \text{ or } v_j \text{, but not both, is contained within } F_r \\
0 & \text{otherwise}
\end{cases}
\tag{15}
$$

These local analogues can also be expressed in matrix form by the expressions:

$$
b_{mkL}(\bar{x}_m, \bar{y}_m) = [X_m]^{T}\, \mathbf{M}_{mL}^{k}\, [Y_m]
\tag{16}
$$

$$
{}^{s}b_{mkL}(\bar{x}_m, \bar{y}_m) = [X_m]^{T}\; {}^{s}\mathbf{M}_{mL}^{k}\, [Y_m]
\tag{17}
$$

It should be noted that the scheme above follows the spirit of a Mulliken population analysis [65]. It should also be noted that for every partitioning of a protein into Z macromolecular fragments there will be Z local macromolecular fragment matrices. That is, if a protein is partitioned into Z fragments, the matrix $\mathbf{M}_m^{k}$ [${}^{s}\mathbf{M}_m^{k}$] can be correspondingly partitioned into Z local matrices $\mathbf{M}_{mL}^{k}$ [${}^{s}\mathbf{M}_{mL}^{k}$], L = 1, …, Z, and the matrix $\mathbf{M}_m^{k}$ [${}^{s}\mathbf{M}_m^{k}$] is exactly the sum of its Z local matrices.
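The halving rule of Eqn (15) and the partition property can be sketched as follows. This is a hedged illustration and not the authors' code: `local_matrix` and `bilinear` are hypothetical helper names, fragments are given as sets of 0-based residue positions, and the second-order matrix and the $z_1$/$z_2$ weight vectors of the SKEERN example are copied from Table 3 and from the text.

```python
# Second-order nonstochastic matrix of the SKEERN pseudograph (Table 3, k = 2).
M2 = [
    [2, 0, 1, 1, 0, 2],
    [0, 3, 1, 1, 2, 0],
    [1, 1, 2, 0, 0, 1],
    [1, 1, 0, 2, 1, 1],
    [0, 2, 0, 1, 3, 1],
    [2, 0, 1, 1, 1, 3],
]
X = [1.96, 2.84, 3.08, 3.08, 2.88, 3.22]    # z1 weights
Y = [-1.63, 1.41, 0.39, 0.39, 2.52, 1.45]   # z2 weights


def bilinear(x, m, y):
    """x^T M y written as a double sum."""
    n = len(x)
    return sum(x[i] * m[i][j] * y[j] for i in range(n) for j in range(n))


def local_matrix(m, fragment):
    """Eqn (15): keep an entry if both indices lie in the fragment,
    halve it if exactly one does, and set it to zero otherwise."""
    n = len(m)
    return [[m[i][j] * ((i in fragment) + (j in fragment)) / 2
             for j in range(n)] for i in range(n)]


# Partition into single residues: one fragment per amino acid.
fragments = [{i} for i in range(6)]
local_matrices = [local_matrix(M2, f) for f in fragments]
local_indices = [bilinear(X, lm, Y) for lm in local_matrices]

# The local matrices add up to the total matrix, so the local
# indices add up to the total index.
summed = [[sum(lm[i][j] for lm in local_matrices) for j in range(6)]
          for i in range(6)]
total = bilinear(X, M2, Y)

print(local_matrices[0][0])
print(round(sum(local_indices), 2), round(total, 2))
```

Any other partition, for instance the two fragments {0, 1, 2} and {3, 4, 5}, sums to the same total, because each off-diagonal entry is either kept whole by one fragment or split in halves between two.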
In this way, the total nonstochastic and stochastic bilinear indices are the sum of the nonstochastic and stochastic bilinear indices, respectively, of the Z macromolecular fragments: b m ð  x m ;  y m Þ¼ X Z L¼1 b mkL ð  x m ;  y m Þð18Þ s b m ð  x m ;  y m Þ¼ X Z L¼1 s b mkL ð  x m ;  y m Þð19Þ In addition, the amino acid-type bilinear indices can also be calculated. Amino acid and amino acid-type bilinear indices are specific cases of local protein bilin- ear indices. In this sense, the kth amino acid-bilinear indices are calculated by summing the kth amino acid bilinear indices of all amino acids of the same amino Table 4. Values of nonstochastic and stochastic total bilinear indices for two interacting peptides (SKEERN) used as example above (see also Tables 2 and 3). Nonstochastic total bilinear indices b m0 ¼ P n i¼1 P n j¼1 0 m ij x i m y j m ¼½X m  T M 0 m ½Y m ¼½1:96 2:84 3:08 3:08 2:88 3:22  100000 010000 001000 000100 000010 000001 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 À1:63 1:41 0:39 0:39 2:52 1:45 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 ¼ 15:14 b m1 ¼ P n i¼1 P n j¼1 1 m ij x i m y j m ¼½X m  T M 1 m ½Y m ¼½1:96 2:84 3:08 3:08 2:88 3:22 010010 101001 011000 000011 100101 010110 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 À1:63 1:41 0:39 0:39 2:52 1:45 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 ¼ 40:59 b m2 ¼ P n i¼1 P n j¼1 2 m ij x i m y j m ¼½X m  T M 2 m ½Y m ¼½1:96 2:84 3:08 3:08 2:88 3:22  201102 031120 112001 110211 020131 201113 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 À1:63 1:41 0:39 0:39 2:52 1:45 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 ¼ 98:84 Stochastic total bilinear indices s b m0 ¼ P n i¼1 P n j¼1 0 sm ij x i m y j m ¼½X m  T s M 0 m ½Y m ¼½1:96 2:84 3:08 3:08 2:88 3:22  100000 010000 001000 000100 000010 000001 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 À1:63 1:41 0:39 0:39 2:52 1:45 2 6 6 6 6 6 6 4 3 7 7 7 7 7 7 5 ¼ 15:14 s b m1 ¼ P n i¼1 P n j¼1 1 sm ij x i m y j m ¼½X m  T s M 1 m ½Y m ¼½1:96 2:84 3:08 3:08 2:88 3:22  0 1 2 00 1 2 0 1 3 0 1 3 00 1 3 0 1 2 1 2 000 0000 1 2 1 
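\[
\cdots
\begin{bmatrix}
\vdots & & & & & \vdots \\
\tfrac{1}{3} & 0 & 0 & \tfrac{1}{3} & 0 & \tfrac{1}{3} \\
0 & \tfrac{1}{3} & 0 & \tfrac{1}{3} & \tfrac{1}{3} & 0
\end{bmatrix}
\begin{bmatrix} -1.63 \\ 1.41 \\ 0.39 \\ 0.39 \\ 2.52 \\ 1.45 \end{bmatrix}
= 17.77
\]

\[
{}^{s}b_{m2}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^{2}sm_{ij}\, x_m^{i}\, y_m^{j} = [X_m]^{T}\, {}^{s}\mathbf{M}_m^{2}\, [Y_m]
\]

\[
= \begin{bmatrix} 1.96 & 2.84 & 3.08 & 3.08 & 2.88 & 3.22 \end{bmatrix}
\begin{bmatrix}
\tfrac{1}{3} & 0 & \tfrac{1}{6} & \tfrac{1}{6} & 0 & \tfrac{1}{3} \\
0 & \tfrac{3}{7} & \tfrac{1}{7} & \tfrac{1}{7} & \tfrac{2}{7} & 0 \\
\tfrac{1}{5} & \tfrac{1}{5} & \tfrac{2}{5} & 0 & 0 & \tfrac{1}{5} \\
\tfrac{1}{6} & \tfrac{1}{6} & 0 & \tfrac{1}{3} & \tfrac{1}{6} & \tfrac{1}{6} \\
0 & \tfrac{2}{7} & 0 & \tfrac{1}{7} & \tfrac{3}{7} & \tfrac{1}{7} \\
\tfrac{1}{4} & 0 & \tfrac{1}{8} & \tfrac{1}{8} & \tfrac{1}{8} & \tfrac{3}{8}
\end{bmatrix}
\begin{bmatrix} -1.63 \\ 1.41 \\ 0.39 \\ 0.39 \\ 2.52 \\ 1.45 \end{bmatrix}
= 14.57
\]

Predicting the stability of the Arc repressor. S. E. Ortega-Broche et al. FEBS Journal 277 (2010) 3118–3146 © 2010 The Authors. Journal compilation © 2010 FEBS

Table 5. The zero (k = 0), first (k = 1) and second (k = 2) powers of the local nonstochastic and stochastic graph-theoretic electronic-contact matrices of $G_m$, respectively.

The zero, first and second powers of the local (amino acid) nonstochastic matrices

Ser (S), position 1:
\[ \mathbf{M}^{0}(G_m, S) = \begin{bmatrix} 1&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, S) = \begin{bmatrix} 0&\tfrac{1}{2}&0&0&\tfrac{1}{2}&0 \\ \tfrac{1}{2}&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ \tfrac{1}{2}&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, S) = \begin{bmatrix} 2&0&\tfrac{1}{2}&\tfrac{1}{2}&0&1 \\ 0&0&0&0&0&0 \\ \tfrac{1}{2}&0&0&0&0&0 \\ \tfrac{1}{2}&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 1&0&0&0&0&0 \end{bmatrix} \]

Lys (K), position 2:
\[ \mathbf{M}^{0}(G_m, K) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, K) = \begin{bmatrix} 0&\tfrac{1}{2}&0&0&0&0 \\ \tfrac{1}{2}&0&\tfrac{1}{2}&0&0&\tfrac{1}{2} \\ 0&\tfrac{1}{2}&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&\tfrac{1}{2}&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, K) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&3&\tfrac{1}{2}&\tfrac{1}{2}&1&0 \\ 0&\tfrac{1}{2}&0&0&0&0 \\ 0&\tfrac{1}{2}&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]

Glu (E), position 3:
\[ \mathbf{M}^{0}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&1&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&\tfrac{1}{2}&0&0&0 \\ 0&\tfrac{1}{2}&1&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, E) = \begin{bmatrix} 0&0&\tfrac{1}{2}&0&0&0 \\ 0&0&\tfrac{1}{2}&0&0&0 \\ \tfrac{1}{2}&\tfrac{1}{2}&2&0&0&\tfrac{1}{2} \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&\tfrac{1}{2}&0&0&0 \end{bmatrix} \]

Glu (E), position 4:
\[ \mathbf{M}^{0}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&\tfrac{1}{2}&\tfrac{1}{2} \\ 0&0&0&\tfrac{1}{2}&0&0 \\ 0&0&0&\tfrac{1}{2}&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, E) = \begin{bmatrix} 0&0&0&\tfrac{1}{2}&0&0 \\ 0&0&0&\tfrac{1}{2}&0&0 \\ 0&0&0&0&0&0 \\ \tfrac{1}{2}&\tfrac{1}{2}&0&2&\tfrac{1}{2}&\tfrac{1}{2} \\ 0&0&0&\tfrac{1}{2}&0&0 \\ 0&0&0&\tfrac{1}{2}&0&0 \end{bmatrix} \]

Arg (R), position 5:
\[ \mathbf{M}^{0}(G_m, R) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, R) = \begin{bmatrix} 0&0&0&0&\tfrac{1}{2}&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&\tfrac{1}{2}&0 \\ \tfrac{1}{2}&0&0&\tfrac{1}{2}&0&\tfrac{1}{2} \\ 0&0&0&0&\tfrac{1}{2}&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, R) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&\tfrac{1}{2}&0 \\ 0&1&0&\tfrac{1}{2}&3&\tfrac{1}{2} \\ 0&0&0&0&\tfrac{1}{2}&0 \end{bmatrix} \]

Asn (N), position 6:
\[ \mathbf{M}^{0}(G_m, N) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&1 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, N) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&\tfrac{1}{2} \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&\tfrac{1}{2} \\ 0&0&0&0&0&\tfrac{1}{2} \\ 0&\tfrac{1}{2}&0&\tfrac{1}{2}&\tfrac{1}{2}&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, N) = \begin{bmatrix} 0&0&0&0&0&1 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&\tfrac{1}{2} \\ 0&0&0&0&0&\tfrac{1}{2} \\ 0&0&0&0&0&\tfrac{1}{2} \\ 1&0&\tfrac{1}{2}&\tfrac{1}{2}&\tfrac{1}{2}&3 \end{bmatrix} \]

The zero, first and second powers of the local (amino acid) stochastic matrices

Ser (S), position 1:
\[ \mathbf{M}^{0}(G_m, S) = \begin{bmatrix} 1&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, S) = \begin{bmatrix} 0&\tfrac{1}{4}&0&0&\tfrac{1}{4}&0 \\ \tfrac{1}{6}&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ \tfrac{1}{6}&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, S) = \begin{bmatrix} \tfrac{1}{3}&0&\tfrac{1}{12}&\tfrac{1}{12}&0&\tfrac{1}{6} \\ 0&0&0&0&0&0 \\ \tfrac{1}{10}&0&0&0&0&0 \\ \tfrac{1}{12}&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ \tfrac{1}{8}&0&0&0&0&0 \end{bmatrix} \]

Lys (K), position 2:
\[ \mathbf{M}^{0}(G_m, K) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&1&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, K) = \begin{bmatrix} 0&\tfrac{1}{4}&0&0&0&0 \\ \tfrac{1}{6}&0&\tfrac{1}{6}&0&0&\tfrac{1}{6} \\ 0&\tfrac{1}{4}&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&\tfrac{1}{6}&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, K) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&\tfrac{3}{7}&\tfrac{1}{14}&\tfrac{1}{14}&\tfrac{1}{7}&0 \\ 0&\tfrac{1}{10}&0&0&0&0 \\ 0&\tfrac{1}{12}&0&0&0&0 \\ 0&\tfrac{1}{7}&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]

Glu (E), position 3:
\[ \mathbf{M}^{0}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&1&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&\tfrac{1}{6}&0&0&0 \\ 0&\tfrac{1}{4}&\tfrac{1}{2}&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, E) = \begin{bmatrix} 0&0&\tfrac{1}{12}&0&0&0 \\ 0&0&\tfrac{1}{14}&0&0&0 \\ \tfrac{1}{10}&\tfrac{1}{10}&\tfrac{2}{5}&0&0&\tfrac{1}{10} \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&\tfrac{1}{16}&0&0&0 \end{bmatrix} \]

acid type in the protein.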
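The totals quoted above (17.77 and 14.57) and the local matrices of Table 5 can be checked numerically. The following is a minimal Python sketch, not TOMOCOMD-CAMPS code. The adjacency matrix `A` is an assumption, inferred by adding up the first-power local nonstochastic matrices of Table 5; `X` and `Y` are the two property vectors that appear in the worked equation; all function names are illustrative. The sketch assumes that the kth stochastic matrix is the row-normalized kth power of `A`, and that a local matrix keeps an entry in full when both indices belong to the residue, halves it when only one does, and zeroes it otherwise.

```python
# Toy system from the text: six residues S, K, E, E, R, N.
# ASSUMPTION: adjacency inferred from the first-power matrices of Table 5
# (note the self-loop on the Glu at position 3).
A = [
    [0, 1, 0, 0, 1, 0],  # Ser
    [1, 0, 1, 0, 0, 1],  # Lys
    [0, 1, 1, 0, 0, 0],  # Glu (position 3)
    [0, 0, 0, 0, 1, 1],  # Glu (position 4)
    [1, 0, 0, 1, 0, 1],  # Arg
    [0, 1, 0, 1, 1, 0],  # Asn
]
X = [1.96, 2.84, 3.08, 3.08, 2.88, 3.22]   # first amino acid property vector
Y = [-1.63, 1.41, 0.39, 0.39, 2.52, 1.45]  # second amino acid property vector


def mat_power(M, k):
    """Return M**k by repeated multiplication (k = 0 gives the identity)."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = [[sum(result[i][t] * M[t][j] for t in range(n))
                   for j in range(n)] for i in range(n)]
    return result


def row_normalize(M):
    """Divide every row by its sum, giving a row-stochastic matrix."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in M]


def local_matrix(M, r):
    """Local matrix of residue r: full weight if i == j == r,
    half weight if exactly one index equals r, zero otherwise."""
    n = len(M)
    return [[M[i][j] * ((i == r) + (j == r)) / 2 for j in range(n)]
            for i in range(n)]


def bilinear(M, x, y):
    """Bilinear form x^T M y."""
    n = len(M)
    return sum(M[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def index(k, stochastic=False, residue=None):
    """kth total (residue=None) or local bilinear index of the toy system."""
    M = mat_power(A, k)
    if stochastic:
        M = row_normalize(M)
    if residue is not None:
        M = local_matrix(M, residue)
    return bilinear(M, X, Y)


if __name__ == "__main__":
    for k in range(3):
        print(k, round(index(k), 4), round(index(k, stochastic=True), 4))
```

Under these assumptions, `index(1, stochastic=True)` evaluates to about 17.77 and `index(2, stochastic=True)` to about 14.57, and for each k the six local values add up to the total, as Eqn (18) states.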
In the amino acid-type bilinear indices formalism, each amino acid in the molecule is classified into an amino acid type (fragment), such as apolar, polar uncharged, polar charged, positively charged, negatively charged, aromatic, and so on. For all data sets, including those with a common molecular scaffold as well as those with very diverse structures, the kth amino acid-type bilinear indices provide important information.

The calculation of the first three values of the local (amino acid) bilinear indices for the example protein (Tables 2 and 3) is shown in Table 6. Any local protein bilinear index has a particular meaning, especially for the first values of k, where the information about the structure of the fragment $F_R$ is contained. Higher values of k relate to the environment of the fragment $F_R$ within the macromolecular pseudograph. In any case, a complete series of indices performs a specific characterization of the chemical structure. The generalization of the matrices and descriptors to 'superior analogues' is necessary for the evaluation of situations where a single descriptor does not allow a good structural characterization [64,66].

The local macromolecular indices can also be used together with the total ones as variables for quantitative structure–activity relationship (QSAR) / quantitative structure–property relationship (QSPR) modelling of properties or activities that depend more on a region or a fragment than on the macromolecule as a whole.

Table 5. (Continued).

Glu (E), position 4:
\[ \mathbf{M}^{0}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, E) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&\tfrac{1}{4}&\tfrac{1}{4} \\ 0&0&0&\tfrac{1}{6}&0&0 \\ 0&0&0&\tfrac{1}{6}&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, E) = \begin{bmatrix} 0&0&0&\tfrac{1}{12}&0&0 \\ 0&0&0&\tfrac{1}{14}&0&0 \\ 0&0&0&0&0&0 \\ \tfrac{1}{12}&\tfrac{1}{12}&0&\tfrac{1}{3}&\tfrac{1}{12}&\tfrac{1}{12} \\ 0&0&0&\tfrac{1}{14}&0&0 \\ 0&0&0&\tfrac{1}{16}&0&0 \end{bmatrix} \]

Arg (R), position 5:
\[ \mathbf{M}^{0}(G_m, R) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&0 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, R) = \begin{bmatrix} 0&0&0&0&\tfrac{1}{4}&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&\tfrac{1}{4}&0 \\ \tfrac{1}{6}&0&0&\tfrac{1}{6}&0&\tfrac{1}{6} \\ 0&0&0&0&\tfrac{1}{6}&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, R) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&\tfrac{1}{7}&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&\tfrac{1}{12}&0 \\ 0&\tfrac{1}{7}&0&\tfrac{1}{14}&\tfrac{3}{7}&\tfrac{1}{14} \\ 0&0&0&0&\tfrac{1}{16}&0 \end{bmatrix} \]

Asn (N), position 6:
\[ \mathbf{M}^{0}(G_m, N) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&1 \end{bmatrix} \]
\[ \mathbf{M}^{1}(G_m, N) = \begin{bmatrix} 0&0&0&0&0&0 \\ 0&0&0&0&0&\tfrac{1}{6} \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&\tfrac{1}{4} \\ 0&0&0&0&0&\tfrac{1}{6} \\ 0&\tfrac{1}{6}&0&\tfrac{1}{6}&\tfrac{1}{6}&0 \end{bmatrix} \]
\[ \mathbf{M}^{2}(G_m, N) = \begin{bmatrix} 0&0&0&0&0&\tfrac{1}{6} \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&\tfrac{1}{10} \\ 0&0&0&0&0&\tfrac{1}{12} \\ 0&0&0&0&0&\tfrac{1}{14} \\ \tfrac{1}{8}&0&\tfrac{1}{16}&\tfrac{1}{16}&\tfrac{1}{16}&\tfrac{3}{8} \end{bmatrix} \]

Table 6. Values of amino acid-based (local) bilinear indices for the heterodimer SKEERN.

Local nonstochastic bilinear indices $b_{kL}(\bar{x}_m, \bar{y}_m)$
Amino acid              k = 0       k = 1       k = 2
Ser (S)                 -3.1948     -0.8104     -13.0522
Lys (K)                  4.0044      6.1215      28.6812
Glu (E)                  1.2012      3.9264       5.8605
Glu (E)                  1.2012      7.3033      10.3029
Arg (R)                  7.2576     10.71        43.578
Asn (N)                  4.669      13.3352      23.4674
Heterodimer (SKEERN)    15.1386     40.586       98.8378

Local stochastic bilinear indices ${}^{s}b_{kL}(\bar{x}_m, \bar{y}_m)$
Amino acid              k = 0       k = 1          k = 2
Ser (S)                 -3.1948      0.37176667    -2.04034833
Lys (K)                  4.0044      2.6327         4.27309429
Glu (E)                  1.2012      1.8709         1.08062179
Glu (E)                  1.2012      3.4534         1.66443036
Arg (R)                  7.2576      4.6284         6.24537857
Asn (N)                  4.669       4.81723333     3.34964405
Heterodimer (SKEERN)    15.1386     17.7744        14.5728207

Data preparation

Computation of protein bilinear indices

The calculation of total and local macromolecular bilinear indices for any peptide or protein was
implemented in the TOMOCOMD-CAMPS software [67]. The main steps for the application of this method in QSAR/QSPR can be briefly summarized as follows:
(1) Draw the macromolecular pseudographs for each protein of the data set, using the software's drawing mode. This is done by selecting the active amino acid symbol from the 'natural' amino acid code. Here, we consider covalent (peptide bond) and noncovalent [hydrogen bond and other electrostatic interactions (within a chain as well as between chains)] interactions. Afterwards, we draw the mutants by changing an amino acid to alanine, considering that this change only affects the ability of this region of the protein to form a polar interaction (because we suppress the hydrogen bond interaction if the former amino acid had one).
(2) Use appropriate amino acid weights to differentiate the side-chain of each amino acid. In the present study, we used the following descriptors of the natural amino acids as amino acid properties: the three z-values [48], Kyte–Doolittle's hydrophobicity scale [50], ISA and ECI [49].
(3) Compute the nonstochastic and stochastic protein bilinear indices. This can be done in the software's calculation mode, where the side-chain properties and the descriptor family can be selected before calculating the bio-macromolecular indices. The software generates a table in which the rows and columns correspond to the compounds and the $b_{mk}(\bar{x}_m, \bar{y}_m)$, respectively.
(4) Find a QSPR/QSAR equation by using statistical techniques, such as multilinear regression analysis, neural networks, linear discriminant analysis (LDA), and so on.
That is to say, we can find a quantitative relationship between a property P and the $b_{mk}(\bar{x}_m, \bar{y}_m)$ having, for example, the form:

\[
P = a_0\, b_{m0}(\bar{x}_m, \bar{y}_m) + a_1\, b_{m1}(\bar{x}_m, \bar{y}_m) + a_2\, b_{m2}(\bar{x}_m, \bar{y}_m) + \cdots + a_k\, b_{mk}(\bar{x}_m, \bar{y}_m) + c \tag{20}
\]

where P is the measured value of the property, $b_{mk}(\bar{x}_m, \bar{y}_m)$ [or $b_{mkL}(\bar{x}_m, \bar{y}_m)$] are the kth total [or local] macromolecular nonstochastic bilinear indices, and the $a_k$ are the coefficients obtained by the statistical analysis.
(5) Test the robustness and predictive power of the QSPR/QSAR equation by using internal and external cross-validation techniques.
(6) Develop a structural interpretation of the obtained QSAR/QSPR model using macromolecular bilinear indices as molecular descriptors.

Database

Arc is a homodimer in which each monomer intertwines with the other to form a single, globular domain with a well-defined core. Several side-chain hydrogen bond and salt-bridge interactions are involved in the Arc crystal structure. An exhaustive description of these interactions is provided elsewhere [32]. Nevertheless, an overview of these electrostatic interactions in the Arc repressor structure is given here.

Hydrogen bond interactions take place [32]:
(1) Between side-chains in the same subunit (N29-E36) and between side-chains in different subunits (R40-S44).
(2) Between a side-chain and a main-chain atom in different subunits (W14-N34, N34-R13) and between a side-chain and a main-chain atom in the same subunit (E17-E17, S32-S35, S44-R40).

On the other hand, salt-bridge interactions take place [32]:
(3) Between side-chains in the same subunit (R16-D20, D20-R23, R31-E36, E36-R40, E43-K46, E43-K47) and between side-chains in different subunits (E28-R50, R40-E48).

The data for the Arc repressor mutants were taken from the literature. In the present study, alanine substitutions were constructed at each of the 51 non-alanine positions in the wild-type Arc sequence.
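The way a mutant is drawn in step (1), replacing a residue with alanine and suppressing its polar interactions, can be illustrated with the side-chain/side-chain contacts listed above. This is a simplified Python sketch under stated assumptions, not the TOMOCOMD-CAMPS procedure itself: the contact list contains only the side-chain/side-chain hydrogen bonds and salt bridges quoted from [32], the distinction between same-subunit and different-subunit contacts is ignored, side-chain/main-chain contacts are left out, and the function name `alanine_mutant` and the label format are hypothetical.

```python
# Side-chain/side-chain contacts quoted in the Database section.
# Each entry is (residue_a, residue_b, kind); labels are "<letter><position>".
CONTACTS = [
    ("N29", "E36", "hbond"),
    ("R40", "S44", "hbond"),
    ("R16", "D20", "salt_bridge"),
    ("D20", "R23", "salt_bridge"),
    ("R31", "E36", "salt_bridge"),
    ("E36", "R40", "salt_bridge"),
    ("E43", "K46", "salt_bridge"),
    ("E43", "K47", "salt_bridge"),
    ("E28", "R50", "salt_bridge"),
    ("R40", "E48", "salt_bridge"),
]


def alanine_mutant(residue, contacts=CONTACTS):
    """Return (mutant_name, kept, dropped) for an X -> Ala substitution.

    ASSUMPTION: every listed contact that involves the substituted
    side-chain is suppressed; all other contacts are kept unchanged.
    """
    letter, position = residue[0], residue[1:]
    if letter == "A":
        raise ValueError("position is already alanine")
    dropped = [c for c in contacts if residue in c[:2]]
    kept = [c for c in contacts if residue not in c[:2]]
    # Mutant names follow the pattern used in the tables, e.g. "PA8", "SA35".
    return f"{letter}A{position}", kept, dropped


if __name__ == "__main__":
    name, kept, dropped = alanine_mutant("E36")
    print(name, "loses", [(a, b) for a, b, _ in dropped])
```

In this sketch, substituting E36 removes its hydrogen bond to N29 and its salt bridges to R31 and R40, so the edges corresponding to those contacts would be deleted from the pseudograph before the bilinear indices are recomputed.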
To avoid intracellular proteolysis and purification difficulties, the alanine substitution mutants were constructed in backgrounds containing the carboxy-terminal extensions (His)6 (designated st6) or (His)6-Lys-Asn-Gln-His-Glu (designated st11) [68,69]. These tail sequences allow affinity purification, reduce degradation and cause no significant changes in protein stability [70]. Milla et al. [32] subjected each purified mutant of Arc to thermal and urea denaturation experiments. The stability of the proteins was assessed by the melting temperature ($t_m$). The values of $t_m$ for 53 Arc homodimers reported by these authors are given in Tables 7 and 8.

In equilibrium and kinetic unfolding–refolding studies, only native Arc dimers and denatured monomers are significantly populated. Thus, folding and dimerization are concerted processes [32,71,72]. For this reason, it is important to note that $t_m$ refers to the unfolding of the Arc homodimer. Accordingly, one must take into consideration that each single mutation changes two side-chains in the Arc dimer, with stability effects being approximately twice those observed for monomeric proteins. Moreover, changes in stability may arise when a mutation disrupts a native interaction, when the native structure of [...]

... development of linear discriminant functions, which permit the classification of mutants as having near wild-type stability or reduced stability, and therefore describe the protein stability effects of a complete set of alanine substitutions in the Arc repressor. Here, we consider a general data set that consists of 53 A-mutants, with 28 of them having near wild-type stability (1–28) and the remainder being ...
linear combinations of nonstochastic [Eqn (27)] and stochastic [Eqn (28)] protein bilinear descriptors account for 83% of the variance of the $t_m$ for the cases in the training series; the values of F-ratio for Eqns (27) ...

Table 10. Results of the stochastic bilinear indices-driven LDA models of the Arc A-mutants in the training and test sets. Mutants with near wild-type stability. Columns: Mutant, ΔP%. Rows: 1, PA8-st6; 2, SA35-st6 ...

... 0.96 and 1.03 for Eqns (31) and (32)]. In Tables 11 and 12, we depict the observed, calculated [by using Eqns (29) to (32)] and residual values of $t_m$ for cases in both training and test sets. Different protein folding may be the reason for the lack of linear correlation between protein bilinear indices and stability ($t_m$) for these mutants, leading to a nonlinear dependence between $t_m$ and the protein bilinear ...

... information about the electrostatic interactions among amino acids appears to be necessary. Here, we analyze the relevance of including this type of information for obtaining descriptors that encode relevant structural information correlating with the stability changes of the Arc mutants. Accordingly, we compared the accuracies of classification models based on nonstochastic protein bilinear indices ...

... analysis. This data set was randomly divided into two subsets: one containing 39 mutants, which was used as a training set, and the other containing nine mutants (five having near wild-type stability and four having reduced stability), which was used as a test set. Combining nonstochastic and stochastic total protein bilinear indices with MLR analysis, we developed the QSSR linear models to describe $t_m$ for ...
... derivation is straightforward, and it is easy to interpret the QSARs/QSPRs that include them. We have shown that the use of protein total bilinear indices can account for the thermodynamic parameters for both wild-type and mutant Arc proteins. The resulting quantitative models are significant from a statistical point of view.

Concluding Remarks

In the present study, a new set of bio-macromolecular descriptors ...

... the data set and the test set (full set), the accuracy was 98.11% (52/53) and 96.23% (51/53) for Eqns (25) and (26), respectively, by using nonstochastic and stochastic bilinear indices, in that order. These statistical parameters suggest that linear combinations of protein bilinear indices are appropriate for the discrimination of the near wild-type stability/reduced stability mutants studied here. Equations ...

... dimer. These results suggest that Arc folding is a rather complicated process that depends on various processes, and that combinations of parameters (bilinear indices calculated with each pair of amino acid properties) are necessary to describe adequately the $t_m$ of these Arc mutants [Eqns (25) and (26)]. From a comparison of the accuracies of classification models based on nonstochastic protein bilinear indices ...

... importance of protein structural information for the numerical characterization of Arc mutants and its relationship with stability changes. It is well known that salt-bridges and hydrogen bonds play an important role in maintaining the 3D structure of proteins [87]. Therefore, to obtain a useful numerical characterization of proteins for the study of their properties (stability, folding, etc.), the use of information ...
... based on the kind of method used for deriving the QSPR and their statistical parameters, the explored molecular descriptors, the overall accuracy (%), the Matthews correlation coefficient and the validation method used. Table 15 shows a comparison between nonstochastic and stochastic protein bilinear indices-based classification methods and other reported approaches for predicting the stability of the Arc repressor ...

... macromolecular fragments:

\[
b_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{L=1}^{Z} b_{mkL}(\bar{x}_m, \bar{y}_m) \tag{18}
\]

\[
{}^{s}b_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{L=1}^{Z} {}^{s}b_{mkL}(\bar{x}_m, \bar{y}_m) \tag{19}
\]

In addition, the amino acid-type bilinear indices can also be calculated. Amino acid and amino acid-type bilinear indices are specific cases of local protein bilinear indices. In this sense, ...

... the kth amino acid-type bilinear indices are calculated by summing the kth amino acid bilinear indices of all amino acids of the same amino ...

Table 4. Values of nonstochastic and stochastic total bilinear ...

Date posted: 29/03/2014, 09:20
