TOMOCOMD-CAMPS and protein bilinear indices – novel
bio-macromolecular descriptors for protein research:
I. Predicting protein stability effects of a complete set of
alanine substitutions in the Arc repressor
Sadiel E. Ortega-Broche¹, Yovani Marrero-Ponce¹,²,³, Yunaimy E. Díaz¹, Francisco Torrens² and Facundo Pérez-Giménez³
1 Unit of Computer-Aided Molecular ‘Biosilico’ Discovery and Bioinformatics Research (CAMD-BIR Unit), Faculty of Chemistry–Pharmacy, Central University of Las Villas, Santa Clara, Villa Clara, Cuba
2 Institut Universitari de Ciència Molecular, Universitat de València, Edifici d’Instituts de Paterna, Spain
3 Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València, Spain
Keywords
arc repressor; bilinear indices; linear
discriminant analysis; linear multiple
regression; protein stability
Correspondence
Y. Marrero-Ponce, Unit of Computer-Aided
Molecular ‘Biosilico’ Discovery and
Bioinformatics Research (CAMD-BIR Unit),
Faculty of Chemistry–Pharmacy, Central
University of Las Villas, Santa Clara, 54830,
Villa Clara, Cuba
Fax: +53 42 281130; +53 42 281455;
+34 96354 3156
Tel: +53 42 281192; +53 42 281473;
+34 96354 3156
E-mail: ymarrero77@yahoo.es;
ymponce@gmail.com;
yovanimp@uclv.edu.cu
Website: http://www.uv.es/yoma/
(Received 3 March 2009, revised 15 April
2010, accepted 14 May 2010)
doi:10.1111/j.1742-4658.2010.07711.x
Descriptors calculated from a specific representation scheme encode only
one part of the chemical information. For this reason, there is a need to
construct novel graphical representations of proteins and novel protein
descriptors that can provide new information about the structure of
proteins. Here, a new set of protein descriptors based on computation of
bilinear maps is presented. This novel approach to biomacromolecular
design is relevant for QSPR studies on proteins. Protein bilinear indices are
calculated from the kth power of nonstochastic and stochastic graph–theoretic electronic-contact matrices, $M_m^k$ and ${}^sM_m^k$, respectively. That is to say, the kth nonstochastic and stochastic protein bilinear indices are calculated using $M_m^k$ and ${}^sM_m^k$ as matrix operators of bilinear transformations.
Moreover, biochemical information is codified by using different pair combi-
nations of amino acid properties as weightings. Classification models based
on a protein bilinear descriptor that discriminate between Arc mutants of stability similar or inferior to the wild-type form were developed. These equations permitted the correct classification of more than 90% of the mutants in both the training and test sets. To predict $t_m$ and $\Delta\Delta G_f^{o}$ values for Arc mutants, multiple linear regression and piecewise linear regression models were developed. The multiple linear regression models obtained accounted for 83% of the variance of the experimental $t_m$. Statistics calculated from internal and external validation procedures demonstrated robustness, stability and suitable predictive power for all models. The results achieved demonstrate the ability of protein bilinear indices to encode biochemical information related to those structural changes that significantly influence the stability of the Arc repressor when point mutations are induced.
Abbreviations
BOOT, bootstrapping; ECI, electronic charge index; HPI, hydropathy index; ISA, isotropic surface area; LDA, linear discriminant analysis;
LOO, leave-one-out; MCC, Matthews correlation coefficient; QSAR, quantitative structure–activity relationship; QSPR, quantitative
structure–property relationship; SDEC, standard error in calculation.
3118 FEBS Journal 277 (2010) 3118–3146 © 2010 The Authors Journal compilation © 2010 FEBS
Introduction
The advent of automated sequencing techniques
and the fast growing number of DNA and protein
sequences available from diverse organisms have motivated
the development of graphical representations of
biopolymers as a method for the analysis and comparison
of sequences [1]. Initially, this approach was
applied in the inspection and visual analysis of nucleic
acid sequences [2,3]. Subsequently, its usefulness for
the numerical characterization of the similarity ⁄ dissim-
ilarity degree among nucleotide sequences was demon-
strated, and it then became an alternative to the
alignment-based comparison methods [4].
The numerical characterizations of the biopolymer
structure are also known as biomacromolecular de-
scriptors. Combined with machine-learning techniques,
they have proved to be effective in the prediction of
physical–chemical and biological features [5–12], the
interpretation of properties in structural terms, and the
study of similarity⁄ dissimilarity among biomolecules
[13–17], amongst others.
A general strategy adopted in the design of biomac-
romolecular descriptors is the association of mathe-
matical objects with diverse graphical representations
of biopolymers [4]. One such strategy aims to represent
the biomacromolecular structure by means of a graph
and then calculates the invariants of the associated
matrices. For example, Randić and Basak used the
principal eigenvalues from matrices as invariants in an
analysis of the similarity degree among DNA
sequences [18]; Raychaudhury and Nandy considered
graph mean-moments as descriptors of polynucleotide
sequences [19]; Benedetti and Morosetti [16], Shu et al.
[20], Bermúdez et al. [15] and Galindo et al. [21] also
applied graph–theoretical invariants to numerically
describe the structure of RNA molecules for different
purposes.
When a mathematical invariant is calculated from a
specific representation scheme, only a partial character-
ization from the chemical structure can be achieved
because only a part of the chemical information can be
encoded [22]. This can be overcome either by develop-
ing diverse graphical representations, because each of
them captures different information from the biomo-
lecular structures, or by calculating several mathemati-
cal invariants from the same representation scheme
[22]. The construction of novel representation forms
for biomolecules and the design of new descriptors
that provide new information and better characteriza-
tion is therefore necessary [22].
Marrero-Ponce et al. [23–25] have recently applied
linear and quadratic forms on $\mathbb{R}^n$ to calculate
graph–theoretical invariants of organic compound structures.
These descriptors were successfully applied in the pre-
diction of physical–chemical properties and rational
drug design. Subsequently, the use of linear and
quadratic forms was extended to obtain numerical
characterizations of proteins and nucleic acids. Such
descriptors were effectively applied in the modelling of
the interaction between RNA and drugs [26,27] and
for predicting the stability of proteins [6,28]. Bilinear
forms have also been used in the definition of molecu-
lar descriptors [29], which have been applied appropri-
ately in molecular modelling [30].
The successful application of linear and quadratic
forms to obtain graph–theoretical invariants of the
biopolymer structure has encouraged us to explore
the use of bilinear forms on $\mathbb{R}^n$ as a logical–mathematical
procedure for designing novel protein descriptors.
More precisely, we used bilinear forms to
transform the chemical information encoded by a
graph-based representation of proteins, similar to that
proposed by Marrero-Ponce et al. [6,28]. To validate
the utility of these descriptors, we applied them in
combination with multivariate analysis methods to
predict the effects of a set of alanine substitutions on
the stability of the Arc repressor. Arc is a small,
homodimeric repressor of 53 amino acids encoded by
P22, a temperate bacteriophage of Salmonella
typhimurium [31]. This homodimer has been widely
studied by Milla et al. [32], who determined the con-
tribution of specific residues to stabilize the native
structure by means of alanine substitutions. The set
of Arc mutants obtained in these experiments was
used in subsequent studies to validate the usefulness
of diverse schemes for the numerical characterization
of proteins [5,28,33–35].
Numerical characterization of
polypeptide chains
Here, we describe the strategy that we propose to
numerically characterize the structure of peptides and
proteins by means of bilinear transformations of their
structural information. This information is encoded
through elements of the $\mathbb{R}^n$ vector space and
graph–theoretic representations of polypeptide chains.
Accordingly, a background in amino acid-based macromolecular
vectors and nonstochastic and stochastic
graph–theoretic electronic-contact matrices will be
described, followed by an outline of the mathematical
definition of bilinear maps as well as a definition of
our procedures.
S. E. Ortega-Broche et al. Predicting the stability of the Arc repressor
Macromolecular vectors for representing amino
acid sequences
In analogy to the molecular vector $\bar{x}$ used to represent
organic molecules [23,36–47], we introduce here the
macromolecular vector ($\bar{x}_m$). The components of this
vector are numeric values, each of which represents a certain
side-chain amino acid property. These properties characterize
each kind of amino acid (R group) within a
protein. Such properties can be z-values [48], the side-chain
isotropic surface area (ISA) and atomic charges
(electronic charge index; ECI) of the amino acid [49],
and the hydropathy index (Kyte–Doolittle scale; HPI)
[50], as well as other hydrophobicity scales such as
Hopp–Woods [51], and so on. For example, the $z_1(\mathrm{AA})$
scale of the amino acid, AA, takes the values
$z_1(\mathrm{V}) = -2.69$ for valine, $z_1(\mathrm{A}) = 0.07$ for alanine,
$z_1(\mathrm{M}) = -2.49$ for methionine, and so on [48,49].
Table 1 depicts several side-chain descriptors for the
natural amino acids [48–50].
Thus, a peptide (or protein) having 5, 10, 15, ..., n
amino acids can be represented by means of vectors
with 5, 10, 15, ..., n components, belonging to the
spaces $\mathbb{R}^5, \mathbb{R}^{10}, \mathbb{R}^{15}, \ldots, \mathbb{R}^n$, respectively, where n is the
dimension of the real set ($\mathbb{R}^n$).
This approach allows us to encode peptides such as
SKEERN through the macromolecular vector
$\bar{x}_m = [1.96 \;\; 2.84 \;\; 3.08 \;\; 3.08 \;\; 2.88 \;\; 3.22]$ in the $z_1$-scale
(Table 1). This vector belongs to the product space $\mathbb{R}^6$.
The use of other scales defines alternative macromolecular
vectors.
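The encoding step just described can be sketched in a few lines of Python. This is only an illustration: the function name `macromolecular_vector` and the dictionary `Z1` are hypothetical helpers introduced here, and the dictionary holds only the $z_1$ values of Table 1 that the SKEERN example needs.

```python
# z1 values copied from Table 1 (only the residues present in SKEERN).
Z1 = {"S": 1.96, "K": 2.84, "E": 3.08, "R": 2.88, "N": 3.22}


def macromolecular_vector(sequence, scale):
    """Return one side-chain property value per residue, in sequence order."""
    return [scale[residue] for residue in sequence]


x_m = macromolecular_vector("SKEERN", Z1)
print(x_m)  # [1.96, 2.84, 3.08, 3.08, 2.88, 3.22]
```

Passing a different property dictionary (for example one built from the $z_2$ column) would give an alternative macromolecular vector for the same sequence.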
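If we are interested in codifying the chemical
information by means of two different macromolecular
vectors, for example, $\bar{x}_m = [x_{m1}, \ldots, x_{mn}]$ and
$\bar{y}_m = [y_{m1}, \ldots, y_{mn}]$, then different combinations of
macromolecular vectors ($\bar{x}_m \neq \bar{y}_m$) are possible when a
weighting scheme is used. In the present study, we
characterized each amino acid with the biochemical
parameters shown in Table 1. From this weighting
scheme, fifteen (or thirty if $\bar{x}_{mw}$–$\bar{y}_{mz} \neq \bar{x}_{mz}$–$\bar{y}_{mw}$)
combinations (pairs) of macromolecular vectors
($\bar{x}_m$, $\bar{y}_m$; $\bar{x}_m \neq \bar{y}_m$) can be computed:
$\bar{x}_{mz_1}$–$\bar{y}_{mz_2}$, $\bar{x}_{mz_1}$–$\bar{y}_{mz_3}$, $\bar{x}_{mz_1}$–$\bar{y}_{m\mathrm{HPI}}$, $\bar{x}_{mz_1}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{mz_1}$–$\bar{y}_{m\mathrm{ECI}}$,
$\bar{x}_{mz_2}$–$\bar{y}_{mz_3}$, $\bar{x}_{mz_2}$–$\bar{y}_{m\mathrm{HPI}}$, $\bar{x}_{mz_2}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{mz_2}$–$\bar{y}_{m\mathrm{ECI}}$,
$\bar{x}_{mz_3}$–$\bar{y}_{m\mathrm{HPI}}$, $\bar{x}_{mz_3}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{mz_3}$–$\bar{y}_{m\mathrm{ECI}}$,
$\bar{x}_{m\mathrm{HPI}}$–$\bar{y}_{m\mathrm{ISA}}$, $\bar{x}_{m\mathrm{HPI}}$–$\bar{y}_{m\mathrm{ECI}}$ and $\bar{x}_{m\mathrm{ISA}}$–$\bar{y}_{m\mathrm{ECI}}$.
Here, we used the symbols $\bar{x}_{mw}$–$\bar{y}_{mz}$, where the subscripts w and z
represent two amino acid properties from our weighting
scheme and a dash (–) represents the combination
(pair) of two selected amino acid label biochemical
properties.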
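The count of fifteen unordered (or thirty ordered) property pairs follows directly from choosing two of the six properties in Table 1. A minimal sketch, using only the Python standard library and the property labels as plain strings (the list name `PROPERTIES` is introduced here for illustration):

```python
from itertools import combinations, permutations

# The six side-chain properties of Table 1.
PROPERTIES = ["z1", "z2", "z3", "HPI", "ISA", "ECI"]

# Unordered pairs: used when swapping the two weights gives the same index.
unordered_pairs = list(combinations(PROPERTIES, 2))
# Ordered pairs: needed when swapping the two weights changes the index.
ordered_pairs = list(permutations(PROPERTIES, 2))

print(len(unordered_pairs), len(ordered_pairs))  # 15 30
```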
To illustrate this, let us consider the same peptide
as in the example above (SKEERN) and the weighting
scheme $z_1$ and $z_2$ ($\bar{x}_{mz_1}$–$\bar{y}_{mz_2}$ = $\bar{x}_{mz_2}$–$\bar{y}_{mz_1}$).
The following macromolecular vectors,
$\bar{x}_m = [1.96 \;\; 2.84 \;\; 3.08 \;\; 3.08 \;\; 2.88 \;\; 3.22]$ and
$\bar{y}_m = [-1.63 \;\; 1.41 \;\; 0.39 \;\; 0.39 \;\; 2.52 \;\; 1.45]$, are obtained
when we use $z_1$ and $z_2$ as chemical weights for codifying
each amino acid in the example peptide in the $\bar{x}_m$ and
$\bar{y}_m$ vectors, respectively (Table 2).
Graph-theoretic representations of polypeptide
chains
In molecular topology, molecular structure is
expressed, generally, by the hydrogen-suppressed
graph. That is, a molecule is represented by a graph.
Informally, a graph G is a collection of vertices
(points) and edges (lines or bonds) connecting these
vertices [52–54]. In more formal terms, a simple graph
G is defined as an ordered pair [V(G), E(G )], which
consists of a nonempty set of vertices V(G) and a set
E(G) of unordered pairs of elements of V(G ), termed
edges [52–54]. In this particular case, we are not deal-
ing with a simple graph but with a so-called pseudo-
graph (G). Informally, a pseudograph is a graph with
multiple edges or loops between the same vertices or
the same vertex. Formally, a pseudograph is a set V of
vertices along with a set E of edges, and a function f from
E to {{u,v} | u,v in V} (the function f shows which pair
of vertices is connected by which edge). An edge is a
loop if f(e) = {u} for some vertex u in V [23,55,56].
Table 1. Descriptors for the natural amino acids. The z-scale values (z1, z2, z3) are taken from [48,49], HPI from [50], and ISA and ECI from [49].
Amino acid   z1      z2      z3      HPI     ISA      ECI
Ala  A       0.07   −1.73    0.09    1.8     62.90   0.05
Val  V      −2.69   −2.53   −1.29    4.2    120.91   0.07
Leu  L      −4.19   −1.03   −0.98    3.8    154.35   0.01
Ile  I      −4.44   −1.68   −1.03    4.5    149.77   0.09
Pro  P      −1.22    0.88    2.23   −1.6    122.35   0.16
Phe  F      −4.92    1.30    0.45    2.8    189.42   0.14
Trp  W      −4.75    3.65    0.85   −0.9    179.16   1.08
Met  M      −2.49   −0.27   −0.41    1.9    132.22   0.34
Lys  K       2.84    1.41   −3.14   −3.9    102.78   0.53
Arg  R       2.88    2.52   −3.44   −4.5     52.98   1.69
His  H       2.41    1.74    1.11   −3.2     87.38   0.56
Gly  G       2.23   −5.36    0.30   −0.4     19.93   0.02
Ser  S       1.96   −1.63    0.57   −0.8     19.75   0.56
Thr  T       0.92   −2.09   −1.40   −0.7     59.44   0.65
Cys  C       0.71   −0.97    4.13    2.5     78.51   0.15
Tyr  Y      −1.39    2.32    0.01   −1.3    132.16   0.72
Asn  N       3.22    1.45    0.84   −3.5     17.87   1.31
Gln  Q       2.18    0.53   −1.14   −3.5     19.53   1.36
Asp  D       3.64    1.13    2.36   −3.5     18.46   1.25
Glu  E       3.08    0.39   −0.07   −3.5     30.19   1.31
On the other hand, Anfinsen’s experiments with
small proteins demonstrated that the amino acid
sequence of a protein encodes the folding of its peptidic backbone.
However, at present, knowledge of the amino
acid sequence of a protein alone does not provide us with its
3D structure. The primary structure of proteins consists
of unbranched amino acid sequences, which are
linked by amide bonds between the α-carboxyl group
of one residue and the α-amino group of the next. The
3D distribution of all atoms in a protein is referred to
as the protein’s tertiary structure. Whereas the term
secondary structure refers to the spatial arrangement
of amino acid residues that are adjacent in the primary
structure, the tertiary structure includes longer-range
aspects of the amino acid sequence. Lastly, individual
polypeptidic chains in multi-subunit proteins are orga-
nized in 3D complexes reaching quaternary-structural
levels. As previously outlined, essential information for
protein folding is contained in the amino acid sequence
and, more specifically, in the amino acid side-chains of
the polypeptidic chain.
Taking the above statement into account, in the
present study, we apply a graph–theoretic model, as
developed and applied previously by Marrero-Ponce
et al. [33], to represent the molecular structure of proteins.
This is called a macromolecular graph. Here, the
graph vertices are the Cα atoms in the polypeptide backbone
and the edges are both covalent interactions between
amino acids (peptidic bonds) and noncovalent interactions
between amino acid side-chains in the same or
in different subunits. Noncovalent interactions can also
occur between an amino acid side-chain and its main-chain,
in which case this amino acid represents a pseudovertex
in the macromolecular pseudograph. These interactions
can be considered as contacts, which can exist among
amino acids that are near (or far) in the polypeptide
backbone (i.e. the contacts can be subdivided into short-,
medium- and long-range contacts). Table 2 shows how to
depict two interacting polypeptide chains by means of a
macromolecular pseudograph; a pseudograph is needed because the heterodimer
(SKEERN) contains an amino acid with a hydrogen
bond between its side-chain and its main-chain atom.
The kth nonstochastic graph–theoretic electronic-contact
matrix, $M_m^k$, is a square and symmetric n × n
matrix, where n is the number of amino acids in the
protein [6,28]. The coefficients ${}^k m_{ij}$ are the elements of
the kth power of $M_m$ and are defined as:

$$m_{ij} = \begin{cases} 1 & \text{if } i \neq j \text{ and } \exists\, e_k \in E(G_m) \\ 1 & \text{if } i = j \text{ and the amino acid } i \text{ has a hydrogen bond between its side-chain and its main-chain atom} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where $E(G_m)$ represents the set of edges of $G_m$.
The matrix $M_m^k$ provides the number of walks of
length k that link every pair of vertices $v_i$ and $v_j$. For
this reason, each edge in $M_m^1$ represents a peptidic
bond (covalent bond) or a hydrogen bond as well as a
salt-bridge interaction (noncovalent bond) between
amino acids i and j.
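A minimal sketch of Eqn (1) and of the walk-counting powers is given below, in plain Python. The helper names (`contact_matrix`, `matmul`, `matrix_power`) are hypothetical, and the edge list is an assumption read off the k = 1 matrix of Table 3 for the SKEERN example, with vertex 3 carrying the loop; it is not taken from any software of the authors.

```python
def contact_matrix(n, edges, loops=()):
    """Build the n x n nonstochastic contact matrix from 1-based vertex labels."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i - 1][j - 1] = 1
        m[j - 1][i - 1] = 1  # undirected edge, so the matrix is symmetric
    for i in loops:
        m[i - 1][i - 1] = 1  # side-chain/main-chain hydrogen bond on residue i
    return m


def matmul(a, b):
    """Plain matrix product of two square matrices of equal order."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(m, k):
    """Return m**k for k >= 0, starting from the identity matrix (k = 0)."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, m)
    return result


# Assumed edge list: taken from the k = 1 matrix of Table 3 (SKEERN example).
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (1, 5), (2, 6), (4, 6)]
M1 = contact_matrix(6, EDGES, loops=[3])
M2 = matrix_power(M1, 2)  # entry (i, j) counts the walks of length 2
for row in M2:
    print(row)
```

With this edge list, the second power reproduces the k = 2 nonstochastic matrix listed in Table 3.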
On the other hand, the kth stochastic graph–theoretic
electronic-contact matrix of $G_m$, ${}^sM_m^k$, can be
Table 2. Representation of two interacting polypeptide chains and its associated pseudograph and macromolecular vector.
[Figure: structure diagram of the two interacting chains (chain 1 and chain 2) formed by the residues Ser1, Lys2, Glu3, Glu4, Arg5 and Asn6, with their NH2 and COOH termini, together with the corresponding pseudograph of Cα vertices numbered 1–6.]
Macromolecular ‘pseudograph’ ($G_m$) of the α-carbon atoms (polypeptide’s backbone):
Here, we consider both the covalent interaction (peptidic bond between amino acids, shown with a solid line) and the noncovalent interaction (salt-bridge and hydrogen bond, shown with a dashed line) between amino acid side-chains (R-groups) in the same polypeptidic chain or in different chains. The loop in the third position (Glu3) indicates a hydrogen bond between an amino acid main chain and its side-chain.
Macromolecular vector: $\bar{x}_m = [\mathrm{S\;K\;E\;E\;R\;N}] \in \mathbb{R}^6$
In the definition of $\bar{x}_m$ as a macromolecular vector, the one-letter symbol of each amino acid indicates the corresponding side-chain amino acid property, e.g. $z_1$-values. That is to say, if we write S, it means $z_1(\mathrm{S})$, or some other amino acid property that characterizes each side chain in the polypeptide. Therefore, if we use the canonical basis of $\mathbb{R}^6$, the coordinates of any vector $\bar{x}_m$ coincide with the components of that macromolecular vector.
$[X_m]^T = [\mathrm{S\;K\;E\;E\;R\;N}]$
$[X_m]^T$: transpose of $[X_m]$, i.e. the vector of the coordinates of $\bar{x}_m$ in the canonical basis of $\mathbb{R}^6$ (a 1 × 6 matrix)
$[X_m]$: vector of coordinates of $\bar{x}_m$ in the canonical basis of $\mathbb{R}^6$ (a 6 × 1 matrix)
The components of $\bar{x}_m$ and $\bar{y}_m$ are $z_1$ and $z_2$-values, respectively.
$\bar{x}_m = [1.96 \;\; 2.84 \;\; 3.08 \;\; 3.08 \;\; 2.88 \;\; 3.22]$
$\bar{y}_m = [-1.63 \;\; 1.41 \;\; 0.39 \;\; 0.39 \;\; 2.52 \;\; 1.45]$
directly obtained from $M_m^k$. Here, ${}^sM_m^k = [{}^k sm_{ij}]$ is a
square matrix of order n (n = number of Cα atoms)
and the elements ${}^k sm_{ij}$ are defined as:

$${}^k sm_{ij} = \frac{{}^k m_{ij}}{{}^k SUM_i} = \frac{{}^k m_{ij}}{{}^k \delta_i} \qquad (2)$$

where ${}^k m_{ij}$ are the elements of the kth power of $M_m$
and the sum of the ith row of $M_m^k$ is named the k-order
vertex degree of Cα atom i, ${}^k\delta_i$. It should be noted that
the matrix ${}^sM_m^k$ in Eqn (2) has the property that the
sum of the elements in each row is 1. An n × n matrix
with nonnegative entries having this property is called
a ‘stochastic matrix’ [57]. Table 3 shows the zero, first
and second powers of the total nonstochastic and stochastic
graph–theoretic electronic-contact matrices of the
macromolecular pseudograph depicted in Table 2.
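The row normalization of Eqn (2) can be sketched with exact fractions, so that the entries can be compared directly with Table 3. The function name `stochastic_matrix` is a hypothetical helper, the hard-coded matrix is the k = 1 nonstochastic matrix of Table 3, and leaving an all-zero row unchanged is an assumption made here (the source does not discuss isolated vertices).

```python
from fractions import Fraction

# k = 1 nonstochastic matrix of the SKEERN example (Table 3).
M1 = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
]


def stochastic_matrix(m):
    """Divide every row by its sum, i.e. by the k-order vertex degree."""
    rows = []
    for row in m:
        degree = sum(row)
        if degree == 0:
            # Assumption: a row with no contacts is kept as zeros.
            rows.append([Fraction(0) for _ in row])
        else:
            rows.append([Fraction(value, degree) for value in row])
    return rows


SM1 = stochastic_matrix(M1)
print([str(value) for value in SM1[0]])  # ['0', '1/2', '0', '0', '1/2', '0']
print(all(sum(row) == 1 for row in SM1))  # True
```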
Mathematical bilinear forms: a theoretical
framework
In mathematics, a bilinear form in a real vector space
is a mapping $b: V \times V \to \mathbb{R}$, which is linear in both
arguments [58–63]. That is, this function satisfies the
following axioms for any scalar $\alpha$ and any choice of
vectors $\bar{v}$, $\bar{w}$, $\bar{v}_1$, $\bar{v}_2$, $\bar{w}_1$ and $\bar{w}_2$:
(1) $b(\alpha\bar{v}, \bar{w}) = b(\bar{v}, \alpha\bar{w}) = \alpha\, b(\bar{v}, \bar{w})$
(2) $b(\bar{v}_1 + \bar{v}_2, \bar{w}) = b(\bar{v}_1, \bar{w}) + b(\bar{v}_2, \bar{w})$
(3) $b(\bar{v}, \bar{w}_1 + \bar{w}_2) = b(\bar{v}, \bar{w}_1) + b(\bar{v}, \bar{w}_2)$
That is, b is bilinear if it is linear in each parameter,
taken separately.
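Let V be a real vector space in $\mathbb{R}^n$ ($V \in \mathbb{R}^n$) and consider
that the following vector set, $\{\bar{e}_1, \bar{e}_2, \ldots, \bar{e}_n\}$, is a
basis set of $\mathbb{R}^n$. This basis set permits us to write in
unambiguous form any vectors $\bar{x}$ and $\bar{y}$ of V, where
$(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and $(y_1, y_2, \ldots, y_n) \in \mathbb{R}^n$ are the
coordinates of the vectors $\bar{x}$ and $\bar{y}$, respectively. That is
to say:

$$\bar{x} = \sum_{i=1}^{n} x_i \bar{e}_i \qquad (3)$$

and

$$\bar{y} = \sum_{j=1}^{n} y_j \bar{e}_j \qquad (4)$$

Subsequently, with summation over the repeated indices i and j implied,

$$b(\bar{x}, \bar{y}) = b(x_i \bar{e}_i, y_j \bar{e}_j) = x_i y_j\, b(\bar{e}_i, \bar{e}_j) \qquad (5)$$

if we take the $a_{ij}$ as the n × n scalars $b(\bar{e}_i, \bar{e}_j)$. That is:

$$a_{ij} = b(\bar{e}_i, \bar{e}_j), \quad \text{for } i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, n \qquad (6)$$

Then:

$$b(\bar{x}, \bar{y}) = \sum_{i,j}^{n} a_{ij} x_i y_j = [X]^T A [Y] = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \qquad (7)$$

As can be seen, the defined equation for b may be
written as the single matrix equation [see Eqn (7)],
where [Y] is a column vector (an n × 1 matrix) of the
coordinates of $\bar{y}$ in a basis set of $\mathbb{R}^n$, and $[X]^T$ (a 1 × n
matrix) is the transpose of [X], where [X] is a column
vector (an n × 1 matrix) of the coordinates of $\bar{x}$ in the
same basis of $\mathbb{R}^n$.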
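As a small numerical check of Eqn (7) and of the bilinearity axioms, the sketch below evaluates the double sum and the matrix product $[X]^T A [Y]$ for an arbitrary 2 × 2 example. The matrix `A`, the vectors and the function names are illustrative assumptions introduced here, not values from the paper.

```python
def bilinear_double_sum(x, a, y):
    """Evaluate sum_ij a_ij * x_i * y_j."""
    return sum(a[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def bilinear_matrix_form(x, a, y):
    """Evaluate [X]^T A [Y]: first the row vector X^T A, then its product with Y."""
    xta = [sum(x[i] * a[i][j] for i in range(len(x))) for j in range(len(y))]
    return sum(xta[j] * y[j] for j in range(len(y)))


# Arbitrary illustrative data (integers keep the arithmetic exact).
A = [[1, 2], [0, 3]]
x, x2, y = [1, 2], [5, -1], [3, 4]

value = bilinear_double_sum(x, A, y)
print(value, bilinear_matrix_form(x, A, y))  # 35 35

# Axiom (1): scaling the first argument scales the result.
scaled = bilinear_double_sum([2 * c for c in x], A, y)
# Axiom (2): additivity in the first argument.
summed = bilinear_double_sum([p + q for p, q in zip(x, x2)], A, y)
print(scaled == 2 * value)                                # True
print(summed == value + bilinear_double_sum(x2, A, y))    # True
```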
Finally, we introduce the formal definition of a symmetric
bilinear form. Let V be a real vector space and
b be a bilinear function in V × V. The bilinear function
Table 3. The zero (k = 0), first (k = 1) and second (k = 2) powers of the total nonstochastic and stochastic graph–theoretic electronic-contact matrices of $G_m$, respectively.
k = 0
Nonstochastic: $M_m^0 = \begin{bmatrix}1&0&0&0&0&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\end{bmatrix}$
Stochastic: ${}^sM_m^0 = \begin{bmatrix}1&0&0&0&0&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\end{bmatrix}$
k = 1
Nonstochastic: $M_m^1 = \begin{bmatrix}0&1&0&0&1&0\\1&0&1&0&0&1\\0&1&1&0&0&0\\0&0&0&0&1&1\\1&0&0&1&0&1\\0&1&0&1&1&0\end{bmatrix}$
Stochastic: ${}^sM_m^1 = \begin{bmatrix}0&1/2&0&0&1/2&0\\1/3&0&1/3&0&0&1/3\\0&1/2&1/2&0&0&0\\0&0&0&0&1/2&1/2\\1/3&0&0&1/3&0&1/3\\0&1/3&0&1/3&1/3&0\end{bmatrix}$
k = 2
Nonstochastic: $M_m^2 = \begin{bmatrix}2&0&1&1&0&2\\0&3&1&1&2&0\\1&1&2&0&0&1\\1&1&0&2&1&1\\0&2&0&1&3&1\\2&0&1&1&1&3\end{bmatrix}$
Stochastic: ${}^sM_m^2 = \begin{bmatrix}1/3&0&1/6&1/6&0&1/3\\0&3/7&1/7&1/7&2/7&0\\1/5&1/5&2/5&0&0&1/5\\1/6&1/6&0&1/3&1/6&1/6\\0&2/7&0&1/7&3/7&1/7\\1/4&0&1/8&1/8&1/8&3/8\end{bmatrix}$
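b is called symmetric if $b(\bar{x}, \bar{y}) = b(\bar{y}, \bar{x}),\ \forall\, \bar{x}, \bar{y} \in V$ [58–63]. Then:

$$b(\bar{x}, \bar{y}) = \sum_{i,j}^{n} a_{ij} x_i y_j = \sum_{i,j}^{n} a_{ji} x_j y_i = b(\bar{y}, \bar{x}) \qquad (8)$$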
Nonstochastic and stochastic amino acid-based
bilinear indices: total (global) definition
The kth nonstochastic and stochastic bilinear indices
for a protein, $b_{mk}(\bar{x}_m, \bar{y}_m)$ and ${}^sb_{mk}(\bar{x}_m, \bar{y}_m)$, are computed
from the kth nonstochastic and stochastic
graph–theoretic electronic-contact matrices, $M_m^k$ and
${}^sM_m^k$, as shown in Eqns (9) and (10), respectively:

$$b_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^k m_{ij}\, x_m^i\, y_m^j \qquad (9)$$

$${}^sb_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^k sm_{ij}\, x_m^i\, y_m^j \qquad (10)$$

where n is the number of amino acids (Cα atoms) in the
protein, and $x_m^1, \ldots, x_m^n$ and $y_m^1, \ldots, y_m^n$ are the coordinates
or components of the macromolecular vectors
$\bar{x}_m$ and $\bar{y}_m$ in a canonical basis set of $\mathbb{R}^n$.
Eqns (9) and (10) for $b_{mk}(\bar{x}_m, \bar{y}_m)$ and
${}^sb_{mk}(\bar{x}_m, \bar{y}_m)$ may also be written as the single matrix
equations:

$$b_{mk}(\bar{x}_m, \bar{y}_m) = [X_m]^T M_m^k [Y_m] \qquad (11)$$

$${}^sb_{mk}(\bar{x}_m, \bar{y}_m) = [X_m]^T\, {}^sM_m^k\, [Y_m] \qquad (12)$$

where $[Y_m]$ is a column vector (an n × 1 matrix) of the
coordinates of $\bar{y}_m$ in the canonical basis set of $\mathbb{R}^n$, and
$[X_m]^T$ is the transpose of $[X_m]$, where $[X_m]$ is a column
vector (an n × 1 matrix) of the coordinates of $\bar{x}_m$ in the
canonical basis of $\mathbb{R}^n$. Therefore, if we use the canonical
basis set, the coordinates [$(x_m^1, \ldots, x_m^n)$ and $(y_m^1, \ldots, y_m^n)$]
of any macromolecular vectors ($\bar{x}_m$ and $\bar{y}_m$) coincide
with the components of those vectors [$(x_{m1}, \ldots, x_{mn})$
and $(y_{m1}, \ldots, y_{mn})$]. For that reason, those coordinates
can be considered as weights (R-group in Cα
atom, that is to say ‘amino acid labels’) of the vertices
of $G_m$, as a result of the fact that the components of the
macromolecular vectors are values of some amino acid
property that characterizes each kind of R-chain in the
protein. The calculation of the first three values of
bilinear indices for the example protein (Tables 2 and
3) is shown in Table 4.
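Eqns (9) and (10) can be checked numerically against the SKEERN example. The sketch below is a plain-Python illustration under stated assumptions: the helper names are hypothetical, the k = 1 matrix is the one listed in Table 3, and the vectors are the $z_1$ and $z_2$ values of Table 2. It is not the authors' TOMOCOMD-CAMPS implementation.

```python
def matmul(a, b):
    """Plain matrix product of two square matrices of equal order."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(m, k):
    """Return m**k for k >= 0 (identity matrix for k = 0)."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, m)
    return result


def row_normalize(m):
    """Eqn (2): divide each row by its sum (zero rows are left as zeros)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in m]


def bilinear_index(x, m, y):
    """Eqns (9)/(10): sum_ij m_ij * x_i * y_j."""
    return sum(m[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


M1 = [[0, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 1], [0, 1, 1, 0, 0, 0],
      [0, 0, 0, 0, 1, 1], [1, 0, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0]]
X = [1.96, 2.84, 3.08, 3.08, 2.88, 3.22]    # z1 values of SKEERN
Y = [-1.63, 1.41, 0.39, 0.39, 2.52, 1.45]   # z2 values of SKEERN

nonstochastic_indices = []
stochastic_indices = []
for k in range(3):
    mk = matrix_power(M1, k)
    nonstochastic_indices.append(round(bilinear_index(X, mk, Y), 2))
    stochastic_indices.append(round(bilinear_index(X, row_normalize(mk), Y), 2))

print(nonstochastic_indices)  # [15.14, 40.59, 98.84]
print(stochastic_indices)     # [15.14, 17.77, 14.57]
```

These rounded values agree with the entries of Table 4.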
It should be noted that nonstochastic and stochastic
bilinear indices are symmetric and nonsymmetric bilinear
forms, respectively. Therefore, if, in the following
weighting scheme, W and Z are used as amino acid
weights to compute the protein bilinear indices, two different
sets of stochastic bilinear indices, ${}^{W-Z}\,{}^sb_{mk}(\bar{x}_m, \bar{y}_m)$
and ${}^{Z-W}\,{}^sb_{mk}(\bar{x}_m, \bar{y}_m)$, can be obtained
[because $\bar{x}_{mW}$–$\bar{y}_{mZ} \neq \bar{x}_{mZ}$–$\bar{y}_{mW}$],
and only one group of nonstochastic
bilinear indices, ${}^{W-Z}b_{mk}(\bar{x}_m, \bar{y}_m) = {}^{Z-W}b_{mk}(\bar{x}_m, \bar{y}_m)$,
can be calculated because, in this case,
$\bar{x}_{mW}$–$\bar{y}_{mZ} = \bar{x}_{mZ}$–$\bar{y}_{mW}$.
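The difference between the two cases can be seen by swapping the two weight vectors in the SKEERN example. The following plain-Python sketch (hypothetical helper names; matrix and vectors taken from Tables 2 and 3) evaluates the k = 1 indices in both orders.

```python
def bilinear_index(x, m, y):
    """sum_ij m_ij * x_i * y_j"""
    return sum(m[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def row_normalize(m):
    """Divide each row by its sum (zero rows are left as zeros)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in m]


M1 = [[0, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 1], [0, 1, 1, 0, 0, 0],
      [0, 0, 0, 0, 1, 1], [1, 0, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0]]
SM1 = row_normalize(M1)
X = [1.96, 2.84, 3.08, 3.08, 2.88, 3.22]    # z1 weights
Y = [-1.63, 1.41, 0.39, 0.39, 2.52, 1.45]   # z2 weights

b_xy = bilinear_index(X, M1, Y)    # z1-z2, nonstochastic
b_yx = bilinear_index(Y, M1, X)    # z2-z1, nonstochastic
sb_xy = bilinear_index(X, SM1, Y)  # z1-z2, stochastic
sb_yx = bilinear_index(Y, SM1, X)  # z2-z1, stochastic

# The symmetric matrix gives the same value in both orders;
# the row-normalized (nonsymmetric) matrix does not.
print(round(b_xy, 2), round(b_yx, 2))
print(round(sb_xy, 2), round(sb_yx, 2))
```

The stochastic matrix loses symmetry because each row is divided by a different vertex degree, which is why thirty rather than fifteen weight pairs are distinct for the stochastic indices.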
Nonstochastic and stochastic local bilinear
indices: definition of amino acid, amino
acid-type and peptide fragment bilinear indices
In the last decade, Randić [64] proposed a list of desirable
attributes for a molecular descriptor. Therefore,
this list can be considered as a methodological guide
for the development of new topological indices. One of
the most important criteria is the possibility of defining
the descriptors locally. This attribute refers to the
fact that the index could be calculated for the molecule
(protein) as a whole but also over certain fragments of
the structure itself.
Therefore, in addition to total bilinear indices computed
for the whole protein, a local-fragment (peptide
fragment) formalism can be developed. These descriptors
are termed local nonstochastic and stochastic
bilinear indices: $b_{mkL}(\bar{x}_m, \bar{y}_m)$ and ${}^sb_{mkL}(\bar{x}_m, \bar{y}_m)$, respectively.
The definition of these descriptors is:

$$b_{mkL}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^k m_{ijL}\, x_m^i\, y_m^j \qquad (13)$$

$${}^sb_{mkL}(\bar{x}_m, \bar{y}_m) = \sum_{i=1}^{n} \sum_{j=1}^{n} {}^k sm_{ijL}\, x_m^i\, y_m^j \qquad (14)$$

where ${}^k m_{ijL}$ [${}^k sm_{ijL}$] is the kth element of the row ‘i’
and column ‘j’ of the local matrix $M_{mL}^k$ [${}^sM_{mL}^k$]. This
matrix is extracted from the $M_m^k$ [${}^sM_m^k$] matrix and
contains information referring to the vertices of the
specific protein fragments ($F_r$) and also to the molecular
environment in step k. The matrix $M_{mL}^k$ [${}^sM_{mL}^k$] with
elements ${}^k m_{ijL}$ [${}^k sm_{ijL}$] is defined as (Table 5):

$${}^k m_{ijL}\ [{}^k sm_{ijL}] = \begin{cases} {}^k m_{ij}\ [{}^k sm_{ij}] & \text{if both } v_i \text{ and } v_j \text{ are vertices (amino acids) contained within } F_r \\ \tfrac{1}{2}\, {}^k m_{ij}\ [\tfrac{1}{2}\, {}^k sm_{ij}] & \text{if } v_i \text{ or } v_j \text{ is a vertex contained within } F_r \text{, but not both} \\ 0 & \text{otherwise} \end{cases} \qquad (15)$$

These local analogues can also be expressed in
matrix form by the expressions:

$$b_{mkL}(\bar{x}_m, \bar{y}_m) = [X_m]^T M_{mL}^k [Y_m] \qquad (16)$$

$${}^sb_{mkL}(\bar{x}_m, \bar{y}_m) = [X_m]^T\, {}^sM_{mL}^k\, [Y_m] \qquad (17)$$
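The three cases of Eqn (15) can be sketched compactly by counting how many of the two vertices fall inside the fragment. In the plain-Python illustration below, `local_matrix` is a hypothetical helper, fragments are given as sets of 0-based vertex indices (an assumption made for convenience), and the input is the k = 2 nonstochastic matrix of Table 3.

```python
from fractions import Fraction

# k = 2 nonstochastic matrix of the SKEERN example (Table 3).
M2 = [[2, 0, 1, 1, 0, 2], [0, 3, 1, 1, 2, 0], [1, 1, 2, 0, 0, 1],
      [1, 1, 0, 2, 1, 1], [0, 2, 0, 1, 3, 1], [2, 0, 1, 1, 1, 3]]


def local_matrix(m, fragment):
    """Eqn (15): keep m_ij if both vertices are in the fragment,
    half of it if exactly one is, and zero otherwise."""
    n = len(m)
    out = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            inside = (i in fragment) + (j in fragment)  # 0, 1 or 2
            out[i][j] = m[i][j] * Fraction(inside, 2)
    return out


# Fragment made of the first residue only (Ser, vertex index 0).
M2_S = local_matrix(M2, {0})
for row in M2_S:
    print([str(value) for value in row])
```

For this single-residue fragment the result matches the matrix $M^2(G_m, \mathrm{S})$ of Table 5. The same function applies unchanged to a row-normalized (stochastic) matrix.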
It should be noted that the scheme above follows
the spirit of a Mulliken population analysis [65]. It
should also be noted that for every partitioning of a
protein into Z macromolecular fragments, there will be
Z local macromolecular fragment matrices. In this
case, if a protein is partitioned into Z macromolecular fragments,
the matrix $M_m^k$ [${}^sM_m^k$] can be correspondingly
partitioned into Z local matrices $M_{mL}^k$ [${}^sM_{mL}^k$], L = 1,
..., Z, and the matrix $M_m^k$ [${}^sM_m^k$] is exactly
the sum of the Z local matrices. In
this way, the total nonstochastic and stochastic bilinear
indices are the sum of the nonstochastic and stochastic
bilinear indices, respectively, of the Z macromolecular
fragments:

$$b_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{L=1}^{Z} b_{mkL}(\bar{x}_m, \bar{y}_m) \qquad (18)$$

$${}^sb_{mk}(\bar{x}_m, \bar{y}_m) = \sum_{L=1}^{Z} {}^sb_{mkL}(\bar{x}_m, \bar{y}_m) \qquad (19)$$
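The additivity stated in Eqn (18) can be verified numerically for the SKEERN example by partitioning the protein into its six single-residue fragments. The sketch below uses plain Python with hypothetical helper names; the matrix is the k = 2 nonstochastic matrix of Table 3 and the weights are the $z_1$ and $z_2$ vectors of Table 2.

```python
M2 = [[2, 0, 1, 1, 0, 2], [0, 3, 1, 1, 2, 0], [1, 1, 2, 0, 0, 1],
      [1, 1, 0, 2, 1, 1], [0, 2, 0, 1, 3, 1], [2, 0, 1, 1, 1, 3]]
X = [1.96, 2.84, 3.08, 3.08, 2.88, 3.22]
Y = [-1.63, 1.41, 0.39, 0.39, 2.52, 1.45]


def local_matrix(m, fragment):
    """Eqn (15) with weights 1, 1/2 or 0 depending on fragment membership."""
    n = len(m)
    return [[m[i][j] * ((i in fragment) + (j in fragment)) / 2
             for j in range(n)] for i in range(n)]


def bilinear_index(x, m, y):
    """sum_ij m_ij * x_i * y_j"""
    return sum(m[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


# Partition into Z = 6 fragments, one per residue (0-based vertex indices).
fragments = [{v} for v in range(6)]
local_matrices = [local_matrix(M2, f) for f in fragments]

# The local matrices add up to the total matrix, entry by entry.
summed = [[sum(lm[i][j] for lm in local_matrices) for j in range(6)]
          for i in range(6)]

total_index = bilinear_index(X, M2, Y)
local_indices = [bilinear_index(X, lm, Y) for lm in local_matrices]
print(round(total_index, 2), round(sum(local_indices), 2))
```

The same check works for any other partition into disjoint fragments that together cover all residues, because every matrix entry is then either assigned wholly to one fragment or split in halves between two.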
In addition, the amino acid-type bilinear indices can
also be calculated. Amino acid and amino acid-type
bilinear indices are specific cases of local protein bilinear
indices. In this sense, the kth amino acid-type bilinear
indices are calculated by summing the kth amino acid
bilinear indices of all amino acids of the same amino
Table 4. Values of nonstochastic and stochastic total bilinear indices for two interacting peptides (SKEERN) used as example above (see also Tables 2 and 3).
Nonstochastic total bilinear indices
$b_{m0} = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^0 m_{ij}\, x_m^i y_m^j = [X_m]^T M_m^0 [Y_m] = \begin{bmatrix}1.96&2.84&3.08&3.08&2.88&3.22\end{bmatrix} \begin{bmatrix}1&0&0&0&0&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\end{bmatrix} \begin{bmatrix}-1.63\\1.41\\0.39\\0.39\\2.52\\1.45\end{bmatrix} = 15.14$
$b_{m1} = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^1 m_{ij}\, x_m^i y_m^j = [X_m]^T M_m^1 [Y_m] = \begin{bmatrix}1.96&2.84&3.08&3.08&2.88&3.22\end{bmatrix} \begin{bmatrix}0&1&0&0&1&0\\1&0&1&0&0&1\\0&1&1&0&0&0\\0&0&0&0&1&1\\1&0&0&1&0&1\\0&1&0&1&1&0\end{bmatrix} \begin{bmatrix}-1.63\\1.41\\0.39\\0.39\\2.52\\1.45\end{bmatrix} = 40.59$
$b_{m2} = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^2 m_{ij}\, x_m^i y_m^j = [X_m]^T M_m^2 [Y_m] = \begin{bmatrix}1.96&2.84&3.08&3.08&2.88&3.22\end{bmatrix} \begin{bmatrix}2&0&1&1&0&2\\0&3&1&1&2&0\\1&1&2&0&0&1\\1&1&0&2&1&1\\0&2&0&1&3&1\\2&0&1&1&1&3\end{bmatrix} \begin{bmatrix}-1.63\\1.41\\0.39\\0.39\\2.52\\1.45\end{bmatrix} = 98.84$
Stochastic total bilinear indices
${}^sb_{m0} = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^0 sm_{ij}\, x_m^i y_m^j = [X_m]^T\, {}^sM_m^0\, [Y_m] = \begin{bmatrix}1.96&2.84&3.08&3.08&2.88&3.22\end{bmatrix} \begin{bmatrix}1&0&0&0&0&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\end{bmatrix} \begin{bmatrix}-1.63\\1.41\\0.39\\0.39\\2.52\\1.45\end{bmatrix} = 15.14$
${}^sb_{m1} = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^1 sm_{ij}\, x_m^i y_m^j = [X_m]^T\, {}^sM_m^1\, [Y_m] = \begin{bmatrix}1.96&2.84&3.08&3.08&2.88&3.22\end{bmatrix} \begin{bmatrix}0&1/2&0&0&1/2&0\\1/3&0&1/3&0&0&1/3\\0&1/2&1/2&0&0&0\\0&0&0&0&1/2&1/2\\1/3&0&0&1/3&0&1/3\\0&1/3&0&1/3&1/3&0\end{bmatrix} \begin{bmatrix}-1.63\\1.41\\0.39\\0.39\\2.52\\1.45\end{bmatrix} = 17.77$
${}^sb_{m2} = \sum_{i=1}^{n}\sum_{j=1}^{n} {}^2 sm_{ij}\, x_m^i y_m^j = [X_m]^T\, {}^sM_m^2\, [Y_m] = \begin{bmatrix}1.96&2.84&3.08&3.08&2.88&3.22\end{bmatrix} \begin{bmatrix}1/3&0&1/6&1/6&0&1/3\\0&3/7&1/7&1/7&2/7&0\\1/5&1/5&2/5&0&0&1/5\\1/6&1/6&0&1/3&1/6&1/6\\0&2/7&0&1/7&3/7&1/7\\1/4&0&1/8&1/8&1/8&3/8\end{bmatrix} \begin{bmatrix}-1.63\\1.41\\0.39\\0.39\\2.52\\1.45\end{bmatrix} = 14.57$
Table 5. The zero (k = 0), first (k = 1) and second (k = 2) powers of the local nonstochastic and stochastic graph–theoretic electronic-contact matrices of $G_m$, respectively. The two glutamate residues are distinguished by their sequence positions (E3 and E4).
The zero, first and second powers of the local (amino acid) nonstochastic matrices
$M^0(G_m, \mathrm{S}) = \begin{bmatrix}1&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\end{bmatrix}$
$M^1(G_m, \mathrm{S}) = \begin{bmatrix}0&1/2&0&0&1/2&0\\1/2&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\1/2&0&0&0&0&0\\0&0&0&0&0&0\end{bmatrix}$
$M^2(G_m, \mathrm{S}) = \begin{bmatrix}2&0&1/2&1/2&0&1\\0&0&0&0&0&0\\1/2&0&0&0&0&0\\1/2&0&0&0&0&0\\0&0&0&0&0&0\\1&0&0&0&0&0\end{bmatrix}$
$M^0(G_m, \mathrm{K}) = \begin{bmatrix}0&0&0&0&0&0\\0&1&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\end{bmatrix}$
$M^1(G_m, \mathrm{K}) = \begin{bmatrix}0&1/2&0&0&0&0\\1/2&0&1/2&0&0&1/2\\0&1/2&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&1/2&0&0&0&0\end{bmatrix}$
$M^2(G_m, \mathrm{K}) = \begin{bmatrix}0&0&0&0&0&0\\0&3&1/2&1/2&1&0\\0&1/2&0&0&0&0\\0&1/2&0&0&0&0\\0&1&0&0&0&0\\0&0&0&0&0&0\end{bmatrix}$
$M^0(G_m, \mathrm{E_3}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&1&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\end{bmatrix}$
$M^1(G_m, \mathrm{E_3}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&1/2&0&0&0\\0&1/2&1&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\end{bmatrix}$
$M^2(G_m, \mathrm{E_3}) = \begin{bmatrix}0&0&1/2&0&0&0\\0&0&1/2&0&0&0\\1/2&1/2&2&0&0&1/2\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&1/2&0&0&0\end{bmatrix}$
$M^0(G_m, \mathrm{E_4}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&1&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\end{bmatrix}$
$M^1(G_m, \mathrm{E_4}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&1/2&1/2\\0&0&0&1/2&0&0\\0&0&0&1/2&0&0\end{bmatrix}$
$M^2(G_m, \mathrm{E_4}) = \begin{bmatrix}0&0&0&1/2&0&0\\0&0&0&1/2&0&0\\0&0&0&0&0&0\\1/2&1/2&0&2&1/2&1/2\\0&0&0&1/2&0&0\\0&0&0&1/2&0&0\end{bmatrix}$
$M^0(G_m, \mathrm{R}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&1&0\\0&0&0&0&0&0\end{bmatrix}$
$M^1(G_m, \mathrm{R}) = \begin{bmatrix}0&0&0&0&1/2&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&1/2&0\\1/2&0&0&1/2&0&1/2\\0&0&0&0&1/2&0\end{bmatrix}$
$M^2(G_m, \mathrm{R}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&0&0&1&0\\0&0&0&0&0&0\\0&0&0&0&1/2&0\\0&1&0&1/2&3&1/2\\0&0&0&0&1/2&0\end{bmatrix}$
$M^0(G_m, \mathrm{N}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&0\\0&0&0&0&0&1\end{bmatrix}$
$M^1(G_m, \mathrm{N}) = \begin{bmatrix}0&0&0&0&0&0\\0&0&0&0&0&1/2\\0&0&0&0&0&0\\0&0&0&0&0&1/2\\0&0&0&0&0&1/2\\0&1/2&0&1/2&1/2&0\end{bmatrix}$
$M^2(G_m, \mathrm{N}) = \begin{bmatrix}0&0&0&0&0&1\\0&0&0&0&0&0\\0&0&0&0&0&1/2\\0&0&0&0&0&1/2\\0&0&0&0&0&1/2\\1&0&1/2&1/2&1/2&3\end{bmatrix}$
The zero, first and second powers of the local (amino acid) stochastic matrices
M
0
ðG
m
; SÞ¼
100000
000000
000000
000000
000000
000000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
1
ðG
m
; SÞ¼
0
1
4
00
1
4
0
1
6
00000
000000
000000
1
6
00000
000000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
2
ðG
m
; SÞ¼
1
3
0
1
12
1
12
0
1
6
1
6
00000
1
10
00000
1
12
00000
1
8
00000
000000
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
M
0
ðG
m
; KÞ¼
000000
010000
000000
000000
000000
000000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
1
ðG
m
; KÞ¼
0
1
4
0000
1
6
0
1
6
00
1
6
0
1
4
0000
000000
000000
0
1
6
0000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
2
ðG
m
; K Þ¼
000000
0
3
7
1
14
1
14
1
7
0
0
1
10
0000
0
1
12
0000
0
1
7
0000
000000
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
M
0
ðG
m
; EÞ¼
000000
000000
001000
000000
000000
000000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
1
ðG
m
; EÞ¼
000000
00
1
6
000
0
1
4
1
2
000
000000
000000
000000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
2
ðG
m
; EÞ¼
00
1
12
000
00
1
14
000
1
10
1
10
2
5
00
1
10
000000
000000
00
1
16
000
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
acid type in the protein. In the amino acid-type bilin-
ear indices formalism, each amino acid in the molecule
is classified into an amino acid-type (fragment), such
as apolar, polar uncharged, polar charged, positive
charged, negative charged, aromatic, and so on. For
all data sets, including those with a common molecular
scaffold, as well as those with very diverse structure,
the kth amino acid-type bilinear indices provide
important information. The calculation of the first
three values of the local (amino acid) bilinear indices
for the example protein (Tables 2 and 3) is shown in
Table 6.
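The local matrices of Table 5 can be reproduced with a short script. The sketch below is not the TOMOCOMD-CAMPS implementation. It assumes a contact list `EDGES` read off the first-power matrices of Table 5 (including a self-loop on the first Glu), and it assumes two rules inferred from the tabulated values: the local matrix of a residue keeps its diagonal element in full and half of every other element in its row and column, and the stochastic matrix is obtained by dividing each row of the kth power by its sum. All function names are hypothetical.

```python
from fractions import Fraction

RESIDUES = "SKEERN"
# Contacts inferred from the first-power matrices of Table 5 (0-based indices).
# (2, 2) is the self-loop on the first Glu. This list is an assumption.
EDGES = [(0, 1), (0, 4), (1, 2), (1, 5), (2, 2), (3, 4), (3, 5), (4, 5)]


def adjacency(n, edges):
    """Symmetric 0/1 contact matrix of the pseudograph."""
    m = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = Fraction(1)
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power(m, k):
    """kth power of m (k = 0 gives the identity matrix)."""
    n = len(m)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, m)
    return result


def stochastic(m):
    """Divide each row by its sum; all-zero rows are left unchanged."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total if total else x for x in row])
    return out


def local(m, r):
    """Local matrix of residue r: full (r, r) element, half of the rest of row/column r."""
    n = len(m)
    out = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == r and j == r:
                out[i][j] = m[i][j]
            elif i == r or j == r:
                out[i][j] = m[i][j] / 2
    return out


M1 = adjacency(len(RESIDUES), EDGES)
M2 = power(M1, 2)
S1 = stochastic(M1)
S2 = stochastic(M2)

# Row of Lys in its local second-power stochastic matrix.
print([str(x) for x in local(S2, 1)[1]])
```

With these rules the local matrices of all six residues add up to the total matrix, which is the matrix counterpart of the additivity of local indices.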
Any local protein bilinear index has a particular
meaning, especially for the first values of k, where the
information about the structure of the fragment $F_R$ is
contained. Higher values of k relate to the environment
information of the fragment $F_R$ considered
within the macromolecular pseudograph.
In any case, a complete series of indices performs a
specific characterization of the chemical structure.
The generalization of the matrices and descriptors to
‘superior analogues’ is necessary for the evaluation of
situations where a single descriptor cannot provide
a good structural characterization [64,66]. The
local macromolecular indices can also be used
together with the total ones as variables for quantita-
tive structure–activity relationship (QSAR) ⁄ quantita-
tive structure–property relationship (QSPR) modelling
of properties or activities that depend more on a
region or a fragment than on the macromolecule as a
whole.
Data preparation
Computation of protein bilinear indices
The calculation of total and local macromolecular
bilinear indices for any peptide or protein was
Table 5. (Continued).
M
0
ðG
m
; EÞ¼
000000
000000
000000
000100
000000
000000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
1
ðG
m
; EÞ¼
000000
000000
000000
0000
1
4
1
4
000
1
6
00
000
1
6
00
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
2
ðG
m
; EÞ¼
000
1
12
00
000
1
14
00
000000
1
12
1
12
0
1
3
1
12
1
12
000
1
14
00
000
1
16
00
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
M
0
ðG
m
; RÞ¼
000000
000000
000000
000000
000010
000000
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
1
ðG
m
; RÞ¼
0000
1
14
0
000000
00000 0
0000
1
14
0
1
6
00
1
6
0
1
6
0000
1
6
0
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
2
ðG
m
; RÞ¼
000 0 0 0
000 0
1
7
0
000 0 0 0
000 0
1
12
0
0
1
7
0
1
14
3
7
1
14
000 0
1
16
0
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
M
0
ðG
m
; NÞ¼
000000
000000
000000
000000
000000
000001
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
1
ðG
m
; NÞ¼
000000
00000
1
6
000000
00000
1
4
00000
1
6
0
1
6
0
1
6
1
6
0
2
6
6
6
6
6
6
4
3
7
7
7
7
7
7
5
M
2
ðG
m
; NÞ¼
000 0 0
1
6
000000
000 0 0
1
10
000 0 0
1
12
000 0 0
1
14
1
8
0
1
16
1
16
1
16
3
8
2
6
6
6
6
6
6
6
6
4
3
7
7
7
7
7
7
7
7
5
Table 6. Values of amino acid-based (local) bilinear indices for the heterodimer SKEERN.

Local nonstochastic bilinear indices
Amino acid               b_0L(x_m, y_m)    b_1L(x_m, y_m)    b_2L(x_m, y_m)
Ser (S)                  -3.1948           -0.8104           -13.0522
Lys (K)                   4.0044            6.1215            28.6812
Glu (E)                   1.2012            3.9264             5.8605
Glu (E)                   1.2012            7.3033            10.3029
Arg (R)                   7.2576           10.71              43.578
Asn (N)                   4.669            13.3352            23.4674
Heterodimer (SKEERN)     15.1386           40.586             98.8378

Local stochastic bilinear indices
Amino acid               sb_0L(x_m, y_m)   sb_1L(x_m, y_m)   sb_2L(x_m, y_m)
Ser (S)                  -3.1948            0.37176667        -2.04034833
Lys (K)                   4.0044            2.6327             4.27309429
Glu (E)                   1.2012            1.8709             1.08062179
Glu (E)                   1.2012            3.4534             1.66443036
Arg (R)                   7.2576            4.6284             6.24537857
Asn (N)                   4.669             4.81723333         3.34964405
Heterodimer (SKEERN)     15.1386           17.7744            14.5728207
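The heterodimer row of Table 6 is the sum of the six amino acid rows, which is how local indices relate to total ones. A minimal check, with the values copied from Table 6 and variable names chosen here only for illustration, could look as follows:

```python
import math

# Local indices for k = 0, 1, 2 in sequence order S, K, E, E, R, N (Table 6).
NONSTOCHASTIC = [
    (-3.1948, -0.8104, -13.0522),
    (4.0044, 6.1215, 28.6812),
    (1.2012, 3.9264, 5.8605),
    (1.2012, 7.3033, 10.3029),
    (7.2576, 10.71, 43.578),
    (4.669, 13.3352, 23.4674),
]
STOCHASTIC = [
    (-3.1948, 0.37176667, -2.04034833),
    (4.0044, 2.6327, 4.27309429),
    (1.2012, 1.8709, 1.08062179),
    (1.2012, 3.4534, 1.66443036),
    (7.2576, 4.6284, 6.24537857),
    (4.669, 4.81723333, 3.34964405),
]
# Heterodimer (total) rows of Table 6.
TOTAL_NONSTOCHASTIC = (15.1386, 40.586, 98.8378)
TOTAL_STOCHASTIC = (15.1386, 17.7744, 14.5728207)


def column_sums(rows):
    """Sum the local indices of all residues for each order k."""
    return tuple(sum(col) for col in zip(*rows))


def matches(rows, totals, tol=1e-4):
    """True if every column sum agrees with the tabulated total within tol."""
    return all(math.isclose(s, t, abs_tol=tol)
               for s, t in zip(column_sums(rows), totals))


print(matches(NONSTOCHASTIC, TOTAL_NONSTOCHASTIC))
print(matches(STOCHASTIC, TOTAL_STOCHASTIC))
```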
implemented in the TOMOCOMD-CAMPS software [67]. The
main steps for applying this method in
QSAR/QSPR studies can be briefly summarized as follows:
(1) Draw the macromolecular pseudographs for each
protein of the data set, using the software’s drawing
mode. This is done by selecting the active amino acid
symbol from the ‘natural’ amino acid code. Here, we
consider covalent (peptide bond) and noncovalent
[hydrogen bond and other electrostatic (within a chain
as well as between chains)] interactions. Afterwards,
we draw the mutants by replacing an amino acid with
alanine, assuming that this change only affects the
ability of this region of the protein to form a polar
interaction (we suppressed the hydrogen-bond
interaction if the original amino acid had one).
(2) Use appropriate amino acid weights to differentiate
the side-chain of each amino acid. In the present
study, we used several descriptors of the natural amino
acids as amino acid properties: the three z-values
[48], Kyte–Doolittle’s hydrophobicity scale [50], ISA
and ECI [49].
(3) Compute the nonstochastic and stochastic protein
bilinear indices. This is done in the software’s
calculation mode, where the side-chain properties and
the descriptor family can be selected before the
bio-macromolecular indices are calculated. The software
generates a table in which the rows correspond to the
compounds and the columns to the $b_{mk}(x_m, y_m)$ values.
(4) Find a QSPR/QSAR equation by using statistical
techniques, such as multilinear regression analysis,
neural networks, linear discriminant analysis (LDA),
and so on. That is to say, we can find a quantitative
relationship between a property P and the $b_{mk}(x_m, y_m)$
having, for example, the form:
\[ P = a_0\, b_{m0}(x_m, y_m) + a_1\, b_{m1}(x_m, y_m) + a_2\, b_{m2}(x_m, y_m) + \cdots + a_k\, b_{mk}(x_m, y_m) + c \tag{20} \]
where P is the measured value of the property,
$b_{mk}(x_m, y_m)$ [or $b_{mkL}(x_m, y_m)$] is the kth total [or local]
macromolecular nonstochastic bilinear index, and
the $a_k$ are the coefficients obtained by the statistical
analysis.
(5) Test the robustness and predictive power of the
QSPR ⁄ QSAR equation by using internal and external
cross-validation techniques.
(6) Develop a structural interpretation of the obtained
QSAR ⁄ QSPR model using macromolecular bilinear
indices as molecular descriptors.
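Steps (4) and (5) can be sketched in a few lines. The example below fits the linear form of Eqn (20) by ordinary least squares and scores it by leave-one-out cross-validation. It is an illustration only: the descriptor values and the property values are made-up toy numbers, not data from this study, the function names are hypothetical, and the statistical package actually used by the authors is not specified here.

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(X, y):
    """Least-squares coefficients [a_0, ..., a_k, c] of Eqn (20)."""
    rows = [list(r) + [1.0] for r in X]  # last column carries the intercept c
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    """P = sum_k a_k * b_k + c for one compound."""
    return sum(a * b for a, b in zip(coef, x)) + coef[-1]


def loo_q2(X, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / sum((t - mean) ** 2 for t in y)


# Toy data (hypothetical): two descriptors per compound, P = 2*b0 - b1 + 5.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y = [2.0 * b0 - b1 + 5.0 for b0, b1 in X]

coef = fit(X, y)
print([round(c, 6) for c in coef], round(loo_q2(X, y), 6))
```

Because the toy property is an exact linear function of the descriptors, the fit recovers the generating coefficients and q² is essentially 1; with real data both the residuals and q² would reflect how well a linear combination of bilinear indices describes the property.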
Database
Arc is a homodimer in which each monomer inter-
twines with the other to form a single, globular domain
with a well-defined core. Several side-chain hydrogen
bond and salt-bridge interactions are involved in the
Arc crystal structure. A detailed representation of
these interactions is provided elsewhere [32].
Nevertheless, an overview of the electrostatic
interactions in the Arc repressor structure is given here.
Hydrogen bond interactions take place [32]:
(1) Between side-chains in the same subunit (N29-E36)
and between side-chains in different subunits
(R40-S44).
(2) Between a side-chain and a main-chain atom in
different subunits (W14-N34, N34-R13) and between a
side-chain and a main-chain atom in the same subunit
(E17-E17, S32-S35, S44-R40).
On the other hand, salt-bridge interactions take
place [32]:
(3) Between side-chains in the same subunit (R16-D20,
D20-R23, R31-E36, E36-R40, E43-K46, E43-K47) and
between side-chains in different subunits
(E28-R50, R40-E48).
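The interactions listed above are the noncovalent edges of the Arc pseudograph, and an alanine mutant is drawn by suppressing the polar interactions of the replaced residue. A minimal sketch of that bookkeeping is given below. The tuple layout and the function name are assumptions made for illustration, and the rule is a simplification: it drops every listed interaction that involves the mutated position, including side-chain to main-chain pairs in which the residue might contribute only a main-chain atom.

```python
# (residue, residue, kind, location) for each interaction listed in the text.
INTERACTIONS = [
    ("N29", "E36", "hbond", "intrasubunit"),
    ("R40", "S44", "hbond", "intersubunit"),
    ("W14", "N34", "hbond", "intersubunit"),
    ("N34", "R13", "hbond", "intersubunit"),
    ("E17", "E17", "hbond", "intrasubunit"),
    ("S32", "S35", "hbond", "intrasubunit"),
    ("S44", "R40", "hbond", "intrasubunit"),
    ("R16", "D20", "salt_bridge", "intrasubunit"),
    ("D20", "R23", "salt_bridge", "intrasubunit"),
    ("R31", "E36", "salt_bridge", "intrasubunit"),
    ("E36", "R40", "salt_bridge", "intrasubunit"),
    ("E43", "K46", "salt_bridge", "intrasubunit"),
    ("E43", "K47", "salt_bridge", "intrasubunit"),
    ("E28", "R50", "salt_bridge", "intersubunit"),
    ("R40", "E48", "salt_bridge", "intersubunit"),
]


def position(label):
    """Sequence position of a residue label such as 'R40'."""
    return int(label[1:])


def alanine_mutant(interactions, pos):
    """Interactions that remain after replacing the residue at pos with alanine."""
    return [
        item for item in interactions
        if pos not in (position(item[0]), position(item[1]))
    ]


# Example: an R40A mutant loses the four listed interactions that involve R40.
remaining = alanine_mutant(INTERACTIONS, 40)
print(len(INTERACTIONS), len(remaining))
```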
The data of Arc repressor mutants were taken from
the literature. In the present study, alanine substitu-
tions were constructed at each of the 51 non-alanine
positions in the wild-type Arc sequence. To avoid
intracellular proteolysis and purification difficulties,
the alanine substitution mutants were constructed in
backgrounds containing the carboxy-terminal extensions
(His)$_6$ (designated st6) or (His)$_6$-Lys-Asn-Gln-His-Glu
(designated st11) [68,69]. These tail sequences
allow affinity purification, reduce degradation and
cause no significant changes in protein stability [70].
Milla et al. [32] subjected each purified mutant of
Arc to thermal and urea denaturation experiments. The
stability of the proteins was assessed by the melting
temperature ($t_m$). The values of $t_m$ for 53 Arc homodimers
reported by these authors are given in Tables 7 and 8.
In equilibrium and kinetic unfolding–refolding studies,
only native Arc dimers and denatured monomers
are significantly populated. Thus, folding and
dimerization are concerted processes [32,71,72]. For this
reason, it is important to note that $t_m$ refers to the
unfolding of the Arc homodimer. Accordingly, one must
take into consideration that each single mutation
changes two side-chains in the Arc dimer, with
stability effects being approximately twice those
observed for monomeric proteins. Moreover, changes
in stability may arise as a result of a mutation
disrupting a native interaction, when the native structure of
[...] development of linear discriminant functions, which permits the classification of mutants as having near wild-type stability or reduced stability, and therefore describe the protein stability effects of a complete set of alanine substitutions in the Arc repressor. Here, we consider a general set of data that consists of 53 A-mutants, with 28 of them having near wild-type stability (128) and the remainder being [...] linear combinations of nonstochastic [Eqn (27)] and stochastic [Eqn (28)] protein bilinear descriptors account for 83% of the variance of the $t_m$ for the cases in the training series; the values of F-ratio for Eqns (27) [...] Table 10. Results of the stochastic bilinear indices-driven LDA models of the Arc A-mutants in the training and test sets. [...] 0.96 and 1.03 for Eqns (31) and (32)]. In Tables 11 and 12, we depict the observed, calculated [by using Eqns (29) to (32)] and residual values of $t_m$ for cases in both training and test sets. Different protein folding may be the reason for the lack of linear correlation between protein bilinear indices and stability ($t_m$) for these mutants, leading to a nonlinear dependence between $t_m$ and the protein bilinear [...] information about the electrostatic interactions among amino acids appears to be necessary. Here, we analyze the relevance of the inclusion of this type of information for obtaining descriptors that encode relevant structural information correlating with the stability changes of the Arc mutants. Accordingly, we compared the accuracies of classification models based on nonstochastic protein bilinear indices [...]
[...] analysis. This dataset was randomly divided into two subsets: one containing 39 mutants, which was used as a training set, and the other containing nine mutants (five having near wild-type stability and four having reduced stability), which was used as a test set. Combining nonstochastic and stochastic total protein bilinear indices with MLR analysis, we developed the QSSR linear models to describe $t_m$ for [...] derivation is straightforward, and it is easy to interpret the QSARs/QSPRs that include them. We have shown that the use of protein total bilinear indices can account for the thermodynamic parameters for both wild-type and mutant Arc proteins. The resulting quantitative models are significant from a statistical point of view. Concluding Remarks. In the present study, a new set of bio-macromolecular descriptors [...] the data set and the test set (full set), the accuracy was 98.11% (52/53) and 96.23% (51/53) for Eqns (25) and (26), respectively, by using nonstochastic and stochastic bilinear indices, in that order. These statistical parameters suggest that linear combinations of protein bilinear indices are appropriate for the discrimination of the near wild-type stability/reduced stability mutants studied here. Equations [...] dimer. These results suggest that Arc folding is a rather complicated process that depends on various processes, and that combinations of parameters (bilinear indices calculated with each pair of amino acid properties) are necessary to describe adequately the $t_m$ of these Arc mutants [Eqns (25) and (26)]. From a comparison of the accuracies of classification models based on nonstochastic protein bilinear indices [...]
[...] importance of protein structural information for the numerical characterization of Arc mutants and its relationship with stability changes. It is well known that salt-bridges and hydrogen bonds play an important role in maintaining the 3D structure of proteins [87]. Therefore, to obtain a useful numerical characterization of proteins for the study of their properties (stability, folding, etc.), the use of information [...] based on the kind of method used for deriving the QSPR and their statistical parameters, the explored molecular descriptors, the overall accuracy (%), the Matthews correlation coefficient and the validation method used. Table 15 shows a comparison between nonstochastic and stochastic protein bilinear indices based on classification methods and other reported approaches for predicting the stability of the Arc repressor [...] macromolecular fragments:
\[ b_{mk}(x_m, y_m) = \sum_{L=1}^{Z} b_{mkL}(x_m, y_m) \tag{18} \]
\[ {}^{s}b_{mk}(x_m, y_m) = \sum_{L=1}^{Z} {}^{s}b_{mkL}(x_m, y_m) \tag{19} \]
In addition, the amino acid-type bilinear indices can also be calculated. Amino acid and amino acid-type bilinear indices are specific cases of local protein bilinear indices. In this sense, [...] the kth amino acid-type bilinear indices are calculated by summing the kth amino acid bilinear indices of all amino acids of the same amino [...] Table 4. Values of nonstochastic and stochastic total bilinear [...]