Neural Network Based Equalization
In this chapter, we will give an overview of neural network based equalization. Channel equalization can be viewed as a classification problem. The optimal solution to this classification problem is inherently nonlinear. Hence we will discuss how the nonlinear structure of the artificial neural network can enhance the performance of conventional channel equalizers, and examine various neural network designs amenable to channel equalization, such as the so-called multilayer perceptron network [236-240], the polynomial perceptron network [241-244] and the radial basis function network [85, 245-247]. We will examine a neural network structure referred to as the Radial Basis Function (RBF) network in detail in the context of equalization. As further reading, the contribution by Mulgrew [248] provides an insightful briefing on applying RBF networks to both channel equalization and interference rejection problems. Originally, RBF networks were developed for the generic problem of data interpolation in a multi-dimensional space [249, 250]. We will describe the RBF network in general and motivate its application. Before we proceed, our forthcoming section will describe the discrete-time channel model inflicting intersymbol interference that will be used throughout this thesis.
8.1 Discrete Time Model for Channels Exhibiting Intersymbol Interference
A band-limited channel that results in intersymbol interference (ISI) can be represented by a discrete-time transversal filter having a transfer function of

$$F(z) = \sum_{n=0}^{L} f_n z^{-n}, \qquad (8.1)$$

where f_n is the nth impulse response tap of the channel and L + 1 is the length of the channel impulse response (CIR). In this context, the channel represents the convolution of
the impulse responses of the transmitter filter, the transmission medium and the receiver filter.

Figure 8.1: Equivalent discrete-time model of a channel exhibiting intersymbol interference and experiencing additive white Gaussian noise.
In our discrete-time model, discrete symbols I_k are transmitted to the receiver at a rate of 1/T symbols per second and the output v_k at the receiver is also sampled at a rate of 1/T per second. Consequently, as depicted in Figure 8.1, the passage of the input sequence {I_k} through the channel results in the channel output sequence {v_k} that can be expressed as

$$v_k = \sum_{n=0}^{L} f_n I_{k-n} + \eta_k, \qquad (8.2)$$
where {η_k} is a white Gaussian noise sequence with zero mean and variance σ_η². The number of interfering symbols contributing to the ISI is L. In general, the sequences {v_k}, {I_k}, {η_k} and {f_n} are complex-valued. Again, Figure 8.1 illustrates the model of the equivalent discrete-time system corrupted by Additive White Gaussian Noise (AWGN).
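To make the model concrete, the following minimal sketch simulates Equation 8.2 for a real-valued BPSK input, assuming the example CIR F(z) = 1 + 0.5z^{-1} that is introduced later in this chapter; the function and variable names are illustrative and not taken from the text.

```python
import numpy as np

def channel_output(symbols, cir, noise_var, rng):
    """Produce v_k = sum_n f_n I_{k-n} + eta_k for a real-valued channel."""
    # Convolve the input symbols with the CIR taps f_0, ..., f_L and
    # truncate the full convolution to the length of the input sequence.
    noise_free = np.convolve(symbols, cir)[:len(symbols)]
    # Add zero-mean white Gaussian noise of variance noise_var.
    noise = rng.normal(0.0, np.sqrt(noise_var), size=len(symbols))
    return noise_free + noise

rng = np.random.default_rng(0)
I = rng.choice([-1.0, +1.0], size=2000)   # equiprobable BPSK symbols I_k
f = np.array([1.0, 0.5])                  # CIR taps of F(z) = 1 + 0.5 z^{-1}, L = 1
v = channel_output(I, f, noise_var=0.05, rng=rng)
```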
8.2 Equalization as a Classification Problem
In this section we will show that the characteristics of the transmitted sequence can be exploited by capitalising on the finite state nature of the channel and by considering the equalization problem as a geometric classification problem. This approach was first expounded by Gibson, Siu and Cowan [237], who investigated utilizing nonlinear structures offered by Neural Networks (NN) as channel equalisers.

We assume that the transmitted sequence is binary with equal probability of logical ones and zeros in order to simplify the analysis. Referring to Equation 8.2 and using the notation
of Section 8.1, the symbol-spaced channel output is defined by

$$v_k = \sum_{n=0}^{L} f_n I_{k-n} + \eta_k = \hat{v}_k + \eta_k, \qquad (8.3)$$

where {η_k} is the additive Gaussian noise sequence, {f_n}, n = 0, 1, ..., L, is the CIR, {I_k} is the channel input sequence and {v̂_k} is the noise-free channel output.

Figure 8.2: Linear m-tap equalizer schematic.
The mth order equaliser, as illustrated in Figure 8.2, has m taps as well as a delay of τ, and it produces an estimate Î_{k-τ} of the transmitted signal I_{k-τ}. The delay τ is due to the precursor section of the CIR, since it is necessary to facilitate the causal operation of the equalizer by supplying the past and future received samples, when generating the delayed detected symbol Î_{k-τ}. Hence the required length of the decision delay is typically the length of the CIR's precursor section, since outside this interval the CIR is zero and therefore the equaliser does not have to take into account any other received symbols. The channel output observed by the linear mth order equaliser can be written in vectorial form as
$$\mathbf{v}_k = [\, v_k \;\; v_{k-1} \;\; \ldots \;\; v_{k-m+1} \,]^T, \qquad (8.4)$$
and hence we can say that the equalizer has an m-dimensional channel output observation space. For a CIR of length L + 1, there are hence n_s = 2^{L+m} possible combinations of the binary channel input sequence

$$\mathbf{I}_k = [\, I_k \;\; I_{k-1} \;\; \ldots \;\; I_{k-m-L+1} \,]^T \qquad (8.5)$$
that produce n_s = 2^{L+m} different possible noise-free channel output vectors

$$\hat{\mathbf{v}}_k = [\, \hat{v}_k \;\; \hat{v}_{k-1} \;\; \ldots \;\; \hat{v}_{k-m+1} \,]^T. \qquad (8.6)$$
The possible noise-free channel output vectors v̂_k, or particular points in the observation space, will be referred to as the desired channel states. Expounding further, we denote each of the n_s = 2^{L+m} possible combinations of the channel input sequence I_k of length L + m symbols
as s_i, 1 ≤ i ≤ n_s = 2^{L+m}, where the channel input state s_i determines the desired channel output state r_i, i = 1, 2, ..., n_s = 2^{L+m}. This is formulated as:

$$\hat{\mathbf{v}}_k = \mathbf{r}_i \quad \text{if} \quad \mathbf{I}_k = \mathbf{s}_i, \qquad i = 1, 2, \ldots, n_s. \qquad (8.7)$$
The desired channel output states can be partitioned into two classes according to the binary value of the transmitted symbol I_{k-τ}, as seen below:

$$V_{m,\tau}^{+} = \{ \hat{\mathbf{v}}_k \mid I_{k-\tau} = +1 \} \qquad (8.8)$$

and

$$V_{m,\tau}^{-} = \{ \hat{\mathbf{v}}_k \mid I_{k-\tau} = -1 \}. \qquad (8.9)$$

We can denote the desired channel output states according to these two classes as follows:

$$\mathbf{r}_i^{+} \in V_{m,\tau}^{+}, \; i = 1, \ldots, n_s^{+}, \qquad \mathbf{r}_j^{-} \in V_{m,\tau}^{-}, \; j = 1, \ldots, n_s^{-},$$

where the quantities n_s^{+} and n_s^{-} represent the number of channel states r_i^{+} and r_j^{-} in the set V_{m,τ}^{+} and V_{m,τ}^{-}, respectively.
The relationship between the transmitted symbol I_k and the channel output v_k can also be written in a compact form as:

$$\mathbf{v}_k = \hat{\mathbf{v}}_k + \boldsymbol{\eta}_k = \mathbf{F}\,\mathbf{I}_k + \boldsymbol{\eta}_k, \qquad (8.10)$$

where η_k is an m-component vector that represents the AWGN sequence, v̂_k = F I_k is the noise-free channel output vector and F is an m × (m + L) CIR-related matrix in the form of

$$\mathbf{F} = \begin{bmatrix} f_0 & f_1 & \cdots & f_L & & & \\ & f_0 & f_1 & \cdots & f_L & & \\ & & \ddots & & & \ddots & \\ & & & f_0 & f_1 & \cdots & f_L \end{bmatrix}, \qquad (8.11)$$

with f_j, j = 0, ..., L, being the CIR taps.
Below we demonstrate the concept of finite channel states in a two-dimensional output observation space (m = 2), using a simple two-coefficient channel (L = 1) and assuming the CIR of:

$$F(z) = 1 + 0.5z^{-1}. \qquad (8.12)$$
Thus,

$$\mathbf{F} = \begin{bmatrix} 1 & 0.5 & 0 \\ 0 & 1 & 0.5 \end{bmatrix}, \qquad \hat{\mathbf{v}}_k = [\, \hat{v}_k \;\; \hat{v}_{k-1} \,]^T \quad \text{and} \quad \mathbf{I}_k = [\, I_k \;\; I_{k-1} \;\; I_{k-2} \,]^T.$$

All the possible combinations of the transmitted binary symbol I_k and the noiseless channel outputs v̂_k, v̂_{k-1} are listed in Table 8.1.
Figure 8.3: The noiseless BPSK-related channel states v̂_k = r_i and the noisy channel outputs v_k of a Gaussian channel having a CIR of F(z) = 1 + 0.5z^{-1} in a two-dimensional observation space. The noise variance is σ_η² = 0.05, the number of noisy received v_k samples output by the channel and input to the equalizer is 2000 and the decision delay is τ = 0. The linear decision boundary separates the noisy received v_k clusters that correspond to I_{k-τ} = +1 from those that correspond to I_{k-τ} = -1.
I_k    I_{k-1}   I_{k-2}   v̂_k     v̂_{k-1}
+1     +1        +1        +1.5    +1.5
+1     +1        -1        +1.5    +0.5
+1     -1        +1        +0.5    -0.5
+1     -1        -1        +0.5    -1.5
-1     +1        +1        -0.5    +1.5
-1     +1        -1        -0.5    +0.5
-1     -1        +1        -1.5    -0.5
-1     -1        -1        -1.5    -1.5

Table 8.1: Transmitted signal and noiseless channel states for the CIR of F(z) = 1 + 0.5z^{-1} and an equalizer order of m = 2.
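The entries of Table 8.1 can be generated mechanically from the CIR; a short sketch under the same assumptions as before (Python/NumPy, illustrative names):

```python
import itertools
import numpy as np

f = np.array([1.0, 0.5])    # CIR taps f_0, f_1 of F(z) = 1 + 0.5 z^{-1}
L, m = len(f) - 1, 2        # channel memory and equalizer order

# Enumerate all 2^(L+m) binary input combinations [I_k, I_{k-1}, I_{k-2}].
for bits in itertools.product([+1.0, -1.0], repeat=L + m):
    I = np.array(bits)
    # Noise-free outputs: v_k = I_k + 0.5 I_{k-1}, v_{k-1} = I_{k-1} + 0.5 I_{k-2}.
    v_hat = np.array([f @ I[0:2], f @ I[1:3]])
    print(I, "->", v_hat)
```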
Figure 8.3 shows the eight possible noiseless channel states v̂_k for a BPSK modem and the noisy channel output v_k in the presence of zero mean AWGN with variance σ_η² = 0.05. It is seen that the observation vector v_k forms clusters and the centroids of these clusters are the noiseless channel states r_i. The equalization problem hence involves identifying the regions within the observation space spanned by the noisy channel output v_k that correspond to the transmitted symbol of either I_k = +1 or I_k = -1.
A linear equalizer performs the classification in conjunction with a decision device, which is often a simple sign function. The decision boundary, as seen in Figure 8.3, is constituted by the locus of all values of v_k where the output of the linear equalizer is zero, as demonstrated below. For example, for a two-tap linear equalizer having tap coefficients c_0 and c_1, at the decision boundary we have:

$$c_0 v_k + c_1 v_{k-1} = 0 \qquad (8.13)$$

and

$$v_{k-1} = -\frac{c_0}{c_1} v_k \qquad (8.14)$$

gives a straight line decision boundary as shown in Figure 8.3, which divides the observation space into two regions corresponding to I_k = +1 and I_k = -1.
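A minimal sketch of this linear classification step follows; the tap values are illustrative only, chosen merely to draw some line of the form of Equation 8.14, not taken from the text.

```python
import numpy as np

def linear_equalizer_decision(v, c):
    """Classify an observation v = [v_k, v_{k-1}] with taps c = [c_0, c_1]."""
    # Equation 8.13: points with c @ v == 0 lie exactly on the boundary;
    # the sign function maps each half-plane to a symbol decision.
    return np.sign(c @ v)

c = np.array([1.0, -0.4])                                     # illustrative taps
print(linear_equalizer_decision(np.array([0.5, -0.5]), c))    # -> 1.0
print(linear_equalizer_decision(np.array([-1.5, -0.5]), c))   # -> -1.0
```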
In general, the linear equalizer can only implement a hyperplane decision boundary, which in our two-dimensional example was constituted by a line. This is clearly a non-optimum classification strategy, as our forthcoming geometric visualization will highlight. For example, we can see in Figure 8.3 that the point v̂ = [0.5, -0.5]^T associated with the I_k = +1 decision is closer to the decision boundary than the point v̂ = [-1.5, -0.5]^T associated with the I_k = -1 decision. Therefore, in the presence of noise, there is a higher probability of the channel output centred at point v̂ = [0.5, -0.5]^T being wrongly detected as I_k = -1 than that of the channel output centred around v̂ = [-1.5, -0.5]^T being incorrectly detected as I_k = +1. Gibson et al. [237] have shown examples of linearly non-separable channels, when the decision delay is zero and the channel is of non-minimum phase nature. The linear separability of the channel depends on the equalizer order m and on the delay τ, and in situations where the channel characteristics are time varying, it may not be possible to specify values of m and τ which will guarantee linear separability.
According to Chen, Gibson and Cowan [241], the above shortcomings of the linear equalizer are circumvented by a Bayesian approach [251] to obtaining an optimal equalization solution. In this spirit, for an observed channel output vector v_k, if the probability that it was caused by I_{k-τ} = +1 exceeds the probability that it was caused by I_{k-τ} = -1, then we should decide in favour of +1 and vice versa. Thus, the optimal Bayesian equalizer solution is defined as [241]:

$$\hat{I}_{k-\tau} = \operatorname{sgn}\big(f_{\text{Bayes}}(\mathbf{v}_k)\big), \qquad (8.15)$$

where the optimal Bayesian decision function f_Bayes(·), based on the difference of the associated conditional density functions, is given by [85]:

$$f_{\text{Bayes}}(\mathbf{v}_k) = \sum_{i=1}^{n_s^{+}} p_i^{+}\, p(\mathbf{v}_k \mid \mathbf{r}_i^{+}) - \sum_{j=1}^{n_s^{-}} p_j^{-}\, p(\mathbf{v}_k \mid \mathbf{r}_j^{-}), \qquad (8.16)$$
where p_i^{+} and p_j^{-} are the a priori probabilities of appearance of each desired state r_i^{+} ∈ V_{m,τ}^{+} and r_j^{-} ∈ V_{m,τ}^{-}, respectively, and p(·) denotes the associated probability density function. The quantities n_s^{+} and n_s^{-} represent the number of desired channel states in V_{m,τ}^{+} and V_{m,τ}^{-}, respectively, which are defined implicitly in Figure 8.3. If the noise distribution is Gaussian, Equation 8.16 can be rewritten as:

$$f_{\text{Bayes}}(\mathbf{v}_k) = \sum_{i=1}^{n_s^{+}} p_i^{+}\,(2\pi\sigma_\eta^2)^{-m/2} \exp\!\left(-\frac{\|\mathbf{v}_k - \mathbf{r}_i^{+}\|^2}{2\sigma_\eta^2}\right) - \sum_{j=1}^{n_s^{-}} p_j^{-}\,(2\pi\sigma_\eta^2)^{-m/2} \exp\!\left(-\frac{\|\mathbf{v}_k - \mathbf{r}_j^{-}\|^2}{2\sigma_\eta^2}\right). \qquad (8.17)$$
Again, the optimal decision boundary is the locus of all values of v_k where the probability of I_{k-τ} = +1 given a value v_k is equal to the probability of I_{k-τ} = -1 for the same v_k.
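For illustration, a minimal sketch of Equation 8.17 follows, assuming equiprobable channel states and using the τ = 0 channel states of Figure 8.3 and Table 8.1; the names are illustrative only.

```python
import numpy as np

def bayes_decision(v, states_pos, states_neg, noise_var):
    """Sign of the Bayesian decision function of Equation 8.17,
    assuming equiprobable channel states (common p_i^+ = p_j^-)."""
    m = len(v)
    norm = (2.0 * np.pi * noise_var) ** (-m / 2.0)
    # Sums of Gaussian kernels centred on the desired channel states.
    f_pos = sum(norm * np.exp(-np.sum((v - r) ** 2) / (2.0 * noise_var))
                for r in states_pos)
    f_neg = sum(norm * np.exp(-np.sum((v - r) ** 2) / (2.0 * noise_var))
                for r in states_neg)
    return np.sign(f_pos - f_neg)

# Channel states for m = 2, tau = 0 (rows of Table 8.1 with I_k = +1).
pos = np.array([[1.5, 1.5], [1.5, 0.5], [0.5, -0.5], [0.5, -1.5]])
neg = -pos   # the r_j^- states mirror the r_i^+ states for this channel
print(bayes_decision(np.array([0.4, -0.6]), pos, neg, noise_var=0.05))  # -> 1.0
```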
In general, the optimal Bayesian decision boundary is a hyper-surface, rather than just a hyper-plane in the m-dimensional observation space, and the realization of this nonlinear boundary requires a nonlinear decision capability. Neural networks provide this capability and the following section will discuss the various neural network structures that have been investigated in the context of channel equalization, while also highlighting the learning algorithms used.
8.3 Introduction to Neural Networks
8.3.1 Biological and Artificial Neurons
The human brain consists of a dense interconnection of simple computational elements referred to as neurons. Figure 8.4(a) shows a network of biological neurons. As seen in the
figure, the neuron consists of a cell body, which provides the information-processing functions, and of the so-called axon with its terminal fibres. The dendrites seen in the figure are the neuron's 'inputs', receiving signals from other neurons. These input signals may cause the neuron to fire, i.e. to produce a rapid, short-term change in the potential difference across the cell's membrane. Input signals to the cell may be excitatory, increasing the chances of neuron firing, or inhibitory, decreasing these chances. The axon is the neuron's transmission line that conducts the potential difference away from the cell body towards the terminal fibres. This process produces the so-called synapses, which form either excitatory or inhibitory connections to the dendrites of other neurons, thereby forming a neural network. Synapses mediate the interactions between neurons and enable the nervous system to adapt and react to its surrounding environment.

Figure 8.4: Comparison between biological and artificial neurons: (a) anatomy of a typical biological neuron, from Kandel [252]; (b) an artificial neuron (the jth neuron).
In Artificial Neural Networks (ANN), which mimic the operation of biological neural networks, the processing elements are artificial neurons and their signal processing properties are loosely based on those of biological neurons. Referring to Figure 8.4(b), the jth neuron has a set of I synapses or connection links. Each link is characterized by a synaptic weight w_{ij}, i = 1, 2, ..., I. The weight w_{ij} is positive if the associated synapse is excitatory, and it is negative if the synapse is inhibitory. Thus, signal x_i at the input of synapse i, connected to neuron j, is multiplied by the synaptic weight w_{ij}. These synaptic weights, which store 'knowledge' and provide connectivity, are adapted during the learning process.
The weighted input signals of the neuron are summed up by an adder. If this summation
exceeds a so-called firing threshold θ_j, then the neuron fires and issues an output. Otherwise it remains inactive. In Figure 8.4(b) the effect of the firing threshold θ_j is represented by a bias, arising from an input which is always 'on', corresponding to x_0 = 1, and weighted by w_{0j} = -θ_j = b_j. The importance of this is that the bias can be treated as just another weight. Hence, if we have a training algorithm for finding an appropriate set of weights for a network of neurons designed to perform a certain function, we do not need to consider the biases separately.
Figure 8.5: Various neural activation functions f(v): (a) threshold activation function; (b) piecewise-linear activation function; (c) sigmoid activation function.
The activation function f(·) of Figure 8.5 limits the amplitude of the neuron's output to some permissible range and provides nonlinearities. Haykin [253] identifies three basic types of activation functions:
1. Threshold Function. For the threshold function shown in Figure 8.5(a), we have

$$f(v) = \begin{cases} 1, & v \ge 0 \\ 0, & v < 0. \end{cases} \qquad (8.18)$$

Neurons using this activation function are referred to in the literature as the McCulloch-Pitts model [253]. In this model, the output of the neuron gives the value of 1 if the total internal activity level of that neuron is nonnegative and 0 otherwise.
2. Piecewise-Linear Function. This neural activation function, portrayed in Figure 8.5(b), is represented mathematically by:

$$f(v) = \begin{cases} 1, & v \ge 1 \\ v, & -1 < v < 1 \\ -1, & v \le -1, \end{cases} \qquad (8.19)$$

where the amplification factor inside the linear region is assumed to be unity. This activation function approximates a nonlinear amplifier.
3. Sigmoid Function. A commonly used neural activation function in the construction of artificial neural networks is the sigmoid activation function. It is defined as a strictly increasing function that exhibits smoothness and asymptotic properties, as seen in Figure 8.5(c). An example of the sigmoid function is the hyperbolic tangent function, which is shown in Figure 8.5(c) and is defined by [253]:

$$f(v) = \tanh(v) = \frac{1 - e^{-2v}}{1 + e^{-2v}}. \qquad (8.20)$$

This activation function is differentiable, which is an important feature in neural network theory [253].
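A compact sketch of the three activation functions of Equations 8.18-8.20, written with NumPy; a minimal illustration rather than a library implementation:

```python
import numpy as np

def threshold(v):
    """Equation 8.18: the McCulloch-Pitts threshold activation."""
    return np.where(v >= 0.0, 1.0, 0.0)

def piecewise_linear(v):
    """Equation 8.19: unity-gain linear region, clipped to [-1, 1]."""
    return np.clip(v, -1.0, 1.0)

def sigmoid(v):
    """Equation 8.20: the hyperbolic tangent sigmoid."""
    return np.tanh(v)
```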
The model of the jth artificial neuron, shown in Figure 8.4(b), can be described in mathematical terms by the following pair of equations:

$$y_j = f(v_j), \qquad (8.21)$$

where:

$$v_j = \sum_{i=0}^{I} w_{ij} x_i. \qquad (8.22)$$
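Combining Equations 8.21 and 8.22, a minimal sketch of the jth neuron with the bias treated as the weight w_{0j} of the constant input x_0 = 1, as discussed above; the weight values are illustrative only.

```python
import numpy as np

def neuron_output(x, w, activation):
    """Equations 8.21-8.22: y_j = f(sum_i w_ij * x_i), with x_0 = 1 as bias input."""
    x_aug = np.concatenate(([1.0], x))   # prepend the always-'on' bias input x_0 = 1
    v = w @ x_aug                        # internal activity level v_j (Equation 8.22)
    return activation(v)                 # neuron output y_j (Equation 8.21)

w = np.array([-0.5, 1.0, 0.7])           # [w_0j = b_j, w_1j, w_2j], illustrative
y = neuron_output(np.array([0.2, 0.4]), w, np.tanh)
```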
Having introduced the basic elements of neural networks, we will focus next on the associated network structures or architectures. The different neural network structures yield different functionalities and capabilities. The basic structures will be described in the following section.
8.3.2 Neural Network Architectures
The network's architecture defines the neurons' arrangement in the network. Various neural
network architectures have been investigated for different applications, including for example