Mathematics for computer networking

Mathematical Foundations of Computer Networking
by S. Keshav

To Nicole, my foundation

Introduction

Motivation

Graduate students, researchers, and practitioners in the field of computer networking often require a firm conceptual understanding of one or more of its theoretical foundations. Knowledge of optimization, information theory, game theory, control theory, and queueing theory is assumed by research papers in the field. Yet these subjects are not taught in a typical computer science undergraduate curriculum. This leaves only two alternatives: either to study these topics on one's own from standard texts or to take a remedial course. Neither alternative is attractive. Standard texts pay little attention to computer networking in their choice of problem areas, making it a challenge to map from the text to the problem at hand. And it is inefficient to require students to take an entire course when all that is needed is an introduction to the topic.

This book addresses these problems by providing a single source from which to learn about the mathematical foundations of computer networking. Assuming only a rudimentary grasp of calculus, it provides an intuitive yet rigorous introduction to a wide range of mathematical topics. The topics are covered in sufficient detail that the book will usually serve as both the first and the ultimate reference. Note that the topics are selected to be complementary to those found in a typical undergraduate computer science curriculum. The book, therefore, does not cover network foundations such as discrete mathematics, combinatorics, or graph theory.

Each concept in the book is described in four ways: intuitively; using precise mathematical notation; with a carefully chosen numerical example; and with a numerical exercise to be done by the reader. This progression is designed to gradually deepen understanding. Nevertheless, the depth of coverage provided here is not a substitute for that found in standard textbooks. Rather, I hope to provide enough intuition to
allow a student to grasp the essence of a research paper that uses these theoretical foundations.

Organization

The chapters in this book fall into two broad categories: foundations and theories. The first five foundational chapters cover probability, statistics, linear algebra, optimization, and signals, systems, and transforms. These chapters provide the basis for the four theories covered in the latter half of the book: queueing theory, game theory, control theory, and information theory. Each chapter is written to be as self-contained as possible. Nevertheless, some dependencies exist, as shown in Figure 1, where light arrows show weak dependencies and bold arrows show strong dependencies.

[Figure 1: Chapter organization, showing the dependencies among Probability, Statistics, Linear Algebra, Optimization, Signals, Systems and Transforms, Queueing Theory, Game Theory, Control Theory, and Information Theory.]

Using this book

The material in this book can be completely covered in a sequence of two graduate courses, with the first course focusing on the first five chapters and the second course on the latter four. For a single-semester course, some possible alternatives are to cover:

• probability, statistics, queueing theory, and information theory
• linear algebra; signals, systems, and transforms; control theory; and game theory
• linear algebra; signals, systems, and transforms; control theory; selected portions of probability; and information theory
• linear algebra, optimization, probability, queueing theory, and information theory

This book is designed for self-study. Each chapter has numerous solved examples and exercises to reinforce concepts. My aim is to ensure that every topic in the book is accessible to the persevering reader.

Acknowledgements

I have benefitted immensely from the comments of dedicated reviewers on drafts of this book. Two in particular who stand out are Alan Kaplan, whose careful and copious comments improved every aspect of the book, and Prof. Johnny Wong, who not only reviewed multiple
drafts of the chapters on probability and statistics, but also used a draft to teach two graduate courses at the University of Waterloo. I would also like to acknowledge the support I received from experts who reviewed individual chapters: Augustin Chaintreau, Columbia (probability and queueing theory), Tom Coleman, Waterloo (optimization), George Labahn, Waterloo (linear algebra), Kate Larson, Waterloo (game theory), Abraham Matta, Boston University (statistics; signals, systems, and transforms; and control theory), Sriram Narasimhan, Waterloo (control theory), and David Tse, UC Berkeley (information theory). I received many corrections from my students at the University of Waterloo who took two courses based on book drafts in Fall 2008 and Fall 2011. These are: Andrew Arnold, Nasser Barjesteh, Omar Beg, Abhirup Chakraborty, Betty Chang, Leila Chenaei, Francisco Claude, Andy Curtis, Hossein Falaki, Leong Fong, Bo Hu, Tian Jiang, Milad Khalki, Robin Kothari, Alexander Laplante, Constantine Murenin, Earl Oliver, Sukanta Pramanik, Ali Rajabi, Aaditeshwar Seth, Jakub Schmidtke, Kanwaljit Singh, Kellen Steffen, Chan Tang, Alan Tsang, Navid Vafei, and Yuke Yang. Last but not least, I would never have completed this book were it not for the unstinting support and encouragement of every member of my family for the last four years. Thank you.

S. Keshav
Waterloo, October 2011

Contents

Chapter 1: Probability
  Introduction: outcomes; events; disjunctions and conjunctions of events; axioms of probability; subjective and objective probability
  Joint and conditional probability: joint probability; conditional probability; Bayes' rule
  Random variables: distribution; cumulative density function; generating values from an arbitrary distribution; expectation of a random variable; variance of a random variable
  Moments and moment generating functions: moments; moment generating functions; properties of moment generating functions
  Standard discrete distributions: Bernoulli distribution;
binomial distribution; geometric distribution; Poisson distribution
  Standard continuous distributions: uniform distribution; Gaussian or normal distribution; exponential distribution; power law distribution
  Useful theorems: Markov's inequality; Chebyshev's inequality; Chernoff bound; strong law of large numbers; central limit theorem
  Jointly distributed random variables: Bayesian networks
  Further reading; exercises

Chapter 2: Statistics
  Sampling a population: types of sampling; scales; outliers
  Describing a sample parsimoniously: tables; bar graphs, histograms, and cumulative histograms; the sample mean; the sample median; measures of variability
  Inferring population parameters from sample parameters
  Testing hypotheses about outcomes of experiments: hypothesis testing; errors in hypothesis testing; formulating a hypothesis; comparing an outcome with a fixed quantity; comparing outcomes from two experiments; testing hypotheses regarding quantities measured on ordinal scales; fitting a distribution; power
  Independence and dependence: regression and correlation: independence; regression; correlation
  Comparing multiple outcomes simultaneously: analysis of variance: one-way layout; multi-way layouts
  Design of experiments
  Dealing with large data sets
  Common mistakes in statistical analysis: What is the population?;
lack of confidence intervals in comparing results; not stating the null hypothesis; too small a sample; too large a sample; not controlling all variables when collecting observations; converting ordinal to interval scales; ignoring outliers
  Further reading; exercises

Chapter 3: Linear Algebra
  Vectors and matrices
  Vector and matrix algebra: addition; transpose; multiplication; square matrices; exponentiation; matrix exponential
  Linear combinations, independence, basis, and dimension: linear combinations; linear independence; vector spaces, basis, and dimension
  Solving linear equations using matrix algebra: representation; elementary row operations and Gaussian elimination; rank
  Determinants; Cramer's theorem; the inverse of a matrix
  Linear transformations, eigenvalues, and eigenvectors: a matrix as a linear transformation; the eigenvalue of a matrix; computing the eigenvalues of a matrix; why are eigenvalues important?;
the role of the principal eigenvalue; finding eigenvalues and eigenvectors; similarity and diagonalization
  Stochastic matrices: computing state transitions using a stochastic matrix; eigenvalues of a stochastic matrix
  Exercises

Chapter 4: Optimization
  System modelling and optimization
  An introduction to optimization
  Optimizing linear systems: network flow
  Integer linear programming: total unimodularity; weighted bipartite matching
  Dynamic programming
  Nonlinear constrained optimization: Lagrangian techniques; Karush-Kuhn-Tucker conditions for nonlinear optimization
  Heuristic non-linear optimization: hill climbing; genetic algorithms
  Exercises

Chapter 5: Signals, Systems, and Transforms
  Introduction
  Background: sinusoids; complex numbers; Euler's formula; discrete-time convolution and the impulse function; continuous-time convolution and the Dirac delta function
  Signals: the complex exponential signal
  Systems: types of systems
  Analysis of a linear time-invariant system: the effect of an LTI system on a complex exponential input; the output of an LTI system with a zero input; the output of an LTI system for an arbitrary input; stability of an LTI system
  Transforms
  The Fourier series
  The Fourier transform and its properties
  The Laplace transform: poles, zeroes, and the region of convergence; properties of the Laplace transform
  The discrete Fourier transform and fast Fourier transform: the impulse train; the discrete-time Fourier transform; aliasing; the discrete-time-and-frequency Fourier transform and the fast Fourier transform (FFT)
  The Z transform: relationship between the Z and Laplace transforms; properties of the Z transform
  Further reading; exercises

Chapter 6: Stochastic Processes and Queueing Theory
  Overview
  A general queueing
system; Little's theorem
  Stochastic processes: discrete and continuous stochastic processes; Markov processes; homogeneity, state transition diagrams, and the Chapman-Kolmogorov equations; irreducibility; recurrence; periodicity; ergodicity; a fundamental theorem; stationary (equilibrium) probability of a Markov chain; a second fundamental theorem; mean residence time in a state
  Continuous-time Markov chains: Markov property for continuous-time stochastic processes; residence time in a continuous-time Markov chain; stationary probability distribution for a continuous-time Markov chain
  Birth-death processes: time-evolution of a birth-death process; stationary probability distribution of a birth-death process; finding the transition-rate matrix; a pure-birth (Poisson) process; stationary probability distribution for a birth-death process
  The M/M/1 queue
  Two variations on the M/M/1 queue: the M/M/∞ queue: a responsive server; M/M/1/K: bounded buffers
  Other queueing systems: M/D/1: deterministic service times; G/G/1; networks of queues
  Further reading; exercises

Chapter 7: Game Theory
  Concepts and terminology: preferences and preference ordering; terminology; strategies; normal- and extensive-form games; response and best response; dominant and dominated strategies; Bayesian games; repeated games
  Solving a game: solution concept and equilibrium; dominant strategy equilibria; iterated removal of dominated strategies; maximin equilibrium; Nash equilibria; correlated equilibria; other solution concepts
  Mechanism design: examples of practical mechanisms; three negative results; two examples; formalization; desirable properties of a mechanism; revelation principle; Vickrey-Clarke-Groves mechanism; problems with VCG mechanisms
  Limitations of game theory
  Further reading; exercises
Chapter 8: Elements of Control Theory
  Introduction
  Overview of a controlled system
  Modelling a system: modelling approach; mathematical representation
  A first-order system
  A second-order system: case 1: an undamped system; case 2: an underdamped system; a critically damped system; an overdamped system
  Basics of feedback control: system goal; constraints
  PID control: proportional mode control; integral mode control; derivative mode control; combining modes
  Advanced control concepts: cascade control; control delay
  Stability: BIBO stability analysis of a linear time-invariant system; zero-input stability analysis of a SISO linear time-invariant system; placing system roots; Lyapunov stability
  State-space based modelling and control: state-space based analysis; observability and controllability; controller design; digital control
  Partial fraction expansion: distinct roots; complex conjugate roots; repeated roots
  Further reading; exercises

Chapter 9: Information Theory
  Introduction
  A mathematical model for communication
  From messages to symbols
  Source coding
  The capacity of a communication channel: modelling a message source; the capacity of a noiseless channel; a noisy channel
  The Gaussian channel: modelling a continuous message source; a Gaussian channel; the capacity of a Gaussian channel
  Further reading; exercises

Solutions to Exercises: Probability; Statistics; Linear Algebra; Optimization; Transform domain techniques; Queueing theory; Game theory; Control theory; Information theory

Chapter 4: Optimization

Lagrangian techniques (conclusion). Setting the partial derivative of the Lagrangian with respect to $\lambda$ to zero gives

$\frac{\partial F}{\partial \lambda} = x^2 + y^2 - 1 = 0$  (EQ 3)

Solving (1), we get two solutions for x, denoted $x_1$ and $x_2$: $x_1 = 0$, $x_2 = -\frac{2\lambda}{3}$. Corresponding to $x_1$, we solve (3) to find $y_{11} = 1$, $y_{12} = -1$ and put these in (2) to get $\lambda_{11} = -1$, $\lambda_{12} = 1$. The extremal values of $z = x^3 + 2y$ for this solution of x therefore are 2 and -2, achieved at the points (0, 1) and (0, -1). Corresponding to $x_2$, we find from (3) that $\frac{4\lambda^2}{9} + y^2 = 1$. Substituting $\lambda = -\frac{1}{y}$ from (2) and solving for y, we find that y is complex, so that there are no real points (x, y) satisfying (3). Therefore, the only viable extremal points are the two found above, which correspond to a constrained maximum and a constrained minimum, respectively.

Hill climbing. We start with K random points and compute the local optimum reached from each point. If we have K unique results, we return the best point. Otherwise, we eliminate the repeated results, say r of them, and start again with r points, repeating the process (remembering those results already computed). When we reach K different points, the algorithm finishes and returns the best local optimum found. Note that we could iterate indefinitely before finding the K local optima. However, without making any additional assumptions about the search space, we cannot guarantee a better method of finding the global optimum.

Chapter 5: Transform domain techniques

1. Complex arithmetic. $e^{-j\frac{\pi}{2}} + e^{j\frac{\pi}{2}} = \left(\cos\left(-\frac{\pi}{2}\right) + j\sin\left(-\frac{\pi}{2}\right)\right) + \left(\cos\left(\frac{\pi}{2}\right) + j\sin\left(\frac{\pi}{2}\right)\right) = 2\cos\left(\frac{\pi}{2}\right) = 0$.

2. Phase angle. This is given by the arctangent of the ratio of the imaginary part to the real part of the expression.

3. Discrete convolution. $z(5) = \sum_{\tau=-\infty}^{\infty} x(\tau)\,y(5-\tau)$. This reduces to computing the products $x(a) \cdot y(b)$ where $a + b = 5$. These are the pairs (1, 9), (3, 5), (5, 4), (2, 7), (5, 1), (8, 3), whose products are 9, 15, 20, 14, 5, 24 and whose sum is $z(5) = 87$.

4. Signals. Temperature readings from a digital thermometer.

5. Complex exponential. The projection is obtained by setting the real value to 0, so that the curve is given by the expression $j5e^{3t}\sin(3t)$. This curve lies entirely in the complex (Im-t) plane. It corresponds to a sinusoid of angular frequency 3 whose amplitude increases exponentially with time. At time 0, it has an amplitude of 5, at time 1/3 an amplitude of $5e$, at time 2/3 an amplitude of $5e^2$, and, in general, at time $k/3$,
an amplitude of $5e^k$.

6. Linearity. With $H(x) = 5\frac{dx}{dt}$, we have $H(k_1 x_1 + k_2 x_2) = 5\frac{d(k_1 x_1 + k_2 x_2)}{dt} = 5k_1\frac{dx_1}{dt} + 5k_2\frac{dx_2}{dt} = k_1 H(x_1) + k_2 H(x_2)$, so the system is linear.

7. LTI system. Any sinusoid can be written as the sum of equal and opposite complex exponentials. A complex exponential input to an LTI system results in a complex exponential output. Since the system is LTI, a sinusoidal input will result in an output that is the sum of equal and opposite complex exponentials, which sum to a real sinusoid that is potentially a scaled and phase-shifted version of the input, but with the same frequency.

8. Natural response. Compute the natural response of the LTI system given by $2\frac{d^2 y(t)}{dt^2} + 11\frac{dy(t)}{dt} + 15y(t) = 32x(t)$.
Solution: The natural response is given by the differential equation $(2D^2 + 11D + 15)y(t) = 0$. This can be factored as $((D+3)(2D+5))y(t) = 0$. Thus, the natural response is given by $c_1 e^{-3t} + c_2 e^{-2.5t}$, where the two constants can be determined from the initial conditions $y(0)$ and $\dot{y}(0)$.

9. Natural response. The natural response is given by the differential equation $(2D^2 + 1)y(t) = 0$, whose factorization is $2\left(D - \frac{j}{\sqrt{2}}\right)\left(D + \frac{j}{\sqrt{2}}\right)y(t) = 0$. The response is therefore given by $y(t) = c_1 e^{-jt/\sqrt{2}} + c_2 e^{jt/\sqrt{2}}$. Setting $y(0) = 0$, we get $c_1 + c_2 = 0$. Setting $\dot{y}(0) = 1$, we get $-\frac{jc_1}{\sqrt{2}} + \frac{jc_2}{\sqrt{2}} = 1$, which we can rewrite as $c_1 - c_2 = j\sqrt{2}$. Solving, we get $c_1 = \frac{j}{\sqrt{2}}$, $c_2 = -\frac{j}{\sqrt{2}}$, so that the natural response is $\frac{j}{\sqrt{2}}e^{-jt/\sqrt{2}} - \frac{j}{\sqrt{2}}e^{jt/\sqrt{2}} = \sqrt{2}\sin\left(\frac{t}{\sqrt{2}}\right)$. The frequency of this signal is $\frac{1}{2\sqrt{2}\pi}$ Hz.

10. Stability. The signal reduces to the complex sinusoid $j\sin\left(\frac{t}{\sqrt{2}}\right)$, whose real value is always zero, so the system is stable.

11. Fourier series. Since the series is infinite, we can choose to center one of the pulses around the origin and compute the Fourier coefficients in the range $-T_0/2$ to $T_0/2$. The kth coefficient of the Fourier series corresponding to this function is given by $c_k = \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} x(t)\,e^{-jk\omega_0 t}\,dt$. In this range, the function is $1+t$ in the
range $[-\tau, 0]$, $1-t$ in the range $[0, \tau]$, and 0 elsewhere. For convenience, let $a = -jk\omega_0$. Then, the integral reduces to

$c_k = \frac{1}{T_0}\left(\int_{-\tau}^{0}(1+t)e^{at}\,dt + \int_{0}^{\tau}(1-t)e^{at}\,dt\right) = \frac{1}{T_0}\left(\int_{-\tau}^{\tau}e^{at}\,dt + \int_{-\tau}^{0}t e^{at}\,dt - \int_{0}^{\tau}t e^{at}\,dt\right)$

Using $\int e^{at}\,dt = \frac{e^{at}}{a}$ and $\int t e^{at}\,dt = \frac{(at-1)e^{at}}{a^2}$, this evaluates to

$c_k = \frac{1}{T_0}\left(\frac{e^{a\tau} - e^{-a\tau}}{a} + \frac{(1+a\tau)e^{-a\tau} + (1-a\tau)e^{a\tau} - 2}{a^2}\right)$

12. Fourier series. The fundamental frequency is $\omega_0 = \frac{2\pi}{10}$. The third coefficient is the value of $X(\omega) = \frac{\tau\omega_0}{2\pi}\,\frac{\sin(\omega\tau/2)}{\omega\tau/2}$ for the value $\omega = 3\omega_0 = \frac{6\pi}{10} = 0.6\pi$. With $\tau = 1$, this is given by $\frac{0.2\pi}{2\pi}\cdot\frac{\sin(0.3\pi)}{0.3\pi} = 0.085$.

13. Fourier transform. Since the function is non-zero only in the range [0, 1], the transform is given by $X(j\omega) = \int_0^1 (1-t)e^{-j\omega t}\,dt = \int_0^1 e^{-j\omega t}\,dt - \int_0^1 t e^{-j\omega t}\,dt$. The first integral is $\frac{1 - e^{-j\omega}}{j\omega}$ and the second is $\frac{(1+j\omega)e^{-j\omega} - 1}{\omega^2}$, so this reduces to $X(j\omega) = \frac{1 - e^{-j\omega}}{j\omega} + \frac{1 - (1+j\omega)e^{-j\omega}}{\omega^2}$.

14. Inverse Fourier transform. The inverse transform is given by
$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\pi\left(\delta(\omega+\omega_0) + \delta(\omega-\omega_0)\right)e^{j\omega t}\,d\omega = \frac{1}{2}\int_{-\infty}^{\infty}\delta(\omega+\omega_0)e^{j\omega t}\,d\omega + \frac{1}{2}\int_{-\infty}^{\infty}\delta(\omega-\omega_0)e^{j\omega t}\,d\omega$.
Applying the sifting property of the delta function twice, the integral reduces to $\frac{e^{-j\omega_0 t} + e^{j\omega_0 t}}{2}$, which, by Euler's formula, is simply $\cos(\omega_0 t)$.

15. Computing the Fourier transform. Using the time-shift and linearity properties, and the standard transforms, this is given by
$\pi\left(\delta(\omega+\omega_0) + \delta(\omega-\omega_0)\right)e^{j\omega t_0} + j\pi\left(\delta(\omega-\omega_0) - \delta(\omega+\omega_0)\right)e^{-j\omega t_0}$.

16. Laplace transform. We use Euler's formula to rewrite the signal as $u(t)\frac{e^{j\omega_0 t} - e^{-j\omega_0 t}}{2j}$. By definition,
$X(s) = \int_{-\infty}^{\infty} u(t)\,\frac{e^{j\omega_0 t} - e^{-j\omega_0 t}}{2j}\,e^{-st}\,dt = \frac{1}{2j}\left(\int_0^{\infty}e^{j\omega_0 t}e^{-st}\,dt - \int_0^{\infty}e^{-j\omega_0 t}e^{-st}\,dt\right) = \frac{1}{2j}\left(\frac{1}{s - j\omega_0} - \frac{1}{s + j\omega_0}\right) = \frac{\omega_0}{s^2 + \omega_0^2}$,
with the region of
convergence $\mathrm{Re}(s) > 0$. The poles are at $s = \pm j\omega_0$, and the transform is either always 0, if $\omega_0$ is zero, or never zero, otherwise.

17. Laplace transform. From the previous exercise and the time-shifting property of the Laplace transform, this is given by $\frac{e^{-st_0}\,\omega_0}{s^2 + \omega_0^2}$.

18. Solving a system using the Laplace transform. From the table of standard transforms, the Laplace transform of the transfer function H(s) is given by $\frac{s}{s^2 + \omega_0^2}$, Re(s) > 0. Moreover, because $x(t) = e^{-t}u(t)$, from the same table, $X(s) = \frac{1}{s+1}$, Re(s) > -1. Therefore, the transform of the system response is
$Y(s) = \left(\frac{1}{s+1}\right)\left(\frac{s}{s^2 + \omega_0^2}\right) = \frac{s}{(s+1)(s+j\omega_0)(s-j\omega_0)}$, Re(s) > 0.
Expanding by partial fractions, we get
$Y(s) = \frac{-\frac{1}{1+\omega_0^2}}{s+1} + \frac{\frac{1}{2(1-j\omega_0)}}{s+j\omega_0} + \frac{\frac{1}{2(1+j\omega_0)}}{s-j\omega_0}$.
This allows us to write the time evolution of the system as
$y(t) = \frac{-e^{-t}}{1+\omega_0^2} + \frac{e^{-j\omega_0 t}}{2(1-j\omega_0)} + \frac{e^{j\omega_0 t}}{2(1+j\omega_0)}$.

19. Discrete-time Fourier transform. The transform is given by $\frac{1}{1 - 0.5e^{-j\omega T}}$.

20. Discrete-time-and-frequency Fourier transform. The fourth Fourier value, with $k = 3$, is given by
$X[3] = \frac{1}{9}\sum_{n=0}^{8} x[nT]\,e^{-j\frac{2\pi(3)n}{9}} = \frac{1}{9}\left(1e^{-j\frac{6\pi}{9}} + 2e^{-j\frac{12\pi}{9}} + 3e^{-j\frac{18\pi}{9}} + \ldots + 2e^{-j\frac{42\pi}{9}} + 1e^{-j\frac{48\pi}{9}}\right)$.

21. Z transform. The transform is given by $\sum_{k=0}^{\infty} k z^{-k}$. Assuming that the series converges (we'll get to that later), denote the sum by S. Clearly, $S = \frac{1}{z} + \frac{2}{z^2} + \frac{3}{z^3} + \ldots$ Therefore,
$Sz = 1 + \frac{2}{z} + \frac{3}{z^2} + \ldots = \left(1 + \frac{1}{z} + \frac{1}{z^2} + \ldots\right) + \left(\frac{1}{z} + \frac{2}{z^2} + \frac{3}{z^3} + \ldots\right) = \frac{1}{1 - z^{-1}} + S$.
Thus, $S(z - 1) = \frac{1}{1 - z^{-1}}$, so that $S = \frac{1}{(1 - z^{-1})(z - 1)} = \frac{z^{-1}}{(1 - z^{-1})^2}$.
Now, this series only converges when $|z^{-1}| < 1$, or $|z| > 1$. In this region of convergence, the operations on the sum are valid.

22. Z transform. From Example 26, the Z transform of the function is $\frac{1}{1 - e^{-a}z^{-1}}$. Therefore, from the time-shift rule, the desired transform is $\frac{z^{-k_0}}{1 - e^{-a}z^{-1}}$.

Chapter 6: Queueing theory

1. Little's law. (a) The mean waiting time is 180 min, and the arrival rate is 0.2 patients/minute.
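As a quick numerical check of Little's law, $L = \lambda W$, the figures from this exercise can be multiplied out directly (a minimal Python sketch; the variable names are illustrative only):

```python
# Little's law: mean number in system L = (arrival rate) * (mean time in system).
# Numbers taken from part (a) of the exercise above.
arrival_rate = 0.2       # patients per minute
mean_time_in_system = 180  # minutes

mean_patients = arrival_rate * mean_time_in_system
print(mean_patients)     # 36.0
```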
Thus, the mean number of patients is their product = 180 × 0.2 = 36.
(b) We do not have enough information to determine the maximum size of the waiting room! We know we need at least 36 spaces, but it is possible that a burst of a hundred patients may arrive, for example, due to an incident of mass food poisoning. But, as a rule of thumb, some small integer multiple of the mean, such as three or four times the mean, ought to be enough. In real life, we are forced to work with such 'fudge factors' because it is often too difficult or too expensive to determine the exact arrival process, which, in any case, may abruptly change over time.

2. A stochastic process. At time 0, P[X0 = 10] = 1.0. At time 1, P[X1 = 9] = 0.2; P[X1 = 10] = 0.6; P[X1 = 11] = 0.2. At time 2, P[X2 = 8] = 0.2(0.2) = 0.04; P[X2 = 9] = 0.2(0.6) + 0.6(0.2) = 0.24; P[X2 = 10] = 0.2(0.2) + 0.6(0.6) + 0.2(0.2) = 0.44; and, by symmetry, P[X2 = 11] = 0.24; P[X2 = 12] = 0.04.

3. Markov process. The process is Markovian, because the probabilities of moving from stair i to stairs i-1, i, and i+1 do not depend on how the person reached stair i.

4. Homogeneity. The transition probabilities are time-independent, and therefore the process is homogeneous.

5. Representation.
(a) The transition matrix is the infinite tridiagonal matrix whose row for state i has 0.2 in column i-1, 0.6 in column i, 0.2 in column i+1, and 0 elsewhere; the state transition diagram has a self-loop of 0.6 at each state i and transitions of 0.2 to each of states i-1 and i+1.
(b) The rows need to sum to 1, because at each time step the process has to move to some state. The columns do not need to sum to 1 (think of a star-shaped state transition diagram with N states surrounding state 0, where state 0 has probability 1/N of going to any other state, and every state returns to state 0 with probability 1).
(c) We need to assume boundary conditions. Suppose that at stair 1 the probability of staying at the same stair is 0.8, and at stair 4 the probability of staying at the same stair is also 0.8. Then, the transition matrix is

0.8 0.2 0.0 0.0
0.2 0.6 0.2 0.0
0.0 0.2 0.6 0.2
0.0 0.0 0.2 0.8

and the state transition diagram is the corresponding four-state chain, with self-loops of 0.8, 0.6, 0.6, and 0.8.
6. Reducibility. The chain is irreducible, because every state can be reached from every other state.

7. Recurrence. State 1 is recurrent because the chain is finite and irreducible. $f_{11}^{1}$, the probability that the process first returns to state 1 after one time step, is clearly 0.8. $f_{11}^{2}$, the probability that the process first returns to state 1 after two time steps, is 0.2 × 0.2 = 0.04. $f_{11}^{3}$ is the probability that the process first returns to state 1 after three time steps; this can happen only via a transition to state 2, a self-loop in state 2, and then a return, so its value is 0.2 × 0.6 × 0.2 = 0.024.

8. Periodicity. The chain is not periodic, because of the self-loop in every state. A trivial chain with period N is a ring with N states, where the transition probability of going from state i to state (i+1) mod N is 1.

9. Ergodicity. No state in the chain is non-ergodic, because the chain is finite, aperiodic, and irreducible.

10. Stationary probability. From Theorem 2, because the chain is ergodic, we obtain:
$\pi_1 = 0.8\pi_1 + 0.2\pi_2$
$\pi_2 = 0.2\pi_1 + 0.6\pi_2 + 0.2\pi_3$
$\pi_3 = 0.2\pi_2 + 0.6\pi_3 + 0.2\pi_4$
$\pi_4 = 0.2\pi_3 + 0.8\pi_4$
$1 = \pi_1 + \pi_2 + \pi_3 + \pi_4$
This can be easily solved to obtain $\pi_1 = \pi_2 = \pi_3 = \pi_4 = 0.25$. (If you choose other assumptions for the boundary states, your computation will differ.)

11. Residence times. $p_{11} = p_{44} = 0.8$, so the residence times in these states are 1/(1-0.8) = 1/0.2 = 5. $p_{22} = p_{33} = 0.6$, so the residence times in these states are 1/0.4 = 2.5.

12. Stationary probability of a birth-death process.
(a) Similarities: both are graphs with each node corresponding to a discrete state. Differences: the label on an edge is a transition rate, not a transition probability, and the rates leaving a node do not add up to 1; instead, the total ingress rate matches the total egress rate at each node.
(b) The transition-rate matrix is

-2  2  0  0
 2 -6  4  0
 0  4 -6  2
 0  0  2 -2

(c) We have:
$-2P_0 + 2P_1 = 0$
$2P_0 - 6P_1 + 4P_2 = 0$
$4P_1 - 6P_2 + 2P_3 = 0$
$2P_2 - 2P_3 = 0$
$P_0 + P_1 + P_2 + P_3 = 1$
This yields $P_0 = P_1 = P_2 = P_3 = 0.25$.

13. Poisson process. Consider
a pure-death process, i.e., a birth-death process whose birth rates are zero. Clearly, the inter-departure times are nothing more than the residence times in each state. But we know that the residence times in a homogeneous continuous-time Markov chain are exponentially distributed (see Section 6.3.2). QED.

14. Stationary probabilities of a birth-death process. We see that in this chain $\lambda_i = \mu_{i+1}$, so immediately we get $P_0 = P_1 = P_2 = P_3$. Since they sum to 1, we can see that they are all 0.25.

15. M/M/1 queue. It is not M/M/1, because the state-transition rates are state-dependent.

16. M/M/1 queue. (a) The packet length is 250 bytes = 2,000 bits, so that a link service rate of 1,000,000 bits/sec is 500 packets/sec. Therefore, the utilization is 450/500 = 0.9. When the link queue has one packet, the system is in state j = 2, because one packet is being served at that time. Thus, we need $P_2 = 0.9^2 \times 0.1 = 0.081$. For the queue having two packets, we compute $P_3 = 0.9^3 \times 0.1 = 0.0729$. For 10 packets in the queue, we compute $P_{11} = 0.9^{11} \times 0.1 = 0.031$. (Compare these with the values in Example 20, where the load is 0.8.)
(b) The mean number of packets in the system is 0.9/(1-0.9) = 9. Of these, 9 - 0.9 = 8.1 are expected to be in the queue.
(c) The mean waiting time is (1/500)/(1-0.9) = 0.002/0.1 = 0.02 s = 20 milliseconds.

17. Responsive (M/M/∞) server. The ratio of the state probabilities is
$\frac{e^{-\rho}\rho^j/j!}{\rho^j(1-\rho)} = \frac{e^{-\rho}}{j!\,(1-\rho)} = \frac{C}{j!}$,
where C is a constant with respect to j. Therefore, for an M/M/∞ queue, the probability of being in state j diminishes in proportion to $\frac{1}{j!}$
compared to being in state j for an M/M/1 queue. Clearly, this favors much shorter queue lengths for the M/M/∞ queue.

18. M/M/1/K server. Packet losses happen when there is an arrival and the system is in state j = 11. This is upper bounded by $P_{11}$, which is given by
$P_{11} = \rho^j\,\frac{1-\rho}{1-\rho^{K+1}} = 0.9^{11} \times \frac{0.1}{1 - 0.9^{12}} = 0.0437$.

19. M/D/1 queue. (a) The mean number of customers in the system for such a queue is given by
$\rho + \frac{\rho^2}{2(1-\rho)} = 0.9 + \frac{0.81}{2(0.1)} = 4.95$,
which is roughly half the size of an equivalently loaded M/M/1 queue.
(b) The ratio is $\left(\rho + \frac{\rho^2}{2(1-\rho)}\right) \Big/ \frac{\rho}{1-\rho}$. This tends to 0.5 as the utilization tends to 1.
(c) Under heavy loads, the mean waiting time for an M/D/1 queue is half that of a similarly loaded M/M/1 queue.

Chapter 7: Game theory

Preferences. Denote apple = A, banana = B, carrot = C, peach = P. (a) We are free to choose utilities as we wish, so let U(A) = 0 and U(C) = 1. Then U(B) = 0.7 and U(P) = 0.9, so you prefer peaches to bananas. (b) Let P(win B) = p. Then 0.7p + 1(1-p) = 0.9, so 0.3p = 0.1, and p = 0.33.

Utility functions. Your net utility from transferring x GB is $100(1 - e^{-0.25x})$.

Dominant strategy. If you attend, your payoff is your utility for either a pass or a fail, but if you miss, your payoff is your utility for a fail. Assuming that utility(pass) > utility(fail), your payoff for attending is as good as or better than the payoff for not attending. So, your dominant strategy is to attend.

Bayesian game. It is easy to verify that, no matter the type of the Column player (strong or weak signal), the best response for Row if Column plays S is D, and if Column plays D it is S. Therefore, knowing the type of the Column player does not help Row, and the game does not have a dominant strategy for Row.

10. Repeated game. The one-shot payoff is -3 for each player, so the repeated payoff is $-3\sum_{i=0}^{\infty} 0.6^i = -3/0.4 = -7.5$.

11. Dominant strategy equilibrium. It is dominant for both players to send rather than wait. In equilibrium, they always send right
away, so their packets always collide and, in fact, no progress is made, so that delays are actually infinite. This game illustrates the aphorism: haste makes waste. The EDCA protocol allows higher-priority (delay-sensitive) stations to wait for a shorter time than lower-priority stations before accessing the medium, therefore making it more probable that they get access to the medium and experience a shorter delay.

12. Iterated deletion. Consider the following game, where we show only the payoffs for Row:

      C1   C2
R1     0    0
R2         -1
R3         -2

Neither R2 nor R3 dominates R1. However, any mixed strategy of R2 and R3 that plays R3 with a probability greater than 2/3 dominates R1. Therefore, we can delete R1 from the game.

13. Maximin. In Example 10, Row can get as low as -1 with S, but does better with D, so its maximin strategy is D. Column's payoff is assured with S, so its maximin strategy is S, and the equilibrium is DS. In Example 14, Row maximizes its minimum payoff with S. The game is symmetric, so the maximin equilibrium is SS.

14. Maximin in a zero-sum game. In Figure 3, note that when p is smaller than 0.5, the Column player can play the pure strategy C1 to reduce Row's payoff below 2.5. Similarly, if p is greater than 0.5, Column can use the pure strategy C2 to reduce Row's payoff. For any value of p, Column can play a mixture qC1 + (1-q)C2 to give Row a payoff of q(p+2) + (1-q)(4-3p). To make this smaller than 2.5, we set q(p+2) + (1-q)(4-3p) < 2.5, i.e., q > (3-6p)/(4-8p). For instance, if p = 0, q > 3/4, and if p = 1, q > 3/4. (The inequality is not valid when p = 0.5.)
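The mixed-strategy reasoning in the zero-sum exercise above can be checked numerically. A minimal Python sketch follows; the payoff matrix [[3, 1], [2, 4]] is inferred from the expression q(p+2) + (1-q)(4-3p) and is an assumption, not stated in the text:

```python
# Row's expected payoff when Row plays R1 with probability p and
# Column plays C1 with probability q, for the assumed payoff
# matrix [[3, 1], [2, 4]] (Row's payoffs).
def row_payoff(p, q):
    return q * (p + 2) + (1 - q) * (4 - 3 * p)

# At Row's maximin mixture p = 0.5, the payoff is 2.5 regardless of q:
assert abs(row_payoff(0.5, 0.0) - 2.5) < 1e-9
assert abs(row_payoff(0.5, 1.0) - 2.5) < 1e-9

# For p = 0, any q above the threshold (3 - 6p)/(4 - 8p) = 3/4
# drives Row's payoff below 2.5:
assert row_payoff(0.0, 0.8) < 2.5
print("ok")
```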
15. Nash equilibrium. Let the Row player play pH + (1-p)T. Then, its payoff, given Column's mixed strategy, is p(q - (1-q)) + (1-p)(-q + (1-q)) = 4pq - 2q - 2p + 1 = (1-2p)(1-2q). If q < 0.5, p should be 0; otherwise, p should be 1. Intuitively, if the Column player is more likely to play T, then Row should play T for sure, and vice versa.

16. Correlated equilibrium. Consider an external agency that tells the players to play pDS + (1-p)SD. When Row is told to play D, it knows that it will get a payoff of -1 if it deviates. Similarly, when told to play S, it will get a smaller payoff if it deviates (instead of the payoff of 1). So, it will not deviate, independent of the value of p. By symmetry, the same analysis holds for Column, and therefore we have a correlated equilibrium. The external agency can arrange for any desired payoffs to Row and Column by adjusting p.

17. Price discrimination. Assume that the valuations of each player are $v_1, \ldots, v_n$ for minimum quantities $q_1, \ldots, q_n$. The scheme is essentially to charge $v_i$ for $q_i$, adjusting for the fact that player i could buy multiples of $q_j$.

Chapter 8: Control theory

Modelling. The system evolves as $\dot{x} = -\frac{u}{T}$ when $x > r$, and as $w$ otherwise.

Transfer function. Solution: The output y is related to the input by $\dot{y} = -\frac{u}{T} + w$. Taking the Laplace transform of both sides, we get $sY(s) = -\frac{U(s)}{T} + W(s)$. Ignoring W, we have $\frac{Y(s)}{U(s)} = G(s) = -\frac{1}{sT}$, which is the desired transfer function.

First-order system. Solution: A first-order system reaches the 63% mark at $t = \tau$, so $\tau = 1$. The asymptotic value is K, so K = 4.25. The transfer function is $G(s) = \frac{K}{1 + \tau s} = \frac{4.25}{1 + s}$.

Second-order system. Solution: We have
$Y(s) = \frac{K}{s\left(\frac{s^2}{\omega_n^2} + \frac{2\varsigma s}{\omega_n} + 1\right)}$, which for the critically damped case ($\varsigma = 1$) is $\frac{K}{s\left(\frac{s}{\omega_n} + 1\right)^2}$. We use partial fraction expansion to write this as $Y(s) = K\left(\frac{1}{s} - \frac{1}{s + \omega_n} - \frac{\omega_n}{(s + \omega_n)^2}\right)$. The solution is obtained by finding the inverse Laplace transform term by term, using the table of standard transforms.

Proportional mode control. Solution: The system pole is at -(loop gain).

Integral mode control. Solution: The impulse response is
Taking the inverse Laplace transform, this is given by √Ki · sin(√Ki t).

9. Multiple mode control
Solution: With this control, we have U = (Ki/s + Kd s)E = (Ki/s + Kd s)(R - Y) and Y = (1/s)(U - W) = (1/s)[(Ki/s + Kd s)(R - Y) - W]. We can rearrange this as Y = [(Ki + Kd s²)/(Ki + s²(Kd + 1))](R - W). For the impulse response, we set R = 1 and W = 0. The inverse transform is given, after some rearranging (and ignoring the impulsive term), by (1/(Kd + 1)) · √(Ki/(Kd + 1)) · sin(√(Ki/(Kd + 1)) t).

10. Stability
Solution: The roots are -0.10812, -0.72122 + j0.61911, -0.72122 - j0.61911, 1.44195 + j1.15457, and 1.44195 - j1.15457. Because two roots have a real component in the right half of the complex s plane, the system is BIBO unstable.

11. Matrix exponential
Solution: We have e^At = I + At + A²t²/2! + A³t³/3! + ..., so that d/dt(e^At) = A + A²t + A³t²/2! + A⁴t³/3! + ... Also, A·e^At = A(I + At + A²t²/2! + A³t³/3! + ...) = A + A²t + A³t²/2! + A⁴t³/3! + ... The two series are equal term by term, and therefore the infinite sums are also equal, proving that e^At = I + At + A²t²/2! + ... satisfies ẋ = Ax.

12. Matrix exponential
Solution: Because A is diagonal with diagonal entries a11, a22, ..., ann, A^r is also diagonal, with diagonal entries a11^r, a22^r, ..., ann^r. So, e^At = I + At + A²t²/2! + A³t³/3! + ... is diagonal, and its ith diagonal entry is Σ_r (aii t)^r/r! = e^(aii t).
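The diagonal matrix exponential in solution 12 can be verified by summing the series directly. The sketch below (illustrative pure-Python code, not from the text) truncates e^At = Σ (At)^r/r! for A = diag(3, -4, -1) at t = 1 and recovers e³, e⁻⁴, and e⁻¹ on the diagonal.

```python
import math

def mat_mul(A, B):
    # Naive n x n matrix product.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, t, terms=60):
    # Truncated series e^{At} = I + At + (At)^2/2! + ...
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # I
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # (At)^r / r!
    At = [[A[i][j] * t for j in range(n)] for i in range(n)]
    for r in range(1, terms):
        power = mat_mul(power, At)
        power = [[x / r for x in row] for row in power]  # now (At)^r / r!
        result = [[result[i][j] + power[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[3, 0, 0], [0, -4, 0], [0, 0, -1]]
E = mat_exp(A, 1.0)
```

Because A is diagonal, the off-diagonal entries of every power vanish, so the series reduces to scalar exponentials on the diagonal, exactly as argued in the solution.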
Therefore, for A = diag(3, -4, -1) and t = 1, e^At = diag(e³, e⁻⁴, e⁻¹) = diag(20.08, 0.0183, 0.367).

13. Partial fraction expansion
Solution: Let s/[(s + 3)(s + 5)] = a1/(s + 3) + a2/(s + 5). Then
a1 = lim(s→-3) (s + 3) · s/[(s + 3)(s + 5)] = lim(s→-3) s/(s + 5) = -3/2 = -1.5
a2 = lim(s→-5) (s + 5) · s/[(s + 3)(s + 5)] = lim(s→-5) s/(s + 3) = -5/-2 = 2.5

14. Partial fraction expansion
Solution: Using the quadratic formula, we find the roots s = 1 ± j5. Therefore, we write the fraction as
f(s) = 1/[(s - (1 + j5))(s - (1 - j5))] = a1/(s - (1 + j5)) + a2/(s - (1 - j5))
Then,
a1 = lim(s→1+j5) (s - (1 + j5)) · f(s) = 1/((1 + j5) - (1 - j5)) = 1/j10
and a2 = -1/j10, so that
f(s) = (1/j10) · [1/(s - (1 + j5)) - 1/(s - (1 - j5))]

Chapter 9: Information Theory

1. Entropy
Solution: The entropy is -(0.25 log 0.25 + 0.25 log 0.25 + 0.25 log 0.25 + 0.125 log 0.125 + 0.125 log 0.125) = 2.25 bits.

2. Entropy
Solution: Since all messages are equally likely, the entropy is given by log(number of distinct messages) = log(16^100) = 100 log 16 = 400 bits.

3. Instantaneous codes
Solution: 'acc', 'ad', and 'bd'.

4. Instantaneous codes
Solution: The codewords lie at the leaves of a binary tree, so this code is instantaneous.

5. Digit coding
Solution: The number of distinct strings represented by n decimal digits is 10^n. This can be represented by a binary string of length log 10^n. The mean number of bits per digit is given by (log 10^n)/n = log 10 = 3.32, which is also the asymptotic limit.

6. Feasibility of a code
Solution: From the Kraft inequality, Σ 2^(-li) ≤ 1. Here, we have Σ 2^(-li) = 1/2 + 1/4 + 1/8 + 1/16 + 1/64 + 1/64 + 1/64 + 1/128 + 1/128 + 1/128 + 1/128 = 1.015625 > 1. This violates the Kraft inequality, so you should disbelieve your friend.

7. Optimal codes
Solution: The source entropy is 1.94 bits, which is also the expected length of the shortest instantaneous code.

8. Huffman codes
Solution: Two possible Huffman codes are 'a': '00', 'b': '01', 'c': '10', 'd': '11' and 'a': '01', 'b': '00', 'c': '10', 'd': '11'.
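The Huffman construction behind solution 8 can be sketched with a priority queue. The code below is illustrative: the four-symbol probabilities are an assumption (the exercise's distribution is not reproduced here), chosen so that, as in the solution, every codeword comes out 2 bits long; the 0/1 labeling at each merge is arbitrary, which is why several equally good codes exist.

```python
import heapq
from itertools import count

def huffman_code(probs):
    # Build a binary Huffman code for a {symbol: probability} map.
    tick = count()  # tie-breaker so heapq never compares the dict payloads
    heap = [(p, next(tick), {sym: ''}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tick), merged))
    return heap[0][2]

# Assumed four-symbol distribution (not from the text).
probs = {'a': 0.3, 'b': 0.3, 'c': 0.2, 'd': 0.2}
code = huffman_code(probs)
expected_length = sum(probs[s] * len(w) for s, w in code.items())
```

For this distribution the two smallest probabilities merge first, then the two largest, so the tree is balanced and the expected code length is exactly 2 bits.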
The expected code length of either code is 2 bits, which is less than (entropy + 1) bits, because the entropy is 1.94 bits.

9. Huffman codes
Solution: Consider the source message 'aabbaaabb'. Since there are no 'c' or 'd' symbols in the message, we could use, for this message, the code 'a': '0', 'b': '1', which has a shorter encoding than the Huffman encoding.

10. Entropy rate
Solution: Each symbol has an entropy of 1.94 bits. In one second, the source generates 100 independent symbols, so its entropy rate is 194 bits/second. In 100 seconds, it generates an entropy of 19,400 bits, which corresponds to 2^19,400 distinct messages.

11. Typical messages
Solution: The number of distinct messages with 12 symbols is 2^12 = 4096. For a message to be atypical, it must have at least 11 '1' symbols. The total number of such messages is C(12, 11) + C(12, 12) = 12 + 1 = 13. So, the fraction of atypical messages is 13/4096 = 3.17 × 10^-3. The entropy per symbol is 0.469 bits, so the entropy of a set of messages of length 50 symbols is 23.35 bits. The size of the typical set is 2^23.35 = 10,691,789 messages, and this is the number of codes that need to be assigned to messages of length 50 symbols to ensure that the number of uncoded messages is vanishingly small.

12. A noiseless channel
Solution: Each symbol from this source has an entropy of 1.94 bits. So, the channel can carry 100/1.94 = 51.55 symbols/second.

13. Mutual information
Solution: The probability of each symbol on the channel is P(X = 0) = 0.2, P(X = 1) = 0.8. Therefore, H(X) = -(0.2 log 0.2 + 0.8 log 0.8) = 0.72 bits. To compute H(X|Y), we first need to know the distribution of Y. From the table on page 273, we find this to be P(Y = 0) = 0.028, P(Y = 1) = 0.972. From the table on page 274, the conditional distribution of X given Y = 0 is P(X = 0|Y = 0) = 0.7142, P(X = 1|Y = 0) = 0.2857, which has an entropy of 0.863 bits. From the same table, the conditional distribution of X given Y = 1 is P(X = 0|Y = 1) = 0.185, P(X = 1|Y = 1) = 0.815, which has an entropy of 0.691 bits.
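The entropy arithmetic in solution 13 can be reproduced directly from the quoted marginal and conditional probabilities. The sketch below (illustrative code) computes H(X), H(X|Y), and I(X;Y); the result agrees with the quoted 0.024 bits/symbol up to rounding of the intermediate entropies.

```python
import math

def h2(p):
    # Binary entropy H(p) = -(p log2 p + (1-p) log2 (1-p)), in bits.
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

h_x = h2(0.2)                 # H(X), with P(X=0) = 0.2
h_x_given_y0 = h2(0.7142)     # entropy of X given Y = 0
h_x_given_y1 = h2(0.185)      # entropy of X given Y = 1
# Weight each conditional entropy by the probability of that Y value.
h_x_given_y = 0.028 * h_x_given_y0 + 0.972 * h_x_given_y1
mutual_info = h_x - h_x_given_y   # I(X;Y) = H(X) - H(X|Y)
```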
We multiply these conditional entropies by the probability of Y being 0 or 1, respectively, to compute H(X|Y) = 0.028 × 0.863 + 0.972 × 0.691 = 0.695 bits. Therefore, the mutual information is I(X;Y) = 0.72 - 0.695 = 0.024 bits/symbol.

14. Mutual information
Solution:
I(X;Y) = Σ_X Σ_Y P(xy) log[P(xy)/(P(x)P(y))]   (by definition)
= Σ_X Σ_Y P(xy) log[P(x|y)/P(x)]   (by the definition of conditional probability, P(x|y) = P(xy)/P(y))
= Σ_X Σ_Y P(xy) log P(x|y) - Σ_X Σ_Y P(xy) log P(x)   (expanding log(a/b) as log a - log b)
= -H(X|Y) - Σ_X [Σ_Y P(xy)] log P(x)   (from the definition of H(X|Y))
= -H(X|Y) - Σ_X P(x) log P(x)   (summing P(xy) over Y gives us P(x))
= H(X) - H(X|Y)
The symmetric result is obtained by converting P(xy)/(P(x)P(y)) to P(y|x)/P(y) in the second step and proceeding along the same lines.

15. Capacity of a binary symmetric channel
Solution: C = 1 + e log e + (1 - e) log(1 - e) bits/symbol. We have e = 0.001, so C = 1 + 0.001 log 0.001 + 0.999 log 0.999 = 0.988 bits/symbol. This is 0.012 bits/symbol lower than the capacity of a noiseless channel, which is 1 bit/symbol.

16. Capacity of a Gaussian channel
Solution: P/N = 10^(5/10) = 3.162. So, the channel capacity is 10×10⁶ × log(1 + 3.162) = 10×10⁶ × 2.06 = 20.6 Mbps. To achieve a capacity of 50 Mbps, we set 50×10⁶ = 10×10⁶ × log(1 + SNR), so that log(1 + SNR) = 5, and SNR = 31. This corresponds to a dB value of 10 log10(31) = 14.9 dB.

[...] cache hit rates. From a mathematical standpoint, a good grasp of probability is a necessary foundation to understanding statistics, game theory, and information theory. For these reasons, the first step in our excursion into the mathematical foundations of computer networking is to study the concepts and theorems of probability. This chapter is a self-contained introduction to the theory of probability. We begin [...]
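The two capacity calculations in solutions 15 and 16 above can be reproduced directly. The sketch below (illustrative code) computes the binary symmetric channel capacity C = 1 + e log2(e) + (1-e) log2(1-e) for e = 0.001, and the Shannon-Hartley capacity B log2(1 + SNR) for a 10 MHz channel at an SNR of 3.162.

```python
import math

def bsc_capacity(e):
    # Capacity of a binary symmetric channel with crossover probability e,
    # in bits/symbol: C = 1 + e*log2(e) + (1-e)*log2(1-e).
    if e in (0.0, 1.0):
        return 1.0
    return 1 + e * math.log2(e) + (1 - e) * math.log2(1 - e)

def gaussian_capacity(bandwidth_hz, snr):
    # Shannon-Hartley: C = B * log2(1 + S/N), in bits/second.
    return bandwidth_hz * math.log2(1 + snr)

c_bsc = bsc_capacity(0.001)               # ~0.9886 bits/symbol
c_gauss = gaussian_capacity(10e6, 3.162)  # ~20.6 Mbps
snr_needed = 2 ** (50e6 / 10e6) - 1       # SNR required for 50 Mbps: 31
snr_db = 10 * math.log10(snr_needed)      # ~14.9 dB
```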
[...] F(Xd) always lies in the range [0,1]. Therefore, if we were to generate a random number u with uniform probability in the range [0,1], the probability that u lies in the range [F(x(k-1)), F(xk)] is p(xk). Moreover, xk = F⁻¹(u). Therefore, the procedure to generate values from the discrete distribution p(Xd) is as follows: first, generate a random number u uniformly in the range [0,1]; second, compute xk = F⁻¹(u).

For a continuous random variable, F(x + δ) = F(x) + f(x)δ for very small values of δ. Moreover, F(Xc) always lies in the range [0,1]. Therefore, if we were to generate a random number u with uniform probability in the range [0,1], the probability that u lies in the range [F(x), F(x + δ)] is f(x)δ, which means that x = F⁻¹(u) is distributed according to the desired density function f(Xc). Therefore, the procedure to generate values from the continuous distribution f(Xc) is the same: first, generate a random number u uniformly in [0,1]; second, compute x = F⁻¹(u).

[...] the size of the counters for Σ xi and Σ xi². Then, V[X] ≈ (1/n)Σ xi² - ((1/n)Σ xi)², which approaches the variance of the sample as a consequence of the law of large numbers, discussed in Section 1.7.4 on page 29.

The following properties of the variance of a random variable can be easily shown, for both discrete and continuous random variables:
1. For constant a, V[X + a] = V[X]   (EQ 20)
2. For constant a, V[aX] = a²V[X]   (EQ 21)
3. If X and Y are independent, V[X + Y] = V[X] + V[Y]   (EQ 22)

[...] it is sometimes necessary to revert to the series representation for certain operations. In particular, if the compact notation for M(t) is not differentiable at t = 0, then we must revert to the series to evaluate M(0), as shown next.

EXAMPLE 26: MGF OF A STANDARD UNIFORM DISTRIBUTION
Let X be a uniform random variable defined in the interval [0,1]. This is also called a standard uniform distribution. We would like to compute its MGF: M(t) = E[e^(tX)] = ∫₀¹ e^(tx) dx = (1/t)(e^t - 1).
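The discrete inverse-transform procedure described above can be written in a few lines. The sketch below is illustrative (the three-point distribution is an assumption, not from the text): it precomputes the CDF and maps a uniform u to the first value xk whose cumulative probability F(xk) is at least u, i.e. xk = F⁻¹(u).

```python
import bisect
import itertools

def make_sampler(values, probs):
    # Precompute the CDF once; f_inv(u) then returns the first value whose
    # cumulative probability is >= u, which realizes x_k = F^{-1}(u).
    cdf = list(itertools.accumulate(probs))
    def f_inv(u):
        return values[bisect.bisect_left(cdf, u)]
    return f_inv

# Assumed three-point distribution: P(x0)=0.2, P(x1)=0.3, P(x2)=0.5.
f_inv = make_sampler(['x0', 'x1', 'x2'], [0.2, 0.3, 0.5])
```

In practice u comes from a uniform generator such as `random.random()`; feeding many such draws through `f_inv` produces samples whose frequencies converge to the target probabilities.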
To compute the rth moment about the mean for a random variable X, we can differentiate e^(-μt)M(t) r times with respect to t and set t to 0.

EXAMPLE 28: VARIANCE OF A STANDARD UNIFORM RANDOM VARIABLE
The MGF of a standard uniform random variable X is (1/t)(e^t - 1), so the MGF of (X - μ) is given by (e^(-μt)/t)(e^t - 1). To find the variance of a standard uniform random variable, we need to differentiate this expression twice with respect to t and then set t to 0. [...]

[...] a random variable is a quantity that can take on values from a set (called its domain D) and whose value can only be stated probabilistically. In this section, we will study random variables and their distributions. More formally, a real random variable (the one most commonly encountered in applications having to do with computer networking) is a mapping from events in a sample space S to the domain of real numbers. The probability associated with each value assumed by a real random variable [...]

CHAPTER 1 Probability

1.1 Introduction
The concept of probability pervades every aspect of our life. Weather forecasts are couched in probabilistic terms, as are economic predictions and even outcomes of our own personal decisions. Designers and operators of computer networks often need to think probabilistically, for instance, when anticipating future traffic workloads or computing cache hit rates. [...]

[...] the probability that one of a set of mutually exclusive events occurs is the sum of their individual probabilities. For any finite set of n mutually exclusive events, we can state the axiom equivalently as:

P(E1 ∪ E2 ∪ ... ∪ En) = Σ(i=1..n) P(Ei)   (EQ 2)

An alternative form of Axiom 3 is:

P(E1 ∨ E2) = P(E1) + P(E2) - P(E1 ∧ E2)   (EQ 3)

This alternative form applies to non-mutually exclusive events.

EXAMPLE 6: PROBABILITY OF [...]
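The double differentiation in Example 28 can be checked numerically. The sketch below (illustrative code) evaluates the MGF of X - 1/2 for a standard uniform X, namely g(t) = e^(-t/2)(e^t - 1)/t with g(0) = 1 taken as the series limit, and approximates g''(0) with a central difference; the result is the familiar variance 1/12.

```python
import math

def centered_mgf(t):
    # MGF of (X - 1/2) for X ~ Uniform[0, 1]; the value at t = 0 is the
    # limit of the series expansion (the compact form is 0/0 there).
    if t == 0:
        return 1.0
    return math.exp(-t / 2) * (math.exp(t) - 1) / t

h = 1e-3
# Central-difference approximation to the second derivative at t = 0;
# the second derivative of the centered MGF at 0 is the variance.
variance = (centered_mgf(h) - 2 * centered_mgf(0) + centered_mgf(-h)) / h**2
```

Note that g(t) simplifies to sinh(t/2)/(t/2), an even function, which is consistent with the odd central moments of a symmetric distribution vanishing.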
1. For constants a and b: E[aX + b] = aE[X] + b   (EQ 16)
2. E[X + Y] = E[X] + E[Y], or, more generally, for any set of random variables Xi:
   E[Σ(i=1..n) Xi] = Σ(i=1..n) E[Xi]   (EQ 17)
3. For a discrete random variable Xd with probability mass function p(xi) and any function g(.):
   E[g(Xd)] = Σi g(xi) p(xi)   (EQ 18)
4. For a continuous random variable Xc with density function f(x) and any function g(.):
   E[g(Xc)] = ∫ g(x) f(x) dx   (EQ 19)
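These properties are easy to illustrate with a small discrete example (the fair-die distribution below is an assumption, not from the text): EQ 18 computes E[g(Xd)] as a probability-weighted sum, and the linearity of EQ 16 falls out of it.

```python
from fractions import Fraction

def expect(pmf, g=lambda x: x):
    # E[g(X)] = sum over i of g(x_i) * p(x_i)   (EQ 18)
    return sum(g(x) * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die
mean = expect(die)                              # E[X] = 7/2
shifted = expect(die, lambda x: 3 * x + 2)      # E[3X + 2]
second_moment = expect(die, lambda x: x * x)    # E[X^2] = 91/6
```

Using exact `Fraction` arithmetic makes the identities hold exactly rather than to floating-point tolerance.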

Date posted: 29/08/2016, 10:51


Table of contents

  • Mathematical Foundations of Computer Networking
    • by S. Keshav
    • To Nicole, my foundation
    • CHAPTER 1 Probability
      • 1.1 Introduction
        • 1.1.1 Outcomes
        • 1.1.2 Events
        • 1.1.3 Disjunctions and conjunctions of events
        • 1.1.4 Axioms of probability
        • 1.1.5 Subjective and objective probability
      • 1.2 Joint and conditional probability
        • 1.2.1 Joint probability
        • 1.2.2 Conditional probability
        • 1.2.3 Bayes’ rule
      • 1.3 Random variables
        • 1.3.1 Distribution
        • 1.3.2 Cumulative density function
        • 1.3.3 Generating values from an arbitrary distribution
        • 1.3.4 Expectation of a random variable
        • 1.3.5 Variance of a random variable
      • 1.4 Moments and moment generating functions
        • 1.4.1 Moments
        • 1.4.2 Moment generating functions
        • 1.4.3 Properties of moment generating functions
          • M(t) = E[e^(t(X+Y))] = E[e^(tX) e^(tY)] = E[e^(tX)] E[e^(tY)] (from independence)
