Document: Network Traffic and Performance Evaluation P18 ppt

18 CONGESTION CONTROL FOR SELF-SIMILAR NETWORK TRAFFIC

TSUNYI TUAN AND KIHONG PARK
Network Systems Lab, Department of Computer Sciences, Purdue University, West Lafayette, IN 47907

Self-Similar Network Traffic and Performance Evaluation, Edited by Kihong Park and Walter Willinger. Copyright © 2000 by John Wiley & Sons, Inc. Print ISBN 0-471-31974-0; Electronic ISBN 0-471-20644-X.

18.1 INTRODUCTION

Recent measurements of local-area and wide-area traffic [8, 28, 42] have shown that network traffic exhibits variability at a wide range of scales. What is striking is the ubiquitousness of the phenomenon, which has been observed in diverse networking contexts, from Ethernet to ATM, LAN and WAN, compressed video, and HTTP-based WWW traffic [8, 15, 23, 42]. Such scale-invariant variability is in strong contrast to traditional models of network traffic, which show burstiness at short time scales but are essentially smooth at large time scales; that is, they lack long-range dependence. Since scale-invariant burstiness can exert a significant impact on network performance, understanding the causes and effects of traffic self-similarity is an important problem.

In previous work [33, 34], we have investigated the causal and performance aspects of traffic self-similarity, and we have shown that self-similar traffic flow is an intrinsic property of networked client/server systems with heavy-tailed file size distributions, and that conjoint provision of low delay and high throughput is adversely affected by scale-invariant burstiness. From a queueing theory perspective, the principal distinguishing characteristic of long-range-dependent (LRD) traffic is that the queue length distribution decays much more slowly (that is, polynomially) vis-à-vis short-range-dependent (SRD) traffic sources such as Poisson sources, which exhibit exponential decay. A number of performance studies [1, 2, 11, 29, 32, 34] have shown that self-similarity has a detrimental effect on network performance, leading to increased delay and packet loss rate. In Grossglauser and Bolot [18] and Ryu and Elwalid [37], the point is advanced that for small buffer sizes or short time scales, long-range dependence has only a marginal impact. This is, in part, due to a saturation effect that arises when resources are overextended, whereby the burstiness associated with short-range-dependent traffic is sufficient (and, in many cases, dominant) to cause significant buffer overflow.

What is still in its infancy, however, is the problem of controlling self-similar network traffic. By the control of self-similar traffic, we mean the problem of modulating traffic flow such that network performance, including throughput, is optimized. Scale-invariant burstiness introduces new complexities into the picture, which make the task of providing quality of service (QoS) while achieving high utilization significantly more difficult. First and foremost, scale-invariant burstiness implies the existence of concentrated periods of high activity at a wide range of time scales, which adversely affects congestion control. Burstiness at fine time scales is commensurate with burstiness observed for traditional short-range-dependent traffic models. The distinguishing feature is burstiness at coarser time scales, which induces extended periods of either overload or underutilization and degrades overall performance. However, on the flip side, long-range dependence, by definition, implies the existence of nontrivial correlation structure, which may be exploitable for congestion control purposes, information to which current algorithms are impervious.
In this chapter, we show the feasibility of "predicting the future" under self-similar traffic conditions with sufficient reliability that the information can be effectively utilized for congestion control purposes. First, we show that long-range dependence can be detected on-line to predict future traffic levels and contention at time scales above and beyond the time scale of the feedback congestion control. Second, we present a traffic modulation mechanism based on the multiple time scale congestion control (MTSC) framework [46] and show that it is able to effectively exploit this information to improve network performance, in particular, throughput. The congestion control mechanism works by selectively applying aggressiveness, using the predicted future, when it is warranted: it throttles the data rate upward if the predicted future contention level is low, and it is more aggressive the lower the predicted contention level. We show that the selective aggressiveness mechanism is of benefit even for short-range-dependent traffic; however, it is significantly more effective for long-range-dependent traffic, leading to comparatively large performance gains. We also show that as the number of connections engaging in selective aggressiveness control (SAC) increases, both fairness and efficiency are preserved. The latter refers to the total throughput achieved across all SAC-controlled connections.

The rest of the chapter is organized as follows. In Section 18.2, we give a brief overview of self-similar network traffic and the specific setup employed in this chapter. In Section 18.3, we describe the predictability mechanism and its efficacy at extracting the correlation structure present in long-range-dependent traffic. This is followed by Section 18.4, where we describe the SAC protocol and a refinement of the predictability mechanism for on-line, per-connection estimation.
In Section 18.5 we present performance results for SAC and show its efficacy under different long-range dependence conditions and when the number of SAC connections is varied. We conclude with a discussion of current results and future work.

18.2 PRELIMINARIES

18.2.1 Self-Similar Traffic: Basic Definitions

Let $(X_t)_{t \in \mathbb{Z}_+}$ be a time series, which, for example, represents the trace of data flow at a bottleneck link measured at some fixed time granularity. We define the aggregated series $X_i^{(m)}$ as

$$X_i^{(m)} = \frac{1}{m}\left(X_{im-m+1} + \cdots + X_{im}\right).$$

That is, $X_t$ is partitioned into blocks of size $m$, their values are averaged, and $i$ is used to index these blocks. Let $r(k)$ and $r^{(m)}(k)$ denote the autocorrelation functions of $X_t$ and $X_i^{(m)}$, respectively. $X_t$ is self-similar (more precisely, asymptotically second-order self-similar) if the following conditions hold:

$$r(k) \sim \mathrm{const} \cdot k^{-\beta}, \tag{18.1}$$

$$r^{(m)}(k) \sim r(k), \tag{18.2}$$

for $k$ and $m$ large, where $0 < \beta < 1$. That is, $X_t$ is "self-similar" in the sense that the correlation structure is preserved with respect to time aggregation, relation (18.2), and $r(k)$ behaves hyperbolically with $\sum_{k=0}^{\infty} r(k) = \infty$ as implied by Eq. (18.1). The latter property is referred to as long-range dependence.

Let $H = 1 - \beta/2$. $H$ is called the Hurst parameter, and by the range of $\beta$, $1/2 < H < 1$. It follows from Eq. (18.1) that the farther $H$ is away from $1/2$ the more long-range dependent $X_t$ is, and vice versa. Thus, the Hurst parameter acts as an indicator of the degree of self-similarity.

A test for long-range dependence can be obtained by checking whether $H$ significantly deviates from $1/2$ or not. We use two methods for testing this condition. The first method, the variance-time plot, is based on the slowly decaying variance of a self-similar time series. The second method, the R/S plot, uses the fact that for a self-similar time series, the rescaled range or R/S statistic grows according to a power law with exponent $H$ as a function of the number of points included.
Thus, the plot of R/S against this number on a log-log scale has a slope that is an estimate of $H$. A comprehensive discussion of the estimation methods can be found in Beran [4] and Taqqu et al. [39].

A random variable $X$ has a heavy-tailed distribution if

$$\Pr\{X > x\} \sim x^{-\alpha} \quad \text{as } x \to \infty,$$

where $0 < \alpha < 2$. That is, the asymptotic shape of the tail of the distribution obeys a power law. The Pareto distribution,

$$p(x) = \alpha k^{\alpha} x^{-\alpha - 1},$$

with parameters $\alpha > 0$, $k > 0$, $x \ge k$, has the distribution function

$$\Pr\{X \le x\} = 1 - (k/x)^{\alpha},$$

and hence is clearly heavy tailed. It is not difficult to check that for $\alpha \le 2$ heavy-tailed distributions have infinite variance, and for $\alpha \le 1$, they also have infinite mean. Thus, as $\alpha$ decreases, a larger portion of the probability mass is located in the tail of the distribution. In practical terms, a random variable that follows a heavy-tailed distribution can take on extremely large values with nonnegligible probability.

18.2.2 Structural Causality

In Park et al. [33], we show that aggregate traffic self-similarity is an intrinsic property of networked client/server systems where the size of the objects (e.g., files) being accessed is heavy-tailed. In particular, there exists a linear relationship between the heavy-tailedness measure of file size distributions as captured by $\alpha$, the shape parameter of the Pareto distribution, and the Hurst parameter of the resultant multiplexed traffic streams. That is, the aggregate network traffic that is induced by hosts exchanging files with heavy-tailed sizes over a generic network environment running "regular" protocol stacks (e.g., TCP, flow-controlled UDP) is self-similar, and it is more bursty, in the scale-invariant sense, the more heavy-tailed the file size distribution is. This relationship is shown in Fig. 18.1. The relationship is robust with respect to changes in network resources (bandwidth, buffer capacity), topology, the influence of cross-traffic, and the distribution of interarrival times.
We call this relationship between the traffic pattern observed at the network layer and the structural property of a distributed, networked system in terms of its high-level object sizes structural causality [33]. $H = (3 - \alpha)/2$ is the theoretical value predicted by the on/off model [42], a 0/1 renewal process with heavy-tailed on or off periods, assuming independent traffic sources with no interactions due to sharing of network resources.

Structural causality is of import to self-similar traffic control since (1) it provides an environment where self-similar traffic conditions are easily facilitated (just simulate a client/server network), (2) the degree of self-similar burstiness can be intimately controlled by the application layer parameter $\alpha$, and (3) the self-similar network traffic induced already incorporates the actions and modulating influence of the protocol stack, since the observed traffic pattern is a direct consequence of hosts exchanging files whose transport was mediated through protocols (e.g., TCP, flow-controlled UDP) in the protocol stack. This provides us with a natural environment where the impact of control actions by a congestion control protocol can be discerned and evaluated under self-similar traffic conditions.

18.3 PREDICTABILITY OF SELF-SIMILAR TRAFFIC

18.3.1 Predictability Setup

In this section, we show that the correlation structure present in long-range-dependent (LRD) traffic can be detected and used to predict the future over time scales relevant to congestion control. Time series analysis and prediction theory have long histories, with techniques spanning a number of domains from estimation theory to regression theory to neural network based techniques, to mention a few [3, 17, 22, 40, 44, 45, 49]. In many senses, it is an "art form" with different methods giving variable performance depending on the context and modeling assumptions.
Our goal is not to perform optimal time series prediction but rather to choose a simple, easy-to-implement scheme, and use it as a reference for studying congestion control techniques and their efficacy at exploiting the correlation structure present in LRD traffic for improving network performance. Our prediction method, which is described next, is a time domain technique and can be viewed as an instance of conditional expectation estimation.

Fig. 18.1 Hurst parameter estimates (R/S and variance-time) for $\alpha$ varying from 1.05 to 1.95.

Assume we are given a wide-sense stationary stochastic process $(x_t)_{t \in \mathbb{Z}}$ and two numbers $T_1, T_2 > 0$. At time $t$, we have at our disposal

$$a = \sum_{i \in [t - T_1,\, t)} q_i,$$

where $q_i$ is a sample path of $x_t$ over time interval $[t - T_1, t)$. For notational clarity, let

$$V_1 = \sum_{i \in [t - T_1,\, t)} x_i, \qquad V_2 = \sum_{i \in [t,\, t + T_2)} x_i.$$

$a$ may be thought of as the aggregate traffic observed over the "recent past" $[t - T_1, t)$, and $V_1$, $V_2$ are composite random variables denoting the recent past and near future. We are interested in computing the conditional probability

$$\Pr\{V_2 = b \mid V_1 = a\} \tag{18.3}$$

for $b$ in the range of $V_2$. For example, if $a$ represented a "high" traffic volume, then we may be interested in knowing what the probability of encountering yet another high traffic volume in the near future would be. Let

$$V^t_{\max} = \max_{\tau} \sum_{i \in [\tau - T_1,\, \tau)} q_i, \qquad V^t_{\min} = \min_{\tau} \sum_{i \in [\tau - T_1,\, \tau)} q_i,$$

where $\tau = t - kT_1$, $k = 0, 1, \ldots$; $V^t_{\max}$ and $V^t_{\min}$ denote the highest and lowest traffic volume seen so far at time $t$, respectively.
To make sense of "high" and "low," we will partition the range between $V^t_{\max}$ and $V^t_{\min}$ into $h$ levels with quantization step $\mu = (V^t_{\max} - V^t_{\min})/h$:

$$[0,\, V^t_{\min} + \mu),\ [V^t_{\min} + \mu,\, V^t_{\min} + 2\mu),\ [V^t_{\min} + 2\mu,\, V^t_{\min} + 3\mu),\ \ldots,\ [V^t_{\min} + (h-2)\mu,\, V^t_{\min} + (h-1)\mu),\ [V^t_{\min} + (h-1)\mu,\, \infty).$$

We will define two new random variables $L_1$, $L_2$, where

$$L_k = 1 \iff V_k \in [0,\, V^t_{\min} + \mu),$$
$$L_k = 2 \iff V_k \in [V^t_{\min} + \mu,\, V^t_{\min} + 2\mu),$$
$$\vdots$$
$$L_k = h - 1 \iff V_k \in [V^t_{\min} + (h-2)\mu,\, V^t_{\min} + (h-1)\mu),$$
$$L_k = h \iff V_k \in [V^t_{\min} + (h-1)\mu,\, \infty).$$

In other words, $L_k$ is a function of $V_k$, $L_k = L_k(V_k)$, and it performs a certain quantization. Thus if $L_k \approx 1$ then the traffic level is "low" relative to the mean, and if $L_k \approx h$, then it is "high." In our case, eight levels ($h = 8$) were found to be sufficiently granular for prediction purposes. In practice, $V^t_{\max}$ and $V^t_{\min}$ are determined by applying a 3% threshold to the previously observed traffic volumes; i.e., the outliers corresponding to extraordinarily large or small data points are dropped to make the classification reasonable.

Returning to Eq. (18.3) and prediction, for certain values of $T_1$, $T_2$, we are interested in knowing the conditional probability densities

$$\Pr\{L_2 \mid L_1 = l\}$$

for $l \in [1, 8]$. If $\Pr\{L_2 \mid L_1 = 8\}$ were concentrated toward $L_2 = 8$, and $\Pr\{L_2 \mid L_1 = 1\}$ were concentrated toward $L_2 = 1$, then this information could be potentially exploited for congestion control purposes.

18.3.2 Estimation of Conditional Probability Density

To explore and quantify the potential predictability of self-similar network traffic, we use the TCP traffic traces used in Park et al. [33], whose Hurst parameter estimates are shown in Fig. 18.1, as the main reference point. First, we use off-line estimation of aggregate throughput traffic, which is then refined to on-line estimation of aggregate traffic using per-connection traffic when performing predictive congestion control.
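The level quantization $L_k(V_k)$ of Section 18.3.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `quantize_levels` and its parameters are hypothetical, and reading the "3% threshold" as discarding the lowest and highest 3% of observations when fixing the range is an assumption, since the text does not spell out the procedure.

```python
import numpy as np

def quantize_levels(volumes, h=8, trim=0.03):
    """Map each traffic volume to a level in 1..h (1 = low, h = high)."""
    v = np.asarray(volumes, dtype=float)
    # Assumed reading of the "3% threshold": ignore the lowest and highest
    # 3% of observations when fixing the quantization range.
    v_min, v_max = np.quantile(v, [trim, 1.0 - trim])
    mu = (v_max - v_min) / h              # quantization step
    if mu <= 0:                           # degenerate case: constant input
        return np.ones(v.shape, dtype=int)
    levels = np.floor((v - v_min) / mu).astype(int) + 1
    # Level 1 absorbs everything below v_min + mu; level h absorbs
    # everything at or above v_min + (h - 1) * mu.
    return np.clip(levels, 1, h)
```

Clipping at both ends mirrors the open-ended first and last intervals of the partition, so outliers beyond the trimmed range still receive a valid level.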
Other traces, including those collected from flow-controlled UDP runs, yield similar results. The traces used are each 10,000 seconds long at 10 ms granularity. They represent the aggregate traffic of 32 concurrent TCP Reno connections recorded at a bottleneck router. We observe that the aggregate throughput series exhibit correlation structure at several time scales from 250 ms to 20 s and higher.

To estimate $\Pr\{L_2 \mid L_1 = l\}$ from the aggregate throughput series $X_t$, we segment $X_t$ into

$$N = \frac{10{,}000\ \text{seconds}}{(T_1 + T_2)\ \text{seconds}}$$

contiguous nonoverlapping blocks of length $T_1 + T_2$ (except possibly for the last block), and for each block $j \in [1, N]$ compute the aggregate traffic $V_1$, $V_2$ over the subintervals of length $T_1$, $T_2$. For $l, l' \in [1, 8]$, let $h_l \in [0, N]$ denote the total number of blocks such that $L_1(V_1) = l$, and let $h_{l'} \in [0, h_l]$ denote the size of the subset of those blocks such that $L_2(V_2) = l'$. Then

$$\Pr\{L_2 = l' \mid L_1 = l\} = \frac{h_{l'}}{h_l}.$$

Figure 18.2 shows the estimated conditional probability densities for $\alpha = 1.05$, 1.95 traffic for time scales 500 ms, 1 s, and 5 s. In the following, $T_1 = T_2$.

Fig. 18.2 Top row: Probability densities with $L_2$ conditioned on $L_1$ for $\alpha = 1.05$. Bottom row: Probability densities with $L_2$ conditioned on $L_1$ for $\alpha = 1.95$.

For the aggregate throughput traces with $\alpha = 1.05$ (Fig. 18.2, top row), the three-dimensional (3D) conditional probability densities can be seen to be skewed diagonally from the lower left side toward the upper right side. This indicates that if the current traffic level $L_1$ is low, say, $L_1 = 1$, chances are that $L_2$ will be low as well. That is, the probability mass of $\Pr\{L_2 \mid L_1 = 1\}$ is concentrated toward 1. Conversely, the plots show that $\Pr\{L_2 \mid L_1 = 8\}$ is concentrated toward 8. This is more clearly seen in Fig. 18.3(a), which shows two cross sections, that is, 2D projections, reflecting $\Pr\{L_2 \mid L_1 = 1\}$ and $\Pr\{L_2 \mid L_1 = 8\}$.
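The block-counting estimator of $\Pr\{L_2 = l' \mid L_1 = l\}$ described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used to produce the figures: it takes $T_1 = T_2 = t$ samples, drops any incomplete trailing block, and fixes the quantization range from the plain minimum and maximum of the observed volumes instead of applying the 3% threshold. The name `conditional_level_probs` is hypothetical.

```python
import numpy as np

def conditional_level_probs(x, t, h=8):
    """Return an h-by-h matrix P with P[l-1, m-1] estimating Pr{L2 = m | L1 = l}."""
    x = np.asarray(x, dtype=float)
    n = len(x) // (2 * t)                       # number of complete blocks
    if n == 0:
        raise ValueError("need at least one full block of 2*t samples")
    blocks = x[: n * 2 * t].reshape(n, 2 * t)
    v1 = blocks[:, :t].sum(axis=1)              # recent past in each block
    v2 = blocks[:, t:].sum(axis=1)              # near future in each block

    # Assumption: one common range for V1 and V2, with no outlier trimming.
    both = np.concatenate([v1, v2])
    v_min, v_max = both.min(), both.max()
    mu = (v_max - v_min) / h

    def level(v):
        if mu <= 0:
            return np.ones(v.shape, dtype=int)
        return np.clip(np.floor((v - v_min) / mu).astype(int) + 1, 1, h)

    l1, l2 = level(v1), level(v2)
    counts = np.zeros((h, h))
    np.add.at(counts, (l1 - 1, l2 - 1), 1)      # joint counts per (l, l')
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows for levels never observed as L1 are left at zero.
    return np.divide(counts, row_totals,
                     out=np.zeros_like(counts), where=row_totals > 0)
```

A matrix whose mass lies near the diagonal corresponds to the skewed shape described for the $\alpha = 1.05$ traces; rows that all look alike correspond to the case where conditioning on $L_1$ adds little information.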
For the aggregate throughput traces with $\alpha = 1.95$ (Fig. 18.2, bottom row), on the other hand, the shape of the distribution does not change as the conditioning variable $L_1$ is varied. This is more clearly seen in the projections of $\Pr\{L_2 \mid L_1 = 1\}$ and $\Pr\{L_2 \mid L_1 = 8\}$ shown in Fig. 18.3(b). This indicates that for $\alpha = 1.95$ traffic, observing the past (over the time scales considered) does not help much in predicting the future beyond the information conveyed by the fixed a priori distribution. Given the definition of $L_k$, the Gaussian shape of the marginal densities is consistent with short-range correlations, making the central limit theorem approximately applicable over larger time scales. In both cases ($\alpha = 1.05$, 1.95), the shape of the distribution stays relatively constant across a wide range of time scales, from 500 ms to 20 s. For $\alpha = 1.35$, 1.65 the predictability structure lies in between (not shown here).

Fig. 18.3 (a) Shifting effect of conditional probability densities $P(L_2 \mid L_1 = 1)$ and $P(L_2 \mid L_1 = 8)$ for $\alpha = 1.05$. (b) For $\alpha = 1.95$, the corresponding probabilities remain invariant.

18.3.3 Predictability and Time Scale

An important issue is how time scale affects predictability when traffic is long-range dependent. Going back to Fig. 18.2 (top row), one subtle effect that is not easily discernible is that as time scale is increased the conditional probability densities $\Pr\{L_2 \mid L_1 = l\}$ become more concentrated. Given that $\Pr\{L_2 \mid L_1 = l\}$ is a function of $T_1$, $T_2$, we would like to determine at what time scale predictability is maximized. One way to measure the "information content" (that is, in the sense of randomness or unstructuredness) in a probability distribution is to compute its entropy. For a discrete probability density $p_i$, its entropy $S(p_i)$ is defined as $S(p_i) = \sum_i p_i \log(1/p_i)$.
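As a small worked illustration of the entropy definition, the following Python sketch evaluates $S(p_i)$ for a discrete distribution. The natural logarithm is an assumption, since the text leaves the base unspecified, and terms with $p_i = 0$ are skipped under the usual convention that $0 \log(1/0) = 0$. The function name `entropy` is illustrative.

```python
import math

def entropy(p):
    """Entropy S(p) = sum_i p_i * log(1 / p_i) of a discrete distribution."""
    # Terms with p_i == 0 contribute nothing (convention: 0 * log(1/0) = 0).
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

uniform = [1.0 / 8] * 8          # all eight levels equally likely
point_mass = [1.0] + [0.0] * 7   # all mass on a single level

print(entropy(uniform))          # equals log(8), the largest value for 8 levels
print(entropy(point_mass))       # 0.0, the smallest possible value
```

The two test distributions bracket the range: the more concentrated a conditional density is, the closer its entropy is to zero and the more predictable the next level.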
In the case of our conditional density $\Pr\{L_2 \mid L_1 = l\}$,

$$S_l = -\sum_{l'=1}^{8} \Pr\{L_2 = l' \mid L_1 = l\} \log \Pr\{L_2 = l' \mid L_1 = l\}.$$

Thus, entropy is maximal when the distribution is uniform and it is minimal if the distribution is concentrated at a single point. Since we are given a set of eight conditional probability densities, one for each $L_1 = 1, 2, \ldots, 8$, we define the average entropy $\bar{S}$ as

$$\bar{S} = \sum_{l=1}^{8} S_l / 8.$$

The average entropy remains a function of $T_1$, $T_2$: that is, $\bar{S} = \bar{S}(T_1, T_2)$. Figure 18.4 plots $\bar{S}(T_1, T_2) = \bar{S}(T_1)$ (recall that $T_1 = T_2$) for the $\alpha = 1.05$ throughput series as a function of time scale or aggregation level $T_1$. Entropy is highest for small time scales in the range $\sim$250 ms, and it drops monotonically as $T_1$ is increased. Eventually, $\bar{S}(T_1)$ begins to flatten out near the 3-5 second mark, reaching saturation, and stays so as time scale is further increased. From our analysis of various long-range-dependent traffic traces, we find that the "knee" of the entropy curve is in the range of 1-5 seconds.

Fig. 18.4 Average entropy $\bar{S}(T_1)$ plot for $\alpha = 1.05$ traffic as a function of time scale $T_1$ (horizontal axis: aggregation level in seconds).

Note that increasing $T_1$ further and further to gain small decreases in entropy brings with it an important problem: if prediction is done over a "too long" time interval, then the information may not be effectively exploitable by various congestion control strategies. In the next section,