UNDERSTANDING INTERNET ROUTING ANOMALIES AND BUILDING ROBUST TRANSPORT LAYER PROTOCOLS


MING ZHANG

A DISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

RECOMMENDED FOR ACCEPTANCE BY THE DEPARTMENT OF COMPUTER SCIENCE

SEPTEMBER 2005

© Copyright by Ming Zhang, 2005. All rights reserved.

Abstract

As the Internet grows and routing complexity increases, network-level instabilities are becoming increasingly common. End-to-end communications are especially susceptible to service disruptions, and diagnosing and mitigating these disruptions is extremely challenging. In this dissertation, we design and build systems for diagnosing routing anomalies and improving the robustness of end-to-end communications.

The first piece of this work describes PlanetSeer, a novel distributed system for diagnosing routing anomalies. PlanetSeer passively monitors traffic in wide-area services, such as Content Distribution Networks (CDNs) or Peer-to-Peer (P2P) systems, to detect anomalous behavior. It then coordinates active probes from multiple vantage points to confirm the anomaly, characterize it, and determine its scope. This approach has several advantages. First, we obtain more complete and finer-grained views of routing anomalies, since the wide-area nodes provide geographically diverse vantage points. Second, we incur limited additional measurement cost, since most active probes are initiated only when passive monitoring detects oddities. Third, we detect anomalies at a much higher rate than other researchers have reported, since the wide-area services provide large volumes of traffic to sample. Through an extensive experimental study in the wide-area network, we demonstrate that PlanetSeer is an effective system both for gaining a better understanding of routing anomalies and for providing optimization opportunities for the host service.
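The detect-then-confirm workflow described above can be illustrated with a small sketch. It is not PlanetSeer's implementation: the class name `PassiveMonitor`, the `probe_from` callback, the vantage-point labels, and the threshold of two consecutive timeouts are all assumptions made for illustration. The only idea taken from the text is that active probes from several vantage points are issued only after passive monitoring flags a path as suspicious.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical threshold: how many consecutive timeouts make a path suspicious.
SUSPICIOUS_TIMEOUTS = 2


@dataclass
class PassiveMonitor:
    """Toy model of passive monitoring that triggers active probing."""

    # Hypothetical callback: (vantage point, destination) -> True if reachable.
    probe_from: Callable[[str, str], bool]
    vantage_points: List[str]
    # Consecutive timeouts observed per destination.
    timeouts: Dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def on_ack(self, dest: str) -> None:
        """Forward progress on a flow clears the suspicion counter."""
        self.timeouts[dest] = 0

    def on_timeout(self, dest: str) -> Optional[dict]:
        """Count a timeout; probe only once the threshold is reached."""
        self.timeouts[dest] += 1
        if self.timeouts[dest] < SUSPICIOUS_TIMEOUTS:
            return None  # not suspicious yet, so no probing cost is incurred
        self.timeouts[dest] = 0
        return self.confirm(dest)

    def confirm(self, dest: str) -> dict:
        """Probe from every vantage point and summarize which ones failed."""
        results = {v: self.probe_from(v, dest) for v in self.vantage_points}
        failed = sorted(v for v, reachable in results.items() if not reachable)
        return {
            "destination": dest,
            "confirmed": bool(failed),
            "failed_vantages": failed,
        }
```

In a real deployment `probe_from` would run something like a traceroute from a remote node; in this sketch it is an injected function, so the triggering logic can be exercised without any network access.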
To improve the robustness of end-to-end communications during performance anomalies, we design mTCP, a novel transport layer protocol that can minimize the impact of anomalies using redundant paths. mTCP separates the congestion control for each path so that it can not only obtain higher throughput but also be more robust to path failures. mTCP can quickly react to failures, and the recovery process normally takes only several seconds. We integrate a shared congestion detection mechanism into mTCP that allows us to suppress paths with shared congestion. This helps alleviate the aggressiveness of mTCP. We also propose a heuristic to find disjoint paths between pairs of nodes. This can minimize the chance of concurrent failures and shared congestion. We implement mTCP on top of an overlay network and evaluate it using both emulations and experiments in the wide-area network.

Acknowledgments

I have been incredibly fortunate to have had three mentors during the course of my PhD study. The first one is Professor Randy Wang. I would like to thank him for his guidance, support, and help throughout the years. I consider myself very lucky to have had the chance to work with and learn from him. He provided the enthusiasm and encouragement that I needed to complete this work. The second one is Professor Larry Peterson. He made himself available for numerous discussions, often started by my dropping by his office unexpectedly. I always left with a deeper and clearer understanding of those research problems than I'd had when I arrived. I learned from him that research requires a combination of dedication, confidence, and truly long-term thinking. I am sincerely grateful for his high standard for research, kindness, and patience. The third one is Professor Vivek Pai. He provided me invaluable guidance and frequent advice on the PlanetSeer project. His vigorous approach both to research and to life has greatly shaped and enriched my view of networking and systems research.
I have to thank him for letting me steal an enormous amount of time and wisdom during the last two years of my PhD study.

I am fortunate to have collaborated with Chi Zhang on much of the work presented in this thesis. Chi is my friend, lab-mate, and apartment-mate. I drew immense inspiration from him both inside and outside work. He is the best collaborator one could ask for. I am also grateful to Junwen Lai. The mTCP project would not have been possible without his help on the user-level TCP implementation.

The second part of my thesis was inspired by my work at ICIR, starting in the summer of 2001. I thank Dr. Brad Karp for making my visit possible. Later, Brad gave me the chance to continue collaborating with him at Intel Research Pittsburgh in the summer of 2003. I benefited enormously from the two summers I spent working with him. I thank Dr. Sally Floyd for teaching me a great deal about TCP-related problems while I was at ICIR. It was a great honor to work with Professor Arvind Krishnamurthy, who provided many vigilant comments on various algorithms in my work. I am especially grateful to Professor Jennifer Rexford. She always patiently listened to my incoherent thoughts and provided me amazingly insightful and detailed feedback. I learned a tremendous amount from her about doing research as well as about writing and presentation.

I am grateful to the PlanetLab staff for their help with deploying the PlanetSeer system. Andy Bavier answered many of my questions about safe raw sockets. Marc Fiuczynski shared with me his extensive experience with vserver. I would like to thank Scott Karlin, Mark Huang, Aaron Klingaman, Martin Makowiecki, and Steve Muir for their support and patience. I also thank KyoungSoo Park for his effort in keeping CoDeeN operational during my experiment.

I would like to thank Professors David Walker and Moses Charikar for serving as non-readers on my dissertation committee. They gave many valuable comments and suggestions on my work.
My work was supported in part by NSF grants CNS-0335214 and CNS-0435087, and DARPA contract F30602-00-2-0561.

I greatly enjoyed my life at Princeton because of the many close friends I had there. I thank Ding Liu, Chi Zhang, Yaoping Ruan, Fengzhou Zheng, Ting Liu, Wen Xu, Gang Tan, and Fengyun Cao for their support and encouragement throughout the years. I also thank my non-Princeton friends, especially Xuehua Shen and Ningning Hu. They made my life lots of fun.

This thesis is dedicated to my parents. They always gave me love, trust, and pride. They played the most important role in directing me into pursuing a research career.

Contents

Abstract

1 Introduction
    1.1 Why Do Performance Anomalies Occur on the Internet?
    1.2 Difficulties in Anomaly Diagnosis
    1.3 Difficulties in Anomaly Mitigation
    1.4 Overview of the Thesis

2 Background and Related Work
    2.1 Network Testbeds
    2.2 Intradomain Routing Anomalies
    2.3 Interdomain Routing Anomalies
    2.4 Traffic Anomalies
    2.5 End-to-End Failure Measurement
    2.6 Link-Layer and Application-Layer Striping
    2.7 Transport-Layer Striping
    2.8 Summary

3 PlanetSeer: Internet Path Failure Monitoring and Characterization
    3.1 Introduction
    3.2 PlanetSeer Operation
        3.2.1 Components
        3.2.2 MonD Mechanics
        3.2.3 MonD Behavior
        3.2.4 MonD Flow/Path Statistics
        3.2.5 ProbeD Operation
        3.2.6 ProbeD Mechanics
        3.2.7 Path Diversity
    3.3 Confirming Anomalies
        3.3.1 Massaging Traceroute Data
        3.3.2 Final Confirmation
    3.4 Loop-Based Anomalies
        3.4.1 Scope
        3.4.2 Distribution
        3.4.3 End-to-End Effects
    3.5 Building a Reference Path
    3.6 Classifying Non-loop Anomalies
        3.6.1 Path Changes
        3.6.2 Path Outage
    3.7 Discussion
        3.7.1 Bypassing Anomalies
        3.7.2 Reducing Measurement Overhead
    3.8 Summary

4 mTCP: Robust Transport Layer Protocol Using Redundant Paths
    4.1 Introduction
    4.2 Design
        4.2.1 Transport Layer Protocol
        4.2.2 Shared Congestion Detection
        4.2.3 Path Selection
        4.2.4 Path Management
        4.2.5 Path Failure Detection and Recovery
    4.3 Implementation
    4.4 Evaluation
        4.4.1 Methodology
        4.4.2 Utilizing Multiple Independent Paths
        4.4.3 Recovering from Partial Path Failures
        4.4.4 Detecting Shared Congestion
        4.4.5 Alleviating Aggressiveness with Path Suppression
        4.4.6 Suppressing Bad Paths
        4.4.7 Comparing with Single-Path Flows
    4.5 Summary

5 Conclusion and Future Work
    5.1 Summary of the Dissertation
        5.1.1 Internet Path Failure Monitoring and Characterization
        5.1.2 Robust Transport Layer Protocol Using Redundant Paths
    5.2 Future Work
        5.2.1 Debugging Routing Anomalies
        5.2.2 Debugging Non-Routing Anomalies
        5.2.3 Internet Weather Service

[...]
... studying anomalies in the Internet and designing robust network protocols. We will focus on those that are most relevant and discuss how they differ from our approaches. We first briefly introduce the network testbeds used for our experiments and evaluations. We then turn to the recent studies on network anomalies, which include interdomain and intradomain routing anomalies, traffic anomalies, and end-to-end...

... causes of performance degradation on the Internet. We first look at intradomain routing anomalies and defer the discussion about interdomain routing anomalies to the next section. Nowadays, the most commonly used intradomain routing protocols are OSPF and IS-IS. Researchers have been using routing updates collected in individual ISPs to study routing anomalies. Labovitz and Ahuja used the OSPF messages gathered...

... the Internet. Based on their methodologies, we classify them into intra- and inter-domain routing anomalies, traffic anomalies, and end-to-end measurements. At the end of Chapter 2, we will discuss the research efforts that improve end-to-end performance using striping at the link layer, application layer, and transport layer. Chapter 3 focuses on PlanetSeer, a large-scale distributed system for routing...

... recovery, and path selection.

Chapter 3 PlanetSeer: Internet Path Failure Monitoring and Characterization

As we have explained in Section 1.1, performance degradations are often caused by routing anomalies on today's Internet. Understanding routing anomalies is crucial for improving the overall stability of the Internet. In this chapter, we introduce PlanetSeer, a large-scale distributed system for routing...
... overhead and can easily scale to a large number of nodes.

• It provides a finer-grained and more complete view of routing anomalies by correlating the probing from multiple vantage points.

In the past, a series of proposals have been made to enhance network performance using striping techniques at the link layer, transport layer, and application layer. We are the first to implement and evaluate a transport-layer...

Routing instability is one of the major sources of performance anomalies. Routing protocols are responsible for discovering the paths to reach any destination on the Internet. Routing protocols can be classified into interdomain and intradomain protocols. Intradomain protocols (IGP), such as OSPF [44] or IS-IS [20], are responsible for disseminating reachability information within an AS. Interdomain protocols...

... use striping techniques to improve performance and robustness. Based on the network layer where the striping techniques are applied, we classify them into link-layer, transport-layer, and application-layer striping.

2.1 Network Testbeds

We evaluate our systems with both emulations and real-world deployment. The emulations are conducted on Emulab [24], a time- and space-shared network emulator. It consists...

... 3, our work complements these two approaches by studying routing anomalies from an end-to-end perspective. We will also quantify the impact of anomalies on end-to-end performance, such as loss rate and RTT.

2.4 Traffic Anomalies

Parallel to routing anomalies, several research efforts have focused on traffic anomalies, which are defined as unusual and significant changes in network traffic. These efforts examined...
... for confirming the routing anomalies, classifying them, and characterizing their scopes, locations, and end-to-end effects. In the end, we quantify the effectiveness of overlay routing in bypassing path failures. Chapter 4 presents mTCP, a novel transport layer protocol that is robust to performance anomalies. mTCP differs from traditional transport layer protocols in that it can use more than one path in...

... violating these agreements. On the other hand, knowing why the anomalies occur will help network operators fix the problems quickly and prevent similar problems from occurring in the future. Although understanding the characteristics and origins of performance anomalies can help us improve the long-term stability of the Internet, we are still going to encounter anomalies frequently in the foreseeable...

Date posted: 15/03/2014, 21:20
