Understanding Linux Network Internals 2005, part 7


NUD_INCOMPLETE
A solicitation has been sent, but no reply has been received yet. In this state, there is no hardware address to use (not even an old one, as there is with NUD_STALE).

NUD_REACHABLE
The address of the neighbor is cached and the latter is known to be reachable (there has been a proof of reachability).

NUD_FAILED
Marks a neighbor as unreachable because of a failed solicitation request, either the one generated when the entry was created or the one triggered by the NUD_PROBE state.

NUD_STALE
NUD_DELAY
NUD_PROBE
Transitional states; they will be resolved when the local host determines whether the neighbor is reachable. See the section "Reachability Confirmation."

The next set of values represents a group of special states that usually never change once assigned:

NUD_NOARP
This state is used to mark neighbors that do not need any protocol to resolve the L3-to-L2 mapping (see the section "Special Cases"). The section "Start of the arp_constructor Function" in Chapter 28 shows how and why this state is set in IPv4/ARP. But even though the name of this state suggests that it applies only to ARP, it can actually be used by any neighboring protocol.

NUD_PERMANENT
The L2 address of the neighbor has been statically configured (i.e., with user-space commands) and therefore there is no need to use any neighboring protocol to take care of it. See the section "System Administration of Neighbors" in Chapter 29.

26.6.2.2. Derived states

In addition to the basic states listed in the previous section, the following derived values are defined just to make the code clearer when there is a need to refer to multiple states with something in common:

NUD_VALID
An entry is considered to be in the NUD_VALID state if its state is any one of the following, which represent neighbors believed to have an available address:
    NUD_PERMANENT
    NUD_NOARP
    NUD_REACHABLE
    NUD_PROBE
    NUD_STALE
    NUD_DELAY

NUD_CONNECTED
This is used for the subset of NUD_VALID states that do not have a confirmation process pending:

    NUD_PERMANENT
    NUD_NOARP
    NUD_REACHABLE

NUD_IN_TIMER
The neighboring subsystem is running a timer for this entry, which happens when the status is unclear. The basic states that correspond to this are:

    NUD_INCOMPLETE
    NUD_DELAY
    NUD_PROBE

Let's look at an example of why a derived state is useful in kernel code. When a neighbor instance is removed, the host needs to stop all the pending timers associated with that data structure. Instead of comparing the neighbor's state to the three states known to have a pending timer associated with them, it is just cleaner to define NUD_IN_TIMER and compare the neighbor's state against it using the bitwise operator &.

26.6.2.3. Initial state

When a neighbor instance is created, the NUD_NONE state is assigned to it by default, but the state can also be explicitly set to something different when the creation is caused by an explicit user command (see Chapter 29). As explained in the section "Neighbor Initialization" in Chapter 27, the protocol's constructor method may also change the state depending on the characteristics of the associated device (e.g., point-to-point) and L3 address (e.g., broadcast).

26.6.3. Reachability Confirmation

We saw in the section "Why Static Assignment of Addresses Is Not Sufficient" that it is possible for an L3-to-L2 mapping to change. Because of this, it makes sense to confirm the information stored in the cache regularly, if the information has not been used for some time. This is called reachability confirmation. Note that a change in reachability status is not necessarily due to the reasons listed in the section "Reasons That Neighboring Protocols Are Needed"; a router, bridge, or other network device may just be experiencing some problems.
While the reachability confirmation is in progress, the cached information is temporarily used under the assumption that it is most likely still valid. The three NUD states NUD_STALE, NUD_DELAY, and NUD_PROBE support the task of reachability confirmation. The key reason for the use of these states is that there is no need to start a reachability confirmation process until a packet needs to be sent to the associated neighbor.

Let's define once again the exact meaning of these three NUD states, and then look at the two ways a mapping can be confirmed:

NUD_STALE
The cache contains the address of the neighbor, but the latter has not been confirmed for a certain amount of time (see the discussion of reachable_time in the section "neigh_parms Structure" in Chapter 29). The next time a packet is sent to the neighbor, the reachability verification process will be started.

NUD_DELAY
This state, closely tied to NUD_STALE, represents an optimization that can reduce the number of transmissions of solicitation requests. This state is entered when a packet is sent to a neighbor whose associated entry is in the NUD_STALE state. The NUD_DELAY state represents a window of time where external sources could confirm the reachability of the neighbor. The simplest sort of external confirmation is when the neighbor in question sends a packet, thus indicating that it is running and accessible. This state gives some time to the upper network layers to provide a reachability confirmation, which may relieve the kernel from sending a solicitation request and thus save both bandwidth and CPU usage. This state may look like a small optimization, but if you think in terms of big networks, you can imagine the gain it can provide.
If no confirmation is received, the entry is put into the next state, NUD_PROBE, which resolves the status of the neighbor through explicit solicitation requests or whatever other mechanism a protocol might use.

NUD_PROBE
When the neighbor has been in the NUD_DELAY state for the allotted amount of time and no proof of reachability has been received, its state is changed to NUD_PROBE and the solicitation process starts.

The reachability status of a neighbor can be confirmed in two main ways. As we will see, these two methods do not have the same level of authority. They are:

Confirmation from a unicast solicitation's reply
When your host receives a solicitation reply in answer to a solicitation request it previously sent out, it means that the neighbor received the request and was able to send back a reply; this in turn means that either it already had your L2 address or it learned your address from your request (see the section "Creating a neighbour Entry" in Chapter 27). It also means that there is a working path in both directions. Note, however, that this is true only when the solicitation's reply is sent as a unicast packet. The reception of a broadcast reply would move the state to NUD_STALE rather than NUD_REACHABLE. (You can find more discussion of this from the standpoint of ARP in the section "Processing Ingress ARP Packets" in Chapter 28.)

External confirmation
If your host is sure it received a packet from the neighbor in response to something previously sent, it can assume the neighbor is still reachable. Figure 26-14 shows an example, where the TCP layer of Host A confirms the reachability of Host B when it receives a SYN/ACK in reply to its SYN. Note that if Host B was not a neighbor of Host A, the reception of the SYN/ACK from Host B would confirm the reachability of the next hop gateway used by Host A to reach Host B.

Figure 26-14.
Example of external neighbor reachability confirmation

Confirmation is done via dst_confirm, which confirms the validity of the routing table cache entry used to route the SYN packet toward Host B. dst_confirm is a simple wrapper around neigh_confirm, which accomplishes the task we described earlier: it confirms the reachability of the neighbor and therefore the L3-to-L2 mapping. Note that neigh_confirm only updates the neigh->confirmed timestamp; it will be the neigh_periodic_timer function (which is executed by the expiration of the timer started when the neighbor entered the NUD_DELAY state) that actually upgrades the neighbor entry's state to NUD_REACHABLE. [*]

[*] The delay between the reception of the confirmation from the L4 layer and the setting of the state to NUD_REACHABLE does not affect traffic in any way.

Note that the correlation between the two packets in Figure 26-14 could not be performed at the IP layer because the latter doesn't have any knowledge of data streams. This is why the L4 layer takes care of the confirmation. TCP SYN/ACK exchanges are only one example of an L4 protocol providing external confirmation. Given a socket, and therefore the associated routing cache entry and its next-hop gateway, a user-space application can confirm the reachability of the gateway by using the MSG_CONFIRM option with transmission calls such as send and sendmsg.

While the reception of a solicitation's reply can move the state to NUD_REACHABLE regardless of the current state, external confirmations can be used only when the current state is NUD_STALE. This means that if the entry had just been created and it was in the NUD_INCOMPLETE state, external confirmations would not be allowed to confirm the reachability of the neighbor (see Figure 26-13).
Note that NUD_DELAY/NUD_PROBE and NUD_NONE can lead to NUD_REACHABLE, as shown in Figure 26-13; however, to get from NUD_NONE to NUD_REACHABLE, you need full proof of reachability, while from NUD_DELAY/NUD_PROBE, any kind of confirmation is sufficient.

Chapter 27. Neighboring Subsystem: Infrastructure

In Chapter 26, we saw the main problems that the neighboring protocols are asked to solve. You also learned that the Linux kernel abstracted out parts of the solution into a common infrastructure shared by various neighboring protocols. In this chapter, we will see how the infrastructure is designed. In particular, we will see how protocols interface to the common infrastructure, how caching and proxying are implemented, and how external subsystems such as higher-layer protocols notify the neighboring protocols about interesting events. We will conclude the chapter with a description of how L3 protocols such as IPv4 actually interface with their neighboring protocols, and how queuing is implemented for buffers awaiting address resolution.

27.1. Main Data Structures

To understand the code for the neighboring infrastructure, we first need to describe a few data structures used heavily in the neighboring subsystem, and see how they interact with each other. Most of the definitions for these structures can be found in the file include/net/neighbour.h. Note that the Linux kernel code uses the British spelling neighbour for data structures and functions related to this subsystem.
When speaking generically of neighbors, this book sticks to the American spelling, which is the spelling found in RFCs and other official documents.

struct neighbour
Stores information about a neighbor, such as the L2 and L3 addresses, the NUD state, the device through which the neighbor can be reached, etc. Note that a neighbour entry is associated not with a host, but with an L3 address. There can be more than one L3 address for a host. For example, routers, among other systems, have multiple interfaces and therefore multiple L3 addresses.

struct neigh_table
Describes a neighboring protocol's parameters and functions. There is one instance of this structure for each neighboring protocol. All of the structures are inserted into a global list pointed to by the static variable neigh_tables and protected by the lock neigh_tbl_lock. This lock protects the integrity of the list, but not the content of each entry.

struct neigh_parms
A set of parameters that can be used to tune the behavior of a neighboring protocol on a per-device basis. Since more than one protocol can be enabled on most interfaces (for instance, IPv4 and IPv6), more than one neigh_parms structure can be associated with a net_device structure.

struct neigh_ops
A set of functions that represents the interface between the L3 protocols such as IP and dev_queue_xmit, the API introduced in Chapter 11 and described briefly in the upcoming section "Common Interface Between L3 Protocols and Neighboring Protocols." The virtual functions can change based on the context in which they are used (that is, on the status of the neighbor, as described in Chapter 26).

struct hh_cache
Caches link layer headers to speed up transmission. It is faster to copy a cached header into a buffer in one shot than to fill in its fields one by one. Not all device drivers implement header caching. See the section "L2 Header Caching."
struct rtable
struct dst_entry
When a host needs to route a packet, it first consults its cache and then, in the case of a cache miss, it queries the routing table. Every time the host queries the routing table, the result is saved into the cache. The IPv4 routing cache is composed of rtable structures. Each instance is associated with a different destination IP address. Among the fields of the rtable structure are the destination address, the next hop (router), and a structure of type dst_entry that is used to store the protocol-independent information. dst_entry includes a pointer to the neighbour structure associated with the next hop. I cover the dst_entry data structure in detail in Chapter 36. In the rest of this chapter, I will often refer to dst_entry structures as elements of the routing table cache, even though dst_entry is actually only a field of the rtable structure.

Figure 27-1 shows how dst_entry structures are linked to hh_cache and neighbour structures.

Figure 27-1. Relationship among dst_entry, neighbour, and hh_cache structures

The neighboring code also uses some other small data structures. For instance, struct pneigh_entry is used by destination-based proxying, and struct neigh_statistics is used to collect statistics about neighboring protocols. The first structure is described in the section "Acting As a Proxy," and the second one is described in the section "Statistics" in Chapter 29. Figure 27-2 also includes the following data structure types, described in greater detail in Chapters 22 and 23:

in_device, inet6_dev
Used to store the IPv4 and IPv6 configurations of a device, respectively.

net_device
There is one net_device structure for each network device recognized by the kernel. See Chapter 8.
Figure 27-2 shows the relationships between the most important data structures. Right now it might seem a big mess, but it will make much more sense by the end of this chapter.

Figure 27-2. Data structures' relationships

Here are the main points shown in Figure 27-2:

In the central part of the figure, you can see that each network device has a pointer to a data structure that holds the configuration for each L3 protocol configured on the device. In the example shown in the figure, IPv6 is configured on one device and IPv4 is configured on both. Both the in_device structure (IPv4 configuration) and inet6_dev structure (IPv6 configuration) include a pointer to the configuration used by their neighboring protocols, respectively ARP and ND.

All of the neigh_parms structures used by any given protocol are linked together in a unidirectional list whose root is stored in the protocol's neigh_table structure.

The top and bottom of the figure show that each protocol keeps two hash tables. The first one, hash_buckets, caches the L3-to-L2 mappings resolved by the protocol or statically configured. The second one, phash_buckets, stores those IP addresses that are proxied, as described in the section "Per-Device Proxying and Per-Destination Proxying." Note that phash_buckets is not a cache, so its elements do not expire and don't need confirmation. Each pneigh_entry structure
includes a pointer (not depicted in Figure 27-2) to its associated net_device structure. Figure 27-6 gives more detail on the structure of the cache hash_buckets.

Each neighbour instance is associated with one or more hh_cache structures, if the device supports header caching. The section "L2 Header Caching," and Figures 27-1 and 27-10, give more details about the relationship between neighbour and hh_cache structures.

[...]

... structure," and Table 29-3 in Chapter 29 for information on these variables. Figures 27-7 and 27-8 show the behavior of neigh_periodic_timer and neigh_forced_gc, the two routines described in the next two sections.

27.6.1.1. Synchronous cleanup: the neigh_forced_gc function

Figure 27-7 shows the internals of neigh_forced_gc.

Figure 27-7. neigh_forced_gc function

... Figure 27-3 ...

Figure 27-3. (a) Initialization of neigh->ops; (b) initialization of neigh->output

27.2.1. Initialization of neigh->ops

On certain types of devices, the initialization of the functions listed in Figure 27-3(b) ...

... statically configured and therefore do not expire. Elements are added by neigh_create at the head of the bucket's lists in the hash table.

27.6.1.2. Asynchronous cleanup: the neigh_periodic_timer function

Figure 27-8 shows the internals of neigh_periodic_timer.

Figure 27-8. neigh_periodic_timer function

...
... as shown in Figure 29-1 in Chapter 29. The function can also be used by the neighboring protocols themselves, but it is not the only function that changes state. Figures 27-5(a) and 27-5(b) show a high-level description of neigh_update's internals. The flowchart is divided into different areas, each area taking care of a different task:

Sanity checks
Changes applied to a neighbor whose current state is ...

... distributes elements into buckets are the L3 address, the associated device, and a random value that is recomputed regularly to reduce the effectiveness of a hypothetical Denial of Service (DoS) attack. Figure 27-6 shows the structure of the cache. In Figure 27-2, you can see its relationship to other key data structures, such as the per-protocol neigh_table structure. Hash tables are allocated and freed with ...

... the number of neighbour entries used by the host.
Frees the neighbour data structure (i.e., gives it back to its memory pool).

27.6.1. Garbage Collection

Garbage collection refers to the process of eliminating resources that are not in use anymore. Like many Linux kernel subsystems (networking and others), the neighboring subsystem maintains a timer that runs periodically and executes a function whenever ...

27.2. Common Interface Between L3 Protocols and Neighboring Protocols

The Linux kernel has a generic neighboring layer that connects L3 protocols to the main L2 transmit function (dev_queue_xmit) via a virtual function table (VFT). A VFT is the mechanism frequently used in the Linux kernel for allowing subsystems to use different functions ...
... the earlier section "Creating a neighbour Entry," and the description of arp_process in Chapter 28. In a medium-size network, the thresholds are pretty safe and the cache is not likely to overflow. The routine invoked to do synchronous cleanup is neigh_forced_gc, which is depicted in Figure 27-7. neigh_forced_gc removes all of the eligible elements from the hash table. Eligible elements are the ones that ...

... protocols (e.g., neigh_event_send

[*] The routines used to compare timestamps, such as time_after_eq and time_before_eq, are defined in include/linux/jiffies.h.

[ ] Part of neigh_event_send is also depicted in Figure 27-13 as part of the expanded neigh_resolve_output flowchart.

27.2.2.1. Common state changes: neigh_connect and neigh_suspect

The main ways a neighbor can enter the NUD_REACHABLE state (all described ...

Figure 27-5a. neigh_update function

27.2.3.2. Initial neigh_update operations

In this section, we trace the decisions made by neigh_update as it handles various values for the current state (neighbour->nud_state) and the ...

Date posted: 13/08/2014, 04:21

Table of contents

  • Understanding Linux Network Internals

  • Table of Contents

  • Copyright

  • Preface

    • The Audience for This Book

    • Background Information

    • Organization of the Material

    • Conventions Used in This Book

    • Using Code Examples

    • We'd Like to Hear from You

    • Safari Enabled

    • Acknowledgments

  • Part I: General Background

    • Chapter 1. Introduction

      • Section 1.1. Basic Terminology

      • Section 1.2. Common Coding Patterns

      • Section 1.3. User-Space Tools

      • Section 1.4. Browsing the Source Code

      • Section 1.5. When a Feature Is Offered as a Patch

    • Chapter 2. Critical Data Structures

      • Section 2.1. The Socket Buffer: sk_buff Structure

      • Section 2.2. net_device Structure

      • Section 2.3. Files Mentioned in This Chapter

    • Chapter 3. User-Space-to-Kernel Interface

      • Section 3.1. Overview
