Understanding Linux Network Internals 2005, part 8


29.2. Tuning via /proc Filesystem

As we saw in an earlier chapter, the neighboring protocols follow the common kernel practice of offering a convenient interface in the /proc directory to let administrators tune the subsystem's parameters. The neighboring subsystem's parameters reside in four directories, two for IPv4 and two for IPv6:

/proc/sys/net/ipv4/neigh
/proc/sys/net/ipv6/neigh
    Generic parameters of the neighboring subsystem, such as the timers used to control when cache operations take place

/proc/sys/net/ipv4/conf
/proc/sys/net/ipv6/conf
    Particular behaviors within the protocol, such as the ones described in the section "Tunable ARP Options" in Chapter 28

Each directory contains a subdirectory for each NIC device on the system, a default subdirectory, and (in the case of the conf directory) an all subdirectory that can be used to apply a change to all the devices at once. Under conf, the default subdirectory shows the global status of each feature, while under neigh, the default subdirectory shows the default setting (i.e., configuration parameters) of each feature. The values of the default subdirectories are used to initialize the per-device subdirectories when the latter are created.

The directories for individual devices take precedence over the more general directories. But not all devices pay attention to all the parameters; if a parameter is not relevant to a device, the associated directory contains a file for the parameter but the kernel ignores it. For instance, the gc_thresh1 value is not used by any protocol, and only IPv4 uses locktime.

Figure 29-3 shows the layout of the files and the routines that register them.

The three files arp, arp_cache, and ndisc_cache at the top-right corner of Figure 29-3 are not used to configure anything, but just to export read-only data. Note that they are in the /proc/net directory, not in /proc/sys.
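The directory layout described above can be sketched from user space as a small path-building helper. This is only an illustration under stated assumptions: neigh_proc_path and read_int_tunable are hypothetical names invented here, not kernel or libc functions, and the check on "all" simply encodes the statement above that only the conf directories have an all subdirectory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a kernel or libc API): build the path of a
 * neighboring-subsystem tunable under /proc/sys/net.
 *   family: "ipv4" or "ipv6"
 *   klass:  "neigh" or "conf"
 *   subdir: a device name such as "eth0", or "default", or "all"
 * Returns 0 on success, -1 on invalid input or if buf is too small. */
int neigh_proc_path(char *buf, size_t len, const char *family,
                    const char *klass, const char *subdir, const char *param)
{
    int n;

    if (strcmp(family, "ipv4") != 0 && strcmp(family, "ipv6") != 0)
        return -1;
    if (strcmp(klass, "neigh") != 0 && strcmp(klass, "conf") != 0)
        return -1;
    /* Per the text, only the conf directories have an "all" subdirectory. */
    if (strcmp(subdir, "all") == 0 && strcmp(klass, "conf") != 0)
        return -1;

    n = snprintf(buf, len, "/proc/sys/net/%s/%s/%s/%s",
                 family, klass, subdir, param);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read one integer tunable. Returns 0 on success,
 * -1 if the file is missing or unreadable (for example, when the device
 * has no such directory or the program runs on a non-Linux system). */
int read_int_tunable(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A caller might build the path for ("ipv4", "neigh", "eth0", "gc_stale_time") and, if read_int_tunable fails for that device, retry with "default" in place of "eth0". That fallback is a convenience of this sketch, not something the kernel does on the reader's behalf.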
/proc/net/arp is used by the arp command to dump the contents of the ARP cache (there is no counterpart for ND), as discussed in the section "Old-Generation Tool: net-tools's arp Command." The /proc/net/stat/xxx_cache files export statistics about the protocol caches. Most of their files represent fields of neigh_statistics structures, described in the section "neigh_statistics Structure."

29.2.1. The /proc/sys/net/ipv4/neigh Directory

This directory contains parameters from neigh_parms structures, which were introduced in Chapter 27. As that chapter explained, each device has one neigh_parms structure for each neighboring protocol that it interacts with (see Figure 27-2 in Chapter 27). We have also seen that another neigh_parms instance is included in the neigh_table structure to store default values.

However, not all fields of the neigh_parms structure are exported to /proc. For instance, reachable_time is a derived field whose value is indirectly calculated from base_reachable_time and therefore cannot be changed by the user. In addition, tbl and neigh_setup are used by the kernel to organize its data structures and do not have anything to do with the protocol itself, so they are not exported.

In addition to exporting most of the parameters in the neigh_parms structure to /proc, the neighboring subsystem exports a few from the neigh_table structure, too.

29.2.1.1. Initialization of global and per-device directories

Because the default values are provided by the protocol itself, the default subdirectory is installed when the protocol is initialized (see the arp_init and ndisc_init functions) and populated with files whose names are based on those of the associated fields in the neigh_parms structure.
You can find the default values of the fields in Table 29-3 directly in the initializations of the xxx_tbl tables; Chapter 28 shows an example for ARP.

Figure 29-3. Example of /proc/sys file registration for the neighboring subsystem

The relationships between the kernel variables and the names of the files in /proc/sys/net/ipv4/neigh/xxx/ are shown in Table 29-3. See the initialization of neigh_sysctl_template in net/core/neighbour.c; a guide to reading the template is in Chapter 3.

Table 29-3. Kernel variables and associated files in /proc/sys/net/ipv4/neigh subdirectories

Kernel variable name   Filename                 Default value for IPv4/IPv6
mcast_probes           mcast_solicit            3
ucast_probes           ucast_solicit            3
app_probes             app_solicit              0
retrans_time           retrans_time             1 * HZ
base_reachable_time    base_reachable_time      30 * HZ
delay_probe_time       delay_first_probe_time   5 * HZ
gc_staletime           gc_stale_time            60 * HZ
queue_len              unres_qlen               3
proxy_qlen             proxy_qlen               64
anycast_delay          anycast_delay            1 * HZ
proxy_delay            proxy_delay              (8*HZ)/10
locktime               locktime                 1 * HZ
gc_interval            gc_interval              30 * HZ
gc_thresh1             gc_thresh1               128
gc_thresh2             gc_thresh2               512
gc_thresh3             gc_thresh3               1,024

Each device's directories are created when the device is first configured. The first time an address is configured on device D, a directory with the name D is created under /proc/sys/net/ipv4/neigh. All of the parameters apply to the device rather than to a specific address, so there is only a single directory for each device, even if it is configured with multiple addresses.
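The name mapping in Table 29-3 can be written down as a small lookup table, which is handy when moving between kernel source and the files under /proc. The following is a minimal sketch: the name pairs are copied from Table 29-3, while struct neigh_name_map and neigh_field_to_file are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* One row of the mapping: kernel field name -> /proc file name. */
struct neigh_name_map {
    const char *field;
    const char *file;
};

/* Pairs copied from Table 29-3. Only the fields whose file name differs
 * from the field name are listed; every other field in the table is
 * exported under its own name. */
static const struct neigh_name_map neigh_renamed[] = {
    { "mcast_probes",     "mcast_solicit" },
    { "ucast_probes",     "ucast_solicit" },
    { "app_probes",       "app_solicit" },
    { "delay_probe_time", "delay_first_probe_time" },
    { "gc_staletime",     "gc_stale_time" },
    { "queue_len",        "unres_qlen" },
};

/* Hypothetical helper: return the /proc file name for a kernel field.
 * It assumes the caller passes a field that appears in Table 29-3 and
 * does not validate unknown names. */
const char *neigh_field_to_file(const char *field)
{
    size_t i;

    for (i = 0; i < sizeof(neigh_renamed) / sizeof(neigh_renamed[0]); i++)
        if (strcmp(neigh_renamed[i].field, field) == 0)
            return neigh_renamed[i].file;
    return field; /* same name in the structure and in /proc */
}
```

For example, neigh_field_to_file("queue_len") yields "unres_qlen", while a field such as locktime is returned unchanged.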
Figure 29-3 shows the directory tree you would see if a host had three devices named eth0, eth1, and eth2; if eth0 and eth1 had been given IPv4 addresses; if eth0 had also been given an IPv6 address; and if eth2 had not been configured yet.

The two functions in charge of configuring IPv4 and IPv6 devices are inetdev_init and ip6_add_dev, respectively. Each calls neigh_sysctl_register to create the device's subdirectory under /proc, as described in the following section.

29.2.1.2. Directory creation

Both the default and the per-device directories in /proc/sys/net/ipv4/neigh are created with the neigh_sysctl_register function. The latter differentiates between the two cases by using the value of the input parameter dev. If we take IPv4 as an example, you can compare the way arp_init (a protocol initialization function) and inetdev_init (a device's configuration block initializer) call neigh_sysctl_register.

neigh_sysctl_register needs to differentiate between the two cases to:

    Pick the name of the directory to create. It will be default when dev is NULL, and extracted from the device itself (dev->name) otherwise.

    Decide what parameters to add as files to that directory; the default directory will include a few more parameters than the others (four to be exact).

While the parameters extracted from neigh_parms are meaningful when configured on a per-device basis, the ones in neigh_table are not. Thus, the four parameters taken from neigh_table go only in the default directory (see the end of Table 29-3). Those four parameters are related to the garbage collection process:

    gc_interval
    gc_thresh1, gc_thresh2, gc_thresh3

Here is the meaning of the input parameters to neigh_sysctl_register:

struct net_device *dev
    Device associated with the directory being created. When dev is NULL, it means the function has been invoked to create the default directory.

struct neigh_parms *p
    Structure whose parameters will be exported. A device using ARP, for instance, passes in_dev->arp_parms. When dev is NULL, this is the neigh_parms instance embedded in the protocol's neigh_table structure (neigh_table->neigh_parms), which stores the protocol's defaults.

int p_id
    Protocol identifier. See the NET_XXX values in include/linux/sysctl.h. ARP, for instance, uses NET_IPV4.

int pdev_id
    Class identifier of parameters being exported. See the NET_IPV4_XXX values in include/linux/sysctl.h. ARP, for example, uses NET_IPV4_NEIGH.

char *p_name
    String indicating the L3 protocol that refers to the neighboring protocol fields. ARP, for example, uses "ipv4".

proc_handler *handler
    Function that the kernel invokes when the value of one of the exported fields is modified by the user. Only IPv6 passes a non-NULL value, and the function it provides is simply a wrapper to the default handler that the kernel would install otherwise. See ndisc_ifinfo_sysctl_change in net/ipv6/ndisc.c for an example.

The only tricky part in the function is how the four gc_xxx parameters are extracted from the neigh_table structure. It relies on a trick of memory layout: the four parameters related to garbage collection are stored in the neigh_table structure right after the neigh_parms structure, as shown here:

    struct neigh_table
    {
        /* ... other fields ... */
        struct neigh_parms parms;
        int gc_interval;
        int gc_thresh1;
        int gc_thresh2;
        int gc_thresh3;
        /* ... other fields ... */
    };

Thus, all the function needs to do to retrieve the neigh_table values is to go past neigh_parms, cast the pointer to an integer pointer, and extract four integers in a row:

    if (dev) {
        /* Per-device directory: zero entry 12 so the table ends before the gc_xxx entries */
        dev_name_source = dev->name;
        t->neigh_dev[0].ctl_name = dev->ifindex;
        memset(&t->neigh_vars[12], 0, sizeof(ctl_table));
    } else {
        /* default directory: the four gc_xxx integers follow *p inside neigh_table */
        t->neigh_vars[12].data = (int *)(p + 1);
        t->neigh_vars[13].data = (int *)(p + 1) + 1;
        t->neigh_vars[14].data = (int *)(p + 1) + 2;
        t->neigh_vars[15].data = (int *)(p + 1) + 3;
    }

29.2.2. The /proc/sys/net/ipv4/conf Directory

The files in the /proc/sys/net/ipv4/conf subdirectories are associated with the fields of the ipv4_devconf structure, which is defined in include/linux/inetdevice.h. Not all of its fields are used by the neighboring protocols (see Chapters 23 and 36 for the other fields). Table 29-4 lists the parameters relevant to the neighboring protocols; their meanings were described in the section "Tunable ARP Options" in Chapter 28.

Table 29-4. Kernel variables and associated files in /proc/sys/net/ipv4/conf subdirectories

Kernel variable name        Filename       Default value for IPv4/IPv6
ipv4_devconf.arp_announce   arp_announce   0
ipv4_devconf.arp_filter     arp_filter     0
ipv4_devconf.arp_ignore     arp_ignore     0
ipv4_devconf.medium_id      medium_id      0
ipv4_devconf.proxy_arp      proxy_arp      0

As shown in Figure 29-3, in addition to the per-device subdirectories, there are also two special ones named default and all. See Chapter 36 for more details.

29.3. Data Structures Featured in This Part of the Book

In the section "Main Data Structures" in Chapter 27, we had a brief overview of the main data structures used by the neighboring subsystem. This section presents a detailed description of each data structure's fields. Figure 29-4 shows the files that define each data structure.
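Before the field-by-field descriptions, it may help to see how much the code in Section 29.2.1.2 depends on structure layout. The sketch below reproduces the (int *)(p + 1) trick outside the kernel. Everything prefixed with mock_ is an assumption made for illustration: the real neigh_parms and neigh_table structures have many more fields, and only the relative position of parms and the four gc_xxx integers is taken from the text.

```c
#include <stddef.h>

/* Mock types for illustration only; not the kernel definitions. */
struct mock_neigh_parms {
    int base_reachable_time;
    int retrans_time;
};

struct mock_neigh_table {
    int family;                      /* stands in for the fields before parms */
    struct mock_neigh_parms parms;
    int gc_interval;                 /* must directly follow parms */
    int gc_thresh1;
    int gc_thresh2;
    int gc_thresh3;
};

/* The trick is valid only if the compiler inserts no padding after parms,
 * so the sketch checks that assumption at compile time (C11). */
_Static_assert(offsetof(struct mock_neigh_table, gc_interval) ==
               offsetof(struct mock_neigh_table, parms) +
               sizeof(struct mock_neigh_parms),
               "gc_interval must immediately follow parms");

/* Given a pointer to the embedded parms, return a pointer to the idx-th
 * gc_xxx value (0 = gc_interval ... 3 = gc_thresh3), using the same
 * pointer arithmetic as the kernel snippet: step past parms, then treat
 * what follows as an array of int. */
int *mock_gc_field(struct mock_neigh_parms *p, int idx)
{
    return (int *)(p + 1) + idx;
}
```

Stepping past the end of an embedded member in this way is a layout-dependent idiom rather than strictly portable ISO C; it works here only because the four integers are declared immediately after parms, which is why reordering those fields would silently break the sysctl registration. The field descriptions that follow refer back to the files in Figure 29-4.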
The files drawn in a lighter color in Figure 29-4 are not part of the neighboring subsystem, but I referred to them in this part of the book.

Figure 29-4. Distribution of data structures in kernel files

29.3.1. neighbour Structure

Neighbors are represented by struct neighbour structures. The structure is complex and includes status fields, virtual functions to interface with L3 protocols, timers, and cached L2 headers. Here is a field-by-field description:

struct neighbour *next
    Each neighbour entry is inserted in a hash table. next links the structure to the other ones that collide and share the same bucket. Elements are always inserted at the head of the list (see the section "Creating a neighbour Entry," and Figure 27-2 in Chapter 27).

struct neigh_table *tbl
    Pointer to the neigh_table structure that defines the protocol associated with this entry. If the neighbor is an IPv4 address, for instance, tbl points to arp_tbl.

struct neigh_parms *parms
    Parameters used to tune the neighboring protocol behavior. When a neighbour structure is created, parms is initialized with the values of the default neigh_parms structure embedded in the protocol's associated neigh_table structure. When the protocol's constructor method is called by neigh_create (e.g., arp_constructor for ARP), that block is replaced with the configuration block of the associated device, if any. While most devices use the system defaults, a device can start up with different parameters or be configured by the administrator later to use different parameters, as discussed earlier in this chapter.

struct net_device *dev
    The device through which the neighbor is reachable. Only one device can be used to reach each neighbor.
    Thus, the value NULL never appears here as it does in other kernel subsystems that use it as a wildcard to refer to all devices.

unsigned long confirmed
    Timestamp (in jiffies) when the reachability of the entry was most recently confirmed. L4 protocols can update it with neigh_confirm (see Figure 26-14 in Chapter 26). The neighboring infrastructure updates it in neigh_update.

unsigned long updated
    Timestamp of the most recent time the entry was updated by neigh_update (the only exception is the first initialization by neigh_alloc). Do not confuse updated and confirmed, which keep track of very different things. The updated field is set when the state of a neighbor changes, whereas the confirmed field merely records one particular change of state: the one that occurs when the entry was most recently confirmed to be valid.

unsigned long used
    Most recent time the entry was used. Its value is not always updated synchronously with the data transmissions. When the entry is not in the NUD_CONNECTED state, this field is updated by neigh_event_send, which is called by neigh_resolve_output. In contrast, when the entry is in the NUD_CONNECTED state, its value is sometimes updated by neigh_periodic_timer to the time the entry's reachability was most recently confirmed.

__u8 flags
    Possible values for this field are listed in include/linux/rtnetlink.h and include/net/neighbour.h:

    #define NTF_PROXY 0x08
        When the ip neigh user-space command is used to add entries to the proxy tables (for instance, ip neigh add proxy 10.0.0.2 dev eth0), this flag is set in the data structure sent to the kernel, to let the kernel handler neigh_add know that the new entry has to be added to the proxy table (see the section "System Administration of Neighbors").
    #define NTF_ROUTER 0x80
        This flag is used only by IPv6. When set, it means the neighbor is a router. Unlike NTF_PROXY, this flag is not set by user-space tools. The IPv6 neighbor discovery code updates its value when receiving information from the neighbor.

__u8 nud_state
    Indicates the entry's state. The possible values are defined in include/net/neighbour.h and include/linux/rtnetlink.h with names of the form NUD_XXX. The role of states is described in the section "Transitions Between NUD States" in Chapter 26. Figure 26-13 in Chapter 26 shows how the state changes depending on various events.

__u8 type
    This parameter is set when the entry is created with neigh_create by calling the protocol constructor method (e.g., arp_constructor for ARP). Its value is used in various circumstances, such as to decide what value to give nud_state. type can assume the values in Table 36-12 in Chapter 36, listed in include/linux/rtnetlink.h. In the context of this chapter, not all of the values of that table are actually used: we are mostly interested in RTN_UNICAST, RTN_LOCAL, RTN_BROADCAST, RTN_ANYCAST, and RTN_MULTICAST. Given an IPv4 address (such as the L3 address associated with a neighbour entry), the inet_addr_type function finds the associated RTN_XXX value (see Chapter 28). For IPv6, there is a similar function called ipv6_addr_type.

__u8 dead
    When dead is set to 1 it means the structure is being removed and cannot be used anymore. See neigh_ifdown in the section "External Events" in Chapter 32, and neigh_forced_gc and neigh_periodic_timer for examples of usage.

atomic_t probes
    Number of failed solicitation attempts. Its value is checked by the neigh_timer_handler timer, which puts the neighbour entry into the NUD_FAILED state when the number of attempts reaches the maximum allowed value.

rwlock_t lock
    Used to protect the neighbour structure from race conditions.
unsigned char ha[]
    The L2 address (e.g., Ethernet MAC address for Ethernet NICs) associated with the L3 address represented by primary_key (discussed shortly). The address is in binary format. The size of the vector ha is MAX_ADDR_LEN (defined as 32 in include/linux/netdevice.h), rounded up to a multiple of the size of a C long. An Ethernet address requires only six octets (i.e., 48 bits), but other link layer protocols may require more. For each hardware address type, the kernel defines a symbol that is assigned the size of the address. Most symbols use names like XXX_ALEN or XXX_ADDR_LEN. Ethernet, for example, defines the ETH_ALEN symbol in include/linux/if_ether.h.

struct hh_cache *hh
    List of cached L2 headers. See the section "L2 Header Caching" in Chapter 27.

atomic_t refcnt
    Reference count. See the sections "Caching" and "Reference Counts on neighbour Structures" in Chapter 27.

int (*output)(struct sk_buff *skb)
    Function used to transmit frames to the neighbor. The actual routine this function pointer points to can change several times during the structure's lifetime, depending on several factors. It is first initialized by the neigh_table's constructor method (see the section "Initialization of a neighbour Structure" in Chapter 28). It can be updated by calling neigh_connect or neigh_suspect when the neighbor state goes to NUD_REACHABLE or NUD_STALE state, respectively.

struct sk_buff_head arp_queue
    Packets whose destination L3 address has not been resolved yet are temporarily placed into this queue. Despite the name of this field, it can be used by all neighboring protocols, not just ARP. See the section "Egress Queuing" in Chapter 27.

struct timer_list timer
    Timer used to handle several tasks. See the section "Timers" in Chapter 15.
struct neigh_ops *ops
    VFT containing the methods used to manipulate the neighbour entry. Among the methods, for instance, are several used to transmit packets, each optimized for a different state or associated device type. Each protocol provides three or four different VFTs; which is used for a specific neighbour entry depends on the type of L3 address, the type of associated device, and the type of link (e.g., point-to-point). See the upcoming section "neigh_ops Structure," and the section "Initialization of neigh->ops" in Chapter 27.

u8 primary_key[0];
    L3 address of the neighbor. It is used as the key by the cache lookup functions. It is an IPv4 address for ARP entries and an IPv6 address for neighbor discovery entries.

29.3.2. neigh_table Structure

[...]

... 30-2, traffic for any host outside the 10.0.0.0/24 network (designated by 0.0.0.0/0) is sent to the gateway on 10.0.0.1. For hosts on the 10.0.0.0/24 network, the neighboring subsystem described in Part VI is used. Regardless of the role played by a host in the network, each host maintains a routing table that it consults whenever it needs to handle network traffic, both when sending and receiving. Routers ...

... next door or halfway around the world. When more than one network is involved, the L3 layer is responsible for figuring out the most efficient route (so far as that is feasible) and for directing the message toward the next system along that route, also called the next hop. This process is called routing, and it plays a central role in the Linux networking code. Here is what is covered in each chapter: ...
... a cached L2 header is struct hh_cache, defined in include/linux/netdevice.h. (The name comes from "hardware header.") The following is a description of its fields; the section "L2 Header Caching" in Chapter 27 describes how it is used.

unsigned short hh_type
    Protocol associated with the L3 address (see the ETH_P_XXX values in the file include/linux/if_ether.h).

struct hh_cache *hh_next
    More than one cached ...

... 27). Look at shaper_neigh_setup in drivers/net/shaper.c for an example. To see when this initialization function is called during the initialization phase of a new neighbour instance, see Figure 28-11 in Chapter 28. Do not confuse this virtual function with net_device->neigh_setup. The latter is called when the first L3 address is configured on a device, and normally initializes neigh_parms->neigh_setup, ...

... functionality that a basic router, and therefore the Linux kernel, must provide.

Chapter 31 Routing: Advanced
    Introduces optional features the user can enable to configure routing in more complex scenarios. Among them we will see policy routing and multipath routing. We will also look at the other subsystems routing interacts with.

Chapter 32 Routing: Linux Implementation
    Gives you an overview of the main ...

... Concepts

Figure 30-1 shows where the routing subsystem (the gray box) fits into the network stack. The figure does not include all the details (Netfilter, bridging, etc.) but shows the other major kernel subsystems that are traversed before and after routing.

Figure 30-1. Relationship between the routing subsystem and the other main network subsystems

To explain some of the features or the details of their implementation, ...
... routing will focus on IPv4 networks. However, I will point out the aspects of IPv6 that differ significantly.

30.1. Routers, Routes, and Routing Tables

In its simplest form, a router can be defined as a network device that is equipped with more than one network interface card (NIC), and [*] that uses its knowledge of the network to forward ingress traffic appropriately.

[*] Unlike IPv4, IPv6 explicitly defines the router role by using a special flag in the IP header.

The information required ...

... ping-pong effects that can take place, for instance, when more than one proxy ARP server is present on the same network segment and all of them reply to the same query solicitations with conflicting addresses. Details of this behavior are discussed in the section "Final Common Processing" in Chapter 28.

int dead
    Boolean flag that is set to mark the neighbor instance as "Being removed." See neigh_parms_release ...

... all, they need more knowledge about how to reach remote networks, and the nonrouter hosts depend on them for that. The routing protocols are beyond the scope of this book. The routing capabilities required by hosts may be reduced even further under specific scenarios, such as the one described in the section "Proxy ARP Server as Router" in Chapter 28. In this chapter, however, we will stick to the common ...

Date posted: 13/08/2014, 04:21


Table of contents

  • Understanding Linux Network Internals

  • Table of Contents

  • Copyright

  • Preface

    • The Audience for This Book

    • Background Information

    • Organization of the Material

    • Conventions Used in This Book

    • Using Code Examples

    • We'd Like to Hear from You

    • Safari Enabled

    • Acknowledgments

  • Part I: General Background

    • Chapter 1. Introduction

      • Section 1.1. Basic Terminology

      • Section 1.2. Common Coding Patterns

      • Section 1.3. User-Space Tools

      • Section 1.4. Browsing the Source Code

      • Section 1.5. When a Feature Is Offered as a Patch

    • Chapter 2. Critical Data Structures

      • Section 2.1. The Socket Buffer: sk_buff Structure

      • Section 2.2. net_device Structure

      • Section 2.3. Files Mentioned in This Chapter

    • Chapter 3. User-Space-to-Kernel Interface

      • Section 3.1. Overview
