Understanding Linux Network Internals (2005), Part 6


IPSTATS_MIB_OUTREQUESTS
Number of packets that the system tried to transmit (successfully or not), not including forwarded packets. This field is updated in ip_output (and in ip_mc_output for multicast).

IPSTATS_MIB_OUTDISCARDS
Number of packets whose transmission failed. This field is updated in several places, including ip_append_data, ip_push_pending_frames, and raw_send_hdrinc.

IPSTATS_MIB_OUTNOROUTES
Number of locally generated packets discarded because there was no route to transmit them. Normally this field is updated after a failure of ip_route_output_flow. ip_queue_xmit is one of the functions that can update it.

IPSTATS_MIB_OUTMCASTPKTS
Number of transmitted multicast packets. Not used by IPv4 at the moment.

Fields related to defragmentation

IPSTATS_MIB_REASMTIMEOUT
Number of packets that failed defragmentation because some of the fragments were not received in time. The value reflects the number of complete packets, not the number of fragments. This field is updated in ip_expire, which is the timer function executed when an IP fragment list is dropped due to a timeout. Note that this counter is not used as defined in the two RFCs mentioned at the beginning of this section.

IPSTATS_MIB_REASMREQDS
Number of fragments received (and therefore the number of attempted reassemblies). This field is updated in ip_defrag.

IPSTATS_MIB_REASMFAILS
Number of packets that failed defragmentation. This field is updated in several places (__ip_evictor, ip_expire, ip_frag_reasm, and ip_defrag) for different reasons.

IPSTATS_MIB_REASMOKS
Number of packets successfully defragmented. This field is updated in ip_frag_reasm.

Fields related to fragmentation

IPSTATS_MIB_FRAGFAILS
Number of failed fragmentation efforts. This field is updated in ip_fragment (and in ipmr_queue_xmit for multicast).

IPSTATS_MIB_FRAGOKS
Number of fragments transmitted. This field is updated in ip_fragment.

IPSTATS_MIB_FRAGCREATES
Number of fragments created. This field is updated in ip_fragment.

The values of these counters are exported in the /proc/net/snmp file.

Each CPU keeps its own accounting information about the packets it processes. Furthermore, it keeps two counters: one for events in interrupt context and the other for events outside interrupt context. Therefore, the ip_statistics array includes two elements per CPU, one for interrupt context and one for noninterrupt context. Not all of the events can happen in both contexts, but to keep things simple and clear, the vector has been defined at double size; the elements that do not make sense in one of the two contexts are simply left unused.

Because some pieces of code can be executed both in interrupt context and outside interrupt context, the kernel provides three different macros to add an event to the IP statistics vector:

    #define IP_INC_STATS(field)      SNMP_INC_STATS(ip_statistics, field)
    #define IP_INC_STATS_BH(field)   SNMP_INC_STATS_BH(ip_statistics, field)
    #define IP_INC_STATS_USER(field) SNMP_INC_STATS_USER(ip_statistics, field)

The first can be used in either context, because it checks internally whether it was called in interrupt context and updates the right element accordingly. The second and the third macros are to be used for events that happened in and outside interrupt context, respectively. The macros IP_INC_STATS, IP_INC_STATS_BH, and IP_INC_STATS_USER are defined in include/net/ip.h, and the three associated SNMP_INC_XXX macros are defined in include/net/snmp.h.
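To make the layout of the statistics vector more concrete, here is a minimal user-space sketch in C with one set of counters per context and per CPU. It is not the kernel's implementation: the names mib_sketch, mib_inc, mib_fold, and NR_CPUS_SKETCH are invented for illustration, and the caller passes the CPU and the context explicitly, whereas the kernel macros work them out on their own. The fold step assumes that whoever reads a counter wants one value per field, obtained by summing every slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, user-space model of the layout described above:
 * one block of counters per execution context and per CPU. */
#define NR_CPUS_SKETCH 4

enum ctx { CTX_BH = 0, CTX_USER = 1, CTX_MAX };
enum mib_field { MIB_OUTREQUESTS, MIB_OUTDISCARDS, MIB_FRAGOKS, MIB_MAX };

struct mib_sketch {
    unsigned long counters[CTX_MAX][NR_CPUS_SKETCH][MIB_MAX];
};

/* Record one event in the slot owned by (context, cpu). */
static void mib_inc(struct mib_sketch *m, enum ctx c, int cpu,
                    enum mib_field f)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    m->counters[c][cpu][f]++;
}

/* Sum one field over both contexts and all CPUs. */
static unsigned long mib_fold(const struct mib_sketch *m, enum mib_field f)
{
    unsigned long sum = 0;

    for (int c = 0; c < CTX_MAX; c++)
        for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
            sum += m->counters[c][cpu][f];
    return sum;
}
```

With this layout an increment only ever touches the slot that belongs to the current CPU and context, so two writers do not share a counter; the price is a summation each time a value is read.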
23.4. IP Configuration

The Linux IP protocol can be tuned and configured manually by a system administrator in different ways. This tuning covers both the protocol itself and device configuration. The four main interfaces are:

ioctl calls made via ifconfig
ifconfig is the older Unix-legacy tool for configuring IP on network devices.

RTNetlink via ip
ip, which is part of the IPROUTE2 package, is the newer tool that Linux offers for configuring IP on network devices.

/proc filesystem
Protocol behavior can be tuned via a collection of files in the directory /proc/sys/net/ipv4.

RARP/BOOTP/DHCP
These three protocols can be used to dynamically assign an IP configuration to a host and its interfaces.

The last set of protocols in the preceding list has an interesting twist. They are normally implemented in user space, but Linux also has a simple kernel-space implementation that is useful when used together with the nfsroot boot option. The latter allows the kernel to mount the root directory (/) via NFS. To do that, it needs an IP configuration at boot time, before the system is able to initialize the IP configuration from user space (which, by the way, could be stored in a remote partition and not even be available to the system when it mounts the root directory). Via kernel boot options, it is possible to give nfsroot a static configuration, or to specify which protocols (yes, more than one can be used concurrently) to use to obtain the configuration.

The IP configuration code is in net/ipv4/ipconfig.c, and the one used by nfsroot is in fs/nfs/nfsroot.c. The two files cross-reference variables and functions, but they are actually simple to read. We will not cover them, because network filesystems and user-space clients are outside the scope of this book. Once you know how to read __setup macros (described in Chapter 7), reading the code should become a piece of cake. It is clear and well commented.

The third item in the list, /proc, is covered later in the section "Tuning via /proc Filesystem." In this section, I will say a bit about the kernel interfaces that support the behavior of the first two items, ifconfig and ip. The purpose here is not to cover the internals of the user-space commands or the associated kernel counterparts that handle configuration requests. It is to show how user space and kernel space communicate, and which kernel functions are invoked in response to a user-space command.

23.4.1. Main Functions That Manipulate IP Addresses and Configuration

In net/ipv4/devinet.c, you can find several functions that can be used to add an IP address to a network interface, delete an address from an interface, modify an address, retrieve the IP configuration of a device given its device index or net_device data structure, etc. Here I introduce only a few of the functions, to help you understand the ones described later when we talk about the ip and ifconfig user-space tools.

Before reading these descriptions, it would be worthwhile reviewing the key data structures used by the IP layer, introduced in Chapter 19 and described in detail later in this chapter. For instance, a single IP address is represented by an in_ifaddr structure and the complete IPv4 configuration of a device by an in_device structure.

inetdev_init and inetdev_destroy
inetdev_init is invoked when the first IP configuration is applied to a device. It allocates the in_device structure and links it to the associated net_device instance. It also creates a directory in /proc/sys/net/ipv4/conf/ (see the section "Tuning via /proc Filesystem"). The IP configuration can be removed with inetdev_destroy, which simply undoes whatever was done in inetdev_init and also removes all of the linked in_ifaddr structures. The latter are removed with inet_free_ifa, which also decrements the reference count on the in_device structure with in_dev_put. When the last reference is released, probably with the last call to inet_free_ifa, the in_device instance is freed with in_dev_finish_destroy.

inet_alloc_ifa and inet_free_ifa
These two functions allocate and free, respectively, an in_ifaddr data structure. A new one is allocated when a user adds a new address to an interface. A deletion can be triggered by the removal of a single address, or by the removal of a device's entire IP configuration at once. Both routines use the read-copy update (RCU) mechanism as a means to enforce mutual exclusion.

inet_insert_ifa and inet_del_ifa
inet_insert_ifa adds a new in_ifaddr structure to the list within in_device. It detects duplicates and marks the address as secondary if it finds that the address falls within another address's subnet. Suppose, for instance, that eth0 already had the address 10.0.0.1/24. When a new 10.0.0.2/24 address is added, it will be recognized as secondary with respect to the first. Primary addresses are also used to feed the entropy of the kernel random number generator with net_srandom. More information on primary and secondary addresses can be found in Chapter 30.

inet_del_ifa simply removes an in_ifaddr structure from the associated in_device instance. If the address is primary, all of the associated secondary addresses are removed too, unless the administrator has explicitly configured the device, via its /proc/sys/net/ipv4/conf/dev_name/promote_secondaries file, not to remove them. In that case, a secondary address is promoted to primary when the associated primary address is removed. Given the in_device instance, this configuration can be accessed with the IN_DEV_PROMOTE_SECONDARIES macro. The inet_del_ifa function accepts an extra input parameter that tells whether the in_device structure should be freed when the last in_ifaddr instance has been removed. While it is normal to remove the empty in_device structure, sometimes a caller might not do it, such as when it knows it is going to add a new in_ifaddr soon.

In both cases, addition and deletion, successful completion leads to a Netlink broadcast notification with rtmsg_ifa (see the section "Change Notification: rtmsg_ifa") and a notification to the other kernel subsystems via the inetaddr_chain notification chain (see Chapter 4).

inet_set_ifa
This is a wrapper for inet_insert_ifa that creates an in_device structure if none exists for the associated device, and sets the scope of the address to local (RT_SCOPE_HOST) for addresses like 127.x.x.x. Refer to the section "Scope" in Chapter 30 for more details on scopes.

Many other, smaller functions are available that make the code more readable. Here are a few of them:

inet_select_addr
This function is used to select an IP address among the ones configured on a given device. The function accepts an optional scope as a parameter, which can be used to narrow down the lookup domain. We will see where this function is useful in Chapter 35.

inet_make_mask and inet_mask_len
Given the number of 1s the netmask is composed of, inet_make_mask creates the associated netmask. For example, an input of 24 would generate the netmask with the decimal representation 255.255.255.0. inet_mask_len is the converse, returning the number of 1s in a decimal netmask.
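The netmask arithmetic, and the subnet comparison that classifies 10.0.0.2/24 as secondary to 10.0.0.1/24, can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: make_mask, mask_len, ipv4, and same_subnet are hypothetical names, not the kernel's functions, and the values are kept in host byte order for readability, whereas kernel code has to deal with network byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Build a netmask from a prefix length (host byte order). */
static uint32_t make_mask(int prefix_len)
{
    assert(prefix_len >= 0 && prefix_len <= 32);
    /* A shift by 32 is undefined in C, so /0 is handled separately. */
    return prefix_len == 0 ? 0 : ~UINT32_C(0) << (32 - prefix_len);
}

/* Count the leading 1 bits of a contiguous netmask. */
static int mask_len(uint32_t mask)
{
    int len = 0;

    while (mask & UINT32_C(0x80000000)) {
        len++;
        mask <<= 1;
    }
    return len;
}

/* Pack a dotted-decimal address a.b.c.d into one 32-bit value. */
static uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 |
           (uint32_t)d;
}

/* True when addr lies in the subnet of an already configured address. */
static int same_subnet(uint32_t configured, uint32_t mask, uint32_t addr)
{
    return ((configured ^ addr) & mask) == 0;
}
```

For example, same_subnet(ipv4(10, 0, 0, 1), make_mask(24), ipv4(10, 0, 0, 2)) is true, which is the condition under which the second address would be treated as secondary in the eth0 scenario described earlier.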
For instance, inet_mask_len applied to 255.255.0.0 returns 16.

inet_ifa_match
Given an IP address and a netmask, inet_ifa_match checks whether a given second IP address falls within the same subnet. This function is often used to classify secondary addresses and to check whether a given IP address belongs to one of the locally configured subnets. See, for instance, inet_del_ifa.

for_primary_ifa and for_ifa
These two macros can be used to browse all of the in_ifaddr instances associated with a given in_device structure. for_primary_ifa considers only primary addresses, and for_ifa goes through all of them.

23.4.2. Change Notification: rtmsg_ifa

Netlink provides the RTMGRP_IPV4_IFADDR multicast group to user-space applications interested in changes to the locally configured IP addresses. The kernel uses the rtmsg_ifa function to notify the applications that registered to the group when any change takes place on the local IP addresses. The function can be called when two types of events occur:

RTM_NEWADDR
A new address has been configured on a device.

RTM_DELADDR
An address has been removed from a device.

The generated message is initialized with inet_fill_ifaddr, the same function used to handle dump requests from user space (with commands such as ip addr list). The message includes the address being added or removed, and the device associated with it.

So, who is interested in this kind of notification? Routing protocols are a major example. If you are using Zebra, the routing protocols you have configured would like to remove all of the routes that are directly or indirectly dependent on an address that has gone away. In Chapter 31, you will learn more about the way routing protocols interact with the kernel routing subsystem.

23.4.3. inetaddr_chain Notification Chain

The IP subsystem uses the inetaddr_chain notification chain to notify other kernel subsystems about changes to the IP configuration of the local devices. A kernel subsystem can register and unregister itself with inetaddr_chain by means of the register_inetaddr_notifier and unregister_inetaddr_notifier functions. Here are two examples of users of this notification chain:

Routing
See the section "External Events" in Chapter 32.

Netfilter masquerading
When a local IP address is used by Netfilter's masquerading feature, and that address disappears, all of the connections that are using that address must be dropped (see net/ipv4/netfilter/ipt_MASQUERADE.c).

The NETDEV_DOWN and NETDEV_UP events are notified, respectively, when an IP address is removed from a local device and when one is added to it. Such notifications are generated by the inet_del_ifa and inet_insert_ifa routines introduced in the section "Main Functions That Manipulate IP Addresses and Configuration."

23.4.4. IP Configuration via ip

Traditionally, Unix system administrators configured interfaces and routes manually using ifconfig, route, and other commands. Currently Linux provides an umbrella ip command to handle IP configuration, with a number of subcommands. In this section we will see how IPROUTE2 handles the main addressing operations, such as adding and removing an address. Once you are familiar with these operations, you can easily understand and read through the code for the others.

Figure 23-2 shows the files and the main functions of the IPROUTE2 package that are involved with IP address configuration activities. The labels on the lines are ip keywords, and the nodes show the function invoked and the file the latter belongs to. For instance, the command ip address add would be handled by ipaddr_modify.

Figure 23-2. IPROUTE2 files and functions for address configuration

Table 23-1 shows the association between the operation specified with a command-line keyword (e.g., add) and the handler run by the kernel. For instance, when the kernel receives a request for an RTM_NEWADDR operation, it knows it is associated with an add command and therefore invokes inet_rtm_newaddr. Some kernel operations are overloaded, and for these, the kernel needs extra flags to figure out exactly what the user-space command is asking for. See Chapter 36 for an example. This association is defined in net/ipv4/devinet.c in the inet_rtnetlink_table structure. For an introduction to RTNetlink, refer to Chapter 3.

Table 23-1. ip address commands and associated kernel operations

    CLI keyword        Operation      Kernel handler
    add                RTM_NEWADDR    inet_rtm_newaddr
    delete             RTM_DELADDR    inet_rtm_deladdr
    list, lst, show    RTM_GETADDR    inet_dumpifaddr
    flush              RTM_GETADDR    inet_dumpifaddr

The list and flush commands need some explanation. list is simply a request to the kernel to dump information, for instance, about a given device, and flush is a request to clear the entire IP configuration on the device.

The two functions inet_rtm_newaddr and inet_rtm_deladdr are wrappers for the generic functions inet_insert_ifa and inet_del_ifa that we introduced in the section "Main Functions That Manipulate IP Addresses and Configuration." All the wrappers do is translate the request that comes from user space into an input understandable by the two more-general functions. They also filter bad requests that are associated with nonexistent devices.

23.4.5. IP Configuration via ifconfig

ifconfig is implemented in the ifconfig.c user-space file (part of the net-tools package). Unlike ip, ifconfig uses ioctl calls to interface to the kernel.
A set of functions is nevertheless used by both the ip and ifconfig handlers. In Chapter 3, we had an overview of how ioctl calls are handled by the kernel. Here all we need to know is that the requests related to IPv4 configuration are handled by the inet_ioctl function in net/ipv4/af_inet.c. Based on the ioctl code you can see what helper functions inet_ioctl uses to process the user-space commands (e.g., devinet_ioctl).

As for IPROUTE2, user-space requests from ifconfig are handled on the kernel side by wrappers that end up calling the functions in the section "Main Functions That Manipulate IP Addresses and Configuration."

23.5. IP-over-IP

IP-over-IP, also called IP tunneling (or IPIP), consists of transmitting IP packets inside other IP packets. This protocol is useful in some very interesting cases, including in a Virtual Private Network (VPN). Of course, nothing comes for free; you can well imagine the extra weight of doubling the protocol: because each IP packet has two IP headers, the overhead becomes huge for small packets. There are subtle complexities in implementation, too. For instance, what is the relationship between the IP options of the two headers?

If you consider just the IPv4 and IPv6 protocols, you already have four possible combinations of tunneling. But not all of these combinations are likely to be used. To make things more complex (I should actually say "flexible"), keep in mind that there is no limit to the number of recursions in tunneling. [*]

[*] IPv6 defines the "tunnel encapsulation limit" as the maximum number of nested encapsulations. See section 6.6 of RFC 2473.

The different tunnel interfaces that can be created in Linux are not covered in this book. However, given the background on the IP implementation in this part of the book, you can study the code in net/ipv4/ipip.c and include/net/ipip.h to derive the implementation details.

[...]

... the experience gained with IPv4, the new IPv6 version of the protocol has been designed to address the known shortcomings of IPv4, taking into consideration such aspects as:

  • Functionality

  • Ease of configuration

  • Performance

  • Transition from IPv4 networks to IPv6 networks

  • Security

Naturally, the committees designing the new protocol have tried to keep IPv4 and IPv6 as compatible as possible, and the transition...

... (Year)

    UDP     768 (1980)
    ICMP    792 (1981)
    TCP     793 (1981)

Table 24-2 lists some of the protocols in the second category. They can be added to the kernel from the section "Networking Support -> Networking Options" in the kernel configuration.

Table 24-2. Protocols implemented as modules

    Protocol                                                                    RFC# (Year)
    Internet Group Management Protocol (IGMP)                                   Version 1: 1112 (1989); Version 2: 2236 (1997); Version 3: 3376 (2002)
    Stream... (SCTP)                                                            2960 (2000)
    Protocol Independent Multicast, version 1 (PIMv1) and version 2 (PIMv2)     2362 (1998)
    IPsec suite: IP Authentication Header Protocol (AH),                        AH: 2402 (1998); ESP: 2406 (1998); IPcomp: 3173 (2001)
      IP Encapsulating Security Payload Protocol (ESP),
      IP Payload Compression Protocol (IPcomp)
    Generic Routing Encapsulation (GRE)                                         2784 (2000)
    IPv4-over-IPv4 tunnels (IPIP)                                               1853 (1995)
    IPv6 over IPv6                                                              2473 (1998)
    ...

... protocols rest on L3 protocols. The three main protocols (ICMP, UDP, and TCP), as well as the IPsec suite, have IPv6 counterparts. There is no IGMPv6 in Figure 24-1 because its functionality is implemented as part of ICMPv6.

Figure 24-1. L4 protocols on top of IPv4 and IPv6 that are implemented in the Linux kernel

Note that the last four items in Table 24-2 are tunneling protocols. Their IDs identify an L3 protocol...

... defined in include/linux/ip.h inside the definition of inet_sock, is used to handle the socket cork option (UDP_CORK for UDP, TCP_CORK for TCP). We saw in Chapter 21 how its fields are used to maintain some context information across consecutive invocations of ip_append_data and ip_append_page to handle data fragmentation.

[*] IPv6 defines its own version of cork in include/linux/ipv6.h.

Here is a brief...

... the L3 layer and therefore is transparent to applications. IPsec can be used by both IPv4 and IPv6, but it fits better with IPv6. With IPv6, the neighboring system has changed as well. It is called neighbor discovery, and represents the counterpart to ARP for IPv4. The QoS component is also expanded. With IPv4 networks, it is already possible to carry out automatic host configuration, thanks to protocols...

... /proc/sys/net/ipv4/conf/
This directory contains a subdirectory for each network device recognized by the kernel, plus other special directories (see Figure 36-4 in Chapter 36). Those subdirectories include configuration parameters that are device specific; among them are accept_redirects, send_redirects, accept_source_route, and forwarding. These will be covered in Chapter 36, with the exception of promote_secondaries, which...

    /proc filename            Kernel variable                  Default value
    ...                       sysctl_ip_default_ttl            IPDEFTTL (64)
    ip_nonlocal_bind          sysctl_ip_nonlocal_bind          0
    ip_local_port_range       sysctl_ip_local_port_range[0]    1
                              sysctl_ip_local_port_range[1]    65535 [a]
    ipfrag_high_thresh        sysctl_ipfrag_high_thresh        256K
    ipfrag_low_thresh         sysctl_ipfrag_low_thresh         192K
    ipfrag_time               sysctl_ipfrag_time               IP_FRAG_TIME (30 * HZ)
    ipfrag_secret_interval    sysctl_ipfrag_secret_interval    10 * 60 * HZ
    ip_dynaddr                sysctl_ip_dynaddr                ...

23.6. IPv4: What's Wrong with It?

We saw in the section "IP Protocol: The Big Picture" in Chapter 18 what the main tasks of the IP protocol are. IPv4 was designed almost 25 years ago (in 1981), and given the speed with which the Internet and network services have evolved since then, the protocol is showing its age. Because IPv4 was not originally designed with today's big network topologies...

... 36

23.8.12. skb_frag_t Structure

We saw in Chapter 21 what a paged buffer looks like (see, for example, Figure 21-5 in that chapter). skb_frag_t includes the fields necessary to identify a data block on a memory page:

struct page *page
Pointer to the memory page. On i386, the page size is 4 KB. To find the size of a page on any given architecture xxx, look for PAGE_SIZE in include/asm-xxx/page.h.

__u16 ...

Date posted: 13/08/2014, 04:21


Table of contents

  • Understanding Linux Network Internals

  • Table of Contents

  • Copyright

  • Preface

    • The Audience for This Book

    • Background Information

    • Organization of the Material

    • Conventions Used in This Book

    • Using Code Examples

    • We'd Like to Hear from You

    • Safari Enabled

    • Acknowledgments

  • Part I: General Background

    • Chapter 1. Introduction

      • Section 1.1. Basic Terminology

      • Section 1.2. Common Coding Patterns

      • Section 1.3. User-Space Tools

      • Section 1.4. Browsing the Source Code

      • Section 1.5. When a Feature Is Offered as a Patch

    • Chapter 2. Critical Data Structures

      • Section 2.1. The Socket Buffer: sk_buff Structure

      • Section 2.2. net_device Structure

      • Section 2.3. Files Mentioned in This Chapter

    • Chapter 3. User-Space-to-Kernel Interface

      • Section 3.1. Overview
