The present invention provides a method and apparatus for discovering a path maximum transmission unit. The method comprises receiving a data packet from a first remote device for transmission to a second remote device and determining if a size of the received data packet is greater than a maximum transmission unit (MTU) value. In response to determining that the size of the received data packet is greater than the MTU value, the method further comprises transmitting a portion of the received data packet to the second remote device and transmitting a message to the first remote device indicating that the size of the received data packet exceeds the MTU value.
1. A method, comprising:
receiving a first IP data packet encapsulating a second data packet from a first remote device for transmission to a second remote device, wherein the encapsulated second data packet comprises at least a header portion containing header information and a data portion containing data, and wherein the received first IP data packet includes a “don't fragment” flag;
determining if a size of the received first IP data packet is greater than a maximum transmission unit (MTU) value; and
in response to determining that the size of the received first IP data packet is greater than the MTU value, transmitting a portion of the data within the data portion of the received encapsulated second data packet to the second remote device and transmitting a message to the first remote device indicating that the size of the received first IP data packet exceeds the MTU value, wherein transmitting a portion of the data within the data portion of the received encapsulated second data packet and transmitting the message to the first remote device are done in response to determining that the “don't fragment” flag is set.
2. The method of
3. The method of
further comprising, in response to determining the size of the received first IP data packet is greater than the MTU value, forming a second TCP data packet with a portion of the data within the data portion of the received encapsulated second data packet, wherein forming the second TCP data packet comprises determining a checksum value of the second TCP data packet;
further comprising, forming a second IP packet that encapsulates the second TCP data packet, wherein the size of the second IP packet is less than the MTU value; and
wherein transmitting the portion of the data within the data portion of the received encapsulated second data packet further comprises:
transmitting, in the second IP packet, the portion of the data that contains the higher order bytes of the data portion of the received encapsulated second data packet to the second remote device; and
transmitting the checksum value of the second TCP data packet to the second remote device;
wherein transmitting the message to the first remote device further comprises:
generating the message that is used by the first remote device as a starting reference for determining a selected time interval after which a portion of data within the encapsulated second data packet is re-transmitted by the first remote device; and
providing, in the message, information about at least one intermediate MTU value that is less than the size of the received first IP packet, wherein the message is included in an Internet Control Message Protocol (ICMP) packet.
5. The method of
6. The method of
7. The method of
8. The method of
9. The method of
1. Field of the Invention
The invention generally relates to network communications, and, in particular, to discovering the path maximum transmission unit (PMTU) of a network connection.
2. Description of the Related Art
It is generally accepted that for efficient data transfer using an Internet Protocol (IP) connection, the data packet size should be as large as possible. The larger the packets, the lower the overhead associated with transferring the entire data. However, if a packet is larger than what any given intermediate router along the communication path can process, the packet will be fragmented at that link. The maximum size packet that a router can process without fragmenting the packet is called a maximum transmission unit (MTU). The maximum size packet that can be transferred from a transmitting host to a receiving host without fragmentation is called the path maximum transmission unit (PMTU). Consequently, the PMTU is a function of the maximum size packets that all intermediate routers in an IP connection can process without fragmenting the packets.
For efficient transmission, it is desirable to determine the PMTU for a given IP connection. One method of determining the PMTU is described in Request for Comments (RFC) 1191. RFC 1191 describes a procedure in which the transmitting host sends a discovery packet with an instruction that the packet not be fragmented (e.g., the “don't fragment” bit is set). If there is some MTU between the two communicating devices that is too small to pass the packet successfully, then the discovery packet is dropped and a “can't fragment” message is sent back to the transmitting source. For example, if a router along the transmission path has a lower MTU than the size of the discovery packet, the router drops the received packet and thereafter transmits a “can't fragment” message to the transmitting source. In some instances, the router may provide its MTU size to the transmitting source. The “can't fragment” message is sent using Internet Control Message Protocol (ICMP), which is an extension to IP and is used to support packets containing error, control, and informational messages.
Upon receiving the “can't fragment” message, the transmitting source then knows to decrease the size of the packets. As such, the transmitting source retransmits the discovery packet using a new, lower MTU value. If the network path between the transmitting source and the receiving device includes several routers (or other network devices) with lower MTU values, then the discovery mechanism of RFC 1191 will require many iterations to discover an acceptable MTU. For example, assume that two routers exist between the transmitting host and the receiving host, and that the first router has an MTU value of 4392 bytes and the second router has an MTU value of 1500 bytes. Further, assume that the transmitting source sends a discovery packet of 9000 bytes (with the “don't fragment” indication set) intended for the receiving host. The discovery packet first arrives at the first router, which, upon determining that the packet size exceeds its MTU value, will discard the packet and send a “can't fragment” ICMP message to the transmitting source. In some instances, the first router may also transmit its MTU to the transmitting source.
The transmitting source, upon receiving the “can't fragment” message, retransmits a smaller discovery packet (such as one the size of the MTU value of the first router, if available). This time the first router will allow the second discovery packet because its size does not exceed the MTU of the first router. However, this discovery packet will be rejected by the second router, which, in this example, has an MTU value of 1500 bytes. The second router will thus discard the received discovery packet and transmit a “can't fragment” message to the transmitting source. In some instances, the second router may also transmit its MTU to the transmitting host. Upon receiving the ICMP message, the transmitting host will retransmit another discovery packet of a size that is acceptable to the second router. Thus, in the above described example, the transmission source takes at least two (2) iterations to determine the PMTU. As the number of routers (or hops) in a given path increases, so can the number of iterations needed to determine the PMTU. As a result, the efficiency of the network transmission can suffer, thereby adversely affecting network performance.
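By way of illustration only, the conventional iteration described above may be sketched as follows. The sketch assumes that every router reports its MTU in the “can't fragment” message (the description notes this happens only in some instances) and that the host simply retries at the reported size. The function names `send_probe` and `classic_pmtu_discovery` are hypothetical and do not come from RFC 1191.

```python
from typing import List, Optional, Tuple


def send_probe(size: int, router_mtus: List[int]) -> Optional[int]:
    """Send one discovery packet along the path.

    Returns None if the packet reaches the receiving host, otherwise the
    MTU of the first router that dropped it (assumed to be reported).
    """
    for mtu in router_mtus:
        if size > mtu:
            return mtu  # packet is dropped here; router reports its MTU
    return None


def classic_pmtu_discovery(initial_size: int, router_mtus: List[int]) -> Tuple[int, int]:
    """Repeat probes until one gets through; return (pmtu, probes_sent)."""
    size = initial_size
    probes = 0
    while True:
        probes += 1
        reported = send_probe(size, router_mtus)
        if reported is None:
            return size, probes
        size = reported  # retry at the MTU the rejecting router reported


if __name__ == "__main__":
    # Two routers with MTUs of 4392 and 1500 bytes, 9000-byte first probe.
    print(classic_pmtu_discovery(9000, [4392, 1500]))
```

With the two-router example above, two probes are rejected before a third gets through, which is the per-hop cost that grows with the number of smaller-MTU routers on the path.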
The present invention is directed to addressing, or at least reducing, the effects of, one or more of the problems set forth above.
In one aspect of the instant invention, a method is provided for discovering a path maximum transmission unit. The method comprises receiving a data packet from a first remote device for transmission to a second remote device and determining if a size of the received data packet is greater than a maximum transmission unit (MTU) value. In response to determining that the size of the received data packet is greater than the MTU value, the method further comprises transmitting a portion of the received data packet to the second remote device and transmitting a message to the first remote device indicating that the size of the received data packet exceeds the MTU value.
In another aspect of the instant invention, an apparatus is provided for discovering a path maximum transmission unit. The apparatus comprises an interface and a control unit communicatively coupled to the interface. The control unit is adapted to receive a data packet through the interface from a first remote device for transmission to a second remote device and determine if a size of the received data packet is greater than a maximum transmission unit (MTU) value. In response to determining that the size of the received data packet is greater than the MTU value, the control unit is further adapted to transmit a portion of the received data packet to the second remote device and transmit a message to the first remote device indicating that the size of the received data packet exceeds the MTU value.
In yet another aspect of the instant invention, an article comprising one or more machine-readable storage media containing instructions is provided for discovering a path maximum transmission unit. The instructions, when executed, enable a processor to receive a data packet from a first remote device for transmission to a second remote device and determine if a size of the received data packet is greater than a maximum transmission unit (MTU) value. In response to determining that the size of the received data packet is greater than the MTU value, the processor is further enabled to transmit a portion of the received data packet to the second remote device and transmit a message to the first remote device indicating that the size of the received data packet exceeds the MTU value.
The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify like elements.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Illustrative embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
The words and phrases used herein should be understood and interpreted to have a meaning consistent with the understanding of those words and phrases by those skilled in the relevant art. No special definition of a term or phrase, i.e., a definition that is different from the ordinary and customary meaning as understood by those skilled in the art, is intended to be implied by consistent usage of the term or phrase herein. To the extent that a term or phrase is intended to have a special meaning, i.e., a meaning other than that understood by skilled artisans, such a special definition will be expressly set forth in the specification in a definitional manner that directly and unequivocally provides the special definition for the term or phrase.
Referring to
The network 125 may include one or more network devices 140(1-3) (labeled “routers” in the illustrated embodiment) through which the host devices 105, 110 may communicate. The network devices 140(1-3), in one embodiment, may be network gateways, network switches, or other devices capable of forwarding received data packets to the intended destination. The number of routers 140(1-3) employed in a given network 125 may vary from one implementation to another. For illustrative purposes, it is herein assumed that the packets sent by the transmitting module 130 traverse the first router 140(1), the second router 140(2), and the third router 140(3) (in that order) before arriving at the receiving module 135. Generally, each router 140 has its own associated MTU threshold (the maximum size packet that the router can process without fragmenting the packet).
The routers 140(1-3), in the illustrated embodiment, include a routing module 145 that processes discovery packets sent by the transmitting module 130 in accordance with one embodiment of the present invention. A discovery packet is a data packet that includes a flag indicating that the data packet should not be fragmented. Generally, and as described in greater detail below, the routing module 145 of a given router 140, upon receiving a discovery packet, forwards at least a portion of the received discovery packet to the next hop (e.g., the next router) even though the router 140 determines that the size of the received discovery packet is greater than the MTU supported by that router 140. As explained below, by transmitting at least a portion of the received packet, the routing module 145 allows the transmitting module 130 of the host 105 to determine the path MTU in an efficient manner.
It should be appreciated that the arrangement of the communications system 100 of
The various modules 130, 135, and 145, illustrated in
The network 125 of
The communications system 100 employs the TCP/IP protocol, although other protocols may also be employed in alternative embodiments. For a proper perspective, a representative TCP/IP data packet 200 is shown in
The IP header 205 includes a fragment offset field 335 that indicates the position of the fragment's data relative to the beginning of the data in the original data packet, which allows the destination IP process to properly reconstruct the original data packet. A time-to-live field 340 maintains a counter that gradually decrements down to zero, at which point the data packet is discarded. This keeps packets from looping endlessly. A protocol field 345 indicates which upper-layer protocol receives incoming packets after IP processing is complete. A header checksum field 350 aids in ensuring the integrity of the IP header 205. The IP header 205 includes a source IP address field 355 that specifies a sending node (e.g., host 105), and a destination IP address field 360 that specifies a receiving node (e.g., host 110). An options field 370 allows IP to support various options, such as security.
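By way of illustration only, the header fields discussed above may be read from a raw packet as sketched below. The sketch assumes the standard 20-byte IPv4 header layout without options; the names `IPv4Header` and `parse_ipv4_header` are hypothetical and are not part of the described embodiment.

```python
import ipaddress
import struct
from dataclasses import dataclass


@dataclass
class IPv4Header:
    total_length: int     # size of the whole IP packet in bytes
    dont_fragment: bool   # DF bit of the flags field
    fragment_offset: int  # offset of this fragment's data, in bytes
    ttl: int              # time-to-live counter
    protocol: int         # upper-layer protocol number (6 = TCP)
    checksum: int         # header checksum
    source: str           # source IP address
    destination: str      # destination IP address


def parse_ipv4_header(raw: bytes) -> IPv4Header:
    """Decode the fixed 20-byte part of an IPv4 header (options ignored)."""
    (_ver_ihl, _tos, total_length, _ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return IPv4Header(
        total_length=total_length,
        dont_fragment=bool(flags_frag & 0x4000),
        # The low 13 bits count 8-byte units.
        fragment_offset=(flags_frag & 0x1FFF) * 8,
        ttl=ttl,
        protocol=protocol,
        checksum=checksum,
        source=str(ipaddress.IPv4Address(src)),
        destination=str(ipaddress.IPv4Address(dst)),
    )


if __name__ == "__main__":
    # A hand-built header: 9000-byte packet, DF set, TTL 64, TCP payload.
    sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 9000, 1, 0x4000, 64, 6, 0,
                         bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    print(parse_ipv4_header(sample))
```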
The TCP header 210 includes a sequence number field 415 that is employed to inform the receiving host 110 of a particular packet in the stream of packets. This sequence number field 415 is also employed by the receiving host 110 to notify the sending host 105 that all data packets up to a certain number have been received. The TCP header 210 includes an acknowledgement number field 420 that contains the sequence number of the next byte of data the sender of the packet expects to receive. A header length field 425 indicates the size of the TCP header 210.
The TCP header 210 includes a plurality of flag fields 430 for carrying a variety of control information, including the SYN and ACK flags used for connection establishment, the FIN flag used for connection termination, the URG flag to indicate that the urgent pointer field 445 (discussed below) has valid information, the PSH flag to instruct the receiving host 110 to pass the data received thus far immediately to a higher-level application, and the RST flag to inform the receiving host 110 to re-establish the connection.
The TCP header 210 includes a window-size field 435 that specifies the size of the sender's receive window (that is, the buffer space available for incoming data). A TCP checksum field 440 ensures that the TCP header 210 (and the associated data) has not been modified (or corrupted) in transit. If the checksum is invalid, the receiving host 110 will not acknowledge the message. The urgent pointer field 445 points to the end of the data field that is considered urgent and requires immediate attention. This field is not valid if the URG flag is not set.
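By way of illustration only, the 16-bit one's-complement sum on which the TCP checksum is based may be sketched as follows. The sketch is a simplification: an actual TCP checksum also covers a pseudo-header built from the IP addresses, which is omitted here, and the name `internet_checksum` is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(payload)))
    # Dropping trailing bytes changes the sum, so a checksum computed over
    # the full payload no longer matches a shortened one.
    print(hex(internet_checksum(payload[:4])))
```

A receiver can verify data by summing it together with the transmitted checksum; in this sketch an intact message then yields a result of zero.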
The transmitting module 130 of the host 105 of
The “can't fragment” message, in the illustrated embodiment, is sent using the Internet Control Message Protocol (ICMP), which is an extension to IP and is utilized to support packets containing error, control, and informational messages.
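By way of illustration only, one way such an ICMP message could be assembled is sketched below. It assumes the “destination unreachable / fragmentation needed” format of RFC 1191 (type 3, code 4, with a next-hop MTU field) and an original IP header without options. The function names are hypothetical, and the sketch is not asserted to match the ICMP data packet of the illustrated embodiment.

```python
import struct

ICMP_DEST_UNREACHABLE = 3  # ICMP type
ICMP_FRAG_NEEDED = 4       # ICMP code: fragmentation needed and DF set


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original_datagram: bytes) -> bytes:
    """Build a "can't fragment" ICMP message that reports `next_hop_mtu`."""
    # Quote the original IP header (20 bytes assumed) plus 8 payload bytes.
    quoted = original_datagram[:28]
    # Fields: type, code, checksum (zero for now), unused, next-hop MTU.
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED,
                         0, 0, next_hop_mtu)
    checksum = internet_checksum(header + quoted)
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED,
                         checksum, 0, next_hop_mtu)
    return header + quoted


if __name__ == "__main__":
    message = build_frag_needed(4392, bytes(40))
    print(message.hex())
```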
A flow diagram of one embodiment of the routing module 145 of
The routing module 145 of the first router 140(1) receives (at 705) the discovery packet transmitted by the host 105. In one embodiment, the discovery packet may take the form of the TCP/IP data packet 200 shown in
If the routing module 145 determines (at 710) that the size of the received discovery packet is greater than the MTU of the first router 140(1), the routing module 145 transmits (at 720) a message to the transmitting source (transmitting host 105, in this example) indicating that fragmentation is needed and that the “don't fragment” bit is set. In the illustrated embodiment, the message transmitted (at 720) is a “can't fragment” message that is sent in accordance with the ICMP protocol discussed above. Thus, the ICMP data packet 500 of
The routing module 145 of the first router 140(1) determines (at 730) at least a portion of the received discovery packet to be transmitted to the next hop, which in the illustrated example is the second router 140(2). The determined portion of the received discovery packet is transmitted (at 735) by the routing module 145 of the first router 140(1) to the next hop. In one embodiment, the portion of the data transmitted (at 735) by the routing module 145 is transmitted with the “don't fragment” flag set. That is, the routing module 145 of the first router 140(1) transmits (at 735) at least a portion of the received data packet (with the DF flag still set in the flag field 330 of
As noted above, in this example, the packet size transmitted by the host 105 is 9000 bytes, and the MTU of the first router 140(1) is 4392 bytes. Because the size of the packet is greater than the MTU of the first router 140(1) in this example, the routing module 145 transmits a portion of the 9000 bytes of the received packet to the next hop. Assuming that routing module 145 transmits the maximum data size that is supported by the first router 140(1) in this embodiment, the routing module 145 transmits 4392 bytes of the received data packet and discards the remaining bytes. The higher order bytes of the data packet may be transmitted, for example. In other embodiments, the “portion” of the data to be transmitted may be selected in other suitable ways without deviating from the spirit and scope of the present invention.
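By way of illustration only, the truncate-and-forward decision of a single router may be sketched as follows. The sketch treats the packet size as the length of its payload (headers are ignored), interprets the “higher order bytes” as the leading bytes of the data, and uses the hypothetical names `Packet` and `process_discovery_packet`; each of these is an assumption rather than a detail of the described embodiment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Packet:
    payload: bytes
    dont_fragment: bool = True  # discovery packets carry the DF flag


def process_discovery_packet(packet: Packet, mtu: int) -> Tuple[Packet, Optional[int]]:
    """Return (packet to forward, MTU to report to the sender or None)."""
    if len(packet.payload) <= mtu:
        return packet, None  # fits: forward unchanged, nothing to report
    if not packet.dont_fragment:
        # Ordinary fragmentation is outside the scope of this sketch.
        return packet, None
    # Too large with DF set: keep the leading bytes, discard the rest,
    # and leave the DF flag set on the forwarded portion.
    truncated = Packet(packet.payload[:mtu], dont_fragment=True)
    return truncated, mtu


if __name__ == "__main__":
    forwarded, reported = process_discovery_packet(Packet(bytes(9000)), 4392)
    print(len(forwarded.payload), reported)
```

For the 9000-byte packet and 4392-byte MTU of the example, the sketch forwards 4392 bytes and reports 4392 to the sender.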
Each subsequent router 140 that includes the routing module 145 (e.g., the second and third routers 140(2-3) in this example) can thereafter perform the process illustrated in
The third router 140(3) similarly processes the packet that is transmitted (at 735) by the second router 140(2) according to the process of
The truncated discovery packets that are forwarded or transmitted (at 735) by the routers 140(1-3) may result in checksum errors at the receiving end (the receiving host 110). For example, data received by the receiving host 110 in the truncated discovery packet will fail to correspond to the checksum value originally calculated by the transmitting host 105. As such, the receiving module 135 of the host 110 may discard the received truncated discovery packet. To reduce the likelihood of this, in one embodiment, the routing module 145 of the routers 140(1-3) may adjust the checksum (at 740—see
In the context of the above-presented example, where a packet of 1006 bytes reaches the receiving host 110 because of the MTU size of the third router 140(3), each router 140 along the transmission path adjusts the checksum of the truncated packet accordingly such that the 1006 bytes arriving at the receiving host 110 have the correct checksum value. Upon receiving the discovery packet (of 1006 bytes), the receiving host 110 transmits an acknowledgement (in the field 430 of
By transmitting at least a portion of the received packet to the next hop (as opposed to discarding it altogether), the routing module 145 allows the transmit host 105 to more efficiently determine the path MTU. This is because at least a portion of the discovery packet that is initially transmitted by the transmit host 105 may be forwarded through the network path even though one or more of the intermediate MTUs is less than the size of the discovery packet. In one embodiment, a “can't fragment” ICMP message is sent each time an intermediate MTU is less than the size of the incoming discovery packet. Furthermore, the ICMP message may include information about the value of each intermediate MTU that is less than the size of the incoming discovery packet. In this manner, the transmitting host 105 is able to ascertain information about the intermediate MTUs by transmitting fewer discovery packets in comparison to the conventional discovery mechanism. That is, fewer iterations are needed to discover the PMTU using the present invention.
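By way of illustration only, the effect of forwarding truncated packets along the whole path may be sketched as follows. The sketch assumes that every router reports its MTU when it truncates a packet and models a packet only by its size; the function name `forward_with_truncation` is hypothetical. The MTU values of 4392, 1500, and 1006 bytes follow the example of the description.

```python
from typing import List, Tuple


def forward_with_truncation(size: int, router_mtus: List[int]) -> Tuple[int, List[int]]:
    """Pass one discovery packet through every router on the path.

    Returns (size arriving at the receiving host, MTU values reported
    back to the transmitting host in ICMP messages).
    """
    reported: List[int] = []
    for mtu in router_mtus:
        if size > mtu:
            reported.append(mtu)  # "can't fragment" message carrying the MTU
            size = mtu            # forward only the portion that fits
    return size, reported


if __name__ == "__main__":
    arrived, reports = forward_with_truncation(9000, [4392, 1500, 1006])
    print(arrived, reports)
```

Under these assumptions a single 9000-byte discovery packet yields all three intermediate MTU values, whereas the conventional mechanism would need one rejected packet per smaller-MTU router.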
In one embodiment, the present invention allows the transmitting host 105 to efficiently determine the path MTU without requiring modification to either the transmitting host 105 or the receiving host 110. That is, the path discovery process may be improved by implementing the process described in
The transmitting module 130 creates (at 820) the next discovery packet to be transmitted, with a size selected based on any received ICMP error message(s). For example, if a first router 140 provides its MTU value in the ICMP error message (or in some other message), the transmitting module 130 may create (at 820) the next discovery packet such that its size substantially corresponds to the size represented by the MTU value. As another example, if more than one router 140 responds with its MTU value, the transmitting module 130 may create (at 820) the next discovery packet such that its size substantially corresponds to the size of the smallest MTU value received. This would allow the discovery packet to at least successfully traverse the router 140 that has the smallest then-known MTU. The transmitting module 130 transmits (at 825) the discovery packet of the selected size to the receiving host 110.
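By way of illustration only, the size selection described above may be sketched as follows. The function name `next_discovery_size` is hypothetical, and the behavior when no router has reported an MTU value (the size is left unchanged) is an assumption made for this sketch, not something the description specifies.

```python
from typing import Iterable, Optional


def next_discovery_size(current_size: int,
                        reported_mtus: Iterable[Optional[int]]) -> int:
    """Pick the size of the next discovery packet.

    `reported_mtus` holds the MTU value from each ICMP error message
    received so far; None stands for a message that carried no MTU value.
    """
    known = [mtu for mtu in reported_mtus if mtu is not None]
    if not known:
        # Assumption: with no MTU information the size is left unchanged;
        # a real host would need some other strategy to shrink the packet.
        return current_size
    # Use the smallest MTU reported so far, never growing the packet.
    return min(current_size, min(known))


if __name__ == "__main__":
    print(next_discovery_size(9000, [4392, 1500, 1006]))
```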
In one embodiment, the amount of time the transmitting module 130 waits (at 815) may be zero (i.e., no delay). That is, in this embodiment, the transmitting module 130 may create (at 820) and transmit (at 825) a discovery packet each time an ICMP error message is detected (at 805). One advantage of this embodiment is that conventional transmitting hosts may be employed with the routers 140 of communications system 100 of
Referring now to
Referring now to
A storage unit 1050 is coupled to the south bridge 1035. Although not shown, it should be appreciated that in one embodiment an operating system, such as AIX, Windows®, Disk Operating System®, Unix®, OS/2®, Linux®, MAC OS®, or the like, may be stored on the storage unit 1050 and executable by the control unit 1015. The storage unit 1050 may also include device drivers (not shown) for the various hardware components of the system 1000.
In the illustrated embodiment, the processor-based device 1000 includes a display interface 1047 that is coupled to the south bridge 1035. The processor-based device 1000 may display information on a display device 1048 via the display interface 1047. The south bridge 1035 of the processor-based device 1000 may include a controller (not shown) to allow a user to input information using an input device, such as a keyboard 1048 and/or a mouse 1049, through an input interface 1046.
The south bridge 1035 of the system 1000, in the illustrated embodiment, is coupled to a network interface 1060, which may be adapted to receive, for example, a local area network card. In an alternative embodiment, the network interface 1060 may be a Universal Serial Bus interface or an interface for wireless communications. The processor-based device 1000 communicates with other devices coupled to the network through the network interface 1060.
It should be appreciated that the configuration of the processor-based device 1000 of
The various system layers, routines, or modules may be executable control units (such as control unit 905, 1015 (see
The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.
Banerjee, Dwip N., Brown, Deanna Lynn Quigg, Venkatsubra, Venkat, Pancholi, Ketan P.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jun 11 2004 | VENKATSUBRA, VENKAT | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014846 | /0926 | |
Jun 14 2004 | BANERJEE, DWIP N | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014846 | /0926 | |
Jun 14 2004 | BROWN, DEANNA LYNN QUIGG | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014846 | /0926 | |
Jun 14 2004 | PANCHOLI, KETAN P | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014846 | /0926 | |
Jun 17 2004 | International Business Machines Corporation | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Feb 13 2009 | ASPN: Payor Number Assigned. |
Sep 10 2012 | REM: Maintenance Fee Reminder Mailed. |
Dec 19 2012 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Dec 19 2012 | M1554: Surcharge for Late Payment, Large Entity. |
Sep 09 2016 | REM: Maintenance Fee Reminder Mailed. |
Jan 27 2017 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |