A system for bulk network transmissions using multiple connections primed to optimize transfer parameters includes one or more processors and memory coupled to the processors. The memory stores program instructions executable by the processors to establish a plurality of network connections designated to be used for a single bulk data transfer. The instructions are further executable to adjust the values of one or more configuration parameters of each of the connections. The adjusting includes sending one or more priming packets over each of the connections. The instructions are also executable to perform the bulk transfer as a sequence of sub-transfers. Each sub-transfer includes a transmission of a predetermined amount of application data over each connection of a selected subset of the plurality of network connections.
11. A computer-implemented method, comprising:
establishing a plurality of network connections designated to be used for a bulk transfer;
adjusting one or more configuration parameters associated with each network connection of the plurality of network connections, wherein said adjusting comprises sending one or more priming packets associated with the bulk data transfer over each network connection to generate a plurality of primed network connections;
performing the bulk transfer as a sequence of sub-transfers, wherein each sub-transfer includes a transmission of a predetermined amount of the data over each primed network connection of a selected subset of the plurality of primed network connections, wherein a reserved subset of the plurality of primed network connections are not initially used in performing the bulk transfer as the sequence of sub-transfers;
determining whether data transmission over each respective network connection selected for a corresponding sub-transfer of the sequence of sub-transfers meets a threshold performance criterion; and
in response to determining that the data transmission over a respective one of the network connections does not meet the threshold performance criterion, automatically determining that the respective one of the network connections is an unsatisfactory connection and automatically retransmitting at least a portion of the corresponding sub-transfer over one of the primed network connections of the reserved subset of the plurality of primed network connections.
17. A storage medium storing program instructions, wherein the instructions are computer executable to:
establish a plurality of network connections designated to be used for a bulk transfer;
adjust one or more configuration parameters associated with each network connection of the plurality of network connections, wherein to adjust the one or more configuration parameters, the instructions are further executable to send one or more priming packets associated with the bulk data transfer over each network connection to generate a plurality of primed network connections;
perform the bulk transfer as a sequence of sub-transfers, wherein each sub-transfer includes a transmission of a predetermined amount of the data over each primed network connection of a selected subset of the plurality of primed network connections, wherein a reserved subset of the plurality of primed network connections are not initially used in performing the bulk transfer as the sequence of sub-transfers;
determine whether data transmission over each respective network connection selected for a corresponding sub-transfer of the sequence of sub-transfers meets a threshold performance criterion; and
in response to determining that the data transmission over a respective one of the network connections does not meet the threshold performance criterion, automatically determine that the respective one of the network connections is an unsatisfactory connection and automatically retransmit at least a portion of the corresponding sub-transfer over one of the primed network connections of the reserved subset of the plurality of primed network connections.
1. A system, comprising:
one or more processors; and
memory coupled to the one or more processors, wherein the memory comprises program instructions executable by the one or more processors to:
establish a plurality of network connections designated to be used for a bulk transfer;
adjust one or more configuration parameters associated with each network connection of the plurality of network connections, wherein to adjust the one or more configuration parameters, the instructions are further executable to send one or more priming packets associated with the bulk data transfer over each network connection to generate a plurality of primed network connections;
perform the bulk transfer as a sequence of sub-transfers, wherein each sub-transfer includes a transmission of a predetermined amount of the data over each primed network connection of a selected subset of the plurality of primed network connections, wherein a reserved subset of the plurality of primed network connections are not initially used in performing the bulk transfer as the sequence of sub-transfers;
determine whether data transmission over each respective network connection selected for a corresponding sub-transfer of the sequence of sub-transfers meets a threshold performance criterion; and
in response to determining that the data transmission over a respective one of the network connections does not meet the threshold performance criterion, automatically determine that the respective one of the network connections is an unsatisfactory connection and automatically retransmit at least a portion of the corresponding sub-transfer over one of the primed network connections of the reserved subset of the plurality of primed network connections.
2. The system as recited in
3. The system as recited in
4. The system as recited in
determine, after sending a particular priming packet of the plurality of priming packets, whether a particular configuration parameter of the one or more configuration parameters has reached a predetermined setting;
if the particular configuration parameter has not reached the predetermined setting, send an additional priming packet over the network connection; and
if the particular configuration parameter has reached the predetermined setting, determine that no additional priming packets are to be sent over the network connection.
5. The system as recited in
6. The system as recited in
in response to determining that the data transmission over the respective one of the network connections does not meet the threshold performance criterion,
close the respective one of the network connections;
open an additional network connection;
adjust one or more configuration parameters associated with the additional network connection, wherein said adjusting comprises sending one or more priming packets over the additional network connection; and
designate the additional network connection to be included in a set of network connections to be used to implement a remainder of the bulk transfer.
7. The system as recited in
8. The system as recited in
insert instrumentation data into a packet transmitted over each respective network connection.
9. The system as recited in
10. The system as recited in
partition data corresponding to the bulk transfer into a plurality of stripes of a predetermined transmission stripe size; and
transmit consecutive stripes on each of two or more primed network connections selected for a given sub-transfer of the sequence of sub-transfers.
12. The method as recited in
determining, after sending a particular priming packet of the plurality of priming packets, whether a particular configuration parameter of the one or more configuration parameters has reached a predetermined setting;
if the particular configuration parameter has not reached the predetermined setting, sending an additional priming packet over the network connection; and
if the particular configuration parameter has reached the predetermined setting, terminating said adjusting.
13. The method as recited in
14. The method as recited in
in response to determining that the data transmission over the respective one of the network connections does not meet the threshold performance criterion,
closing the respective one of the network connections;
opening an additional network connection;
adjusting one or more configuration parameters associated with the additional network connection, wherein said adjusting comprises sending one or more priming packets over the additional network connection; and
designating the additional network connection to be included in a set of network connections to be used to implement a remainder of the bulk transfer.
15. The method as recited in
inserting instrumentation data into a packet transmitted over each respective network connection.
16. The method as recited in
partitioning data corresponding to the bulk transfer into a plurality of stripes of a predetermined transmission stripe size; and
transmitting consecutive stripes on each of two or more primed network connections selected for a given sub-transfer of the sequence of sub-transfers.
18. The storage medium as recited in
determine, after sending a particular priming packet of the plurality of priming packets, whether a particular configuration parameter of the one or more configuration parameters has reached a predetermined setting;
if the particular configuration parameter has not reached the predetermined setting, send an additional priming packet over the network connection; and
if the particular configuration parameter has reached the predetermined setting, determine that no additional priming packets are to be sent over the network connection.
19. The storage medium as recited in
20. The storage medium as recited in
in response to determining that the data transmission over the respective one of the network connections does not meet the threshold performance criterion,
close the respective one of the network connections;
open an additional network connection;
adjust one or more configuration parameters associated with the additional network connection, wherein said adjusting comprises sending one or more priming packets over the additional network connection; and
designate the additional network connection to be included in a set of network connections to be used to implement a remainder of the bulk transfer.
21. The storage medium as recited in
partition data corresponding to the bulk transfer into a plurality of stripes of a predetermined transmission stripe size; and
transmit consecutive stripes on each of two or more primed network connections selected for a given sub-transfer of the sequence of sub-transfers.
1. Field of the Invention
This invention relates to computer systems and, more particularly, to the management of network traffic comprising large data transfers between computer systems.
2. Description of the Related Art
In today's enterprise environments, more and more applications rely on bulk data transfers to accomplish their functionality. Applications that require large amounts of data to be transferred between one network endpoint and another for a single application-level job or task may include, for example, storage management applications of various types, such as backup and restore applications, disaster recovery applications and the like, media server applications that may be required to transmit movie and audio files, telephony (e.g., voice over IP) and other telecommunications applications, scientific analysis and simulation applications, geographically distributed software development projects, and so on. The amount of data that has to be transferred for a given job or task varies with the specific applications and use cases, but can easily reach several tens of megabytes or even gigabytes in some cases. Furthermore, the data often has to be transferred over large distances: e.g., a disaster recovery application may be configured to replicate data from a primary data center in the United States to another data center in Europe or Asia.
As the emphasis on interoperability, vendor-independence and the use of standards-based technologies for IT infrastructures has increased, most of these bulk data transfers are performed over networks that employ long-established network protocols such as TCP/IP (Transmission Control Protocol/Internet Protocol). The most commonly used networking protocols in both public networks and private networks today, including TCP/IP, were developed before large bulk data transfers became as common as they are today. As a result, these protocols, at least in default configurations, are typically not optimized for large bulk data transfers, and in fact exhibit some behaviors that may be strongly detrimental to bulk data transfer performance.
TCP/IP, for example, implements a variety of built-in responses to apparent or actual error conditions that can negatively impact bulk data transfers. When one or more packets of a given data transfer are lost or “dropped”, which may occur for a variety of reasons, TCP/IP automatically adjusts one or more parameters such as transmit window size, retransmission timeout values, etc. to reduce the likelihood of cascading errors. In addition, a packet loss or an out-of-order delivery of a packet may lead to an automatic retransmission of all the packets that were sent after the lost or out-of-order packet. A substantial amount of processing and/or bandwidth may have to be used just for retransmissions caused by an occasional packet loss or out-of-order delivery, even though some of the original packets sent after the lost packet may have already been received successfully at the destination. Although some techniques (such as Selective Acknowledgment (SACK) mechanisms) have been proposed to reduce the impact of unnecessary retransmissions in TCP/IP, in practice these techniques have had limited success, especially in environments where a sender may be capable of injecting packets into the network fairly rapidly, relative to the time taken by the packets to traverse the network to the intended destination (thus leading to a large number of in-flight packets on a given connection). Packet loss may occur for a number of reasons: e.g., due to temporary micro-congestion at a network device such as a switch caused by near-simultaneous reception of a large number of data packets, as a result of suboptimal routing decisions, as a result of misconfiguration of network equipment (such as Ethernet duplex mismatch), or as a result of faulty wiring or equipment on one or more network paths. In response to the packet loss, the transmit window size may be reduced automatically by the protocol, and retransmission timeouts may be increased.
In some cases, especially in the event of a number of consecutive packet losses or multiple packet losses that are closely spaced in time, the parameters may be modified to such an extent (e.g., a transmit window size may be reduced to such a small value) that the data transfer may in effect be stalled, with very little data actually being transmitted. Substantial reductions in throughput may occur even if the packet losses were transient, i.e., even if the network recovers fairly rapidly from the events or conditions that led to the packet loss. In many of these cases, even after the conditions that led to the packet loss no longer hold, it takes the networking protocol a substantial amount of time to recover and adjust parameters such as window sizes to values that are appropriate for bulk data transfers. During these recovery or “self-healing” periods, bulk data transfers are often effectively blocked, which can result in timeouts or other apparent errors in application-level protocols (such as backup or replication protocols), potentially requiring large application jobs to be abandoned and restarted.
A number of different approaches to tuning network traffic have been considered. Some such schemes either require changes to standard network software stacks or require custom hardware; however, such schemes are difficult to implement in environments that rely on standards-based and vendor-independent communication technologies. Techniques that require substantial changes to legacy applications or third-party applications are also unlikely to be deployed in most enterprise environments.
Various embodiments of systems and methods for bulk network transmissions using multiple connections primed to optimize transfer parameters are disclosed. According to one embodiment, a system comprises one or more processors and memory coupled to the processors, wherein the memory stores program instructions executable by the processors to establish a plurality of network connections designated to be used for a single bulk data transfer. The instructions, which may be incorporated into a traffic manager software module or a set of traffic manager modules in some embodiments, may be further executable to adjust the values of one or more configuration parameters of each of the connections, e.g., to a value appropriate for bulk data transfers, wherein said adjusting includes sending one or more priming packets over each of the connections. By sending a sequence of small priming packets over a connection, for example, the transmit window size of the connection may be increased automatically by the network protocol in use to a desired value or to within a desired range of values in one implementation. The instructions may also be executable to perform the bulk transfer as a sequence of sub-transfers, wherein each sub-transfer includes a transmission of a predetermined amount or stripe of application data over each connection of a selected subset of the plurality of network connections. The remaining connections, i.e., those not included in the subset selected for data transmission in a particular sub-transfer, may serve as “spare” connections that have already been tuned for bulk data transfer and so may be used, for example, to quickly retransmit data if performance over one of the selected subset of connections is found to be unsatisfactory.
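The priming step described above can be illustrated with a minimal sketch. It is a toy simulation, not an implementation of any particular embodiment: the class `SimulatedConnection`, the function `prime`, the ceiling `max_window`, and the assumption that the window grows by exactly one packet per acknowledged priming packet are all hypothetical. In a real system the network protocol itself, not the application, adjusts the window.

```python
from dataclasses import dataclass


@dataclass
class SimulatedConnection:
    """Toy stand-in for a connection; real window growth is done by the protocol."""
    window: int = 1          # current transmit window, in packets
    max_window: int = 64     # hypothetical ceiling set by the OS or configuration

    def send_priming_packet(self) -> None:
        # Assumption: each acknowledged priming packet lets the window grow
        # by one packet, up to the ceiling.
        self.window = min(self.window + 1, self.max_window)


def prime(conn: SimulatedConnection, target_window: int, max_packets: int = 1000) -> int:
    """Send priming packets until the window reaches target_window.

    Returns the number of priming packets sent. max_packets bounds the loop
    in case the target can never be reached.
    """
    sent = 0
    while conn.window < target_window and sent < max_packets:
        conn.send_priming_packet()
        sent += 1
    return sent


# Prime a set of six connections to a window of 16 packets each.
connections = [SimulatedConnection() for _ in range(6)]
packets_sent = [prime(c, target_window=16) for c in connections]
```

Under these assumptions, each connection starts at a window of one packet and needs 15 priming packets to reach the target of 16.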
By priming or preparing a set of connections, parallelizing data transfer over multiple primed connections, and keeping a subset of the connections as spares during any given sub-transfer, a traffic manager may provide sustained high throughputs and may quickly adjust to changing conditions in the network, instead of, for example, relying on gradual healing of congested connections by the underlying network protocol. Furthermore, the high throughput and rapid adjustments to changing conditions may be achieved in some embodiments without requiring changes to application code, operating system code or networking software stack code.
In one embodiment, the subsets of connections used for consecutive sub-transfers may be varied, so that, for example, at least one connection used to transfer data during a given sub-transfer is kept as a “spare” connection during the next sub-transfer for the same bulk transfer. By systematically selecting different subsets of connections for data transfer in successive sub-transfers, e.g., in a round-robin manner, the instructions may be executable to ensure that each of the plurality of connections is used frequently enough that its parameters remain set to appropriate values, and that no single connection remains idle for too long.
In some embodiments, the instructions may be executable to determine whether achieved performance on each connection used during a sub-transfer is satisfactory, i.e., whether one or more performance metrics for each connection meet a threshold performance criterion. If the performance over any given connection fails to meet the threshold criterion, the instructions may be further executable to take one or more remedial actions. For example, in one embodiment, if performance over a particular connection is unsatisfactory, the data sent over that connection during the latest sub-transfer may be retransmitted over a selected “spare” connection. In another embodiment, the unsatisfactory connection may be closed and a new connection may be established as a replacement. One or more configuration parameters of the new connection may be adjusted in a manner similar to the initial adjustment of parameters for the plurality of connections, e.g., by sending priming packets over the new connection, and the new connection may then be included in the set of connections to be used for subsequent sub-transfers. In one embodiment, the instructions may be further executable to insert instrumentation into packets sent over the connections used for the bulk transfer, e.g., to obtain performance metrics used to determine whether performance over a given connection is satisfactory or not.
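One possible round-robin selection of in-use and spare connections is sketched below. The function name `rotate_subsets`, the integer connection identifiers, and the specific rotation rule (advance the starting position by the number of in-use connections) are assumptions made for illustration; the description does not prescribe a particular rotation scheme.

```python
from typing import Iterator, List, Tuple


def rotate_subsets(num_connections: int, in_use: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (in_use_ids, spare_ids) for successive sub-transfers, round-robin."""
    if not 0 < in_use <= num_connections:
        raise ValueError("in_use must be between 1 and num_connections")
    start = 0
    while True:
        selected = [(start + i) % num_connections for i in range(in_use)]
        spares = [c for c in range(num_connections) if c not in selected]
        yield selected, spares
        # Hypothetical rotation rule: begin the next sub-transfer where this one ended.
        start = (start + in_use) % num_connections


# Six primed connections, four in use per sub-transfer, two held as spares.
schedule = rotate_subsets(num_connections=6, in_use=4)
first_three = [next(schedule) for _ in range(3)]
```

With six connections and four in use, connections 2 and 3 carry data in the first sub-transfer and are spares in the second, and every connection carries data at least once in any two consecutive sub-transfers.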
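The threshold check and retransmission over a spare might be sketched as follows. The metric (throughput in megabits per second), the threshold value, the sample measurements, and the function names `find_unsatisfactory` and `plan_retransmissions` are all hypothetical; an embodiment may use different performance metrics and a different policy for choosing a spare.

```python
from typing import Dict, List


def find_unsatisfactory(throughput_mbps: Dict[int, float], threshold_mbps: float) -> List[int]:
    """Return ids of in-use connections whose measured throughput is below the threshold."""
    return [cid for cid, rate in sorted(throughput_mbps.items()) if rate < threshold_mbps]


def plan_retransmissions(bad: List[int], spares: List[int]) -> Dict[int, int]:
    """Map each unsatisfactory connection to a primed spare that will resend its data."""
    if len(bad) > len(spares):
        raise RuntimeError("not enough primed spares for retransmission")
    return dict(zip(bad, spares))


# Hypothetical per-connection measurements for the latest sub-transfer.
measured = {0: 92.0, 1: 11.5, 2: 88.3, 3: 90.1}
bad = find_unsatisfactory(measured, threshold_mbps=50.0)
plan = plan_retransmissions(bad, spares=[4, 5])
```

In this example only connection 1 falls below the threshold, so its data for the latest sub-transfer would be resent over spare connection 4.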
By dynamically adjusting parameters as needed based on actual measurements, and by spreading the data transfer across a dynamically changing set of in-use connections, some of the problems that may otherwise result from more global or static conventional tuning actions may be avoided in various embodiments. For example, in conventional systems, a global change to increase window size for all connections established from a particular host to a very large size may lead to excessive retransmission-related costs in the event of a packet loss on a given connection. Under some conditions, for example, for a connection with a window size of W packets, (W/2) packets may be retransmitted on average in the event of a single packet loss. In contrast, in one embodiment, for a specific data transfer that requires a large amount of data to be transferred, a traffic manager may establish an “effective” window size of (m*w) using the above techniques, where m is the number of in-use connections and w is the average window size for the in-use connections, without incurring the potential retransmission-related penalties associated with a single large window size of (m*w). Configuration parameter settings for other transmissions, e.g., small transfers unrelated to the bulk data transfer, may be unaffected by the traffic manager, and such other transmissions would have a higher probability of obtaining access to resources (such as processing cycles and bandwidth) that might have to be dedicated to large retransmissions in conventional systems.
While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
The traffic manager 170 may be configured to perform the bulk data transfer as a sequence of sub-transfers, where each sub-transfer includes a transmission of a predetermined amount of the data over each network connection of a selected subset of the set of network connections. During each sub-transfer, data may be transmitted concurrently (i.e., in parallel) over the selected subset of connections in one embodiment. Successive sub-transfers for a given bulk data transfer may use different subsets of the connection set in some embodiments, so that, for example, each of the connections in a connection set is used frequently enough that its configuration remains optimized for bulk transfers, and none of the connections is closed as an “idle” connection or has its parameters reset by the networking protocol in use. The connections that are unused during a given sub-transfer may be considered primed “spares” held in reserve for future sub-transfers, with their configuration parameters set to the desired values in advance. The roles of “spares” and “in-use” connections may be rotated during successive sub-transfers: e.g., a particular connection that is a “spare” during one sub-transfer may be an “in-use” connection for another sub-transfer that occurs within a short time thereafter, so that one or more configuration parameters for that connection do not get automatically adjusted to values that differ from the desired values, as might occur if the connection remains idle for too long. For example, TCP/IP may automatically shrink the window size of a connection determined to be idle, and the rapid rotation of a connection between “spare” and “in-use” roles may help to avoid such shrinkage.
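As a rough illustration of how data might be divided into sub-transfers, the sketch below partitions a payload into fixed-size stripes and assigns each stripe to a connection, rotating the in-use subset between sub-transfers. The function names, the stripe size of 128 bytes, the sample payload, and the assignment rule are assumptions chosen for the example; actual concurrent transmission over sockets is not shown.

```python
from typing import List, Tuple


def partition_into_stripes(data: bytes, stripe_size: int) -> List[bytes]:
    """Split data into stripes of stripe_size bytes; the last stripe may be shorter."""
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    return [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]


def plan_sub_transfers(num_stripes: int, num_connections: int,
                       in_use: int) -> List[List[Tuple[int, int]]]:
    """Return one list per sub-transfer of (stripe_index, connection_id) pairs."""
    if not 0 < in_use <= num_connections:
        raise ValueError("in_use must be between 1 and num_connections")
    plan = []
    start = 0
    for first in range(0, num_stripes, in_use):
        stripes = range(first, min(first + in_use, num_stripes))
        plan.append([(s, (start + k) % num_connections) for k, s in enumerate(stripes)])
        # Rotate so that the next sub-transfer uses a different subset.
        start = (start + in_use) % num_connections
    return plan


payload = bytes(range(10)) * 100     # 1000 bytes of sample data
stripes = partition_into_stripes(payload, stripe_size=128)
plan = plan_sub_transfers(len(stripes), num_connections=6, in_use=4)
```

Here the 1000-byte payload yields eight stripes (the last one 104 bytes long) sent in two sub-transfers: the first over connections 0 through 3, the second over connections 4, 5, 0 and 1.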
If a particular connection of the set fails to meet a threshold performance criterion in a given sub-transfer, in one embodiment corrective action may be taken to ensure that throughput remains at a desired level, and that a desired minimum number of primed connections pre-tuned for bulk data transfer remain available. For example, the data that was sent over the problematic connection during the latest sub-transfer may be retransmitted over another connection of the connection set, and/or the problematic connection may be replaced in the connection set by a fresh primed connection in some embodiments.
By maintaining and using a plurality of primed connections dedicated to each given bulk data transfer in this manner, traffic manager 170 may ensure that at least some of the problems (such as small window size settings that result from transient congestion in the network) often encountered during large bulk transfers are avoided in system 100, and as a result, higher throughputs may be achieved for the bulk data transfers than with conventional approaches. The improved performance may be achieved without incurring some of the retransmission-related penalties that might have to be paid in conventional systems, such as large amounts of retransmitted packets and “hiccups” or periods of slower throughput resulting from increased retransmission delays. In some conventional systems, for example, where all the packets sent after a particular dropped or lost packet are retransmitted, (W/2) packets may be retransmitted on average in the event of a single packet loss on a connection with a window size of W packets. (This is because, on average, for a large data transfer, W packets may be in flight if the window size is W packets, and any one of the W packets may be lost with an equal probability; thus, on average, the “middle” packet of the W packets may be the one that is lost, leading to a retransmission of the following (W/2) packets.) In contrast, in one embodiment, for a specific data transfer that requires a large amount of data to be transferred, using the above techniques a traffic manager 170 may establish an “effective” window size of (m*w), where m is the number of in-use connections and w is the average window size for the in-use connections, without incurring the potential retransmission-related penalties associated with a single large window size of (m*w). 
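The (W/2) estimate and the comparison with an effective window of (m*w) can be written out as a short derivation. It assumes, as in the parenthetical above, that W packets are in flight, that the lost packet is equally likely to be any one of them, and that every packet sent after the lost one is retransmitted. The position index k and the choice not to count the lost packet itself are notational assumptions introduced here.

```latex
% Lost packet at position k (1 <= k <= W), each position with probability 1/W;
% the W - k packets sent after it are retransmitted.
\[
  \mathbb{E}[\text{retransmitted packets}]
    = \frac{1}{W}\sum_{k=1}^{W} (W - k)
    = \frac{W - 1}{2}
    \approx \frac{W}{2}.
\]
% One connection with window W = m w versus m connections with average window w,
% where a single loss affects only the connection on which it occurs:
\[
  \underbrace{\frac{m\,w}{2}}_{\text{single large window}}
  \quad\text{versus}\quad
  \underbrace{\frac{w}{2}}_{\text{one of } m \text{ connections}} .
\]
```

Under these assumptions, a single packet loss costs roughly m times fewer retransmitted packets when the same amount of in-flight data is spread over m connections.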
Furthermore, in some embodiments, the functionality of the traffic manager may be implemented without requiring any code changes to applications 120, to the operating system or to the networking software stacks used at the endpoints 110, and may thus be platform-independent—for example, a sending endpoint such as 110A may have a different operating system than a receiving endpoint such as 110P. Further details regarding various aspects of the operation of traffic managers 170 to support bulk data transfers are provided below.
In the example scenario depicted in
Bulk data transfers for any of numerous types of applications 120 may be supported at endpoints 110 in different embodiments, including storage management applications (e.g., distributed file systems, backup/restore, replication, and disaster recovery applications), telephony and other telecommunication applications (e.g., voice over IP), media servers, bioinformatics applications, distributed simulation applications, geographically distributed software development projects, etc. Network devices 130 may include switches, routers, gateways, and any other devices that participate in or help manage transmission of data within the network 115. Any of a variety of networking technologies (e.g., Ethernet, Asynchronous Transfer Mode (ATM), Fibre Channel, etc.) and corresponding software and/or hardware protocols may be used singly or in combination within network 115 in various embodiments. The term “bulk data transfer”, as used herein, refers to a transfer of data on behalf of an application task, such as a transfer of a requested video or audio file, or a backup of a specified source data set, where the amount of data to be transferred for the task is sufficiently large that the data is typically transmitted in the network using multiple packets or messages. The terms “application task” and “application job” may be used herein to refer to a unit of work performed using an application, for example work performed in response to a single request initiated by a user or by another application. The terms “packet”, “network message” and “message” may be used herein generally to refer to units in which data and/or control information is transmitted over the network 115; the specifics of how packets or messages are organized internally may vary from protocol to protocol and from network type to network type in various embodiments. 
As viewed from a network device 130, incoming and outgoing traffic streams may comprise packets from a variety of bulk data transfers (as well as packets corresponding to short, non-bulk communications) mingled together; typically, a network device 130 may not be able to distinguish packets of one data transfer from those of another. Often, a single bulk data transfer for a single application task may comprise sending hundreds or thousands of data packets from one endpoint 110 to another.
Subsets of the endpoints 110 and network devices 130 shown in
The protocols in use in network 115 may be configured to automatically modify various connection parameters in response to changing conditions in some embodiments. For example, in one embodiment where the Transmission Control Protocol/Internet Protocol (TCP/IP) family of protocols is used in network 115, a sliding window technique may be implemented for flow control, in which a “transmit window” (which may alternatively be referred to herein as a “send window” or simply a “window”) may be maintained at each endpoint for each connection. The number of packets that can be unacknowledged at any given time may be constrained by the transmit window: for example, if the transmit window for a given connection at a given endpoint 110 is currently set to 16 packets, this means that the sending endpoint may transmit up to 16 packets before it receives an acknowledgment. In addition to being dependent on the speed of the various network devices 130 and links 140 used for a given connection, and on the processing speeds at the endpoints 110 involved in the connection, the throughput achieved over the connection may also depend on the transmit window size. In particular, for bulk data transfers, a larger transmit window size may in general be more suitable than a smaller transmit window size. A large transmit window size tends to allow more data of the transfer to be “in flight” at any given point in time, thus potentially utilizing the network's resources more effectively than if acknowledgments had to be received for every packet or every few packets. TCP/IP implementations typically dynamically adjust the transmit window size for a given connection, based on a number of factors, to manage flow control and to respond to congestion. For example, in some implementations, the transmit window size is typically set to a minimum of a “receiver advertisement” and a “congestion window”. 
The “receiver advertisement” may be an indication from the receiving endpoint 110 of the amount of data the receiving endpoint is currently prepared to accept, e.g., based on buffer sizes available at the receiving endpoint and the speed at which the receiving endpoint can process the incoming data. For many endpoints 110 equipped with powerful processors and large amounts of memory, the receiver advertisement may typically be quite large, and the transmit window size may therefore typically be governed largely by the “congestion window”.
The “congestion window” may be modified by the protocol in response to dropped packets, which may occur for a variety of reasons as described below in further detail. When a connection is first established, a technique called “slow-start” may be used in some TCP/IP implementations, in which the initial congestion window size is set to a small value (such as one packet) and increased by a small amount (e.g., one packet) each time an acknowledgment is received. In one exemplary implementation of slow-start, the initial congestion window may be set to one, and increased to two packets when the acknowledgment for the first packet is received. The sender may then send up to two packets and wait for acknowledgments. When acknowledgments for the two packets are received, the congestion window may be increased to four; when acknowledgments for the next four packets are received, the congestion window may be set to eight, and so on. In the absence of packet loss, the congestion window size may thus gradually increase, up to a maximum usually limited by an operating system and/or by configurable parameters. To reach a congestion window size of N packets, for example, log₂ N round trip transmissions may be required if no packet loss is encountered. If a packet is lost, however, the TCP/IP implementation may decrease the congestion window by a substantial amount, e.g., by half. This reduction may be repeated for every packet lost: e.g., if the window size is 128 packets, a first packet loss may reduce the congestion window to 64 packets, a second consecutive packet loss may reduce the congestion window to 32 packets, and so on, until a minimum congestion window size of one packet is reached. In addition to decreasing congestion window sizes, the protocol may also modify other parameters in response to packet loss, such as increasing retransmission timers for the packets that remain in flight.
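The window growth and reduction behavior described above may be illustrated with a simplified model. The sketch below assumes an idealized slow-start in which the congestion window doubles once per round trip and is halved on each packet loss; the function names are hypothetical, and real TCP/IP implementations may differ in detail.

```python
def slow_start_round_trips(target_window: int, initial_window: int = 1) -> int:
    """Count round trips needed for the window to reach target_window.

    Assumes the window doubles once per round trip and no packet is lost.
    """
    if target_window < 1 or initial_window < 1:
        raise ValueError("window sizes must be at least one packet")
    window = initial_window
    round_trips = 0
    while window < target_window:
        window *= 2
        round_trips += 1
    return round_trips


def window_after_losses(window: int, losses: int, minimum: int = 1) -> int:
    """Halve the window once per loss, never dropping below the minimum."""
    for _ in range(losses):
        window = max(minimum, window // 2)
    return window


# Growing from 1 to 128 packets takes log2(128) = 7 doublings.
rounds_to_128 = slow_start_round_trips(128)

# Two consecutive losses reduce a 128-packet window to 64, then to 32.
after_two_losses = window_after_losses(128, 2)
```

Under these assumptions, two lost packets undo two of the seven round trips of growth, which suggests why even a low loss rate may keep the window well below the size suited to a bulk data transfer.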
After the congestion window is decreased due to a packet loss, it may be increased again only gradually, e.g., in some TCP/IP implementations, the protocol may enter a “congestion-avoidance” mode of operation in which window size is incremented by just one packet per received acknowledgment. Thus, even a small number of lost packets may significantly affect congestion windows, and hence, transmit window sizes. In many cases, packet loss may be a transient phenomenon, but even after the causes that led to the packet loss are no longer in effect, it may take a while for the transmit window sizes to reach large values suitable for bulk data transfers.
Network packets may be lost or dropped for a variety of reasons in different embodiments, many of which can be corrected fairly quickly. A particular network device 130 or link 140 that forms a part of the network path for a bulk data transfer may fail, for example; however, an alternative path may rapidly be found for remaining packets of the data transfer. In some cases, packets may be dropped due to transient congestion phenomena at a given network device 130: e.g., if a large number of packets happen to arrive at about the same time at a given input port of a network switch, the buffer space available for the input port may be temporarily overrun. Similarly, if a large number of packets happen to be routed out from a given output port of a network device at about the same time, the buffers available for the outbound packets at the port may be exhausted, and one or more of the packets may have to be dropped. Such congestion events may be caused at least partly by the bursty nature of bulk data transfers, where, for example, an application 120 hands off a large amount of data for transmission to a networking software stack, and the software stack in turn transmits a large number of packets as quickly as possible into the network 115. The congestion effects of the bursty nature of traffic may quite often be very short-lived, however, since the bursts of outbound traffic from applications are often separated by long gaps of little or no outbound traffic. Dropped packets may also be the result of misconfiguration of network devices 130 or protocol parameters, which can also often be corrected shortly after they are detected. Even though many of the causes of dropped packets may quickly disappear, the dropped packets may lead to dramatic and sustained reductions in the throughput of bulk data transfers if, for example, the transmit window sizes are substantially reduced as a result. 
Furthermore, despite advances in networking technology, it may be hard or impossible to completely eliminate packet loss, especially over large decentralized networks such as the Internet that comprise a wide variety of network devices 130 and links 140. For example, small packet loss rates (e.g., less than one percent) are often observed even when the overall utilization of a network is relatively low.
In some embodiments, traffic managers 170 may be configured to implement a technique of using “primed” connection sets 150 for bulk data transfers, for example to overcome some of the potential problems associated with packet losses described above. In one such technique, priming packets may be transmitted over the connections of a connection set 150 to set one or more parameters such as transmit window size to a desired value or to within a desired range, and connections may be discarded if the parameters are modified by the protocol to an undesirable value.
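One possible form of such a priming technique is sketched below. The callables `send_priming_packet` and `read_window`, the `max_packets` cap, and the `FakeConnection` class are all hypothetical stand-ins introduced for illustration; they do not correspond to any particular networking stack's interface. The sketch also simplifies the discard rule by dropping any connection whose window does not reach the target.

```python
from typing import Callable


def prime_connection(
    send_priming_packet: Callable[[], None],
    read_window: Callable[[], int],
    target_window: int,
    max_packets: int,
) -> bool:
    """Send priming packets until the window reaches the target.

    Stops after max_packets packets even if the target is not reached.
    Returns True if the window is at or above the target afterwards.
    """
    sent = 0
    while read_window() < target_window and sent < max_packets:
        send_priming_packet()
        sent += 1
    return read_window() >= target_window


class FakeConnection:
    """Toy model: each priming packet doubles the window, up to a cap."""

    def __init__(self, cap: int) -> None:
        self.window = 1
        self.cap = cap

    def send_priming_packet(self) -> None:
        self.window = min(self.cap, self.window * 2)

    def read_window(self) -> int:
        return self.window


connections = [FakeConnection(cap=64), FakeConnection(cap=64), FakeConnection(cap=4)]

# Keep only the connections that reached the target window of 16 packets.
primed = [
    c for c in connections
    if prime_connection(c.send_priming_packet, c.read_window,
                        target_window=16, max_packets=10)
]
```

In this toy run, the third connection cannot grow past four packets and is therefore excluded from the primed set, while the other two reach the target after four priming packets each.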
Having established at least some of the connections of connection set 150, the traffic manager may adjust one or more configuration parameters, e.g., by sending one or more priming packets over each established connection, as indicated in the illustration labeled “2” in
In one embodiment, the sending traffic manager 170 may be configured to monitor, as the priming packets are sent, the values of the one or more configuration parameters being adjusted as a result of the priming packets, and to terminate the sending of priming packets when the desired adjustment has been achieved. For example, if a parameter has not reached the desired value or range for a given connection, another priming packet may be sent; and if the parameter has reached the desired value or range for the given connection, no further priming packets may be sent. In other embodiments, the traffic manager may not monitor the parameter values, relying instead on a fixed or default number of priming packets to have the desired effect on the parameter values. Values of the parameters being changed may be obtained, for example, using an API provided by a networking software stack in use at the endpoint, by reading packet header contents, or by other techniques in various embodiments. The target value for a given parameter may be a range (e.g., a transmit window size of at least K packets) in some embodiments, and a specific value (e.g., a transmit window size equal to the maximum transmit window size supported by the networking stack and/or operating system) in other embodiments. In one embodiment, the traffic manager 170 may be configured to make a “best-effort” adjustment to a configuration parameter, e.g., by sending a specified number of priming packets, and may terminate the stream of priming packets even if a targeted configuration parameter value has not been achieved. In the example illustrated in
By increasing the window sizes using priming packets as described above, in some embodiments an “effective” window that is larger than the actual data size for any given connection may be achieved: e.g., if each of the six window sizes W1b-W6b is “m” packets in
After the parameter or parameters for at least some of the connections of a connection set 150 have been adjusted appropriately, transmission of the data corresponding to an application task may begin. The data may be transferred in a sequence of sub-transfers in some embodiments, in which only a subset of the connections are used in a given sub-transfer, and the remaining connections are held in reserve as “spare” connections (with their parameters already tuned appropriately) to be used for subsequent sub-transfers or in case one of the currently-used connections fails to meet a threshold level of performance. Packets may be transferred in parallel over the selected connections for a given sub-sequence in some embodiments.
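The use of spare connections during a sub-transfer may be sketched as follows. The `send` callable, which is assumed to transmit a stripe and return a measured throughput, and the `min_throughput` threshold are hypothetical; the sketch also sends stripes one after another for simplicity, whereas the connections of a sub-transfer may be used in parallel as noted above.

```python
from typing import Callable, Dict, List


def run_sub_transfer(
    stripes: List[bytes],
    active: List[str],
    spares: List[str],
    send: Callable[[str, bytes], float],
    min_throughput: float,
) -> Dict[int, str]:
    """Send one stripe per active connection, falling back to spares.

    Returns a mapping from stripe index to the connection that last
    carried it. If the spares run out, the stripe stays with the last
    connection tried, even if that connection was below the threshold.
    """
    if len(stripes) > len(active):
        raise ValueError("need at least one active connection per stripe")
    remaining_spares = list(spares)
    carried_by: Dict[int, str] = {}
    for index, (stripe, conn) in enumerate(zip(stripes, active)):
        throughput = send(conn, stripe)
        # Retransmit over a primed spare while performance is unsatisfactory.
        while throughput < min_throughput and remaining_spares:
            conn = remaining_spares.pop(0)
            throughput = send(conn, stripe)
        carried_by[index] = conn
    return carried_by


# Toy measurements: connection C2 performs poorly, the others perform well.
measured = {"C1": 90.0, "C2": 5.0, "C3": 80.0, "C4": 85.0, "C5": 88.0, "C6": 92.0}

result = run_sub_transfer(
    stripes=[b"s1", b"s2", b"s3", b"s4"],
    active=["C1", "C2", "C3", "C4"],
    spares=["C5", "C6"],
    send=lambda conn, data: measured[conn],
    min_throughput=50.0,
)
```

Here the second stripe is retransmitted over spare connection C5 because C2 falls below the threshold, and C6 remains in reserve.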
Remedial or corrective actions similar to those illustrated in
It is noted that in some implementations, if traffic manager 170 determines that all the packets sent over the unsatisfactory connection reached the destination, the retransmission indicated in
Traffic manager 170 may be configured to make other types of modifications to the connection set 150 for a given bulk data transfer than those illustrated in
Traffic manager 170 may be configured to obtain separate estimates or measurements of the performance achieved over each connection of connection set 150 in some embodiments. In one embodiment, traffic manager 170 may be configured to instrument one or more packets sent over each connection, for example to obtain real-time performance metrics such as estimates of end-to-end available bandwidth and/or round-trip transmission times for the connection. The term “end-to-end bandwidth”, as used herein, refers to the rate at which data associated with the bulk data transfer is transmitted over a given connection, taking the entire network path between the endpoints into account. It is noted that the end-to-end bandwidth, as measured at a given endpoint 110, may differ from the advertised or achieved bandwidth at individual components of the network path between endpoints. For example, an Ethernet network interface card used to transmit the data from endpoint 110A may have an advertised bandwidth of 100 Megabits/second or one Gigabit/second; various network devices, such as switches, may offer advertised aggregate bandwidths of several Gigabits/second; and links 140 may each have their own associated advertised bandwidth capacity. The actual bandwidth achieved from endpoint to endpoint may be substantially less than the advertised bandwidth of even the slowest device or link of the network path for a variety of reasons in different implementations. For example, several of the network devices 130 and/or links 140 may be shared among multiple tasks of a given application 120, between multiple applications 120, and/or between multiple endpoints 110. Concurrent use of the devices 130 may result in queuing and/or packet collisions at various points along the path, depending on the specific implementations of the devices 130 and the networking protocol or protocols in use. 
Temporary congestion at network devices 130 and/or over network links 140 resulting from bursty send traffic patterns may also reduce available end-to-end bandwidth as described earlier. In some embodiments, the specific performance metrics to be measured or estimated, as well as threshold values used to determine whether a particular connection is unsatisfactory or not, may be specified via parameters supplied to traffic manager 170 by applications 120 and/or users. By monitoring the performance over each connection in real time and rapidly taking responsive actions (such as actions illustrated in
In various embodiments, one or more components of a traffic manager 170 may be implemented at various levels of a software stack at an endpoint 110.
In the embodiment shown in
Kernel space 520 may include a system interface 522, services 524, and hardware interface 526. System interface 522 may provide an interface between the operating system services 524 and application code within the user space 510. Services 524 may include, for example, protocol stacks, drivers, and other facilities commonly found in an operating system. Hardware interface 526 may include code that provides an interface between the operating system and hardware components of the endpoint 110, such as network interface card (NIC) 530. Outbound traffic corresponding to a given task of application 120 originates at the application 120, and proceeds down the layers of components illustrated in
In the embodiment illustrated in
In one embodiment, a traffic manager 170 may be configured to insert timestamps and/or data transfer identifiers into the packets of a data transfer, e.g., in order to obtain per-connection performance estimates.
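A per-connection estimate of this kind may be computed roughly as sketched below. The `InstrumentedSample` record and its field names are hypothetical, and the sketch assumes that both timestamps are taken on the sending endpoint's clock (when a packet is sent and when its acknowledgment arrives), which avoids any need for synchronized clocks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstrumentedSample:
    payload_bytes: int  # application data carried by the packet
    sent_at: float      # sender clock, seconds, when the packet was sent
    acked_at: float     # sender clock, seconds, when its ack arrived


def round_trip_time(sample: InstrumentedSample) -> float:
    """Round-trip time for one instrumented packet, in seconds."""
    return sample.acked_at - sample.sent_at


def estimate_throughput(samples: List[InstrumentedSample]) -> float:
    """Estimate bytes per second over the span from first send to last ack."""
    if not samples:
        raise ValueError("at least one sample is required")
    elapsed = max(s.acked_at for s in samples) - min(s.sent_at for s in samples)
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return sum(s.payload_bytes for s in samples) / elapsed


samples = [
    InstrumentedSample(payload_bytes=1000, sent_at=0.0, acked_at=0.5),
    InstrumentedSample(payload_bytes=1000, sent_at=0.25, acked_at=0.75),
    InstrumentedSample(payload_bytes=1000, sent_at=0.5, acked_at=1.0),
]

rtts = [round_trip_time(s) for s in samples]
throughput = estimate_throughput(samples)
```

The resulting values could then be compared against threshold performance criteria to decide whether a connection is satisfactory.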
A number of variations of the basic timestamp-based technique described above may be used in various embodiments. For example, in one embodiment, only a subset of the packets sent on a given connection may be instrumented: e.g., every fifth packet or every tenth packet may be instrumented by the sending traffic manager 170, and corresponding acknowledgments from the receiving traffic manager may be used to estimate connection performance. In another embodiment, instead of modifying packets that contain application data, the sending side traffic manager 170 may send special diagnostic packets from time to time to the receiving endpoint to help estimate the available bandwidth. The priming packets described above may also be used to measure or estimate connection performance in some embodiments: e.g., if the priming packets sent on a connection themselves indicate unsatisfactory performance, the connection may be closed before it is used for the bulk data transfer. The special diagnostic packets may not contain application data in some embodiments: instead, the data bodies of the diagnostic packets may contain, e.g., control information about the data transfer and/or padding inserted to make the packet reach a desired size. In one implementation, instead of sending a receive timestamp in an acknowledgment, the receiving traffic manager may send its estimates of the transmission time for the packet and/or its estimates of the available bandwidth back to the sending traffic manager. The instrumentation metadata 707 may in some embodiments include different fields than those shown in
Traffic manager 170 may then initiate a sequence of sub-transfers using the primed connections (i.e., connections whose parameters have been appropriately adjusted) to send the application data to the receiving endpoint 110. For each sub-transfer, M of the N connections in the connection set 150 may be used in some embodiments, leaving (N-M) “spare” connections with their parameters set appropriately for the bulk data transfer. As described above in conjunction with the description of
In the embodiment depicted in
If additional data remains to be transferred for a given bulk data transfer (as determined in block 835), the sending traffic manager 170 may perform additional sub-transfers, e.g., repeating operations corresponding to blocks 815-830 until the bulk data transfer is complete. Continuing the example with N=6 and M=4, for which mappings of the first four data stripes to connections were shown above, the following mappings may be used during the second sub-transfer: S5 may be sent over C2, S6 over C3, S7 over C4, and S8 over C5, with C6 and C1 kept as spare connections. As noted earlier, the parameters M and/or N may be modified from one sub-transfer to another even if unsatisfactory performance is not detected: e.g., if the traffic manager detects that the network is relatively underutilized, it may increase the number of active connections used for one or more sub-transfers and monitor the resulting performance. As the traffic manager 170 gradually modifies the parameters of the bulk data transfer in response to monitored performance, steady state values of M and N may eventually be reached in such embodiments, at which point the network is optimally utilized to provide the maximum sustainable throughput without causing congestion.
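The rotation of stripes across connections in the N=6, M=4 example may be generated by a simple rule, sketched below. The function name and the zero-based sub-transfer index are assumptions made for illustration, as is the assumption that the first sub-transfer maps S1 through S4 to C1 through C4; other rotation schemes are equally possible.

```python
from typing import Dict, List, Tuple


def stripe_mapping(sub_transfer: int, n: int, m: int) -> Tuple[Dict[str, str], List[str]]:
    """Map the stripes of one sub-transfer to connections, rotating by one.

    sub_transfer is zero-based. Returns the stripe-to-connection mapping
    and the list of connections held in reserve for that sub-transfer.
    """
    if not 0 < m <= n:
        raise ValueError("require 0 < m <= n")
    if sub_transfer < 0:
        raise ValueError("sub_transfer must be non-negative")
    mapping: Dict[str, str] = {}
    for j in range(m):
        stripe = f"S{sub_transfer * m + j + 1}"
        # Shift the starting connection by one for each sub-transfer.
        connection = f"C{(sub_transfer + j) % n + 1}"
        mapping[stripe] = connection
    spares = [f"C{(sub_transfer + m + k) % n + 1}" for k in range(n - m)]
    return mapping, spares


# Second sub-transfer (index 1) of the example with N=6 and M=4.
second_mapping, second_spares = stripe_mapping(sub_transfer=1, n=6, m=4)
```

For the second sub-transfer this rule yields S5 over C2, S6 over C3, S7 over C4, and S8 over C5, with C6 and C1 held as spares, matching the mappings given above. Rotating the starting connection in this way means that each connection, including the spares, periodically carries data.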
Any of a number of variations of the techniques illustrated in
In some embodiments, one or more input parameters may be used to control aspects of the operation of a traffic manager 170. Stripe sizes (e.g., the maximum or minimum amount of application data to be transmitted over a connection during a given sub-transfer) may be specified via input parameters in one embodiment. Threshold performance metric values (e.g., a maximum round-trip delay or a minimum throughput) to be used in determining whether performance over a given connection is satisfactory or not may be indicated by input parameters in another embodiment. In one implementation, an input parameter may be used to specify how closely traffic manager 170 should attempt to match available bandwidth for a given application's data transfers or for a particular data transfer, e.g., by modifying the number of connections in use during sub-transfers. If the input parameter indicates that the traffic manager 170 should be aggressive and attempt to maximize throughput as much as possible, the parameters M and/or N of
In addition to traffic manager 170, memory 1010 and/or storage devices 1040 may also store operating system software and/or software for various applications 120 in various embodiments. In some embodiments, part or all of traffic manager 170 may be included in an operating system, a storage management software product or another software package, while in other embodiments, traffic manager 170 may be packaged as a standalone product. In some embodiments, the component modules of a traffic manager 170 may be distributed across multiple hosts 1001, or may be replicated at a plurality of hosts 1001. In one embodiment, part or all of the functionality of traffic manager 170 may be implemented via one or more hardware devices (e.g., via one or more Field Programmable Gate Array (FPGA) devices) or in firmware. It is noted that in addition to or instead of computer hosts, in some embodiments endpoints 110 linked to network 115 may include a variety of other devices configured to implement applications 120 and traffic managers 170, such as television set-top boxes, mobile phones, intelligent stereo devices, etc.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
van Rietschote, Hans F., Kritov, Slava
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Feb 24 2006 | KRITOV, SLAVA | VERITAS Operating Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017642 | /0511 | |
Feb 27 2006 | VAN RIETSCHOTE, HANS F | VERITAS Operating Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017642 | /0511 | |
Feb 28 2006 | Symantec Operating Corporation | (assignment on the face of the patent) | / | |||
Oct 30 2006 | VERITAS Operating Corporation | Symantec Operating Corporation | CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE PREVIOUSLY RECORDED ON REEL 019872 FRAME 979 ASSIGNOR S HEREBY CONFIRMS THE ASSIGNEE IS SYMANTEC OPERATING CORPORATION | 027819 | /0462 | |
Oct 30 2006 | VERITAS Operating Corporation | Symantec Corporation | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 019872 | /0979 | |
Jan 29 2016 | Veritas US IP Holdings LLC | WILMINGTON TRUST, NATIONAL ASSOCIATION, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 037891 | /0726 | |
Jan 29 2016 | Veritas US IP Holdings LLC | BANK OF AMERICA, N A , AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 037891 | /0001 | |
Jan 29 2016 | Symantec Corporation | Veritas US IP Holdings LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 037697 | /0412 | |
Mar 29 2016 | Veritas Technologies LLC | Veritas Technologies LLC | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 038455 | /0752 | |
Mar 29 2016 | Veritas US IP Holdings LLC | Veritas Technologies LLC | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 038455 | /0752 | |
Aug 20 2020 | Veritas Technologies LLC | WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 054370 | /0134 | |
Nov 27 2020 | WILMINGTON TRUST, NATIONAL ASSOCIATION, AS COLLATERAL AGENT | VERITAS US IP HOLDINGS, LLC | TERMINATION AND RELEASE OF SECURITY IN PATENTS AT R F 037891 0726 | 054535 | /0814 | |
Nov 22 2024 | BANK OF AMERICA, N A , AS ASSIGNOR | ACQUIOM AGENCY SERVICES LLC, AS ASSIGNEE | ASSIGNMENT OF SECURITY INTEREST IN PATENT COLLATERAL | 069440 | /0084 | |
Dec 09 2024 | WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT | Veritas Technologies LLC | RELEASE BY SECURED PARTY SEE DOCUMENT FOR DETAILS | 069634 | /0584 | |
Dec 09 2024 | ACQUIOM AGENCY SERVICES LLC, AS COLLATERAL AGENT | VERITAS TECHNOLOGIES LLC F K A VERITAS US IP HOLDINGS LLC | RELEASE BY SECURED PARTY SEE DOCUMENT FOR DETAILS | 069712 | /0090 |
Date | Maintenance Fee Events |
Dec 23 2013 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Dec 26 2017 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Mar 14 2022 | REM: Maintenance Fee Reminder Mailed. |
Aug 29 2022 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Jul 27 2013 | 4 years fee payment window open |
Jan 27 2014 | 6 months grace period start (w surcharge) |
Jul 27 2014 | patent expiry (for year 4) |
Jul 27 2016 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jul 27 2017 | 8 years fee payment window open |
Jan 27 2018 | 6 months grace period start (w surcharge) |
Jul 27 2018 | patent expiry (for year 8) |
Jul 27 2020 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jul 27 2021 | 12 years fee payment window open |
Jan 27 2022 | 6 months grace period start (w surcharge) |
Jul 27 2022 | patent expiry (for year 12) |
Jul 27 2024 | 2 years to revive unintentionally abandoned end. (for year 12) |