Methods, systems and computer program products are provided for transferring a transmission control protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems. An operating system kernel of a first data processing system obtains application level information from a transaction received from the client over the connection. In addition, application state information associated with the connection may be obtained from the first application. A second application at a second data processing system is selected for transfer of the connection based on the obtained information and the connection is transferred to the second data processing system. The transfer includes providing to the second data processing system the associated state information of the first communication protocol stack and, optionally, the obtained application state information associated with the connection from the first application. Selection operations for transfers may be initiated responsive to a notification from the first application of completion of a transaction.
|
17. A method of transferring a transmission control protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems, the method comprising:
establishing a connection between the client device and a first application at a first data processing system of the cluster of data processing systems utilizing a first communication protocol stack associated with the first application, the first communication protocol stack having associated state information; wherein an operating system kernel of the first data processing system carries out the following: obtaining application level information from a transaction received from the client over the connection, the transaction including at least one request; obtaining application state information associated with the connection from the first application; selecting a second application at a second data processing system of the cluster of data processing systems for transfer of the connection based on the obtained application level information; and transferring the connection to a second communication protocol stack on the second data processing system associated with the selected second application including providing to the second data processing system the associated state information of the first communication protocol stack and the obtained application state information associated with the connection from the first application.
34. A system for transferring a transmission control protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems, the system comprising:
means for establishing a connection between the client device and a first application at a first data processing system of the cluster of data processing systems utilizing a first communication protocol stack associated with the first application, the first communication protocol stack having associated state information; an operating system kernel of the first data processing system, the operating system kernel of the first data processing system further comprising: means for obtaining application level information from a transaction received from the client over the connection, the transaction including at least one request; means for obtaining application state information associated with the connection from the first application; means for selecting a second application at a second data processing system of the cluster of data processing systems for transfer of the connection based on the obtained application level information; and means for transferring the connection to a second communication protocol stack on the second data processing system associated with the selected second application including providing to the second data processing system the associated state information of the first communication protocol stack and the obtained application state information associated with the connection from the first application.
1. A method of transferring a transmission control protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems, the method comprising:
establishing a connection between the client device and a routing node coupled to the cluster of data processing systems utilizing a communication protocol stack at the routing node, the protocol stack having associated state information; wherein an operating system kernel of the routing node carries out the following: obtaining application level information from an initial transaction received from the client over the connection, the transaction including at least one request; selecting a target application at a first data processing system of the cluster of data processing systems for transfer of the connection based on the obtained information; and transferring the connection to a target communication protocol stack on the first data processing system associated with the selected target application including providing the associated state information of the communication protocol stack of the routing node; and wherein the target communication protocol stack carries out the following: accepting the connection from the routing node based on the provided associated state information of the communication protocol stack of the routing node so as to, transparently to the client, establish communications between the client and the target application; receiving a notification of completion of the transaction from the target application; and making the connection available to a routing device for selection of a next target application to receive the connection responsive to receipt of the notification of completion of the transaction.
26. A system for transferring a transmission control protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems, the system comprising:
means for establishing a connection between the client device and a routing node coupled to the cluster of data processing systems utilizing a communication protocol stack at the routing node, the protocol stack having associated state information; an operating system kernel of the routing node, the operating system kernel of the routing node comprising: means for obtaining application level information from an initial transaction received from the client over the connection, the transaction including at least one request; means for selecting a target application at a first data processing system of the cluster of data processing systems for transfer of the connection based on the obtained information; and means for transferring the connection to a target communication protocol stack on the first data processing system associated with the selected target application including providing the associated state information of the communication protocol stack of the routing node; wherein the target communication protocol stack comprises: means for accepting the connection from the routing node based on the provided associated state information of the communication protocol stack of the routing node so as to, transparently to the client, establish communications between the client and the target application; means for receiving a notification of completion of the transaction from the target application; and means for making the connection available to a routing device for selection of a next target application to receive the connection responsive to receipt of the notification of completion of the transaction.
49. A computer program product for transferring a transmission control protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems, comprising:
a computer-readable storage medium having computer-readable program code embodied in said medium, said computer-readable program code comprising: computer-readable program code which establishes a connection between the client device and a first application at a first data processing system of the cluster of data processing systems utilizing a first communication protocol stack associated with the first application, the first communication protocol stack having associated state information; computer-readable program code for execution in an operating system kernel of the routing node which obtains application level information from a transaction received from the client over the connection, the transaction including at least one request; computer-readable program code for execution in an operating system kernel of the routing node which obtains application state information associated with the connection from the first application; computer-readable program code for execution in an operating system kernel of the routing node which selects a second application at a second data processing system of the cluster of data processing systems for transfer of the connection based on the obtained application level information; and computer-readable program code for execution in an operating system kernel of the routing node which transfers the connection to a second communication protocol stack on the second data processing system associated with the selected second application including providing to the second data processing system the associated state information of the first communication protocol stack and the obtained application state information associated with the connection from the first application.
43. A computer program product for transferring a transmission control protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems, comprising:
a computer-readable storage medium having computer-readable program code embodied in said medium, said computer-readable program code comprising: computer-readable program code which establishes a connection between the client device and a routing node coupled to the cluster of data processing systems utilizing a communication protocol stack at the routing node, the protocol stack having associated state information; computer-readable program code for execution in an operating system kernel of the routing node which obtains application level information from an initial transaction received from the client over the connection, the transaction including at least one request; computer-readable program code for execution in an operating system kernel of the routing node which selects a target application at a first data processing system of the cluster of data processing systems for transfer of the connection based on the obtained information; computer-readable program code for execution in an operating system kernel of the routing node which transfers the connection to a target communication protocol stack on the first data processing system associated with the selected target application including providing the associated state information of the communication protocol stack of the routing node; computer-readable program code which accepts the connection from the routing node based on the provided associated state information of the communication protocol stack of the routing node so as to, transparently to the client, establish communications between the client and the target application; computer-readable program code which receives a notification of completion of the transaction from the target application; and computer-readable program code which makes the connection available to a routing device for selection of a next target application to receive the connection responsive to receipt of the notification of completion of the transaction.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
8. The method of
obtaining application level information from a next transaction received from the client over the connection, the next transaction including at least one request; selecting the next target application at a data processing system of the cluster of data processing systems for transfer of the connection based on the obtained application level information from a next transaction; and transferring the connection to the communication protocol stack associated with the next target application including providing the associated state information of the target communication protocol stack and the application state information associated with the connection to the communication protocol stack associated with the next target application.
9. The method of
10. The method of
obtaining application level information from a next transaction received from the client over the connection, the next transaction including at least one request; selecting the next target application at a data processing system of the cluster of data processing systems for transfer of the connection based on the obtained application level information from the next transaction; and transferring the connection to a communication protocol stack associated with the next target application including providing associated communication protocol stack state information for the routing device to the communication protocol stack associated with the next target application.
11. The method of
12. The method of
13. The method of
14. The method of
15. The method of
16. The method of
18. The method of
accepting the connection from the first data processing system based on the provided associated state information of the first communication protocol stack and the obtained application state information so as to, transparently to the client, establish communications between the client and the second application.
19. The method of
20. The method of
21. The method of
22. The method of
23. The method of
receiving a notification of completion of the transaction from the first application; and wherein the step of selecting a second application further comprises making the connection available to a routing device for selection of the second application responsive to receipt of the notification of completion of the transaction.
24. The method of
25. The method of
wherein the second application carries out communicating with the client utilizing the transferred connection.
27. The system of
28. The system of
29. The system of
30. The system of
a communication protocol stack associated with the next target application that receives the application state information and the notification as ancillary data of a recvmsg socket call.
31. The system of
32. The system of
33. The system of
35. The system of
means for accepting the connection from the first data processing system based on the provided associated state information of the first communication protocol stack and the obtained application state information so as to, transparently to the client, establish communications between the client and the second application.
36. The system of
37. The system of
38. The system of
39. The system of
40. The system of
means for receiving a notification of completion of the transaction from the first application; and wherein means for selecting a second application further comprises means for making the connection available to a routing device for selection of the second application responsive to receipt of the notification of completion of the transaction.
41. The system of
42. The system of
wherein the second application comprises means for communicating with the client utilizing the transferred connection.
44. The computer program product of
45. The computer program product of
46. The computer program product of
computer-readable program code which provides the application state information to the routing device for use in transferring the connection to the next target application; and computer-readable program code which receives the application state information and the notification as ancillary data of a recvmsg socket call.
47. The computer program product of
48. The computer program product of
50. The computer program product of
computer-readable program code which accepts the connection from the first data processing system based on the provided associated state information of the first communication protocol stack and the obtained application state information so as to, transparently to the client, establish communications between the client and the second application.
51. The computer program product of
52. The computer program product of
53. The computer program product of
54. The computer program product of
|
The present invention relates to network communications and more particularly to network communications to a cluster of data processing systems.
As the use of the Internet has increased, in general, so has the demand placed on servers on the Internet. One technique which has been used to address this increase in demand has been through the use of multiple servers which perform substantially the same function. Applications, such as Telnet or Internet Message Access Protocol (IMAP)/Post Office Protocol 3 (POP3) mail serving, may need to connect a Transmission Control Protocol (TCP) client to a particular TCP server of a set of similar, but not identical, servers. However, the particular server instance typically cannot be selected until after information from the client has been received. For example, an Internet Service Provider (ISP) may have multiple e-mail servers to which users may connect to obtain their mail. However, when multiple servers which perform substantially the same function are present, selecting which server a user should be connected to may present difficulties.
In the e-mail example discussed above, one conventional approach has been to have users individually configure each client application to request a connection to a dedicated server. Such may be accomplished by providing each server with a unique name or Internet Protocol address and configuring the client to specify the name or address when making a connection. Such approaches may, however, present difficulties in maintaining balanced workload between the servers as users come and go. Furthermore, reconfiguring a large population of client applications may present administrative difficulties.
Another approach to routing clients to specific servers has been through an application unique protocol between the client and the server which performs application redirection. In application redirection, a client typically establishes a first connection to a first server which sends a redirect instruction to the client. Upon receiving the redirection instruction, the client disconnects from the initial server and establishes a second connection to the specified server. One difficulty with such an approach, however, is that the client and the server typically must implement the application-unique protocol to provide the redirection and, thus, the redirection is not transparent to the client.
Another approach is known as proxying, where the client establishes an initial connection to a proxy application and the proxy application forms a second connection with the proper server after obtaining enough information from the client to select a server. Such an approach may have the advantage that the selection and communication with the selected server by the proxy may be transparent to the client. However, both inbound and outbound communications must, typically, traverse a protocol stack twice to be routed by the proxy application. First, the communications traverse the protocol stack to the proxy application and again traverse the protocol stack when routed by the proxy application. Such traversals of the protocol stack may consume significant processing resources at the server executing the proxy application.
In addition, if the data between the client and the server is encrypted, it may not be possible for the proxy to decrypt the data and select the proper server. For example, for Secure Sockets Layer/Transport Layer Security (SSL/TLS), the proxy typically must share the SSL/TLS keys with each server for which it proxies. For Internet Protocol Security (IPSec), the proxy typically either acts as the IPSec endpoint for both the client and server or must share the Security Association with the server. In all such cases, in order for the proxy to examine the protocol content, end-to-end security must generally be broken.
In additional approaches, the client establishes a connection to a proxy, which in turn establishes a connection to the ultimate server. Either at a low level in the stack or by instructing an external router, a TCP connection translation function is set up which causes the router or stack to perform modifications on all incoming and outgoing TCP segments. The modifications may include the server-side address (destination for incoming requests, source address for outgoing replies) in the IP header, sequence numbers in the TCP header, window sizes, and the like. Such an approach may not require traversal of the entire TCP stack, but may result in every TCP segment requiring modification, and if IP addresses flow in the application data, the connection translation function may not translate such addresses unless specifically programmed to scan all the application data. This approach also generally requires all flows for the connection, both inbound and outbound, to traverse a single intermediate node, making it a single point of failure (like the proxy).
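The per-segment cost of such a connection translation function can be illustrated with a minimal Python sketch of the sequence number rewriting alone. It assumes the client saw an initial sequence number chosen by the proxy while the server chose its own, and it covers only the server-to-client byte stream; the function and variable names are hypothetical and are not taken from any of the approaches described above.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def make_translator(proxy_isn: int, server_isn: int):
    """Return two functions that rewrite sequence fields for one connection.

    proxy_isn  -- initial sequence number the client saw from the proxy
    server_isn -- initial sequence number chosen by the real server
    """
    delta = (server_isn - proxy_isn) % MOD

    def to_client_seq(seq_from_server: int) -> int:
        # Outgoing reply: express the server's sequence number in the
        # numbering the client already knows.
        return (seq_from_server - delta) % MOD

    def to_server_ack(ack_from_client: int) -> int:
        # Incoming segment: express the client's acknowledgment in the
        # server's own numbering.
        return (ack_from_client + delta) % MOD

    return to_client_seq, to_server_ack


# Values chosen so that the arithmetic wraps past 2**32.
to_client_seq, to_server_ack = make_translator(4294967000, 500)
first_byte_for_client = to_client_seq(501)
ack_for_server = to_server_ack(first_byte_for_client)
```

Every segment in both directions passes through one of these functions, which is the overhead the paragraph above refers to; address and window rewriting would add further per-segment work.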
Furthermore, the Locality-Aware Request Distribution system developed at Rice University is described as providing content-based request distribution which may provide the ability to employ back-end nodes that are specialized for certain types of requests. A "TCP handoff protocol" is described in which incoming requests are "handed off to a back-end in a manner transparent to a client, after the front-end has inspected the content of the request." See Pai et al., "Locality-Aware Request Distribution in Cluster-based Network Servers", Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, Calif., October, 1998. See also, Aron et al., "Efficient Support for P-HTTP in Cluster-Based Web Servers", Proceedings of the USENIX 1999 Annual Technical Conference, Monterey, Calif., June, 1999. However, this approach is generally directed to a stateless environment with well defined requests, each of which may be distributed to different nodes.
However, a communication connection such as an Internet connection is often used for transaction processing in which a transaction may involve more than one request/response pair. Under the Locality-Aware Request Distribution system described above, the requests of a transaction would each be routed individually, on a request-by-request basis, even where they were all routed to the same server. In addition, a server may have state which needs to be transferred along with a connection to support a handoff that is transparent to the user, and the Locality-Aware Request Distribution system does not provide for such state-based transfers.
Other approaches to moving a client connection or session from one server to another include Virtual Telecommunications Access Method (VTAM) multi-node persistent session support (MNPS). VTAM multi-node persistent session support allows for recovering a Systems Network Architecture (SNA) session state on another VTAM when an application fails and is restarted. However, typically, a client must re-authenticate to the applications or other system using multi-node persistent sessions. Furthermore, such a movement from a first VTAM to a second VTAM typically only occurs after a failure.
VTAM also supports CLSDEST PASS, which causes one SNA session with the client to be terminated and another initiated without disrupting the application using the sessions. Such a movement from one session to another, however, typically requires client involvement.
Embodiments of the present invention include methods, systems and computer program products for transferring a Transmission Control Protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems. A connection is established between the client device and a routing node coupled to the cluster of data processing systems utilizing a communication protocol stack at the routing node, the protocol stack having an associated state. An operating system kernel of the routing node obtains application level information from an initial transaction received from the client over the connection. The transaction includes at least one request. The operating system kernel also selects a target application at a first data processing system of the cluster of data processing systems for transfer of the connection based on the obtained information and transfers the connection to a target communication protocol stack on the first data processing system associated with the selected target application including providing the associated state information of the communication protocol stack of the routing node.
The target communication protocol stack accepts the connection from the routing node based on the provided associated state information of the communication protocol stack of the routing node so as to, transparently to the client, establish communications between the client and the target application. A notification of completion of the transaction is received from the target application and the connection is made available to a routing device for selection of a next target application to receive the connection responsive to receipt of the notification of completion of the transaction. The routing device may be the routing node or the first data processing system.
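The sequence summarized above (inspect the initial transaction, select a target, transfer the connection, and make it available again on completion) can be pictured with a minimal Python sketch. This is a toy model under stated assumptions, not the described kernel implementation: `StackState`, `Connection`, `RoutingNode`, the policy table and the server names are all hypothetical, and the application level information is reduced to the first token of the request.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StackState:
    """Hypothetical snapshot of protocol stack state for one connection."""
    client_addr: str
    client_port: int
    snd_next: int   # next sequence number to send
    rcv_next: int   # next sequence number expected from the client
    window: int


@dataclass
class Connection:
    state: StackState
    owner: str                                  # system holding the connection
    history: List[str] = field(default_factory=list)


class RoutingNode:
    """Toy model of the routing node: inspect, select, transfer, reclaim."""

    def __init__(self, name: str, policy: Dict[str, str]) -> None:
        self.name = name
        self.policy = policy    # transaction type -> target system (assumed)
        self.available: List[Connection] = []

    def select_target(self, transaction: bytes) -> Optional[str]:
        # Application level information: here, simply the first token.
        tokens = transaction.split()
        if not tokens:
            return None
        verb = tokens[0].decode("ascii", "replace").upper()
        return self.policy.get(verb)

    def transfer(self, conn: Connection, transaction: bytes) -> bool:
        target = self.select_target(transaction)
        if target is None:
            return False        # not enough information to select a target
        conn.owner = target     # the stack state travels with the connection
        conn.history.append(target)
        if conn in self.available:
            self.available.remove(conn)
        return True

    def transaction_complete(self, conn: Connection) -> None:
        # Notification from the target application: the connection becomes
        # available for selection of a next target application.
        self.available.append(conn)


node = RoutingNode("router", {"MAIL": "server-a", "ORDER": "server-b"})
conn = Connection(StackState("192.0.2.10", 40000, 1000, 5000, 65535), "router")
node.transfer(conn, b"MAIL alice")
node.transaction_complete(conn)
node.transfer(conn, b"ORDER 17")
```

In this model the client-facing fields of `StackState` never change across transfers, which is what keeps the handoff transparent to the client.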
In other embodiments of the present invention, receiving a notification of completion of the transaction from the target application includes receiving the notification from the target application at the target communication protocol stack over the connection as data which is detectable by the target communication protocol stack as being directed to the target communication protocol stack rather than the client. The data may be received as ancillary data of a sendmsg socket call. The data also may include application state information associated with the connection from the target application. In such embodiments, the application state information may be provided to the routing device for use in transferring the connection to the next target application. A communication protocol stack associated with the next target application may receive the application state information and the notification as ancillary data of a recvmsg socket call. The associated state information of the target communication protocol stack may be provided to the routing device for use in transferring the connection to the next target application. In various embodiments, the application state information may be a null set.
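The separation between client-bound payload and stack-directed notification can be sketched in Python as follows. No real socket is used: `TargetStack`, `SOL_HANDOFF` and `SCM_TXN_COMPLETE` are hypothetical names invented for this illustration, and only the `(level, type, data)` shape of each ancillary item is borrowed from the way socket APIs commonly represent ancillary data.

```python
from typing import List, Optional, Tuple

# Hypothetical ancillary-data identifiers; not defined by any real socket API.
SOL_HANDOFF = 0x7001
SCM_TXN_COMPLETE = 1

AncItem = Tuple[int, int, bytes]    # (level, type, data)


class TargetStack:
    """Toy stand-in for the target communication protocol stack."""

    def __init__(self) -> None:
        self.to_client: List[bytes] = []        # payload bound for the client
        self.app_state: Optional[bytes] = None  # kept for the next handoff
        self.available_for_routing = False

    def sendmsg(self, buffers: List[bytes], ancdata: List[AncItem]) -> int:
        # Ordinary payload flows to the client unchanged.
        payload = b"".join(buffers)
        if payload:
            self.to_client.append(payload)
        # Ancillary items are consumed by the stack and never reach the client.
        for level, kind, data in ancdata:
            if (level, kind) == (SOL_HANDOFF, SCM_TXN_COMPLETE):
                self.app_state = data           # may be empty (a null set)
                self.available_for_routing = True
        return len(payload)


stack = TargetStack()
stack.sendmsg([b"+OK 2 messages\r\n"], [])
stack.sendmsg([], [(SOL_HANDOFF, SCM_TXN_COMPLETE, b"user=alice")])
```

Carrying the notification outside the byte stream means the application does not need an in-band escape sequence, and the client never sees the notification or the application state.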
In further embodiments of the present invention, an operating system kernel of the routing device obtains application level information from a next transaction received from the client over the connection, the next transaction including at least one request. A next target application at a data processing system of the cluster of data processing systems is selected for transfer of the connection based on the application level information obtained from the next transaction. The connection is transferred to the communication protocol stack associated with the next target application, including providing to that stack the associated state information of the target communication protocol stack and the application state information associated with the connection. The selection of a target application may be carried out by a policy-based engine of the operating system kernel of the routing node.
Sufficient application level information may be obtained to identify the initial transaction. The application level information is obtained in various embodiments by executing application-specific exits within the operating system kernel of the routing node to examine data associated with the transaction to identify the transaction to the operating system kernel of the routing node. The communication protocol stack of the routing node may be made available after the connection is transferred. In further embodiments of the present invention, the connection is an encrypted connection and the provided associated state information includes encryption information.
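One way to picture an application-specific exit is as a small function that examines the bytes received so far and either identifies the transaction or reports that more data is needed. The Python sketch below is only an illustration under that assumption: the `Exit` type, `pop3_exit`, the `EXITS` registry keyed by listening port, and the `pop3:` identifier format are hypothetical and are not taken from the specification.

```python
from typing import Callable, Dict, Optional

# An exit inspects the bytes received so far and returns a transaction
# identifier, or None when it cannot identify the transaction yet.
Exit = Callable[[bytes], Optional[str]]


def pop3_exit(buffer: bytes) -> Optional[str]:
    """Identify a POP3 session by the user named in a complete USER line."""
    line, sep, _rest = buffer.partition(b"\r\n")
    if not sep:
        return None                      # first line is not complete yet
    parts = line.split()
    if len(parts) == 2 and parts[0].upper() == b"USER":
        return "pop3:" + parts[1].decode("ascii", "replace")
    return None


# Registry of exits keyed by listening port (an assumption of this sketch).
EXITS: Dict[int, Exit] = {110: pop3_exit}


def identify(port: int, buffer: bytes) -> Optional[str]:
    exit_fn = EXITS.get(port)
    return exit_fn(buffer) if exit_fn else None


partial = identify(110, b"USER al")
complete = identify(110, b"USER alice\r\n")
```

Returning `None` until a full line has arrived models the requirement that sufficient information be obtained before a target is selected.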
In other embodiments of the present invention, methods, systems and computer program products are provided for transferring a Transmission Control Protocol (TCP) connection with a client device between data processing systems in a cluster of data processing systems. A connection is established between the client device and a first application at a first data processing system of the cluster of data processing systems utilizing a first communication protocol stack associated with the first application, the first communication protocol stack having an associated state. An operating system kernel of the first data processing system obtains application level information from a transaction received from the client over the connection, the transaction including at least one request. In addition, application state information associated with the connection is obtained from the first application.
A second application at a second data processing system of the cluster of data processing systems is selected for transfer of the connection based on the obtained information and the connection is transferred to a second communication protocol stack on the second data processing system associated with the selected second application. The transfer includes providing to the second data processing system the associated state information of the first communication protocol stack and the obtained application state information associated with the connection from the first application.
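The information provided during such a transfer can be sketched as a single handoff message that carries both the protocol stack state and the opaque application state. The Python example below is a minimal illustration only: the `StackState` fields, the JSON encoding and the function names `pack_handoff` and `accept_handoff` are assumptions made for this sketch, and a real stack would carry considerably more state.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class StackState:
    """Hypothetical subset of the first protocol stack's connection state."""
    client_addr: str
    client_port: int
    snd_next: int
    rcv_next: int
    window: int


def pack_handoff(stack: StackState, app_state: bytes) -> bytes:
    """Build the message sent by the first data processing system."""
    msg = {
        "stack": asdict(stack),
        # Application state is opaque to the stack, so it is carried as bytes.
        "app": base64.b64encode(app_state).decode("ascii"),
    }
    return json.dumps(msg).encode("utf-8")


def accept_handoff(wire: bytes) -> Tuple[StackState, bytes]:
    """Rebuild both kinds of state at the second data processing system."""
    msg = json.loads(wire.decode("utf-8"))
    return StackState(**msg["stack"]), base64.b64decode(msg["app"])


original = StackState("192.0.2.10", 40000, 1000, 5000, 65535)
wire = pack_handoff(original, b"user=alice;folder=inbox")
restored_stack, restored_app = accept_handoff(wire)
```

Because the second stack resumes from the same sequence numbers and client address, the client continues on the same connection and does not have to repeat the information it already supplied.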
While the invention has been described above primarily with respect to the method aspects of the invention, systems and computer program products are also provided.
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout.
As will be appreciated by those of skill in the art, the present invention can take the form of an entirely hardware embodiment, an entirely software (including firmware, resident software, micro-code, etc.) embodiment, or an embodiment containing both software and hardware aspects. Furthermore, the present invention can take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code means embodied in the medium for use by or in connection with an instruction execution system. In the context of this document, a computer-usable or computer-readable medium can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The computer-usable or computer-readable medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, and a portable compact disc read-only memory (CD-ROM). Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
Embodiments of the present invention provide for content-based transfers of an active TCP connection from one data processing system to another. Such a transfer of an active connection may be beneficial as it may allow for selection of a particular data processing system for handling a client's requested interaction without the client knowing that the transfer has taken place. Such a transfer may include transferring both application state information and connection state information. By transferring the existing active connection and application state, rather than establishing a new connection, the client need not repeat providing the initial information to the data processing system to which the connection is transferred. Furthermore, by transferring the connection to a new data processing system, rather than merely routing packets to and from the new data processing system over a separate connection, administrative overhead may be reduced and scalability may be improved.
A system suitable for use in embodiments of the present invention is illustrated in FIG. 1. As seen in
The first data processing system 20 communicates with the client 10 over the initial connection 30, for example, through request/response exchanges with the client, until such time as sufficient information is received that the first data processing system 20 can determine which of the additional data processing systems 22, 22' and 22" is to receive the connection with the client 10. When such a determination is made, the first data processing system 20 provides connection and/or application state information to the selected data processing system.
In the example illustrated in
Operations for transferring an active connection are illustrated generally in FIG. 2. As seen in
More particularly, in accordance with various embodiments of the present invention, the information is obtained by an operating system kernel of the first data processing system 20, which may be a routing node and/or may be a data processing system supporting applications for which connections are intended. Alternative approaches to making a decision on routing the connection are described in co-pending and commonly assigned U.S. patent application Ser. No. 09/825,122 titled "Methods, Systems and Computer Program Products for Content-Based Routing Via Active TCP Connection Transfer" filed Apr. 3, 2001 ("the '122 application"), which is incorporated herein by reference as if set forth in its entirety.
In any event, once the information is received from the client 10, the operating system kernel of the first data processing system 20 selects a target application at one of the other servers 22, 22' and 22" for transfer of the initial connection 30 based on the obtained information (block 215). The connection 30 is then transferred to a target communication protocol stack on the selected data processing system associated with the selected target application. Connection state information, such as the associated state information of the communication protocol stack of the first data processing system 20 which supports the connection 30 and, optionally, application state information is collected by the first data processing system 20 and provided to the selected data processing system as part of the transfer of the connection (block 220).
An approach to implementing kernel-based policy rules suitable for modification in light of the teachings of the present invention to support selection of a target application is described in co-pending and commonly assigned U.S. patent application Ser. No. 09/693,268 titled "Methods, Systems and Computer Program Products for Server Based Type of Service Classification of a Communication Request" filed Oct. 20, 2000 ("the '268 application"), which is incorporated herein by reference as if set forth in its entirety. The '268 application describes the use of an application plug-in process in a TCP/IP kernel of an operating system kernel to implement service classification for requests from a client based on application level information obtained in the kernel from the request(s).
As noted above, various embodiments of the present invention obtain application state information and connection state information and provide such information to a receiving target application as part of the transfer of a connection. The particular application state information may vary from application to application. The connection state information should be sufficient to allow the initial connection 30 to be transferred to the selected data processing system 22 to provide the transferred connection 30' without disruption of the initial connection 30. For example, for a TCP connection, the connection and/or application state information may include the source and destination IP addresses and port numbers, the received and sent packet sequence numbers, the client and routing application advertised windows, the negotiated maximum segment size and/or scaling parameters, if any, etc.
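By way of illustration only, the connection state information listed above can be pictured as a single record that is serialized at the first data processing system and restored at the selected data processing system. The following is a minimal sketch under stated assumptions: the class name, field names, and JSON encoding are hypothetical and are not taken from the described embodiments, which do not specify a wire format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class TcpConnectionState:
    """Hypothetical snapshot of the per-connection state named in the text."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    snd_seq: int          # sent packet sequence number
    rcv_seq: int          # received packet sequence number
    client_window: int    # window advertised by the client
    server_window: int    # window advertised by the routing application
    mss: int              # negotiated maximum segment size
    window_scale: Optional[int] = None  # scaling parameter, if any


def export_state(state: TcpConnectionState) -> bytes:
    """Serialize the snapshot for delivery to the selected system."""
    return json.dumps(asdict(state), sort_keys=True).encode("utf-8")


def import_state(payload: bytes) -> TcpConnectionState:
    """Rebuild the snapshot on the receiving side."""
    return TcpConnectionState(**json.loads(payload.decode("utf-8")))
```

A round trip through `export_state` and `import_state` yields an equal record, which is the property the receiving communication protocol stack would rely on when setting the state of the transferred connection 30'.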
The state of the initial connection 30 may also be "frozen" until the initial connection 30 is transferred such that any subsequent packets received by the first data processing system 20 may be discarded by the first data processing system 20 so as to invoke retransmission by the client 10. Alternatively, such packets could be buffered and forwarded to the selected data processing system 22 for acknowledgment by the selected data processing system 22 over the transferred connection 30'.
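The two alternatives described above for packets arriving while the connection is frozen (discarding them so that the client retransmits, or buffering them for forwarding) may be sketched as follows. This is an illustrative model only; the class and method names are hypothetical, and packets are represented as opaque byte strings rather than real TCP segments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrozenConnection:
    """Hypothetical model of a connection that can be frozen for transfer."""
    buffer_while_frozen: bool
    frozen: bool = False
    held: List[bytes] = field(default_factory=list)

    def freeze(self) -> None:
        self.frozen = True

    def on_packet(self, packet: bytes) -> str:
        """Return what was done with a packet received from the client."""
        if not self.frozen:
            return "processed"
        if self.buffer_while_frozen:
            self.held.append(packet)   # kept for the selected system
            return "buffered"
        return "discarded"             # unacknowledged, so the client retransmits

    def drain_for_target(self) -> List[bytes]:
        """Hand buffered packets over, in arrival order, and clear the buffer."""
        held, self.held = self.held, []
        return held
```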
After the application state and/or connection state information is provided to the selected data processing system 22, the selected data processing system 22 accepts the initial connection 30 to provide the transferred connection 30' based on the provided associated state information (block 225). The selected data processing system 22 may communicate directly with the client 10 over the transferred connection 30' which corresponds to the initial connection 30 taken over by the selected data processing system.
In various embodiments of the present invention, for example, those where only connection state information and not application state information are provided with the transfer of the connection, the operations illustrated at blocks 230 and 235 are also provided. A notification of completion of the transaction is received from the second (target) application by the target communication protocol stack at the selected data processing system 22 (block 230). The target communication protocol stack makes the connection 30' available to a routing device for selection of a next target application to receive the connection 30' responsive to receipt of the notification of completion of the transaction (block 235). The routing device may be the selected data processing system 22 or a routing node, such as the data processing system 20.
Accordingly, with various embodiments of the present invention, selection operations for re-routing of a connection may be implemented on a transaction basis, rather than applied to each request, by using the target application to identify when a transaction has completed. It is to be understood that operations as described above to transfer the initial connection 30 to provide the transferred connection 30' may repeat with the connection 30' viewed as the initial connection for subsequent transfers based on different transactions.
As will be appreciated by those of skill in the art in light of the above discussion, from the perspective of the client 10, the initial connection 30 and the transferred connection 30' (as well as later iterations of transfers) are the same connection. Thus, content-based routing may be achieved and the active connection transferred to a selected data processing system in a manner which is transparent to the client 10.
In particular embodiments of the present invention, systems, methods, and/or computer program products are provided which allow for a single IP address being associated with a plurality of communication protocol stacks in a cluster of data processing systems. Thus, embodiments of the present invention may be utilized in a Virtual IP Address (VIPA) environment, cluster address systems using Network Dispatcher, the Sysplex Distributor from International Business Machines Corporation and other load balancer systems.
The TCP/IP communication protocol stack 355, as shown in
As shown in
As will be appreciated by those of skill in the art, the operating system in which the present invention is incorporated may be any operating system suitable for use with a data processing system, such as OS/2, AIX or OS/390 from International Business Machines Corporation, Armonk, N.Y., Solaris from Sun Microsystems, WindowsCE, WindowsNT, Windows95, Windows98, Windows ME or Windows2000 from Microsoft Corporation, Redmond, Wash., PalmOS from Palm, Inc., MacOS from Apple Computer, UNIX or Linux, proprietary operating systems or dedicated operating systems, for example, for embedded data processing systems.
The present invention is described generally with reference to
For performance reasons, such a server may process multiple transactions over a single TCP/IP connection rather than open new connections for each transaction. Again, using HTTP as an example, browser clients generally send multiple requests over a single TCP/IP connection rather than close the active connection and open a new one.
Furthermore, for performance reasons, it is known to use a pool or cluster of servers to process client requests. When a client connects to a server in such an environment, one of the servers from this pool or cluster is selected. A load balancer, such as a routing node, may be used in selecting the "optimal" server from the cluster where "optimal" typically means the server which is best capable of handling the new client connection.
However, it is generally not possible to know the type of transaction the client will request before a connection is established and, as a client may initiate multiple transactions over the connection once established, each server in such cases must be able to support the complete set of transactions which a client may request. If a server in such an environment is not capable of supporting any single transaction, it is generally removed from the pool of servers to which a client may connect. From a functional perspective, only the subset of servers in the pool supporting the full range of transactions would be viewed as homogeneous servers in such an environment.
Furthermore, such environments typically provide load balancing which is performed when a new connection request is received. However, this approach may be limited in that it assumes each server is capable of processing each transaction with equal efficiency which is often not the case. For example, one server may be best for browsing inventory while a different server may be better for managing a shopping cart and so on. From a customer perspective, this may result in a set of servers which must support all the same set of back-end applications and load balancing on a node-wide basis.
By implementing the present invention as described above in such a cluster of data processing systems environment, the deployment of a heterogeneous set of servers in the cluster may be facilitated. Each server in the cluster may support only a subset of the actual applications. When a client initiates a transaction, the client may be transparently (from the client's perspective) routed to a server capable of processing the transaction. The server which is best able to process the transaction may be selected from those servers within the cluster which support the transaction. The process may be repeated for each transaction initiated by the client. As each new transaction is received on the active connection, the server best able to process a new transaction may be selected and the active connection may then be again transferred to that server for processing. This transfer and subsequent processing of the transaction may be done in a manner transparent to the client.
This initial set up and subsequent further transfer of a connection is further illustrated in FIG. 4. As shown in item 400 of
As shown in
Kernel-Based Policy Driven Load Balancing may allow a load balancing node to consider application-specific data in making its load balancing decision as shown by items 400-430 of FIG. 4. In order to accomplish this, the server preferably (a) understands the transaction being processed, and (b) waits until enough of a transaction is received to identify the transaction and to make the load balancing decision. Once the proper target host is selected, the connection may be moved from the current server to the target host. This may include moving the TCP/IP state, IP state, and any security state (such as IPSec or TLS state).
When a connection request is received by the load balancing node, the load balancing node may accept the connection and begin to exchange data with the client. As data is received, the load balancing node, in various embodiments of the present invention, drives application-specific exits within the operating system kernel, such as those provided by Fast Response Cache Accelerator (FRCA), to examine the connection data. Once sufficient data has been received to identify the transaction, the exit notifies the kernel of the type of transaction. For example, for web traffic, the Uniform Resource Identifier (URI) can be used to classify the type of transaction. By using kernel-based exits, the classification mechanism may be extended to new application workloads, generally without requiring updates to the applications themselves.
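The behavior of such an exit for web traffic, namely waiting until enough data has arrived and then classifying by URI, may be sketched in user-space terms as follows. The prefix table and transaction type names are hypothetical examples, and the function is not an FRCA interface; it merely illustrates the "classify once sufficient data is received" logic.

```python
from typing import Optional

# Hypothetical mapping of URI prefixes to transaction types.
URI_PREFIX_TO_TYPE = {
    "/inventory": "browse_inventory",
    "/cart": "shopping_cart",
}


def classify_http_transaction(received: bytes) -> Optional[str]:
    """Return a transaction type, or None if more data is needed."""
    line_end = received.find(b"\r\n")
    if line_end == -1:
        return None  # request line not complete yet; keep waiting
    parts = received[:line_end].decode("ascii", errors="replace").split(" ")
    if len(parts) != 3:
        return "unclassified"  # not a well-formed "METHOD URI VERSION" line
    uri = parts[1]
    for prefix, transaction_type in URI_PREFIX_TO_TYPE.items():
        if uri.startswith(prefix):
            return transaction_type
    return "unclassified"
```

Returning `None` for an incomplete request line corresponds to the exit deferring its decision until further data arrives on the connection.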
Once the transaction has been classified, a policy-based engine within the kernel may be used to select the appropriate server to process the transaction. The decision on which server to process the transaction can be based on many criteria. At the simplest level, it may simply select any server capable of processing the particular transaction type. Other metrics (average response time, active workload on the server, network bandwidth considerations, etc.) may also be applied in the selection of the appropriate server.
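A policy-based selection of this kind may be illustrated by first filtering the servers to those capable of processing the transaction type and then ranking the remainder by a metric. In the sketch below, the `ServerInfo` record and the scoring formula (average response time weighted by active workload) are assumptions chosen for illustration; the embodiments do not prescribe a particular formula.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ServerInfo:
    """Hypothetical per-server record consulted by the policy engine."""
    name: str
    supported: FrozenSet[str]   # transaction types the server can process
    avg_response_ms: float
    active_connections: int


def cost(server: ServerInfo) -> float:
    """Illustrative cost (lower is better): response time scaled by load."""
    return server.avg_response_ms * (1 + server.active_connections)


def select_server(servers: Iterable[ServerInfo],
                  transaction_type: str) -> Optional[ServerInfo]:
    """Pick the lowest-cost server that supports the transaction type."""
    candidates = [s for s in servers if transaction_type in s.supported]
    if not candidates:
        return None
    return min(candidates, key=cost)
```

Dropping the `cost` ranking and returning any element of `candidates` corresponds to the simplest level of policy described above.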
Once the appropriate target application is selected, the connection is moved to the target host for processing. After the connection is moved to the target host, the receiving socket may be made available to the target application. Subsequent transactions may continue to be processed on the stack which supports the application which initially opens the connection.
This kernel-based load balancing approach may require no change to the application itself. It may be suited to cases where, for example, there are single transactions per connection or all transactions received over a connection are of same type.
Multi-Node Active Connection Transfers (ACT), in accordance with various embodiments of the present invention, may further allow an active TCP/IP connection to be moved from one server to another server. Once moved, the connection may be closed on the current server and made available to the application on the new server.
The connection is preferably only moved if the application on the initial server indicates it is acceptable to do so. This may be signaled, for example, by the application setting fields in the ancillary data of the sendmsg socket call as a notification that a transaction is completed. In the ancillary data, the application may optionally also pass the application context or state information, if any, which should be made available to the new target host. However, such application state information may be a null set (i.e., not required). If the connection is transferred to a new host, then the socket on the server may be closed.
In order to move the active connection, some, and in some cases, all, of the following state information may be transferred.
1. The TCP/IP stack's state should be moved to the new server. Information such as the TCP state (sent/received sequence numbers, MSS, window scaling parameters), IP state (source/destination IP address, port numbers), IPSec state, and TLS state should be moved and reinitialized at the new server. This is the same state information which may be moved for the Kernel-Based Policy Driven Load Balancing item mentioned above.
2. The application's state may also be moved to the new server. In some cases, the application may not need to transfer any state information; in other cases, application state data may be needed.
3. Any queued data at the "old" server is preferably transferred to the "new" server. While TCP/IP can retransmit any data lost, there may be performance penalties if the data is dropped and retransmitted.
4. The application on the "new" server should be able to recognize the new connection as an existing connection and find the associated application state. This may be accomplished, for example, by examining the ancillary data on the recvmsg socket call. For a connection which is transferred, the application context passed by the "old" server on the sendmsg call may be made available to the "new" server on the recvmsg call. Likewise, the application on the "old" server should be able to relinquish ownership of the existing connection. This may be accomplished by closing the socket on the "old" server.
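The hand-off of the transaction-complete notification and optional application context through ancillary data, as described in the items above, may be pictured with a simple payload layout. The layout below (a one-byte flag followed by a four-byte length and the opaque context) is purely hypothetical; the text does not define the ancillary data fields, and no real socket option or control message type is implied.

```python
import struct
from typing import Tuple

# Hypothetical layout: flag byte, then context length, in network byte order.
_HEADER = struct.Struct("!BI")


def pack_transfer_hint(transaction_complete: bool, context: bytes = b"") -> bytes:
    """Build the payload the "old" server's application would pass on sendmsg."""
    return _HEADER.pack(1 if transaction_complete else 0, len(context)) + context


def unpack_transfer_hint(data: bytes) -> Tuple[bool, bytes]:
    """Recover the flag and context as seen by the "new" server on recvmsg."""
    flag, length = _HEADER.unpack_from(data)
    context = data[_HEADER.size:_HEADER.size + length]
    if len(context) != length:
        raise ValueError("truncated application context")
    return bool(flag), context
```

An empty context models the case in which the application state information is a null set.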
The transfer of the active connection as described herein may be transparent to the client. However, it may require changes to the applications, both on the initial host and, if context is to be transferred, on the new host. On the initial host, the application may also be modified to signal when a particular transaction has been completed and it is possible to move the connection to a new host. The initial host may also provide any application context which should be transferred with the active connection. In many cases, there will be no application context which needs to be moved.
On the new host, no changes to the application may be required, particularly if no application context (state) needs to be transferred or if the application has other methods for sharing the context. For instance, the web server on z/OS shares context between all instances of the web server in a sysplex via DB2 data sharing and, as such, typically has no context which needs to be transferred. If there is application context which does need to be transferred, the application on the new host should be prepared to receive and initialize the context when it accepts the new connection. This context may be provided to the application as ancillary data on the first recvmsg socket call made after the connection is accepted.
In a further aspect of the present invention, Transaction Load Balancing (TLB) may piece together Kernel-Based Policy Driven Load Balancing and Active Connection Transfer. Each application may process a single transaction at the server. Once the transaction is completed, the application may indicate that the connection may be routed to a new server for processing subsequent transactions. This may be signaled, for example, by setting the appropriate fields in the ancillary data of the last sendmsg socket call invoked by the application.
The TCP/IP stack may examine the next transaction sent by the client and determine the optimal server for the transaction, using the metrics described in the Kernel-Based Policy Driven Load Balancing description above. If it is the current server that is optimal, the connection may be made available to the application which is waiting on the recvmsg socket call. If a different server is optimal, the connection may be transferred to the proper host as described in the Active Connection Transfer description above. The connection is transferred, the socket on the current server is closed, and the connection is made available to the application on the new server.
After the application transfers socket ownership to the stack, the next transaction received may be examined and classified. The "best" server to process the transaction may be identified, the connection transferred, and this "best" server may then process the transaction. Once the transaction is completed, the application may transfer ownership of the socket to the TCP/IP stack and the process repeats. This may continue until the socket is closed.
To avoid thrashing, the metrics for transferring the active connection may include a "threshold" which specifies how much "better" the new server must be than the current server before the connection is transferred. In its degenerate form, the metric could have a threshold value of "0," implying the connection should be moved if the new server is considered preferable to the current server by any amount.
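The threshold test may be expressed in a few lines. In this sketch the scores are assumed to be numeric values for which higher means "better"; how such scores are derived, and the function name itself, are assumptions made for illustration.

```python
def should_transfer(current_score: float,
                    best_other_score: float,
                    threshold: float = 0.0) -> bool:
    """Transfer only if the other server is better by more than the threshold.

    Scores are assumed to be "higher is better". A threshold of 0 is the
    degenerate case: any improvement at all triggers a transfer.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return best_other_score - current_score > threshold
```

For example, with a threshold of 5, a candidate scoring 12 against a current score of 10 would not trigger a transfer, while a candidate scoring 16 would.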
Referring now to the flowchart diagram of
An operating system kernel of the first data processing system carries out the steps illustrated at blocks 505-520. More particularly, application level information, such as level 5 or above information for a TCP/IP connection, is obtained from a first transaction received from the client over the connection (block 505). The transaction may include one or more requests. In addition, application state information associated with the connection is obtained from the first application (block 510). A second application is selected at a second data processing system of the cluster of data processing systems for transfer of the connection based on the obtained information (block 515).
Selection of the target application may be carried out by a policy-based engine of the operating system kernel of the first data processing system. Alternatively, operations at block 515 related to selecting the second application may be provided by making the connection available to a routing device other than the first data processing system, such as a routing node of the cluster of data processing systems, for selection of the second application. Making the connection available to such a routing device may, in various embodiments, be initiated responsive to receipt of a notification of completion of the transaction from the first application.
In any event, the connection is transferred to a second communication protocol stack on the second data processing system associated with the selected second application, either directly from the first data processing system or through a separate routing device such as a routing node (block 520). The transfer includes providing to the second data processing system both the associated state information for the first communication protocol stack and the obtained application state information associated with the connection from the first application.
The connection in various embodiments of the present invention is then accepted by the second communication protocol stack from the first data processing system based on the provided associated state information of the first communication protocol stack and the obtained application state information (block 525). The transfer may then be provided transparently to the client so as to establish communications between the client and the second application. More particularly, in various embodiments, the second communication protocol stack may provide the obtained application state information to the second application so as to re-establish the connection transparently to the client as well as utilizing the connection state information from the first communication protocol stack. Thus, a state of the connection at the second data processing system may be set to a state specified by the provided associated state information of the first communication protocol stack and the obtained application state information to provide a transferred connection to the second application. The second application may then communicate with the client utilizing the transferred connection (block 530).
Embodiments of the present invention have been described with reference to
Accordingly, blocks of the flowchart illustrations and/or block diagrams support combinations of means for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
The present invention has been described with reference to particular sequences of operations. However, as will be appreciated by those of skill in the art, other sequences may be utilized while still benefitting from the teachings of the present invention. Thus, while the present invention is described with respect to a particular division of functions in the kernel or sequences of events, such divisions or sequences are merely illustrative of particular embodiments of the present invention and the present invention should not be construed as limited to such embodiments.
Furthermore, while the present invention has been described with reference to particular embodiments of the present invention, as will be appreciated by those of skill in the art, the present invention may be embodied in other environments and should not be construed as limited to such environments but may be incorporated into other systems where applications or groups of applications are associated with an address rather than a communications adapter. Thus, the present invention may be suitable for use in any collection of data processing systems which allow sufficient communication to all of the systems for the use of dynamic virtual addressing or the like.
In the drawings and specification, there have been disclosed typical preferred embodiments of the invention and, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation, the scope of the invention being set forth in the following claims.
Brabson, Roy Frank, Huynh, Lap Thiet
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jul 24 2001 | BRABSON, ROY FRANK | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 012045 | /0569 | |
Jul 24 2001 | HUYNH, LAP THIET | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 012045 | /0569 | |
Jul 26 2001 | International Business Machines Corporation | (assignment on the face of the patent) | / |