Described herein is a hybrid cooling device and a cooling method that use a combination of phase change cooling and air cooling. The hybrid cooling device includes a closed loop two phase system, one or more fans, and an assembly clamp. The two phase system further includes a cold plate, an integrated channel, a radiator, and a pressure sensor. The cold plate can include phase change fluid for extracting heat from electronics on a printed circuit board sandwiched between the cold plate and the assembly clamp. The one or more fans can be used to create airflows for cooling both the cold plate and the radiator. The pressure sensor can be used to control the operation of the hybrid cooling device, which can be deployed in different system environments and server configurations.
1. A hybrid cooling device, comprising:
a phase change system that includes a cold plate, a radiator, and an integrated channel connecting the cold plate and the radiator;
an assembly clamp to position electronic hardware to be cooled between the assembly clamp and the cold plate, wherein the radiator is positioned above the cold plate, and wherein the cold plate is positioned vertically to be attached to the electronic hardware when the assembly clamp clamps onto the cold plate; and
one or more fans to provide air cooling the radiator and the electronic hardware, wherein the electronic hardware includes a printed circuit board (PCB) and an electronic device packaged thereon, the electronic device including one or more of a chip or a power electronic, and wherein the phase change system, the assembly clamp and the one or more fans together with the electronic device packaged on the PCB can be inserted into a peripheral bus as an integrated peripheral device.
10. A server chassis, comprising:
a hybrid cooling device including:
a phase change system that includes a cold plate, a radiator, and an integrated channel connecting the cold plate and the radiator,
an assembly clamp to position electronic hardware to be cooled between the assembly clamp and the cold plate, wherein the radiator is positioned above the cold plate, and wherein the cold plate is positioned vertically to be attached to the electronic hardware when the assembly clamp clamps onto the cold plate, and
one or more fans to provide air cooling the radiator and the electronic hardware, wherein the electronic hardware includes a printed circuit board (PCB) and an electronic device packaged thereon, and wherein the phase change system, the assembly clamp and the one or more fans together with the electronic device packaged on the PCB can be inserted into a peripheral bus as an integrated peripheral device; and
a chassis fan to provide an airflow to cool the server chassis and the hybrid cooling device.
20. An electronic rack, comprising:
a plurality of server chassis, each server chassis including:
a hybrid cooling device comprising:
a phase change system that includes a cold plate, a radiator, and an integrated channel connecting the cold plate and the radiator,
an assembly clamp to position electronic hardware to be cooled between the assembly clamp and the cold plate, wherein the radiator is positioned above the cold plate, and wherein the cold plate is positioned vertically to be attached to the electronic hardware when the assembly clamp clamps onto the cold plate, and
one or more fans to provide air cooling the radiator and the electronic hardware, wherein the electronic hardware includes a printed circuit board (PCB) and an electronic device packaged thereon, the electronic device including one or more of a chip or a power electronic, and
wherein the phase change system, the assembly clamp and the one or more fans together with the electronic device packaged on the PCB can be inserted into a peripheral bus as an integrated peripheral device; and
a chassis fan to provide an airflow to cool the server chassis and the hybrid cooling device.
2. The hybrid cooling device of
a device frame, wherein the radiator, the integrated channel, and the cold plate are attached to the device frame.
3. The hybrid cooling device of
an adapting stiffener positioned between the cold plate and the electronic hardware;
an elastic channel;
wherein the adapting stiffener and the elastic channel operate in conjunction to maintain proper pressure on the electronic hardware.
4. The hybrid cooling device of
a moving axis in the elastic channel;
wherein one end of the assembly clamp is inserted into the elastic channel through the moving axis such that the end of the assembly clamp is moveable on the elastic channel.
5. The hybrid cooling device of
6. The hybrid cooling device of
7. The hybrid cooling device of
8. The hybrid cooling device of
9. The hybrid cooling device of
a temperature sensor;
a pressure sensor;
wherein the temperature sensor and the pressure sensor are used to control an operation of the hybrid cooling device.
11. The server chassis of
12. The server chassis of
a device frame, wherein the radiator, the integrated channel, and the cold plate are attached to the device frame.
13. The server chassis of
an adapting stiffener positioned between the cold plate and the electronic hardware;
an elastic channel;
wherein the adapting stiffener and the elastic channel operate in conjunction to maintain proper pressure on the electronic hardware.
14. The server chassis of
a moving axis in the elastic channel;
wherein one end of the assembly clamp is inserted into the elastic channel through the moving axis such that the end of the assembly clamp is moveable on the elastic channel.
15. The server chassis of
16. The server chassis of
17. The server chassis of
18. The server chassis of
19. The server chassis of
a temperature sensor;
a pressure sensor;
wherein the temperature sensor and the pressure sensor are used to control an operation of the hybrid cooling device.
Embodiments of the present disclosure relate generally to cooling systems. More particularly, embodiments of the disclosure relate to a hybrid cooling device and a hybrid cooling method that use both phase change cooling and air cooling.
A high power density device is a computing device that is packaged with high performance processors (e.g., a GPU, an ASIC, or a heterogeneous computing based IC chip or chiplet). Such high power density devices are increasingly popular due to the continuously growing demand for computing. A high power density device tends to generate a large amount of heat and is often integrated into a server chassis. Therefore, for a high power density device to function properly, a proper thermal environment for servers, racks, and the data center facility is needed.
Although liquid cooling can be a promising cooling solution for high power density devices, particularly when the power budget for a single chip exceeds a threshold (e.g., 400 W), the required accompanying facility can be a bottleneck, because such a liquid cooling solution has certain requirements for supply inlet temperatures, flow rates and pressures that exceed the capability of a typical data center. Even if a data center facility can be developed to meet the requirements, the cost would be too high.
Further complicating the problem is that many high performance hardware components are connected through a peripheral component interconnect express (PCIe) expansion bus. A liquid cooling solution for such hardware components and packages requires a completely different architecture compared to mezzanine connector based cards.
Previous cooling solutions for PCIe based electronics focus on desktop products, rather than on hyperscale cloud data centers. Such cooling solutions may not be feasible for integration into servers in a cloud data center. Further, these solutions may be unscalable, inflexible, insufficiently reliable, or too costly. In addition, most of the solutions are air cooling based, which may not satisfy the constantly increasing power density.
Embodiments of the invention are illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements.
Various embodiments and aspects of the inventions will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative of the invention and are not to be construed as limiting the invention. Numerous specific details are described to provide a thorough understanding of various embodiments of the present invention. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments of the present inventions.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in conjunction with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment.
According to various embodiments, described herein is a hybrid cooling device and a cooling method that use a combination of phase change cooling and air cooling. The hybrid cooling device includes a closed loop two phase system, one or more fans, and an assembly clamp. The two phase system further includes a cold plate, an integrated channel, and a radiator as a condenser. The cold plate can include phase change fluid for extracting heat from electronics on a printed circuit board (PCB) sandwiched between the cold plate and the assembly clamp. The one or more fans can be used to create airflows for cooling both the electronics on the PCB and the radiator. A pressure sensor and a temperature sensor can be used to control the operation of the hybrid cooling device, which can be integrated into different system environments and server configurations.
In one embodiment, the hybrid cooling device further includes a device frame, to which the radiator, the integrated channel, and the cold plate are attached. Further, the hybrid cooling device can include an adapting stiffener positioned between the cold plate and the electronics on the PCB, and one or more elastic channels. The adapting stiffener and the one or more elastic channels operate in conjunction to maintain proper pressure on the electronics on the PCB.
In one embodiment, the hybrid cooling device further includes a moving axis in one of the elastic channels, and one end of the assembly clamp is inserted into the elastic channel through the moving axis such that the end of the assembly clamp is moveable on the elastic channel. This elastic channel can provide forces on the moving axis on both sides horizontally to properly fix the PCB at a particular position within the hybrid cooling device.
In one embodiment, the electronics on the PCB can include one or more of a chip or a power electronic, and the PCB on which the electronics are installed is connected by a peripheral component interconnect express (PCIe) bus to a server main PCB.
In one embodiment, the integrated channel includes a vapor line and a liquid line, the liquid line for passing liquid from the radiator to the cold plate, and the vapor line for passing vapor from the cold plate to the radiator. In one embodiment, the vapor line and liquid line may be designed in different physical dimensions for better performance.
In one embodiment, each of the one or more fans can be a fan integrated into the hybrid cooling device or a separate fan. The airflows created by the one or more fans pass through the PCB through a first dedicated channel, and pass through the radiator through a second dedicated channel.
In one embodiment, the hybrid cooling device can include a temperature sensor and a pressure sensor to control the operation of the hybrid cooling device. In one embodiment, the hybrid cooling device can include only a pressure sensor, and the pressure sensor is pre-integrated on the vapor line in the hybrid cooling device.
In one embodiment, the hybrid cooling method can be deployed to different chassis, e.g., blade servers. Further, multiple electronics on a PCB or multiple PCBs can be packaged within the hybrid cooling device. A variety of clamping methods can be used for sandwiching the PCBs.
The hybrid cooling device can be deployed in any server or chassis environment, and is compatible with different heterogeneous hardware configurations for complex and multiple heterogeneous computing workloads. As such, the hybrid cooling device is scalable and interoperable for different server system designs and configurations, including different heterogeneous hardware expansions. In addition, the solution is highly efficient since fluid is self-driven with phase change technologies.
The various embodiments provide a solution for hyperscale data center applications and corresponding servers in a cloud environment, as well as for edge computing systems, in either edge clusters or edge devices. The cooling solution described in the various embodiments can be used for cooling high power density electronics. With a complete packing method for designing hybrid cooling devices, the cooling solution can be configured for different hybrid designs such as phase change with air in parallel, phase change liquid cooling only, and so on.
As shown, the hybrid cooling device includes a radiator 101, an integrated channel 103, a cold plate 105, an assembly clamp 107, and a device frame 109. The radiator 101, the integrated channel 103, and the cold plate 105 can be combined into a single unit. The single unit constitutes the main component of the hybrid cooling device.
However, despite being a single unit, the integral designs for the three components 101, 103 and 105 can be different depending on actual implementations and specific requirements of different users.
The device frame 109 can be a hardware frame, to which the radiator 101, the integrated channel 103, and the cold plate 105 are attached. The integrated channel 103 can include a liquid line and a vapor line for connecting the radiator and the cold plate. The assembly clamp 107, which is described in detail below, can be used to hold electronics on a printed circuit board (PCB) with proper pressure.
As shown, the hybrid cooling device can include a fan 201. The fan 201 and the single unit described above provide a hybrid cooling environment for a printed circuit board (PCB) 203 with high power density electronics installed thereon.
In one embodiment, the PCB 203 can be an acceleration PCB that includes multiple hardware components to speed up data communication, storage and retrieval, encryption and decryption, mathematical operations, graphics, web page viewing, etc. The PCB 203 can be attached to the cold plate 105. Both the radiator 101 and the PCB 203 can be air cooled by the fan 201. The solution shown in
In one embodiment, the structural layout of the hybrid cooling device enables the fan 201 to blow direct or indirect airflows towards both the radiator 101 and the electronics on the PCB 203. As such, the fan 201 can provide direct air cooling and indirect air cooling. The fan 201 can be an integrated unit of the hybrid cooling device, or a separate module attached to the hybrid cooling device.
In
The hybrid cooling device further includes a connection bus 301 used to connect the different electronics on the PCB 203. The connection bus 301 can be a peripheral component interconnect express (PCIe) bus, which is an interface standard for connecting high-speed components.
In
The assembly clamp 107 can be locked and unlocked by turning around the moving axis 309. When the assembly clamp 107 is locked, the PCB 203, the chips 303 (also referred to as electronics) on the PCB 203, and the adapting stiffener 305 can be sandwiched between the cold plate 105 and the assembly clamp 107. Further, when the assembly clamp 107 is locked, the two elastic channels 307 and 315 can ensure that proper pressure is exerted on the chips 303 and the PCB 203 to avoid damage and to prevent malfunctions. The elastic channels 307 and 315 can also ensure proper thermal contact between the cold plate 105 and the chips 303.
As shown in
In
The radiator 101 can function as a condensing unit that condenses the vapor rising from the cold plate 105 back to liquid by extracting the latent heat from the vapor. The liquid can then return to the cold plate 105 under gravity.
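As a rough illustration of the condensing duty described above, the vapor mass flow that the radiator must condense follows from dividing the heat load by the latent heat of the working fluid. The sketch below is illustrative only: the disclosure does not name the phase change fluid, so water at atmospheric pressure (latent heat of roughly 2257 kJ/kg) is assumed as a stand-in, the 400 W load is borrowed from the single-chip power budget example given earlier, and the function name is hypothetical.

```python
# Illustrative only: the disclosure does not specify the phase change fluid.
# Water at atmospheric pressure is assumed here as a stand-in.
H_FG_WATER_J_PER_KG = 2.257e6  # approximate latent heat of vaporization of water at 100 C


def required_vapor_mass_flow(heat_load_w: float, latent_heat_j_per_kg: float) -> float:
    """Return the vapor mass flow (kg/s) needed to carry heat_load_w as latent heat."""
    if heat_load_w < 0 or latent_heat_j_per_kg <= 0:
        raise ValueError("heat load must be non-negative and latent heat positive")
    return heat_load_w / latent_heat_j_per_kg


if __name__ == "__main__":
    # Example load: the 400 W single-chip power budget mentioned in the background.
    m_dot = required_vapor_mass_flow(400.0, H_FG_WATER_J_PER_KG)
    print(f"vapor mass flow: {m_dot * 1000:.3f} g/s")
```

Under these assumptions the required flow is a fraction of a gram per second, which is consistent with a loop in which the fluid is self-driven rather than pumped. A different working fluid would change the number but not the relationship.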
In one embodiment, the airflows 407 can pass through the radiator 101 to assist the radiator 101 in condensing vapor to liquid, and the airflows 409 can pass through the chips or electronics on the PCB 203 to provide air cooling to the chips or electronics on the PCB 203. In
Alternatively,
In
As further shown, the server chassis 507 can include a server PCB 505 and a chassis fan 509 mounted on the right side of the hybrid cooling device 501. The chassis fan 509, as part of the existing server chassis structure, can function as the primary air mover. Thus, the hybrid cooling device 501 can take advantage of the existing server chassis structure.
In
In this embodiment, unlike the embodiments illustrated in
In the various embodiments described above, the hybrid cooling device in
As shown, the hybrid cooling device can include two sensors. A pressure sensor 701 can be attached to the vapor line 403 to measure the pressure of the vapor passing through the vapor line 403. A temperature sensor 703 can be provided in the cold plate to measure the temperature of the cold plate. These two sensors 701 and 703 are decoupled from any of the electronics on the PCB 503. The decoupling can significantly increase the adaptability and reliability of the cooling solution. In one embodiment, the temperature sensor can be a sensor in the chip package, such as a sensor for measuring the case temperature. In this case, only the pressure sensor is needed on the hybrid cooling device for the purpose of controlling the operation of the hybrid cooling device.
In one embodiment, the two sensors 701 and 703 are used for controlling the fan or fans of the hybrid cooling device only, and the device control applies to only the hardware of the device, and does not apply to the PCBs 503 and 505 and the electronics on the two PCBs. Such a design can increase the hybrid cooling device's deployability, tunability, and interoperability. The design aims to simplify the system integration and tuning procedures, enabling plug-and-play deployment.
As shown in
In operation 801, the processing logic initiates the temperature sensor to measure the temperature inside the cold plate in the hybrid cooling device, and initiates the pressure sensor to measure the pressure of the vapor passing through the vapor line.
In operation 803, the processing logic determines whether the measured temperature is under a predetermined threshold (i.e., Tcase-design).
In operation 805, if the measured temperature is not under the predetermined threshold, the processing logic can send commands to run the main fan in the hybrid cooling device to its maximum speed.
In operation 806, the processing logic determines whether the measured temperature has decreased under the predetermined threshold due to the blowing of the main fan at its maximum speed.
In operation 807, the measured temperature has decreased under the threshold. The processing logic continues monitoring the temperature, and also uses the measured pressure to control the operation of the hybrid cooling device.
In operation 808, the measured temperature has not decreased under the threshold, and the processing logic runs the secondary fan to its maximum speed.
In operation 809, the processing logic determines whether the measured pressure has increased.
In operation 811, the processing logic determines that the measured pressure has not increased and accordingly decreases the speed of the main fan.
In operation 813, the processing logic determines that the measured pressure has increased, and accordingly increases the speed of the main fan if the main fan is not running at its maximum speed.
In operation 815, the processing logic determines whether the measured temperature exceeds the predetermined threshold. If so, the processing logic will monitor the measured temperature to determine if it decreases under the predetermined threshold; otherwise, the processing logic will check if the measured pressure has increased.
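The operations above can be sketched as a single control step that is called once per sensor reading. This is a minimal sketch under stated assumptions, not the processing logic of the disclosure: the names `FanState` and `control_step`, the normalized speed scale from 0.0 to 1.0, the speed increment `step`, and the floor `min_speed` are all hypothetical, and the text does not say when the secondary fan is slowed again, so the sketch simply leaves it unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanState:
    main_speed: float = 0.5        # fraction of maximum speed (assumed scale 0.0 to 1.0)
    secondary_speed: float = 0.0   # fraction of maximum speed
    main_forced_max: bool = False  # True after operation 805 has run


def control_step(state, temperature, pressure, previous_pressure,
                 t_threshold, step=0.1, min_speed=0.2):
    """Return the next FanState for one pass through operations 803 to 815."""
    if temperature >= t_threshold:
        if not state.main_forced_max:
            # Operations 803/805: not under the threshold, so run the main fan at maximum.
            return FanState(1.0, state.secondary_speed, True)
        # Operations 806/808: still not under the threshold with the main fan
        # at maximum, so run the secondary fan at maximum as well.
        return FanState(1.0, 1.0, True)
    # Operations 807/809: temperature is under the threshold, so the measured
    # pressure decides how the main fan speed is trimmed.
    if pressure > previous_pressure:
        # Operation 813: pressure increased, speed up unless already at maximum.
        main = min(1.0, state.main_speed + step)
    else:
        # Operation 811: pressure did not increase, slow the main fan down.
        main = max(min_speed, state.main_speed - step)
    # Assumption: the secondary fan is left unchanged here; the text does not
    # say when it is slowed again.
    return FanState(main, state.secondary_speed, False)
```

A caller would read the temperature and pressure sensors (operation 801), pass the current and previous pressure readings to `control_step`, and apply the returned speeds to the fans. Keeping the step a pure function of the readings reflects the decoupling described earlier, since nothing in it depends on the electronics being cooled.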
As shown in
In addition, for at least some of the server chassis 1003, an optional fan module (not shown) is associated with the server chassis. Each of the fan modules includes one or more cooling fans. The fan modules may be mounted on the backends of server chassis 1003 or on the electronic rack to generate airflows flowing from frontend 1004, traveling through the air space of the server chassis 1003, and exiting at backend 1005 of electronic rack 1000.
In one embodiment, CDU 1001 mainly includes heat exchanger 1011, liquid pump 1012, and a pump controller (not shown), and some other components such as a liquid reservoir, a power supply, monitoring sensors and so on. Heat exchanger 1011 may be a liquid-to-liquid heat exchanger. Heat exchanger 1011 includes a first loop with inlet and outlet ports having a first pair of liquid connectors coupled to external liquid supply/return lines 131-132 to form a primary loop. The connectors coupled to the external liquid supply/return lines 131-132 may be disposed or mounted on backend 1005 of electronic rack 1000. The liquid supply/return lines 131-132, also referred to as room liquid supply/return lines, may be coupled to an external cooling system (e.g., a data center room cooling system).
In addition, heat exchanger 1011 further includes a second loop with two ports having a second pair of liquid connectors coupled to liquid manifold 1025 (also referred to as a rack manifold) to form a secondary loop, which may include a supply manifold (also referred to as a rack liquid supply line or rack supply manifold) to supply cooling liquid to server chassis 1003 and a return manifold (also referred to as a rack liquid return line or rack return manifold) to return warmer liquid back to CDU 1001. Note that CDUs 1001 can be any kind of CDUs commercially available or customized ones. Thus, the details of CDUs 1001 will not be described herein.
Each of server chassis 1003 may include one or more IT components (e.g., central processing units or CPUs, general/graphic processing units (GPUs), memory, and/or storage devices). Each IT component may perform data processing tasks, where the IT component may include software installed in a storage device, loaded into the memory, and executed by one or more processors to perform the data processing tasks. Server chassis 1003 may include a host server (referred to as a host node) coupled to one or more compute servers (also referred to as computing nodes, such as CPU server and GPU server). The host server (having one or more CPUs) typically interfaces with clients over a network (e.g., Internet) to receive a request for a particular service such as storage services (e.g., cloud-based storage services such as backup and/or restoration), executing an application to perform certain operations (e.g., image processing, deep data learning algorithms or modeling, etc., as a part of a software-as-a-service or SaaS platform). In response to the request, the host server distributes the tasks to one or more of the computing nodes or compute servers (having one or more GPUs) managed by the host server. The compute servers perform the actual tasks, which may generate heat during the operations.
Electronic rack 1000 further includes optional RMU 1002 configured to provide and manage power supplied to servers 1003 and CDU 1001. RMU 1002 may be coupled to a power supply unit (not shown) to manage the power consumption of the power supply unit. The power supply unit may include the necessary circuitry (e.g., an alternating current (AC) to direct current (DC) or DC to DC power converter, battery, transformer, or regulator, etc.) to provide power to the rest of the components of electronic rack 1000.
In one embodiment, RMU 1002 includes optimization module 1021 and rack management controller (RMC) 1022. RMC 1022 may include a monitor to monitor operating status of various components within electronic rack 1000, such as, for example, computing nodes 1003, CDU 1001, and the fan modules. Specifically, the monitor receives operating data from various sensors representing the operating environments of electronic rack 1000. For example, the monitor may receive operating data representing temperatures of the processors, cooling liquid, and airflows, which may be captured and collected via various temperature sensors. The monitor may also receive data representing the fan power and pump power generated by the fan modules and liquid pump 1012, which may be proportional to their respective speeds. These operating data are referred to as real-time operating data. Note that the monitor may be implemented as a separate module within RMU 1002.
Based on the operating data, optimization module 1021 performs an optimization using a predetermined optimization function or optimization model to derive a set of optimal fan speeds for the fan modules and an optimal pump speed for liquid pump 1012, such that the total power consumption of liquid pump 1012 and the fan modules reaches a minimum, while the operating data associated with liquid pump 1012 and cooling fans of the fan modules are within their respective designed specifications. Once the optimal pump speed and optimal fan speeds have been determined, RMC 1022 configures liquid pump 1012 and cooling fans of the fan modules based on the optimal pump speed and fan speeds.
As an example, based on the optimal pump speed, RMC 1022 communicates with a pump controller of CDU 1001 to control the speed of liquid pump 1012, which in turn controls a liquid flow rate of cooling liquid supplied to the liquid manifold 1025 to be distributed to at least some of server chassis 1003. Similarly, based on the optimal fan speeds, RMC 1022 communicates with each of the fan modules to control the speed of each cooling fan of the fan modules, which in turn controls the airflow rates of the fan modules. Note that each of the fan modules may be individually controlled with its specific optimal fan speed, and different fan modules and/or different cooling fans within the same fan module may have different optimal fan speeds.
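The kind of optimization described above can be sketched as a small exhaustive search. This is a sketch under stated assumptions rather than the optimization function of the disclosure, which is not specified: fan and pump power are assumed to scale with the cube of speed (the ideal affinity laws), a single fan speed stands in for the set of fan speeds, and the thermal model, its coefficients, the rated powers, and all function names are hypothetical placeholders.

```python
from itertools import product


def total_power(fan_speed, pump_speed, fan_rated_w=60.0, pump_rated_w=100.0):
    """Total fan plus pump power in watts, assuming power scales with speed cubed."""
    return fan_rated_w * fan_speed ** 3 + pump_rated_w * pump_speed ** 3


def predicted_temperature(fan_speed, pump_speed):
    """Hypothetical thermal model: temperature falls as either speed rises."""
    return 40.0 + 20.0 / fan_speed + 15.0 / pump_speed


def optimize(temp_limit, speeds):
    """Return (fan_speed, pump_speed, power) with minimum power, or None if infeasible."""
    feasible = [
        (total_power(f, p), f, p)
        for f, p in product(speeds, speeds)
        if predicted_temperature(f, p) <= temp_limit  # stay within the design limit
    ]
    if not feasible:
        return None
    power, fan_speed, pump_speed = min(feasible)
    return fan_speed, pump_speed, power


if __name__ == "__main__":
    # Candidate speeds from 20% to 100% of maximum in 10% steps (assumed grid).
    candidate_speeds = [i / 10 for i in range(2, 11)]
    print(optimize(100.0, candidate_speeds))
```

An exhaustive grid is used only because it is easy to verify. With one speed per fan module the search space grows quickly, and a real implementation would more likely rely on a dedicated optimizer and on a thermal model fitted to the real-time operating data.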
Note that the rack configuration as shown in
In one embodiment, the cooling devices disposed in each of the server chassis as shown may represent any cooling device described throughout this application.
In the foregoing specification, embodiments of the disclosure have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.
As previously explained, an embodiment of the disclosure may be (or include) a non-transitory machine-readable medium (such as microelectronic memory) having stored thereon instructions, which program one or more data processing components (generically referred to here as a “processor”) to perform airflow management operations, such as controlling fan speed of one or more fans of the battery module (and/or BBU shelf). In other embodiments, some of these operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmed data processing components and fixed hardwired circuit components of any of the battery modules described herein.
While certain aspects have been described and shown in the accompanying drawings, it is to be understood that such aspects are merely illustrative of and not restrictive on the broad disclosure, and that the disclosure is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. The description is thus to be regarded as illustrative instead of limiting.
In some aspects, this disclosure may include the language, for example, “at least one of [element A] and [element B].” This language may refer to one or more of the elements. For example, “at least one of A and B” may refer to “A,” “B,” or “A and B.” Specifically, “at least one of A and B” may refer to “at least one of A and at least one of B,” or “at least one of either A or B.” In some aspects, this disclosure may include the language, for example, “[element A], [element B], and/or [element C].” This language may refer to either of the elements or any combination thereof. For instance, “A, B, and/or C” may refer to “A,” “B,” “C,” “A and B,” “A and C,” “B and C,” or “A, B, and C.”
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 25 2021 | Baidu USA LLC | (assignment on the face of the patent) | / | |||
Mar 25 2021 | GAO, TIANYI | Baidu USA LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 055725 | /0843 |