Systems and methods for virtual machine based huge page balloon support are provided. A guest operating system (OS) receives a request from a hypervisor for guest memory to be made available to a host operating system (OS). The guest OS further receives a huge page size of a host page and a quantity of requested guest memory. The guest OS then allocates unused guest memory and transmits at least one address of the allocated guest memory to the hypervisor, where the allocated guest memory is a contiguous block of memory that is at least the size of the huge page size and aligned to the size of the huge page size.
8. A method, comprising:
receiving, by a guest operating system (OS) executing on a virtual machine, from a hypervisor, a request, wherein the request requests guest memory to be made available to a host operating system (OS);
receiving, by the guest OS, a huge page size of a host page and a quantity of requested guest memory;
responsive to receiving the request, allocating, by the guest OS, unused guest memory and transmitting, by the guest OS, at least one address of the allocated guest memory to the hypervisor, wherein the allocated guest memory is a contiguous block of memory that is at least the size of the huge page size of the host page; and
determining, by the hypervisor, that the allocated guest memory is not reclaimable responsive to determining that the allocated guest memory is (a) not a multiple of the huge page size of the host page or (b) not aligned to the multiple of the huge page size of the host page.
13. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a computer system, cause the computer system to:
receive, by a guest operating system (OS) executing on a virtual machine, from a hypervisor, a request, wherein the request requests guest memory to be made available to a host operating system (OS);
receive, by the guest OS, a huge page size of a host page and a quantity of requested guest memory;
responsive to receiving the request, allocate, by the guest OS, unused guest memory and transmit, by the guest OS, at least one address of the allocated guest memory to the hypervisor, wherein the allocated guest memory is a contiguous block of memory that is at least the size of the huge page size of the host page; and
determine, by the hypervisor, that the allocated guest memory is not reclaimable responsive to determining that the allocated guest memory is (a) not a multiple of the huge page size of the host page or (b) not aligned to the multiple of the huge page size of the host page.
1. A system comprising:
one or more physical processors;
a hypervisor executing on the one or more physical processors;
a host operating system (OS) executing on the one or more physical processors; and
a virtual machine, including a guest operating system (OS), executing on the one or more physical processors to:
receive, by the guest OS executing on the virtual machine, from the hypervisor, a request, wherein the request requests guest memory to be made available to the host OS;
receive, by the guest OS, a huge page size of a host page and a quantity of requested guest memory;
responsive to receiving the request, allocate, by the guest OS, unused guest memory and transmit, by the guest OS, at least one address of the allocated guest memory to the hypervisor, wherein the allocated guest memory is a contiguous block of memory that is at least the size of the huge page size of the host page; and
determine, by the hypervisor, that the allocated guest memory is not reclaimable responsive to determining that the allocated guest memory is (a) not a multiple of the huge page size of the host page or (b) not aligned to the multiple of the huge page size of the host page.
2. The system of
3. The system of
4. The system of
5. The system of
6. The system of
7. The system of
9. The method of
10. The method of
11. The method of
12. The method of
14. The computer-readable non-transitory storage medium of
15. The computer-readable non-transitory storage medium of
16. The computer-readable non-transitory storage medium of
The present disclosure relates generally to memory management of virtual machines, and more particularly to ballooning with assigned devices. Virtualization may be used to provide some physical components as logical objects in order to allow running various software modules, for example, multiple operating systems, concurrently and in isolation from other software modules, on one or more interconnected physical computer systems. Virtualization allows, for example, consolidating multiple physical servers into one physical server running multiple virtual machines in order to improve the hardware utilization rate.
Virtualization may be achieved by running a software layer, often referred to as a hypervisor, above the hardware and below the virtual machines. A hypervisor may run directly on the server hardware without an operating system beneath it or as an application running on a traditional operating system. A hypervisor may virtualize the physical layer and provide interfaces between the underlying hardware and virtual machines. Processor virtualization may be implemented by the hypervisor scheduling time slots on one or more physical processors for a virtual machine, rather than a virtual machine actually having a dedicated physical processor. The present disclosure provides improved systems and methods for managing memory in a virtual environment.
The present disclosure provides a new and innovative system, methods and apparatus for virtual machine based huge page balloon support.
An example system comprises one or more physical processors, a hypervisor executing on the one or more physical processors, a host operating system (OS) executing on the one or more physical processors, and a virtual machine, including a guest operating system (OS), executing on the one or more physical processors.
A guest operating system (OS) receives a request from a hypervisor for guest memory to be made available to a host operating system (OS). The guest OS further receives a huge page size of a host page and a quantity of requested guest memory. The guest OS then allocates unused guest memory and transmits at least one address of the allocated guest memory to the hypervisor, where the allocated guest memory is a contiguous block of memory that is at least the size of the huge page size and aligned to the size of the huge page size.
Additional features and advantages of the disclosed method and apparatus are described in, and will be apparent from, the following Detailed Description and the Figures.
As used herein, physical processor or processor 120A-C refers to a device capable of executing instructions encoding arithmetic, logical, and/or I/O operations. In one illustrative example, a processor may follow the Von Neumann architectural model and may include an arithmetic logic unit (ALU), a control unit, and a plurality of registers. In an example embodiment, a processor may be a single core processor which is typically capable of executing one instruction at a time (or processing a single pipeline of instructions), or a multi-core processor which may simultaneously execute multiple instructions. In another example embodiment, a processor may be implemented as a single integrated circuit, two or more integrated circuits, or may be a component of a multi-chip module (e.g., in which individual microprocessor dies are included in a single integrated circuit package and hence share a single socket). A processor may also be referred to as a central processing unit (CPU).
As discussed herein, a memory device 130A-C refers to a volatile or non-volatile memory device, such as RAM, ROM, EEPROM, or any other device capable of storing data. As discussed herein, I/O device 140A-B refers to a device capable of providing an interface between one or more processor pins and an external device capable of inputting and/or outputting binary data.
Processors 120A-C may be interconnected using a variety of techniques, ranging from a point-to-point processor interconnect, to a system area network, such as an Ethernet-based network. Local connections within each node 110A-D, including the connections between a processor 120A and a memory device 130A-B and between a processor 120A and an I/O device 140A may be provided by one or more local buses of suitable architecture, for example, peripheral component interconnect (PCI). As used herein, a device of the host OS 186 (or “host device”) may refer to CPU 120A-C, MD 130A-C, I/O 140A-B, a software device, and/or hardware device 150A-B.
As noted above, computer system 100 may run multiple virtual machines (e.g., VM 170A-B), by executing a software layer (e.g., hypervisor 180) above the hardware and below the virtual machines 170A-B, as schematically shown in
In an example embodiment, a virtual machine 170A-B may execute a guest operating system 196A-B which may utilize the underlying VCPU 190A-D, VMD 192A-B, and VI/O devices 194A-D. One or more applications 198A-D may be running on a virtual machine 170A-B under the guest operating system 196A-B. In an example embodiment, a virtual machine 170A-B may include multiple virtual processors (VCPU) 190A-B. Processor virtualization may be implemented by the hypervisor 180 scheduling time slots on one or more physical processors 120A-C such that from the guest operating system's perspective those time slots are scheduled on a virtual processor 190A-B.
In an example embodiment, the guest operating system 196A-B may use a memory balloon 197A-C to temporarily make guest memory 195A-B available to a host operating system 186 by allocating a portion of the guest memory 195A-B to the memory balloon 197A-C. In an example embodiment, each guest operating system 196B may include multiple balloons 197B-C, where each balloon 197B-C manages memory pages or memory segments of a different size. For example, memory balloon 197B may gather segments of guest memory 195B to provision requests for 512 kB sized memory and memory balloon 197C may gather segments of guest memory 195B to provision requests for 1 MB sized memory. The memory balloons 197A-C may be managed by a balloon driver 199A-B.
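As a non-limiting illustration, the arrangement in which one guest OS keeps several balloons, each gathering segments of a single size, may be sketched as follows. The class name `BalloonDriver`, its methods, and the addresses used are hypothetical and do not appear in the disclosure; the sketch only tracks bookkeeping and does not touch real memory.

```python
from collections import defaultdict

KIB = 1024


class BalloonDriver:
    """Hypothetical balloon driver that keeps one balloon per segment size."""

    def __init__(self):
        # Maps a segment size in bytes to the start addresses held in that balloon.
        self.balloons = defaultdict(list)

    def inflate(self, segment_size, address):
        """Place one segment of guest memory into the balloon for its size."""
        self.balloons[segment_size].append(address)

    def deflate(self, segment_size):
        """Remove the most recently added segment of this size from its balloon."""
        return self.balloons[segment_size].pop()

    def ballooned_bytes(self):
        """Total guest memory currently held across all balloons."""
        return sum(size * len(addresses) for size, addresses in self.balloons.items())


driver = BalloonDriver()
driver.inflate(512 * KIB, 0)                 # a 512 kB segment, cf. balloon 197B
driver.inflate(1024 * KIB, 4 * 1024 * KIB)   # a 1 MB segment, cf. balloon 197C
```

Keying the balloons by segment size keeps a request for one size from having to scan segments of another size.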
The hypervisor 180 manages host memory 184 for the host operating system 186 as well as memory allocated to the virtual machines 170A-B and guest operating systems 196A-B such as guest memory 195A-B provided to guest OS 196A-B. Host memory 184 and guest memory 195A-B may be divided into a plurality of memory pages that are managed by the hypervisor 180. As discussed below, guest memory 195A-B allocated to the guest OS 196A-B is mapped from host memory 184 such that when a guest application 198A-D uses or accesses a memory page of guest memory 195A-B it is actually using or accessing host memory 184.
The hypervisor may keep track of how each memory page is mapped, allocated, and/or used through the use of one or more extended page tables 182. In this manner, the hypervisor 180 can prevent memory allocated to one guest OS 196A from being inappropriately accessed and/or modified by another guest OS 196B or the host OS 186. Similarly, the hypervisor 180 can prevent memory assigned to or being used by one application 198A from being used by another application 198B.
To accommodate a changing demand for memory by the virtual machines 170A-B and host operating system 186, the hypervisor 180 uses memory balloons 197A-B and balloon drivers 199A-B to change the amount of memory allocated between a guest OS 196A-B and a host OS 186. The process of memory ballooning is described in greater detail with reference to
A page table 182 is a data structure used by the hypervisor 180 to store a mapping of memory addresses of the guest OS 196A-B to memory addresses of the host OS 186. Accordingly, address translation is handled using page tables 182.
The extended page table 182 comprises page entries 200A-D that map PFN 210A-D (e.g., an address of the guest OS 196A-B) with an address 240A-D (e.g., an address of the host OS 186). Page tables 182 may be used together with any paging data structure used by the VMs 170A-B to support translation from guest OS 196A-B to host OS 186 addresses (e.g., 32-bit linear address space using a two-level hierarchical paging structure, Physical Address Extension mode, INTEL Extended Memory 64 Technology mode, etc.). In an example embodiment, page tables 182 may include presence identifiers 220A-D and protection identifiers 230A-D that indicate an access status for each of the pages 310A-D.
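As a non-limiting illustration, a page entry that pairs a guest-side PFN with a host-side address, a presence identifier, and a protection identifier may be modeled as follows. The type names, field names, and example values are hypothetical; the sketch is a flat lookup table and does not model a hierarchical paging structure.

```python
from dataclasses import dataclass
from enum import Enum


class Protection(Enum):
    READ_WRITE = "read-write"
    READ_ONLY = "read-only"
    EXECUTABLE = "executable"
    EXECUTE_ONLY = "executable-only"


@dataclass(frozen=True)
class PageEntry:
    pfn: int                 # guest-side page frame number (cf. PFN 210A-D)
    present: bool            # presence identifier (cf. 220A-D)
    protection: Protection   # protection identifier (cf. 230A-D)
    host_address: int        # host-side address (cf. address 240A-D)


class ExtendedPageTable:
    """Hypothetical flat table mapping guest PFNs to host addresses."""

    def __init__(self, entries):
        self._by_pfn = {entry.pfn: entry for entry in entries}

    def translate(self, pfn):
        """Return the host address for a PFN, refusing non-present pages."""
        entry = self._by_pfn.get(pfn)
        if entry is None or not entry.present:
            raise LookupError(f"PFN {pfn:#x} is not present")
        return entry.host_address

    def can_write(self, pfn):
        """A page is writable only if it is present and marked read-write."""
        entry = self._by_pfn.get(pfn)
        return (
            entry is not None
            and entry.present
            and entry.protection is Protection.READ_WRITE
        )
```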
In an example embodiment, page tables 182 may include a presence identifier 220A-D. The presence identifier 220A-D indicates an access status of a page 310A-D corresponding to the page entry 200A-D of the page table 182. For example, a presence identifier 220A-D may be used to define that a given page 310A-D is present (or accessible) or non-present (or inaccessible). For example, as illustrated in the example embodiment in
In an example embodiment, page tables 182 may include a protection identifier 230A-D. The protection identifier 230A-D indicates the access status of a page 310A-D corresponding to the page entry 200A-D of the page table 182. For example, a protection identifier 230A-D may be used to define that a given page 310A-D is writable (or read-write), write-protected (or read-only), executable (or executable and readable), executable only, etc. For example, as illustrated in the example embodiment in
In an example embodiment, one or more page tables 182 may be maintained by the hypervisor 180 which map guest OS 196A-B addresses to host OS 186 addresses that are accessible by the hypervisor 180, VMs 170, guest OS 196A-B, Host OS 186, Host OS 186 resources, and/or VM Functions 183. The sizes of different page tables may vary and may include more or fewer entries than are illustrated in
In the illustrated example embodiment, the group of pages 415 constitutes eight pages of contiguous memory. The illustrated cross-hatched sections of memory (e.g., segments 420, 425, and 430) denote memory that is presently being used (e.g., by an application 198A-D to store and/or access data) and the illustrated white sections of memory (e.g., segments 410, 415, 435, 440, and 445) denote memory that is available to be used and/or allocated. For example, segment 435 (consisting of 24 guest pages) is available to be allocated from the guest OS 196A to the host OS 186 upon request.
The example method 500 starts and a guest operating system 196A receives a request from a hypervisor 180 for guest memory 195A to be made available to a host operating system 186 (block 510). In an example embodiment, the hypervisor 180 specifically makes a request to a balloon driver 199A of the guest OS 196A to determine if the host OS 186 can borrow some portion of guest memory 195A.
In an example embodiment, the guest OS 196A then receives a huge page size of a host page and a quantity of requested guest memory 195A (block 520). In an example embodiment, the quantity of requested guest memory 195A may take the form of a number of bytes requested, a number of guest pages (e.g., 405) requested, or a number of host pages (e.g., 410) sought. In an example embodiment, the huge page size and the quantity of requested guest memory 195A are included in the initial request for guest memory 195A. In another example embodiment, the huge page size and the quantity of requested guest memory 195A are provided to the guest OS 196A responsive to a request by the guest OS 196A for this information.
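As a non-limiting illustration, a quantity expressed in bytes, guest pages, or host pages may be normalized to a count of huge pages as sketched below. The function name, the unit labels, the 4096-byte guest page, and the choice to round a partial huge page upward are all assumptions made for the example; the disclosure does not specify them.

```python
def requested_huge_pages(quantity, unit, guest_page_size, huge_page_size):
    """Convert a requested quantity into a whole number of huge pages.

    `unit` is one of "bytes", "guest_pages", or "host_pages" (hypothetical labels).
    Partial huge pages are rounded up, which is an assumption of this sketch.
    """
    if huge_page_size % guest_page_size != 0:
        raise ValueError("huge page size must be a multiple of the guest page size")
    if unit == "bytes":
        total_bytes = quantity
    elif unit == "guest_pages":
        total_bytes = quantity * guest_page_size
    elif unit == "host_pages":
        total_bytes = quantity * huge_page_size
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    # Ceiling division without floating point.
    return -(-total_bytes // huge_page_size)


# With 8 guest pages per huge page, 40 guest pages and 5 host pages
# describe the same quantity.
GUEST_PAGE = 4096
HUGE_PAGE = 8 * GUEST_PAGE
same = requested_huge_pages(40, "guest_pages", GUEST_PAGE, HUGE_PAGE)
```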
In an example embodiment, responsive to receiving the request, the guest OS 196A allocates unused blocks (e.g., 415 and/or 435) of guest memory 195A and transmits at least one address of the newly allocated blocks (e.g., 415 and/or 435) of guest memory 195A to the hypervisor 180, where each of the newly allocated blocks (e.g., 415 and/or 435) of guest memory 195A is a contiguous block of memory that is (a) at least the size of the huge page size and (b) aligned to the size of the huge page size (block 530).
In an example embodiment, the guest OS 196A allocates unused guest memory pages (e.g., 415 and/or 435) by first identifying contiguous unused blocks (e.g., 415 and/or 435) of guest memory 195A that are at least the size of a single huge page size of a host page (e.g., 410) identified to the guest OS 196A in block 520 and aligned to the size of the huge page size. As used herein, unused memory refers to memory that is not presently assigned to, used, and/or accessed by an application 198A-D, device (e.g., VCPU 190A-B, VMD 192A-B, VI/O 194A-B), or other process of a guest OS 196A-B or host OS 186. Once such contiguous unused blocks (e.g., 415 and/or 435) of guest memory 195A are identified, the guest OS 196A then places the identified blocks (e.g., 415 and/or 435) of unused guest memory 195A into the balloon 197A. As more guest memory pages (e.g., 415 and/or 435) are placed in the balloon 197A, the balloon 197A inflates. In placing the guest memory pages (e.g., 415 and/or 435) into the balloon 197A, the guest OS 196A releases these memory pages (e.g., 415 and/or 435) for use by the host OS 186 (i.e. the guest OS 196A effectively allocates these memory pages to the host OS 186) and further, the guest OS 196A refrains from using these allocated guest memory pages (e.g., 415 and/or 435) while these pages are in the balloon 197A.
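As a non-limiting illustration, the identification of contiguous unused blocks that are at least one huge page in size and aligned to the huge page size may be sketched as a scan over a free-page map. The function name and the memory layout in the usage lines are hypothetical; the layout is loosely modeled on blocks 415, 435, and 445 and is not taken from the figures.

```python
def find_huge_page_blocks(free, pages_per_huge):
    """Return (start_page, length_in_pages) for each maximal run of free guest
    pages that consists of whole, huge-page-aligned windows.

    `free[i]` is True when guest page i is unused.
    """
    blocks = []
    run_start = None
    # Only windows that start on a huge-page boundary are considered.
    for start in range(0, len(free) - pages_per_huge + 1, pages_per_huge):
        window_free = all(free[start:start + pages_per_huge])
        if window_free and run_start is None:
            run_start = start
        elif not window_free and run_start is not None:
            blocks.append((run_start, start - run_start))
            run_start = None
    if run_start is not None:
        # The last full window ends at the largest multiple of pages_per_huge.
        end = (len(free) // pages_per_huge) * pages_per_huge
        blocks.append((run_start, end - run_start))
    return blocks


# Hypothetical layout with 8 guest pages per huge page:
# pages 0-7 free, 8-15 used, 16-39 free, 40-42 used, 43-47 free.
layout = [True] * 8 + [False] * 8 + [True] * 24 + [False] * 3 + [True] * 5
candidates = find_huge_page_blocks(layout, 8)
```

In this layout the trailing five free pages are skipped because they do not fill an aligned huge-page window.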
In an example embodiment, the guest OS 196A attempts to allocate all of the requested quantity of guest memory 195A. In an example embodiment, the guest OS 196A allocates as many contiguous blocks (e.g., 415 and/or 435) of unused guest memory 195A as are available that are at least the size of the huge page size and that are aligned to a huge page size. In an example embodiment, the transmitted address is the beginning address of at least one contiguous unused block (e.g., 415 or 435) of allocated guest memory 195A. In an example embodiment, the guest OS 196A transmits an address for each contiguous unused block (e.g., 415 or 435) of guest memory 195A that has been allocated by the guest OS 196A. In an example embodiment, the guest OS 196A also transmits at least one indicator of the size of the guest memory 195A that has been allocated. For example, the guest OS 196A may transmit, in addition to a beginning address, the size (e.g., a number of host pages, a number, a size of memory in bytes, or an offset) of each block (e.g., 415 and/or 435) of guest memory 195A that has been allocated by the guest OS 196A.
For example, responsive to receiving a request for 5 host page sized portions of guest memory 195A from the hypervisor 180, the guest OS 196A may identify blocks 415 and 435 as being the only two sets of contiguous blocks that are at least the size of the huge page size of the host page and aligned to the huge page size. As illustrated in the example embodiment, contiguous block 415 is the size of a single huge page sized host page and contiguous block 435 is the size of three huge page sized host pages. Although this amounts to only 4 host pages rather than the requested 5 host pages, the guest OS 196A nevertheless allocates the available guest memory 195A to the host OS 186 using the balloon 197A. The guest OS 196A may then transmit the beginning address of contiguous block 415 and the size of contiguous block 415 (e.g., in this example 1 huge page or 8 guest pages) as well as the beginning address of contiguous block 435 and the size of contiguous block 435 (e.g., in this example 3 huge pages or 24 guest pages) to the hypervisor 180.
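As a non-limiting illustration, the address and size information transmitted in this example may be assembled as sketched below. The function name, the record fields, the 4096-byte guest page, and the page offsets of the two blocks are hypothetical. Trimming the last block when more memory is available than was requested is likewise an assumption of the sketch.

```python
def build_report(blocks, requested_huge_pages, guest_page_size, pages_per_huge):
    """Build (address, size) records for the hypervisor from candidate blocks.

    `blocks` holds (start_page, length_in_pages) pairs that are already
    huge-page aligned. Returns the records and the number of huge pages granted,
    which may be smaller than the number requested.
    """
    report = []
    granted = 0
    for start_page, length in blocks:
        if granted >= requested_huge_pages:
            break
        huge_in_block = length // pages_per_huge
        take = min(huge_in_block, requested_huge_pages - granted)
        report.append({
            "address": start_page * guest_page_size,   # beginning address of the block
            "huge_pages": take,
            "guest_pages": take * pages_per_huge,
        })
        granted += take
    return report, granted


# Five huge pages requested; one block of 8 guest pages (cf. block 415) and one
# block of 24 guest pages (cf. block 435) are available.
report, granted = build_report([(0, 8), (16, 24)], 5, 4096, 8)
```

Here `granted` is 4, matching the shortfall described above: the guest reports what it could allocate rather than failing the request.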
The guest OS 196A further designates the contiguous set of memory pages (e.g., 415 and/or 435) that are placed in the balloon 197A as unavailable and will not allow any applications 198A-B, devices (e.g., VCPU 190A, VMD 192A, VI/O 194A), or other processes of the guest OS 196A to use the allocated memory pages (e.g., 415 and 435) until they are removed from the balloon 197A. In an example embodiment, the host OS 186 may use the allocated contiguous blocks (e.g., 415 and/or 435) of guest memory 195A for its own processes. In an example embodiment, the host OS 186 may make the contiguous blocks (e.g., 415 and/or 435) of allocated guest memory 195A available for use by other guest operating systems (e.g., guest OS 196B).
In the illustrated example embodiment, a hypervisor 180 generates and sends to a guest OS 196A a request for guest memory 195A to be made available to a host OS 186 (blocks 605 and 610). The guest OS 196A receives the request (block 615). Responsive to receiving the request, the guest OS 196A generates and sends a request for the hypervisor 180 to specify a host page size and a quantity of requested guest memory 195A (blocks 620 and 625). The hypervisor 180 receives the request (block 630). The hypervisor 180 then transmits to the guest OS 196A a huge page size of a host page of host OS 186 and a quantity of requested guest memory (blocks 635 and 640). The guest OS 196A receives the huge page size and the indicator of the quantity of requested guest memory 195A (block 645).
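As a non-limiting illustration, the exchange of blocks 605 through 645 may be sketched as three messages passed between two objects. The message and class names are hypothetical, the messages are passed by direct method call rather than over any real transport, and the 2 MB huge page size in the usage line is an assumed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRequest:
    """Hypervisor asks the guest to make memory available (blocks 605-615)."""


@dataclass(frozen=True)
class ParameterQuery:
    """Guest asks for the huge page size and quantity (blocks 620-630)."""


@dataclass(frozen=True)
class ParameterReply:
    """Hypervisor answers the query (blocks 635-645)."""
    huge_page_size: int
    quantity: int


class Hypervisor:
    def __init__(self, huge_page_size, quantity):
        self._reply = ParameterReply(huge_page_size, quantity)

    def request_memory(self):
        return MemoryRequest()

    def answer(self, query):
        if not isinstance(query, ParameterQuery):
            raise TypeError("expected a ParameterQuery")
        return self._reply


class GuestOS:
    def __init__(self):
        self.params = None

    def on_request(self, request):
        # The guest responds to the request by asking for the parameters.
        return ParameterQuery()

    def on_reply(self, reply):
        self.params = reply


def handshake(hypervisor, guest):
    """Run the request, query, and reply steps in order."""
    query = guest.on_request(hypervisor.request_memory())
    guest.on_reply(hypervisor.answer(query))
    return guest.params


params = handshake(Hypervisor(2 * 1024 * 1024, 5), GuestOS())
```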
The guest OS 196A then allocates contiguous unused blocks (e.g., 415 and/or 435) of the guest memory 195A to the host OS 186 (e.g., using a balloon 197A) and transmits at least one address of the allocated guest memory 195A and at least one indicator of the size of the allocated guest memory 195A to the hypervisor 180 (blocks 650 and 655). The hypervisor 180 receives the transmitted address and size information (block 660). The hypervisor 180 then determines whether each of the allocated contiguous unused blocks (e.g., 415 and/or 435) of guest memory 195A is reclaimable (block 665). In an example embodiment, the hypervisor 180 determines that each of the allocated unused blocks (e.g., 415 and/or 435) of guest memory 195A is reclaimable if each allocated unused block (e.g., 415 and/or 435) of guest memory 195A is (a) at least a multiple of the huge page size of a host page, and/or (b) aligned to the multiple of the huge page size of the host page. For example, the hypervisor 180 may determine that contiguous memory blocks 415 and 435 are reclaimable based on the fact that each of them is a multiple of the huge page size of a host page (block 415 is 1 times the size of the huge page size and block 435 is 3 times the size of the huge page size) and the fact that each of them is aligned to a multiple of the huge page size of a host page. On the other hand, if the guest OS 196A allocated contiguous unused block 445 (which only contains 5 guest pages), then the hypervisor 180 may determine that the contiguous memory block 445 is not reclaimable based either on the fact that it does not constitute a multiple of the huge page size of a host page or the fact that it is not aligned to a multiple of the huge page size of a host page.
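As a non-limiting illustration, the reclaimability determination of block 665 may be sketched as a pair of modulo checks. The sketch follows the claim language, under which a block is not reclaimable if it fails either condition, so both must hold. The function name, the 4096-byte guest page, and the example addresses are hypothetical.

```python
def is_reclaimable(address, size, huge_page_size):
    """Return True when a block is a non-empty multiple of the huge page size
    and its beginning address is aligned to the huge page size."""
    is_multiple = size > 0 and size % huge_page_size == 0
    is_aligned = address % huge_page_size == 0
    return is_multiple and is_aligned


GUEST_PAGE = 4096
HUGE_PAGE = 8 * GUEST_PAGE   # 8 guest pages per huge page, as in the example above

one_huge = is_reclaimable(0, 1 * HUGE_PAGE, HUGE_PAGE)               # cf. block 415
three_huge = is_reclaimable(2 * HUGE_PAGE, 3 * HUGE_PAGE, HUGE_PAGE)  # cf. block 435
five_guest = is_reclaimable(43 * GUEST_PAGE, 5 * GUEST_PAGE, HUGE_PAGE)  # cf. block 445
```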
Responsive to determining that the allocated guest memory 195A is reclaimable, the hypervisor 180 reclaims the allocated guest memory 195A (block 670). Responsive to determining that the allocated guest memory is not reclaimable, the hypervisor 180 generates and returns an allocation error to the guest OS 196A (blocks 675 and 680). The guest OS 196A receives the allocation error (block 685).
Responsive to receiving the allocation error, the guest OS 196A allocates smaller unused blocks (e.g., 405 and/or 445) of guest memory 195A and transmits at least one address of the allocated smaller unused blocks (e.g., 405 and/or 445) of guest memory 195A and at least one indicator of the size of the allocated smaller unused blocks (e.g., 405 and/or 445) of guest memory 195A to the hypervisor 180 (blocks 690 and 695). In an example embodiment, responsive to receiving the allocation error, the guest OS 196A may further deallocate the previously allocated guest memory 195A described in block 650. In an example embodiment, this may involve removing the previously allocated contiguous blocks (e.g., 415 and/or 435) of guest memory 195A from the balloon 197A. The hypervisor 180 receives the transmitted address and size information (block 700).
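As a non-limiting illustration, the error path of blocks 675 through 700 may be sketched as follows. The exception class, the function names, and the use of a plain list as the balloon are hypothetical. The sketch stops once the smaller blocks have been transmitted and makes no assumption about how the hypervisor subsequently treats them.

```python
class AllocationError(Exception):
    """Stands in for the allocation error of blocks 675-685."""


def hypervisor_receive(blocks, huge_page_size):
    """Raise AllocationError if any (address, size) block is not reclaimable."""
    for address, size in blocks:
        if size % huge_page_size != 0 or address % huge_page_size != 0:
            raise AllocationError(f"block at {address:#x} is not reclaimable")


def offer_with_fallback(balloon, huge_blocks, small_blocks, huge_page_size):
    """Offer huge-page blocks first; on an allocation error, remove them from
    the balloon and offer the smaller blocks instead.

    Returns a label and the list of blocks that ended up being transmitted.
    """
    balloon.extend(huge_blocks)
    try:
        hypervisor_receive(huge_blocks, huge_page_size)
        return "huge", list(huge_blocks)
    except AllocationError:
        # Deallocate the previously allocated blocks (cf. block 650).
        for block in huge_blocks:
            balloon.remove(block)
        # Allocate and transmit the smaller blocks (cf. blocks 690 and 695).
        balloon.extend(small_blocks)
        return "small", list(small_blocks)


balloon = []
misaligned = [(4096, 32768)]                 # right size, wrong alignment
smaller = [(4096, 4096), (8192, 4096)]       # single guest pages
outcome = offer_with_fallback(balloon, misaligned, smaller, 32768)
```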
Although not illustrated in
It will be appreciated that all of the disclosed methods and procedures described herein can be implemented using one or more computer programs or components. These components may be provided as a series of computer instructions on any conventional computer readable medium or machine readable medium, including volatile or non-volatile memory, such as RAM, ROM, flash memory, magnetic or optical disks, optical memory, or other storage media. The instructions may be provided as software or firmware, and/or may be implemented in whole or in part in hardware components such as ASICs, FPGAs, DSPs or any other similar devices. The instructions may be configured to be executed by one or more processors, which when executing the series of computer instructions, performs or facilitates the performance of all or part of the disclosed methods and procedures.
It should be understood that various changes and modifications to the example embodiments described herein will be apparent to those skilled in the art. Such changes and modifications can be made without departing from the spirit and scope of the present subject matter and without diminishing its intended advantages. It is therefore intended that such changes and modifications be covered by the appended claims.
Patent | Priority | Assignee | Title |
7788461 | Apr 15 2004 | International Business Machines Corporation | System and method for reclaiming allocated memory to reduce power in a data processing system |
8484405 | Jul 13 2010 | VMware LLC | Memory compression policies |
8769184 | Oct 29 2010 | VMware LLC | System and method to prioritize large memory page allocation in virtualized systems |
8949295 | Jun 29 2010 | VMware LLC | Cooperative memory resource management via application-level balloon |
9176766 | Jul 06 2011 | Microsoft Technology Licensing, LLC | Configurable planned virtual machines |
9280458 | May 12 2011 | Citrix Systems, Inc | Reclaiming memory pages in a computing system hosting a set of virtual machines |
WO2013082598 |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Aug 12 2015 | Red Hat Israel, Ltd. | (assignment on the face of the patent) | / | |||
Aug 12 2015 | TSIRKIN, MICHAEL | Red Hat Israel, Ltd | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 036526 | /0644 |
Date | Maintenance Fee Events |
Sep 16 2020 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Date | Maintenance Schedule |
Jul 18 2020 | 4 years fee payment window open |
Jan 18 2021 | 6 months grace period start (w surcharge) |
Jul 18 2021 | patent expiry (for year 4) |
Jul 18 2023 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jul 18 2024 | 8 years fee payment window open |
Jan 18 2025 | 6 months grace period start (w surcharge) |
Jul 18 2025 | patent expiry (for year 8) |
Jul 18 2027 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jul 18 2028 | 12 years fee payment window open |
Jan 18 2029 | 6 months grace period start (w surcharge) |
Jul 18 2029 | patent expiry (for year 12) |
Jul 18 2031 | 2 years to revive unintentionally abandoned end. (for year 12) |