Intelligent query planning for metric gateway

US11900164

Patent 11900164
Priority Nov 24 2020
Filed Feb 10 2021
Issued Feb 13 2024
Expiry Sep 02 2042 Extension 569 days
Inventors Liu, Cong
Assg.orig Nutanix, I…
Assg.curr Nutanix, I…
Entity Large
Referenced by 0
References 176
Maint.: currently ok

17. A computer-implemented method comprising:

receiving a request to join a first entity data structure, a second entity data structure, and a third entity data structure using a first join order;

determining a first performance cost of the first join order;

determining a second performance cost of a second join order;

determining whether the second performance cost is lower than the first performance cost;

in response to determining that the second performance cost is lower than or exceeds the first performance cost, selecting the second join order or the first join order, respectively;

joining the first entity data structure, the second entity data structure, and the third entity data structure using the selected join order to generate a joined entity data structure; and

sending the joined entity data structure.

9. A non-transitory computer readable storage medium comprising instructions stored thereon that, when executed by a processor, cause the processor to:

receive a request to join a first entity data structure, a second entity data structure, and a third entity data structure using a first join order;

determine a first performance cost of the first join order;

determine a second performance cost of a second join order;

determine whether the second performance cost is lower than the first performance cost;

in response to determining that the second performance cost is lower than or exceeds the first performance cost, select the second join order or the first join order, respectively;

join the first entity data structure, the second entity data structure, and the third entity data structure using the selected join order to generate a joined entity data structure; and

send the joined entity data structure.

1. An apparatus comprising a processor and a memory, wherein the memory includes programmed instructions that when executed by the processor, cause the apparatus to:

receive a request to join a first entity data structure, a second entity data structure, and a third entity data structure using a first join order;

determine a first performance cost of the first join order;

determine a second performance cost of a second join order;

determine whether the second performance cost is lower than the first performance cost;

in response to determining that the second performance cost is lower than or exceeds the first performance cost, select the second join order or the first join order, respectively;

join the first entity data structure, the second entity data structure, and the third entity data structure using the selected join order to generate a joined entity data structure; and

send the joined entity data structure.

2. The apparatus of claim 1, wherein the first join order indicates that the first entity data structure is to be joined to the second entity data structure to generate a fourth entity data structure, and wherein the third entity data structure is to be joined to the fourth entity data structure and wherein the second join order indicates that the first entity data structure is to be joined to the third entity data structure to generate a fifth entity data structure, and wherein the second entity data structure is to be joined to the fifth entity data structure.

3. The apparatus of claim 1, wherein the first performance cost includes a first count of a first plurality of entities of the second entity data structure and the second performance cost includes a second count of a second plurality of entities of the third entity data structure.

4. The apparatus of claim 1, wherein the first performance cost includes one or more of a first central processing unit (CPU) usage, a first input/output (I/O) usage, or a first network usage of a first node including a first entity of the second entity data structure and the second performance cost includes one or more of central processing unit (CPU) usage, input/output (I/O) usage, or network usage of a second node including a second entity in the third entity data structure.

5. The apparatus of claim 1, wherein the first entity data structure includes a first entity and a second entity and wherein the first entity and the second entity are in different geographic locations of a distributed, hyper-converged data center.

6. The apparatus of claim 1, wherein the first entity data structure is in a first data location with a higher consistency guarantee and a second data location with a lower consistency guarantee, wherein the memory includes programmed instructions that when executed by the processor, cause the apparatus to:

determine whether accessing the first entity data structure is in the first data location is greater than a first predetermined threshold,

in response to determining that accessing the first entity data structure is in the first data location is greater than or less than the first predetermined threshold, access the data in the second location or the first location, respectively.

7. The apparatus of claim 1, wherein the first entity data structure is in a first data location with a higher consistency guarantee and a second data location with a lower consistency guarantee, wherein the memory includes programmed instructions that when executed by the processor, cause the apparatus to:

determine whether first instructions indicate to access the data in the second location,

in response to determining that the first instructions indicate or do not indicate to access the data in the second location, access the data in the second location or the first location, respectively.

8. The apparatus of claim 1, wherein the memory includes programmed instructions that when executed by the processor, cause the apparatus to:

copy the first entity data structure from a first data location to a second data location based on an indication that an alert is to occur; and

in response to the alert occurring, access the entity data structure from the second data location.

10. The medium of claim 9, wherein the first join order indicates that the first entity data structure is to be joined to the second entity data structure to generate a fourth entity data structure, and wherein the third entity data structure is to be joined to the fourth entity data structure and wherein the second join order indicates that the first entity data structure is to be joined to the third entity data structure to generate a fifth entity data structure, and wherein the second entity data structure is to be joined to the fifth entity data structure.

11. The medium of claim 9, wherein the first performance cost includes a first count of a first plurality of entities of the second entity data structure and the second performance cost includes a second count of a second plurality of entities of the third entity data structure.

12. The medium of claim 9, wherein the first performance cost includes one or more of a first central processing unit (CPU) usage, a first input/output (I/O) usage, or a first network usage of a first node including a first entity of the second entity data structure and the second performance cost includes one or more of central processing unit (CPU) usage, input/output (I/O) usage, or network usage of a second node including a second entity in the third entity data structure.

13. The medium of claim 9, wherein the first entity data structure includes a first entity and a second entity and wherein the first entity and the second entity are in different geographic locations of a distributed, hyper-converged data center.

14. The medium of claim 9, wherein the first entity data structure is in a first data location with a higher consistency guarantee and a second data location with a lower consistency guarantee, wherein the medium further comprises instructions, when executed by the processor, cause the processor to:

determine whether accessing the first entity data structure is in the first data location is greater than a first predetermined threshold,

15. The medium of claim 9, wherein the first entity data structure is in a first data location with a higher consistency guarantee and a second data location with a lower consistency guarantee, wherein the medium further comprises instructions, when executed by the processor, cause the processor to:

determine whether first instructions indicate to access the data in the second location,

16. The medium of claim 9, further comprising instructions stored thereon that, when executed by a processor, cause the processor to:

copy the first entity data structure from a first data location to a second data location based on an indication that an alert is to occur; and

in response to the alert occurring, access the entity data structure from the second data location.

18. The method of claim 17, wherein the first join order indicates that the first entity data structure is to be joined to the second entity data structure to generate a fourth entity data structure, and wherein the third entity data structure is to be joined to the fourth entity data structure and wherein the second join order indicates that the first entity data structure is to be joined to the third entity data structure to generate a fifth entity data structure, and wherein the second entity data structure is to be joined to the fifth entity data structure.

19. The method of claim 17, wherein the first performance cost includes a first count of a first plurality of entities of the second entity data structure and the second performance cost includes a second count of a second plurality of entities of the third entity data structure.

20. The method of claim 17, wherein the first performance cost includes one or more of a first central processing unit (CPU) usage, a first input/output (I/O) usage, or a first network usage of a first node including a first entity of the second entity data structure and the second performance cost includes one or more of central processing unit (CPU) usage, input/output (I/O) usage, or network usage of a second node including a second entity in the third entity data structure.

CROSS-REFERENCE TO RELATED APPLICATION

This application is related to and claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/117,968, filed Nov. 24, 2020, titled “INTELLIGENT QUERY PLANNING FOR METRIC GATEWAY,” the entire contents of which are incorporated herein by reference for all purposes.

BACKGROUND

The following description is provided to assist the understanding of the reader.

Virtual computing systems are widely used in a variety of applications. Virtual computing systems include one or more host machines running one or more entities (e.g., workloads, virtual machines, containers, and other entities) concurrently. Modern virtual computing systems allow several operating systems and several software applications to be safely run at the same time, thereby increasing resource utilization and performance efficiency. However, the present-day virtual computing systems have limitations due to their configuration and the way they operate.

SUMMARY

In accordance with some aspects of the present disclosure, an apparatus is disclosed. The apparatus includes a processor and a memory, wherein the memory includes programmed instructions that when executed by the processor, cause the apparatus to receive a request to join a first entity data structure, a second entity data structure, and a third entity data structure using a first join order, determine a first performance cost of the first join order, determine a second performance cost of a second join order, and determine whether the second performance cost is lower than the first performance cost. The apparatus includes a processor and a memory, wherein the memory includes programmed instructions that when executed by the processor, cause the apparatus to, in response to determining that the second performance cost is lower than or exceeds the first performance cost, select the second join order or the first join order, respectively. The apparatus includes a processor and a memory, wherein the memory includes programmed instructions that when executed by the processor, cause the apparatus to join the first entity data structure, the second entity data structure, and the third entity data structure using the selected join order to generate a joined entity data structure and send the joined entity data structure.

In accordance with some aspects of the present disclosure, a non-transitory computer readable storage medium is disclosed. The non-transitory computer readable storage medium includes instructions stored thereon that, when executed by a processor, cause the processor to receive a request to join a first entity data structure, a second entity data structure, and a third entity data structure using a first join order, determine a first performance cost of the first join order, determine a second performance cost of a second join order, and determine whether the second performance cost is lower than the first performance cost. The non-transitory computer readable storage medium includes instructions stored thereon that, when executed by a processor, cause the processor to, in response to determining that the second performance cost is lower than or exceeds the first performance cost, select the second join order or the first join order, respectively. The non-transitory computer readable storage medium includes instructions stored thereon that, when executed by a processor, cause the processor to join the first entity data structure, the second entity data structure, and the third entity data structure using the selected join order to generate a joined entity data structure and send the joined entity data structure.

In accordance with some aspects of the present disclosure, a computer-implemented method is disclosed. The computer-implemented method includes receiving a request to join a first entity data structure, a second entity data structure, and a third entity data structure using a first join order, determining a first performance cost of the first join order, determining a second performance cost of a second join order, and determining whether the second performance cost is lower than the first performance cost. The computer-implemented method includes, in response to determining that the second performance cost is lower than or exceeds the first performance cost, selecting the second join order or the first join order, respectively. The computer-implemented method includes joining the first entity data structure, the second entity data structure, and the third entity data structure using the selected join order to generate a joined entity data structure and sending the joined entity data structure.

The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the following drawings and the detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example block diagram of a virtual computing system for signal planning and executing, in accordance with some embodiments of the present disclosure.

FIG. 2 is an example entity graph, in accordance with some embodiments of the present disclosure.

FIG. 3 is an example flowchart of a method, in accordance with some embodiments of the present disclosure.

FIG. 4 is an example flowchart of a method, in accordance with some embodiments of the present disclosure.

FIG. 5 is an example flowchart of a method, in accordance with some embodiments of the present disclosure.

The foregoing and other features of the present disclosure will become apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings.

DETAILED DESCRIPTION

In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, and designed in a wide variety of different configurations, all of which are explicitly contemplated and made part of this disclosure.

Some implementations of cluster management services enable a user to query, view, filter, sort, group, and extract information from data center entities. One problem is that certain entities, or hosts/disks/etc. of entities, suffer intermittent contention or other performance-related issues. Additionally, one or more types of entity tables may vary in size from one or more other types of entity tables. When a user sends a query for entity information, such as a join query, the manner and/or order in which the entity information is collected can cause degraded performance such as increased latency, reduced throughput, and/or an increase in network hops, thereby worsening user experience.

Described herein are embodiments of a query plan service that detects a contention, a join order that sub-optimally filters entities, or other issue and reorders operations of the query in order to minimize or eliminate the effect of the contention, suboptimal filtering, or other issue. For example, the query plan service can reorder the joins so that entity tables of the least contentious entities are joined first. Such a reorder increases the likelihood that the more contentious entities will resolve their own contentions before their data is joined to the other entities' data. Additionally or alternatively, the query plan service can reorder the joins so that smallest entity tables are joined first. Such a reorder increases a number of entities that are filtered out before they are joined. This is advantageous for resource-expensive joins such as category joins for category and/or cluster-based access control. Accordingly, reordering operations of the query is beneficial to reduce latency, reduce resource usage, and improve user experience.

Another problem is that certain queries are expensive in terms of CPU, I/O, or network usage. Especially with distributed data centers holding terabytes of data, poor execution of the queries can result in unnecessary latency and resource usage.

Described herein are embodiments of a query executing service that detects performance bottlenecks such as queries that cause excessive CPU, I/O, or network usage. In some embodiments, the query executing service requests or accepts user hints on data preference or tolerance. For example, the user may only accept strongly consistent data, or the user may tolerate eventually consistent data. In some embodiments, the query executing service caches data up front when there is a bottleneck and the user tolerates slightly stale data. Advantageously, embodiments of the systems and methods described herein provide flexibility to the user and tailor performance tradeoffs based on the use case. Moreover, in cases where the user tolerates less consistent data, embodiments of the systems and methods described herein reduce latency.

Another problem is that certain alerts are expensive in terms of CPU, I/O, or network usage. In particular, users cannot afford delays on addressing issues flagged by alerts because if the alert is not addressed in a timely manner, a disk, node, or cluster may fail causing a disruption to an entire team or department of an organization.

Described herein are embodiments of an alert executing service that determines the entities related to an alert. In some embodiments, the alert executing service proactively caches the data for the related entities so that when the user requests the data in order to troubleshoot the issue raised by the alert, the data will be delivered promptly to the user, thereby improving user experience and reducing the likelihood of a massive failure event.

FIG. 1 is an example block diagram of a virtual computing system 100 for signal planning and executing, in accordance with some embodiments of the present disclosure. The virtual computing system 100 includes one or more clusters 102 (e.g., hyper-converged infrastructure (HCI) clusters). The one or more clusters 102 are connected by a network 116. The one or more clusters can be located in one data center (on-premises) or one cloud, or distributed across multiple data centers, multiple clouds, or a data center-cloud hybrid. The one or more clusters 102 include one or more nodes 104 (e.g., host machines, computers, computing devices, physical devices). The one or more nodes 104 include a controller virtual machine (CVM) 106, one or more virtual machines (VMs) 108, a hypervisor 110, one or more disks 112, and one or more central processing units (CPUs) 114. The one or more VMs 108 may run their own operating systems and applications on underlying resources (e.g., the one or more disks 112 and the one or more central processing units (CPUs) 114) virtualized through the hypervisor 110. The CVM 106 controls the I/O operation between the one or more disks 112 on its node 104 and VMs 108 on any node 104 in the one or more clusters 102. The one or more VMs 108 and the CVM 106 share resources of the one or more nodes 104. Thus, operating system (OS) data, application data, VM configuration data, and other data used by the one or more VMs 108 can be distributed across disks 112 on multiple nodes 104.

The virtual computing system 100 includes a cluster management service 122 which lets the user view as well as filter, sort, and group data center entities such as VMs, clusters, nodes, networks, etc. The cluster management service 122 plans and executes signals, which include queries and alerts. A query can include a point query. For example, a user can query a VM state in order to update a VM configuration parameter such as a VM name. A query can include a query for a report or other data for troubleshooting or debugging an issue affecting a cluster. A query can include a join query. A query can include one or more of a point query, a query for a report, or a join query.

The cluster management service 122 includes a user interface (UI) 124. In response to inputs from users (e.g., customers, datacenter administrators, cloud administrators, database administrators, system reliability engineers), the UI 124 receives/generates/displays queries or other requests from users. In some embodiments, each cluster 102 is dedicated to one department (e.g., human resources, finance, database, infrastructure technology) of an organizational client and each cluster 102 has a corresponding UI 124 that receives queries or other requests from a corresponding user. In response to inputs from a processor (e.g., query plan service 126, query execution service), the UI 124 provides/displays/renders responses to queries to the user.

The cluster management service 122 includes a query plan service 126 coupled to the UI 124. The UI 124 may forward the queries to the query plan service 126. The query plan service 126 optimizes for modeling relational queries over entity graphs. The query plan service 126 translates the query into a graph query (e.g., an entity graph). An entity graph is a graph that models relationships between entities. For example, the query plan service 126 translates a join query into an entity graph that models an order in which entity data structures are to be joined.

Referring now to FIG. 2, an example entity graph 200 is shown, in accordance with some embodiments of the present disclosure. The entity graph 200 includes a plurality of vertices 210 and one or more edges 220. Each of the vertices 210 indicates/models a data structure (e.g., an entity table) of one or more entities, in which each entity data structure can have a number of entries equal to the number of entities for that entity data structure. For example, vertex 210A indicates a first entity data structure of one or more virtual machines, vertex 210B indicates one or more keys, and vertex 210C indicates one or more values. Keys and values (and key-value pairs) are categories. In some embodiments, a category is a user-defined entity. For example, the keys in vertex 210B can be controller virtual machines and the values in vertex 210C can be clusters. A category can include one or more of key-value pairs, keys, values, virtual machines, hosts, disks, clusters, etc.

Each of the edges 220 indicates a link between the entities. The edge 220 may indicate a join order. For example, an edge 220A pointing from 210A towards 210B indicates that the entity data structure of 210B is to be joined to the entity data structure of 210A. Likewise, an edge 220B pointing from 210B towards 210C indicates that the entity data structure of 210C is to be joined to the entity data structure of 210B (after 210B is joined to the entity data structure of 210A). In the example, the two joins generate a new (e.g., combined) entity data structure.
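As an illustration only, the reading of edges as a join order can be sketched in a few lines of Python. The dictionary representation, the function name `join_order_from_edges`, and the restriction to a simple chain (each vertex has at most one outgoing edge) are assumptions made for this sketch and are not taken from the disclosure.

```python
from typing import Dict, List


def join_order_from_edges(edges: Dict[str, str], root: str) -> List[str]:
    """Follow edges from `root` and return the vertices in join order.

    Assumption: the graph is a simple chain, so each vertex has at most
    one outgoing edge. An edge X -> Y means "join Y to X".
    """
    order = [root]
    seen = {root}
    current = root
    while current in edges:
        current = edges[current]
        if current in seen:
            raise ValueError("cycle detected in entity graph")
        seen.add(current)
        order.append(current)
    return order


# Edge 220A points from 210A to 210B; edge 220B points from 210B to 210C.
edges = {"210A": "210B", "210B": "210C"}
order = join_order_from_edges(edges, "210A")
```

Here `order` lists 210A, then 210B, then 210C, which mirrors the two joins described for edges 220A and 220B.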

As part of a join, an entry from a first data structure is joined/merged/combined with an entry in a second data structure if the two entries have one matching value for a specified column/attribute. For example, a certain entry in a VM table is joined with a certain entry in a cluster table if a column in each table for disk_ID has a same value. In some types of joins (e.g., inner join, left join, right join), an entry from a first data structure may not be included in the joined data structure if the second data structure does not have an entry with a matching value for the specified column or attribute. This aspect is referred to as filtering.
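The filtering effect of an inner join can be sketched as follows. This is a minimal illustration that assumes tables are lists of dictionaries; the function name `inner_join` and the sample rows are hypothetical, and only the `disk_ID` column name comes from the example above.

```python
from typing import Dict, List

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Merge entries of two tables whose `key` column holds the same value."""
    # Index the right-hand table by the join column.
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined: List[Row] = []
    for row in left:
        # A left entry with no matching right entry is dropped (filtered).
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


vm_table = [
    {"vm": "vm-1", "disk_ID": "d1"},
    {"vm": "vm-2", "disk_ID": "d2"},
    {"vm": "vm-3", "disk_ID": "d9"},  # no cluster entry has disk_ID d9
]
cluster_table = [
    {"cluster": "c-A", "disk_ID": "d1"},
    {"cluster": "c-B", "disk_ID": "d2"},
]
joined = inner_join(vm_table, cluster_table, "disk_ID")
```

In this sketch the entry for `vm-3` does not appear in `joined` because no entry in the cluster table has a matching `disk_ID`.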

The query plan service 126 supports category and cluster-based access control over the entity graph in a distributed hyper-converged data center (such as the one or more clusters 102). Customers can put entities in categories and define access control policy based on categories. Thus, a category join (e.g., due to security role-based access control (RBAC)) is leveraged. Accordingly, entities can be represented as a power law distribution in terms of a number of accesses to categories as well as the entities within them (e.g., a number of edge out degrees of an entity). As used herein, a power law distribution is a distribution in which most (e.g., 95%) of the items are clustered in one part of the distribution (e.g., the high end or low end) and the remaining items form a long tail of the distribution. For example, certain entities that have a high (e.g., higher than a predetermined threshold) edge out degree (for example, certain kinds of categories that are used in RBAC) may be infrequent but may incur a greater performance degradation than the performance degradation of entities that do not have a high edge out degree.
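One way to identify entities with a high edge out degree is sketched below. The edge-list representation, the function name `high_out_degree_entities`, and the sample threshold are assumptions for illustration; the disclosure does not specify how the out degree is computed.

```python
from collections import Counter
from typing import List, Tuple


def high_out_degree_entities(
    edges: List[Tuple[str, str]], threshold: int
) -> List[str]:
    """Return entities whose number of outgoing edges exceeds `threshold`."""
    out_degree = Counter(source for source, _ in edges)
    return sorted(e for e, degree in out_degree.items() if degree > threshold)


# Hypothetical graph: one RBAC category links to many VMs, another to one.
edges = [
    ("category:rbac-admins", "vm-1"),
    ("category:rbac-admins", "vm-2"),
    ("category:rbac-admins", "vm-3"),
    ("category:env-test", "vm-1"),
]
hot_categories = high_out_degree_entities(edges, threshold=2)
```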

The query plan service 126 determines (e.g., arranges, re-arranges, generates, modifies) a join order (e.g., an order in which the entity data structures are joined), e.g., as part of a query plan. In some embodiments, the query plan service 126 looks at the particular graph vertex (e.g., the size of the table indicated by the vertex) involved in the join and, as such, reorders the query plan to filter as many entities as possible early on. In some embodiments, the query plan service 126 traverses the category with the least number of entities to filter early on to avoid larger computation later and to reduce (e.g., total) resource usage/consumption. This is particularly advantageous, for example, when there are many combinations of AND or OR operators between categories. Further, in case of OR clauses in queries (e.g., in the conditions of the queries), in which it is possible to revisit the same edges again, the query plan service 126 reduces the number of backend store calls.

In one example, the vertex 210A indicates an entity data structure having entries for 40 virtual machines, vertex 210B indicates an entity data structure having entries for 20 keys, and vertex 210C indicates an entity data structure having entries for 10 values. The query plan service 126 reorders the query plan to one in which the entity data structure of vertex 210B is joined to the entity data structure of vertex 210C first, and the entity data structure of vertex 210A is then joined to the joined entity data structure of vertices 210B-C. Based on this plan by the query plan service 126, the query execution service 128, in executing the query plan, filters some of the entities/entries (e.g., 30 of the 40 virtual machines) before joining the entity data structure of vertex 210A. This is more efficient than joining vertex 210A to vertex 210B first, in which case only 20 of the 40 virtual machines are filtered before the next join of vertex 210C. Advantageously, filtering more entities earlier results in lower latency and lower resource usage.
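The comparison of two join orders in this example can be sketched as follows. The cost model is an assumption made for illustration: the cost of a join order is taken to be an upper bound on the size of its first intermediate result, which equals the smaller of the two entity counts when each entry matches at most one entry in the other table. The names `estimated_cost` and `select_join_order` are hypothetical.

```python
from typing import Dict, Tuple

JoinOrder = Tuple[str, str, str]


def estimated_cost(order: JoinOrder, entity_counts: Dict[str, int]) -> int:
    """Assumed cost: upper bound on the size of the first intermediate join.

    Assumes an inner join in which each entry matches at most one entry
    on the other side, so the result is no larger than the smaller table.
    """
    return min(entity_counts[order[0]], entity_counts[order[1]])


def select_join_order(
    first: JoinOrder, second: JoinOrder, entity_counts: Dict[str, int]
) -> JoinOrder:
    """Select the second order only if its cost is lower than the first."""
    first_cost = estimated_cost(first, entity_counts)
    second_cost = estimated_cost(second, entity_counts)
    return second if second_cost < first_cost else first


entity_counts = {"210A": 40, "210B": 20, "210C": 10}
requested = ("210A", "210B", "210C")    # join 210A with 210B first
alternative = ("210B", "210C", "210A")  # join 210B with 210C first
selected = select_join_order(requested, alternative, entity_counts)
```

Under these assumptions the requested order has an estimated cost of 20 and the alternative has an estimated cost of 10, so the alternative is selected. When the costs are equal, the sketch keeps the requested order.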

In some embodiments, when one of the nodes (e.g., in a distributed network/cloud/data center of nodes) which owns the category entity is running hot (e.g., using a predetermined amount/percentage/portion of available resources, achieving a predetermined amount/percentage/portion of resource capacity, or causing or likely to cause a resource contention) on disk, CPU, memory, or network resources, the query plan service 126 re-orders the traversal such that the virtual computing system 100 queries other nodes until the contention goes away.
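A minimal sketch of such a re-ordering is shown below. It assumes that each node reports its utilization as a fraction of capacity and that a node counts as running hot when any one resource reaches a single threshold; the `NodeLoad` class, the `traversal_order` function, and the 0.9 default are hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeLoad:
    name: str
    cpu: float      # fraction of capacity in use, 0.0 to 1.0
    disk: float
    memory: float
    network: float

    def is_hot(self, threshold: float) -> bool:
        """A node is hot if any one resource reaches the threshold."""
        return max(self.cpu, self.disk, self.memory, self.network) >= threshold


def traversal_order(nodes: List[NodeLoad], threshold: float = 0.9) -> List[str]:
    """Query nodes that are not hot first; defer hot nodes to the end.

    sorted() is stable and False sorts before True, so the original
    relative order is preserved within each group.
    """
    ordered = sorted(nodes, key=lambda node: node.is_hot(threshold))
    return [node.name for node in ordered]


nodes = [
    NodeLoad("node-A", cpu=0.95, disk=0.40, memory=0.50, network=0.20),
    NodeLoad("node-B", cpu=0.30, disk=0.20, memory=0.40, network=0.10),
    NodeLoad("node-C", cpu=0.50, disk=0.92, memory=0.60, network=0.30),
    NodeLoad("node-D", cpu=0.50, disk=0.50, memory=0.50, network=0.50),
]
plan = traversal_order(nodes)
```

Deferring the hot nodes gives their contention time to clear before their data is needed, which is the intent described above.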

The cluster management service 122 includes a query execution service 128 coupled to the query plan service 126. The query execution service 128 receives a query plan from the query plan service 126 and executes the query plan. The query execution service 128 learns the cost/profile of executing queries in terms of CPU, input/output (I/O), and network metrics. The query execution service 128 can prevent head of line blocking for lighter queries depending on current system load.

The query execution service 128 detects bottlenecks/degradations/costs in CPU, I/O, and network performance greater than a predetermined threshold. For example, the query execution service 128 determines a first CPU cost when a query requires more than a predetermined number of CPU cycles for data processing/computing. Also, the query execution service 128 determines a first I/O cost when a query requires more than a predetermined number of reads from a disk (or writes to a disk). Moreover, the query execution service 128 determines a first network cost when more than a predetermined number of network hops are required to fetch data.
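The threshold checks described above can be sketched as follows. The `QueryProfile` and `Thresholds` classes, the field names, and the sample numbers are assumptions for illustration; the disclosure does not specify how a query profile is represented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueryProfile:
    cpu_cycles: int
    disk_reads: int
    network_hops: int


@dataclass(frozen=True)
class Thresholds:
    cpu_cycles: int
    disk_reads: int
    network_hops: int


def detect_bottlenecks(profile: QueryProfile, limits: Thresholds) -> List[str]:
    """Return the resources for which the query exceeds its threshold."""
    bottlenecks: List[str] = []
    if profile.cpu_cycles > limits.cpu_cycles:
        bottlenecks.append("cpu")
    if profile.disk_reads > limits.disk_reads:
        bottlenecks.append("io")
    if profile.network_hops > limits.network_hops:
        bottlenecks.append("network")
    return bottlenecks


limits = Thresholds(cpu_cycles=1_000_000, disk_reads=500, network_hops=3)
heavy_query = QueryProfile(cpu_cycles=5_000_000, disk_reads=120, network_hops=4)
found = detect_bottlenecks(heavy_query, limits)
```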

The query execution service 128 accepts user hints for certain queries that can work with relaxed guarantees (e.g., in terms of consistency). In some embodiments, the user may indicate that the user only accepts highly consistent data (e.g., data from a location on one or more of the disks 112). For example, when fetching a VM state/configuration, the user may request highly consistent (e.g., fresh/new) data in order to modify the system state. In some embodiments, the user may indicate that the user permits eventually consistent data (e.g., data from a location of the cache 130 coupled to the query execution service 128). For example, when generating a report in order to debug an issue (e.g., which users powered on in the last 7 days), the user may permit slightly stale data by reading from the cache 130. In some embodiments, at a first time, the query execution service 128 or one of the VMs 108 or the CVM 106 copies data from one or more of the disks 112 to the cache 130. In case the current system load crosses a threshold, the query execution service 128 may choose to return a cached result set.
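One possible policy that combines the user hint with the system load check is sketched below. The `Consistency` enum, the `choose_data_location` function, and the rule that the cache is used only when the hint permits it and the load exceeds the threshold are assumptions; other embodiments could read from the cache whenever the hint permits.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"      # user only accepts highly consistent data
    EVENTUAL = "eventual"  # user tolerates slightly stale data


def choose_data_location(
    hint: Consistency,
    system_load: float,
    load_threshold: float,
    cached: bool,
) -> str:
    """Return "disk" or "cache" for a read, under the assumed policy."""
    if hint is Consistency.STRONG:
        # Strong consistency is never served from the cache.
        return "disk"
    if cached and system_load > load_threshold:
        # Relaxed guarantee plus high load: return the cached result set.
        return "cache"
    return "disk"


# Fetching a VM state in order to modify it: strong consistency.
config_read = choose_data_location(Consistency.STRONG, 0.95, 0.80, cached=True)
# Generating a debugging report under load: stale data is acceptable.
report_read = choose_data_location(Consistency.EVENTUAL, 0.95, 0.80, cached=True)
```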

The cluster management service 122 includes an alert service 132 that receives or generates an alert. The alert can predict/report a disk failure or a bully VM (e.g., a VM that consumes resources and causes contention with other VMs for resources).

The cluster management service 122 includes an alert execution service 134 coupled to the UI 124 and coupled to the alert service 132. The alert execution service 134 can receive alerts from the alert service 132 or user-generated alerts from the UI 124. The alert execution service 134 supports prefetching (e.g., proactive/up-front caching). The alert execution service 134 looks at a current system state such as critical alerts and prefetches metric/state information for entities (e.g., VM, host, disk, network card, etc.) affected by the alert. The alert execution service 134 takes the entity relationships into consideration when determining entities affected by the alert. The alert execution service 134 caches data (e.g., configuration settings, web pages) that the alert execution service 134 predicts that the user will use. For example, in response to the alert service 132 and/or machine learning (ML) algorithms forecasting that a disk is going to fail and/or proactively raising an alert, the alert execution service 134 fetches information for the node containing the disk, VM/container metrics which could get affected by the failure, and the page where the user can order a new disk. At a first time, the predicted data is copied to the cache 136 to avoid a cold start. Then, at a second time, the alert service 132 reads the predicted data from the cache to improve the user experience. Another example is that the alert identifies an I/O bully VM, which can affect any VM on the cluster. The alert service 132 can copy data about all of the VMs of the cluster to avoid a cold start.
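As a rough illustration of taking entity relationships into consideration, the sketch below walks a relationship map outward from the alerted entity. The disclosure does not say how relationships are stored or how far they are followed, so the adjacency map, the transitive breadth-first walk, and the name `affected_entities` are all assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def affected_entities(start: str, related: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk over entity relationships from the alerted entity.

    `related` maps an entity to the entities directly related to it
    (a hypothetical representation chosen for this sketch).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        for neighbor in related.get(entity, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical topology: a failing disk sits in a node that hosts two VMs.
relationships = {"disk-1": ["node-1"], "node-1": ["vm-1", "vm-2"]}
to_prefetch = affected_entities("disk-1", relationships)
```

Here `to_prefetch` contains the disk, its node, and both VMs, which is the set whose metric/state information would be copied to the cache ahead of the user's request.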

Each of the components (e.g., elements, entities) of the virtual computing system 100 (e.g., the cluster management service 122, the UI 124, the query plan service 126, the query execution service 128, the alert service 132, and the alert execution service 134) is implemented using hardware, software, or a combination of hardware or software, in one or more embodiments. Each of the components of the virtual computing system 100 may be a processor with instructions or an apparatus/device including a processor with instructions, in some embodiments. In some embodiments, multiple components (e.g., the query plan service 126, the query execution service 128, and/or the alert service 132) may be part of a same apparatus and/or processor. Each of the components of the virtual computing system 100 can include any application, program, library, script, task, service, process or any type and form of executable instructions executed by one or more processors, in one or more embodiments. Each of the one or more processors is hardware, in some embodiments. The apparatus may include one or more computer readable and/or executable storage media including non-transitory storage media, or memory. The instructions may be stored on one or more of the storage or memory, and the instructions, when executed by the processor, can cause the apparatus to perform in accordance with the instructions.

Referring now to FIG. 3, a flowchart of an example method 300 is illustrated, in accordance with some embodiments of the present disclosure. The method 300 may be implemented using, or performed by, the virtual computing system 100, one or more components of the virtual computing system 100, or a processor associated with the virtual computing system 100 or the one or more components of the virtual computing system 100. Additional, fewer, or different operations may be performed in the method 300 depending on the embodiment.

The processor (e.g., the query plan service 126, the query execution service 128, or a combination thereof) receives a request to join a plurality of entity data structures using (e.g., in accordance with) a first join order having a first performance cost (at operation 310). For example, the processor receives a query that requests to (a) join entity data tables A and B, and then (b) join the combined entity data table A-B with an entity data table C. In some embodiments, the processor receives the request from a user interface (e.g., the UI 124) in response to inputs from a user.

The processor determines that the first join order has a first performance cost (at operation 320). In some embodiments, the first performance cost is a performance metric (e.g., one or more of a queue length, a first central processing unit (CPU) usage, a first input/output (I/O) usage, a first network usage, an I/O throughput, an I/O per second (IOPS), or a latency, e.g., from sending a request to a node to fetch an entity data structure or join two entity data structures to receiving a response to the request with the requested entity data structure). In some embodiments, the performance metric is of a node that the entity (corresponding to one of the entity data structures that is to be joined first) is operating within or coupled to. For example, the processor determines the CPU usage of a node that a VM listed in entity table A is on. In some embodiments, the first performance cost is compared to a predetermined threshold that indicates contention, e.g., in processing queries. In some embodiments, the performance cost is a statistical performance cost (e.g., a latency value/range and a likelihood of the latency value).

In some embodiments, the first performance cost is a count (e.g., size, number) of entities (e.g., entries) of one of the entity data structures that is to be joined first. In some embodiments, the performance cost is a number of entities (e.g., number of entity data structures) that are to be joined in total. For example, if entity table A includes (e.g., lists, is associated with) X entities, and entity table B and C each include Y entities, in which X>Y, joining A first results in joining X+Y+Y entities and filtering out X-Y entities after. However, joining either B or C first results in joining Y+Y+Y entities because X-Y entities of A can be filtered out before joining A. In some embodiments, the first performance cost is a combination of a performance metric and a count of entities. For example, 50% of the cost is determined by a performance metric, 50% by a count of entities, and one of the costs is scaled to the other.
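The counting argument above can be checked with a small sketch. It rests on a simplifying assumption that is not stated in the disclosure: every table joined after the first can be pre-filtered down to the current size of the intermediate result. The function names, the sizes X = 1000 and Y = 100, and the scaling factor are all hypothetical.

```python
from typing import Sequence


def count_cost(sizes_in_join_order: Sequence[int]) -> int:
    """Number of entities joined when tables are joined in the given order."""
    if not sizes_in_join_order:
        return 0
    running = sizes_in_join_order[0]   # the first table is joined in full
    total = running
    for size in sizes_in_join_order[1:]:
        contributed = min(size, running)  # later tables are filtered first
        total += contributed
        running = contributed
    return total


def combined_cost(metric: float, count: int,
                  entities_per_metric_unit: float,
                  metric_weight: float = 0.5) -> float:
    """Blend a performance metric with an entity count.

    The count is scaled into the metric's units before weighting.
    """
    scaled_count = count / entities_per_metric_unit
    return metric_weight * metric + (1.0 - metric_weight) * scaled_count


X, Y = 1000, 100
a_first = count_cost([X, Y, Y])   # X + Y + Y
b_first = count_cost([Y, Y, X])   # Y + Y + Y
```

With these sizes, joining A first touches 1200 entities while joining B (or C) first touches 300, matching the X+Y+Y versus Y+Y+Y comparison in the text.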

The processor determines whether a second join order has a second performance cost lower than the first performance cost (at operation 330). For example, the second join order may be joining entity data tables B and C first, and then joining the entity data table B-C and the entity table A. The second performance cost may be based on a performance metric of one or more entities or count of entities in one or more entity data structures, e.g., to be joined first using the second join order.

If the processor determines that the performance cost of the second join order is not lower than (e.g., exceeds) the performance cost of the first join order, the processor selects the first join order (at operation 340). If the processor determines that the performance cost of the second join order is lower than the performance cost of the first join order, the processor selects the second join order (at operation 350). The processor may determine that the performance cost is lower for the second join order because (a) the contention is likely to be reduced or go away before sending a request to the contentious node, during the time the other entity data structures are being joined, and/or (b) fewer resources are used because some entities are filtered out before the joins. In the example, the processor determines that the latency of requesting a join from node D using the second join order is less than 1 second, which is less than the latency of the first join order.
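The selection between the two join orders (operations 320 through 350) reduces to a single comparison, sketched below. The cost function is passed in as a parameter because the disclosure allows several kinds of cost. The name `select_join_order` and the latency figures are hypothetical.

```python
from typing import Callable, Tuple

JoinOrder = Tuple[str, ...]


def select_join_order(first: JoinOrder,
                      second: JoinOrder,
                      cost: Callable[[JoinOrder], float]) -> JoinOrder:
    """Keep the requested order unless the alternative is strictly cheaper."""
    first_cost = cost(first)      # operation 320
    second_cost = cost(second)    # operation 330
    if second_cost < first_cost:
        return second             # operation 350
    return first                  # operation 340 (ties keep the first order)


# Hypothetical estimated latencies, in seconds, for each candidate order.
estimated_latency = {("A", "B", "C"): 2.5, ("B", "C", "A"): 0.8}
chosen = select_join_order(("A", "B", "C"), ("B", "C", "A"),
                           estimated_latency.__getitem__)
```

With these made-up latencies, `chosen` is the second join order, which joins B and C before A.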

The processor joins the plurality of entity data structures using the selected join order (at operation 360). The processor sends the joined plurality of entity data structures (at operation 370). In some embodiments, the processor sends the joined plurality of entity data structures (e.g., joined tables) to the UI. In some embodiments, the joined tables are rendered on the UI. In some embodiments, the joined tables are used to troubleshoot an issue and perform a remediation action. For example, a user or a component of the virtual computing system 100 determines, based on the joined tables, that a disk is failing. In response to inputs from the user, the UI sends a request for a new disk to replace the failing disk. Advantageously, by using the second join order, the performance cost (e.g., a number of entities joined, the latency, and/or the resource usage) is decreased, thereby improving the user experience, e.g., in obtaining the joined tables and in performing the remediation action.

In some embodiments, the first entity data structure includes a first entity (e.g., VM1) and a second entity (e.g., VM2), and the first entity and the second entity are in different geographic locations of a distributed, hyper-converged data center. In some embodiments, the first entity data structure includes a first entity of a first entity type (e.g., VMs), the second entity data structure is a second entity of a second entity type (e.g., clusters), and the third entity data structure is a third entity of a third entity type (e.g., categories).

Referring now to FIG. 4, a flowchart of an example method 400 is illustrated, in accordance with some embodiments of the present disclosure. The method 400 may be implemented using, or performed by, the virtual computing system 100, one or more components of the virtual computing system 100, or a processor associated with the virtual computing system 100 or the one or more components of the virtual computing system 100. Additional, fewer, or different operations may be performed in the method 400 depending on the embodiment. One or more operations of the method 400 may be combined with one or more operations of the method 300.

At a first time, a processor (e.g., the query execution service 128) copies first data having a first level of data consistency (e.g., strongly consistent, higher consistency guarantee) from a first data location (e.g., one of the one or more disks 112) to a second data location (e.g., the cache 130) (at operation 410). In some embodiments, the first level of data consistency is greater than a first consistency threshold. The level of consistency may be described as a probability that the data is new/fresh/the latest/not stale. For example, the probability that the first data is new data is greater than a first threshold. The second data location may have a second level of data consistency (e.g., eventually consistent, lower consistency guarantee). For example, the second level of data consistency is less than a second consistency threshold.

At a second time, the processor receives a request for the data (at operation 420). The request may be from a UI (e.g., the UI 124). The second time is after the first time. The processor determines whether accessing the first data incurs a performance cost greater than a predetermined threshold (at operation 430). For example, the performance cost is a latency or a CPU usage greater than the predetermined threshold. The performance cost being greater than the predetermined threshold may indicate that there is contention at the first data location.

If the processor determines that the performance cost is greater than the predetermined threshold, the processor determines whether a user permits, e.g., instructions, policies, or patterns (e.g., detected by heuristics or machine learning) indicate that a user permits, accessing the copied data (at operation 440). The user is using, or associated with, the UI. The copied data has a second level of data consistency lower than the first level of data consistency. For example, the probability that the copied data is new is lower than the probability that the first data is new. If the processor determines that the user does not permit accessing the copied data, then the processor accesses the first data (at operation 450). If the processor determines that (a) the performance cost is not greater than (e.g., less than) the predetermined threshold or (b) the user permits accessing the copied data, then the processor accesses the copied data (at operation 460). The processor sends the accessed data (at operation 470). The processor may send the data to the UI. Advantageously, by accepting hints from the user, the method 400 adapts to the particular user preference and/or use case. Additionally, by accessing the copied data, the performance cost (e.g., the latency) is decreased, thereby improving the user experience.

Referring now to FIG. 5, a flowchart of an example method 500 is illustrated, in accordance with some embodiments of the present disclosure. The method 500 may be implemented using, or performed by, the virtual computing system 100, one or more components of the virtual computing system 100, or a processor associated with the virtual computing system 100 or the one or more components of the virtual computing system 100. Additional, fewer, or different operations may be performed in the method 500 depending on the embodiment. One or more operations of the method 500 may be combined with one or more operations of one or more of the methods 300 and 400.

The processor (e.g., the alert execution service) receives an indication (e.g., prediction) of an alert (at operation 510), e.g., that a component of a hyper-converged system is failing or is performing sub-optimally. In some embodiments, the processor determines/identifies the indication of the alert based on heuristics or machine learning of data (e.g., historical data). The processor determines data associated with/relevant to an entity to which the alert would be directed (at operation 520). In some embodiments, because the data is associated with/relevant to the entity, there is some probability that a user or component will request to access the data if the alert is raised. At a first time, the processor copies the data from a first data location (e.g., one of the one or more disks 112) to a second data location (e.g., the cache 130) (at operation 530). At a second time, the processor receives the alert or a request for data of the entity (at operation 540). The alert may be from another component of the virtual computing system 100 such as the alert service 132. The request may be from the UI (e.g., the UI 124). In response to the alert or request, the processor may access the data from the second data location (at operation 550). The processor sends the accessed data (at operation 560). The processor may send the data to the UI. Advantageously, by accessing the data from the second data location, the latency is decreased, thereby improving the user experience.
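The two-phase flow of the method 500 (copy at a first time, serve at a second time) might look like the sketch below. The class name `AlertPrefetcher`, the string-keyed cache, and the fallback to the first data location on a cache miss are assumptions made for illustration. The disclosure does not describe miss handling.

```python
from typing import Callable, Dict, Iterable


class AlertPrefetcher:
    """Copies data for a predicted alert into a cache, then serves from it."""

    def __init__(self, read_from_disk: Callable[[str], str]) -> None:
        self._read_from_disk = read_from_disk   # first data location
        self._cache: Dict[str, str] = {}        # second data location
        self.disk_reads = 0                     # counter kept for illustration

    def _load(self, key: str) -> str:
        self.disk_reads += 1
        return self._read_from_disk(key)

    def on_alert_predicted(self, keys: Iterable[str]) -> None:
        """Operations 510-530: copy data relevant to the entity to the cache."""
        for key in keys:
            self._cache[key] = self._load(key)

    def on_request(self, key: str) -> str:
        """Operations 540-560: access the data and return it to the caller."""
        if key in self._cache:
            return self._cache[key]
        # Assumption: fall back to the first data location on a cache miss.
        return self._load(key)


prefetcher = AlertPrefetcher(lambda key: f"data:{key}")
prefetcher.on_alert_predicted(["node-1", "vm-1"])
warm = prefetcher.on_request("vm-1")   # served from the cache, no disk read
```

After the prefetch, the request for `vm-1` is answered without touching the disk, which is the latency benefit the paragraph describes.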

The herein described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable,” to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to disclosures containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should typically be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should typically be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, typically means at least two recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.” Further, unless otherwise noted, the use of the words “approximate,” “about,” “around,” “substantially,” etc., mean plus or minus ten percent.

The foregoing description of illustrative embodiments has been presented for purposes of illustration and of description. It is not intended to be exhaustive or limiting with respect to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosed embodiments. It is intended that the scope of the disclosure be defined by the claims appended hereto and their equivalents.

INVENTORS:

Liu, Cong; Shukla, Himanshu; Nagpal, Abhinay; Kumar, Sourav

ASSIGNMENT RECORDS

Executed on	Assignor	Assignee	Conveyance	Reel	Frame
Jan 14 2021	NAGPAL, ABHINAY	Nutanix, Inc	ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS)	055213	0089
Jan 14 2021	KUMAR, SOURAV	Nutanix, Inc	ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS)	055213	0089
Jan 15 2021	SHUKLA, HIMANSHU	Nutanix, Inc	ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS)	055213	0089
Feb 09 2021	LIU, CONG	Nutanix, Inc	ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS)	055213	0089
Feb 10 2021		Nutanix, Inc.	(assignment on the face of the patent)
Feb 12 2025	Nutanix, Inc	BANK OF AMERICA, N.A., AS COLLATERAL AGENT	SECURITY INTEREST (SEE DOCUMENT FOR DETAILS)	070206	0463

MAINTENANCE FEES AND DATES:

Date	Maintenance Fee Events
Feb 10 2021	BIG: Entity status set to Undiscounted (note the period is included in the code).

Date	Maintenance Schedule
Feb 13 2027	4 years fee payment window open
Aug 13 2027	6 months grace period start (w surcharge)
Feb 13 2028	patent expiry (for year 4)
Feb 13 2030	2 years to revive unintentionally abandoned end. (for year 4)
Feb 13 2031	8 years fee payment window open
Aug 13 2031	6 months grace period start (w surcharge)
Feb 13 2032	patent expiry (for year 8)
Feb 13 2034	2 years to revive unintentionally abandoned end. (for year 8)
Feb 13 2035	12 years fee payment window open
Aug 13 2035	6 months grace period start (w surcharge)
Feb 13 2036	patent expiry (for year 12)
Feb 13 2038	2 years to revive unintentionally abandoned end. (for year 12)