A method, computer program product and apparatus provide an improved data structure for storing key-value pairs. The data structure comprises six arrays. The method, computer program product and apparatus provide for efficient searching, adding, removal, and iteration of elements. The data structure utilizes a scaled hash code and may store multiple values associated with the same scaled hash code. The required memory is allocated at the time of instantiation, resulting in improved performance. An insertion time of a new key-value pair is a linear function of the total number of key-value pairs.
15. A method comprising:
receiving an indication of a key for which to identify a corresponding value, wherein the key is an element associated with a key index of a first array;
with a processor, calculating a scaled hash code of the key, wherein the scaled hash code is calculated based on at least a maximum number of key value pairs to be stored;
accessing a second array to identify a third array start index based on the calculated scaled hash code;
traversing a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index;
accessing a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array;
traversing a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array; and
causing the corresponding value to be returned in response to a request for the value,
wherein the arrays are of primitive types and are allocated upon instantiation.
20. An apparatus comprising:
means for receiving an indication of a key for which to identify a corresponding value, wherein the key is an element associated with a key index of a first array;
means for calculating a scaled hash code of the key, wherein the scaled hash code is calculated based on at least a maximum number of key value pairs to be stored;
means for accessing a second array to identify a third array start index based on the calculated scaled hash code;
means for traversing a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index;
means for accessing a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array;
means for traversing a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array; and
means for causing the corresponding value to be returned in response to a request for the value,
wherein the arrays are of primitive types and are allocated upon instantiation.
8. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the processor, cause the apparatus to at least:
receive an indication of a key for which to identify a corresponding value, wherein the key is an element associated with a key index of a first array;
calculate a scaled hash code of the key, wherein the scaled hash code is calculated based on at least a maximum number of key value pairs to be stored;
access a second array to identify a third array start index based on the calculated scaled hash code;
traverse a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index;
access a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array;
traverse a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array; and
cause the corresponding value to be returned in response to a request for the value,
wherein the arrays are of primitive types and are allocated upon instantiation.
1. A computer program product comprising at least one non-transitory computer-readable storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions comprising program code instructions to:
receive an indication of a key for which to identify a corresponding value, wherein the key is an element associated with a key index of a first array;
calculate a scaled hash code of the key, wherein the scaled hash code is calculated based on at least a maximum number of key value pairs to be stored;
access a second array to identify a third array start index based on the calculated scaled hash code;
traverse a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index;
access a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array;
traverse a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array; and
cause the corresponding value to be returned in response to a request for the value,
wherein the arrays are of primitive types and are allocated upon instantiation.
2. The computer program product according to
3. The computer program product according to
4. The computer program product according to
5. The computer program product according to
receive an indication of a new key-value pair to be added;
add a new value to the fifth array in a position based on a freeValuePosition pointer; and
update any of the first array, second array, third array, fourth array, or sixth array accordingly.
6. The computer program product according to
receive an indication of a specified key for which the specified key and associated values of the specified key are to be removed; and
update any of the first array, second array, third array, fourth array, fifth array or sixth array such that a search for the specified key returns no results.
7. The computer program product according to
9. The apparatus according to
10. The apparatus according to
11. The apparatus according to
12. The apparatus according to
receive an indication of a new key-value pair to be added;
add a new value to the fifth array in a position based on a freeValuePosition pointer; and
update any of the first array, second array, third array, fourth array, or sixth array accordingly.
13. The apparatus according to
receive an indication of a specified key for which the specified key and associated values of the specified key are to be removed; and
update any of the first array, second array, third array, fourth array, fifth array or sixth array such that a search for the specified key returns no results.
14. The apparatus according to
16. The method according to
17. The method according to
18. The method according to
19. The method according to
receiving an indication of a new key-value pair to be added;
adding a new value to the fifth array in a position based on a freeValuePosition pointer; and
updating any of the first array, second array, third array, fourth array, or sixth array accordingly.
An example embodiment of the present invention relates generally to data structures, and more particularly, to a method, apparatus and computer program product for providing an improved data structure for storing key-value pairs.
The development of modern computing technology has led to vast amounts of stored data. In some examples, modern systems require maintaining hundreds of millions, or even more, data values. Retrieval of the desired data may require searching for a desired key among the data and may require a large volume of operations to determine the associated value. Typical operations include searching among the data values and determining associations between keys and related values. In this regard, a key-value pair may include a key by which a requesting service or method may request an associated value. Depending on the data structure, the data is searched and processed such that an associated value is returned to the requesting service or method.
Java Multimap is an example data structure used to maintain large numbers of key-value pairs. The Multimap maps each key $k_i$ from the set $(k_1, \ldots, k_n)$ to a sequence of values $[v_1$
Furthermore, many implementations of data structures are based on mappings of keys to dynamic arrays of values, requiring dynamic allocations of objects during the addition and removal of elements. The final sizes of sets of values associated with keys in such data structures may be unknown at the time of creation or instantiation, which may cause further performance degradation with respect to reallocations, particularly when more values are added.
In Java, memory allocated by a Multimap is managed by a garbage collector (GC), and when the number of allocations becomes large, the GC must track many objects. For example, each time an element is added or removed, numerous objects may be instantiated and must then be tracked by the GC, resulting in performance degradation. Furthermore, memory fragmentation may also cause reallocations of objects within the heap, which may add even more time and usage of computing resources for the completion of operations.
A method, apparatus, and computer program product are therefore provided for an improved data structure for storing key-value pairs. The data structure described herein comprises at least six arrays and is referred to herein as a six-array multimap.
A computer program product is provided. The computer program product includes at least one non-transitory computer-readable storage medium having computer-executable program code instructions stored therein, with the computer-executable program code instructions comprising program code instructions to receive an indication of a key for which to identify a corresponding value. The key is an element associated with a key index of a first array. The program code instructions calculate a scaled hash code of the key and access a second array to identify a third array start index based on the calculated scaled hash code. The program code instructions traverse a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index. The program code instructions further access a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array, and traverse a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array.
In some examples, each key index of the plurality of traversed key indices in the third array has an associated key having the same scaled hash code as the calculated scaled hash code, and the traversal continues until the key accessed in the first array at the traversed key index equals the provided key.
In some examples, each value index of the plurality of traversed value indices of the sixth array has an associated value in the fifth array that is associated with the key.
In some embodiments, a number of elements in the first array is one greater than a number of unique keys, and a number of elements in the fifth array is one greater than the number of unique values.
In some embodiments, the computer-executable program code instructions further comprise program code instructions to receive an indication of a new key-value pair to be added and add a new value to the fifth array in a position based on a freeValuePosition pointer. The computer program code updates any of the first array, second array, third array, fourth array, or sixth array accordingly.
In some embodiments, the computer-executable program code instructions further comprise program code instructions to receive an indication of a specified key for which the specified key and associated values of the specified key are to be removed, and update any of the first array, second array, third array, fourth array, fifth array or sixth array such that a search for the specified key returns no results.
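The removal procedure is not detailed in this section. One possible approach is shown in the hypothetical Java sketch below, which assumes long keys, int indices, and the convention that index 0 ends a chain. It unlinks the key's index from its chain in the third array (nextKeyIndex) and clears the key's entry in the fourth array (lastValueIndex), so a subsequent search returns no results. The class and method names are illustrative, and reclaiming the freed slots in the first, fifth, and sixth arrays is not shown.

```java
/** Hypothetical removal sketch for a six-array multimap with long keys. */
final class KeyRemoval {
    /**
     * Unlinks {@code key} from the chain that starts at startKeyIndex[bucket]
     * (bucket is the key's scaled hash code). Returns true if the key was found.
     * Freed slots are not reclaimed in this sketch.
     */
    static boolean removeKey(long key, int bucket, long[] keys, int[] startKeyIndex,
                             int[] nextKeyIndex, int[] lastValueIndex) {
        int prev = 0; // 0 means "no predecessor": the current index is the chain head
        for (int i = startKeyIndex[bucket]; i != 0; prev = i, i = nextKeyIndex[i]) {
            if (keys[i] == key) {
                if (prev == 0) {
                    startKeyIndex[bucket] = nextKeyIndex[i]; // key was the chain head
                } else {
                    nextKeyIndex[prev] = nextKeyIndex[i];    // bypass the removed key
                }
                nextKeyIndex[i] = 0;
                lastValueIndex[i] = 0; // the key index no longer references any values
                return true;
            }
        }
        return false;
    }
}
```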
In some examples, an insertion time of a new key-value pair is a linear function of a total number of key-value pairs.
An apparatus is also provided. The apparatus includes at least one processor and at least one memory including computer program code, with the at least one memory and the computer program code configured to, with the processor, cause the apparatus to at least receive an indication of a key for which to identify a corresponding value, wherein the key is an element associated with a key index of a first array. The apparatus is further caused to calculate a scaled hash code of the key and access a second array to identify a third array start index based on the calculated scaled hash code. The apparatus is caused to traverse a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index, access a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array, and traverse a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array.
In some examples, the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to at least receive an indication of a new key-value pair to be added, and add a new value to the fifth array in a position based on a freeValuePosition pointer. The apparatus is further caused to update any of the first array, second array, third array, fourth array, or sixth array accordingly.
In some examples, the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to at least receive an indication of a key-value pair to be removed, remove at least one value from the fifth array, and update any of the first array, second array, third array, fourth array, or sixth array accordingly.
A method is also provided, including receiving an indication of a key for which to identify a corresponding value. The key is an element associated with a key index of a first array. The method further includes, with a processor, calculating a scaled hash code of the key, and accessing a second array to identify a third array start index based on the calculated scaled hash code. The method further includes traversing a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index, and accessing a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array. The method additionally includes traversing a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array.
In some examples, the method includes receiving an indication of a new key-value pair to be added, adding a new value to the fifth array in a position based on a freeValuePosition pointer, and updating any of the first array, second array, third array, fourth array, or sixth array accordingly.
In some embodiments, the method further includes receiving an indication of a key-value pair to be removed, removing at least one value from the fifth array, and updating any of the first array, second array, third array, fourth array, or sixth array accordingly.
An apparatus is provided, including means for receiving an indication of a key for which to identify a corresponding value. The key is an element associated with a key index of a first array. The apparatus further includes means for calculating a scaled hash code of the key, and means for accessing a second array to identify a third array start index based on the calculated scaled hash code. The apparatus further includes means for traversing a plurality of key indices in the third array to determine the key index with which the key is associated, wherein the traversal begins from the identified third array start index, and means for accessing a fourth array based on the determined key index to identify a last value index of a list of values in a fifth array. The apparatus additionally includes means for traversing a plurality of value indices in a sixth array, beginning at the identified last value index, to determine an actual value index by which to access the corresponding value in the fifth array.
Having thus described certain example embodiments of the present invention in general terms, reference will hereinafter be made to the accompanying drawings which are not necessarily drawn to scale, and wherein:
Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms “data,” “content,” “information,” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
Additionally, as used herein, the term ‘circuitry’ refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
As defined herein, a “computer-readable storage medium,” which refers to a physical storage medium (e.g., volatile or non-volatile memory device), may be differentiated from a “computer-readable transmission medium,” which refers to an electromagnetic signal.
A method, apparatus and computer program product are provided in accordance with an example embodiment of the present invention for providing an improved data structure, namely a six-array multimap. In this regard, a service, method, or computer program product requesting a specified value in the six-array multimap provides a key (for example, as a parameter in a retrieval method) to access the desired associated value. The six-array multimap may provide performance improvements in comparison to other data structures and methods for storing key-value pairs.
Several observations inform the implementation of the six-array multimap. First, in many examples, the set of all keys is unknown until all pairs have been added to the data structure. Associations $(\mathrm{key}_i \rightarrow [v_1, v_2, \ldots, v_n$
In general, the six-array multimap avoids allocating new objects during operations such as add, remove, search, and iterate. The six-array multimap may be implemented in Java, for example. The six arrays may be arrays of primitive types, such as long, and may be allocated when the six-array multimap is instantiated. Subsequent operations may then be performed on the six arrays without any new allocations and without causing memory fragmentation. The use of primitive data types improves performance because populating an array of non-primitive elements requires a dynamic allocation for each element, which degrades performance. While referred to throughout as a six-array multimap, it will be appreciated that in some examples, embodiments may include additional arrays or data structures.
The apparatus may include, be associated with or otherwise be in communication with a processor 24 and a memory 26. In some examples, a user interface 28 and/or a communications interface 30 may optionally be included. In some embodiments, the processor (and/or co-processors or any other processing circuitry assisting or otherwise associated with the processor) may be in communication with the memory device via a bus for passing information among components of the apparatus. The memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories. In other words, for example, the memory device may be an electronic storage device (for example, a computer readable storage medium) comprising gates configured to store data (for example, bits) that may be retrievable by a machine (for example, a computing device like the processor). The memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present invention. For example, the memory device could be configured to buffer input data for processing by the processor. Additionally or alternatively, the memory device could be configured to store the six arrays, as well as instructions for execution by the processor. In this regard, processor 24 may execute instructions stored in memory 26 to retrieve, add, and delete values in the six-array multimap.
As noted above, the apparatus 20 may be embodied by a computing device. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (for example, chips) including materials, components and/or wires on a structural assembly (for example, a circuit board). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
The processor 24 may be embodied in a number of different ways. For example, the processor may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. As such, in some embodiments, the processor may include one or more processing cores configured to perform independently. A multi-core processor may enable multiprocessing within a single physical package. Additionally or alternatively, the processor may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
In an example embodiment, the processor 24 may be configured to execute instructions stored in the memory 26 or otherwise accessible to the processor. Alternatively or additionally, the processor may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor may represent an entity (for example, physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor is embodied as an ASIC, FPGA or the like, the processor may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor is embodied as an executor of software instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor may be a processor of a specific device (for example, the computing device) configured to employ an embodiment of the present invention by further configuration of the processor by instructions for performing the algorithms and/or operations described herein. The processor may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor.
The apparatus 20 of an example embodiment may also include or otherwise be in communication with a user interface 28. The user interface may include a touch screen display, a keyboard, a mouse, a joystick or other input/output mechanisms. In some embodiments, the user interface, such as a display, speakers, or the like, may also be configured to provide output to the user. In this example embodiment, the processor 24 may comprise user interface circuitry configured to control at least some functions of one or more input/output mechanisms. The processor and/or user interface circuitry comprising the processor may be configured to control one or more functions of one or more input/output mechanisms through computer program instructions (for example, software and/or firmware) stored on a memory accessible to the processor (for example, memory 26, and/or the like).
The apparatus 20 of an example embodiment may also optionally include a communications interface 30 that may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data from/to other electronic devices in communication with the apparatus. In this regard, the communications interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communications interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the communications interface may alternatively or also support wired communication. For example, the communications interface 30 may be configured to provide a value based on a lookup index to a requesting service or system.
Referring now to
In general, the first array, keys, is an array of keys. The second array, startKeyIndex, is an array of start indices of lists of key indices of keys having the same scaled hash code. Such lists are provided in the third array, nextKeyIndex. In this regard, the third array may be considered a set of linked lists: each element, addressed by a key index, holds the key index of the next key having the same scaled hash code, or 0 at the end of a list. The elements of the third array are thus linked in a manner that enables traversal of the list of key indices whose keys share the same scaled hash code.
The fourth array, lastValueIndex, holds, for each key index, the index of the last value in that key's list of values in the fifth array. The fifth array, values, comprises the values requested by calling services, methods, and/or the like based on an associated key. The sixth array, prevValueIndex, may likewise be considered a set of linked lists: each element, addressed by a value index, holds the index of the previous value for the same key, or 0 if it is the first value for that key. The sixth array may therefore be traversed to access the list of value indices of values having the same associated key. Use of the six-array multimap for storage and retrieval of key-value pairs is described in detail hereinafter.
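As an illustration of the layout just described, the following minimal Java sketch declares the six arrays and allocates them once, at instantiation. It assumes long keys, long values, and int indices (4 bytes, matching the index size assumed in the memory estimate of this description), and it reserves index 0 as the "none" sentinel, which is why the key and value arrays have one extra slot. The class name SixArrayLayout is hypothetical; the array sizes follow Table 4.

```java
/**
 * Hypothetical storage layout for a six-array multimap with long keys and values.
 * All arrays are allocated once, in the constructor. Index 0 is reserved as the
 * "none" sentinel, so the key and value arrays carry one extra slot.
 */
final class SixArrayLayout {
    final long[] keys;           // first array: key by key index (slot 0 unused)
    final int[] startKeyIndex;   // second array: scaled hash code -> first key index of its chain
    final int[] nextKeyIndex;    // third array: key index -> next key index with the same scaled hash
    final int[] lastValueIndex;  // fourth array: key index -> index of the key's most recent value
    final long[] values;         // fifth array: value by value index (slot 0 unused)
    final int[] prevValueIndex;  // sixth array: value index -> previous value index for the same key

    SixArrayLayout(int maxKeys, int maxValues) {
        // Assumes 1 <= maxKeys < 2^30 so the power-of-two length fits in an int.
        if (maxKeys < 1 || maxKeys >= (1 << 30) || maxValues < 0) {
            throw new IllegalArgumentException("capacity out of range");
        }
        // Smallest power of two strictly greater than maxKeys (Table 4, row 4.2).
        int buckets = Integer.highestOneBit(maxKeys) << 1;
        keys = new long[maxKeys + 1];
        startKeyIndex = new int[buckets];
        nextKeyIndex = new int[maxKeys + 1];
        lastValueIndex = new int[maxKeys + 1];
        values = new long[maxValues + 1];
        prevValueIndex = new int[maxValues + 1];
    }
}
```

Because Java zero-initializes new int arrays, every chain and value list starts out empty without any extra setup.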
Having now briefly described the six arrays,
As shown by operation 400, apparatus 20 may include means, such as processor 24, communications interface 30, and/or user interface 28, for receiving an indication of a key for which to identify a corresponding value. In this regard, a requesting service or method may request a corresponding value based on the key, which may be provided, for example, as a parameter in a request. As another example, when apparatus 20 is implemented as a user device, a user may enter a key via user interface 28 to access the corresponding value. The key is an element associated with a key index of the first array. However, at the time of the request, the key index is unknown to the processor 24 and must be determined in order to retrieve the associated value. As described below, operations 410, 420, and 430 use the first, second, and third arrays to determine the key index of the provided key.
As shown by operation 410, the apparatus 20 may include means, such as processor 24 and/or memory 26, for calculating a scaled hash code associated with the key.
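This section does not specify the scaling function. One common approach, assumed here purely for illustration, is to mask a well-mixed hash with 2^k - 1, where 2^k is the length of the second array from Table 4 (a power of two derived from the maximum number of keys). The class and method names below are hypothetical; the bit-spreading step mirrors the one used by java.util.HashMap.

```java
/** Hypothetical scaled-hash helper, assuming long keys and a power-of-two second array. */
final class ScaledHash {
    /** Second-array length: smallest power of two strictly greater than maxKeys. */
    static int bucketCount(int maxKeys) {
        if (maxKeys < 1 || maxKeys >= (1 << 30)) {
            throw new IllegalArgumentException("maxKeys out of range");
        }
        return Integer.highestOneBit(maxKeys) << 1;
    }

    /** Maps a key to an index in [0, buckets); buckets must be a power of two. */
    static int scaledHash(long key, int buckets) {
        int h = Long.hashCode(key);   // standard hash of a primitive long
        h ^= (h >>> 16);              // fold high bits into low bits, as java.util.HashMap does
        return h & (buckets - 1);     // keep the low k bits; never negative
    }
}
```

Masking instead of taking a remainder is one reason a power-of-two second array is convenient: the result is always a valid, non-negative index.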
In the example of
As shown in operation 420 of
As shown in operation 430 of
The following example algorithm or pseudo code provided in Table 1 illustrates the traversal operation 430 with respect to
TABLE 1
Traversal of key indices
Operation | Pseudocode | Notes
1.1 | Repeat 1.2-1.4 until true or false is returned |
1.2 | If keyIndex == 0 then return false | No value will be found; end
1.3 | If keys[keyIndex] equals K then return true | The keyIndex is identified; end
1.4 | Else set keyIndex = nextKeyIndex[keyIndex] | Continue traversal
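The traversal of Table 1 can be sketched in Java as follows, assuming long keys and the Table 1 convention that key index 0 ends a chain. The method name findKeyIndex and the choice to return the found index (or 0) instead of true/false are illustrative assumptions; the starting index is read from startKeyIndex at the scaled hash code, as in operation 420.

```java
/** Hypothetical Java rendering of Table 1 for long keys. */
final class KeyLookup {
    /**
     * Walks the chain of key indices that share one scaled hash code.
     * Returns the key index of {@code key}, or 0 if the key is not stored.
     */
    static int findKeyIndex(long key, int scaledHash,
                            long[] keys, int[] startKeyIndex, int[] nextKeyIndex) {
        int keyIndex = startKeyIndex[scaledHash]; // operation 420: start of the chain
        while (keyIndex != 0) {                   // 1.2: index 0 terminates the chain
            if (keys[keyIndex] == key) {          // 1.3: the key index is identified
                return keyIndex;
            }
            keyIndex = nextKeyIndex[keyIndex];    // 1.4: continue traversal
        }
        return 0;                                 // 1.2: no value will be found
    }
}
```

For example, if keys 10, 18, and 26 occupy key indices 1, 2, and 3 and share a scaled hash code whose chain is 3, 2, 1, a lookup of 10 inspects indices 3 and 2 before returning 1.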
In
Continuing to operation 440 of
Continuing to operation 450 of
In general, the traversal of value indices is performed using three functions provided in Table 2:
TABLE 2
Subroutines of traversal of value indices
Operation | Function | Description
2.1 | getLastValueIndex(K) | Returns the last value index, or 0 if K is not among the keys
2.2 | getValueByIndex(index) | Returns the value at the specified index
2.3 | getPrevValueIndex(index) | Returns the index of the previous value, or 0 if it is the first value for key K
In general, Table 3 provides an example of how the value indices are traversed:
TABLE 3
Traversal of value indices
Operation | Pseudocode
3.1 | while (index != 0) {
3.2 |     value = getValueByIndex(index);
3.3 |     index = getPrevValueIndex(index); }
(Before operation 3.1, index is initialized to getLastValueIndex(K).)
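Tables 2 and 3 together can be sketched in Java as follows, assuming the key index has already been found (for example, by the traversal of Table 1) and that values are of the primitive type long. The method name forEachValue and the use of java.util.function.LongConsumer as a callback are illustrative choices, made so that iteration does not box each value.

```java
import java.util.function.LongConsumer;

/** Hypothetical Java rendering of Tables 2 and 3 for long values. */
final class ValueTraversal {
    /** Visits every value stored for the key at keyIndex, most recent value first. */
    static void forEachValue(int keyIndex, int[] lastValueIndex, long[] values,
                             int[] prevValueIndex, LongConsumer action) {
        int index = lastValueIndex[keyIndex]; // 2.1 getLastValueIndex: 0 means no values
        while (index != 0) {                  // 3.1
            action.accept(values[index]);     // 3.2 getValueByIndex
            index = prevValueIndex[index];    // 3.3 getPrevValueIndex
        }
    }
}
```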
Note that the values for each key are linked in last-in, first-out order and are therefore traversed in reverse order of insertion. This improves insertion efficiency: when a new value is inserted into the six-array multimap, it becomes the new head of its key's list, so it may be inserted without iterating over the key's other values.
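To illustrate head insertion, the following hypothetical Java sketch combines the six arrays into one class with long keys and long values. The freeValuePosition counter follows the claims; freeKeyPosition, the bit-mask scaling function, and the method names put and forEachValue are assumptions. Removal and capacity checks are omitted, so exceeding the preallocated capacity throws ArrayIndexOutOfBoundsException.

```java
import java.util.function.LongConsumer;

/** Hypothetical minimal six-array multimap; capacity is fixed at construction. */
final class SixArrayMultimapSketch {
    private final long[] keys;
    private final int[] startKeyIndex;
    private final int[] nextKeyIndex;
    private final int[] lastValueIndex;
    private final long[] values;
    private final int[] prevValueIndex;
    private int freeKeyPosition = 1;   // next unused key index (0 is the sentinel)
    private int freeValuePosition = 1; // next unused value index (0 is the sentinel)

    SixArrayMultimapSketch(int maxKeys, int maxValues) { // assumes 1 <= maxKeys < 2^30
        keys = new long[maxKeys + 1];
        startKeyIndex = new int[Integer.highestOneBit(maxKeys) << 1];
        nextKeyIndex = new int[maxKeys + 1];
        lastValueIndex = new int[maxKeys + 1];
        values = new long[maxValues + 1];
        prevValueIndex = new int[maxValues + 1];
    }

    private int scaledHash(long key) {
        int h = Long.hashCode(key);
        return (h ^ (h >>> 16)) & (startKeyIndex.length - 1);
    }

    private int findKeyIndex(long key, int bucket) {
        for (int i = startKeyIndex[bucket]; i != 0; i = nextKeyIndex[i]) {
            if (keys[i] == key) return i;
        }
        return 0;
    }

    /** Adds a pair without allocating objects; the value list is never iterated. */
    void put(long key, long value) {
        int bucket = scaledHash(key);
        int keyIndex = findKeyIndex(key, bucket);
        if (keyIndex == 0) {                     // new key: prepend it to its bucket chain
            keyIndex = freeKeyPosition++;
            keys[keyIndex] = key;
            nextKeyIndex[keyIndex] = startKeyIndex[bucket];
            startKeyIndex[bucket] = keyIndex;
        }
        int valueIndex = freeValuePosition++;    // store the value in the next free slot
        values[valueIndex] = value;
        prevValueIndex[valueIndex] = lastValueIndex[keyIndex]; // link to the previous head
        lastValueIndex[keyIndex] = valueIndex;   // the new value becomes the list head
    }

    /** Visits the values of key, most recent first; does nothing if key is absent. */
    void forEachValue(long key, LongConsumer action) {
        int keyIndex = findKeyIndex(key, scaledHash(key)); // 0 if absent; lastValueIndex[0] stays 0
        for (int i = lastValueIndex[keyIndex]; i != 0; i = prevValueIndex[i]) {
            action.accept(values[i]);
        }
    }
}
```

For example, after put(7, 1), put(9, 2), and put(7, 3), forEachValue(7, ...) visits 3 and then 1.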
As operation 450 and/or the above pseudo code applies to the example in
The above description with respect to
Assuming that N keys must be stored, mapped to sets of values totaling M values, the arrays should have the minimum sizes provided in Table 4:
TABLE 4
Array sizes
No. | Array | Description | Size |
4.1 | First array | array of keys (keys) | N + 1 elements |
4.2 | Second array | array of start indices of lists of keys (startKeyIndex) | $2^k$ elements, where $k = \min\{i : 2^i > N\}$ |
4.3 | Third array | array of indices of next key (nextKeyIndex) | N + 1 elements |
4.4 | Fourth array | array of last indices of lists of values (lastValueIndex) | N + 1 elements |
4.5 | Fifth array | array of values (values) | M + 1 elements |
4.6 | Sixth array | array of indices of previous value (prevValueIndex) | M + 1 elements |
Note that because k is the smallest integer with $2^k > N$, it follows that $2^{k-1} \le N$ and hence $2^k \le 2N$. The total amount of required memory is therefore linear in N, even though the length of the second array must be an integer power of 2 (an exponential expression). In some embodiments, the second array may be larger than 2N to reduce hash code collisions (i.e., cases in which two distinct keys have the same scaled hash code and are chained through the third array, the nextKeyIndex array). Keeping the size of the second array at or below 2N, however, minimizes the memory allocated to it. Given the above minimum sizes, the total number of elements in the six-array multimap is $3(N+1) + 2(M+1) + 2^k \le 5N + 2M + 5$, which is O(M + N) in Big O notation.
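To make the Table 4 sizes concrete, the following hypothetical Java helper computes the second-array length $2^k$ and the total element count, which can be checked against the $5N + 2M + 5$ bound. It uses Integer.highestOneBit to obtain the smallest power of two strictly greater than N and, as an assumption of the sketch, restricts N to 1 ≤ N < 2^30 so the length fits in an int.

```java
/** Hypothetical helper computing the Table 4 minimum sizes for N keys and M values. */
final class ArraySizes {
    /** Second-array length 2^k with k = min{i : 2^i > n}; valid for 1 <= n < 2^30. */
    static int secondArrayLength(int n) {
        if (n < 1 || n >= (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        return Integer.highestOneBit(n) << 1;   // smallest power of two strictly greater than n
    }

    /** Total element count over all six arrays (rows 4.1-4.6). */
    static long totalElements(int n, int m) {
        return 3L * (n + 1L)           // first, third and fourth arrays
             + 2L * (m + 1L)           // fifth and sixth arrays
             + secondArrayLength(n);   // second array, at most 2n
    }
}
```

For N = 5 and M = 7, the second array has 8 elements and the six arrays hold 42 elements in total, within the bound of 5·5 + 2·7 + 5 = 44.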
Furthermore, assuming that the key type has size p bytes, the value type has size q bytes, and indices are of integer type (4 bytes), the total amount Q of memory needed to store all the elements is not greater than:
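$$Q = (N+1)p + (4N+2)\cdot 4 + (M+1)q + (M+1)\cdot 4 = (p+16)N + (q+4)M + p + q + 12 \text{ bytes,}$$
where the 4N + 2 integer indices are the at most 2N elements of the second array plus the N + 1 elements each of the third and fourth arrays, and the last term covers the M + 1 elements of the sixth array.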
In some environments, such as Java 1.7, array allocation may require more free memory than is needed to fit the array. For example, if 50% more free memory than the largest single array is required, the memory allocation needed for a six-array multimap is:
$$Q + \max\left(\frac{(N+1)p}{2},\ \frac{(M+1)q}{2},\ 2(M+1),\ 4N\right) \text{ bytes.}$$
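The following hypothetical Java helper evaluates Q and the 50% headroom variant, with all sizes in bytes and p and q the key and value sizes defined above. It is only a calculator for the bounds stated here and does not model any particular JVM's allocator.

```java
/** Hypothetical helper evaluating the memory bounds above; all sizes are in bytes. */
final class MemoryEstimate {
    /** Q = (p + 16)N + (q + 4)M + p + q + 12, assuming 4-byte int indices. */
    static long boundBytes(long n, long m, long p, long q) {
        return (p + 16) * n + (q + 4) * m + p + q + 12;
    }

    /** Q plus 50% headroom for the largest single array (the Java 1.7 example). */
    static long boundWithHeadroom(long n, long m, long p, long q) {
        long headroom = Math.max(
                Math.max((n + 1) * p / 2, (m + 1) * q / 2),   // half of first / fifth array
                Math.max(2 * (m + 1), 4 * n));                // half of sixth / second array
        return boundBytes(n, m, p, q) + headroom;
    }
}
```

For example, with N = 1000 keys, M = 5000 values, and 8-byte keys and values, Q = 84,028 bytes and the allocation with headroom is 104,032 bytes, the headroom being set by the fifth array.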
Even further, according to example embodiments, determination of the scaled hash code is also relevant in optimizing performance and guaranteeing the required number of key-value pairs can be stored in the six-array multimap. A scaled hash code algorithm may be stored as computer program code on memory 26, for example, and performed by the processor 24.
The calculated scaled hash code forms the index into the second array, the array of start indices of lists of keys having the same scaled hash code. If the maximum expected number of keys is N, the scaled index I may be calculated as:
$$I = \mathrm{hashcode}(\mathit{Key}) \mathbin{\&} \mathit{mask}, \qquad \mathit{mask} = 2^{\lfloor \log_2 N \rfloor + 1} - 1,$$
where mask can be pre-calculated initially (its binary form is 1 . . . 1b, wide enough to select all values from 0 to N). Hashcode( ) is an integer hash function, such as hashcode(key) = (int)(key xor (key / 4294967296)), where 4294967296 = 2^32. The hash code algorithm may therefore depend on a mask value. The mask value is used for scaling; it may be calculated once during instantiation of the six-array multimap and then used in every calculation of a scaled hash code.
For example, if the maximum number of keys is N and the length of the second array, startKeyIndex, is $L = \min\{E : E = 2^m,\ E > N\}$, a hash code function for the six-array multimap may be:
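$$\mathrm{Hashcode}(\mathit{key}) = \bigl(\mathit{key} \oplus (\mathit{key} \gg 32)\bigr) \mathbin{\&} (2^m - 1),$$
where $\oplus$ is bitwise XOR, $\gg 32$ shifts the bits right by 32, and $\&$ is bitwise AND. The result is an integer in [0, L − 1].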
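A minimal Java sketch of the scaled hash code, assuming long keys and the mask described above; the class and method names are hypothetical. The mixing step (int)(key ^ (key >>> 32)) is the same computation as Java's Long.hashCode(long). It uses a logical shift, which can differ from key / 4294967296 for negative keys because integer division rounds toward zero.

```java
/** Hypothetical sketch of the scaled hash code for long keys. */
final class ScaledHash {
    /** mask = 2^(floor(log2 n) + 1) - 1 = L - 1; computed once at instantiation (1 <= n < 2^30). */
    static int mask(int n) {
        if (n < 1 || n >= (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        return (Integer.highestOneBit(n) << 1) - 1;
    }

    /** (key xor (key >>> 32)) & mask, an integer in [0, mask]. */
    static int scaledIndex(long key, int mask) {
        int hash = (int) (key ^ (key >>> 32));   // same mixing as Long.hashCode(key)
        return hash & mask;                      // mask is non-negative, so the result is too
    }
}
```

With N = 5, mask(5) returns 7, so every scaled index lies in [0, 7], matching L = 8.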
The apparatus 20 may also comprise means for adding a new key-value pair to the six-array multimap, according to example embodiments. For example, processor 24 and memory 26 may utilize two or more pointers to indices of the six-array multimap to enable efficient addition of new key-value pairs.
A first pointer, “freeKeyPosition”, references the index of the first free cell in the first array, keys, where the next new key will be inserted. A second pointer, “freeValuePosition”, references the index of the first free cell in the fifth array, values, where the next new value will be inserted.
A general approach to adding a key-value pair (K,V) is provided in Table 5:
TABLE 5
Adding (K, V)
No. | Operation | Notes |
5.1 | Calculate scaledIndex = H(K) | |
5.2 | Iterate list of keys having same scaled hash code = H(K) | |
5.3 | If K equals some existing key from this list at index keyIndex | |
5.4 | then addValueForExistingKey(keyIndex, V) | more detail provided in Table 7 |
5.5 | else addNewPair(scaledIndex, startKeyIndex[scaledIndex], K, V) | more detail provided in Table 6 |
The last referenced method above, addNewPair, may be implemented according to the following algorithm or pseudo code in Table 6:
TABLE 6
addNewPair(scaledIndex, oldStartKeyIndex, K, V)
No. | Operation | Notes |
6.1 | Set startKeyIndex[scaledIndex] = freeKeyPosition | points to newly added key |
6.2 | Set keys[freeKeyPosition] = K | add new key |
6.3 | Set nextKeyIndex[freeKeyPosition] = oldStartKeyIndex | link new key to old last key |
6.4 | Set lastValueIndex[freeKeyPosition] = freeValuePosition | points to newly added value |
6.5 | Set values[freeValuePosition] = V | add new value |
6.6 | Set prevValueIndex[freeValuePosition] = 0 | this is the last value in the list |
6.7 | Shift freeKeyPosition and freeValuePosition to next element | |
The method addValueForExistingKey may be implemented according to the algorithm or pseudo code in Table 7:
TABLE 7
addValueForExistingKey(keyIndex, V)
No. | Operation | Notes |
7.1 | Set oldValueIndex = lastValueIndex[keyIndex] | remember old last value index |
7.2 | Set prevValueIndex[freeValuePosition] = oldValueIndex | link new value to old last value |
7.3 | Set lastValueIndex[keyIndex] = freeValuePosition | points to newly added value |
7.4 | Set values[freeValuePosition] = V | add new value to array of values |
7.5 | Shift freeValuePosition to next element | |
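Tables 5 through 7 can be combined into the following Java sketch. It is hypothetical and simplified: keys and values are ints, the key masked to the second-array length stands in for H(K) (an assumption), capacity checks are omitted so exceeding the preallocated sizes throws ArrayIndexOutOfBoundsException, and a small get method is included only to show the result. Step 6.6 writes prevValueIndex at freeValuePosition, since the sixth array is indexed by value position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of Tables 5-7 with int keys and values; array names follow the tables. */
final class SixArrayAdd {
    final int[] keys;            // first array
    final int[] startKeyIndex;   // second array, length L (power of two > maxKeys)
    final int[] nextKeyIndex;    // third array
    final int[] lastValueIndex;  // fourth array
    final int[] values;          // fifth array
    final int[] prevValueIndex;  // sixth array
    final int mask;              // L - 1, used to scale the hash code
    int freeKeyPosition = 1;     // index 0 is reserved to mean "none"
    int freeValuePosition = 1;

    SixArrayAdd(int maxKeys, int maxValues) {   // all memory allocated up front (maxKeys >= 1)
        int length = Integer.highestOneBit(maxKeys) << 1;
        mask = length - 1;
        keys = new int[maxKeys + 1];
        startKeyIndex = new int[length];
        nextKeyIndex = new int[maxKeys + 1];
        lastValueIndex = new int[maxKeys + 1];
        values = new int[maxValues + 1];
        prevValueIndex = new int[maxValues + 1];
    }

    private int h(int k) {
        return k & mask;                         // assumed stand-in for H(K)
    }

    void add(int k, int v) {
        int scaledIndex = h(k);                                                   // 5.1
        for (int i = startKeyIndex[scaledIndex]; i != 0; i = nextKeyIndex[i]) {   // 5.2
            if (keys[i] == k) {                                                   // 5.3
                addValueForExistingKey(i, v);                                     // 5.4
                return;
            }
        }
        addNewPair(scaledIndex, startKeyIndex[scaledIndex], k, v);                // 5.5
    }

    private void addNewPair(int scaledIndex, int oldStartKeyIndex, int k, int v) {
        startKeyIndex[scaledIndex] = freeKeyPosition;          // 6.1
        keys[freeKeyPosition] = k;                             // 6.2
        nextKeyIndex[freeKeyPosition] = oldStartKeyIndex;      // 6.3
        lastValueIndex[freeKeyPosition] = freeValuePosition;   // 6.4
        values[freeValuePosition] = v;                         // 6.5
        prevValueIndex[freeValuePosition] = 0;                 // 6.6
        freeKeyPosition++;                                     // 6.7
        freeValuePosition++;
    }

    private void addValueForExistingKey(int keyIndex, int v) {
        int oldValueIndex = lastValueIndex[keyIndex];          // 7.1
        prevValueIndex[freeValuePosition] = oldValueIndex;     // 7.2
        lastValueIndex[keyIndex] = freeValuePosition;          // 7.3
        values[freeValuePosition] = v;                         // 7.4
        freeValuePosition++;                                   // 7.5
    }

    /** Returns the values for k, newest first; empty if k is absent. */
    List<Integer> get(int k) {
        List<Integer> result = new ArrayList<>();
        for (int i = startKeyIndex[h(k)]; i != 0; i = nextKeyIndex[i]) {
            if (keys[i] == k) {
                for (int vi = lastValueIndex[i]; vi != 0; vi = prevValueIndex[vi]) {
                    result.add(values[vi]);
                }
                break;
            }
        }
        return result;
    }
}
```

Adding (1, 10), (9, 90), and (1, 11) to an instance sized for 4 keys places keys 1 and 9 in the same chain (both scale to index 1 with mask 7), and get(1) then returns [11, 10] without any memory being allocated after construction, apart from the result list.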
In addition to adding key-value pairs, apparatus 20 may also comprise means, such as processor 24 and memory 26, for removing a key and the key's associated values. The apparatus 20 may receive an indication to remove values based on a specified key. The apparatus 20 may therefore update any of the arrays of the six-array multimap such that a subsequent search for the specified key, or retrieval of values associated with the specified key, returns no results. It will therefore be appreciated that, while the term ‘remove’ is used to describe updates to the six-array multimap such that the key and its associated values cannot be located by the search or retrieval implementations provided, the key and/or some of the values may physically remain in the respective first and fifth arrays; they are simply no longer linked through the index arrays that allow retrieval of the associated values. Therefore, after a successful removal, a retrieval or search for the specified key returns no results, even if the key and/or associated values are still stored in the respective first and fifth arrays. The following Table 8 provides an algorithm or pseudo code for removing the key K and associated values (V1, . . . , Vm) from the six-array multimap:
TABLE 8
Remove key K with associated values (V1, . . . , Vm)
No. | Operation | Notes |
8.1 | Calculate scaledIndex = H(K) | scaled hash code |
8.2 | Iterate list of keys having same scaled hash code = H(K) | |
8.3 | If K equals some existing key from this list at index keyIndex then: | |
8.3.1 | Remove K from list of keys by adjusting nextKeyIndex elements | |
8.3.2 | If this list becomes empty then set startKeyIndex[scaledIndex] = 0 | |
8.3.3 | Set nextKeyIndex[keyIndex] = −1 | mark given key as deleted |
8.3.4 | Set prevValueIndex[. . .] = −1 for all associated values | optional |
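A hypothetical Java sketch of Table 8 follows. It operates on int-keyed arrays named as in the tables and uses the key masked to the second-array length as an assumed stand-in for H(K). It unlinks the key from its chain (updating startKeyIndex when the key is the head, which also covers step 8.3.2), performs the optional step 8.3.4, and marks the key with −1; the key and its values stay in the first and fifth arrays.

```java
/** Hypothetical sketch of Table 8 for int keys; returns true if k was found and removed. */
final class SixArrayRemove {
    static boolean remove(int k, int mask, int[] startKeyIndex, int[] keys,
                          int[] nextKeyIndex, int[] lastValueIndex, int[] prevValueIndex) {
        int scaledIndex = k & mask;     // 8.1 (assumed H: key masked to second-array length)
        int previousKey = 0;            // key index preceding i in the chain; 0 = i is the head
        for (int i = startKeyIndex[scaledIndex]; i != 0; previousKey = i, i = nextKeyIndex[i]) { // 8.2
            if (keys[i] != k) {
                continue;               // 8.3: not this key, keep walking
            }
            if (previousKey == 0) {
                // 8.3.1 + 8.3.2: unlink the head; becomes 0 when the list is now empty
                startKeyIndex[scaledIndex] = nextKeyIndex[i];
            } else {
                nextKeyIndex[previousKey] = nextKeyIndex[i];   // 8.3.1: bypass the removed key
            }
            // 8.3.4 (optional): mark every associated value as deleted
            int vi = lastValueIndex[i];
            while (vi > 0) {
                int earlier = prevValueIndex[vi];
                prevValueIndex[vi] = -1;
                vi = earlier;
            }
            nextKeyIndex[i] = -1;       // 8.3.3: mark the key as deleted
            return true;
        }
        return false;
    }
}
```

The work is one walk of the key's chain plus one pass over its values, which is why the cost grows with the number of values per key only when step 8.3.4 is performed.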
Following the removal according to the algorithm of Table 8, as illustrated in
The apparatus 20 may additionally comprise means, such as processor 24 and memory 26 to perform additional functions, such as, but not limited to, iterating all keys of the six-array multimap, iterating all values of the six-array multimap, and clearing the six-array multimap.
Table 9 provides a general approach for iterating all keys:
TABLE 9
Iterate all keys
No. | Operation |
9.1 | for index from 1 to freeKeyPosition − 1: |
9.1.1 | if nextKeyIndex[index] ≥ 0 then process(keys[index]) |
In considering the above algorithm to iterate all keys, if keys will never be removed from the six-array multimap, then the check for nextKeyIndex could be skipped.
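Table 9 maps onto a simple loop over the occupied key slots. The Java sketch below is illustrative only; it assumes int keys and a caller-supplied IntConsumer standing in for process, and the class name is hypothetical.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch of Table 9 for int keys. */
final class IterateKeys {
    static void forEachKey(int[] keys, int[] nextKeyIndex, int freeKeyPosition,
                           IntConsumer process) {
        for (int index = 1; index < freeKeyPosition; index++) {   // 9.1: occupied slots only
            if (nextKeyIndex[index] >= 0) {                        // 9.1.1: -1 marks a removed key
                process.accept(keys[index]);
            }
        }
    }
}
```

Keys are visited in insertion order, since freeKeyPosition only ever moves forward between clears.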
Table 10 provides a general approach for iterating all values:
TABLE 10
Iterate all values
No. | Operation |
10.1 | for index from 1 to freeValuePosition − 1: |
10.1.1 | if prevValueIndex[index] ≥ 0 then process(values[index]) |
In considering the above algorithm to iterate all values, if keys will never be removed from the six-array multimap, then the check for prevValueIndex could be skipped.
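A matching Java sketch of Table 10 follows, again with a hypothetical class name, int values, and an IntConsumer standing in for process. The check only filters out removed values if the optional step 8.3.4 marked them with −1 during removal.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch of Table 10 for int values. */
final class IterateValues {
    static void forEachValue(int[] values, int[] prevValueIndex, int freeValuePosition,
                             IntConsumer process) {
        for (int index = 1; index < freeValuePosition; index++) {  // 10.1: occupied slots only
            if (prevValueIndex[index] >= 0) {                       // 10.1.1: -1 marks a removed value
                process.accept(values[index]);
            }
        }
    }
}
```

For values = {0, 10, 90, 11} with prevValueIndex = {0, 0, -1, 1}, the loop processes 10 and 11 and skips the removed value 90.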
Table 11 provides a general approach for clearing the six-array multimap:
TABLE 11
Clear six-array multimap
No. | Operation |
11.1 | for each index from 1 . . . freeKeyPosition − 1, set nextKeyIndex[index] = 0 |
11.1.1 | Set freeKeyPosition = freeValuePosition = 1 |
In considering the above algorithm to clear the six-array multimap, it is not necessary to clear the values array. The stored values will be overwritten when new key-value pairs are inserted.
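A hypothetical Java sketch of the clear operation is given below. Table 11 as written resets nextKeyIndex and the two free-position pointers; this sketch additionally zeroes startKeyIndex, an assumption on our part, because otherwise a chain head could still point at a stale key that a later search would find. Zeroing the second array costs O(L), with L ≤ 2N.

```java
import java.util.Arrays;

/** Hypothetical holder for the state touched by Table 11. */
final class ClearableMultimapState {
    final int[] startKeyIndex;  // second array
    final int[] nextKeyIndex;   // third array
    int freeKeyPosition;
    int freeValuePosition;

    ClearableMultimapState(int[] startKeyIndex, int[] nextKeyIndex,
                           int freeKeyPosition, int freeValuePosition) {
        this.startKeyIndex = startKeyIndex;
        this.nextKeyIndex = nextKeyIndex;
        this.freeKeyPosition = freeKeyPosition;
        this.freeValuePosition = freeValuePosition;
    }

    void clear() {
        for (int index = 1; index < freeKeyPosition; index++) {
            nextKeyIndex[index] = 0;            // 11.1: also erases -1 "deleted" marks
        }
        Arrays.fill(startKeyIndex, 0);          // assumption beyond Table 11: empty every chain head
        freeKeyPosition = 1;                    // 11.1.1
        freeValuePosition = 1;
        // keys, lastValueIndex, values and prevValueIndex are left as-is; new inserts overwrite them
    }
}
```

No memory is freed or reallocated; the preallocated arrays are simply reused from index 1 onward.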
The method, apparatus and computer program product provide numerous technical advantages including the conservation of processing resources and the associated power consumption otherwise expended to support larger sized memory allocations and larger order algorithms according to alternative implementations.
The six-array multimap enables add, remove, search, and iterate functionality without dynamically allocating additional memory. The insertion of a new key-value pair has algorithmic complexity O(1) in the case of no hash collision (i.e., no two distinct keys share a scaled hash code, so every chain of keys linked through the third array, the nextKeyIndex array, has a single element). The search for a key-value pair has algorithmic complexity O(1) in the case of no hash collision. Retrieving the next value for a given key has algorithmic complexity O(1) in the case of no hash collision.
Removal of a key and all associated values has algorithmic complexity O(1) in the case of no hash collision when iteration of all values is not supported, or O(M/N) when iteration of all values is supported (where N is the number of unique keys and M is the number of all values), since each of the key's values must then be marked as deleted. The worst-case algorithmic complexity of these operations is O(N), where N is the number of added keys, but this is a rare scenario if a scaled hash function as described herein is used. The result is reduced memory consumption and processing time compared to alternative implementations, with further performance gains because only primitive types are used.
In some example embodiments, the six-array multimap can be extended to structured types and fixed-length array types for keys and values.
As described above,
Accordingly, blocks of the flowchart support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
In some embodiments, certain ones of the operations above may be modified or further amplified. Furthermore, in some embodiments, additional optional operations may be included, some of which have been described above. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Liu, Hongming, Olshanetckii, Oleg
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Dec 16 2014 | OLSHANETCKII, OLEG | HERE GLOBAL B V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 034545 | /0360 | |
Dec 16 2014 | LIU, HONGMING | HERE GLOBAL B V | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 034545 | /0360 | |
Dec 18 2014 | HERE Global B.V. | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Sep 24 2020 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Oct 16 2024 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Date | Maintenance Schedule |
May 02 2020 | 4 years fee payment window open |
Nov 02 2020 | 6 months grace period start (w surcharge) |
May 02 2021 | patent expiry (for year 4) |
May 02 2023 | 2 years to revive unintentionally abandoned end. (for year 4) |
May 02 2024 | 8 years fee payment window open |
Nov 02 2024 | 6 months grace period start (w surcharge) |
May 02 2025 | patent expiry (for year 8) |
May 02 2027 | 2 years to revive unintentionally abandoned end. (for year 8) |
May 02 2028 | 12 years fee payment window open |
Nov 02 2028 | 6 months grace period start (w surcharge) |
May 02 2029 | patent expiry (for year 12) |
May 02 2031 | 2 years to revive unintentionally abandoned end. (for year 12) |