A configuration management system uses a data compression method to compress entries in a data set. An entry is selected as a prefix value and prefix compression of the data set is performed. The entry to serve as the prefix value is quickly selected using an iterative approach. In each iteration, subgroups of entries are formed from groups formed in prior iterations based on the values of characters at successive positions in the entries. The approach is readily implemented using data structures represented as lists.
|
9. A configuration management system comprising:
a) a database storing a plurality of artifacts;
b) an index comprising a plurality of entries, each of the plurality of entries having a string of characters associated therewith, with each character having a value and a position in the string, and each entry identifying an artifact in the database; and
c) a computer-readable storage medium storing computer executable instructions for performing a method comprising:
i) forming subgroups containing entries from the plurality of entries in the index based on the value of a character at a first position of the string associated with each entry;
ii) forming further subgroups containing entries from the entries in the subgroups based on the value of a character at a second position in the string associated with each entry;
iii) for each of a plurality of entries:
calculating an indicator based on a number of entries in each subgroup containing the entry; and
storing the indicator in computer memory;
iv) selecting an entry from the plurality of entries as a prefix based on the calculated indicators; and
v) compressing the index using the selected entry.
14. A method of compressing a data set comprising a plurality of elements, each element having a string of characters associated therewith, each character in the string having a value and a position, the method comprising:
operating at least one processor to perform a method comprising:
a) forming a plurality of first groups from the plurality of elements, each first group in the plurality of first groups comprising elements from the plurality of elements having strings that share a common value of a first character in a first position in the string, the common value of the first character in the first position being unique, among the plurality of first groups, to the strings of the elements of said first group;
b) forming, for each first group of at least a portion of the plurality of first groups, a plurality of second groups from the plurality of elements in a respective first group, each second group in the plurality of second groups comprising elements from the plurality of elements in the respective first group having strings that share a common value of a second character in a second position in the string, the common value of the second character in the second position being unique, among the plurality of second groups associated with said respective first group, to the strings of the elements of said second group;
c) for each of the plurality of elements, calculating an indicator, the calculating comprising adding a number of elements in a first group of the plurality of first groups containing the element and a number of elements in a second group of the plurality of second groups containing the element;
d) selecting as a prefix the string associated with the element having the largest calculated indicator;
e) storing the selected prefix; and
f) compressing the data set using the selected prefix.
1. A method of compressing a data set comprising a plurality of entries, each of the plurality of entries having a string of characters associated therewith, with each character having a value and a position in the string, the method comprising:
operating at least one processor to perform a method comprising:
a) forming a list of a plurality of list elements, with each list element having the string of one of the plurality of entries associated therewith;
b) forming a first plurality of sublists from the plurality of list elements with all the list elements in each sublist of the first plurality of sublists having a string associated therewith having a character with the same value in a first position in the string and each sublist of the first plurality of sublists having a number of said list elements associated therewith;
c) forming, for each sublist in at least a portion of the first plurality of sublists, a second plurality of sublists from the list elements in the sublist in the first plurality of sublists, with all the list elements in each sublist of the second plurality of sublists having a string associated therewith having a character with the same value in a second position in the string and each sublist of the second plurality of sublists having a number of said list elements associated therewith;
d) for each of the plurality of entries:
calculating an indicator based at least in part on the number of list elements in the sublist of the first plurality of sublists and the number of list elements in the sublist of the second plurality of sublists in which the entry is associated; and
storing the indicator in computer memory;
e) selecting an entry of the plurality of entries as a prefix based on the calculated indicator; and
f) storing the selected prefix for use in compressing the data set.
2. The method of
3. The method of
4. The method of
a) the values of the characters in each of the strings associated with the plurality of entries take on one of a predetermined number of values; and
b) the method additionally comprises establishing a second data structure in the computer-readable and computer-writable medium, the second data structure having the predetermined number of second sub-structures, each second sub-structure comprising a field storing a pointer.
5. The method of
6. The method of
7. The method of
8. The method of
10. The configuration management system of
11. The configuration management system of
a) additionally comprising forming a list of a plurality of list elements, with each list element having one of the plurality of entries associated therewith; and
b) wherein, forming subgroups of entries comprises forming a plurality of sublists from the plurality of list elements with all the list elements in each sublist of the first plurality of sublists having an entry associated therewith with a string having a character with the same value in a first position in the string.
12. The configuration management system of
13. The configuration management system of
|
1. Field of Invention
This invention relates generally to information management systems and more particularly to data compression in information management systems.
2. Description of the Related Art
Systems that store large amounts of information are used in many applications. For easily finding and retrieving information stored in such a system, an index is often formed of data stored in the system.
One application of an information management system is in a configuration management system.
Configuration management system 100 includes an index 120. Index 120 is also implemented in the computer storage system. The index includes two portions, an identifier portion 130 and a location portion 150. For each of the entries 1221, 1222 . . . 1226, a value is provided to identify a particular artifact in database 110 and describe where it is stored. For example, entry 1225 contains an identifier value 124 and a location value 126. Controller 170 is a computer that controls storage and retrieval of information from configuration management system 100.
In order to reduce the total amount of storage space required by configuration management system 100, it is known to compress data stored by the system.
As can be seen in the examples of
It would be desirable to have an improved method of compressing data.
The invention relates to a method of selecting a prefix value for compressing records in a database with reduced computational requirements. The method involves processing the entries in the data set on a character-by-character basis. An aggregated savings value is updated for each entry in the data set as each character is processed. The aggregated savings value for each entry is updated by adding a value representative of the number of other entries in the data set that have the same prefix portion, up to and including the character being processed, as that entry. The aggregated data values are used to select the entry that will yield the best compression of the data set if used as a prefix.
This algorithm may be efficiently implemented in a computer program that establishes data structures representing lists and processes them iteratively. These savings values are aggregated to compute an indication of the total compression achievable for each possible prefix value. The prefix value providing the most compression is selected as the prefix value for encoding the database.
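The aggregation described above can be illustrated with a minimal Python sketch. It counts shared prefix portions directly rather than using the list-based iterative implementation described later, so it shows the quantity being computed and not the preferred way of computing it. The function names aggregated_savings and select_prefix are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def aggregated_savings(entries):
    """Return one aggregated savings value per entry (hypothetical helper)."""
    # Count how many entries share each prefix portion of every length.
    prefix_counts = Counter(
        entry[:length]
        for entry in entries
        for length in range(1, len(entry) + 1)
    )
    # For each entry, add the number of OTHER entries sharing each of its
    # prefix portions, one term per character position.
    return [
        sum(prefix_counts[entry[:length]] - 1
            for length in range(1, len(entry) + 1))
        for entry in entries
    ]


def select_prefix(entries):
    """Pick the entry with the largest aggregated savings value."""
    savings = aggregated_savings(entries)
    best = max(range(len(entries)), key=lambda i: savings[i])
    return entries[best]
```

When several entries share the maximum, this sketch simply returns the first of them; any other tie-breaking rule could be substituted.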
In one aspect, the invention relates to a method of compressing a data set comprising a plurality of entries, each of the plurality of entries having a string, the string formed from characters, each character having a value and an ordered position in the string, with one character being the beginning of the string, such that each string has one or more prefix portions, each prefix portion containing a portion of the string having one or more characters at consecutively ordered positions in the string including the character at the beginning of the string. The method comprises determining a plurality of indicators, for each of the plurality of entries, representing the number of entries in the data set with a string having the same prefix portion as the string of the entry, with each of the plurality of indicators for each of the plurality of entries determined for a prefix portion of the entry having a different number of characters; producing a plurality of aggregated indicators, each aggregated indicator representing a combination of the plurality of indicators determined for one of the plurality of entries; selecting an entry of the plurality of entries based on the aggregated indicators; and compressing the data set using the selected entry.
In another aspect, the invention relates to a method of compressing a data set comprising a plurality of entries, each of the plurality of entries having a string of characters associated therewith, with each character having a value and a position in the string. The method comprises forming a list of a plurality of list elements, with each list element having one of the plurality of entries associated therewith; forming a first plurality of sublists from the plurality of list elements with all the elements in each sublist of the first plurality of sublists having an entry associated therewith with a string having a character with the same value in a first position in the string; forming, for each sublist in at least a portion of the first plurality of sublists, a second plurality of sublists from the list elements in the sublist in the first plurality of sublists, with all the elements in each sublist of the second plurality of sublists having an entry associated therewith with a string having a character with the same value in a second position in the string; and for each of the plurality of entries, computing an indicator based at least in part on the number of elements in the sublist of the first plurality of sublists and the number of elements in the sublist of the second plurality of sublists in which the entry is associated.
In a further aspect, the invention relates to a configuration management system that has a database storing a plurality of artifacts; an index comprising a plurality of entries, each of the plurality of entries having a string of values associated therewith, with each value having a position in the string, and each entry identifying an artifact in the database; and a computer-readable medium storing computer executable instructions. The computer-executable instructions perform a method comprising: forming subgroups of entries from the plurality of entries in the index based on the value at a first position of the string associated with each entry; forming further subgroups of entries from the entries in the subgroups based on the value at a second position in the string associated with each entry; for each of a plurality of entries, computing an indicator based on the number of entries in each subgroup of which the entry is a member; selecting an entry from the plurality of entries based on the indicators; and compressing the index using the selected entry.
The invention may be used in connection with a configuration management system. The configuration management system includes an index of artifacts stored in a database. The index is compressed using prefix compression as known in the prior art. However, in contrast to prior art prefix compression approaches, the described embodiment selects a prefix with substantially less computation.
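For orientation, prefix compression itself may be sketched as follows. The description does not specify an encoding format, so the representation used here, in which each entry is stored as the number of leading characters it shares with the prefix plus the remaining suffix, is an assumption made only for illustration. The names compress and decompress are likewise hypothetical.

```python
def compress(entries, prefix):
    """Encode each entry as (shared_length, suffix) relative to the prefix.

    The (shared_length, suffix) pair is an assumed encoding, not one taken
    from the description.
    """
    encoded = []
    for entry in entries:
        shared = 0
        limit = min(len(entry), len(prefix))
        # Count the leading characters the entry has in common with the prefix.
        while shared < limit and entry[shared] == prefix[shared]:
            shared += 1
        encoded.append((shared, entry[shared:]))
    return encoded


def decompress(encoded, prefix):
    """Rebuild the original entries from the prefix and the encoded pairs."""
    return [prefix[:shared] + suffix for shared, suffix in encoded]
```

Under this assumed encoding, every character that an entry shares with the prefix is stored once in the prefix instead of once per entry, which is why the choice of prefix determines how much is saved.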
While described in connection with a configuration management system, the compression approach described herein is not so limited and may be used in connection with any set of data entries that is to be compressed. Accordingly, compression of a set of entries in a data set is described. In the example embodiments used herein, each entry has a string of characters, with each character having a value. The values may be ASCII representations of letters, numbers and symbols such that the string of characters form file identifiers, such as is stored in the identifier portion 130 (
In the example of
In phase 210, the entries from which the prefix will be selected are represented as group 212. In phase 210, group 212 is divided into subgroups, with each entry in each subgroup having a first character with the same value. For example, all of the entries in group 212 that have the value “0” as the first character are assigned to subgroup 2220. Sub-group 2221 includes all of the entries in group 212 that have the value “1” for the first character. Subgroup 2222 contains all the entries in group 212 that have the value “2” for the first character.
Each of the subgroups is also assigned a value proportionate to the savings that can be achieved by using the character that was used as the criterion for forming the subgroup as a character in the prefix used in compressing the group of entries by prefix compression. For example, subgroup 2220 has a savings value 2240 associated with it, representing the savings possible if the first character in the prefix has a value of “0.” Savings value 2240 is assigned based on the number of members in the subgroup. In this example, the assigned savings value is computed by counting the number of entries in the subgroup and subtracting one. Such a value represents the number of records for which the selected character would not need to be stored in the suffix portion of the compressed file. One is subtracted from this count to represent the fact that the character would be stored once as part of the prefix.
Similarly, subgroup 2221 includes a savings value 2241. Subgroup 2222 contains a savings value 2242.
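The first phase can be sketched in Python as follows. The seven entries used in the usage note are hypothetical; they were chosen only to be consistent with the subgroup sizes described in this example (two entries beginning with “0”, two beginning with “1” and three beginning with “2”). The function name first_phase is also an assumption.

```python
from collections import defaultdict


def first_phase(entries, position=0):
    """Group entries by the character at `position` and assign savings values."""
    subgroups = defaultdict(list)
    for entry in entries:
        subgroups[entry[position]].append(entry)
    # Savings value: number of entries in the subgroup, minus one for the
    # single copy of the character stored in the prefix.
    savings = {value: len(members) - 1 for value, members in subgroups.items()}
    return dict(subgroups), savings
```

For the hypothetical entries "022", "010", "101", "120", "212", "210" and "221", the savings values for the characters "0", "1" and "2" are one, one and two, respectively.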
In phase 220, each of the subgroups is formed into one or more further subgroups. The subgroups are formed based on the value of the next character in each of the entries in the subgroup. For example, the subgroup 2220 was formed based on the value in the first character in each entry. Subgroup 2220 is divided into subgroups based on the second character in each entry. In this example, none of the entries in subgroup 2220 have the same value for their second character. Accordingly, subgroups 2320 and 2321 are formed, each with one entry.
Savings values 2340 and 2341 are likewise computed. Because each of the subgroups 2320 and 2321 contains one entry, the savings values 2340 and 2341 associated with these subgroups have a value of zero.
Likewise, subgroup 2221 is divided into two subgroups 2322 and 2323. The savings values 2342 and 2343 are likewise set to zero because each subgroup contains a single entry.
Subgroup 2222 is also divided into further subgroups based on the second character of each entry. Because two entries in subgroup 2222 have the value “1” as the second character, a subgroup 2324 is formed with those two entries. Subgroup 2324 has a savings value 2344 of one.
Subgroup 2222 also contains an entry with a value of “2” for the second character. This entry is assigned to subgroup 2325. Because subgroup 2325 has a single entry, the savings value 2345 associated with subgroup 2325 has a value of zero.
In phase 230, the subgroups 2320, 2321, 2322, 2323 and 2325 all have a single entry. Accordingly, they cannot be divided into further subgroups. Subgroup 2324 has multiple entries and can be divided into further subgroups.
In this example, subgroup 2324 is divided into further subgroups based on the value of the third character of each entry in the subgroup. In this case, none of the entries have a common character value in the third character position. Accordingly, subgroup 2324 is divided into subgroups 2420 and 2421, each of which has a single entry. Savings values are also assigned to subgroups 2420 and 2421. As in the prior phases, these savings values are assigned based on the number of entries in the subgroup. In this case, both subgroups receive savings values of zero.
In phase 240, once it is determined that no further subgroups may be formed, the savings value associated with each of the entries in the original group 212 may be aggregated to compute a total savings value for each entry. The total savings values 2520 . . . 2526 associated with each entry may be computed by adding the savings values for every subgroup containing that entry. For example, total savings value 2520 is the savings value associated with the entry “022” in the initial group 212. Total savings value 2520 is the sum of the savings values associated with subgroup 2320 and subgroup 2220. As a further example, total savings value 2524 is the savings value associated with the entry “212”.
Total savings value 2524 is computed as the sum of the savings values associated with subgroups 2420, 2324 and 2222.
In phase 250, once a total savings value has been computed for each entry in the original group 212, the entry with the largest savings value is selected as the prefix for compressing the entries in the original group 212 using prefix compression. In this example, total savings values 2524 and 2525 each have a value of “3”, which is the maximum value. Where multiple entries are mapped to the maximum total savings value, any suitable method for selecting between them may be used. For example, the shorter of the two may be selected as the prefix value.
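The phases described above may be sketched, under stated assumptions, as a loop that refines the groups one character position at a time. The entry values in the usage note are hypothetical and chosen only to match the subgroup sizes of the example. The shorter-entry tie-break follows the suggestion in the preceding paragraph, and the function names are illustrative.

```python
from collections import defaultdict


def total_savings_by_phase(entries):
    """Return the total savings value for each entry, phase by phase."""
    totals = [0] * len(entries)
    groups = [list(range(len(entries)))]  # groups hold indices into entries
    position = 0
    while groups:
        next_groups = []
        for group in groups:
            subgroups = defaultdict(list)
            for i in group:
                if position < len(entries[i]):
                    subgroups[entries[i][position]].append(i)
            for members in subgroups.values():
                for i in members:
                    totals[i] += len(members) - 1
                # Only subgroups with several entries can be divided further.
                if len(members) > 1:
                    next_groups.append(members)
        groups = next_groups
        position += 1
    return totals


def select_prefix(entries):
    """Select the entry with the largest total, preferring shorter entries."""
    totals = total_savings_by_phase(entries)
    best = max(totals)
    candidates = [e for e, t in zip(entries, totals) if t == best]
    return min(candidates, key=len)
```

With the hypothetical entries "022", "010", "101", "120", "212", "210" and "221", the two entries beginning with "21" each receive a total of three, in agreement with the maximum value described above.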
The process shown in
Each subgroup formed in this way includes all the entries in the initial group 212 having the same prefix portion, up to and including the character at the position used in forming that subgroup. Each phase may be viewed as finding the number of entries that share a common prefix portion with successively longer prefix portions used at each phase.
Though the process illustrated in
In addition, each element in the list includes a next element pointer, such as next element pointer 3140, 3141 . . . 3146. In the example of
As elements in the list are processed, pointer 316 keeps track of the element in the list being processed. In the illustration of
For keeping track of which character position of the elements in the list is being used to form subgroups, an index value 320 is provided. Index value 320 may, for example, be implemented as a value stored in a memory location. In the illustration of
The process according to the embodiment of
The embodiment of
The processing according to the embodiment of
In the illustrated embodiment, each of the bin substructures 3600, 3601 . . . 360255 has the same structure. Taking bin substructure 3600 as illustrative, bin substructure 3600 is shown to have associated with it a character value 3520. Each of the bin substructures 3600, 3601 . . . 360255 has a unique character value associated with it. It is not, however, necessary that memory storage be allocated to store the character value. The character value may be inferred by the position of a particular bin substructure within the overall bin data structure 350.
Bin substructure 3600 also has associated with it a member count field 3540. Member count field 3540 may contain a count of the number of entries added to the subgroup associated with the character value 3520. The value in member count field 3540 may be updated as list 310 of entries is processed and the elements of list 310 are added to subgroups. Member count field 3540 could be, but need not be, a physical storage location in computer memory. The value in member count field 3540 could, for example, alternatively be determined by counting the number of entries in the subgroup associated with the character value 3520.
Members are assigned to the subgroup associated with character value 3520 by adjusting the pointers joining the elements in list 310. Pointer 3560, which is part of bin structure 3600, points to this sublist to associate it with character value 3520.
Pointers 3561 . . . 356255 may likewise be set to point to sublists formed from the elements in list 310. The sublists are built by processing the elements in list 310 one at a time.
Bin data structure 350 is initialized before any subgroups have been formed. In the first pass of sorting elements of list 310 into bins, each element in list 310 is processed starting with the first element pointed to by pointer 316. Based on the value of the character at the position in the string pointed to by the index value 320, the element is removed from list 310 and added to the list in the appropriate bin substructure 3600, 3601 . . . 360255. For example, in the initialized configuration shown in
As part of this processing, pointer 316 is adjusted to point to the next element of the list 310, which is determined from the value of next element pointer 3140 before it is changed. The value in member count field 3542 is also incremented by one to indicate that an element has been added to the list in bin subgroup 3602.
The next element pointed to by pointer 316 may be processed in a similar manner. In this example, element 3121 is the next item in list 310. The value of the first character in element 3121 is also “2”. Therefore, element 3121 is also added to the list in bin substructure 3602. To add element 3121 to the list, pointer 3562 is modified to point to element 3121. The next element pointer 3141 associated with element 3121 is adjusted to point to the list element previously pointed to by pointer 3562. Likewise, the member count field 3542 is again incremented. Pointer 316 is again adjusted to point to the next element in list 310 by taking on the value of next element pointer 3141 before it is changed. Processing continues in this fashion until all of the elements in list 310 are added to a list associated with the bin substructures 3600, 3601 . . . 360255.
Each of the member count fields 3540, 3541, 3542 . . . contains a value representing the number of entries in the list associated with the bin substructure containing that member count field.
The savings count accumulators 3220, 3221 . . . 3226 are each shown loaded with a value that is one less than the value in the member count field associated with the bin to which the correlated list element has been assigned. For example, element 3120 has been assigned to the subgroup represented by bin substructure 3602. The member count field for bin substructure 3602 contains a value of three. Accordingly, the savings count accumulator 3220 associated with element 3120 contains a value of two (one less than the value contained in the member count field 3542). The values in the other savings count accumulators 3221, 3222 . . . 3226 are set in a similar fashion.
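The pointer manipulation described above can be sketched as follows. The class names Element and Bin, the helper build_list and the seven strings in the usage note are assumptions made for illustration. The sketch also assumes that each character value fits in eight bits, so that it can index one of 256 bins directly.

```python
class Element:
    """One list element: a string, a next element pointer and an accumulator."""

    def __init__(self, string):
        self.string = string
        self.next = None   # next element pointer
        self.savings = 0   # savings count accumulator


class Bin:
    """One bin substructure: a member count field and a sublist pointer."""

    def __init__(self):
        self.count = 0
        self.head = None


def build_list(strings):
    """Link the strings into a singly linked list and return its first element."""
    head = None
    for s in reversed(strings):
        element = Element(s)
        element.next = head
        head = element
    return head


def sort_into_bins(head, index, num_bins=256):
    """Move every element onto the sublist of the bin for its character."""
    bins = [Bin() for _ in range(num_bins)]
    placed = []
    current = head
    while current is not None:
        following = current.next        # read before the pointer is changed
        target = bins[ord(current.string[index])]
        current.next = target.head      # element now points at the old head
        target.head = current           # bin pointer now points at the element
        target.count += 1
        placed.append((current, target))
        current = following
    # Each accumulator receives one less than the member count of its bin.
    for element, target in placed:
        element.savings += target.count - 1
    return bins
```

Because each element is added at the head of its sublist, the elements of a sublist appear in the reverse of the order in which they were processed.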
Before bin data structure 350 is reset, sublists identified by pointers 3560, 3561 . . . 356255 are saved for further processing.
Skip list entries 3240 . . . 3246 and pointer 316 are used to retain the lists associated with each of the bin substructures 3600, 3601 . . . 360255 having more than one member. To set the values of the skip list entries, each bin substructure is processed. The bin substructures may be processed in any order. In this example, they are processed in reverse order of character value so that they are processed from bin substructure 360255 to 3600. Pointer 316 is adjusted to point to the beginning of the sublist associated with the first bin data structure processed with a member count field greater than one. One of the skip list entries 3240, 3241 . . . 3246 is adjusted to point to the beginning of every other sublist having more than one entry.
In this example, the bin data substructure with the highest character value having a list with more than one entry is the list associated with bin substructure 3602. As shown in
The next sublist to be retained is pointed to by pointer 3560. The first element in that sublist is element 3126. Accordingly the skip list entry 3244 associated with element 3124 is made to point to element 3126.
No further bin substructures have lists requiring further processing. Accordingly, the skip list element 3246 associated with element 3126 is adjusted to point to the NULL value.
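The way sublists are retained through skip list entries may be sketched as below. This is an illustration under assumptions: the Element and Bin classes, the skip attribute and the function retain_sublists are hypothetical names, and the returned value plays the role of the pointer to the first retained sublist.

```python
class Element:
    def __init__(self, string):
        self.string = string
        self.next = None   # next element pointer within a sublist
        self.skip = None   # skip list entry: head of the next retained sublist


class Bin:
    def __init__(self, head=None, count=0):
        self.head = head    # pointer to the sublist of this bin
        self.count = count  # member count field


def retain_sublists(bins):
    """Chain the heads of all sublists that have more than one member.

    Bins are visited in reverse order of character value. The head of the
    first qualifying sublist is returned; the skip entry of each retained
    head points to the head of the next retained sublist, and the last
    one holds None.
    """
    start = None
    previous = None
    for current_bin in reversed(bins):
        if current_bin.count > 1:
            if start is None:
                start = current_bin.head
            else:
                previous.skip = current_bin.head
            previous = current_bin.head
    if previous is not None:
        previous.skip = None
    return start
```

Sublists with a single member are never linked into the chain, so they are not visited again in later passes.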
Once the sublists created in the first phase are retained in the skip list entries 3240 . . . 3246, the sublists may each be processed, one after another, in the same way that list 310 was processed. Index value 320 is shown to point to the second character in the strings that form each of the list elements. The bin data structure 350 is reset to the state as shown in
As processing of the first sublist begins, the skip list entry 3242 associated with the list element that is pointed to by pointer 316 identifies the next sublist to be processed. Before this value is lost by changing the value in pointer 316 or modifying the value in skip list entry 3242 as a result of processing the first sublist, the value in skip list entry 3242 is saved in temporary pointer 316′.
Processing then begins first with the sublist pointed to by pointer 316. That list may then be processed in the same way that list 310 was processed. Bin data structure 350, having been restored to its initialization state, may be used for processing the sublist pointed to by pointer 316. At the end of processing that sublist, the member count fields 3540, 3541, . . . 354255 are added to the savings count accumulators 3220, 3221, . . . 3226, respectively. New skip list entry values may be stored so that further processing may be performed on each sublist. The new skip list entries do not alter skip list entries for any sublists not yet processed and may be stored in the same memory locations used for skip list entries 3240 . . . 3246.
If processing of the first sublist results in the generation of more sublists with more than one element, those sublists may then be processed in the same way that the first sublist was processed.
Bin data structure 350 may be reset and used to process each sublist in turn. The savings count accumulators 3220 . . . 3226 are not reset before each sublist is processed so that they will contain accumulated savings values.
Such a processing order lends itself to recursive processing as described below in connection with
Once the first sublist identified by pointer 316 in
Each sublist, and any sublists generated by processing that sublist, are processed in this fashion. After processing of the sublist starting with entry 3126, an attempt to read the next sublist from skip list entry 3246 returns a pointer to the NULL value. Accordingly, when a skip list entry is found to contain a pointer to the NULL value, it may be determined that processing of all sublists has been completed in a particular pass. Once all of the sublists have been processed, the values in savings count accumulators 3220, 3221 . . . 3226 represent the total saving if each entry in the data set is used as a prefix for compression. The entry associated with the largest value may be selected.
At process block 520 an input list is selected for processing. In the first iteration through the process, the original input list is selected for processing. In the example of
The selected list is then processed according to the subprocess 550 shown in
Once the list has been processed according to subprocess 550 (
If there are further sublists to process, the first of the sublists is selected at process block 526. The other sublists are “remembered” at process block 525. In the embodiment of
Once a sublist is selected for further processing, the index value 320 is incremented so that the appropriate character in that sublist will be used to form any further sublists. Each sublist is divided into further sublists using the next character in the string of values associated with the list element. Incrementing index value 320 at process block 524 ensures that the appropriate character in the list elements is used to form further sublists.
Once the appropriate sublist is identified for further processing, that sublist is processed at block 522. As with processing on the initial list, processing at block 522 may divide the sublist into further sublists.
Following the creation of additional sublists, decision block 530 is again executed. If the sublists generated by processing at block 522 require further processing, the process blocks 526, 525 and 524 are again repeated to prepare for processing one of those sublists. The loop formed by process blocks 522, 526, 525 and 524 and decision block 530 is repeated until a sublist is processed and does not generate any sublists that require further processing. This condition is detected at decision block 530 and processing then passes to decision block 531.
At decision block 531, a check is made for other sublists that were generated from the processing step that generated the sublist just processed. In the processing shown in
Where further sublists at the same level as the sublist just processed remain for processing, the next sublist is selected at block 540. One simple way that the identification of sublists may be performed is through the use of a dynamically created data structure, such as a stack. When one sublist is selected from a group of sublists generated by processing a higher level list, a pointer to the next sublist in the group may be pushed on the stack. When processing of a sublist is completed, the pointer to the list element at the top of the stack may be popped from the stack and used to identify the next sublist to process.
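A stack-based traversal of this kind might be sketched as follows. As a simplification that departs from the description, the sketch pushes every sublist that still needs processing together with its character position, rather than pushing a single pointer to the next sublist in a group. It also assumes 8-bit character values and represents sublists as Python lists of indices; the function name is hypothetical.

```python
def total_savings_with_stack(entries, num_values=256):
    """Accumulate savings for each entry using an explicit stack of sublists."""
    totals = [0] * len(entries)
    # Each stack item is (sublist of entry indices, character position).
    stack = [(list(range(len(entries))), 0)]
    while stack:
        sublist, position = stack.pop()
        bins = [[] for _ in range(num_values)]  # bins start empty for each sublist
        for i in sublist:
            if position < len(entries[i]):
                bins[ord(entries[i][position])].append(i)
        for members in bins:
            for i in members:
                totals[i] += len(members) - 1
            # Sublists with more than one member are remembered for later.
            if len(members) > 1:
                stack.append((members, position + 1))
    return totals
```

The accumulated totals do not depend on the order in which sublists are popped, so the same result is obtained whether sublists are handled depth first or in any other order.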
Such a dynamically created data structure may be expressly structured as a stack. Alternatively, the processing shown in
The sublist selected at block 540 may then be processed in the same way as the first sublist selected. The processing will continue through the loop formed by process blocks 522, 526, 525 and 524 and decision block 530 until that sublist is fully processed. Once that sublist is processed, the loop formed by decision block 531 and process block 540 is repeated until all sublists formed at the same level are processed.
Once all of the sublists formed at the same level are processed, decision block 532 determines whether sublists were formed at a higher level. As above, levels may be implemented using a function that is recursively called to process every sublist. A function may be called recursively to process a sublist and generate further sublists. When the processing is completed for all the sublists created from one sublist, the instantiation of the function at that level will complete and processing will return to an instantiation of that same function instantiated to process the sublists formed at the next higher level. As described above, the function instantiated at each level may allocate memory to store the next sublist to be processed at that level. Thus, by returning from one instantiation of a function processing sublists to the instantiation that called it, the next sublist to be processed can be identified from the value stored in the memory allocated for the instantiation of the function to which execution returns.
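The recursive alternative may be sketched as shown below. The names process_sublist and total_savings_recursive are hypothetical, and sublists are represented as Python lists of indices rather than linked elements. The local variable pending stands in for the memory that each instantiation of the function allocates to identify the sublists remaining at its level.

```python
from collections import defaultdict


def process_sublist(entries, sublist, index, totals):
    """Divide one sublist on the character at `index`, then recurse."""
    subgroups = defaultdict(list)
    for i in sublist:
        if index < len(entries[i]):
            subgroups[entries[i][index]].append(i)
    # Local to this call: the sublists still to be handled at this level.
    pending = list(subgroups.values())
    for members in pending:
        for i in members:
            totals[i] += len(members) - 1
        if len(members) > 1:
            # A deeper call uses the next character position. Returning from
            # it resumes this call with its own value of `index` intact.
            process_sublist(entries, members, index + 1, totals)


def total_savings_recursive(entries):
    """Return the total savings value for each entry."""
    totals = [0] * len(entries)
    process_sublist(entries, list(range(len(entries))), 0, totals)
    return totals
```

Because the character position is passed as an argument, no explicit increment or decrement of an index value is needed in this sketch. The recursion depth is bounded by the length of the longest string, which should be kept in mind for very long entries.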
Process block 542 selects a sublist at the next higher level. As indicated above, this sublist may be identified as a result of using a recursively called function. However, any suitable mechanism for identifying sublists for processing at different levels may be used.
At block 544, the index value 320 is decremented to indicate that processing is being performed at the next higher level. A character in the string at a position that depends on the level of the processing is used to sort the list elements into sublists. The process loops back to block 522 and the sublist selected at block 542 is processed in the same fashion as prior sublists.
Sublists are selected and processed in this fashion until all sublists at all levels have been fully processed. Once the processing of sublists is completed, the process continues at block 534. At block 534, the total savings count for each entry in the list is determined. In the embodiment of
Processing begins at process block 574. At process block 574, bin data structure 350 is initialized. For a bin data structure as represented in
In process block 562, the next item in the list being processed is assigned to a bin. In the described embodiment, there is one bin for each possible character value. Bins are implemented by creating sublists from the elements in the data set being processed. In the example of
At decision block 564, a determination is made whether there are further items in the list being processed. If further list elements remain to be processed, process block 562 is executed for the next list element. Process block 562 is repeatedly executed until all elements in the list are processed. Once all list elements are processed, processing proceeds to process block 566.
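The bin-assignment loop of blocks 562 and 564 might be sketched as follows. This is an assumption-laden illustration: entries are taken to be Python `bytes` objects so that each character value is an integer from 0 to 255, list elements are represented by their indices, and `assign_to_bins` is a hypothetical name.

```python
def assign_to_bins(entries, indices, pos, num_values=256):
    """Assign each listed entry to the bin for its character at `pos`.

    One bin exists per possible character value; each bin is a sublist
    of indices into `entries`.
    """
    bins = [[] for _ in range(num_values)]
    for i in indices:                 # loop until no list elements remain
        s = entries[i]
        if pos < len(s):
            bins[s[pos]].append(i)    # indexing bytes yields an int in 0..255
    return bins
```

The bin count checked in later processing is simply the length of each sublist in this sketch.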
Starting at process block 566, the bins are processed to retain the information relating to the groupings formed. At process block 566, one of the bins is selected. The bins may be selected in any order for processing.
At decision block 568, a check is made whether the bin count for the selected bin is greater than one. If the bin count is not greater than one, the group of elements in that bin does not need to be further processed. Accordingly, processing proceeds to decision block 572. Alternatively, if the bin count is greater than one, the group of elements in that bin is further processed at process block 570.
At process block 570, the groupings formed are saved for further processing. In addition, savings counts associated with each bin are updated. The elements forming a group may be recorded as a sublist and the skip list entries may be used to identify each sublist. However, any suitable method may be used to retain this information. In the embodiment of
If more bins remain to be processed, processing loops back from decision block 572 to process block 566. Processing continues in this fashion until all of the bins have been processed.
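The bin-processing loop of blocks 566 through 572 can be sketched as below. The text states only that savings counts are updated; the specific rule used here, crediting each member of a bin with the number of other members in that bin, is an assumption chosen because it equals the number of characters saved at that position if the member were used as the prefix. The name `process_bins` is likewise hypothetical.

```python
def process_bins(bins, savings):
    """Retain bins holding more than one element and update savings counts.

    `bins` is a list of sublists of element indices; `savings` is a list
    of per-element counters that is modified in place.
    """
    saved = []
    for group in bins:                         # bins may be taken in any order
        if len(group) > 1:                     # bin count greater than one
            for i in group:
                savings[i] += len(group) - 1   # assumed update rule
            saved.append(group)                # keep grouping for further processing
    return saved
```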
Having thus described several aspects of at least one embodiment of this invention, it is to be appreciated that various alterations, modifications, and improvements will readily occur to those skilled in the art. As one example, a prefix is selected to compress values in a field in an index for a configuration management system. If the index has multiple fields as shown in
As a further example, the embodiments shown use a full entry from the original data set as a prefix for compression. Alternatively, the size of the prefix may be set to some predetermined number of character positions. In such an embodiment, processing could be stopped after subgroups have been formed based on values in that character position.
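A minimal end-to-end sketch of prefix selection with an optional fixed prefix size is given below. It rests on stated assumptions: the savings count of an entry is taken to be the total number of leading characters it shares with all other entries, the entry with the largest count is selected, and `select_prefix` and `max_len` are hypothetical names.

```python
def select_prefix(entries, max_len=None):
    """Return (prefix, savings) for a non-empty list of strings.

    If `max_len` is given, grouping stops after that many character
    positions and the selected entry is truncated to that length.
    """
    savings = [0] * len(entries)
    work = [(0, list(range(len(entries))))]   # (position, element indices)
    while work:
        pos, group = work.pop()
        if max_len is not None and pos >= max_len:
            continue                          # predetermined prefix size reached
        bins = {}
        for i in group:
            if pos < len(entries[i]):
                bins.setdefault(entries[i][pos], []).append(i)
        for members in bins.values():
            if len(members) > 1:
                for i in members:
                    savings[i] += len(members) - 1
                work.append((pos + 1, members))
    best = max(range(len(entries)), key=savings.__getitem__)
    prefix = entries[best] if max_len is None else entries[best][:max_len]
    return prefix, savings


paths = ["src/main.c", "src/main.h", "src/util.c", "doc/readme"]
full_prefix, full_savings = select_prefix(paths)
short_prefix, short_savings = select_prefix(paths, max_len=4)
```

With the sample paths, limiting the prefix to four positions stops the grouping once the shared `src/` portion has been examined, so fewer sublists are formed than when full entries are considered.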
The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above. For example, the degenerative nature of the process, in which one list is segregated into multiple sublists, each of which is separately processed, makes the process amenable to implementation in a multiprocessing environment. Accordingly, any reference to the sequential nature of the process should be taken as a description of a logical flow of the algorithm and not a description of scheduling of tasks that may occur in a multiprocessor environment.
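Because each sublist can be processed independently of its siblings, the work may be divided among workers. The sketch below is one hypothetical arrangement using Python's standard `concurrent.futures` module: the top-level sublists are submitted to a thread pool and each is then processed sequentially. The function names and the savings rule (one credit per other group member per shared position) are assumptions; in CPython a process pool would likely be needed for a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split(entries, indices, pos):
    """Group indices by the character at `pos`; keep groups larger than one."""
    bins = {}
    for i in indices:
        if pos < len(entries[i]):
            bins.setdefault(entries[i][pos], []).append(i)
    return [g for g in bins.values() if len(g) > 1]


def savings_for_sublist(entries, indices, pos):
    """Process one sublist whose members agree on all positions below `pos`."""
    result = {i: len(indices) - 1 for i in indices}   # credit for position pos - 1
    for group in split(entries, indices, pos):
        for i, v in savings_for_sublist(entries, group, pos + 1).items():
            result[i] += v
    return result


def parallel_savings(entries, max_workers=4):
    """Compute savings counts, handling each top-level sublist in its own task."""
    savings = [0] * len(entries)
    top = split(entries, range(len(entries)), 0)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(savings_for_sublist, entries, g, 1) for g in top]
        for f in futures:
            for i, v in f.result().items():
                savings[i] = v        # sublists are disjoint, so no conflicts
    return savings
```

The top-level sublists are disjoint, so the tasks share no mutable state and their results can be merged without locking.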
Also, the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or conventional programming or scripting tools, and also may be compiled as executable machine language code.
In this respect, one embodiment of the invention is directed to a computer readable medium (or multiple computer readable media) (e.g., a computer memory, one or more floppy discs, compact discs, optical discs, magnetic tapes, etc.) encoded with one or more programs that, when executed on one or more computers or other processors, perform methods that implement the various embodiments of the invention discussed above. The computer readable medium or media can be transportable, such that the program or programs stored thereon can be loaded onto one or more different computers or other processors to implement various aspects of the present invention as discussed above.
The term “program” is used herein in a generic sense to refer to any type of computer code or set of instructions that can be employed to program a computer or other processor to implement various aspects of the present invention as discussed above. Additionally, it should be appreciated that according to one aspect of this embodiment, one or more computer programs that when executed perform methods of the present invention need not reside on a single computer or processor, but may be distributed in a modular fashion amongst a number of different computers or processors to implement various aspects of the present invention.
Various aspects of the present invention may be used alone, in combination, or in a variety of arrangements not specifically discussed in the embodiments described in the foregoing, and the invention is therefore not limited in its application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. For example, aspects described in one embodiment may be combined in any manner with aspects described in other embodiments.
Use of ordinal terms such as “first,” “second,” “third,” etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed. Such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
Also, the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items.
Such alterations, modifications, and improvements are intended to be part of this disclosure, and are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description and drawings are by way of example only.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 23 2005 | LIVSHITS, ARTEM Y | Microsoft Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 015901 | /0675 | |
Mar 24 2005 | Microsoft Corporation | (assignment on the face of the patent) | / | |||
Oct 14 2014 | Microsoft Corporation | Microsoft Technology Licensing, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 034543 | /0001 |
Date | Maintenance Fee Events |
Mar 23 2010 | ASPN: Payor Number Assigned. |
Mar 18 2013 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Jul 13 2017 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Sep 13 2021 | REM: Maintenance Fee Reminder Mailed. |
Feb 28 2022 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |