A system and method for clustering data on a server computer for transmission to a client computer. The server computer obtains a requested cluster size for the client computer. The requested cluster size includes the optimal cluster size the client computer can handle and the largest manageable cluster size that the client can handle. Fuzzy logic computations are performed on the data to determine an optimal cluster size and an optimal point at which to split the data for the particular client. Part of the cluster computation is based upon the affinity of individual data items to adjacent data items in the clustered list. The server computer also checks the affinity between the item with the largest score and the first item in the next cluster. If this affinity is higher than other affinity scores within the cluster, the cluster split is moved accordingly. Once an optimal cluster is determined, the data is transmitted from the server computer to the client computer.
|
1. A method for clustering data wherein the data includes a plurality of items, said method comprising:
obtaining requested cluster size data; calculating a score for each item; and determining a cluster size based upon the calculated score.
17. A computer operable medium for clustering data wherein the data includes a plurality of items, said medium comprising:
means for obtaining requested cluster size data; means for calculating a score for each item; and means for determining a cluster size based upon the calculating.
9. An information handling system for clustering data wherein the data includes a plurality of items, said system comprising:
a computer, the computer including: one or more processing units; a memory operatively coupled to the one or more processing units; and a nonvolatile storage area where the data is stored; a program executable by the one or more processing units, the program including: software code programmed to obtain requested cluster size data; software code programmed to calculate a score for each item; and software code programmed to determine a cluster size based upon the score for each item.
2. The method of
3. The method of
calculating an affinity score for each item, the affinity score relating to the similarity of each item to an adjacent item in the data.
4. The method of
calculating a fuzzy optimal score for each item, the fuzzy optimal score relating to an optimal cluster size usable by a client computer; and calculating a fuzzy maximum score for each item, the fuzzy maximum score relating to a largest manageable cluster size usable by a client computer.
5. The method of
calculating a total score for each item, the total score determined as the product of the affinity score for the item, the fuzzy optimal score for the item, and the fuzzy maximum score for the item.
6. The method of
7. The method of
comparing the affinity score for the last cluster item with the affinity score for each item; and selecting a new last cluster item, the last cluster item having a greater affinity score than the affinity score for the last cluster item.
8. The method of
connecting a server computer to a computer network; and receiving at the server computer the requested cluster size data from a client computer.
10. The information handling system of
11. The information handling system of
software code to calculate an affinity score for each item, the affinity score relating to the similarity of each item to an adjacent item in the data.
12. The information handling system of
software code to calculate a fuzzy optimal score for each item, the fuzzy optimal score relating to an optimal cluster size usable by a client computer; and software code to calculate a fuzzy maximum score for each item, the fuzzy maximum score relating to a largest manageable cluster size usable by a client computer.
13. The information handling system of
software code to calculate a total score for each item, the total score determined as the product of the affinity score for the item, the fuzzy optimal score for the item, and the fuzzy maximum score for the item.
14. The information handling system of
15. The information handling system of
software code to compare the affinity score for the last cluster item with the affinity score for each item; and software code to select a new last cluster item, the last cluster item having a greater affinity score than the affinity score for the last cluster item.
16. The information handling system of
software code to connect the computer to a computer network; and software code to receive at the server computer the requested cluster size data from a client computer.
18. The computer operable medium of
19. The computer operable medium of
means for calculating an affinity score for each item, the affinity score relating to the similarity of each item to an adjacent item in the data.
20. The computer operable medium of
means for calculating a fuzzy optimal score for each item, the fuzzy optimal score relating to an optimal cluster size usable by a client computer; and means for calculating a fuzzy maximum score for each item, the fuzzy maximum score relating to a largest manageable cluster size usable by a client computer.
21. The computer operable medium of
means for calculating a total score for each item, the total score determined as the product of the affinity score for the item, the fuzzy optimal score for the item, and the fuzzy maximum score for the item.
22. The computer operable medium of
23. The computer operable medium of
means for comparing the affinity score for the last cluster item with the affinity score for each item; and means for selecting a new last cluster item, the last cluster item having a greater affinity score than the affinity score for the last cluster item.
24. The computer operable medium of
means for connecting a server computer to a computer network; and means for receiving at the server computer the requested cluster size data from a client computer.
|
1. Field of the Invention
The present invention relates to information processing technology. More particularly, the present invention relates to a system and method for clustering large lists of information into optimal segments for human-computer interfaces using fuzzy logic.
2. Description of the Related Art
In client-server applications, the end user is often presented with lists of items from which to make choices or selections. Presenting an end user with a list of items from which to select introduces fewer errors into the system than relying on the end user to enter the information manually using the keyboard. However, allowing the user to select from a list of items becomes untenable when the list of items from which the user is selecting becomes extremely large. The point at which the list is untenable depends on various factors including the size of the user's display screen and the speed at which the data can be transmitted from the server to the client's workstation. For example, a list of 30 items may be very manageable on a 19" display with 1024×768 pixel resolution, but that same list is untenable on a 14" display with only 320×200 pixel resolution.
In the field of systems and network management, it is common to have lists of items consisting of thousands or tens of thousands of items. Traditional techniques for clustering these long lists rely on some characteristic or attribute of the item being displayed that can also be used for building the clustered list. For example, a long list of names could be clustered by their DNS hierarchy. A challenge arises, however, when the list is not evenly distributed across the characteristic or attribute used for clustering. Another challenge of the traditional method is that it often requires the end user to understand more information about the item that he or she wishes to select than is desirable.
It has been discovered that large lists of items can be grouped into clusters so that each cluster is a manageable size, the affinity between the last item of one cluster and the first item of the next cluster is reduced, and the affinity of items within a cluster is accordingly maximized. Clients may differ in display size and in connectivity to the server. Based on these factors, an optimal list size is determined that can be sent to the client computer and displayed on the client's display device. In addition, a maximum manageable list size is determined: the largest list that can reasonably be handled by the client given the client's connectivity to the server and the client's display size.
The clustering of list data is performed by determining several "fuzzy" scores. These fuzzy scores include the affinity of list items to one another, whether the list size is about equal to the optimal size that can be handled by the client, and whether the list size is less than or about equal to the maximum list size that can be handled by the client. A total score is determined using the aforementioned fuzzy scores. The optimal cluster size is determined by first selecting the cluster based upon the item that receives the highest total score followed by performing an affinity test to determine if the cluster size should be modified due to data list items having a greater affinity score than the affinity score for the item first selected based upon total score.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
The following is intended to provide a detailed description of an example of the invention and should not be taken to be limiting of the invention itself. Rather, any number of variations may fall within the scope of the invention which is defined in the claims following the description.
Once request 110 is received by server 100, the data is clustered by clustering routine 115, which clusters list 120 into manageable cluster sizes. Clustered list 125 is the same as list 120; however, in clustered list 125 breakpoints have been determined that separate clustered list 125 into one or more clusters depending upon the optimal list size and largest manageable list size that can be handled by client 105. While clustered list 125 is shown as a separate list for clarity, one embodiment forms clustered list 125 by storing pointers to the breakpoints forming the various clusters, with the pointers pointing to list items contained in list 120. Clustered list 125 is shown as including n clusters. If the number of clusters exceeds the largest manageable list size that can be handled by client 105, then the cluster list is clustered into further high-level cluster groups. These cluster groups can be iteratively clustered into further higher-level cluster lists until a cluster group is attained that does not exceed the largest manageable list size that can be handled by client 105. Node list 130 includes identifiable characteristics of the associated clusters. For example, if list 120 contained an alphabetical listing of names, the clusters may be divided based on the letter of the alphabet with which the items in the corresponding cluster begin, i.e., "Aaron to Azman", "Backer to Byron", "Cabot to Cymer", "Dabney to Dyson", etc. These cluster dividing points are returned to the user to provide a meaningful cluster selection. If the user is looking for someone with the last name of "Doyle", he or she would know to select the "Dabney to Dyson" cluster. In one embodiment, node list 130 also includes the logical instructions needed to retrieve the corresponding data. For the examples cited above, the logical instructions would be "item<B", "B<=item<C", "C<=item<D", "D<=item<E", etc.
Because the logical instructions may be more difficult for human readers to decipher, the logical instructions can be returned to client 105 as hidden text. At response 135, server 100 returns the information from node list 130 to client 105 for use by the user.
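The node list described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function names `build_node_list` and `fetch_cluster` are hypothetical, and the range bounds are taken from the first item of each cluster rather than from single letters such as "B" or "C" as in the example.

```python
from bisect import bisect_left

def build_node_list(clusters):
    """Build one node per cluster: a readable label plus range bounds.

    Assumption: each cluster is a non-empty, sorted list of strings and the
    clusters themselves are in sorted order.
    """
    nodes = []
    for i, cluster in enumerate(clusters):
        # Lower bound is inclusive; None means "no lower bound" (first cluster).
        lower = cluster[0] if i > 0 else None
        # Upper bound is exclusive; None means "no upper bound" (last cluster).
        upper = clusters[i + 1][0] if i + 1 < len(clusters) else None
        nodes.append({
            "label": f"{cluster[0]} to {cluster[-1]}",  # shown to the user
            "lower": lower,                             # hidden retrieval data
            "upper": upper,
        })
    return nodes

def fetch_cluster(sorted_items, node):
    """Return the items satisfying lower <= item < upper for the given node."""
    start = 0 if node["lower"] is None else bisect_left(sorted_items, node["lower"])
    end = len(sorted_items) if node["upper"] is None else bisect_left(sorted_items, node["upper"])
    return sorted_items[start:end]

names = ["Aaron", "Azman", "Backer", "Byron", "Cabot", "Cymer",
         "Dabney", "Doyle", "Dyson"]
clusters = [names[0:2], names[2:4], names[4:6], names[6:9]]
nodes = build_node_list(clusters)
fourth = fetch_cluster(names, nodes[3])
```

Keeping the bounds alongside the label lets the server answer an open request with two binary searches instead of re-running the clustering.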
Client 105 receives response 135 and presents the cluster list to the user in a textual selection list, such as a pull down menu, combo box, list box, or the like. In the example above, the user was interested in finding information about a user named "Doyle", so the user would select the "Dabney to Dyson" cluster. By opening up the "D" (the fourth) cluster, client 105 sends open instruction 140 to server 100 requesting the data for the fourth cluster. Server 100 in turn performs fetch cluster 145 to fetch the fourth cluster from clustered list 125. Fetch cluster 145 retrieves the fourth cluster from node list 130. Including logical instructions for the data as described above allows server 100 to quickly fetch the requested cluster. In the example, list data for cluster 4 is fetched and return cluster 150 returns the cluster from server 100 to client 105. The list of child nodes is returned to client 105 from server 100 in return list 155.
An embodiment of calculating the affinity score, "close to" optimal size score, and less than or about equal to the largest manageable size score is described below along with sample clustering data. After the total score has been calculated (at step 380), the loop reiterates (step 385) and processing begins for the next item in the list at the top of the loop (step 320). After all list items have been processed by the loop beginning with step 320, the total number of clusters for the list is compared with the largest manageable cluster size that can be handled by the client (step 390). If the number of clusters is less than or equal to the largest manageable cluster size, "yes" branch 392 is taken, bypassing further clusterization. However, if the number of clusters resulting from the loop beginning with step 320 exceeds the largest manageable cluster size, then "no" branch 394 is taken and the loop is recursively processed (step 396) for the list of clusters until a high-level cluster is achieved that is less than or equal to the largest manageable cluster size. For example, if a dictionary of several thousand words were being clustered and the largest manageable cluster size were 50, the low-level nodes would each contain 50 or fewer words. Because the number of clusters would also be greater than the largest manageable cluster size, the cluster lists themselves would be further clustered until the highest level cluster is found. In the example outlined above, the highest level cluster may be "A", "B", "C", . . . "Z". When the user clicks on a letter, the next level cluster would appear (i.e., "AA", "AB", "AC", . . . "AZ") and this process would continue until the lowest level clusters are displayed. Once all items have been clustered and cluster lists have also been clustered, if needed, processing terminates at step 399.
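The repeated clustering of the cluster list can be sketched as below. This is a simplified illustration under stated assumptions: a fixed-size `split` stands in for the fuzzy scoring step, and both `split` and `build_hierarchy` are hypothetical names.

```python
def split(items, max_size):
    """Stand-in for the scoring loop: cut a list into chunks of at most max_size.

    Assumption: the real breakpoints would come from the fuzzy total score;
    fixed-size chunks are used here only to show the multi-level structure.
    """
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]

def build_hierarchy(items, max_size):
    """Cluster items, then cluster the clusters until the top level fits."""
    if max_size < 2:
        raise ValueError("max_size must be at least 2 for the levels to shrink")
    level = split(items, max_size)
    # Corresponds to the "no" branch: too many clusters, so cluster them again.
    while len(level) > max_size:
        level = split(level, max_size)
    return level

# 120 items with a largest manageable size of 5 yields 24 low-level clusters,
# which are grouped once more into 5 top-level groups.
top = build_hierarchy(list(range(120)), 5)
```

Each pass reduces the number of entries by roughly a factor of `max_size`, so the number of levels grows only logarithmically with the list length.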
Turning now to
TABLE 1
Item | Cluster size | Is just before | Affinity | Is About Equal To Optimal Size | Is Less Than Or About Equal To Largest Manageable Size | Total Score |
dw3.tivoli.com | 1 | 0.67 | 0.33 | 0.04 | 1.00 | 0.01 |
dwakashi.dev.tivoli.com | 2 | 0.75 | 0.25 | 0.08 | 1.00 | 0.02 |
dwaltman.dev.tivoli.com | 3 | 0.67 | 0.33 | 0.12 | 1.00 | 0.04 |
dwidow.dev.tivoli.com | 4 | 0.75 | 0.25 | 0.16 | 1.00 | 0.04 |
dwilkins.dev.tivoli.com | 5 | 0.83 | 0.17 | 0.20 | 1.00 | 0.03 |
dwills.dev.tivoli.com | 6 | 0.75 | 0.25 | 0.24 | 1.00 | 0.06 |
dwise.dev.tivoli.com | 7 | 0.50 | 0.50 | 0.28 | 1.00 | 0.14 |
dyang.dev.tivoli.com | 8 | 0.86 | 0.14 | 0.32 | 1.00 | 0.04 |
dyang2.dev.tivoli.com | 9 | 0.67 | 0.33 | 0.36 | 1.00 | 0.12 |
dyounis1.dev.tivoli.com | 10 | 0.00 | 1.00 | 0.40 | 1.00 | 0.40 |
eagle.dev.tivoli.com | 11 | 0.67 | 0.33 | 0.44 | 1.00 | 0.15 |
eamon.dev.tivoli.com | 12 | 0.67 | 0.33 | 0.48 | 1.00 | 0.16 |
earth.dev.tivoli.com | 13 | 0.50 | 0.50 | 0.52 | 1.00 | 0.26 |
ebagley.training.tivoli.com | 14 | 0.50 | 0.50 | 0.56 | 1.00 | 0.28 |
ecarrie.dev.tivoli.com | 15 | 0.50 | 0.50 | 0.60 | 1.00 | 0.30 |
eddie.dev.tivoli.com | 16 | 0.67 | 0.33 | 0.64 | 1.00 | 0.21 |
edenor-msio.dev.tivoli.com | 17 | 1.00 | 0.00 | 0.68 | 1.00 | 0.00 |
edenor-sftiii.dev.tivoli.com | 18 | 1.00 | 0.00 | 0.72 | 1.00 | 0.00 |
edenor1.dev.tivoli.com | 19 | 1.00 | 0.00 | 0.76 | 1.00 | 0.00 |
edenor2.dev.tivoli.com | 20 | 0.50 | 0.50 | 0.80 | 1.00 | 0.40 |
eel.dev.tivoli.com | 21 | 0.67 | 0.33 | 0.84 | 1.00 | 0.28 |
eemejul1.dev.tivoli.com | 22 | 0.50 | 0.50 | 0.88 | 1.00 | 0.44 |
efollis.dev.tivoli.com | 23 | 0.67 | 0.33 | 0.92 | 1.00 | 0.30 |
efran.dev.tivoli.com | 24 | 0.75 | 0.25 | 0.96 | 1.00 | 0.24 |
efron-nt.dev.tivoli.com | 25 | 0.86 | 0.14 | 1.00 | 1.00 | 0.14 |
efron.dev.tivoli.com | 26 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 |
eggflip.dev.tivoli.com | 27 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 |
ehalliday.dev.tivoli.com | 28 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 |
eilbott.dev.tivoli.com | 29 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
einstein.dev.tivoli.com | 30 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 |
ekartzma.dev.tivoli.com | 31 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
ekulchak.training.tivoli.com | 32 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 |
el-guardo.dev.tivoli.com | 33 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
eliu.dev.tivoli.com | 34 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
elm.dev.tivoli.com | 35 | 0.75 | 0.25 | 1.00 | 1.00 | 0.25 |
elmo.dev.tivoli.com | 36 | 0.67 | 0.33 | 0.96 | 1.00 | 0.32 |
elvis.dev.tivoli.com | 37 | 0.50 | 0.50 | 0.92 | 1.00 | 0.46 |
enceladus.dev.tivoli.com | 38 | 0.67 | 0.33 | 0.88 | 1.00 | 0.29 |
engulf.dev.tivoli.com | 39 | 0.67 | 0.33 | 0.84 | 1.00 | 0.28 |
enigma.dev.tivoli.com | 40 | 0.67 | 0.33 | 0.80 | 1.00 | 0.26 |
ennuncio.dev.tivoli.com | 41 | 0.67 | 0.33 | 0.76 | 1.00 | 0.25 |
enterprise.dev.tivoli.com | 42 | 0.50 | 0.50 | 0.72 | 1.00 | 0.36 |
eofenste.dev.tivoli.com | 43 | 0.67 | 0.33 | 0.68 | 1.00 | 0.22 |
eoliver-nt.dev.tivoli.com | 44 | 1.00 | 0.00 | 0.64 | 1.00 | 0.00 |
eoliver.dev.tivoli.com | 45 | 0.50 | 0.50 | 0.60 | 1.00 | 0.30 |
epurzer.training.tivoli.com | 46 | 0.50 | 0.50 | 0.56 | 0.93 | 0.26 |
equator.dev.tivoli.com | 47 | 0.50 | 0.50 | 0.52 | 0.87 | 0.23 |
eratlif1.dev.tivoli.com | 48 | 1.00 | 0.00 | 0.48 | 0.80 | 0.00 |
eratlif2.dev.tivoli.com | 49 | 0.67 | 0.33 | 0.44 | 0.73 | 0.11 |
eridani.dev.tivoli.com | 50 | 0.67 | 0.33 | 0.40 | 0.67 | 0.09 |
ernie.dev.tivoli.com | 51 | 0.67 | 0.33 | 0.36 | 0.60 | 0.07 |
eros.dev.tivoli.com | 52 | 0.67 | 0.33 | 0.32 | 0.53 | 0.06 |
ersmith.training.tivoli.com | 53 | 0.67 | 0.33 | 0.28 | 0.47 | 0.04 |
erupt.dev.tivoli.com | 54 | 0.50 | 0.50 | 0.24 | 0.40 | 0.05 |
esc-dev.dev.tivoli.com | 55 | 0.83 | 0.17 | 0.20 | 0.33 | 0.01 |
esc-staging.dev.tivoli.com | 56 | 0.75 | 0.25 | 0.16 | 0.27 | 0.01 |
escher.dev.tivoli.com | 57 | 0.75 | 0.25 | 0.12 | 0.20 | 0.01 |
escort.dev.tivoli.com | 58 | 0.67 | 0.33 | 0.08 | 0.13 | 0.00 |
esher.dev.tivoli.com | 59 | 0.50 | 0.50 | 0.04 | 0.07 | 0.00 |
etch.dev.tivoli.com | 60 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 |
In Table 1, assume all items shown fall within the largest manageable cluster size for the current cluster. The item with the highest total score, "ekulchak.training.tivoli.com", falls right before the item "el-guardo.dev.tivoli.com"; however, the item with the lowest "is just before" value (and therefore the highest affinity value), "dyounis1.dev.tivoli.com", falls right before the item "eagle.dev.tivoli.com." It makes more sense to split the clusters at this spot, creating a small cluster bounded by dw3.tivoli.com and dyounis1.dev.tivoli.com, than to expand the cluster so as to achieve the optimal score. Expanding the cluster to the optimal size would create an unnatural grouping of items (dw-ek), whereas the smaller cluster, while sub-optimal in size, is better optimized for usability (dw-e). Referring back to
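The affinity test applied to Table 1 can be illustrated with a short sketch. The function name `adjust_breakpoint` is hypothetical, and two assumptions are made: only items before the initially selected breakpoint are examined, and when several items share the greatest affinity value the earliest one is chosen.

```python
def adjust_breakpoint(affinities, breakpoint):
    """Move the breakpoint to an earlier item with a strictly greater affinity value.

    affinities[i] is the "Affinity" column value (1 - "is just before") of item i.
    breakpoint is the 0-based index of the item selected by total score.
    """
    best = breakpoint
    for i in range(breakpoint):
        # Strict comparison: ties keep the earlier choice.
        if affinities[i] > affinities[best]:
            best = i
    return best

# Affinity column of Table 1, rows 1 through 32 (dw3 ... ekulchak).
table1_affinity = [
    0.33, 0.25, 0.33, 0.25, 0.17, 0.25, 0.50, 0.14, 0.33, 1.00,
    0.33, 0.33, 0.50, 0.50, 0.50, 0.33, 0.00, 0.00, 0.00, 0.50,
    0.33, 0.50, 0.33, 0.25, 0.14, 0.50, 0.50, 0.50, 0.33, 0.50,
    0.33, 0.50,
]

initial = 31  # row 32, "ekulchak.training.tivoli.com"
new_last_item = adjust_breakpoint(table1_affinity, initial)  # index of row 10
```

With the Table 1 values, the breakpoint moves from row 32 to row 10 ("dyounis1.dev.tivoli.com"), whose affinity value of 1.00 exceeds the 0.50 of the item first selected.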
Table 2 (shown below) illustrates a sample cluster following processing by the flowcharts shown in
TABLE 2
Sample Cluster
(1) Item | (2) Cluster size | (3) Is just before | (4) Affinity (1-C) | (5) About Equal to Optimal Size | (6) Less Than Or About Equal To Max. Manageable Size | (7) Total Score |
a | 1 | 0.50 | 0.50 | 0.05 | 1.00 | 0.03 |
ability | 2 | 0.67 | 0.33 | 0.10 | 1.00 | 0.03 |
able | 3 | 0.67 | 0.33 | 0.15 | 1.00 | 0.05 |
about | 4 | 0.50 | 0.50 | 0.20 | 1.00 | 0.10 |
across | 5 | 0.50 | 0.50 | 0.25 | 1.00 | 0.13 |
address | 6 | 0.88 | 0.12 | 0.30 | 1.00 | 0.04 |
addressability | 7 | 0.90 | 0.10 | 0.35 | 1.00 | 0.04 |
addressable | 8 | 0.88 | 0.12 | 0.40 | 1.00 | 0.05 |
addressed | 9 | 0.67 | 0.33 | 0.45 | 1.00 | 0.15 |
advanced | 10 | 0.50 | 0.50 | 0.50 | 1.00 | 0.25 |
agent | 11 | 0.83 | 0.17 | 0.55 | 1.00 | 0.09 |
agent. | 12 | 0.50 | 0.50 | 0.60 | 1.00 | 0.30 |
all | 13 | 0.75 | 0.25 | 0.65 | 1.00 | 0.16 |
allow | 14 | 0.83 | 0.17 | 0.70 | 1.00 | 0.12 |
allows | 15 | 0.67 | 0.33 | 0.75 | 1.00 | 0.25 |
already | 16 | 0.67 | 0.33 | 0.80 | 1.00 | 0.26 |
alternatively | 17 | 0.75 | 0.25 | 0.85 | 1.00 | 0.21 |
although | 18 | 0.50 | 0.50 | 0.90 | 1.00 | 0.45 |
an | 19 | 0.67 | 0.33 | 0.95 | 1.00 | 0.31 |
and | 20 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
announce | 21 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
another | 22 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
answer | 23 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
any | 24 | 0.75 | 0.25 | 1.00 | 1.00 | 0.25 |
anything | 25 | 0.75 | 0.25 | 1.00 | 1.00 | 0.25 |
anywhere | 26 | 0.50 | 0.50 | 1.00 | 1.00 | 0.50 |
API | 27 | 0.67 | 0.33 | 1.00 | 1.00 | 0.33 |
application | 28 | 0.92 | 0.08 | 1.00 | 1.00 | 0.08 |
applications | 29 | 0.80 | 0.20 | 1.00 | 1.00 | 0.20 |
apply | 30 | 0.75 | 0.25 | 1.00 | 1.00 | 0.25 |
approach | 31 | 0.50 | 0.50 | 0.95 | 1.00 | 0.48 |
architecture | 32 | 0.92 | 0.08 | 0.90 | 1.00 | 0.07 |
architecture. | 33 | 0.67 | 0.33 | 0.85 | 1.00 | 0.28 |
are | 34 | 0.75 | 0.25 | 0.80 | 1.00 | 0.20 |
areas | 35 | 0.67 | 0.33 | 0.75 | 1.00 | 0.25 |
arrangements | 36 | 0.50 | 0.50 | 0.70 | 1.00 | 0.35 |
as | 37 | 0.67 | 0.33 | 0.65 | 1.00 | 0.21 |
aspects | 38 | 0.67 | 0.33 | 0.60 | 1.00 | 0.20 |
associated | 39 | 0.75 | 0.25 | 0.55 | 1.00 | 0.14 |
assumed | 40 | 0.86 | 0.14 | 0.50 | 1.00 | 0.07 |
assumes | 41 | 0.67 | 0.33 | 0.45 | 1.00 | 0.15 |
asynchronicity | 42 | 0.50 | 0.50 | 0.40 | 1.00 | 0.20 |
at | 43 | 0.67 | 0.33 | 0.35 | 1.00 | 0.12 |
attached | 44 | 0.50 | 0.50 | 0.30 | 1.00 | 0.15 |
availability | 45 | 0.88 | 0.12 | 0.25 | 1.00 | 0.03 |
available | 46 | 0.00 | 1.00 | 0.20 | 1.00 | 0.20 |
back | 47 | 0.80 | 0.20 | 0.15 | 1.00 | 0.03 |
backup | 48 | 0.67 | 0.33 | 0.10 | 1.00 | 0.03 |
bandwidth | 49 | 0.67 | 0.33 | 0.05 | 1.00 | 0.02 |
based | 50 | 0.67 | 0.33 | 0.00 | 1.00 | 0.00 |
Column 1 contains the item that is being clustered. Column 2 contains the current cluster size. Column 3 contains the "is just before" value, which relates to the item's affinity with the next item in the list. Depending upon the kind of data being clustered, different affinity formulas can be used. In the example shown, a simple alphanumeric text list is being clustered. The "is just before" value is calculated by determining the position of the first character that differs between the item and the next item. For example, the "is just before" value for "backup" is 0.67 because the next item is the word "bandwidth", and "backup" and "bandwidth" have the same first 2 characters and differ with respect to the third character. Using the formula 1-(1/character position) results in a 0.67 value for the item "backup" and a 0.8 value for the item "back." The fourth column is the affinity value, which is calculated as the complement (1-"is just before" value). When breaking cluster sizes, the determination is made by either finding the lower "is just before" value for the item or the higher "affinity" value for the item. During the discussion of affinity used for breaking clusters in
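The "is just before" formula 1-(1/character position) and the derived affinity value can be sketched as follows. The function names are hypothetical, and one case is an assumption not covered by the text: when one string is a prefix of the other, the first difference is taken to be the position just past the shorter string, which reproduces the 0.8 value given for "back" followed by "backup".

```python
def is_just_before(item: str, next_item: str) -> float:
    """Return 1 - 1/p, where p is the 1-based position of the first differing character."""
    position = 1
    for a, b in zip(item, next_item):
        if a != b:
            break
        position += 1
    # If the loop ran to completion, position is one past the shorter string.
    return 1.0 - 1.0 / position

def affinity(item: str, next_item: str) -> float:
    """Column 4 of Table 2: the complement of the "is just before" value."""
    return 1.0 - is_just_before(item, next_item)

backup_value = round(is_just_before("backup", "bandwidth"), 2)  # differs at position 3
back_value = round(is_just_before("back", "backup"), 2)         # differs at position 5
available_value = is_just_before("available", "back")           # differs at position 1
```

An item whose successor differs in the very first character scores 0.0 for "is just before" and 1.0 for affinity, which is why "available" is such a strong candidate breakpoint in Table 2.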
The fifth column is the "is just about equal to optimal size" value. To understand how this fuzzy value is determined, a fuzzy "shape" is applied to the data. In Table 2, the shape used is trapezoidal. Other shapes can be triangular and curvilinear (such as a beta curve or Gaussian curve). The sixth column is the "is less than or about equal to" the largest manageable cluster size value. This fuzzy value can be determined as a linear, or shoulder, value or as a sigmoid value.
The optimal value scores shown in Table 2 use the trapezoidal function. The optimal value used in Table 2 is 25. The optimal range is then computed as being any cluster size from 20 through 30. These cluster sizes are each considered "close to" optimal. As the numbers fall away from the optimal range, the "close to" optimal value is computed by the formula 1-(distance from optimal range/20). The number 20 is used in the formula because that is the first number that falls within the optimal range.
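The two fuzzy membership functions described above can be sketched as below. The function names are hypothetical. The trapezoid defaults (range 20 through 30, divisor 20) are the Table 2 values stated in the text; the shoulder parameters used in the example (full membership up to size 45, zero at size 60) are not stated in the text and are read off column 6 of Table 1 as an assumption.

```python
def about_equal_to_optimal(size, low=20, high=30, falloff=20):
    """Trapezoidal "about equal to optimal size" score.

    Sizes inside [low, high] score 1.0; outside, the score is
    1 - (distance from the optimal range / falloff), floored at 0.
    """
    if low <= size <= high:
        return 1.0
    distance = low - size if size < low else size - high
    return max(0.0, 1.0 - distance / falloff)

def at_most_largest_manageable(size, full_until, zero_at):
    """Linear (shoulder) "less than or about equal to largest manageable size" score."""
    if size <= full_until:
        return 1.0
    if size >= zero_at:
        return 0.0
    return (zero_at - size) / (zero_at - full_until)

optimal_19 = round(about_equal_to_optimal(19), 2)  # one below the optimal range
optimal_31 = round(about_equal_to_optimal(31), 2)  # one above the optimal range
shoulder_46 = round(at_most_largest_manageable(46, full_until=45, zero_at=60), 2)
```

A sigmoid could replace the linear shoulder, as the text notes; the linear form is used here only because it matches the evenly spaced values in Table 1.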
Referring back to
The seventh column in Table 2, Total Score is calculated by multiplying the affinity value (column 4), the "about equal to optimal size" value (column 5), and the "less than or about equal to largest manageable cluster size" value (column 6). The highlighted row (row 26) shows the item with the highest total score. This item ("anywhere") would be the last item in the cluster. The first item in the next cluster would be "API." Note that no other item in the cluster range has a higher affinity score than the breakpoint shown ("anywhere" has an affinity score of 0.5), therefore the processing of the flowchart in
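The total score calculation and the selection of the last cluster item can be illustrated with a few rows of Table 2. This is a sketch only: the function names are hypothetical, and resolving ties in favor of the earliest item is an assumption (it is simply how Python's `max` behaves).

```python
def total_score(affinity, optimal, manageable):
    """Column 7: the product of columns 4, 5, and 6."""
    return affinity * optimal * manageable

def best_breakpoint(rows):
    """Return the index of the row with the highest total score."""
    return max(range(len(rows)), key=lambda i: total_score(*rows[i][1:]))

# (item, affinity, about-equal-to-optimal, less-than-or-about-equal-to-max)
# taken from rows 24 to 28 and row 31 of Table 2.
rows = [
    ("any",         0.25, 1.00, 1.00),
    ("anything",    0.25, 1.00, 1.00),
    ("anywhere",    0.50, 1.00, 1.00),
    ("API",         0.33, 1.00, 1.00),
    ("application", 0.08, 1.00, 1.00),
    ("approach",    0.50, 0.95, 1.00),
]

last_item = rows[best_breakpoint(rows)][0]
```

Because the score is a product, any factor near zero vetoes a breakpoint: "approach" has the same affinity as "anywhere" but loses because its cluster size has already drifted outside the optimal range.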
While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from this invention and its broader aspects and, therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. For non-limiting example, as an aid to understanding, the following appended claims contain usage of the introductory phrases "at least one" and "one or more" to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an"; the same holds true for the use in the claims of definite articles.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 06 2000 | HART, DAVID G | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 010657 | /0203 | |
Mar 09 2000 | International Business Machines Corporation | (assignment on the face of the patent) | / |