Clustering nominal attributes with a nominal population metric enables comparisons of entities that are not otherwise easily comparable. In some embodiments, nominal population metrics are determined from pairwise comparisons, using a similarity matrix and a nominal population matrix. In other embodiments, nominal population metrics are determined from attribute-value distributions, using a nominal population matrix. A computing device with the appropriate hardware and applications is able to compute the nominal population metrics.
1. A method of determining non-numerical population metrics using a computing device comprising:
a. comparing a plurality of entities with a processor of the computing device thereby forming one or more entity comparisons, wherein each entity has a common non-numerical attribute and corresponding non-numerical value such that the one or more entity comparisons each associate two of the corresponding non-numerical values forming a non-numerical attribute value pair;
b. determining a coverage percentage for one or more of the non-numerical attribute value pairs based on the one or more entity comparisons with the processor;
c. storing the coverage percentage in a memory of the computing device; and
d. determining a non-numerical population metric based on the coverage percentage with the processor;
wherein the two of the corresponding non-numerical values forming the non-numerical attribute value pair come from different entities of the plurality of entities.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
9. The method of
10. The method of
11. A computing device comprising:
a. a processor;
b. a memory coupled to the processor; and
c. an application stored in the memory and processed by the processor, the application for:
i. comparing a plurality of entities thereby forming one or more entity comparisons, wherein each entity has a common non-numerical attribute and corresponding non-numerical value such that the one or more entity comparisons each associate two of the corresponding non-numerical values forming a non-numerical attribute value pair; and
ii. determining a coverage percentage for one or more of the non-numerical attribute value pairs based on the one or more entity comparisons;
wherein the processor and the memory are integrated into a single device, and further wherein the two of the corresponding non-numerical values forming the non-numerical attribute value pair come from different entities of the plurality of entities.
12. The device of
13. The device of
14. The device of
15. A system comprising:
a. a processing component;
b. a memory component coupled to the processing component; and
c. an application component stored in the memory component and processed by the processing component, the application component for:
i. comparing a plurality of entities thereby forming one or more entity comparisons, wherein each entity has a common non-numerical attribute and corresponding non-numerical value such that the one or more entity comparisons each associate two of the corresponding non-numerical values forming a non-numerical attribute value pair; and
ii. determining a coverage percentage for one or more of the non-numerical attribute value pairs based on the one or more entity comparisons;
wherein the two of the corresponding non-numerical values forming the non-numerical attribute value pair come from different entities of the plurality of entities.
16. The device of
17. The system of
18. The method of
19. The system of
This application claims priority under 35 U.S.C. § 119(e) of the co-pending, co-owned U.S. Provisional Patent Application Ser. No. 60/897,795, filed Jan. 26, 2007, and entitled “NOMINAL POPULATION METRIC: CLUSTERING OF NOMINAL APPLICATION ATTRIBUTES.” The Provisional Patent Application Ser. No. 60/897,795, filed Jan. 26, 2007, and entitled “NOMINAL POPULATION METRIC: CLUSTERING OF NOMINAL APPLICATION ATTRIBUTES” is also hereby incorporated by reference in its entirety.
The present invention relates to the field of clustering. More specifically, the present invention relates to the field of clustering nominal attributes.
When institutions such as colleges, universities or businesses review applications, certain elements of the applications are easily comparable. For example, if one student has a Grade Point Average (GPA) of 4.0 and another student has a GPA of 2.5, the GPA of 4.0 is clearly higher. Although the GPA is likely weighted by the quality of the school, there are still means of assigning values and computing a score that is easily compared. For example, if schools are rated by tiers and the first student's school is rated 0.9 while the second student's school is rated 1.0, the first student receives a score of 4.0 × 0.9 = 3.6 and the second student a score of 2.5 × 1.0 = 2.5, so the score of 3.6 is higher (better). However, some elements of applications are not easily quantifiable or comparable. Furthermore, laws restrict using some characteristics in certain ways, such as quotas, because such uses are viewed as discriminatory. For example, using a quota system to accept a certain number of people of a specific race has been ruled unconstitutional.
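The tier-weighting arithmetic in the example above amounts to one multiplication per student. The following is a minimal Python sketch; the function name weighted_score is hypothetical, and the only rule taken from the text is GPA multiplied by the school's tier rating.

```python
def weighted_score(gpa: float, school_rating: float) -> float:
    """Scale a GPA by the tier rating of the school that issued it (hypothetical helper)."""
    return gpa * school_rating


# Values from the example: the first student's school is rated 0.9,
# the second student's school is rated 1.0.
first = weighted_score(4.0, 0.9)
second = weighted_score(2.5, 1.0)
print(round(first, 2), round(second, 2))  # 3.6 2.5
```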
Some attempts have been made to compare elements such as race, but they simply use a binary indicator of whether or not an application has the element. Such attempts do not provide a sufficient comparison of the applications.
In one aspect, a method of determining nominal population metrics using a computing device comprises comparing one or more attributes from a plurality of entities to obtain one or more difference measures, determining a coverage percentage for attribute pairs based on the one or more attributes and storing the coverage percentage in a memory of the computing device. The coverage percentage is determined by the occurrence frequency of the attribute pairs. The method further comprises determining an inverse coverage percentage by subtracting the coverage percentage from 100%. The method further comprises selecting one or more entities from the plurality of entities based in part on the inverse coverage percentage. The one or more attributes include at least one of race, gender, major, extracurricular activity, wealth, nationality and country of origin. The plurality of entities are selected from the group consisting of admissions applications, job applications and scholarships. The difference measures are contained within a similarity matrix. The attribute pairs and the coverage percentage are contained in a nominal population matrix. The computing device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart appliance and a gaming console.
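A minimal sketch of the coverage and inverse coverage computation, assuming that coverage is the share of all pairwise entity comparisons in which a given value pair occurs, and that a pair such as W-B is the same as B-W. Both assumptions, the function names and the four Race values are illustrative; the source states only that coverage follows the occurrence frequency of the pairs and that inverse coverage is 100% minus coverage.

```python
from collections import Counter
from itertools import combinations


def coverage_percentages(values):
    """Share of all pairwise comparisons in which each value pair occurs.

    Assumptions (not specified in the source): every entity is compared with
    every other entity exactly once, and W-B is treated the same as B-W.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in combinations(values, 2))
    total = sum(pair_counts.values())
    return {pair: count / total for pair, count in pair_counts.items()}


def inverse_coverage(coverage):
    """Inverse coverage = 100% - coverage, expressed here as 1.0 - fraction."""
    return {pair: 1.0 - share for pair, share in coverage.items()}


# Hypothetical Race values for four applications.
races = ["W", "W", "B", "H"]
coverage = coverage_percentages(races)
inverse = inverse_coverage(coverage)
for pair in sorted(coverage):
    print("-".join(pair), f"{coverage[pair]:.0%}", f"{inverse[pair]:.0%}")
```

With four applications there are six comparisons; the W-B pair appears in two of them, so its coverage is about 33% and its inverse coverage about 67%.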
In another aspect, a method of determining nominal population metrics using a computing device comprises determining a plurality of distributions for one or more attributes from a plurality of entities, determining nominal differences based on the plurality of distributions and storing the nominal differences in a memory of the computing device. The nominal differences are determined by multiplying two distributions of the plurality of distributions and subtracting the product from 1. The method further comprises selecting one or more entities from the plurality of entities based in part on the nominal differences. The one or more attributes include at least one of race, gender, major, extracurricular activity, wealth, nationality and country of origin. The plurality of entities are selected from the group consisting of admissions applications, job applications and scholarships. The computing device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart appliance and a gaming console.
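The nominal difference described above can be written as a formula. Here p_a and p_b are symbols introduced for illustration: the distributions, as fractions of the entities, of two distinct values a and b of the same attribute. Assuming each entity has exactly one value for the attribute (as in the Race example, whose distribution sums to 100%), p_a + p_b is at most 1, and the AM-GM inequality then bounds the product:

```latex
d(a, b) = 1 - p_a \, p_b,
\qquad
p_a \, p_b \le \left( \frac{p_a + p_b}{2} \right)^{2} \le \frac{1}{4}
\;\Rightarrow\;
\frac{3}{4} \le d(a, b) \le 1 .
```

Within that range, a pair of common values (large product p_a p_b) receives a smaller nominal difference than a pair involving a rare value.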
In yet another aspect, a method of determining nominal population metrics using a computing device comprises generating a first matrix comprising a first identification column identifying a first set of entities, a second identification column identifying a second set of entities and a difference measure column containing a difference amount between a first entity identified in the first identification column and a second entity identified in the second identification column, generating a second matrix comprising an attribute pair column containing a set of attribute pairs and a coverage column containing coverage data based on similarities of the attribute pairs in the attribute pair column, generating a third matrix comprising the attribute pair column containing the set of attribute pairs and an inverse coverage column containing inverse coverage data based on the coverage data and storing the third matrix in a memory of the computing device. The inverse coverage data is the result of the coverage data subtracted from 100%. The set of attribute pairs is related to at least one of race, gender, major, extracurricular activity, wealth, nationality and country of origin. The first set of entities and the second set of entities are selected from the group consisting of admissions applications, job applications and scholarships. The computing device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart appliance and a gaming console.
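The three matrices described above can be sketched as lists of rows. This is an illustration under stated assumptions, not the patented procedure: the source does not define the difference measure, so the hypothetical difference_measure below uses the share of attributes whose values differ; coverage is assumed to be the share of comparisons in which an attribute pair occurs; and pair labels are written in alphabetical order (B-W rather than W-B). The entity records and IDs are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical applicant records keyed by application ID.
entities = {
    1: {"race": "W", "gender": "F"},
    2: {"race": "B", "gender": "F"},
    3: {"race": "W", "gender": "M"},
    4: {"race": "H", "gender": "M"},
}


def difference_measure(x, y):
    """Assumed difference measure: share of attributes whose values differ."""
    keys = list(x)
    return sum(x[k] != y[k] for k in keys) / len(keys)


def build_matrices(entities, attribute):
    # First matrix: (ID1, ID2, difference measure) for every pair of entities.
    first = [
        (i, j, difference_measure(entities[i], entities[j]))
        for i, j in combinations(sorted(entities), 2)
    ]
    # Second matrix: (attribute pair, coverage) where coverage is the share of
    # comparisons in which that pair of values occurs (assumption).
    counts = Counter(
        "-".join(sorted((entities[i][attribute], entities[j][attribute])))
        for i, j, _ in first
    )
    total = len(first)
    second = [(pair, n / total) for pair, n in sorted(counts.items())]
    # Third matrix: the same attribute pairs with inverse coverage = 1 - coverage.
    third = [(pair, 1.0 - cov) for pair, cov in second]
    return first, second, third


similarity_matrix, population_matrix, inverse_matrix = build_matrices(entities, "race")
for row in inverse_matrix:
    print(row)
```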
In another aspect, a method of determining nominal population metrics using a computing device comprises generating a matrix comprising an attribute column containing a plurality of attributes and a distribution column containing distribution percentages of the plurality of attributes, computing a nominal difference utilizing the distribution percentages of the plurality of attributes from the distribution column and storing the nominal difference in a memory in the computing device.
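Building the attribute column and distribution column of such a matrix can be sketched with a frequency count. The function name and the pool of 20 Race values are hypothetical; the pool was chosen only so that its shares reproduce Table 4 of the detailed description.

```python
from collections import Counter


def distribution_matrix(values, categories):
    """Rows of (attribute value, fraction of entities having that value)."""
    counts = Counter(values)
    total = len(values)
    # Counter returns 0 for values that never occur, e.g. "Other" below.
    return [(category, counts[category] / total) for category in categories]


# Hypothetical pool of 20 applications, chosen so the shares match Table 4.
pool = ["B"] * 2 + ["W"] * 14 + ["A"] * 3 + ["H"] * 1
for race, share in distribution_matrix(pool, ["B", "W", "A", "H", "Other"]):
    print(f"{race}\t{share:.0%}")
```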
In yet another aspect, a computing device comprises a processor, a memory coupled to the processor and an application stored in the memory and processed by the processor, the application for comparing one or more attributes from a plurality of entities to obtain a difference measure and determining a coverage percentage for attribute pairs based on the one or more attributes. The coverage percentage is determined by the occurrence frequency of the attribute pairs. The application further determines an inverse coverage percentage from the coverage percentage by subtracting the coverage percentage from 100%.
In another aspect, an apparatus comprises a processing component, a memory component coupled to the processing component and an application component stored in the memory component and processed by the processing component, the application component for comparing one or more attributes from a plurality of entities to obtain a difference measure and determining a coverage percentage for attribute pairs based on the one or more attributes.
Nominal attributes are those that do not have numerical values; race, gender, major, extracurricular activity, wealth, nationality and country of origin are just a few examples. Because such attributes have no numerical values, measuring the similarity or difference between nominal attribute values is difficult. In most clustering or data mining applications, nominal attributes are therefore assigned values that allow them to be compared. This is common practice, but it requires some analysis of the nominal attributes to determine the value system. To provide an accurate measure for comparing nominal attributes, the Nominal Population Metric (NPM) is able to be used. The NPM is able to be used for analyzing a variety of entities such as admissions applications, job applications, scholarships and more.
The NPM begins by identifying the nominal attributes. For example purposes, the attribute Race is discussed below. The example also assumes there are 20 applications in the applicant pool, where Race is one of many attributes. In this example, the unique values for Race are W (White), B (Black), H (Hispanic), A (Asian) and O (Other). Every application is compared to every other application, which results in 190 total combinations for 20 applications compared 2 at a time. The number of combinations is calculated as C(n, r) = n!/[(n − r)! r!], where n = 20 and r = 2, so C(20, 2) = 20!/(18! 2!) = (20 × 19)/2 = 190. If these combination pairs are placed into a matrix with the application pairs side by side, the NPM is able to be utilized to process the nominal attributes as follows:
TABLE 1
Example Similarity Matrix with Difference Measures
Application ID1 | Application ID2 | Difference Measure
1 | 2 | 20%
1 | 3 | 15%
1 | 4 | 75%
... | ... | ...
19 | 20 | 10%
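The pairwise enumeration behind Table 1 can be sketched with the Python standard library: itertools.combinations yields each unordered pair of application IDs once, and math.comb (Python 3.8 or later) gives the count. Numbering the applications 1 to 20 follows Table 1; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

N_APPLICATIONS = 20  # pool size used in the Race example

# Every unordered pair of application IDs 1..20, each listed once (ID1 < ID2).
pairs = list(combinations(range(1, N_APPLICATIONS + 1), 2))

print(len(pairs), comb(N_APPLICATIONS, 2))  # 190 190
print(pairs[0], pairs[-1])  # (1, 2) (19, 20)
```

Filling in the Difference Measure column would additionally require a per-pair comparison function, which the example does not specify.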
TABLE 2
Example Nominal Population Matrix
Race | Coverage
W-B | 30%
W-H | 15%
B-A | 5%
... | ...
TABLE 3
Example Nominal Population Matrix
Race | Inverse Coverage
W-B | 70%
W-H | 85%
B-A | 95%
... | ...
An alternative to using attribute pairs is to use the distribution of each attribute value. Continuing the Race example, Table 4 shows the distribution of race within the base table of applications; for example, 70% of the rows have ‘W’ for Race.
TABLE 4
Example Nominal Population Matrix
Race | Distribution
B | 10%
W | 70%
A | 15%
H | 5%
Other | 0%
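Applying the distribution-based nominal difference (1 minus the product of two distributions) to every pair of Race values in Table 4 takes a short loop. The distribution values are copied from Table 4; the dictionary and function names are illustrative assumptions.

```python
from itertools import combinations

# Distribution values copied from Table 4 (fractions of the applicant pool).
DISTRIBUTION = {"B": 0.10, "W": 0.70, "A": 0.15, "H": 0.05, "Other": 0.00}


def nominal_difference(p_a: float, p_b: float) -> float:
    """Multiply the two distributions and subtract the product from 1."""
    return 1.0 - p_a * p_b


# One entry per unordered pair of distinct Race values (10 pairs).
table = {
    f"{a}-{b}": nominal_difference(DISTRIBUTION[a], DISTRIBUTION[b])
    for a, b in combinations(DISTRIBUTION, 2)
}
for pair, difference in table.items():
    print(f"{pair}: {difference:.4f}")
```

Because Other has a 0% distribution in Table 4, its product with every value is 0 and its nominal difference is the maximum of 1.0, while W-A, the pair of the two largest groups, receives the smallest difference (0.895).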
In general, the Nominal Population Metric covers any use of the distribution of the nominal attribute values to determine the difference or similarity between nominal attribute values.
Although matrices have been described above, any configuration for storing the computations/results is possible.
Examples of suitable computing devices include a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart appliance, a gaming console or any other suitable computing device.
To utilize the Nominal Population Metric, values are determined that are able to be incorporated into a clustering application to make selections from a pool of applicants. For example, a university that wants a diverse student body, while abiding by U.S. law and maintaining the importance of academics in admissions, is able to incorporate the Nominal Population Metric with other criteria such as GPA and SAT scores to admit students based on a variety of criteria, including race, gender and other non-numerical criteria.
In operation, a set of entities to be compared is obtained. For example, a university receives admissions applications from high school students wanting to attend the university. Each entity in the set is compared with the other entities to obtain a difference measure. Using data from the comparisons, a coverage amount is determined for each attribute pair within the entities; for example, coverage amounts are determined for the race pairs White-Black, White-Hispanic and so on. The coverage amount is determined by how often each pair is encountered in the entities. To find similarities in the entities, the coverage amount is used without further modification. To find differences, the coverage amount is subtracted from 100% (or 1.00), so that less common attribute pairs carry more weight than more common ones. The coverage amounts are then able to be used in clustering algorithms that combine them with other data. For example, the nominal population metrics are able to be incorporated alongside other information on an admissions application, such as GPA and SAT score, to provide a better selection of admitted applicants.
In an alternative embodiment, instead of comparing pairs of applications, a distribution is determined for each attribute value. For example, the distribution for white students is determined to be 70% of the applications and for black students 15% of the applications. The nominal difference between two attribute values is calculated by multiplying their distributions and subtracting the product from 1; for White and Black in this example, the nominal difference is 1 − (0.70 × 0.15) = 1 − 0.105 = 0.895. The nominal difference is then able to be used in clustering computations. The computations for determining nominal population metrics are able to be performed on any suitable computing device.
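One way the nominal values might be combined with numeric data such as GPA is a weighted distance between two applicants, which a clustering algorithm that accepts a custom distance could then use. The sketch below is hypothetical throughout: the equal weights, the division of the GPA gap by an assumed maximum GPA of 4.0, and the choice of 0 for identical Race values are assumptions, and the lookup holds only the three inverse coverage values shown in the Table 3 example (a full matrix would cover every pair).

```python
# Inverse coverage values for Race taken from the Table 3 example.
INVERSE_COVERAGE = {
    frozenset({"W", "B"}): 0.70,
    frozenset({"W", "H"}): 0.85,
    frozenset({"B", "A"}): 0.95,
}


def race_difference(a: str, b: str) -> float:
    """Nominal difference for Race, looked up from inverse coverage."""
    if a == b:
        return 0.0  # assumption: identical values contribute no difference
    return INVERSE_COVERAGE[frozenset({a, b})]


def applicant_distance(x: dict, y: dict, w_gpa: float = 0.5, w_race: float = 0.5) -> float:
    """Hypothetical combined distance mixing a numeric and a nominal attribute.

    The GPA gap is divided by 4.0 (assumed maximum GPA) so both terms lie in
    [0, 1]; the weights are illustrative, not values from the source.
    """
    gpa_term = abs(x["gpa"] - y["gpa"]) / 4.0
    return w_gpa * gpa_term + w_race * race_difference(x["race"], y["race"])


applicant_1 = {"gpa": 4.0, "race": "W"}  # hypothetical applicants
applicant_2 = {"gpa": 2.5, "race": "B"}
print(round(applicant_distance(applicant_1, applicant_2), 4))  # 0.5375
```

Under these assumptions, the GPA gap contributes 0.5 × 0.375 and the W-B inverse coverage contributes 0.5 × 0.70, so rarer Race pairings push applicants further apart in the combined distance.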
The present invention has been described in terms of specific embodiments incorporating details to facilitate the understanding of principles of construction and operation of the invention. Such reference herein to specific embodiments and details thereof is not intended to limit the scope of the claims appended hereto. It will be readily apparent to one skilled in the art that other various modifications may be made in the embodiment chosen for illustration without departing from the spirit and scope of the invention as defined by the appended claims.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 22 2008 | GILBERT, JUAN E | Auburn University | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 020477 | /0209 | |
Jan 23 2008 | Auburn University | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Jun 01 2017 | M2551: Payment of Maintenance Fee, 4th Yr, Small Entity. |
Apr 28 2021 | M2552: Payment of Maintenance Fee, 8th Yr, Small Entity. |
Date | Maintenance Schedule |
Dec 17 2016 | 4 years fee payment window open |
Jun 17 2017 | 6 months grace period start (w surcharge) |
Dec 17 2017 | patent expiry (for year 4) |
Dec 17 2019 | 2 years to revive unintentionally abandoned end. (for year 4) |
Dec 17 2020 | 8 years fee payment window open |
Jun 17 2021 | 6 months grace period start (w surcharge) |
Dec 17 2021 | patent expiry (for year 8) |
Dec 17 2023 | 2 years to revive unintentionally abandoned end. (for year 8) |
Dec 17 2024 | 12 years fee payment window open |
Jun 17 2025 | 6 months grace period start (w surcharge) |
Dec 17 2025 | patent expiry (for year 12) |
Dec 17 2027 | 2 years to revive unintentionally abandoned end. (for year 12) |