A non-transitory storage device containing software that, when executed by a processor, causes the processor to generate a projection set of polynomials based on a projection of a space linear combination of candidate polynomials of degree d on polynomials of degree less than d that do not evaluate to less than a threshold on a set of points. The software also causes the processor to compute the singular value decomposition of a matrix containing the difference between candidate polynomials evaluated on the points and the projection set of polynomials evaluated on the points, and to partition the polynomials resulting from the singular value decomposition based on a threshold.
7. A system for determining a unique set of polynomials for a non-numerical data to extract numerical data from the non-numerical data, comprising:
a processor;
a non-transitory, computer-readable storage device containing instructions that, when executed by the processor, cause the processor to:
measure the non-numerical data, wherein measuring includes measuring a widest portion of the non-numerical data and a highest portion of the non-numerical data;
generate a projection set of polynomials of a space linear combination of candidate polynomials of degree d on polynomials of degree less than d that do not evaluate to less than a threshold on a set of points based on the non-numerical data measurements, wherein the projection set of polynomials are projected to identify the non-numerical data;
generate a subtraction matrix based on the projection set of polynomials evaluated on the points and the candidate polynomials evaluated on the points;
compute a singular value decomposition of the subtraction matrix of evaluated polynomials;
partition the polynomials resulting from the singular value decomposition based on a threshold such that partitioning determines a set of polynomials of the unique set of polynomials that identify the non-numerical data; and
identify the non-numerical data based on the partitioned polynomials and the non-numerical data measurements.
13. A non-transitory, computer-readable storage device comprising instructions executable by a processor to determine a unique set of polynomials for a non-numerical data to extract numerical data from the non-numerical data, the non-transitory, computer-readable storage device comprising instructions causing the processor to:
measure the non-numerical data, wherein measuring includes measuring a widest portion of the non-numerical data and a highest portion of the non-numerical data;
generate a projection set of polynomials based on a projection of a space linear combination of candidate polynomials of degree d on polynomials of degree less than d that do not evaluate to less than a threshold on a set of points and based on the non-numerical data measurements, wherein the projection set of polynomials are projected to identify the non-numerical data;
compute the singular value decomposition of a matrix containing the difference between candidate polynomials evaluated on the points and the projection set of polynomials evaluated on the points;
partition the polynomials resulting from the singular value decomposition based on a threshold such that partitioning determines a set of polynomials of the unique set of polynomials that identify the non-numerical data, and
identify the non-numerical data based on the partitioned polynomials and the non-numerical data measurements.
1. A computer-implemented method of determining a unique set of polynomials for a non-numerical data to extract numerical data from the non-numerical data by a computing device, the method comprising:
measuring the non-numerical data, wherein measuring includes measuring a widest portion of the non-numerical data and a highest portion of the non-numerical data;
generating a projection set of polynomials that are projected to identify the non-numerical data based on the non-numerical data measurements by computing a projection of a space linear combination of candidate polynomials of degree d on polynomials of degree less than d that do not evaluate to less than a threshold on a set of points;
subtracting the projection set of polynomials evaluated on the points from the candidate polynomials evaluated on the points to generate a subtraction matrix of evaluated polynomials;
computing the singular value decomposition of the subtraction matrix of evaluated polynomials;
partitioning the polynomials resulting from the singular value decomposition based on a threshold such that partitioning determines a set of polynomials of the unique set of polynomials that identify the non-numerical data; and
identifying the non-numerical data based on the partitioned polynomials and the non-numerical data measurements,
wherein measuring, generating, subtracting, computing, and partitioning are performed by executing modules stored on a non-transitory computer-readable storage device of the computer.
2. The method of
3. The method of
4. The method of
5. The method of
incrementing d;
multiplying the set of candidate polynomials in degree d−1 that do not evaluate to less than the threshold on the points by the degree 1 candidate polynomials that do not evaluate to less than the threshold on the points, and
repeating the generating the projection set of polynomials, computing the singular value decomposition and partitioning the polynomials for the incremented value of d.
8. The system of
9. The system of
set d to 1 and initialize the candidate polynomials to all monomials of the points.
10. The system of
11. The system of
increment d;
multiply the set of candidate polynomials in degree d−1 that do not evaluate to less than the threshold on the points by the degree 1 candidate polynomials that do not evaluate to less than the threshold on the points, and
cause a subsequent iteration to be performed of the instructions to generate the projection set of polynomials, subtract the projection set of polynomials evaluated on the points from the candidate polynomials evaluated on the points, compute the singular value decomposition, and again partition the polynomials for the incremented value of d.
14. The non-transitory, computer-readable storage device of
15. The non-transitory, computer-readable storage device of
16. The non-transitory, computer-readable storage device of
17. The non-transitory, computer-readable storage device of
increment d;
multiply the set of candidate polynomials in degree d−1 that do not evaluate to less than the threshold on the points by the degree 1 candidate polynomials that do not evaluate to less than the threshold on the points, and
repeat generating the projection set of polynomials, computing the singular value decomposition, subtracting the projection set of polynomials evaluated on the points from the candidate polynomials evaluated on the points, and partitioning the polynomials for the incremented value of d.
18. The non-transitory, computer-readable storage device of
Data analysis is ubiquitous. Some data, however, is not numerical and, even if numerical, may be non-linear. Examples of non-numerical data include scanned documents and photographs. The types of analysis that might be useful on such non-numerical data may include compression, character recognition, etc. Computers, of course, only understand numbers so non-numerical data may be converted to numbers for the computer to understand and further process.
For a detailed description of exemplary embodiments of the invention, reference will now be made to the accompanying drawings in which:
In accordance with various implementations, numbers are extracted from non-numerical data so that a computing device can further analyze the extracted numerical data and/or perform a desirable type of operation on the data. The extracted numerical data may be referred to as “data points” or “coordinates.” A type of technique for analyzing the numerical data extracted from non-numerical data includes determining a unique set of polynomials for each class of interest and then evaluating the polynomials on a set of data points. For a given set of data points, the polynomials of one of the classes may evaluate to 0 or approximately 0. The data points are then said to belong to the class corresponding to those particular polynomials.
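The class-membership test described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the names `CLASS_POLYNOMIALS`, `classify`, and `tol` are hypothetical, and the two example polynomials (a unit circle and a diagonal line) are chosen only to make the example concrete.

```python
# Hypothetical class models (assumption): each class maps to a list of
# polynomials, written here as callables on a 2-D data point.
CLASS_POLYNOMIALS = {
    "circle": [lambda p: p[0] ** 2 + p[1] ** 2 - 1.0],  # x^2 + y^2 - 1
    "line": [lambda p: p[0] - p[1]],                    # x - y
}


def classify(point, models=CLASS_POLYNOMIALS, tol=1e-2):
    """Return the class whose polynomials come closest to vanishing on
    `point`, or None if no class evaluates to approximately 0."""
    # Score each class by its worst (largest magnitude) polynomial value.
    scores = {
        name: max(abs(poly(point)) for poly in polys)
        for name, polys in models.items()
    }
    best = min(scores, key=scores.get)
    return best if scores[best] <= tol else None


on_circle = classify((0.6, 0.8))   # the circle polynomial is ~0 here
on_line = classify((0.5, 0.5))     # the line polynomial is 0 here
neither = classify((3.0, 1.0))     # no polynomial is near 0 here
```

The tolerance `tol` plays the role of "approximately 0"; a suitable value would depend on the noise in the measurements.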
The principles discussed herein are directed to a technique by which a computing device processes data points with regard to a class. The technique describes the data points in terms of the class to which they correspond.
Measurements can be made on many types of non-numerical data. For example, in the context of alphanumeric character recognition, multiple different measurements can be made for each alphanumeric character encountered in a scanned document. Examples of such measurements include the average slope of the lines making up the character, a measure of the widest portion of the character, a measure of the highest portion of the character, etc. The goal is to determine a suitable set of polynomials for each possible alphanumeric character. Thus, capital A has a unique set of polynomials, B has its own unique set of polynomials, and so on. Each polynomial is of degree n (n could be 1, 2, 3, etc.) and may use some or all of the measurement values as inputs.
The classes depicted in
Part of the analysis, however, is determining which polynomials to use for each alphanumeric character. A class of techniques called Approximate Vanishing Ideal (AVI) may be used to determine polynomials to use for each class. The word “vanishing” refers to the fact that a polynomial evaluates to 0 for the right set of input coordinates. “Approximate” means that the polynomial only has to evaluate to approximately 0 for classification purposes. Many of these techniques, however, are not stable. Lack of stability means that the polynomials do not perform well in the face of noise. For example, if there is some distortion of the letter A or extraneous pixels around the letter, the polynomial for the letter A may not vanish to 0 at all even though the measurements were made for a letter A. Some AVI techniques are based on a pivoting technique which is fast but inherently unstable.
The implementations discussed below are directed to a Stable Approximate Vanishing Ideal (SAVI) technique which, as its name suggests, is stable in the face of noise in the input data.
The non-transitory storage device 130 is shown in
The distinction among the various engines 102-110 and among the software modules 132-140 is made herein for ease of explanation. In some implementations, however, the functionality of two or more of the engines/modules may be combined together into a single engine/module. Further, the functionality described herein as being attributed to each engine 102-110 is applicable to the software module corresponding to each such engine, and the functionality described herein as being performed by a given module is applicable as well to the corresponding engine.
The functions performed by the various engines 102-110 of
The initialization engine 102 initializes a degree (d) to 1 (action 202). The disclosed SAVI process thus begins with degree 1 polynomials. The initialization engine 102 also initializes a set of candidate polynomials. The candidate polynomials represent the polynomials that will be processed in the given iteration to determine which, if any, of the polynomials evaluate on a given set of points to approximately 0 (e.g., below a threshold). Those candidate polynomials that do evaluate on the points to less than the threshold are chosen as polynomials for the given class. The initial set of candidate polynomials may include all of the monomials in the coordinates. That is, there are as many monomials as there are coordinates in the training data.
The projection engine 104 then processes the set of candidate polynomials, for example, as described in illustrative action 204 in
The following is an example of the computation of the linear combination of the candidate polynomials of degree d on the polynomials of degree less than d that do not evaluate to 0 on the set of points. The projection engine 104 may multiply the polynomials of degree less than d that do not evaluate to 0 by the polynomials of degree less than d that do not evaluate to 0 evaluated on the points and then multiply that result by the candidate polynomials of degree d evaluated on the points. In one example, the projection engine 104 computes:
$$E_d = O_{<d}\, O_{<d}(P)^{t}\, C_d(P)$$
where $O_{<d}$ represents the set of polynomials that do not evaluate to 0 and are of degree lower than d, $O_{<d}(P)^{t}$ represents the transpose of the matrix of the evaluations of the $O_{<d}$ polynomials on the points, and $C_d(P)$ represents the evaluation of the candidate set of polynomials on the set of points (P). $E_d$ represents the projection set of polynomials; its evaluation on the points is denoted $E_d(P)$.
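The projection formula can be illustrated numerically with the following sketch. It works on evaluation matrices only and rests on assumptions not stated in the text: the columns of `O_P` are taken to be orthonormal, and the only polynomial of degree less than 1 is a constant scaled to unit norm on the points. The function name `project_candidates` and all variable names are hypothetical.

```python
import numpy as np


def project_candidates(O_lt_d_P, C_d_P):
    """Project candidate evaluations onto lower-degree evaluations.

    O_lt_d_P : (m, k) evaluations of the non-vanishing polynomials of
               degree < d on m points (columns assumed orthonormal).
    C_d_P    : (m, c) evaluations of the degree-d candidates.
    Returns the (k, c) combination coefficients and the (m, c) projection
    evaluated on the points.
    """
    coeffs = O_lt_d_P.T @ C_d_P   # O_{<d}(P)^t C_d(P)
    E_d_P = O_lt_d_P @ coeffs     # the projection, evaluated on the points
    return coeffs, E_d_P


# Four 2-D training points; the degree-1 candidates are the monomials x, y.
points = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]])
m = points.shape[0]
O_P = np.full((m, 1), 1.0 / np.sqrt(m))  # constant polynomial (assumption)
C_P = points                             # each column is one monomial

coeffs, E_P = project_candidates(O_P, C_P)
```

With a constant as the only lower-degree polynomial, the projection of each candidate is simply its mean over the points, so the remainder `C_P - E_P` is the centered data.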
The subtraction engine 106 subtracts (as indicated at 206 in
$$\text{Subtraction matrix} = C_d(P) - E_d(P)$$
The subtraction matrix represents the difference between evaluations of polynomials of degree d on the points, and evaluations of polynomials of lower degrees on the points.
The SVD engine 108 (at 208 in
$$\text{Subtraction matrix} = U S V^{*}$$
A matrix may be viewed as a linear transformation between two distinct spaces. To better analyze the matrix, rigid (i.e., orthonormal) transformations may be applied to these spaces. The “best” rigid transformations are the ones that result in the transformation being represented by a diagonal matrix, and that is exactly what the SVD achieves. The values on the diagonal of the S matrix are called the “singular values” of the transformation.
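The decomposition step can be sketched with NumPy as follows. The residual matrix below is an illustrative stand-in for a subtraction matrix (four points, two candidate polynomials) and is not taken from the disclosure; `np.linalg.svd` returns the singular values as a vector `s` in descending order and returns the conjugate transpose of V as `Vh`.

```python
import numpy as np

# Illustrative subtraction matrix: candidate evaluations minus their
# projection, for four points (rows) and two candidates (columns).
residual = np.array([
    [-1.5, -0.5],
    [-0.5, -1.5],
    [0.5, 1.5],
    [1.5, 0.5],
])

# Thin SVD: residual = U @ diag(s) @ Vh, with orthonormal columns in U
# and orthonormal rows in Vh.
U, s, Vh = np.linalg.svd(residual, full_matrices=False)

# Rebuild the matrix from its factors to confirm the decomposition.
reconstructed = (U * s) @ Vh
```

Each singular value measures how large the corresponding combination of candidates is on the points, which is what makes the singular values a natural quantity to compare against a threshold.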
The candidate polynomials for the next iteration of the SAVI process either include all of the candidate polynomials from the previous iteration or a subset of such polynomials. If a subset is used, then the SAVI process removes from the candidate polynomials those polynomials that evaluate to less than the threshold. If candidate polynomials are to be removed for a subsequent iteration of the process, then such polynomials are removed from further use in a numerically stable manner as described below.
The partitioning engine 110 partitions (action 210 in
In one implementation, the partitioning engine 110 sets $U_d$ equal to $(C_d - E_d)\,V S^{-1}$ and then partitions the polynomials of $U_d$ according to the singular values to obtain $G_d$ and $O_d$. $G_d$ is the set of polynomials that evaluate to less than the threshold on the points. $O_d$ is the set of polynomials that do not evaluate to less than the threshold on the points.
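One way to picture the partitioning is the sketch below, which splits the right singular vectors by comparing each singular value against a threshold. It is an illustration under assumptions: the function name `partition` and the threshold value are hypothetical, and, unlike the formula above, the sketch applies the $S^{-1}$ scaling only to the non-vanishing columns so that it never divides by a near-zero singular value.

```python
import numpy as np


def partition(residual, threshold):
    """Split combinations of candidates into vanishing (G) and
    non-vanishing (O) sets according to the singular values."""
    U, s, Vh = np.linalg.svd(residual, full_matrices=False)
    V = Vh.T
    vanishing = s < threshold

    # G: combinations whose evaluations on the points are below threshold.
    G_coeffs = V[:, vanishing]
    G_P = residual @ G_coeffs

    # O: remaining combinations, each column scaled by 1 / singular value
    # so that its evaluations have unit norm on the points.
    O_coeffs = V[:, ~vanishing] / s[~vanishing]
    O_P = residual @ O_coeffs
    return G_coeffs, G_P, O_coeffs, O_P


# Points on the line y = 2x + 1, with the mean already subtracted from
# each coordinate, so one combination of x and y vanishes on the points.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
residual = np.column_stack([x - x.mean(), y - y.mean()])

G_coeffs, G_P, O_coeffs, O_P = partition(residual, threshold=1e-6)
```

Here the single vanishing combination corresponds, up to scale, to the line on which the points lie, while the single non-vanishing combination is kept for building higher-degree candidates.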
The partitioning engine 110 also may increment the value of d and multiply the set of candidate polynomials of degree d−1 that do not evaluate to 0 on the points by the degree 1 candidate polynomials that do not evaluate to 0 on the points. The partitioning engine 110 further computes $D_d = O_1 \times O_{d-1}$ and then sets the candidate set of polynomials for the next iteration of the SAVI process to be the orthogonal complement in $D_d$ of $\operatorname{span} \bigcup_{i=1}^{d-1} G_i \times O_{d-i}$.
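The multiplication step can be sketched on evaluation matrices as shown below. This covers only the pairwise products that form the evaluations of the product set; it does not attempt the orthogonal-complement step. The function name `pairwise_products` and the small input matrices are hypothetical.

```python
import numpy as np


def pairwise_products(A_P, B_P):
    """Evaluate every product a*b on the points.

    A_P : (m, a) evaluations of one set of polynomials on m points.
    B_P : (m, b) evaluations of another set on the same points.
    Returns an (m, a*b) matrix; column i*b + j holds the evaluations of
    (polynomial i of A) times (polynomial j of B).
    """
    m = A_P.shape[0]
    # Broadcasting yields an (m, a, b) array of pointwise products.
    return (A_P[:, :, None] * B_P[:, None, :]).reshape(m, -1)


# Two points, two degree-1 polynomials and two degree-(d-1) polynomials.
O1_P = np.array([[1.0, 2.0], [3.0, 4.0]])
Oprev_P = np.array([[3.0, 4.0], [5.0, 6.0]])
D_P = pairwise_products(O1_P, Oprev_P)
```

Because the product of two polynomials evaluated at a point equals the product of their evaluations, the product set can be evaluated without expanding any polynomial symbolically.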
The partitioning engine 110 then may cause control to loop back to action 204 in
The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Patent | Priority | Assignee | Title |
5555317, | Aug 18 1992 | Eastman Kodak Company | Supervised training augmented polynomial method and apparatus for character recognition |
6493380, | May 28 1999 | Microsoft Technology Licensing, LLC | System and method for estimating signal time of arrival |
7958063, | Nov 11 2004 | The Trustees of Columbia University in the City of New York | Methods and systems for identifying and localizing objects based on features of the objects that are mapped to a vector |
20080313179, | |||
20100150577, | |||
20100185423, | |||
20100238305, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jul 25 2012 | SCHEIN, SAGI | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 035079 | /0620 | |
Jul 25 2012 | LEHAVI, DAVID | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 035079 | /0620 | |
Jul 30 2012 | Hewlett Packard Enterprise Development LP | (assignment on the face of the patent) | / | |||
Oct 27 2015 | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | Hewlett Packard Enterprise Development LP | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 037079 | /0001 | |
Apr 05 2017 | Hewlett Packard Enterprise Development LP | ENTIT SOFTWARE LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 042746 | /0130 | |
Sep 01 2017 | NetIQ Corporation | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
Sep 01 2017 | MICRO FOCUS US , INC | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
Sep 01 2017 | MICRO FOCUS SOFTWARE, INC | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
Sep 01 2017 | ENTIT SOFTWARE LLC | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
Sep 01 2017 | ARCSIGHT, LLC | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
Sep 01 2017 | SERENA SOFTWARE, INC | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
Sep 01 2017 | Borland Software Corporation | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
Sep 01 2017 | Attachmate Corporation | JPMORGAN CHASE BANK, N A | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 044183 | /0718 | |
May 23 2019 | ENTIT SOFTWARE LLC | MICRO FOCUS LLC | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 050004 | /0001 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | MICRO FOCUS LLC F K A ENTIT SOFTWARE LLC | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0718 | 062746 | /0399 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | Borland Software Corporation | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0718 | 062746 | /0399 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | MICRO FOCUS US , INC | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0718 | 062746 | /0399 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | SERENA SOFTWARE, INC | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0718 | 062746 | /0399 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | Attachmate Corporation | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0718 | 062746 | /0399 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | MICRO FOCUS SOFTWARE INC F K A NOVELL, INC | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0718 | 062746 | /0399 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | NetIQ Corporation | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0718 | 062746 | /0399 | |
Jan 31 2023 | JPMORGAN CHASE BANK, N A | MICRO FOCUS LLC F K A ENTIT SOFTWARE LLC | RELEASE OF SECURITY INTEREST REEL FRAME 044183 0577 | 063560 | /0001 |
Date | Maintenance Fee Events |
Mar 01 2021 | REM: Maintenance Fee Reminder Mailed. |
Aug 16 2021 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Jul 11 2020 | 4 years fee payment window open |
Jan 11 2021 | 6 months grace period start (w surcharge) |
Jul 11 2021 | patent expiry (for year 4) |
Jul 11 2023 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jul 11 2024 | 8 years fee payment window open |
Jan 11 2025 | 6 months grace period start (w surcharge) |
Jul 11 2025 | patent expiry (for year 8) |
Jul 11 2027 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jul 11 2028 | 12 years fee payment window open |
Jan 11 2029 | 6 months grace period start (w surcharge) |
Jul 11 2029 | patent expiry (for year 12) |
Jul 11 2031 | 2 years to revive unintentionally abandoned end. (for year 12) |