A method for determining what records have been added, deleted or changed in a database uses primary keys and crcs to make the determination. A first snapshot is taken of a record, the record retrieved, converted into characters, if necessary, and concatenated to produce a long string. A crc code is calculated for the string. The crc and the primary key for the record are stored. The crc is compared with a subsequent crc for a second snapshot of the record and a subsequent primary key. Comparing the two primary keys and the two crcs allows determination of whether the record has been deleted, added or changed. If the first and second primary keys do not match, or if the first and second primary keys match and the crcs do not match, the record is processed according to the particular application to be applied. The process is repeated for all records in the database.
|
10. A computer readable memory for causing a computer to identify added, deleted or changed records in a database comprising:
a first computer readable storage medium; a computer program stored in said storage medium; and the storage medium, so configured by said computer program causes the computer to take a first snapshot and a second snapshot of said records, and to compare the first snapshot to the second snapshot and comparing a first primary key to a second primary key to determine whether the record has been changed; wherein said primary key unambiguously identifies a single said record among a plurality of said records and said primary key is stored in conjunction with a first crc and a second crc; and wherein the comparison of the first snapshot with the second snapshot and the comparison of the first primary key with the second primary key determines the specific location and nature of an addition, deletion, or change to the record in the database.
1. A method for identifying added, deleted or changed records in a database comprising the steps of:
taking a first snapshot of each of the records of a database at a first point in time; calculating a first crc for the record; storing the first crc and a first primary key for the record; taking a second snapshot of each of the records of the database at a second point in time; calculating a second crc for the record at the second point in time; storing the second crc and a second primary key for the record; comparing the first crc to the second crc and comparing the first primary key to the second primary key to determine whether the record has been changed; wherein said primary key unambiguously identifies a single said record among a plurality of said records and said primary key is stored in conjunction with said first crc and said second crc; and wherein the comparison of the first crc with the second crc and the comparison of the first primary key with the second primary key determines the specific location and nature of an addition, deletion, or change to the record in the database.
7. A programmable apparatus for identifying added, changed or deleted records in a database comprising:
programmable hardware comprising: a computer having a processor and a non-volatile memory such as disk, hard drive or data base; a plurality of records installed on said memory; a program installed on said computer for causing said computer to take a first snapshot of each record in a database and for each record, to retrieve the record, to convert the data in the record into characters to create a string from the characters, to calculate a first crc from the string, and to store the first crc and primary key in non-volatile memory such as disk, hard drive or data base; responsive to a command at a later time, causing the computer to take a second snapshot of each record in the database, and for each record to retrieve the record, to convert the data in the record into characters, to create a string from the characters, to calculate a second crc from the string, and to store the second crc and primary key in memory; wherein, responsive to comparing the first crc to the second crc and comparing the first primary key to the second primary key to determine whether the record has been changed; wherein said primary key unambiguously identifies a single said record among a plurality of said records and said primary key is stored in conjunction with said first crc and said second crc; and wherein the comparison of the first crc with the second crc and the comparison of the first primary key with the second primary key determines the specific location and nature of an addition, deletion, or change to the record in the database. 2. The method of
responsive to a finding a primary key from the first snapshot that does not appear in the second snapshot, identifying the record corresponding to that primary key as a record that has been deleted and notifying a user of said deletion in said record identified by said primary key.
3. The method of
responsive to finding a primary key in the second snapshot that did not appear in the first snapshot, identifying that record as a record that has been added and notifying a user of said addition in said record identified by said primary key.
4. The method of
retrieving the record; converting the data in the record into characters; creating a string from the characters; calculating a first crc from the string; and storing the first crc in non-volatile memory such as disk, hard drive or data base.
5. The method of
retrieving the record; converting the data in the record into characters; creating a string from the characters; calculating a second crc from the string; and storing the second crc in memory.
6. The method of
8. The programmable apparatus of
responsive to finding a primary key from the first snapshot that does not appear in the second snapshot, identifying the record corresponding to that primary key as a record that has been deleted and notifying a user of said deletion in said record identified by said primary key.
9. The programable apparatus of
responsive to finding a primary key the second snapshot that did not appear in the first snapshot, identifying that record as a record that has been added and notifying a user of said addition in said record identified by said primary key.
11. The computer readable memory of
and responsive to a command at a later time, causing the computer to take a second snapshot of each record in the database, and for each record to retrieve the record, to convert the data in the record into characters, to create a string from the characters, to calculate a second crc from the string, and to store the second crc and primary key in memory; and wherein, responsive to comparing said first crc and said second crc for a record with the same primary key in the first snapshot and in the second snapshot, determining whether that record has been changed.
12. The computer readable memory of
responsive to a finding a primary key from the first snapshot that does not appear in the second snapshot, identifying the record corresponding to that primary key as a record that has been deleted and notifying a user of said deletion in said record identified by said primary key.
13. The computer readable memory of
responsive to finding a primary key in the second snapshot that did not appear in the first snapshot, identifying that record as a record that has been added and notifying a user of said addition in said record identified by said primary key.
14. The method of
responsive to a finding that said primary key of said first snapshot matches said primary key of said second snapshot, comparing said first crc and said second crc and upon a determination that said first crc and said second crc are different, notifying a user that a revision has been made to said record identified by said primary key.
15. The apparatus of
responsive to a finding that said primary key of said first snapshot matches said primary key of said second snapshot, comparing said first crc and said second crc and upon a determination that said first crc and said second crc are different, notifying a user that a revision has been made to said record identified by said primary key.
16. The apparatus of
responsive to a finding that said primary key of said first snapshot matches said primary key of said second snapshot, comparing said first crc and said second crc and upon a determination that said first crc and said second crc are different, notifying a user that a revision has been made to said record identified by said primary key.
17. The method of
18. The apparatus of
19. The apparatus of
|
The present invention relates to a solution to the problem of recognizing what records have been changed in a database using a Cyclic Redundancy Check (CRC).
The problem of recognizing what has changed in a database arise in three general situations. First, multiple databases may exist in a system requiring the same data in each. Therefore, if a change is made, the fact that a change has taken place must be recognized and the changed record transferred to the other databases. Second, an unauthorized change to a record may take place. Malicious software, commonly called viruses, change records in ways that are designed to conceal the change. Therefore, changed records must be identified for elimination of the virus and restoration of the attacked record. Third, it may be necessary to migrate data from a legacy data base in an old format base into a new data base in a new format. Because of the volume of data that may exist in the legacy data base, the time necessary for the migration can disrupt the user's business activity. Therefore, the records in the legacy data base can be transferred to the new data base and the new format prior to the changeover, and then it only remains to migrate the changed, added or deleted records from the period between the creation of the new data base and the changeover. In other words, if the old data base was migrated into the new data base and format one month before the business was to start using the new database, only the changes that occurred in the one month need to be processed prior to the changeover, and that can be done in a shorter time, thereby eliminating a lengthy disruption of the user's business activity.
In the first two situations described above, Cyclic Redundancy Checks (CRCs) have been used to identify changes in the database. As used herein CRCs mean a redundancy check in which the check key is generated by a cyclic algorithm. CRCs are used to verify database synchronization. Synchronization means, in application or data-base files, making version comparisons of copies of the files to ensure they contain the same data. U.S. Pat. No. 5,261,092 discloses verifying that two databases are synchronized. The '092 patent discloses a method for verifying data base information of redundant processors, and more particularly, for verifying that the data base of a primary processor and the data base of a secondary processor are synchronized without requiring the processors to be running in a synchronous manner.
A checksum is a form of CRC. As used herein a checksum means the sum of a group of data associated with the group and used for checking purposes. The data are either numeric or other character strings regarded as numeric for the purpose of calculating the checksum. The checksum is a calculated value that is used to test data for the presence of errors that can occur when data is transmitted or when it is written to a disk. The checksum is calculated for a given amount of data by sequentially combining all the bytes of data with a series of arithmetic or logical operations. After the data is transmitted or stored, a new checksum is calculated in the same way using the (possibly faulty) transmitted or stored data. If the two checksums do not match, an error has occurred, and the data should be transmitted or stored again. Checksums cannot detect all errors, and they cannot be used to correct erroneous data. U.S. Pat. No. 4,849,978 discloses the use of a checksum to recreate memory data. In a system having "n" memory units, an additional memory unit is added to serve as a backup. The backup data stored in the backup memory unit consists of the checksum of all data stored in all other memory units such that a word at address x in the backup unit is the checksum of all words at address x of all other memory units.
U.S. Pat. No. 5,974,574 discloses using a checksum to verify the credibility of a database. The method disclosed includes forming a position sensitive checksum for each entry of the database to be used in the check. The checksums are then formed into a first database checksum. Periodically, the checksums are again created from each of the entries, and a second database checksum formed and compared to the first. A match indicates that the checked entries of the database have not changed. In another embodiment, the checksum value for a modified entry, and for the entry before the modification are compared with the first checksum value, and that result compared with the master checksum to ensure that the modification was properly made and that database remains credible.
U.S. Pat. No. 5,613,002 discloses using a checksum to verify a reconstructed database. The invention of the '002 is directed toward an improved method of "generic disinfection". By generic disinfection is meant the use of a small amount of information about each host program in a database, and the use of that information to reconstruct the original program. Specifically, the '002 patent discloses a method of recording certain information about a computer program, and for using that information to reconstruct the program in the event that it subsequently becomes infected with any of a very broad class of viruses. The method includes constructing a database with entries for each program to be protected. It further includes a fast trial-and-error method for constructing the original uninfected host based on cyclical redundancy checks (CRCs), or a broader class of linear modular-arithmetic mappings. The method is founded on the observation that, for virtually every known virus, the infected host contains no more than two contiguous blocks of code taken from the original uninfected host (and possibly reversed in order). Prior to the infection, a "checksum" (a many-to-one bit mapping of the bytes comprising the uninfected host to a small number of bits or bytes) is computed. In addition, information pertaining to bytes near the beginning and end of the host is computed (designated Begin Tag and End Tag, respectively). Furthermore, the length of the host is determined. The checksum, Begin Tag, End Tag and filelength are recorded in a database containing one or more such entries, one for each host. The '002 patent uses the checksum, Begin Tag, End Tag and filelength to reconstruct the original host if it is subsequently determined that the program has changed in a suspicious manner. The '002 patent begins with an infected host and does not address detection of the infection.
U.S. Pat. No. 5,649,089 discloses, for a redundant, mirror database, using a checksum to assure that the standby database mirrors the active database. The disclosure deals with the real time tracking of change to a data base within a data communication system and more particularly to the maintenance of a redundant database in such systems. The active database includes a plurality of records each of which is subject to modifications from time to time. Those modifications must be communicated to and copied or mirrored in the standby database. In order to achieve the redundant controller system, a central processor modifies a record which includes a record checksum and incorporates the record checksum for the active data base. The active controller communicates the record, including the record checksum, and record number to the standby controller for incorporation into the standby database and concurrently constructs a virtual checksum. The virtual checksum is the checksum the active controller expects the standby database to have after the incorporation of the record being communicated. In this fashion, the active controller may continue making record modifications as required, reflected in the active checksum, and provide record updates, in a compatible timescale, to the standby data base, and yet retain the ability to compare, periodically the standby checksum to the virtual checksum, thus insuring that the standby data base mirrors the active data base.
What is needed beyond the prior art is a simplified extension to using CRCs to determine what has been changed in a database.
The invention which meets the needs identified above is a method for determining what records have been added, deleted or changed in a database utilizing primary keys and CRCs to make the determination. A first snapshot is taken of a record, the record retrieved, converted into characters, if necessary, and concatenated to produce a long string. A CRC code is calculated for the string. The CRC and the primary key for the record are stored. The CRC is compared with a subsequent CRC for a second snapshot of the record and a corresponding primary key. Comparing the two primary keys and the two CRCs allows determination of whether the record has been deleted, added or changed. If the first and second primary keys do not match, or if the first and second primary keys match and the CRCs do not match, the record is processed according to the particular application to be applied. The process is repeated for all records in the database.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
Referring to
Referring to
The advantages provided by the present invention should be apparent in light of the detailed description provided above. The description of the present invention has been presented for purposes of illustration and description, but is not limited to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention the practical application and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Patent | Priority | Assignee | Title |
10067674, | Jan 16 2013 | GOOGLE LLC | Consistent, disk-backed arrays |
6829640, | Dec 23 1999 | International Business Machines Corporation | Method and system for creating a byte stream characteristic number suitable for change quantification |
7051031, | Oct 09 2001 | Oracle America, Inc | Method, system, and program for managing accesses to data objects by multiple user programs over a network |
7054891, | Mar 18 2002 | BMC Software, Inc. | System and method for comparing database data |
7231408, | Jan 28 2000 | NOKIA SOLUTIONS AND NETWORKS OY | Data recovery in a distributed system |
7243289, | Jan 25 2003 | Oracle International Corporation | Method and system for efficiently computing cyclic redundancy checks |
7650367, | Jan 13 2006 | TEKELEC, INC | Methods, systems, and computer program products for detecting and restoring missing or corrupted data in a distributed, scalable, redundant measurement platform database |
7680833, | Nov 22 2002 | BMC Software, Inc. | System and method for database structure comparison |
8095511, | Jun 30 2003 | Microsoft Technology Licensing, LLC | Database data recovery system and method |
8260742, | Apr 03 2009 | International Business Machines Corporation | Data synchronization and consistency across distributed repositories |
8521695, | Jun 30 2003 | Microsoft Technology Licensing, LLC | Database data recovery system and method |
9424125, | Jan 16 2013 | GOOGLE LLC | Consistent, disk-backed arrays |
9558078, | Oct 28 2014 | Microsoft Technology Licensing, LLC | Point in time database restore from storage snapshots |
9600513, | Jun 09 2011 | AIRBNB, INC | Database table comparison |
9928281, | Mar 20 2015 | International Business Machines Corporation | Lightweight table comparison |
Patent | Priority | Assignee | Title |
4849978, | Jul 02 1987 | International Business Machines Corporation | Memory unit backup using checksum |
5261092, | Sep 26 1990 | Honeywell Inc. | Synchronizing slave processors through eavesdrop by one on periodic sync-verify messages directed to another followed by comparison of individual status |
5479654, | Apr 26 1990 | DOUBLE-TAKE SOFTWARE CANADA INC | Apparatus and method for reconstructing a file from a difference signature and an original file |
5613002, | Nov 21 1994 | LENOVO SINGAPORE PTE LTD | Generic disinfection of programs infected with a computer virus |
5649089, | Nov 30 1994 | Google Technology Holdings LLC | Method and apparatus for maintaining a redundant database system |
5974574, | Sep 30 1997 | Hewlett Packard Enterprise Development LP | Method of comparing replicated databases using checksum information |
6014676, | Dec 03 1996 | CDD TECHNOLOGIES, LLC | System and method for backing up computer files over a wide area computer network |
6101507, | Feb 11 1997 | MICRO FOCUS LLC | File comparison for data backup and file synchronization |
6219818, | Jan 14 1997 | Nokia Corporation; Intellisync Corporation | Checksum-comparing change-detection tool indicating degree and location of change of internet documents |
6233589, | Jul 31 1998 | EMC IP HOLDING COMPANY LLC | Method and system for reflecting differences between two files |
6393438, | Jun 19 1998 | BARCLAYS BANK PLC, AS ADMINISTATIVE AGENT | Method and apparatus for identifying the existence of differences between two files |
WO9745786, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jul 13 2000 | PERKS, MICHAEL ALBERT | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 011021 | /0825 | |
Jul 20 2000 | International Business Machines Corporation | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Feb 28 2003 | ASPN: Payor Number Assigned. |
Sep 26 2006 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Sep 28 2010 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Nov 28 2014 | REM: Maintenance Fee Reminder Mailed. |
Apr 22 2015 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Apr 22 2006 | 4 years fee payment window open |
Oct 22 2006 | 6 months grace period start (w surcharge) |
Apr 22 2007 | patent expiry (for year 4) |
Apr 22 2009 | 2 years to revive unintentionally abandoned end. (for year 4) |
Apr 22 2010 | 8 years fee payment window open |
Oct 22 2010 | 6 months grace period start (w surcharge) |
Apr 22 2011 | patent expiry (for year 8) |
Apr 22 2013 | 2 years to revive unintentionally abandoned end. (for year 8) |
Apr 22 2014 | 12 years fee payment window open |
Oct 22 2014 | 6 months grace period start (w surcharge) |
Apr 22 2015 | patent expiry (for year 12) |
Apr 22 2017 | 2 years to revive unintentionally abandoned end. (for year 12) |