A method for processing missing values in measured data is provided. The method includes assigning weights to measured objects or measured items, according to a priority of the measured objects or measured items; selecting a set of the measured objects and measured items including the missing values having the smallest sum of the weights among a plurality of sets of the measured objects and measured items including the missing values; and removing the measured objects and measured items included in the selected set from the measured data.
6. A method for processing missing values in genetic data, the method comprising:
assigning weights to samples or single nucleotide polymorphism positions according to a priority of the samples or single nucleotide polymorphism positions;
selecting a set of the samples and single nucleotide polymorphism positions including the missing values having the smallest sum of weights among sets of the samples and single nucleotide polymorphism positions including the missing values; and
removing the samples and single nucleotide polymorphism positions included in the selected set from the genetic data.
1. A method for processing missing values in measured data including a plurality of missing values, the method comprising:
assigning weights to a plurality of measured objects or a plurality of measured items, according to a priority of the measured objects or measured items;
selecting a set of the measured objects and measured items including the missing values having the smallest sum of the weights among a plurality of sets of the measured objects and measured items including the missing values; and
removing the measured objects and measured items included in the selected set from the measured data,
wherein each measured item of the plurality of measured items is assigned to each measured object.
2. The method of
generating a bipartite graph by placing the measured objects as vertices of one side of the bipartite graph, and the measured items as vertices of the other side of the bipartite graph, and connecting each of the vertices of the measured objects and each of the vertices of the measured items by an edge, at which each missing value occurs;
assigning the weights assigned to the measured objects or measured items to corresponding vertices of the measured objects or measured items, respectively; and
selecting a vertex cover having the smallest sum of weights, among vertex covers.
3. The method of
generating a bipartite graph by placing the measured objects as vertices of one side of the bipartite graph, and the measured items as vertices of the other side of the bipartite graph, and connecting each of the vertices of the measured objects and each of the vertices of the measured items by an edge, at which each missing value occurs;
converting the bipartite graph into a weighted vertex cover problem to solve the weighted vertex cover problem; and
selecting a vertex cover having the smallest sum of weights.
4. The method of
5. The method of
7. The method of
8. The method of
generating a bipartite graph by placing the samples as vertices of one side of the bipartite graph, and the single nucleotide polymorphism positions as vertices of the other side of the bipartite graph, and connecting each of the vertices of the samples and each of the vertices of the single nucleotide polymorphism positions by an edge, at which each missing value occurs;
assigning the weights assigned to the samples or single nucleotide polymorphism positions to corresponding vertices of the samples or single nucleotide polymorphism positions, respectively; and
selecting a vertex cover having the smallest sum of weights among vertex covers.
9. The method of
generating a bipartite graph by placing the samples as vertices of one side of the bipartite graph, and the single nucleotide polymorphism positions as vertices of the other side of the bipartite graph;
connecting each of the vertices of the samples and each of the vertices of the single nucleotide polymorphism positions by an edge, at which each missing value occurs;
converting the bipartite graph into a weighted vertex cover problem to solve the weighted vertex cover problem; and
selecting a vertex cover having the smallest sum of weights.
10. A computer readable recording medium having embodied thereon a code for executing a method of
11. A computer readable recording medium having embodied thereon a code for executing a method of
This application claims the priority of Korean Patent Application No. 10-2004-0011001, filed on Feb. 19, 2004, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to a method for processing missing values in measured data, and more particularly, to a method for analyzing genotypic information having missing values.
2. Description of the Related Art
High-throughput genotyping technology has enabled the generation of a vast amount of genotypic information for a large number of samples at a time. This genotyping technology has been used to analyze a nucleotide sequence of a target sample or single nucleotide polymorphism (SNP) information of a target gene. High-throughput genotyping technology has also been used to search for disease-related genes or to draw genetic maps based on the SNP information. Genotypic information is expressed in the form of a matrix in which rows denote samples and columns denote genes or SNP positions.
Because high-throughput genotyping is performed on a large scale, genotyping errors, such as missing values, may exist in the genotypic information. Accordingly, when genetic experiments, for example, are performed on a large scale and missing values occur frequently, a method for processing the missing values while minimizing loss of the remaining measured data is needed.
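The matrix representation described above can be illustrated with a small sketch. The genotype values, the use of `None` for a missing call, and the helper name `missing_cells` are assumptions made for illustration only and do not come from the disclosure.

```python
# Hypothetical toy genotype matrix: rows are samples, columns are SNP positions.
# None marks a missing genotype call; the genotype strings are made up.
genotypes = [
    ["AA", "AG", None, "CC"],
    ["AA", None, "TT", "CC"],
    ["AG", "GG", "TT", "CT"],
]


def missing_cells(matrix):
    """Return the (row, column) index pair of every missing entry."""
    return [
        (r, c)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value is None
    ]


print(missing_cells(genotypes))  # [(0, 2), (1, 1)]
```

Each returned pair identifies one sample and one SNP position, at least one of which would have to be removed to eliminate that missing value.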
The present invention provides a method for analyzing genotypic information capable of minimizing the loss of generated genotypic information while improving reliability of the genotyping.
According to an aspect of the present invention, there is provided a method for processing missing values in measured data including a plurality of missing values. The method comprises assigning weights to measured objects or measured items, according to a priority of the measured objects or measured items; selecting a set of the measured objects and measured items including the missing values having the smallest sum of the weights among a plurality of sets of the measured objects and measured items including the missing values; and removing the measured objects and measured items included in the selected set from the measured data.
According to another aspect of the present invention, there is provided a method for processing missing values in genetic data. The method comprises assigning weights to samples or single nucleotide polymorphism positions according to a priority of the samples or single nucleotide polymorphism positions; selecting a set of the samples and single nucleotide polymorphism positions including the missing values having the smallest sum of weights among sets of the samples and single nucleotide polymorphism positions including the missing values; and removing the samples and single nucleotide polymorphism positions included in the selected set from the genetic data.
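For very small inputs, the assigning, selecting, and removing steps can be illustrated by exhaustive search. The following is a minimal sketch under stated assumptions, not the implementation of the invention: the function names `cheapest_removal` and `remove_selected`, the `("row", i)` / `("col", j)` labels, and all weights are hypothetical, and the search is exponential in the number of candidate rows and columns.

```python
from itertools import combinations


def cheapest_removal(missing, row_weights, col_weights):
    """Exhaustively find the cheapest rows/columns that cover all missing cells.

    missing: iterable of (row, col) pairs marking missing values.
    row_weights, col_weights: dicts mapping an index to its (assumed) weight.
    """
    missing = list(missing)
    # Only rows and columns that contain a missing value are candidates.
    candidates = sorted(
        {("row", r) for r, _ in missing} | {("col", c) for _, c in missing}
    )

    def weight(vertex):
        kind, index = vertex
        return row_weights[index] if kind == "row" else col_weights[index]

    best, best_cost = None, float("inf")
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            # A valid set removes every missing cell via its row or its column.
            covers_all = all(
                ("row", r) in chosen or ("col", c) in chosen for r, c in missing
            )
            if covers_all:
                cost = sum(weight(v) for v in chosen)
                if cost < best_cost:
                    best, best_cost = chosen, cost
    return best, best_cost


def remove_selected(matrix, chosen):
    """Drop the chosen rows and columns from a list-of-lists matrix."""
    rows = {i for kind, i in chosen if kind == "row"}
    cols = {i for kind, i in chosen if kind == "col"}
    return [
        [value for c, value in enumerate(row) if c not in cols]
        for r, row in enumerate(matrix)
        if r not in rows
    ]


# Hypothetical data: samples (rows) are given higher weights than SNP positions.
data = [
    ["AA", "AG", None, "CC"],
    ["AA", None, None, "CC"],
    ["AG", "GG", "TT", "CT"],
]
holes = [(0, 2), (1, 1), (1, 2)]
selected, cost = cheapest_removal(holes, {0: 3, 1: 2, 2: 3}, {0: 1, 1: 1, 2: 1, 3: 1})
cleaned = remove_selected(data, selected)
```

With these assumed weights, removing SNP positions 1 and 2 costs 2, which is cheaper than any set that removes a sample, so all three samples are retained.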
According to a further aspect of the present invention, a computer readable recording medium has embodied thereon a code for executing a method of processing missing values, such as analyzing genotypic information having missing values.
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings in which:
The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown.
Here, the step for selecting a set of samples and SNP positions having the smallest sum of weights will now be explained in more detail. When the number of samples or SNP positions is small as in
The bipartite graph to which weights are thus assigned is converted to a weighted vertex cover problem. A vertex cover is a subset of the vertex set that covers all edges in the graph, that is, every edge has at least one endpoint in the subset. Solving the weighted vertex cover problem yields a vertex cover having the smallest sum of weights. Therefore, the SNP positions or samples corresponding to the vertices of the set obtained as the solution of the weighted vertex cover problem are removed from the genotypic information. The vertices marked by solid lines, in
The vertex cover problem can be solved by various methods, of which the Hungarian method is a leading one. The Hungarian method is described in Christos H. Papadimitriou and Kenneth Steiglitz, “Combinatorial Optimization: Algorithms and Complexity”, Prentice-Hall, 1982.
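As an illustration of solving the weighted vertex cover problem on the bipartite graph, the following minimal sketch uses a max-flow/min-cut reduction (Edmonds-Karp) rather than the Hungarian method named above; this is a swapped-in, commonly used alternative for bipartite graphs. The function name `min_weight_vertex_cover`, the vertex labels, and the weights are assumptions for illustration, and weights are assumed to be positive.

```python
from collections import deque


def min_weight_vertex_cover(missing, row_weights, col_weights):
    """Minimum-weight vertex cover of the bipartite 'missing value' graph.

    Rows (samples) and columns (SNP positions) are vertices; each missing
    cell (row, col) is an edge. Solved as a minimum s-t cut via max flow.
    """
    INF = float("inf")
    S, T = "source", "sink"
    cap = {S: {}, T: {}}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {}).setdefault(u, 0)  # reverse residual edge

    edges = sorted(set(missing))
    for r in sorted({r for r, _ in edges}):
        add_edge(S, ("row", r), row_weights[r])   # cutting this removes row r
    for c in sorted({c for _, c in edges}):
        add_edge(("col", c), T, col_weights[c])   # cutting this removes column c
    for r, c in edges:
        add_edge(("row", r), ("col", c), INF)     # a missing cell cannot be cut

    def bfs():
        """Return parents of all vertices reachable from S in the residual graph."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if T not in parent:
            break  # no augmenting path left: the flow is maximum
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

    # Min cut: rows not reachable from S and columns reachable from S.
    cover = {("row", r) for r, _ in edges if ("row", r) not in parent}
    cover |= {("col", c) for _, c in edges if ("col", c) in parent}
    total = sum(
        row_weights[i] if kind == "row" else col_weights[i] for kind, i in cover
    )
    return cover, total


cover, total = min_weight_vertex_cover(
    [(0, 2), (1, 1), (1, 2)], {0: 3, 1: 2}, {1: 1, 2: 1}
)
print(sorted(cover), total)  # [('col', 1), ('col', 2)] 2
```

The edges between sample vertices and SNP-position vertices are given infinite capacity so that a minimum cut can only sever source-to-sample or position-to-sink edges; the severed edges correspond to the samples and SNP positions to remove.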
The method for processing missing values in genetic data according to the present invention, for example the method for analyzing genotypic information having missing values, can be embodied as computer readable code on a computer readable recording medium. The computer readable recording medium includes any data storage device that can store data which can thereafter be read by a computer system.
Examples of the computer readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves such as data transmission through the Internet. The computer readable recording medium can also be distributed over computer systems coupled by a network, so that the computer readable code is distributed via the network and is stored and executed at the computer systems coupled to the network.
The method for analyzing genotypic information having missing values according to the present invention assigns different weights to samples or SNP positions based on the priority of the samples or SNP positions, selects a set of samples and SNP positions having the smallest sum of weights, and removes the samples and SNP positions included in the selected set from the genotypic information. The method can therefore eliminate the missing values from the genotypic information while minimizing the loss of important genetic data.
Although the exemplary embodiments of the present invention have been described, it is understood that the present invention should not be limited to these exemplary embodiments, and that various changes and modifications can be made by one of ordinary skill in the art within the spirit and scope of the present invention as hereinafter claimed.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Feb 17 2005 | Samsung Electronics Co., Ltd. | (assignment on the face of the patent) | / | |||
May 06 2005 | NAM, YUNSUN | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 016314 | /0238 | |
May 06 2005 | PARK, KYUNGHEE | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 016314 | /0238 |