Provided is a k-nearest neighbor search method of searching for a query number k of nearest points to an arbitrary query point in a DBMS that creates a spatial index from multidimensional points, comprising: setting search conditions; judging whether a nearest region to the query point is a lowest branch or an intermediate branch of the spatial index; calculating, when the nearest region is judged to be the intermediate branch, a distance between the query point and a child region of the nearest region; storing information on the divided region which has become a calculation target; calculating, when the nearest region is judged to be the lowest branch, a distance between the query point and a point included in the nearest region; storing information on the point which has become a calculation target; finishing search processing when the search conditions are satisfied; and obtaining a search result from the DBMS.
|
1. A k-nearest neighbor search method of searching a database for a query number k of nearest points to a query point, the database including multidimensional points and a spatial index where a region including the points is divided into a plurality of regions to set child regions in the region, a tree structure including branches and leaf nodes being created from the points and the region,
the search method comprising:
setting the query point and the query number as search conditions;
judging whether a nearest region to the query point is a lowest branch or an intermediate branch of the spatial index;
calculating, when the nearest region is judged to be an intermediate branch having a child region, a distance between the query point and the child region of the nearest region as a region distance;
storing information on a region which has become a calculation target of the region distance to obtain a nearest region to the region;
calculating, when a result of the judging shows that the nearest region is a lowest branch having no child region, a distance between the query point and a point included in the nearest region as a point distance;
storing information on the point which has become a calculation target of the point distance;
repeating, until the search conditions are satisfied, search processing from the judging to the storing the information on the point which has become the calculation target of the point distance, and finishing the search processing when the search conditions are satisfied; and
obtaining, after finishing the search processing, a record of the stored point as a search result from a database management system for managing the database.
25. A non-transitory storage medium storing a program for receiving a query point which becomes a search start point and searching for a query number k of nearest points to the query point in a database including multidimensional points and a spatial index where a region including the points is divided into a plurality of regions to set child regions in the region, and a tree structure including branches and leaf nodes is created from the points and the region, and a database management system for managing the database,
the program controlling a computer to execute:
setting the query point and the query number as search conditions;
judging which of a lowest branch and an intermediate branch of the spatial index a nearest region to the query point is;
calculating, when the nearest region is judged to be the intermediate branch having a child region, a distance between the query point and the child region of the nearest region as a region distance;
storing information on a region which has become a calculation target of the region distance to obtain a nearest region to the region;
calculating, when a result of the judging which of the lowest branch and the intermediate branch shows that the nearest region is the lowest branch having no child region, a distance between the query point and a point included in the nearest region as a point distance;
storing information on the point which has become a calculation target of the point distance;
repeating, until the search conditions are satisfied, search processing from the judging which of the lowest branch and the intermediate branch of the spatial index the nearest region to the query point is to the storing the information on the point which has become the calculation target of the point distance, and finishing the search processing when the search conditions are satisfied; and
obtaining, after the finishing the search processing, a record of the stored point as a search result from the database management system.
26. A k-nearest neighbor search device for receiving a query point which becomes a search start point and searching for a query number k of nearest points to the query point, comprising:
a processor for performing calculation processing;
a storage device for storing information;
a database including multidimensional points and a spatial index where a region including the points is divided into a plurality of regions to set child regions in the region and a tree structure including branches and leaf nodes is created from the points and the region;
a database management system for managing the database;
an initial setting manager for setting the query point and the query number as search conditions by the processor;
a lowest branch checker for judging which of a lowest branch and an intermediate branch of the spatial index a nearest region to the query point is by the processor;
a region distance calculator for calculating, when the nearest region is judged to be the intermediate branch having a child region, a distance between the query point and the child region of the nearest region as a region distance by the processor;
a region manager for storing information on a region which has become a calculation target of the region distance to obtain a nearest region to the region by the processor;
a point distance calculator for calculating, when a result of the judging which of the lowest branch and the intermediate branch shows that the nearest region is the lowest branch having no child region, a distance between the query point and a point included in the nearest region as a point distance by the processor;
a point manager for storing information on the point which has become a calculation target of the point distance in the storage device by the processor;
a termination checker for repeating, until the search conditions are satisfied, search processing from the lowest branch checker to the point manager, and finishing the search processing when the search conditions are satisfied by the processor; and
a result manager for obtaining, after the finishing the search processing, a record of the stored point as a search result from the database management system by the processor.
2. The k-nearest neighbor search method according to
3. The k-nearest neighbor search method according to
the multidimensional points comprise points each being locatable by one coordinate in a two-dimensional space; and
in the spatial index, the points are divided into a plurality of regions by one of a quadtree and an R-tree.
4. The k-nearest neighbor search method according to
the multidimensional points comprise points each being locatable by one coordinate in a three-dimensional space; and
in the spatial index, the points are divided into a plurality of regions by one of an octree and an R-tree.
5. The k-nearest neighbor search method according to
the spatial index contains attribute information on the point in the information on the point to manage the information by a leaf node;
the setting the query point and the query number as the search conditions comprises setting the query point, the query number, and the attribute information as search conditions;
the calculating the distance between the query point and the point included in the nearest region comprises storing only points matching the attribute information of the search condition; and
the obtaining the record of the stored point as the search result from the database management system comprises obtaining k nearest points to the query point that match the attribute information.
6. The k-nearest neighbor search method according to
7. The k-nearest neighbor search method according to
the initial setting manager sets the query point, the query number, and an interruption condition as search conditions, and interrupts, even in a situation where the query number is not reached, the search processing if the interruption condition is satisfied; and
the result manager is capable of obtaining up to k nearest points to the query point.
8. The k-nearest neighbor search method according to
the interruption condition is described by a query distance indicating a maximum distance of points which become a search result;
the region distance calculator records only information on divided regions having distances equal to or shorter than the query distance in the region manager;
the point distance calculator records only information on points having distances equal to or shorter than the query distance in the point manager; and
the result manager is accordingly capable of obtaining up to k nearest points to the query point that have distances equal to or shorter than the query distance from the database management system.
9. The k-nearest neighbor search method according to
the interruption condition is described by maximum processing time indicating maximum time expended for the search processing;
the search processing is interrupted when an elapsed execution time of the search processing becomes equal to the maximum processing time; and
the result manager obtains up to k nearest points to the query point from the database management system.
10. The k-nearest neighbor search method according to
11. The k-nearest neighbor search method according to
the initial setting manager sets the query point, the query number, attribute information and an interruption condition as search conditions;
the point distance calculator records only points matching the attribute information of the search conditions in the point manager;
even in a situation where the query number is not reached, the search processing is interrupted if the interruption condition is satisfied; and
the result manager obtains up to k points matching the attribute information from the database management system.
12. The k-nearest neighbor search method according to
the interruption condition is described by a query distance indicating a maximum distance of points which become a search result;
the region distance calculator records only information on divided regions having distances equal to or shorter than the query distance in the region manager;
the point distance calculator records only information on points having distances equal to or shorter than the query distance in the point manager; and
the result manager is accordingly capable of obtaining up to k nearest points to the query point that match the attribute information and have distances equal to or shorter than the query distance from the database management system.
13. The k-nearest neighbor search method according to
the interruption condition is described by maximum processing time indicating maximum time expended for the search processing;
the search processing is interrupted when an elapsed execution time of the search processing becomes equal to the maximum processing time; and
the result manager obtains up to k nearest points to the query point that match the attribute information and have distances equal to or shorter than the query distance from the database management system.
14. The k-nearest neighbor search method according to
15. The k-nearest neighbor search method according to
16. The k-nearest neighbor search method according to
17. The k-nearest neighbor search method according to
18. The k-nearest neighbor search method according to
19. The k-nearest neighbor search method according to
20. The k-nearest neighbor search method according to
21. The k-nearest neighbor search method according to
22. The k-nearest neighbor search method according to
23. The k-nearest neighbor search method according to
24. The k-nearest neighbor search method according to
27. The k-nearest neighbor search device according to
the initial setting manager sets the query point, the query number, and an interruption condition as search conditions;
the termination checker interrupts, even in a situation where the query number is not reached, the search processing when the interruption condition is satisfied; and
the result manager obtains up to k nearest points to the query point.
28. The k-nearest neighbor search device according to
the interruption condition includes any one of a query distance indicating a longest distance from the query point to the point and maximum processing time indicating maximum time expended for the search processing;
if the interruption condition is the query distance,
the region distance calculator stores only information on divided regions having distances equal to or shorter than the query distance in the region manager;
the point distance calculator stores only information on points having distances equal to or shorter than the query distance in the point manager; and
the result manager obtains up to k nearest points to the query point that have distances equal to or shorter than the query distance from the database management system; and
if the interruption condition is the maximum processing time,
the termination checker interrupts the search processing when elapsed time of the search processing becomes equal to the maximum processing time; and
the result manager obtains up to k nearest points to the query point from the database management system.
29. The k-nearest neighbor search method according to
30. The k-nearest neighbor search method according to
31. The k-nearest neighbor search method according to
|
The present application claims priority from Japanese patent application JP2008-37362 filed on Feb. 19, 2008, the content of which is hereby incorporated by reference into this application.
This invention relates to a k-nearest neighbor search technique for strictly searching arbitrary multidimensional points for k nearest points at high speed, and more particularly, to a technique for searching for points in a two-dimensional or three-dimensional space assuming map information management.
A database management system that has a spatial search function has been developed for the purpose of map information management. This database management system is referred to as a spatial database management system. The spatial database management system enables management of graphic elements such as points, lines or surfaces of objects, and attribute elements such as characters or numerical values indicating contents of the objects. The spatial search function realizes range search for obtaining objects included in or in contact with an arbitrary range. To achieve high-speed range search, spatial index techniques such as quadtree, grid file, and R-tree have been proposed. A spatial index technique divides a spatial region according to the arrangement and distribution of objects in the space.
The spatial database management system has conventionally been developed for enterprise applications. Recently, however, a spatial database management system has been developed also for application to an embedded device. The embedded device that needs the spatial database is a device such as a car navigation device or a personal navigation device (PND) for managing map information. The car navigation device has a function of searching a spatial database for points such as restaurants or parking lots near a given point such as a current location or a destination designated by a user. For that purpose, k-nearest neighbor search for obtaining a number k (hereinafter referred to as a query number) of nearest points to the user-designated point (hereinafter referred to as a query point) from the spatial database has been known.
The k-nearest neighbor search in the conventional spatial database management system is realized by using range search. According to the conventional k-nearest neighbor search, first, a search range of an arbitrary size around a query point is set. If the number of points included in the search range exceeds the query number, distances between the query point and the respective points are calculated. The points are sorted in order of increasing distances, and k nearest points are obtained as the search result. On the other hand, if the number of points included in the search range is less than the query number, until the number exceeds the query number, a larger search range is set to repeat the range search. For example, a method that uses range search based on a grid file (JP 2003-242151 A) and a method that uses range search based on quadtree (U.S. Pat. No. 6,879,980) have been known.
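By way of illustration only, the conventional procedure described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of either cited reference: a linear scan over an in-memory list stands in for the range search on a spatial index, the search range is assumed to be circular, and the names knn_by_range_search, initial_radius, and the doubling of the range are hypothetical.

```python
import math


def knn_by_range_search(points, query, k, initial_radius=1.0):
    """Conventional k-NN: widen the search range until it holds at least k points.

    Assumption: a linear scan stands in for the index-backed range search.
    """
    if k <= 0 or not points:
        return []
    k = min(k, len(points))  # guarantees that the loop below terminates
    radius = initial_radius
    while True:
        # Range search: collect every point inside the current circular range.
        in_range = [p for p in points if math.dist(p, query) <= radius]
        if len(in_range) >= k:
            break
        radius *= 2  # hypothetical growth policy; the source only says "larger"
    # Every point in the range is distance-sorted, even when far more than k are found.
    in_range.sort(key=lambda p: math.dist(p, query))
    return in_range[:k]


sample = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (10.0, 10.0)]
nearest_two = knn_by_range_search(sample, (0.0, 0.0), 2)
```

The sketch makes the cost visible: each retry repeats the whole range search, and a dense region forces distance calculation and sorting for all points in the range rather than for k points.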
However, the application of the conventional technique to the embedded device has had the following problems.
A first problem is extension of search time caused by disk access and a calculation load.
The embedded device generally includes a main memory of small capacity. A disk access accordingly occurs during search execution to read a page of a spatial index from an external storage device into the main memory. Thus, in the embedded device, when the number of points included in a search range is very large, disk access occurs frequently to extend search time. This problem is especially pronounced when the region has a high population density of points and the search range is wide. On the other hand, an enterprise server includes a main memory of large capacity. The server can accordingly store most pages of a spatial index in the main memory beforehand. In this case, even when a large number of points are included within the search range, disk access does not occur frequently. Thus, search time is not extended.
The embedded device includes a low-speed central processing unit. Thus, a calculation load for calculation of a distance from a query point to respective points or sort processing by distance affects search time. Especially, when the number of points included in the search range is very large, the calculation load increases to extend the search time. On the other hand, the enterprise server includes a high-speed central processing unit. Thus, an influence of a calculation load on the search time is small.
It has been described that when the number of points included in the search range is very large, the problems occur in terms of disk access and calculation load. Even when the number of points included in the search range is less than the query number, problems similar to the above occur since search processing is repeated until the number of points reaches the query number.
To solve the problems, a method of setting a search range to an appropriate size based on the query number and a population density of points near the query point in the spatial database may be employed. However, this method may not be suitable for the embedded device. This is because, in the embedded device, managing the population density of points imposes a heavy load, since points may be inserted or deleted.
As a result, the k-nearest neighbor search using a range search in the embedded device has a problem of extended search time.
A second problem is inhibition of executing search processing or extension of search time when a capacity of memory usage is large during search execution.
According to the conventional technique, all points included in the search range are stored in the main memory, and sorted by using distances from the query point. Thus, when a large number of points are included within the search range, a capacity of memory usage increases. In this case, it may not be possible to implement the conventional technique since the embedded device includes a main memory of small capacity.
To solve this problem, there is a method in which an external storage device stores the points that the main memory cannot hold. In this method, however, disk access occurs during sort processing of the points, and the search time increases.
A third problem is a possibility of an inaccurate search result with the number of points less than k when conventional k-nearest neighbor search and attribute search are combined.
The k-nearest neighbor search and the attribute search are executed in combination for a table which includes many types of points. For example, a point table managed by a car navigation device includes restaurants, parking lots, and gas stations. When the type is a restaurant, the point table includes a detailed classification such as Japanese food, Italian food, or French food. In this case, a user sets, in addition to a query point and the query number, a point type to execute k-nearest neighbor search. In a normal spatial database, attribute information and point information are managed as different attributes in the table. Thus, in a WHERE clause of Structured Query Language (SQL), conditions of k-nearest neighbor search and attribute search are coupled together by an AND operator.
In this case, even when a number of points equal to the query number is obtained in the k-nearest neighbor search, the number of points in the final result may be less than the query number. This is because the result is the product set (intersection) of the set of points obtained in the k-nearest neighbor search and the set of points obtained in the attribute search.
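The shortfall can be reproduced with a small illustrative sketch. The sample data, the tuple layout, and the function names k_nearest and knn_and_attribute are hypothetical and serve only to show how coupling the two conditions by AND can leave fewer than k points.

```python
import math

# Hypothetical sample data: (name, kind, (x, y)).
POINTS = [
    ("gas A", "gas_station", (1.0, 0.0)),
    ("parking A", "parking_lot", (0.0, 2.0)),
    ("restaurant A", "restaurant", (3.0, 0.0)),
    ("restaurant B", "restaurant", (0.0, 8.0)),
]


def k_nearest(points, query, k):
    """Return the k points nearest to the query, ignoring attributes."""
    return sorted(points, key=lambda p: math.dist(p[2], query))[:k]


def knn_and_attribute(points, query, k, kind):
    """Conventional combination: the k-NN result ANDed with the attribute condition."""
    nearest = k_nearest(points, query, k)
    return [p for p in nearest if p[1] == kind]


# The two nearest points are a gas station and a parking lot, so a request
# for the 2 nearest restaurants returns no restaurant at all.
result = knn_and_attribute(POINTS, (0.0, 0.0), 2, "restaurant")
```

In this sample, two restaurants exist in the table, yet the intersection is empty because neither restaurant is among the two nearest points of any type.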
To solve the problem, a database management system for enterprise employs a method of setting the query number greater than k. In the embedded device, however, this method is not suitable. It is because when the query number is large, the number of necessary pages of a spatial index increases, and in the embedded device which includes the main memory of small capacity, disk access occurs frequently to extend search time. On the other hand, the enterprise server includes the main memory of large capacity. Thus, storing pages of the spatial index in the main memory beforehand can prevent frequent disk access even when the query number is large.
A fourth problem is extension of search time when k nearest points are not near a given query point.
In the conventional k-nearest neighbor search, the search range is widened until the query number is reached. When there is no desired point near the query point, a search result may include a point several tens of kilometers away from the query point. As the search range is widened, the number of necessary pages of the spatial index is increased. Thus, as described above, the search time is extended in the embedded device which includes the main memory of small capacity.
This invention has been developed with the aforementioned problems in mind, and aims to reduce memory usage during search processing and to shorten search time in a spatial database.
According to this invention, there is provided a k-nearest neighbor search method of receiving a query point which becomes a search start point and searching for a query number k of nearest points to the query point in a database including multidimensional points and a spatial index where a region including the points is divided into a plurality of regions to set child regions in the region, and a tree structure including branches and leaf nodes is created from the points and the region, and a database management system for managing the database, the method comprising: setting the query point and the query number as search conditions; judging which of a lowest branch and an intermediate branch of the spatial index a nearest region to the query point is; calculating, when the nearest region is judged to be the intermediate branch having a child region, a distance between the query point and the child region of the nearest region as a region distance; storing information on a region which has become a calculation target of the region distance to obtain a nearest region to the region; calculating, when a result of the judging which of the lowest branch and the intermediate branch shows that the nearest region is the lowest branch having no child region, a distance between the query point and a point included in the nearest region as a point distance; storing information on the point which has become a calculation target of the point distance; repeating, until the search conditions are satisfied, search processing from the judging which of the lowest branch and the intermediate branch of the spatial index the nearest region to the query point is to the storing the information on the point which has become the calculation target of the point distance, and finishing the search processing when the search conditions are satisfied; and obtaining, after the finishing the search processing, a record of the stored point as a search result from the database management system.
This invention enables reduction of the number of disk accesses to pages of a spatial index compared to that in the conventional k-nearest neighbor search using the range search. Thus, search time can be reduced. Reducing points targeted for distance calculation or sort processing can lower a calculation load of a processor. Moreover, a capacity of memory usage during search execution can be reduced.
Referring to the accompanying drawings, the preferred embodiments of this invention will be described.
The storage device 5 stores a spatial database 100 which includes graphic elements such as points, lines or surfaces of map objects, and attribute elements such as characters or numerical values indicating contents of the objects. Among information stored in the spatial database 100, information of one spot such as a shop or facility is a point.
In the memory 2, a DBMS (Database Management System) 8 for managing the spatial database 100, an application 9 for using the spatial database 100 via the DBMS 8, and an OS (Operating System) 7 for managing the DBMS 8 and the application 9 are loaded to be executed by the CPU 3. The application 9 calculates a current position from a signal of the GPS satellite received by the receiver 10, searches the spatial database 100 for a current point, and obtains map information to output it to the display device 4. In the car navigation device 1, when a search command is received from a user via the input device 6, the application 9 searches the spatial database 100 via the DBMS 8 as described below, and outputs a requested search result to the display device 4. The OS 7, the DBMS 8, and the application 9 are stored in the storage device 5 which is a recording medium, and loaded in the memory 2 at the time of starting the car navigation device 1 to be executed by the CPU 3.
The application 9 allocates work areas in the memory 2. These areas are a point heap 91, a region heap 92, and a result list 93 described below.
This embodiment realizes k-nearest neighbor search which can solve the first and second problems. The scope of application of this invention is basically not limited by the number of dimensions of the spatial data or by the type of spatial index technique. Requirements of a spatial index technique usable by this invention are as follows.
1. A spatial region is recursively divided, and branches for storing pieces of information of the divided regions are set as nodes of a tree structure.
2. The range of each branch is included in the range of the branch that is its parent node, so that the ranges form a hierarchical structure.
3. A divided region not subdivided has a leaf node for storing information of points included in the divided region, and the leaf node is connected to a lowest node of the tree structure.
An example where points of a two-dimensional space stored in the spatial database 100 are the search target of the car navigation device 1 and a quadtree method is applied as the spatial index technique will be described below.
The spatial database 100 illustrated in
First, the spatial index 101 is created by a computer (not shown) when points are inserted into the point table 501 managed in the spatial database 100. In this embodiment, a divided region table 401 illustrated in
According to the quadtree method, insertion of points into the spatial database 100 is accompanied by division of the spatial region into four regions by straight lines parallel to the X and Y axes of a two-dimensional space. Generally, in the quadtree method, the maximum number of points that can be stored in a divided region is decided, and the divided region is divided into four when the maximum number is exceeded due to insertion of points. As region dividing methods, a method of making the areas of the divided regions uniform after division into four, and a method of making the numbers of points in the divided regions after division into four as uniform as possible have been presented. The latter method is realized by, for example, dividing the region into four at the barycenter of all points in the region of a division target. The k-nearest neighbor search of this invention is operable without dependence on any region dividing method. A data structure of the quadtree method will be described below. This invention is directed to k-nearest neighbor search operated on the spatial index technique, but not directed to the spatial index technique itself. Thus, this embodiment requires only well-known methods for quadtree generation and search procedures, and detailed description thereof will be omitted.
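For reference, the division on insertion may be sketched as follows. This is a minimal sketch of the well-known uniform-area variant (division at the center of the region), not the specific generation procedure of this embodiment. The class name QuadNode, the parameters capacity and max_depth, and the depth limit that guards against endless division of coincident points are all assumptions made for illustration.

```python
class QuadNode:
    """One divided region; holds points until it exceeds capacity, then splits."""

    def __init__(self, x0, y0, x1, y1, capacity=4, depth=0, max_depth=16):
        self.bounds = (x0, y0, x1, y1)  # bottom left and top right of the region
        self.capacity = capacity        # maximum number of points per region
        self.depth = depth
        self.max_depth = max_depth      # hypothetical guard for duplicate points
        self.points = []                # used only while the region is undivided
        self.divided_point = None
        self.children = None            # four child regions once divided

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.bounds
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # uniform-area division
        args = (self.capacity, self.depth + 1, self.max_depth)
        self.divided_point = (cx, cy)
        self.children = [
            QuadNode(x0, y0, cx, cy, *args),  # index 0: low x, low y
            QuadNode(cx, y0, x1, cy, *args),  # index 1: high x, low y
            QuadNode(x0, cy, cx, y1, *args),  # index 2: low x, high y
            QuadNode(cx, cy, x1, y1, *args),  # index 3: high x, high y
        ]
        pending, self.points = self.points, []
        for p in pending:
            self._child_for(p).insert(p)

    def _child_for(self, p):
        cx, cy = self.divided_point
        index = (1 if p[0] >= cx else 0) + (2 if p[1] >= cy else 0)
        return self.children[index]


root = QuadNode(0, 0, 100, 100, capacity=2)
for point in [(10, 10), (20, 20), (80, 80)]:
    root.insert(point)
```

Replacing the center (cx, cy) with the barycenter of the points in the region would give the variant that balances the numbers of points; the search described in this embodiment does not depend on which is chosen.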
The region ID 412 is a unique identifier indicating a divided region. The range 413 is represented by X-Y coordinates of a bottom left and a top right of a region. The divided point 414 indicates a coordinate of a divided point when the region is divided into four. The divided region is divided into four by two straight lines passing through the divided point 414 and parallel to X and Y axes.
The child region ID 415 is an identifier of a divided region when the region is divided into four. The pointer to child region 416 indicates an address value (e.g., Logical Block Address (LBA)) of the storage device 5 which stores a branch 301 corresponding to the divided region indicated by the child region ID 415. When the branch 301 is a lowest branch, the divided region corresponding to the branch 301 has no child region. The divided point 414, the child region ID 415, and the pointer to child region 416 of the divided region corresponding to the lowest branch are NULL. The pointer to leaf node 417 indicates an address value of the storage device 5 which stores a leaf node 302 adjacent to the branch 301. The number of points 418 indicates the number of points stored in the leaf node 302 adjacent to the branch 301. When the branch 301 is an intermediate branch, the pointer to leaf node 417 of the branch 301 is NULL, and the number of points is 0.
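As an illustrative aid, one record of the fields described above may be modeled as follows. The field names are hypothetical renderings of the reference numerals 412 to 418, storage addresses are represented by plain integers, and None stands in for NULL; none of this is asserted to be the actual storage layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Branch:
    """Hypothetical in-memory model of one divided-region record."""

    region_id: int                                           # region ID 412
    bounds: Tuple[float, float, float, float]                # range 413: (x_min, y_min, x_max, y_max)
    divided_point: Optional[Tuple[float, float]] = None      # divided point 414
    child_region_ids: Optional[Tuple[int, int, int, int]] = None  # child region ID 415
    child_pointers: Optional[Tuple[int, int, int, int]] = None    # pointer to child region 416
    leaf_pointer: Optional[int] = None                       # pointer to leaf node 417
    number_of_points: int = 0                                # number of points 418

    def is_lowest(self) -> bool:
        """A lowest branch has no child region, so its child fields are NULL."""
        return self.child_region_ids is None


intermediate = Branch(
    region_id=1,
    bounds=(0.0, 0.0, 100.0, 100.0),
    divided_point=(50.0, 50.0),
    child_region_ids=(2, 3, 4, 5),
    child_pointers=(1000, 1001, 1002, 1003),  # made-up address values
)
lowest = Branch(region_id=2, bounds=(0.0, 0.0, 50.0, 50.0),
                leaf_pointer=2000, number_of_points=3)
```

The two instances mirror the two cases in the text: the intermediate branch has a NULL leaf pointer and zero points, while the lowest branch has NULL child fields and refers to its adjacent leaf node.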
When the processing of k-nearest neighbor search is started, first, in an initial setting manager 111, the CPU 3 sets the query point and the query number and allocates regions of the memory 2 to be used for a region distance calculator 113 and a point manager 116 (S101).
A lowest branch checker 112 judges whether a nearest region is a lowest branch, in other words, whether the nearest region has no child region obtained by further division into four (S102). The nearest region is the region nearest to the query point among the divided regions that include points whose entry as result candidates is yet to be checked. If the condition of Step S102 is not satisfied, the nearest region is judged not to be a lowest branch. In this case, the region distance calculator 113 calculates the shortest Euclidean distance (region distance d) between the query point and each child region of the nearest region (S103).
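The region distance d of Step S103 may be computed, for an axis-parallel rectangular region, as in the following sketch. The function name region_distance and the tuple layout (x_min, y_min, x_max, y_max) are assumptions; the formula is the standard shortest Euclidean distance from a point to a rectangle.

```python
import math


def region_distance(query, bounds):
    """Shortest Euclidean distance from the query point to a rectangular region.

    Returns 0.0 when the query point lies inside or on the edge of the region.
    """
    qx, qy = query
    x_min, y_min, x_max, y_max = bounds
    # Per-axis gap between the query point and the region (0 if overlapping).
    dx = max(x_min - qx, 0.0, qx - x_max)
    dy = max(y_min - qy, 0.0, qy - y_max)
    return math.hypot(dx, dy)
```

Because no point in a region can be nearer to the query point than the region itself, d is a lower bound on every point distance in that region, which is what allows regions to be visited in order of d.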
A region manager 114 records information on the child regions of the nearest region, treating each child region as a region (S104). A region here indicates a divided region including a point whose entry as a result candidate is yet to be checked. Then, the region manager 114 selects a next nearest region among the regions based on the region distance d (S105).
On the other hand, if the condition of Step S102 is satisfied, the nearest region currently focused on is judged to be a lowest branch. In this case, a point distance calculator 115 calculates a distance (point distance d′) between the query point and each point included in the nearest region (S106).
The point manager 116 records information on the k nearest points to the query point based on the obtained point distance d′ (S107). A termination checker 117 judges whether the points in the remaining regions can no longer become result candidates (S108). If this termination condition is not satisfied, the process proceeds to Step S105 to continue the search processing.
On the other hand, if the termination condition is satisfied, a result manager 118 obtains a result record corresponding to the point stored by the point manager 116 (S109).
The processing is continued until the termination condition is satisfied to obtain a search result from the spatial database 100.
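The flow of Steps S101 to S109 can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the quadtree is modelled as nested dictionaries with hypothetical keys `box`, `children`, and `points`, Python's `heapq` module stands in for the region heap and the point heap, and the function names are invented for the example.

```python
import heapq
import itertools
import math

def region_distance(q, box):
    """Shortest distance from query point q to an axis-aligned box (0 if inside)."""
    (xmin, ymin), (xmax, ymax) = box
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def knn_search(root, q, k):
    """Return up to k (distance, point) pairs nearest to q, nearest first."""
    tie = itertools.count()      # tie-breaker so region dicts are never compared
    region_heap = []             # min-heap keyed on region distance d
    point_heap = []              # max-heap on point distance d' (stored negated)
    nearest = root               # S101: start from the overall region
    while True:
        if "children" in nearest:                      # S102: not a lowest branch
            for child in nearest["children"]:          # S103, S104
                d = region_distance(q, child["box"])
                heapq.heappush(region_heap, (d, next(tie), child))
        else:                                          # lowest branch
            for p in nearest["points"]:                # S106, S107
                dp = math.dist(q, p)
                if len(point_heap) < k:
                    heapq.heappush(point_heap, (-dp, p))
                elif dp < -point_heap[0][0]:
                    heapq.heapreplace(point_heap, (-dp, p))
            # S108: stop when the nearest unvisited region is farther away
            # than the farthest of the k candidates found so far.
            if (len(point_heap) == k and region_heap
                    and region_heap[0][0] > -point_heap[0][0]):
                break
        if not region_heap:                            # nothing left to visit
            break
        _, _, nearest = heapq.heappop(region_heap)     # S105
    return sorted((-neg_d, p) for neg_d, p in point_heap)   # S109
```

Because regions are always visited in ascending order of region distance, the loop can stop as soon as the nearest unvisited region is farther than the current k-th candidate, without examining the remaining regions.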
Setting initialization processing of
Then, the initial setting manager 111 allocates a memory area for storing a region in a heap structure (region heap 92 hereinafter) from the memory 2 (S702). The heap structure is used because it is an efficient data structure in the insertion operation performed by the region manager 114 when it records information of the child region of the nearest region (S104) and the selection operation when the region manager 114 selects a next nearest region (S105).
The region heap 92 stores an element constituted by region ID 412, a pointer to region information, and a region distance d. The pointer to region information indicates an address value of the storage device 5 which stores a branch 301 corresponding to the region. The region distance d indicates a shortest distance from a query point to the region. The region heap 92 is managed by a tree structure. The region heap 92 satisfies a heap condition that a region distance d of each element is equal to or smaller than a region distance d of a child element of the element even if an element is inserted or deleted. In other words, a root element of the region heap 92 stores information of a divided region having a minimum region distance d.
The initial setting manager 111 allocates a memory area for storing points in a heap structure (point heap 91 hereinafter) from the memory 2 (S703). The heap structure is used because it is an efficient data structure for the insertion operation performed when the point manager 116 records information on the k nearest points (S107). The point heap 91 stores elements each constituted by a point ID 511, a pointer 513, and a point distance d′. The point heap 91 satisfies a heap condition that the point distance d′ of each element is equal to or larger than the point distance d′ of any child element of that element, even after an element is inserted or deleted. In other words, the root element of the point heap 91 stores information of the point having the maximum point distance d′. Because of this property, once k points are present, whether the (k+1)-th and subsequent points can be candidates is checked simply by referring to the root element. Specifically, when the point distance d′ of a point being checked is smaller than the point distance d′ of the point of the root element, the point being checked replaces the point of the root element as a candidate. This operation reduces the memory used during search execution because at most k nearest points are stored in the memory at any time.
The initial setting manager 111 sets a region corresponding to the root in the quadtree to a nearest region (S704), and then finishes the subroutine. In this case, the nearest region becomes an overall region. Then, the process proceeds to lowest branch check of the nearest region illustrated in
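The root-only check on the point heap can be illustrated with a bounded max-heap. The sketch below assumes Python's `heapq` (a min-heap) with negated distances so that index 0 holds the farthest candidate; the function name `record_point` and the tuple layout are hypothetical.

```python
import heapq

def record_point(point_heap, k, point_id, d_prime):
    """Keep at most k nearest points; returns True if the point was recorded.

    Elements are (-d', point_id), so point_heap[0] is the candidate with the
    maximum point distance d', playing the role of the root element.
    """
    if len(point_heap) < k:
        # Fewer than k candidates so far: always insert.
        heapq.heappush(point_heap, (-d_prime, point_id))
        return True
    if d_prime < -point_heap[0][0]:
        # Closer than the current farthest candidate: replace the root.
        heapq.heapreplace(point_heap, (-d_prime, point_id))
        return True
    return False  # cannot be a candidate; only the root was consulted
```

Each rejected point costs a single comparison against the root, and the heap never grows beyond k elements.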
First, the region distance calculator 113 sets a range of a bottom left child region of the nearest region as follows. The bottom left and top right coordinates are respectively (Xmin, Ymin) and (Xdiv, Ydiv) (S901).
Then, the region distance calculator 113 sets a range of a bottom right child region of the nearest region as follows. The bottom left and top right coordinates are respectively (Xdiv, Ymin) and (Xmax, Ydiv) (S902). The region distance calculator 113 sets a range of a top left child region of the nearest region as follows. The bottom left and top right coordinates are respectively (Xmin, Ydiv) and (Xdiv, Ymax) (S903). The region distance calculator 113 sets a range of a top right child region of the nearest region as follows. The bottom left and top right coordinates are respectively (Xdiv, Ydiv) and (Xmax, Ymax) (S904).
The region distance calculator 113 calculates region distances d of the bottom left, bottom right, top left and top right child regions by using the following Equation (1) (S905).
where:
d is a distance value between a divided region R and a query point (x, y); and
the range of the divided region R is defined by bottom left coordinate (Xmin, Ymin) and top right coordinate (Xmax, Ymax).
In Equation (1), a coordinate of the query point is represented by (x, y), a bottom left coordinate of a divided region is represented by (Xmin, Ymin), and a top right coordinate is represented by (Xmax, Ymax). Equation (1) indicates a square of a distance d between a point nearest to the query point among points on a boundary line of the divided region and the query point. When the divided region includes the query point, the distance between the divided region and the query point is 0.
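Equation (1) itself is not reproduced in this text. The sketch below shows one common formulation that is consistent with the description above (a squared distance to the nearest boundary point, and 0 when the query point lies inside the region), together with the four child ranges of Steps S901 to S904. The function names and the clamping formulation are assumptions for illustration.

```python
def child_ranges(xmin, ymin, xmax, ymax, xdiv, ydiv):
    """Ranges of the four child regions as (bottom-left, top-right) pairs."""
    return {
        "bottom_left":  ((xmin, ymin), (xdiv, ydiv)),   # S901
        "bottom_right": ((xdiv, ymin), (xmax, ydiv)),   # S902
        "top_left":     ((xmin, ydiv), (xdiv, ymax)),   # S903
        "top_right":    ((xdiv, ydiv), (xmax, ymax)),   # S904
    }

def squared_region_distance(x, y, box):
    """Squared distance from query point (x, y) to region `box`.

    Each axis contributes how far the query coordinate lies outside the
    region's interval, or 0 if it lies within the interval.
    """
    (xmin, ymin), (xmax, ymax) = box
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return dx * dx + dy * dy
```

For example, with an overall region from (0, 0) to (10, 10) divided at (5, 5) and a query point at (1, 1), the bottom left child contains the query point and has distance 0, while the other three children have strictly positive distances.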
Referring back to
If the condition is not satisfied, the region manager 114 judges whether the region distance d of the inserted element is shorter than the region distance d of its parent element (S1104). If the condition is satisfied, the heap condition of the region heap 92 is not satisfied. In this case, the region manager 114 exchanges the storing positions of the inserted element and the parent element (S1107). In the one-dimensional array, the storing position of the parent element is obtained by subtracting 1 from the smallest integer not less than i/2, where i denotes the storing position of the inserted element.
If the condition of Step S1103 is satisfied, or if the condition of Step S1104 is not satisfied, the heap condition of the region heap 92 is satisfied. In this case, the region manager 114 judges whether pieces of information of all the child regions have been inserted into the region heap 92. If the condition is not satisfied, proceeding to Step S1101, the region manager 114 inserts the pieces of information of the child regions into the heap. On the other hand, if the condition is satisfied, the subroutine is finished, and the processing proceeds to selection of a next nearest region illustrated in
On the other hand, if an element is judged to be present in the region heap 92 in Step S1303, the region manager 114 relocates the tail element (relocated element) of the region heap 92 to the root element (S1304). In the one-dimensional array, the element at the tail storing position is relocated to storing position 0. Then, the region manager 114 judges whether the relocated element has any child element (S1305). If the condition is not satisfied, the heap condition of the region heap 92 is satisfied. In this case, the region manager 114 finishes the subroutine to proceed to the lowest branch check of the nearest region, which is the next processing (S102). On the other hand, if the condition of Step S1305 is satisfied, the region manager 114 judges whether the region distance d of the relocated element is larger than the region distance d of a child element (S1306). If the condition is satisfied, the heap condition of the region heap 92 is not satisfied. In this case, the region manager 114 exchanges the storing positions of the relocated element and the child element and returns to Step S1305 (S1307). If both child elements satisfy the condition of Step S1306, the child element with the shorter region distance d is the exchange target. In the one-dimensional array, the relocated element is exchanged with the child element at storing position 2×i+1 or 2×i+2, where i denotes the storing position of the relocated element.
If the condition of Step S1306 is not satisfied, the region heap 92 satisfies the heap condition. In this case, the region manager 114 finishes the subroutine to proceed to lowest branch check of the process-target region which is next processing (S102) illustrated in
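The insertion and root-removal procedures on the one-dimensional array can be sketched as below. This is an illustrative implementation, not the patented routine: elements are assumed to be `(region_distance, region_id)` tuples in a Python list, and the function names are hypothetical. The index arithmetic follows the text: children of position i sit at 2×i+1 and 2×i+2.

```python
def heap_insert(heap, element):
    """Append at the tail, then swap upward while closer than the parent."""
    heap.append(element)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2                 # equals ceil(i / 2) - 1 for i >= 1
        if heap[i][0] < heap[parent][0]:      # heap condition violated
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break

def heap_pop_root(heap):
    """Remove and return the root (minimum region distance)."""
    root = heap[0]
    tail = heap.pop()
    if heap:                                  # an element is still present
        heap[0] = tail                        # relocate the tail to position 0
        i = 0
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < len(heap) and heap[left][0] < heap[smallest][0]:
                smallest = left
            if right < len(heap) and heap[right][0] < heap[smallest][0]:
                smallest = right              # pick the child with shorter d
            if smallest == i:
                break                         # heap condition satisfied
            heap[i], heap[smallest] = heap[smallest], heap[i]
            i = smallest
    return root
```

Both operations touch at most one element per tree level, so each runs in O(log n) exchanges for a heap of n regions.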
$$d' = \sqrt{(x - x')^{2} + (y - y')^{2}} \qquad (2)$$
In Equation (2), the point distance d′ is a Euclidean distance between a query point (x, y) and a process-target point (x′, y′). In a real environment, calculation of a square root may greatly affect a delay of search time. In this case, k-nearest neighbor search may be executed while the point distance d′ and the region distance d are kept as squared values. The point distance calculator 115 finishes the subroutine to proceed to next processing of recording a process-target point as a point in the point heap 91 (S107) illustrated in
If the condition of Step S1502 is satisfied, the point heap 91 already holds the query number of points. In this case, the point manager 116 judges whether the point distance d′ of the process-target point is shorter than the point distance d′ of the point farthest from the query point in the point heap 91 (S1503). The point farthest from the query point in the point heap 91 is the root element of the point heap 91. If the condition is satisfied, the process-target point replaces the root element of the point heap 91 as a new candidate point.
In Steps S1504 to S1507, the point manager 116 records the process-target point in the point heap 91 while satisfying the heap condition. On the other hand, if the condition of Step S1503 is not satisfied, the process-target point is not recorded as a candidate point. In this case, the point manager 116 proceeds to Step S1508.
If the condition of Step S1502 is not satisfied, the point heap 91 does not yet hold the query number of points. In this case, the point manager 116 unconditionally inserts the element of the process-target point into the point heap 91. In Steps S1510 to S1512, the point manager 116 stores the element of the process-target point in the point heap 91 while satisfying the heap condition.
The point manager 116 proceeds to Step S1508 to judge whether all points of the nearest region have been checked as candidate points. If the condition is satisfied, the point manager 116 finishes the subroutine to proceed to termination check (S108). On the other hand, if the condition of Step S1508 is not satisfied, the point manager 116 proceeds to Step S1405 illustrated in
On the other hand, if the condition of Step S1601 is satisfied, the termination checker 117 judges whether the region distance d of the root element of the region heap 92 is longer than the point distance d′ of the root element of the point heap 91 (S1602). If the condition is satisfied, the search processing is finished. In other words, the termination condition is that the region distance d of the root element of the region heap 92 is longer than the point distance d′ of the root element of the point heap 91. This condition means that no further candidate points can be found even if the points in divided regions farther away are checked. Because heap structures are used for storing regions and points, the termination condition can be checked with O(1) lookups, by referring only to the root elements. If the condition of Step S1602 is satisfied, the termination checker 117 selects acquisition of a search result illustrated in
First, the result manager 118 inserts the root element of the point heap 91 into a tail of a result list 93 (S1701). The result list 93 is a one-dimensional array capable of storing the same elements as the point heap 91, and is an area allocated beforehand in the memory 2 by the result manager 118. The result manager 118 deletes the root element from the point heap 91 (S1702). Then, the result manager 118 judges whether any element is present in the point heap 91 (S1703). If the condition is satisfied, the result manager 118 reconstructs the point heap 91 after the deletion of the root element in Steps S1704 to S1707. If the condition of Step S1703 is not satisfied, the result manager 118 obtains points one by one from the tail to the head of the result list 93 (S1708). Each point is a record containing a coordinate 512 and any number of attributes to be managed on the spatial database 100.
Thus, in this processing, the elements of the point heap 91 are first extracted in descending order of point distance d′ and stored sequentially from the head of the result list 93 constituted by the one-dimensional array. By obtaining points from the tail to the head of the one-dimensional array, points are obtained in ascending order of distance from the query point. The number of searching times until a result is obtained is O(N·log N), the cost of heap sort, where N denotes the number of points stored in the point heap 91. After the result manager 118 finishes Step S1708, the k-nearest neighbor search is finished.
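The drain-then-read-backwards step can be sketched as follows, assuming the point heap is a Python `heapq` list of `(-d', point_id)` tuples so that the root is the farthest candidate. The function names are hypothetical.

```python
import heapq

def drain_to_result_list(point_heap):
    """Move elements from the point heap to a list, farthest point first."""
    result_list = []
    while point_heap:                                   # S1703
        # heappop removes the root and restores the heap condition,
        # covering S1701, S1702 and the reconstruction steps.
        neg_d, point_id = heapq.heappop(point_heap)
        result_list.append((point_id, -neg_d))          # append at the tail
    return result_list

def nearest_first(result_list):
    """Read the result list from tail to head (S1708)."""
    return list(reversed(result_list))
```

Reading from the tail avoids a separate ascending sort: the heap already yields the points in descending order of distance.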
Referring to
When k-nearest neighbor search is executed, first, the initial setting manager 111 sets a nearest region to 0. The region distance calculator 113 calculates region distances d of child regions 1, 2, 3 and 4 of the region 0, and stores pieces of region information of the child regions in the region heap 92 ((1) of
Through similar processing thereafter, storage situations of the region heap 92 and the point heap 91 are as shown in (11) of
A medium recording a k-nearest neighbor search program having a function similar to that of the first embodiment and a k-nearest neighbor search device are included within this invention. The same holds true for second and third embodiments described below.
According to the first embodiment of this invention, the number of disk accesses to the spatial index 101 during search execution can be reduced, and thus search time can be shortened as compared with the k-nearest neighbor search using the range search. Moreover, according to the first embodiment, only minimum necessary points are targets of distance calculation. Thus, a calculation load of the CPU 3 can be reduced as compared with the conventional k-nearest neighbor search.
The second embodiment is an extension of the first embodiment and is designed to realize k-nearest neighbor search considering attribute search for solving the above-mentioned third problem.
Specifically, the point table 501 of
Processing of the second embodiment basically conforms to the flowchart of
In Step S2304, a point distance calculator 115 judges whether a point type of a process-target point matches the point type set in Step S2201 of the initial setting manager. If a result of judgment is NO, the process proceeds to Step S2307. Through Step S2304, in the second embodiment, only points matching a designated attribute value can be obtained. In
According to the second embodiment of this invention, even when k-nearest neighbor search and attribute search are combined, search execution time can be shortened as compared with the conventional k-nearest neighbor search. This is because, in the second embodiment, the k-nearest neighbor search can be executed only by searching a spatial index 101. According to the second embodiment, k points matching a designated attribute value can be accurately obtained. This is because, in the second embodiment, k-nearest neighbor search can be executed on each row of the point table 2101 of search targets, which includes both point information and attribute information.
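The attribute check of Step S2304 amounts to filtering points by type before their distances are calculated. The sketch below is illustrative only: the tuple layout `(point_id, (x, y), point_type)`, the function name, and the example type values are assumptions.

```python
import math

def matching_point_distances(points, query, point_type):
    """Yield (point_id, d') only for points whose type matches the query type."""
    qx, qy = query
    for point_id, (x, y), ptype in points:
        if ptype != point_type:
            continue                     # type mismatch: skip to the next point
        yield point_id, math.hypot(x - qx, y - qy)

# Hypothetical point types for illustration.
points = [
    (1, (0.0, 3.0), "restaurant"),
    (2, (4.0, 0.0), "gas_station"),
    (3, (3.0, 4.0), "restaurant"),
]
restaurants = list(matching_point_distances(points, (0.0, 0.0), "restaurant"))
```

Because non-matching points never reach the point heap, the k candidates that survive all carry the designated attribute value.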
The third embodiment is an extension of the first embodiment and is designed to provide nearest neighbor search for solving the above-mentioned fourth problem. Processing of the third embodiment basically conforms to the flowchart of
The initial setting manager 111 sets a maximum distance value from the query point (referred to as query distance hereinafter). The search processing is finished once no point at or within the query distance can be found, even if the query number has not been reached. The extension of the third embodiment to the first embodiment can be similarly applied to the second embodiment, and the fourth problem can be solved. In the third embodiment, the processing contents of search initial setting S101, Step S103 of calculation of the region distance d, and Step S106 of calculation of the point distance d′ of
Processing (S2605) of judging whether a point distance d′ is equal to or shorter than the query distance is added after calculation of the point distance d′ of a process-target point of Step S2604. If the condition is satisfied, the process proceeds to recording processing of a point (S107). If the condition is not satisfied, the process-target point is not included in a search result. Thus, unless all points have been processed (NO in Step S2607), the process proceeds to calculation of a point distance d′ of the next point. Steps S2601 to S2604 and S2606 are similar to Steps S1401 to S1404 and S1405 of
According to the third embodiment of this invention, the proximity of the points returned by k-nearest neighbor search can be guaranteed. This is because the third embodiment enables an interruption condition to be set as one of the search conditions. The interruption condition is, for example, a maximum distance from the query point or a maximum processing time which can be expended for search processing.
A fourth embodiment is directed to a query language or a function described when k-nearest neighbor search is executed by a DBMS 8. The fourth embodiment is equivalent to, for example, SQL when an application developer requests k-nearest neighbor search from the DBMS 8 in an application program. The fourth embodiment will be described below by using SQL, which is a general query language of the database management system 8.
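The added check of Step S2605 can be sketched as a distance cutoff applied before a point is recorded. The function name and the `(point_id, (x, y))` tuple layout are assumptions made for this example.

```python
import math

def within_query_distance(points, query, query_distance):
    """Return (point_id, d') for points at or within the query distance."""
    qx, qy = query
    kept = []
    for point_id, (x, y) in points:
        d_prime = math.hypot(x - qx, y - qy)    # point distance d' (S2604)
        if d_prime <= query_distance:           # S2605
            kept.append((point_id, d_prime))    # would proceed to recording (S107)
        # otherwise the point is excluded from the search result
    return kept
```

With this cutoff, the search may return fewer than k points when not enough points lie within the query distance.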
SQL of 2701 in
SQL of 2702 in
SQL of 2703 in
SQL of 2704 in
According to the fourth embodiment of this invention, the application program developer can describe k-nearest neighbor search in SQL. Thus, program development time can be shortened.
Each of the four embodiments has been directed to the k-nearest neighbor search method targeting points in a geographical space. However, application of this invention is not limited to the points in the geographical space. This invention can also be applied to data representable by feature vectors such as television programs or pieces of music, and similarity search can be realized. For example, in the case of television programs, serious and variety axes are set in a first feature vector while fiction and nonfiction axes are set in a second feature vector. Television programs are correlated with a two-dimensional space including those axes and, if a program of a short distance from a designated point can be searched, a television program having a similar feature can be searched.
Each of the embodiments has been described by way of example in which this invention is applied to the car navigation device 1. However, this invention can also be applied to an embedded device such as a PND, a portable phone, or a portable game machine. For the spatial database 100, points of a two-dimensional or three-dimensional space can be search targets.
Each of the embodiments has been described by way of example in which a quadtree is applied to the spatial index 101. However, other spatial indexes such as an R-tree may be applied.
Each of the embodiments has been described by way of example in which the points are represented by the coordinates in the two-dimensional space. However, this invention may be applied to points represented by multidimensional coordinates. For example, when multidimensional points are three-dimensional and locatable by one coordinate in a three-dimensional space, the points are divided into a plurality of regions by one of an octree and an R-tree of the spatial index 101, and the search processing described above can be carried out.
As described above, this invention can be applied to a computer system which uses the spatial database, especially an embedded device such as a car navigation device.
While the present invention has been described in detail and pictorially in the accompanying drawings, the present invention is not limited to such detail but covers various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims.
Hayashi, Hideki, Kimura, Kouji, Ito, Daisuke, Tanizaki, Masaaki, Kajiyama, Hisanori
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jan 13 2009 | TANIZAKI, MASAAKI | HITACHI SOFTWARE ENGINEERING CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 13 2009 | ITO, DAISUKE | HITACHI SOFTWARE ENGINEERING CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 13 2009 | HAYASHI, HIDEKI | Hitachi, LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 13 2009 | ITO, DAISUKE | Hitachi, LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 13 2009 | TANIZAKI, MASAAKI | Hitachi, LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 13 2009 | HAYASHI, HIDEKI | HITACHI SOFTWARE ENGINEERING CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 14 2009 | KIMURA, KOUJI | HITACHI SOFTWARE ENGINEERING CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 14 2009 | KIMURA, KOUJI | Hitachi, LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 15 2009 | KAJIYAMA, HISANORI | HITACHI SOFTWARE ENGINEERING CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 15 2009 | KAJIYAMA, HISANORI | Hitachi, LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022182 | /0087 | |
Jan 30 2009 | Hitachi, Ltd. | (assignment on the face of the patent) | / | |||
Jan 30 2009 | HITACHI SOLUTIONS, LTD. | (assignment on the face of the patent) | / | |||
Oct 01 2010 | HITACHI SOFTWARE ENGINEERING CO , LTD | HITACHI SOLUTIONS, LTD | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 027132 | /0847 |
Date | Maintenance Fee Events |
Jan 24 2013 | ASPN: Payor Number Assigned. |
Jun 17 2015 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Jun 20 2019 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Aug 21 2023 | REM: Maintenance Fee Reminder Mailed. |
Feb 05 2024 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Jan 03 2015 | 4 years fee payment window open |
Jul 03 2015 | 6 months grace period start (w surcharge) |
Jan 03 2016 | patent expiry (for year 4) |
Jan 03 2018 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jan 03 2019 | 8 years fee payment window open |
Jul 03 2019 | 6 months grace period start (w surcharge) |
Jan 03 2020 | patent expiry (for year 8) |
Jan 03 2022 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jan 03 2023 | 12 years fee payment window open |
Jul 03 2023 | 6 months grace period start (w surcharge) |
Jan 03 2024 | patent expiry (for year 12) |
Jan 03 2026 | 2 years to revive unintentionally abandoned end. (for year 12) |