This invention discloses an image retrieval apparatus. The image retrieval apparatus comprises an unlabelled image selector for selecting one or more unlabelled image(s) from an image database; and a main learner for training in each feedback round of the image retrieval, estimating relevance of images in the image database and a user's intention, and determining retrieval results, wherein the main learner makes use of the unlabelled image(s) selected by the unlabelled image selector in the estimation. In addition, the image retrieval apparatus may also include an active selector for selecting, in each feedback round and according to estimation results of the main learner, one or more unlabelled image(s) from the image database for the user to label.

Patent
   RE47340
Priority
Apr 25 2007
Filed
Apr 30 2015
Issued
Apr 09 2019
Expiry
Apr 24 2028
20. An image retrieval method comprising:
estimating a relevance of images in an image database and a user's intention to select one or more unlabelled images from the image database where the one or more unlabelled images selected are one or more images most irrelevant to the user's intention; and
training in a round of a retrieval feedback, estimating relevance of images in the image database and the user's intention, and determining retrieval results by a main learner, wherein the main learner uses the selected one or more unlabelled images; and
selecting, in each feedback round and according to estimation results of the main learner, one or more unlabelled images from the image database for the user to label;
wherein the selecting includes: calculating information capacity of each of the one or more unlabelled images in the image database, and selecting the one or more unlabelled images to be provided to the user to label in accordance with calculation results, and
generating image samples for a user to label, the image samples determined to be most important to a user based on the user's intention, wherein the estimating builds a classifier to select a highest number of negative samples of the one or more unlabelled images.
21. A method of an image retrieval apparatus which includes a feedback analysis unit, a user interface unit and an active selector, the method comprising:
estimating a relevance of images in an image database and a user's intention to select one or more unlabelled images from the image database where the one or more unlabelled images selected are one or more images most irrelevant to the user's intention;
training in a round of a retrieval feedback, estimating relevance of images in the image database and the user's intention, and determining retrieval results by a main learner, wherein the main learner uses the selected one or more unlabelled images; and
selecting, in each feedback round and according to estimation results of the main learner, one or more unlabelled images from the image database for the user to label;
wherein the selecting includes: calculating representativeness of each of the unlabelled images in the image database, and selecting one or more unlabelled images to be provided to the user to label in accordance with calculation results,
wherein the estimating builds a classifier to select a highest number of negative samples of the one or more unlabelled images.
22. An image retrieval apparatus which includes a feedback analysis unit, a user interface unit and an active selector, the image retrieval apparatus comprising:
an unlabelled image selector to estimate a relevance of images in an image database and a user's intention to select one or more unlabelled images from the image database, where the one or more unlabelled images selected are one or more images most irrelevant to the user's intention,
a main learner to train in a round of a retrieval feedback, estimating relevance of images in the image database and the user's intention, and determining retrieval results, wherein the main learner uses the one or more unlabelled images selected by the unlabelled image selector in the estimating; and
the active selector to select, in each feedback round and according to estimation results of the main learner, one or more unlabelled images from the image database for the user to label, the active selector including an information capacity measurement calculation unit and a selection unit,
wherein the information capacity measurement calculation unit calculates the information capacity of each of the one or more unlabelled images in the image database, and the selection unit selects the one or more unlabelled images to be provided to the user to label in accordance with calculation results of the information capacity measurement calculation unit, and
wherein the information capacity measurement calculation unit calculates the information capacity of an image by calculating a distance between the image and a current classification boundary determined by the main learner.
1. An image retrieval apparatus comprising:
a user interface unit;
a feedback analysis unit, the feedback analysis unit including:
an unlabelled image selector to estimate a relevance of images in an image database and a user's intention to select one or more unlabelled images from the image database, and
a main learner to train in a round of a retrieval feedback, estimating relevance of images in the image database and the user's intention, and determining retrieval results, wherein the main learner uses the one or more unlabelled images selected by the unlabelled image selector in the estimating; and
an active selector to select, in each feedback round and according to estimation results of the main learner, one or more unlabelled images from the image database for the user to label, and
wherein the active selector includes an information capacity measurement calculation unit and a selection unit, wherein the information capacity measurement calculation unit calculates information capacity of each of the one or more unlabelled images in the image database, and the selection unit selects the one or more unlabelled images to be provided to the user to label in accordance with calculation results of the information capacity measurement calculation unit,
wherein the user interface unit is used to interact with a user of the image retrieval apparatus, and a retrieval result is presented to the user via the user interface unit in the round of the retrieval feedback, and
wherein the one or more unlabelled images selected by the unlabelled image selector are one or more images most irrelevant to the user's intention.
2. An image retrieval apparatus comprising:
a user interface unit;
a feedback analysis unit, the feedback analysis unit including:
an unlabelled image selector to estimate a relevance of images in an image database and a user's intention to select one or more unlabelled images from the image database, and
a main learner to train in a round of a retrieval feedback, estimating relevance of images in the image database and the user's intention, and determining retrieval results, wherein the main learner uses the one or more unlabelled images selected by the unlabelled image selector in the estimating, and
an active selector to select, in each feedback round and according to estimation results of the main learner, one or more unlabelled images from the image database for the user to label, and
wherein the active selector includes a representativeness measurement calculation unit and a selection unit, wherein the representativeness measurement calculation unit calculates representativeness of each of the unlabelled images in the image database, and the selection unit selects one or more unlabelled images to be provided to the user to label in accordance with calculation results of the representativeness measurement calculation unit,
wherein the user interface unit is used to interact with a user of the image retrieval apparatus, and a retrieval result is presented to the user via the user interface unit in the round of the retrieval feedback, and
wherein the one or more unlabelled images selected by the unlabelled image selector are one or more images most irrelevant to the user's intention.
3. The image retrieval apparatus according to claim 1, wherein the unlabelled image selector uses a method different from a method used by the main learner to calculate the relevance of the images in the image database and the user's intention.
4. The image retrieval apparatus according to claim 1, wherein the unlabelled image selector uses a weighted Euclidean distance model to select the one or more unlabelled images.
5. The image retrieval apparatus according to claim 4, wherein the unlabelled image selector uses the weighted Euclidean distance model to generate a new query point in accordance with a labeled image and a query image inputted by the user, and takes the weighted Euclidean distance between the one or more unlabelled images in the image database and the new query point as estimation results of each unlabelled image of the one or more unlabelled images in the image database.
6. The image retrieval apparatus according to claim 4, wherein the unlabelled image selector uses the weighted Euclidean distance model and adopts an inverse of a variance of values of each component of an eigenvector for an image of positive examples as its weight.
7. The image retrieval apparatus according to claim 4, wherein the unlabelled image selector only uses a predetermined number of images most irrelevant to the user's intention outputted by the weighted Euclidean distance model as images of negative examples to output to the main learner.
8. The image retrieval apparatus according to claim 4, wherein the unlabelled image selector only takes images outputted according to the weighted Euclidean distance model and having a distance greater than a predetermined distance from the a query point as images of negative examples to output to the main learner.
9. The image retrieval apparatus according to claim 1, wherein the main learner re-trains in each of a plurality of feedback rounds in accordance with the following images: labeled images, query images inputted by the user, and images of negative examples outputted by the unlabelled image selector.
10. The image retrieval apparatus according to claim 2, wherein the active selector includes an information capacity measurement calculation unit and a selection unit,
wherein the information capacity measurement calculation unit calculates the information capacity of each of the one or more unlabelled images in the image database, and the selection unit selects the one or more unlabelled images to be provided to the user to label in accordance with calculation results of the information capacity measurement calculation unit.
11. The image retrieval apparatus according to claim 2, wherein the active selector includes a representativeness measurement calculation unit and a selection unit, wherein the representativeness measurement calculation unit calculates the representativeness of each of the unlabelled images in the image database, and the selection unit selects one or more unlabelled images to be provided to the user to label in accordance with calculation results of the representativeness measurement calculation unit.
12. The image retrieval apparatus according to claim 2, wherein the active selector includes an information capacity measurement calculation unit, wherein the information capacity measurement calculation unit calculates the information capacity of each of the unlabelled images in the image database, and the selection unit selects unlabelled images to be provided to the user to label in accordance with calculation results of the information capacity measurement calculation unit and calculation results of the representativeness measurement calculation unit.
13. The image retrieval apparatus according to claim 1, wherein the information capacity measurement calculation unit calculates the information capacity of an image by calculating a distance between the image and a current classification boundary determined by the main learner.
14. The image retrieval apparatus according to claim 2, wherein the representativeness measurement calculation unit includes:
an image collector to determine images to be clustered;
a clustering device to partition the images determined by the image collector into a plurality of clusters; and
an image selector to select most representative images from the clusters obtained from the clustering device.
15. The image retrieval apparatus according to claim 14, wherein the images collected by the image collector include: query images inputted by the user, unlabelled images closer to the current classification boundary determined by the main learner, and labeled images.
16. The image retrieval apparatus according to claim 14, wherein the clustering device uses a normalized cut method to partition the images.
17. The image retrieval apparatus according to claim 14, wherein the clustering device adaptively determines a number of clusters, that is, a number of clusters not containing the labeled images and query images is made to equal a number of images returned to the user to label.
18. The image retrieval apparatus according to claim 14, wherein the image selector selects representative images only from the clusters not containing the labeled images and query images.
19. The image retrieval apparatus according to claim 18, wherein the representative images are selected from negative samples of the one or more unlabelled images.


In this equation, p(C̄|x) = 1 − p(C|x) indicates the probability that x is irrelevant to C. When p(C|x) = 0.5, the entropy En(x) is at its maximum, indicating that the attribute of the image x is indefinite, that is, it is impossible to determine whether x is relevant or irrelevant to C. As p(C|x) increases from 0 to 1, the value of En(x) first increases and then decreases, so the closer p(C|x) is to 0.5, the greater En(x) is. If the p(C|x) acquired from the classifier (main learner 2) genuinely represents the probability as noted above, 0.5 should be used as the classification boundary. Of course, what is outputted by some classifiers (main learner 2) is not a genuine probability, and the classification boundary may therefore differ from 0.5.

If p(C|x) = 0.5 is regarded as the classification boundary of the main classifier, the strategy of selecting images in accordance with the information capacity measurement becomes: the closer an image lies to the classification boundary, the larger its information capacity, and the greater its chance of being selected.
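The entropy-based information capacity measurement described above can be sketched in Python. This is an illustrative sketch, not the patent's implementation; the function name `entropy` and the use of base-2 logarithms are assumptions.

```python
import math

def entropy(p_relevant):
    """Binary entropy En(x) of the relevance probability p(C|x).

    Peaks at p(C|x) = 0.5, where it is hardest to decide whether the
    image is relevant or irrelevant; falls to 0 at p = 0 and p = 1.
    """
    p = p_relevant
    q = 1.0 - p  # irrelevant probability, 1 - p(C|x)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + q * math.log2(q))
```

As expected, `entropy(0.5)` is maximal, and probabilities closer to 0.5 yield larger entropy values.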

For instance, if an SVM classifier is used as the main learner, certain existing methods can be employed to carry out the information capacity measurement. As one such method, [non-patent document-5] describes a strategy based on the solution space. This method regards image selection as a search problem over the solution space. A good strategy is to select an image capable of halving the solution space, so as to quickly decrease its size. In practical operation, the distance between each image and the center of the solution space can be checked: the closer an image is to the center of the solution space, the higher the possibility that the image halves the solution space. The center of the solution space can be approximated by the classification boundary of the SVM classifier, so the aforementioned strategy reduces to selecting the image closest to the current classification boundary. Therefore, the strategy described in [non-patent document-5] is consistent with the entropy-based measurement according to this invention.

1.2 Representativeness Measurement

The representativeness measurement calculation unit 302 calculates the representativeness measurement. In one embodiment of this invention, a clustering process is employed to decrease the information redundancy of the selected images, so as to enhance their representativeness. In each round of feedback, an unsupervised clustering process is first employed to cluster the unlabelled images in the vicinity of the classification boundary of the main learner together with the images in the training data pool, and one image is then selected, from each cluster that does not contain already labeled images, for the user to label. Through this process, each selected image represents a small eigenspace, and the selected images therefore have stronger representativeness. Since both the clustering and the selection are directed to images in the vicinity of the classification boundary, the information capacities of these images are ensured.

FIG. 5 is a basic diagram showing the representativeness measurement. As a classification diagram of two classes of data, FIG. 5 illustrates the basic principle of the representativeness measurement, wherein the two classes of data are indicated by “+” and “∘”, respectively. In FIG. 5, the dashed line represents the current classification boundary of the main learner; as can be seen, a serious classification error occurs, that is to say, some samples labeled with “∘” are classified into the other class. The image collector in the active selector 3 first selects the data closer to the classification boundary, as shown by the dotted-line rectangle in the figure; subsequently, the clustering device in the active selector 3 clusters these data by means of an unsupervised clustering process, whereby four clusters are obtained in total, as shown by the dotted-line ellipses in FIG. 5; and finally, the image selector in the active selector 3 selects one representative sample, such as the one shown by the solid triangle in the figure, from each cluster acquired by the clustering device.

In the selection of the data closer to the classification boundary, the number of images to be selected is first determined. This number depends on the practical application and is generally set in advance according to experience; for instance, it may be 100, 150, or 180. If the number is too large, the information capacity measurement might be weakened, whereas if it is too small, the representativeness measurement might be weakened. Moreover, this number is also associated with the number of images in the image database. Once the number is determined, the selection is relatively easy. For instance, if what is outputted by the main learner is a probability, 0.5 is the classification boundary, so the closer the probability of an image is to 0.5, the more likely it is that the image should be selected. In other words, the difference between the probability of each image and 0.5 is calculated, and the images with the smallest differences are selected.
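Assuming the main learner outputs probabilities, the boundary-proximity selection described above amounts to ranking images by |p(C|x) − 0.5| and keeping the smallest differences. A minimal sketch (the function name and list-based interface are illustrative, not from the patent):

```python
def select_near_boundary(probs, n):
    """Return the indices of the n images whose relevance probability
    lies closest to the 0.5 classification boundary."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:n]
```

For example, with probabilities [0.1, 0.48, 0.95, 0.55, 0.7], the two images closest to the boundary are those with probabilities 0.48 and 0.55.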

Certain existing methods can be employed to carry out the aforementioned clustering process, such as the Normalized Cut Method (see [non-patent document-10]), the K-Mean Method, and the Hierarchical Clustering Method (for which refer to [non-patent document-11]).

The normalized cut method is briefly discussed below.

Given the images to be clustered, an undirected fully connected graph G = (V, E) is first constructed, where each node of the graph corresponds to one image. The weight of an edge represents the similarity between the two nodes it connects; given nodes i and j of the graph, the weight w_ij of the edge connecting these two nodes is defined as follows:
w_ij = e^(−(d(i,j)/σ)²)   [2]
where d(i,j) denotes the Euclidean distance between images i and j, and σ is a scaling parameter, which can be set to 10 to 20 percent of the maximal distance between the images.

Generally speaking, the objective of the normalized cut method is to organize the nodes into two groups, so that the similarity inside each group is relatively high, whereas the similarity between the groups is relatively low. Suppose the nodes in the graph G are organized into two disjoint collections A and B such that A∪B = V and A∩B = Ø; the success of the grouping can be measured by means of the total weight of the edges connecting the two collections. This is referred to as a “cut” in graph theory:

cut(A, B) = Σ_{u∈A, v∈B} w_uv   [3]
where cut(A, B) is the total weight of the edges connecting the nodes in collection A and the nodes in collection B. To avoid biased results, the normalized cut method measures the quality of a grouping by the following criterion:

Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V)   [4]
where assoc(A, V) = Σ_{u∈A, t∈V} w_ut is the total weight of the connections between the nodes in the collection A and all of the nodes in the graph, and assoc(B, V) is defined similarly.

The smaller the value of Ncut is, the better the corresponding grouping will be. The smallest value of Ncut can be acquired by solving the following generalized eigenvalue problem:
(D − W)y = λDy   [5]
where W is a symmetric matrix with W(i, j) = w_ij, and D is a diagonal matrix with d(i) = Σ_j w_ij. Here λ is the eigenvalue of this generalized problem, which needs no further explanation. After solving the aforementioned generalized eigenvalue problem, the eigenvector corresponding to the second smallest eigenvalue gives the optimal grouping.

In this invention, the average value of the elements of the eigenvector corresponding to the second smallest eigenvalue is used as the cutting point. If the value of an element of the eigenvector is greater than the cutting point, the image (node) corresponding thereto is grouped into collection A; otherwise, it is grouped into collection B.
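The bipartition just described can be sketched with NumPy. This is a hedged illustration, not the patent's implementation: the generalized problem (D − W)y = λDy is reduced to an ordinary symmetric eigenproblem on D^(−1/2)(D − W)D^(−1/2), and the choice of σ as 15 percent of the maximal distance, as well as the function name, are assumptions within the 10-to-20-percent range stated in the text.

```python
import numpy as np

def normalized_cut_bipartition(X, sigma=None):
    """One normalized-cut bipartition of feature vectors X (rows = images).

    Builds the similarity matrix of equation [2], solves (D - W) y = lambda D y
    via the symmetric reduction, and thresholds the eigenvector of the second
    smallest eigenvalue at its mean value, as described in the text.
    Returns a boolean mask: True for collection A, False for collection B.
    """
    # Pairwise Euclidean distances d(i, j)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))
    if sigma is None:
        sigma = 0.15 * d.max()  # assumed: 15% of the maximal distance
    W = np.exp(-(d / sigma) ** 2)           # equation [2]
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric matrix whose eigenpairs solve (D - W) y = lambda D y
    L = d_inv_sqrt[:, None] * (np.diag(deg) - W) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]             # second smallest eigenvalue
    return y > y.mean()                     # mean value as the cutting point
```

On two well-separated groups of points, the returned mask places each group entirely on one side of the cut.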

The clustering process is achieved by recursively performing the normalized cut, namely cutting the collections of nodes into smaller sub-collections by repeatedly applying the normalized cut. However, two issues should be addressed during the process: (1) which sub-collection should be cut during each application of the normalized cut? And (2) how is the number of clusters determined, in other words, when is the clustering process stopped?

As regards the first problem, this invention employs a simple method, namely selecting, each time, the sub-collection having the greatest number of nodes to cut.

As regards the second problem, the following strategy is employed to control the clustering process: the clustering and cutting process is repeated until the number of clusters not containing the already labeled images and the enquiry image(s) equals the number of images returned to the user for labeling.

Moreover, the clusters containing the already labeled images or the enquiry image(s) are regarded as capable of being represented by the already labeled images or enquiry image(s) contained therein, so that this invention merely selects the representative images from the clusters not containing the already labeled images and the enquiry image(s), and returns them to the user for labeling.
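The control strategy above, always splitting the largest cluster until enough unlabeled clusters exist, can be sketched as follows. The function name and the callable parameters `bipartition` (which splits one cluster into two) and `contains_labeled` (which reports whether a cluster holds a labeled or enquiry image) are hypothetical stand-ins for the clustering device and bookkeeping of the text.

```python
def cluster_until(images, bipartition, contains_labeled, target):
    """Recursively bipartition until the number of clusters containing
    no labeled/enquiry image equals `target` (the number of images to
    return to the user). The largest cluster is split each time."""
    clusters = [list(images)]
    while sum(1 for c in clusters if not contains_labeled(c)) < target:
        clusters.sort(key=len, reverse=True)
        if len(clusters[0]) < 2:
            break  # nothing left to split
        biggest = clusters.pop(0)
        a, b = bipartition(biggest)
        clusters.extend([a, b])
    # Representatives are drawn only from clusters without labeled images
    return [c for c in clusters if not contains_labeled(c)]
```

With a toy bipartition that halves a list, eight images one of which is labeled yield exactly the requested number of unlabeled clusters.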

In each round of feedback, the aforementioned clustering process can be summarized as follows:

Let I = I_N ∪ I_L ∪ I_Q represent the images participating in the clustering, where I_N denotes the N images closest to the current classification boundary of the main learner, I_L denotes the images in the training data pool, I_Q denotes the enquiry image(s), and T denotes the number of images returned to the user for labeling.

Suppose c = {x_1, . . . , x_M} represents a cluster consisting of M images acquired by the aforementioned process and containing neither already labeled images nor the enquiry image. The representativeness measurement of each image in the cluster is as follows:
Rep(x_i) = Σ_{j∈c} w_ij   [6]

The information capacity measurement of each image is as follows:
Inf(x_i) = |En(x_i)|   [7]

If the main learner employs the support vector machine (SVM) as the classifier, the information capacity measurement can also be denoted as follows:

Inf(x_i) = |g(x_i)|, where g(x_i) represents the predictive output of the SVM classifier with regard to the image.

The two measurements are integrated together to obtain the final score of the image xi:
s(x_i) = λ·Inf(x_i) + (1 − λ)·Rep(x_i)   [8]
where the parameter λ controls the contribution ratio of the two measurements.

The equation [8] is used to calculate the final scores of all images in the cluster c, and the image having the highest score is selected to be returned to the user for labeling.

The aforementioned weighting method is only exemplary in nature; other methods can also be employed, for instance, multiplying the values of the two measurements, or adding the weighted values after squaring them.
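The combination of equation [8] and the per-cluster selection can be sketched as below. This is an illustrative sketch: the function names are hypothetical, and it assumes the two measurements have already been brought to comparable scales before mixing.

```python
def final_score(inf, rep, lam=0.5):
    """Combined score s(x) = lam * Inf(x) + (1 - lam) * Rep(x), equation [8]."""
    return lam * inf + (1.0 - lam) * rep

def pick_representative(cluster, inf_of, rep_of, lam=0.5):
    """Return the image of `cluster` with the highest combined score.

    `inf_of` and `rep_of` map an image to its information capacity and
    representativeness measurements, respectively."""
    return max(cluster, key=lambda x: final_score(inf_of(x), rep_of(x), lam))
```

Setting λ = 1 reduces the selection to pure information capacity; λ = 0 reduces it to pure representativeness.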

2. Unlabelled Image Selector

The unlabelled image selector is explained in detail below with reference to FIGS. 6 and 7. FIG. 6 is a diagram showing an embodiment of the unlabelled image selector according to this invention, and FIG. 7 is a diagram showing another embodiment of the unlabelled image selector according to this invention.

As shown in FIG. 6, the unlabelled image selector includes a calculation unit 401 and a determination unit 402. In each round of feedback, the calculation unit 401 of the unlabelled image selector 4 employs a classifier (algorithm) different from the one employed by the main learner to calculate the degree of relevance between the unlabelled images in the image database and the user's intention. The determination unit 402 selects the most irrelevant unlabelled images as negative samples to provide to the main learner in accordance with the calculation result of the calculation unit 401.

The determination unit 402 may select a predetermined number of unlabelled images that are most irrelevant to the user's intention as negative samples to provide to the main learner. As an alternative, it is also possible to select the unlabelled images whose degree of irrelevance to the user's intention is greater than a predetermined threshold value as negative samples to provide to the main learner.
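The two selection rules of the determination unit, a fixed count of the most irrelevant images, or all images beyond an irrelevance threshold, can be sketched as below. The function name and list-based interface are illustrative assumptions; `distances` stands for the per-image estimation results f(x), where larger means more irrelevant.

```python
def pick_negatives(distances, k=None, threshold=None):
    """Choose negative-sample indices from unlabelled-image scores.

    Either the k images farthest from the query point (most irrelevant
    to the user's intention), or every image whose score exceeds
    `threshold`, matching the two rules described in the text."""
    if k is not None:
        ranked = sorted(range(len(distances)),
                        key=lambda i: distances[i], reverse=True)
        return ranked[:k]
    return [i for i, d in enumerate(distances) if d > threshold]
```

Both rules agree on clearly irrelevant images; the threshold variant adapts the number of negatives to the data, while the top-k variant keeps it fixed.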

It should be noted that, in one embodiment of this invention, the calculation unit 401 performs the calculation in each round, thereby enhancing reliability. However, it is not necessary to recalculate and redetermine each time the unlabelled images to be provided to the main learner as negative samples. For example, the calculation and determination may be performed only once, although the performance will deteriorate as a result. It is also possible to perform the calculation and determination at predetermined intervals of feedback rounds (for instance, every 5 rounds or every 10 rounds). It is still possible to perform the calculation and determination according to a certain schedule (for instance, the intervals for performing the calculation and determination become increasingly greater or smaller as the rounds of feedback increase). Although the performance may be worse as a result, the speed is increased thereby. When the rounds of feedback are taken into consideration, as shown in FIG. 7, the unlabelled image selector further includes a counter 403 to count the rounds of feedback and input the count into the calculation unit 401, which determines whether to perform the calculation based on the count.

When the main learner is retrained during the feedback process, the unlabelled images selected also participate therein as negative samples.

The operating processes of the calculation unit 401 and the determination unit 402 are explained in detail in the following paragraphs.

Let L denote the images in the training data pool and U the unlabelled images in the image database, where the training data pool contains the already labeled images in the image database and the inputted enquiry image. At the beginning of the retrieval, L merely contains the enquiry image inputted by the user, while U contains all images in the image database. During the subsequent feedback process, images newly labeled by the user are continually added into L. In each round of feedback (or, in other embodiments, in each round requiring calculation), the calculation unit 401 estimates, in accordance with the images in L, the images in U by employing a classifier (algorithm) different from the one employed by the main learner. Finally, the determination unit 402 selects the images most irrelevant to the user's intention according to certain rules (as discussed above) and outputs them to the main learner, and these images augment the training data collection of the main learner as negative samples.

It should be noted, however, that while the images used by the calculation unit 401 for training are exactly the training pool images, the images used by the main learner for training are not: the main learner additionally uses the unlabelled images outputted by the unlabelled image selector.

To satisfy the real-time requirement of image retrieval, in one embodiment of this invention the calculation unit employs a simple model to perform the calculation: the weighted Euclidean distance model. Note that this algorithm (or model) is only exemplary in nature; other methods, such as the aforementioned Bayesian classifier or a BP neural network classifier, can also be employed, provided they differ from the classifier employed by the main learner.

Refer to the following document for the weighted Euclidean distance model: Yong Rui, Thomas S. Huang, Michael Ortega, Sharad Mehrotra, “Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 5, pp. 644-655, 1998. A brief introduction follows.

As shown below:
f(x) = dist(x, q) = (Σ_{i=1}^{d} (x_i − q_i)² · w_i)^{1/2}   [9]

where x and q are respectively an unlabelled image and an enquiry point; the enquiry point is an eigenvector used in place of the eigenvector corresponding to the enquiry image, so as to better describe the user's intention. Both are expressed as d-dimensional vectors. dist(x, q) denotes the distance between the vectors x and q; w_i is the weight assigned to each feature component; and f(x) denotes the estimation result of the classifier with regard to the unlabelled image x, namely the distance between x and the user's intention.

Let P represent the already labeled images of positive samples together with the enquiry image inputted by the user, and N represent the already labeled images of negative samples; thus P∩N = Ø and L = P∪N. Based on these samples, the enquiry point q is as follows:

q = (1/|P|) Σ_{x_k∈P} x_k − (1/|N|) Σ_{x_k∈N} x_k   [10]

The weight w_i of each feature component reflects the influence of this component on the aforementioned algorithm (classifier). For a given feature component, if all images of positive samples have similar values, this component excellently captures the user's intention; it should therefore be of great importance in the calculation of the distance between the query point and the unlabelled image, that is, it should make a great contribution and hence be assigned a greater weight. On the contrary, if the values of this component for the images of positive samples differ considerably, the component does not conform to the user's intention; it can only make a lesser contribution to the distance calculation and should hence be assigned a lesser weight. Accordingly, for a given feature component, the inverse of the variance of its values over the images of positive samples can be used as its weight: the smaller the variance, the greater the weight, and vice versa.

Let {x_k, k=1, . . . , |P|} represent the eigenvectors of the images of positive samples, let x_k^i, i=1, . . . , d denote the i-th component of the eigenvector of the image x_k, and let σ_i represent the square deviation of the collection {x_k^i, k=1, . . . , |P|}; then:

\hat{w}_i = \frac{1}{\sigma_i} \qquad [11]

The aforementioned weighted value is normalized to obtain the final weighted value:

w_i = \frac{\hat{w}_i}{\sum_{i=1}^{d} \hat{w}_i} \qquad [12]

In the case where there is only one positive sample, the square deviation is zero; each feature component then uses the same weighted value.
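As a concrete sketch, the classifier defined by equations [10]–[12] and f(x) can be written as follows. This is a minimal illustration in Python/NumPy, not the patent's implementation; the function and variable names are illustrative, and the small epsilon guard against zero-variance components is an added assumption.

```python
import numpy as np

def build_classifier(P, N):
    """Weighted-Euclidean-distance classifier of equations [10]-[12].
    P and N are arrays of shape (n_pos, d) and (n_neg, d) holding the
    eigenvectors of the positive and negative samples."""
    # Equation [10]: query point = mean of positives minus mean of negatives.
    q = P.mean(axis=0)
    if len(N) > 0:
        q = q - N.mean(axis=0)
    # Equation [11]: raw weight = inverse of the per-component square
    # deviation over the positive samples.
    var = P.var(axis=0)
    if np.all(var == 0):
        # Only one positive sample (or identical ones): uniform weights.
        w_hat = np.ones(P.shape[1])
    else:
        w_hat = 1.0 / np.maximum(var, 1e-12)  # epsilon guard (assumption)
    # Equation [12]: normalize the weights to sum to one.
    w = w_hat / w_hat.sum()

    def f(x):
        # Weighted Euclidean distance between image x and query point q.
        return np.sqrt(np.sum(w * (x - q) ** 2))

    return f, q, w
```

A larger f(x) means the image is farther from the query point, i.e., less relevant to the user's intention.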

Moreover, the classifying capability of the aforementioned classifier based on the weighted Euclidean distance model is relatively weak, and classification errors commonly occur. To mitigate this problem, a relatively conservative strategy can be employed in the embodiments of this invention: the main learner makes use of only a relatively small number of images (which can be determined in advance for the specific application and based on experience) most irrelevant to the user's intention, as outputted by the unlabelled image selector 4, to strengthen its training data collection. In other words, the determination unit 402 provides only a relatively small number of unlabelled images to the main learner. Given the user's intention, most of the images in the image database are usually irrelevant, and only a small portion is relevant; accordingly, the images the classifier identifies as most irrelevant to the user's intention are basically reliable.

In addition, as discussed above, another conservative strategy can be employed to further enhance reliability: the main learner uses these unlabelled images only temporarily, rather than permanently. In other words, in each round of feedback, the calculation unit 401 dynamically performs the calculation based on the images currently in the training data pool. Consequently, the unlabelled images thus generated might differ from one round of feedback to the next.

FIG. 8 is a flowchart showing the image retrieval method according to one embodiment of this invention.

As the retrieval task begins, the user inputs the enquiry image via the user interface to express his intention (S801). The image retrieval apparatus stores the enquiry image in the training data pool. U denotes the set of unlabelled images and, when the retrieval task begins, corresponds to all images in the image database (S802).

In accordance with the images in the training data pool, the retrieval apparatus constructs a classifier f(x) based on the weighted Euclidean distance model (S803). Constructing the classifier first requires calculating the enquiry point q and, at the same time, the weighted value w_i of each feature component. How to calculate the enquiry point q and the weighted values has been described previously, so it is not repeated here.

Subsequently, in step S804, the classifier f(x) is used to estimate the images in U, namely to calculate the distance between these images and the enquiry point q. The images with the greatest distances are regarded as the images most irrelevant to the user's intention and are labeled N*. These images will be used to strengthen the training data collection of the main learner.
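Step S804 can be sketched as follows, assuming the distances f(x) for the images in U have already been computed into an array; the function name and the use of NumPy are illustrative assumptions, not part of the patent.

```python
import numpy as np

def select_pseudo_negatives(distances, k):
    """Treat the k unlabelled images farthest from the enquiry point as
    the pseudo-negative set N*.  `distances` holds the f(x) value of
    each image in U; k is the (small, application-chosen) number of
    images used to strengthen the main learner's training collection."""
    order = np.argsort(distances)  # ascending: nearest image first
    return order[-k:][::-1]        # indices of the k farthest images
```

Alternatively, per the embodiment that uses a distance threshold, one could keep every index i with `distances[i] > threshold` instead of a fixed k.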

Subsequently, the images in the training data pool and the negative samples most irrelevant to the user's intention as outputted from f(x) are used to retrain the main learner (S805); that is, L and N* are used to train the main learner.

The main learner is used to estimate the images in U, and the retrieval result is returned via the user interface (S806); that is, the database images are rearranged according to the estimation result of the main learner.

Concurrently, the unlabelled images in the vicinity of the current classification boundary are selected and clustered together with the images in the training data collection (S807). After the clustering operation, the most representative images are selected from the clusters that contain neither the already-labeled images nor the enquiry image, labeled L*, and returned to the user for labeling (S808).

If the user is satisfied with the current retrieval result (S809, YES), the retrieval operation can be stopped; otherwise (S809, NO), the user labels the images L* outputted by the active selector and enters the next round of feedback operation.

Before a new round of feedback operation starts, the images contained in L* are deleted from U and added to L. Moreover, the labels assigned by the user to the images in L*, such as positive or negative sample, are stored in the training data pool.

It should be noted that the operational flow mentioned above is directed to the image retrieval apparatus shown in FIG. 1. The operational flows for the image retrieval apparatuses shown in FIGS. 2 and 3 vary correspondingly: for the image retrieval apparatus 1′ shown in FIG. 2, steps S803 and S804 are omitted, while for the image retrieval apparatus 1″ shown in FIG. 3, steps S807 and S808 are omitted.

In view of the above, the present invention provides an image retrieval apparatus, which comprises an unlabelled image selector for selecting one or more unlabelled image(s) from an image database; and a main learner for training in each feedback round of the image retrieval, estimating relevance of images in the image database and a user's intention, and determining retrieval results, wherein the main learner makes use of the unlabelled image(s) selected by the unlabelled image selector in the estimation.

In one embodiment, the image retrieval apparatus further comprises an active selector for selecting, in each feedback round and according to estimation results of the main learner, one or more unlabelled image(s) from the image database for the user to label.

In one embodiment, the unlabelled image selector uses a method different from the method used by the main learner to calculate the relevance of the images in the image database and the user's intention.

In one embodiment, the unlabelled image(s) selected by the unlabelled image selector is/are one or more image(s) most irrelevant to the user's intention.

In one embodiment, the unlabelled image selector uses a weighted Euclidean distance model to select the unlabelled image.

In one embodiment, the unlabelled image selector uses the weighted Euclidean distance model to generate a new query point in accordance with a labeled image and a query image inputted by the user, and takes the weighted Euclidean distance between the unlabelled images in the image database and the new query point as the estimation results of each unlabelled image in the image database.

In one embodiment, the unlabelled image selector uses the weighted Euclidean distance model and adopts, as the weighted value of each component of an eigenvector, the inverse of the square deviation of that component's values over the images of positive examples.

In one embodiment, the unlabelled image selector only uses a predetermined number of images most irrelevant to the user's intention outputted by the weighted Euclidean distance model as images of negative examples to output to the main learner.

In one embodiment, the unlabelled image selector only takes images outputted according to the weighted Euclidean distance model and having a distance greater than a predetermined distance from the query point as images of negative examples to output to the main learner.

In one embodiment, the main learner re-trains in each feedback round in accordance with the following images: labeled images, query images inputted by the user, and images of negative examples outputted by the unlabelled image selector.

In one embodiment, the active selector comprises an information capacity measurement calculation unit and a selection unit, wherein the information capacity measurement calculation unit calculates the information capacity of each of the unlabelled images in the image database, and the selection unit selects the unlabelled images to be provided to the user for labeling in accordance with calculation results of the information capacity measurement calculation unit.

In one embodiment, the active selector comprises a representativeness measurement calculation unit and a selection unit, wherein the representativeness measurement calculation unit calculates the representativeness of each of the unlabelled images in the image database, and the selection unit selects the unlabelled images to be provided to the user for labeling in accordance with calculation results of the representativeness measurement calculation unit. Moreover, in one embodiment, the active selector further comprises an information capacity calculation unit.

In one embodiment, the information capacity measurement calculation unit calculates the information capacity of an image by calculating the distance between the image and a current classification boundary of the main learner.
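As an illustration of this embodiment, the distance to the boundary can be computed as follows, assuming (purely for the sketch; the patent does not fix the main learner's model) a linear learner with decision function g(x) = w·x + b. Smaller distances indicate more ambiguous images and hence greater information capacity.

```python
import numpy as np

def boundary_distances(X, w, b):
    """For each image (row of X), compute its distance to the boundary
    {x : w.x + b = 0} of an assumed linear main learner.  Images with
    the smallest distance are the most informative candidates to have
    the user label."""
    return np.abs(X @ w + b) / np.linalg.norm(w)
```

The active selector would then sort the unlabelled images by this value and keep those nearest the boundary.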

In one embodiment, the representativeness measurement calculation unit comprises an image collector for determining the images to be clustered; a clustering device for partitioning the images determined by the image collector into a plurality of clusters; and an image selector for selecting the most representative images from the clusters obtained from the clustering device.

In one embodiment, the images collected by the image collector include query images inputted by the user, unlabelled images closer to the current classification boundary of the main learner, and labeled images.

In one embodiment, the clustering device uses a normalized cut method to accomplish the clustering process.

In one embodiment, the clustering device adaptively determines the number of clusters, that is, the number of clusters not containing the labeled images and query images is made to equal the number of images returned to the user for labeling.

In one embodiment, the image selector selects the representative images only from the clusters not containing the labeled images and query images.
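The selection rule of these embodiments can be sketched as follows, assuming a cluster assignment has already been produced (the patent uses a normalized cut method; any clustering serves for this illustration) and taking the member nearest the cluster mean as that cluster's most representative image. All names are illustrative assumptions.

```python
import numpy as np

def pick_representatives(X, labels, exclude):
    """From each cluster that contains none of the `exclude` indices
    (labeled images and query images), pick the member closest to the
    cluster mean as the most representative image.  X holds the feature
    vectors of the collected images; `labels` is their cluster id."""
    reps = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        if np.intersect1d(members, exclude).size > 0:
            continue  # skip clusters containing labeled or query images
        centroid = X[members].mean(axis=0)
        dists = np.linalg.norm(X[members] - centroid, axis=1)
        reps.append(int(members[np.argmin(dists)]))
    return reps
```

With the adaptive cluster count of the preceding embodiment, the number of clusters surviving the exclusion test equals the number of images returned to the user, so this routine yields exactly L*.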

In one embodiment, the present invention provides an image retrieval apparatus, which comprises a main learner for training in each feedback round of the image retrieval, estimating relevance of images in an image database and a user's intention, and determining retrieval results; an active selector for selecting, in each feedback round and according to estimation results of the main learner, one or more unlabelled image(s) from the image database for the user to label; and an output unit for outputting the retrieval results determined by the main learner and the one or more unlabelled image(s) selected by the active selector.

Moreover, in view of the above, the present invention provides an image retrieval method, which comprises an unlabelled image selecting step for selecting one or more unlabelled image(s) from an image database; and a main learning step for estimating relevance of images in the image database and a user's intention in each feedback round of the image retrieval, and determining retrieval results, wherein the unlabelled image(s) selected by the unlabelled image selecting step is/are made use of in the estimation.

In one embodiment, the image retrieval method further comprises an actively selecting step for selecting, in each feedback round and according to estimation results of the main learning step, one or more unlabelled image(s) from the image database for the user to label.

In one embodiment, the unlabelled image selecting step uses a method different from the method used by the main learning step to calculate the relevance of the images in the image database and the user's intention.

In one embodiment, the unlabelled image(s) selected by the unlabelled image selecting step is/are one or more image(s) most irrelevant to the user's intention.

In one embodiment, the unlabelled image selecting step uses a weighted Euclidean distance model to select the unlabelled image.

In one embodiment, the unlabelled image selecting step uses the weighted Euclidean distance model to generate a new query point in accordance with a labeled image and a query image inputted by the user, and takes the weighted Euclidean distance between the unlabelled images in the image database and the new query point as the estimation results of each unlabelled image in the image database.

In one embodiment, the unlabelled image selecting step uses the weighted Euclidean distance model and adopts, as the weighted value of each component of an eigenvector, the inverse of the square deviation of that component's values over the images of positive examples.

In one embodiment, the unlabelled image selecting step only uses a predetermined number of images most irrelevant to the user's intention outputted according to the weighted Euclidean distance model as images of negative examples to output to the main learning step.

In one embodiment, the unlabelled image selecting step only takes images outputted by the weighted Euclidean distance model and having a distance greater than a predetermined distance from the query point as images of negative examples to output to the main learning step.

In one embodiment, the main learning step re-trains in each feedback round in accordance with the following images: labeled images, query images inputted by the user, and images of negative examples outputted by the unlabelled image selecting step.

In one embodiment, the actively selecting step comprises an information capacity measurement calculating step and a selecting step, wherein the information capacity measurement calculating step calculates the information capacity of each of the unlabelled images in the image database, and the selecting step selects the unlabelled images to be provided to the user for labeling in accordance with calculation results of the information capacity measurement calculating step.

In one embodiment, the actively selecting step comprises a representativeness measurement calculating step and a selecting step, wherein the representativeness measurement calculating step calculates the representativeness of the unlabelled images in the image database, and the selecting step selects the unlabelled images to be provided to the user for labeling in accordance with calculation results of the representativeness measurement calculating step. Moreover, in one embodiment, the actively selecting step further comprises an information capacity measurement calculating step, wherein the information capacity measurement calculating step calculates the information capacity of the unlabelled images in the image database, and the selecting step selects the unlabelled images to be provided to the user for labeling in accordance with calculation results of the information capacity measurement calculating step and calculation results of the representativeness measurement calculating step.

In one embodiment, the information capacity measurement calculating step calculates the information capacity of an image by calculating the distance between the image and a current classification boundary determined by the main learning step.

In one embodiment, the representativeness measurement calculating step comprises an image collecting step for determining the images to be clustered; a clustering step for partitioning the images determined by the image collecting step into a plurality of clusters; and an image selecting step for selecting the most representative images from the clusters obtained from the clustering step.

In one embodiment, the images collected by the image collecting step include query images inputted by the user, unlabelled images closer to the current classification boundary determined by the main learning step, and labeled images.

In one embodiment, the clustering step uses a normalized cut method to accomplish the clustering process.

In one embodiment, the clustering step adaptively determines the number of clusters, that is, the number of clusters not containing the labeled images and query images is made to equal the number of images returned to the user for labeling.

In one embodiment, the image selecting step selects the representative images only from the clusters not containing the labeled images and query images.

In one embodiment, the image retrieval method comprises a main learning step for training in each feedback round of the image retrieval, estimating relevance of images in an image database and a user's intention, and determining retrieval results; an actively selecting step for selecting, in each feedback round and according to estimation results of the main learning step, one or more unlabelled image(s) from the image database for the user to label; and an outputting step for outputting the retrieval results determined by the main learning step and the one or more unlabelled image(s) selected by the actively selecting step.

The aforementioned steps, elements, and component parts of the present invention may be combined with one another.

The image retrieval apparatus and image retrieval method according to the present invention can be implemented by hardware, and can also be implemented by a general-purpose computer through execution of software programs.

According to one aspect of the present invention, there is provided a computer program enabling a computer to implement the following steps: an unlabelled image selecting step for selecting one or more unlabelled image(s) from an image database; and a main learning step for estimating relevance of images in the image database and a user's intention in each feedback round of the image retrieval, and determining retrieval results, wherein the unlabelled image(s) selected by the unlabelled image selecting step is/are made use of in the estimation. As a result, the computer program also enables the computer to perform the various steps in the aforementioned methods.

According to another aspect of the present invention, there is provided a computer readable medium storing a computer program thereon, for enabling a computer to perform the various steps in the aforementioned image retrieval methods according to the present invention, or to implement the various functions of the aforementioned image retrieval apparatuses according to the present invention. The computer readable medium can be a floppy disk, a CD, a tape, a magneto-optical disk, a DVD, a hard disk, a RAM or a ROM, etc., or any other information recording medium known in the art.

As is apparent to a person skilled in the art, various modifications and transformations can be made to this invention without departing from the spirit or scope of the present invention. Therefore, the present invention is intended to cover these modifications and transformations as long as they fall within the scope as claimed in the claims or analogues thereof.

Shiitani, Shuichi, Endo, Susumu, Liu, Rujie, Uehara, Yusuke, Nagata, Shigemi, Baba, Takayuki, Masumoto, Daiki

Assigned to Fujitsu Limited (assignment on the face of the patent), executed Apr 30 2015.

