A counting device (100) is provided with a subtree generating part (123) for generating a first subtree comprising a first sentence and a second subtree comprising a second sentence. The counting device (100) is also provided with: a categorizing part (125) for categorizing the first subtree in the same group as the second subtree when it is determined that a first expression represented by the first subtree and a second expression represented by the second subtree represent matching content; and an output part (127) for outputting the number of subtrees categorized in the group, or an expression represented by a plurality of syntax trees or one of the subtrees categorized in the aforementioned group.
9. A non-transitory computer-readable recording medium on which is recorded a counting program for causing a computer to function as:
an input part for inputting a first sentence and a second sentence;
a syntax analyzing part for generating a syntax tree of the first sentence by analyzing syntax of the first sentence, and a syntax tree of the second sentence by analyzing syntax of the second sentence;
a subtree generating part for generating one or more first subtrees of the first sentence based on the syntax tree of the first sentence, and generating one or more second subtrees of the second sentence based on the syntax tree of the second sentence;
a match determination part for determining whether or not a first expression represented by a first subtree of the one or more first subtrees and a second expression represented by a second subtree of the one or more second subtrees represent matching content based on each of the one or more first subtrees and each of the one or more second subtrees;
a categorizing part for categorizing the first subtree and the second subtree into a group based on the match determination part determining one combination of the first expression and the second expression represents matching content, or respective multiple combinations of the first expression and the second expression represent matching content; and,
an output part for outputting a counted number of subtrees categorized into the group,
wherein the match determination part, for one group combination or each of multiple group combinations that are combinations of a first group and a second group into which one or multiple subtrees are respectively categorized, determines an inclusion relationship between the first group and the second group, based on height, branch number and number of phrases of subtrees belonging to the first group and height, branch number and number of phrases of subtrees belonging to the second group; and
the output part, based on the determined inclusion relationship, arranges and outputs an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the first group, and an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the second group.
10. A method executed by a counting device comprising an input part, a syntax analyzing part, a subtree generating part, a match determination part, a categorizing part and an output part, the method including:
an input step in which the input part inputs a first sentence and a second sentence;
a syntax analysis step in which the syntax analyzing part generates a syntax tree of the first sentence by analyzing syntax of the first sentence, and a syntax tree of the second sentence by analyzing syntax of the second sentence;
a subtree generation step in which the subtree generating part generates first subtrees of the first sentence based on the syntax tree of the first sentence, and generates second subtrees of the second sentence based on the syntax tree of the second sentence;
a match determination step in which the match determination part determines whether or not a first expression represented by one of the first subtrees and a second expression represented by one of the second subtrees represent matching content based on each of the first subtrees and each of the second subtrees;
a categorization step in which the categorizing part categorizes the one of the first subtrees and the one of the second subtrees into a group based on the match determination step determining one combination of the first expression and the second expression represents matching content, or respective multiple combinations of the first expression and the second expression represent matching content; and,
an output step in which the output part outputs a counted number of subtrees categorized into the group,
wherein the match determination step, for one group combination or each of multiple group combinations that are combinations of a first group and a second group into which one or multiple subtrees are respectively categorized, determines an inclusion relationship between the first group and the second group, based on height, branch number and number of phrases of subtrees belonging to the first group and height, branch number and number of phrases of subtrees belonging to the second group; and
the output step, based on the determined inclusion relationship, arranges and outputs an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the first group, and an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the second group.
1. A counting device for counting input sentences, comprising:
at least one non-transitory memory operable to store program code;
at least one processor operable to read said program code and operate as instructed by said program code, said program code including:
receiving code that causes the at least one processor to receive a first sentence and a second sentence;
syntax analyzing code that causes the at least one processor to generate a syntax tree of the first sentence by analyzing syntax of the first sentence, and a syntax tree of the second sentence by analyzing syntax of the second sentence;
subtree generating code that causes the at least one processor to generate one or more first subtrees of the first sentence based on the syntax tree of the first sentence, and one or more second subtrees of the second sentence based on the syntax tree of the second sentence;
match determination code that causes the at least one processor to determine whether or not a first expression represented by a first subtree of the one or more first subtrees and a second expression represented by a second subtree of the one or more second subtrees represent matching content based on each of the one or more first subtrees and each of the one or more second subtrees;
categorizing code that causes the at least one processor to categorize the first subtree and the second subtree into a group based on the match determination code determining one combination of the first expression and the second expression represents matching content, or respective multiple combinations of the first expression and the second expression represent matching content; and,
output code that causes the at least one processor to output a counted number of subtrees categorized into the group,
wherein the match determination code further causes the at least one processor to, for one group combination or each of multiple group combinations that are combinations of a first group and a second group into which one or multiple subtrees are respectively categorized, determine an inclusion relationship between the first group and the second group, based on height, branch number and number of phrases of subtrees belonging to the first group and height, branch number and number of phrases of subtrees belonging to the second group; and
the output code further causes the at least one processor to, based on the determined inclusion relationship, arrange and output an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the first group, and an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the second group.
2. The counting device according to
3. The counting device according to
determine that a first subtree modifier phrase of the first subtree composing the subtree combination and a second subtree modifier phrase of a second subtree composing the subtree combination match when the first subtree modifier phrase is a synonym of the second subtree modifier phrase, or when the difference between the first subtree modifier phrase and the second subtree modifier phrase is a difference in conjugation, or when the difference between the first subtree modifier phrase and the second subtree modifier phrase is a difference in notation; and
determine that a first subtree head phrase of the first subtree and a second subtree head phrase of the second subtree match when the first subtree head phrase is a synonym for the second subtree head phrase, or when the difference between the first subtree head phrase and the second subtree head phrase is a difference in conjugation, or when the difference between the first subtree head phrase and the second subtree head phrase is a difference in notation.
4. The counting device according to
phrase conversion code that causes the at least one processor to convert a head phrase to an affirmative expression and a modifier phrase to a negative expression when, for the one subtree combination or each of the multiple subtree combinations, the end of a modifier phrase of a first subtree composing the subtree combination is “to” or “to ha”, the modifier phrase of the first subtree includes a declinable word and includes an affirmative expression, there are no commas between the modifier phrase of the first subtree and the head phrase of the first subtree, and the head phrase of the first subtree includes a negative expression;
wherein for the one subtree combination or each of the multiple subtree combinations, the match determination code further causes the at least one processor to determine whether or not the modifier phrase of the converted first subtree composing the subtree combination and a modifier phrase of a second subtree composing the subtree combination match, and whether or not the head phrase of the converted first subtree and the head phrase of the second subtree match.
5. The counting device according to
the receiving code further causes the at least one processor to input a first sentence and a second sentence that are response sentences to questions; and
for subtree combinations comprising a first subtree possessing a head phrase and a prescribed number of modifier phrases and a second subtree possessing a head phrase and the prescribed number of modifier phrases, within the one subtree combination or each of the multiple subtree combinations, the match determination code further causes the at least one processor to determine that the head phrase possessed by the first subtree and the head phrase possessed by the second subtree match upon determining that each of the prescribed number of modifier phrases possessed by the first subtree respectively match the prescribed number of modifier phrases possessed by the second subtree.
6. The counting device according to
the subtree generating code further causes the at least one processor to generate the one or more first subtrees possessing at least a first subtree modifier phrase modifying another first subtree phrase and a first subtree head phrase that is the other first subtree phrase, from among multiple phrases comprising the first sentence, from the generated syntax tree of the first sentence, and the one or more second subtrees possessing at least a second subtree modifier phrase modifying another second subtree phrase and a second subtree head phrase that is the other second subtree phrase, from among the multiple phrases comprising the second sentence, from the generated syntax tree of the second sentence; and
the match determination code further causes the at least one processor to determine whether or not the first expression represented by the first subtree and the second expression represented by the second subtree match based on whether or not the first subtree modifier phrase and the second subtree modifier phrase match, and whether or not the first subtree head phrase and the second subtree head phrase match.
7. The counting device according to
the match determination code further causes the at least one processor to, for group combinations in which a number of modifier phrases possessed by subtrees categorized in the first group is smaller than a number of modifier phrases possessed by subtrees categorized in the second group from the one group combination or the multiple group combinations, determine that the expression represented by subtrees categorized to the first group is a superior expression to the expression represented by subtrees categorized to the second group when all of the modifier phrases possessed by subtrees categorized to the first group match any of the modifier phrases possessed by subtrees categorized to the second group; and
the categorizing code further causes the at least one processor to, for the one group combination or the respective multiple group combinations, make the first group a superior group to the second group when the match determination code determines that the expression represented by subtrees categorized to the first group comprising the group combination is a superior expression to the expression represented by subtrees categorized to the second group comprising the group combination.
8. The counting device of
This application is a National Stage of International Application No. PCT/JP2013/056196 filed Mar. 6, 2013, claiming priority based on Japanese Patent Application No. 2012-103996 filed Apr. 27, 2012, the contents of all of which are incorporated herein by reference in their entirety.
The present invention relates to a counting device, a counting program, a memory medium and a counting method.
A text mining device is known that can find, from among multiple input texts, multiple texts representing the same characteristic content despite having mutually different expressions (for example, see Patent Literature 1). This text mining device mutually associates and stores multiple differing expressions having the same characteristic content. When an expression associated with a prescribed expression is included in input text, this text mining device converts the expression to the prescribed expression and finds text containing the prescribed expression.
The art of Patent Literature 1 had the problem that it could not count which expressions are used, and how often, in text consisting of multiple input texts.
In consideration of the foregoing, it is an objective of the present invention to provide a counting device, a counting program, a computer-readable recording medium on which a counting program is recorded, and a counting method, with which it is possible to count which expressions are used, and how often, in multiple input texts.
In order to achieve the above objective, the counting device according to a first aspect of the present invention comprises:
an input part for inputting a first sentence and a second sentence;
a syntax analyzing part for generating a syntax tree of the first sentence and a syntax tree of the second sentence by accomplishing syntax analysis on the first sentence and the second sentence;
a subtree generating part for generating one or multiple first subtrees that are subtrees comprising the first sentence, from the generated syntax tree of the first sentence, and generating one or multiple second subtrees that are subtrees comprising the second sentence, from the generated syntax tree of the second sentence;
a match determination part for determining whether or not a first expression represented by a first subtree comprising a subtree combination and a second expression represented by a second subtree comprising the subtree combination represent matching content, for one or multiple subtree combinations that are combinations of any one of the one or multiple first subtrees generated and one or multiple of the second subtrees generated;
a categorizing part for categorizing a first subtree representing a first expression and a second subtree representing a second expression into the same group for one combination of the first expression and the second expression determined to be representing matching content, or respective multiple combinations of the first expression and the second expression determined to be representing matching content; and,
an output part for outputting the number of subtrees categorized into the group, or an expression respectively represented by one subtree or multiple subtrees categorized into the group.
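As a minimal sketch of the categorizing and output parts described above, the following Python fragment groups subtrees whose expressions match and reports the number of subtrees in each group. The function name `categorize`, the representation of a subtree as a (modifier phrase, head phrase) pair, and the sample subtrees are assumptions made for illustration only; they are not taken from the embodiments. Comparing each subtree only with the first member of a group assumes that the match relation is transitive.

```python
from typing import Callable, List, Tuple

# Assumed representation: a subtree is a (modifier phrase, head phrase) pair.
Subtree = Tuple[str, str]


def categorize(subtrees: List[Subtree],
               match: Callable[[Subtree, Subtree], bool]) -> List[List[Subtree]]:
    """Place each subtree into the first group whose representative it matches."""
    groups: List[List[Subtree]] = []
    for subtree in subtrees:
        for group in groups:
            if match(group[0], subtree):
                group.append(subtree)
                break
        else:
            # No existing group matched, so start a new group.
            groups.append([subtree])
    return groups


# Hypothetical subtrees generated from a first and a second sentence.
first_subtrees = [("the dirt", "comes off")]
second_subtrees = [("the dirt", "comes off"), ("finely", "comes off")]

# Exact equality stands in for the match determination part here.
groups = categorize(first_subtrees + second_subtrees, lambda a, b: a == b)

# Output part: the expression of each group and the counted number of subtrees.
counts = [(" ".join(group[0]), len(group)) for group in groups]
for expression, number in counts:
    print(expression, number)
```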
In addition, in the counting device according to the first aspect, the match determination part may determine that a first expression represented by the first subtree and a second expression represented by the second subtree match when, for the one subtree combination or the multiple subtree combinations, the first subtree comprising the subtree combination and the second subtree comprising the subtree combination match in all of the following: height, branch number, and phrases respectively divided into a root and one or multiple leaves.
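The structural comparison described above can be sketched as follows. The `Node` class and the helper names are hypothetical. In particular, "branch number" is assumed here to mean the number of edges in the subtree, and the order of the leaves is assumed not to matter; the text does not fix either point.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    phrase: str
    children: List[Node] = field(default_factory=list)  # modifier phrases


def height(node: Node) -> int:
    """Number of edges on the longest path from the root to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def branch_number(node: Node) -> int:
    """Assumed definition: total number of edges in the subtree."""
    return sum(1 + branch_number(child) for child in node.children)


def leaves(node: Node) -> List[str]:
    """Phrases located at the leaves of the subtree."""
    if not node.children:
        return [node.phrase]
    return [phrase for child in node.children for phrase in leaves(child)]


def subtrees_match(a: Node, b: Node) -> bool:
    """True when height, branch number, root phrase and leaf phrases all agree."""
    return (height(a) == height(b)
            and branch_number(a) == branch_number(b)
            and a.phrase == b.phrase
            and sorted(leaves(a)) == sorted(leaves(b)))
```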
In addition, in the counting device according to the first aspect,
the subtree generating part may generate one or multiple first subtrees possessing at least a modifier phrase modifying another phrase and a head phrase that is the other phrase, from among multiple phrases comprising the first sentence, from the generated syntax tree of the first sentence, and one or multiple second subtrees possessing at least a modifier phrase modifying another phrase and a head phrase that is the other phrase, from among the multiple phrases comprising the second sentence, from the generated syntax tree of the second sentence; and
the match determination part may determine whether or not a first expression represented by the first subtree and a second expression represented by the second subtree match based on whether or not the modifier phrase of the first subtree comprising the subtree combination and the modifier phrase of the second subtree comprising the subtree combination match, and whether or not the head phrase of the first subtree and the head phrase of the second subtree match, for the one subtree combination or each of the multiple subtree combinations.
In addition, in the counting device according to the first aspect,
the match determination part, for one group combination or each of multiple group combinations that are combinations of a first group and a second group into which one or multiple subtrees are respectively categorized, may determine an inclusion relationship between the first group and the second group, based on the height, branch number and number of phrases of subtrees belonging to the first group and the height, branch number and number of phrases of subtrees belonging to the second group; and
the output part, based on the determined inclusion relationship, may arrange and output an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the first group, and an expression represented by one subtree or multiple expressions respectively represented by multiple subtrees categorized in the second group.
In addition, in the counting device according to the first aspect,
the match determination part, for group combinations in which the number of modifier phrases possessed by subtrees categorized in the first group is smaller than the number of modifier phrases possessed by subtrees categorized in the second group from the one group combination or the multiple group combinations, may determine that the expression represented by subtrees categorized to the first group is a superior expression to the expression represented by subtrees categorized to the second group when all of the modifier phrases possessed by subtrees categorized to the first group match any of the modifier phrases possessed by subtrees categorized to the second group; and
the categorizing part, for the one group combination or the respective multiple group combinations, may make the first group a superior group to the second group when it is determined that the expression represented by subtrees categorized to the first group comprising the group combination is a superior expression to the expression represented by subtrees categorized to the second group comprising the group combination.
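The superior-expression determination described above can be sketched as a containment test on modifier phrases. The function name `is_superior` and the default equality comparison are assumptions for illustration; an actual match determination part may use a looser notion of matching.

```python
from typing import Callable, List


def is_superior(first_modifiers: List[str],
                second_modifiers: List[str],
                phrases_match: Callable[[str, str], bool] = lambda a, b: a == b) -> bool:
    """Return True when the first group's expression is superior to the second's.

    The first group must have fewer modifier phrases than the second, and every
    modifier phrase of the first must match some modifier phrase of the second.
    """
    if len(first_modifiers) >= len(second_modifiers):
        return False
    return all(any(phrases_match(m, n) for n in second_modifiers)
               for m in first_modifiers)
```

For example, under these assumptions a group whose subtrees have the single modifier phrase "the dirt" would be treated as superior to a group whose subtrees have the modifier phrases "the dirt" and "finely".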
In addition, in the counting device according to the first aspect, the match determination part, for group combinations comprising a first group into which subtrees possessing a head phrase and a modifier phrase are categorized and a second group into which subtrees possessing a head phrase and a modifier phrase and a phrase modifying the modifier phrase are categorized from the one group combination or from among the multiple group combinations, may determine that the expression represented by subtrees categorized to the first group is a superior expression to the expression represented by subtrees categorized to the second group upon determining that the head phrase of a subtree categorized to the first group and the head phrase of a subtree categorized to the second group match, and that the modifier phrase of a subtree categorized to the first group and the modifier phrase of a subtree categorized to the second group match.
Furthermore, in the counting device according to the first aspect, the match determination part, for the one subtree combination or each of the multiple subtree combinations:
may determine that a modifier phrase of the first subtree comprising the subtree combination and a modifier phrase of a second subtree comprising the subtree combination match when the modifier phrase of the first subtree is a synonym of a modifier phrase of the second subtree, or when the difference between a modifier phrase of the first subtree and a modifier phrase of the second subtree is a difference in conjugation, or when the difference between a modifier phrase of the first subtree and a modifier phrase of the second subtree is a difference in notation; and
may determine that a head phrase of the first subtree and a head phrase of the second subtree match when a head phrase of the first subtree is a synonym for a head phrase of the second subtree, or when the difference between a head phrase of the first subtree and a head phrase of the second subtree is a difference in conjugation, or when the difference between a head phrase of the first subtree and a head phrase of the second subtree is a difference in notation.
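The three conditions for a phrase match (synonym, difference in conjugation, difference in notation) can be sketched as below. The synonym set and the base-form table are hypothetical sample data, and case and whitespace normalization is only a stand-in for whatever notational differences an actual implementation would absorb.

```python
# Hypothetical sample data; a real device would consult dictionaries.
SYNONYMS = {frozenset({"dirt", "grime"})}
BASE_FORMS = {"comes off": "come off", "came off": "come off"}


def normalize_notation(phrase: str) -> str:
    """Stand-in for notation handling: lower-case and collapse whitespace."""
    return " ".join(phrase.lower().split())


def phrases_match(a: str, b: str) -> bool:
    a, b = normalize_notation(a), normalize_notation(b)
    if a == b:
        return True  # identical, or differing only in notation
    if BASE_FORMS.get(a, a) == BASE_FORMS.get(b, b):
        return True  # differing only in conjugation
    return frozenset({a, b}) in SYNONYMS  # registered as synonyms
```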
Furthermore, the counting device according to the first aspect, may further comprise:
a phrase conversion part for converting a head phrase to an affirmative expression and a modifier phrase to a negative expression when, for the one subtree combination or each of the multiple subtree combinations, the end of a modifier phrase of a first subtree comprising the subtree combination is the particle “to” or “to ha”, the modifier phrase of the first subtree includes a declinable word and includes an affirmative expression, there are no commas between the modifier phrase of the first subtree and the head phrase of the first subtree, and the head phrase of the first subtree includes a negative expression;
wherein for the one subtree combination or each of the multiple subtree combinations, the match determination part may determine whether or not the modifier phrase of the converted first subtree comprising the subtree combination and a modifier phrase of a second subtree comprising the subtree combination match, and whether or not the head phrase of the converted first subtree and the head phrase of the second subtree match.
Furthermore, in the counting device according to the first aspect,
the input part may input a first sentence and a second sentence that are response sentences to questions; and
for subtree combinations comprising a first subtree possessing a head phrase and a prescribed number of modifier phrases and a second subtree possessing a head phrase and the prescribed number of modifier phrases, within the one subtree combination or each of the multiple subtree combinations, the match determination part may determine that the head phrase possessed by the first subtree and the head phrase possessed by the second subtree match upon determining that each of the prescribed number of modifier phrases possessed by the first subtree respectively match the prescribed number of modifier phrases possessed by the second subtree.
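The rule above, under which head phrases are deemed to match once the prescribed number of modifier phrases match, can be sketched as follows. The function name and the pairwise reading of "respectively match" are assumptions for illustration.

```python
from typing import Callable, List


def heads_deemed_matching(first_modifiers: List[str],
                          second_modifiers: List[str],
                          prescribed: int,
                          phrases_match: Callable[[str, str], bool] = lambda a, b: a == b) -> bool:
    """Deem the head phrases to match when both subtrees have the prescribed
    number of modifier phrases and those phrases match pairwise."""
    if len(first_modifiers) != prescribed or len(second_modifiers) != prescribed:
        return False
    return all(phrases_match(a, b)
               for a, b in zip(first_modifiers, second_modifiers))
```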
In order to achieve the above objective, the counting program according to a second aspect of the present invention causes a computer to function as:
an input part for inputting a first sentence and a second sentence;
a syntax analyzing part for generating a syntax tree of the first sentence and a syntax tree of the second sentence by accomplishing syntax analysis on the first sentence and the second sentence;
a subtree generating part for generating one or multiple first subtrees that are subtrees comprising the first sentence, from the generated syntax tree of the first sentence, and generating one or multiple second subtrees that are subtrees comprising the second sentence, from the generated syntax tree of the second sentence;
a match determination part for determining whether or not a first expression represented by a first subtree comprising a subtree combination and a second expression represented by a second subtree comprising the subtree combination represent matching content, for one or multiple subtree combinations that are combinations of any one of the one or multiple first subtrees generated and one or multiple of the second subtrees generated;
a categorizing part for categorizing a first subtree representing a first expression and a second subtree representing a second expression into the same group for one combination of the first expression and the second expression determined to be representing matching content, or respective multiple combinations of the first expression and the second expression determined to be representing matching content; and,
an output part for outputting the number of subtrees categorized into the group, or an expression respectively represented by one subtree or multiple subtrees categorized into the group.
In order to achieve the above objective, the computer-readable recording medium according to a third aspect of the present invention has recorded thereon a counting program for causing a computer to function as:
an input part for inputting a first sentence and a second sentence;
a syntax analyzing part for generating a syntax tree of the first sentence and a syntax tree of the second sentence by accomplishing syntax analysis on the first sentence and the second sentence;
a subtree generating part for generating one or multiple first subtrees that are subtrees comprising the first sentence, from the generated syntax tree of the first sentence, and generating one or multiple second subtrees that are subtrees comprising the second sentence, from the generated syntax tree of the second sentence;
a match determination part for determining whether or not a first expression represented by a first subtree comprising a subtree combination and a second expression represented by a second subtree comprising the subtree combination represent matching content, for one or multiple subtree combinations that are combinations of any one of the one or multiple first subtrees generated and one or multiple of the second subtrees generated;
a categorizing part for categorizing a first subtree representing a first expression and a second subtree representing a second expression into the same group for one combination of the first expression and the second expression determined to be representing matching content, or respective multiple combinations of the first expression and the second expression determined to be representing matching content; and,
an output part for outputting the number of subtrees categorized into the group, or an expression respectively represented by one subtree or multiple subtrees categorized into the group.
In order to achieve the above objective, the counting method according to a fourth aspect of the present invention is a method executed by a counting device comprising an input part, a syntax analyzing part, a subtree generating part, a match determination part, a categorizing part and an output part, the method including:
an input step in which the input part inputs a first sentence and a second sentence;
a syntax analysis step in which the syntax analyzing part generates a syntax tree of the first sentence and a syntax tree of the second sentence by accomplishing syntax analysis on the first sentence and the second sentence;
a subtree generation step in which the subtree generating part generates one or multiple first subtrees that are subtrees comprising the first sentence, from the generated syntax tree of the first sentence, and generates one or multiple second subtrees that are subtrees comprising the second sentence, from the generated syntax tree of the second sentence;
a match determination step in which the match determination part determines whether or not a first expression represented by a first subtree comprising a subtree combination and a second expression represented by a second subtree comprising the subtree combination represent matching content, for one or multiple subtree combinations that are combinations of any one of the one or multiple first subtrees generated and one or multiple of the second subtrees generated;
a categorization step in which the categorizing part categorizes a first subtree representing a first expression and a second subtree representing a second expression into the same group for one combination of the first expression and the second expression determined to be representing matching content, or respective multiple combinations of the first expression and the second expression determined to be representing matching content; and,
an output step in which the output part outputs the number of subtrees categorized into the group, or an expression respectively represented by one subtree or multiple subtrees categorized into the group.
With the counting device, counting program, computer-readable recording medium on which a counting program is recorded, and counting method according to the present invention, it is possible to count what expressions are used how many times in multiple input sentences.
Below, the preferred embodiments of the present invention are described in detail with reference to the attached drawings.
A counting device 100 according to a first preferred embodiment of the present invention is part of a counting system 1 as shown in
In addition to the counting device 100, the counting system 1 comprises a computer communication network 10 (hereafter simply called a communication network 10) and terminal devices 20 and 21.
The communication network 10 for example comprises the Internet. The communication network 10 may also comprise a LAN (Local Area Network) or a public circuit network.
The terminal devices 20 and 21 have mutually similar compositions and accomplish the same actions, so the explanation below will primarily describe the terminal device 20.
The terminal device 20 comprises, for example, a personal computer provided with a display device such as an LCD (Liquid Crystal Display), and input parts such as a keyboard and a mouse.
The terminal device 20 displays a question screen displaying survey questions such as shown in
The counting device 100 comprises a server such as shown in
The CPU 101 controls the counting device 100 as a whole by executing programs stored in the ROM 102 or the hard disk 104. The RAM 103 is a work memory that temporarily stores the data being processed while the CPU 101 executes programs.
The hard disk 104 is an information memory for storing tables preserving various types of data. The counting device 100 may be provided with a flash memory in place of the hard disk 104.
The media controller 105 reads various types of data and programs from recording media, including flash memory, CD (Compact Disc), DVD (Digital Versatile Disc) and Blu-ray Disc®.
The LAN card 106 sends and receives data between the terminal devices 20 and 21 connected via the communication network 10. The keyboard 109 and the touchpad 111 input signals in accordance with user manipulation.
The video card 107 depicts (that is to say, renders) images based on digital signals output from the CPU 101, and also outputs image signals showing the rendered images. The LCD 108 displays images in accordance with image signals output from the video card 107. The counting device 100 may be provided with a PDP (Plasma Display Panel) or EL (Electroluminescence) display in place of the LCD 108. The speaker 110 outputs audio based on signals output from the CPU 101.
Next, the functions possessed by the counting device 100 will be described.
The CPU 101 executes the counting process shown in
The input part 120 inputs a survey response sentence the LAN card 106 shown in
Here, the description of the functions possessed by the counting device 100 is temporarily interrupted to explain the syntax tree generated by the syntax analyzing part 122, citing as an example the syntax tree shown in
The syntax tree shown in
A phrase modifying (that is to say, embellishing) another phrase in this manner is called a modifier phrase, and a phrase being modified by a modifier phrase (that is to say, embellished by a modifier phrase) is called a head phrase. In other words, the phrase “of socks” is a modifier phrase modifying the phrase “the dirt”, and the phrase “the dirt” is a head phrase being modified by the phrase “of socks”. In addition, the phrase “the dirt” and the phrase “finely” are modifier phrases modifying the phrase “comes off”, and the phrase “comes off” is a head phrase being modified by the phrase “the dirt” and the phrase “finely”.
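The head phrase and modifier phrase relationships described above can be pictured as a small tree structure. The following is a minimal sketch only; the class name `Phrase` and the helper functions are hypothetical and are not part of the embodiment. It assumes that a branch is one link from a modifier phrase to its head phrase, and that the height is the longest chain of such links below the root node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phrase:
    """A node of a syntax tree: a head phrase and the modifier phrases attached to it."""
    text: str
    modifiers: List["Phrase"] = field(default_factory=list)


def branch_count(node: Phrase) -> int:
    """Number of modifier-to-head links (branches) below this node."""
    return sum(1 + branch_count(m) for m in node.modifiers)


def height(node: Phrase) -> int:
    """Length of the longest chain of branches from this node down to a leaf."""
    if not node.modifiers:
        return 0
    return 1 + max(height(m) for m in node.modifiers)


# "The dirt of socks comes off finely": "comes off" is the root node,
# modified by "the dirt" (itself modified by "of socks") and by "finely".
full_tree = Phrase("comes off", [
    Phrase("the dirt", [Phrase("of socks")]),
    Phrase("finely"),
])

print(branch_count(full_tree), height(full_tree))
```

Under these assumptions the example sentence yields three branches and a height of two.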
The syntax tree shown in
The counting device 100 counts survey questions represented by syntax trees. Below, the description of the functions possessed by the counting device 100 is resumed.
The subtree generating part 123 shown in
To explain citing as an example the syntax tree (full tree) representing the sentence “The dirt of socks comes off finely” shown in
The subtrees respectively shown in
The subtree shown in
Specifically, the subtree shown in
The number of branches of a subtree is smaller than or the same as the number of branches of the full tree, and the height of a subtree is lower than or the same as the height of the full tree. In addition, every branch of a subtree matches one of the branches of the full tree.
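As an illustration of these properties, the sketch below enumerates the subtrees of the example full tree and checks each of them against the full tree. It is a sketch under stated assumptions, not the procedure of the subtree generating part 123: the tuple representation and the function names are hypothetical, and a subtree is assumed to be any connected part of the full tree that has at least one branch.

```python
from itertools import product

# A tree is (phrase, (child_tree, ...)); the children are modifier phrases.
FULL_TREE = ("comes off", (("the dirt", (("of socks", ()),)), ("finely", ())))


def rooted_subtrees(tree):
    """All subtrees that keep the root of `tree`, including the bare root itself."""
    phrase, children = tree
    # For each child: either drop it (None) or keep one of its rooted subtrees.
    options = [[None] + rooted_subtrees(child) for child in children]
    return [
        (phrase, tuple(c for c in choice if c is not None))
        for choice in product(*options)
    ]


def all_subtrees(tree):
    """Subtrees with at least one branch, rooted at any node of `tree`."""
    _, children = tree
    found = [t for t in rooted_subtrees(tree) if t[1]]
    for child in children:
        found.extend(all_subtrees(child))
    return found


def branches(tree):
    """Set of (head phrase, modifier phrase) links in `tree`."""
    phrase, children = tree
    links = {(phrase, child[0]) for child in children}
    for child in children:
        links |= branches(child)
    return links


def height(tree):
    """Longest chain of branches below the root of `tree`."""
    _, children = tree
    return 1 + max(height(c) for c in children) if children else 0


subtrees = all_subtrees(FULL_TREE)
for sub in subtrees:
    # Every branch of a subtree is also a branch of the full tree,
    # and a subtree is never higher than the full tree.
    assert branches(sub) <= branches(FULL_TREE)
    assert height(sub) <= height(FULL_TREE)
print(len(subtrees))
```

For the example sentence this enumeration produces six subtrees, one of which is the full tree itself.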
As a concrete example, the subtree shown in
In contrast, the height of the subtree shown in
The sentence “The dirt of socks comes off” represented by the subtree of
In addition, the sentence “The dirt comes off finely” represented by the subtree of
In this manner, subtrees representing superior concepts to the subtree that is the standard are called superior subtrees to the subtree that is the standard, and subtrees representing inferior concepts to the subtree that is the standard are called inferior subtrees to the subtree that is the standard.
In addition, the subtree of
In this manner, subtrees with one fewer branch than the subtree that is the standard are called the closest superior subtrees of the subtree that is the standard, and subtrees having one more branch than the subtree that is the standard are called closest inferior subtrees of the subtree that is the standard.
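The superior and closest superior relationships can be expressed compactly if each subtree is represented by its set of branches. The sketch below is illustrative only; the set representation and the function names are assumptions, and it relies on every phrase appearing only once in a sentence so that a (head phrase, modifier phrase) pair identifies a branch.

```python
# Each subtree is the set of its (head phrase, modifier phrase) branches.
DIRT = ("comes off", "the dirt")
SOCKS = ("the dirt", "of socks")
FINELY = ("comes off", "finely")

full = frozenset({DIRT, SOCKS, FINELY})   # "The dirt of socks comes off finely"
no_finely = frozenset({DIRT, SOCKS})      # "The dirt of socks comes off"
dirt_only = frozenset({DIRT})             # "The dirt comes off"


def is_superior(candidate, standard):
    """True when `candidate` keeps only some of the branches of `standard`."""
    return candidate < standard  # proper subset


def is_closest_superior(candidate, standard):
    """True when `candidate` is superior and has exactly one fewer branch."""
    return is_superior(candidate, standard) and len(candidate) == len(standard) - 1


print(is_closest_superior(no_finely, full))  # one branch fewer
print(is_closest_superior(dirt_only, full))  # two branches fewer, so only superior
```

A closest inferior subtree is the same test with the two arguments exchanged.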
The match determination part 124 of
To explain by citing a concrete example, the match determination part 124 determines that the first subtree and the second subtree as shown in
In addition, the match determination part 124 determines that the first subtree representing the sentence “The dirt of socks comes off finely” and the second subtree representing the sentence “Finely the dirt of socks comes off” match each other completely, as shown in
Furthermore, the match determination part 124 determines that the first subtree representing “The dirt of socks comes off finely” and the second subtree representing “The DIRT of socks comes off finely” effectively match each other, as shown in
In addition, the match determination part 124 determines that a first subtree representing “The dirt of socks comes off finely” and a second subtree representing “The dirt of socks is removed finely” effectively match each other, as shown in
In addition, the match determination part 124 determines that a first subtree representing “The dirt of socks comes off finely” and a second subtree representing “The dirt of socks came off finely” effectively match each other, as shown in
The categorizing part 125 of
Next, the various types of tables stored by the information memory 129 are explained with reference to
The information memory 129 stores an input sentence table shown in
In addition, the information memory 129 stores a phrase table shown in
Furthermore, the information memory 129 stores a subtree table shown in
Furthermore, the information memory 129 stores a notation table shown in
Two words with different notations associated with each other in the notation table, two synonyms associated with each other in the synonym table and two words with different conjugations associated with each other in the conjugation table are respectively considered effectively matching (or, effectively the same) words.
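One way to picture how the three tables lead to an effective match is to map every phrase to a canonical form before comparing. This is a minimal sketch, assuming each table can be treated as a mapping from a variant to a canonical phrase; the dictionary layout and the function names are hypothetical, and the table contents are limited to the examples given in this description.

```python
# Hypothetical table contents, reduced to the examples in the description.
NOTATION_TABLE = {"the DIRT": "the dirt"}        # different notations
CONJUGATION_TABLE = {"came off": "comes off"}    # different conjugations
SYNONYM_TABLE = {"is removed": "comes off"}      # synonyms


def canonical(phrase):
    """Replace a phrase by its canonical form, consulting each table in turn."""
    for table in (NOTATION_TABLE, CONJUGATION_TABLE, SYNONYM_TABLE):
        phrase = table.get(phrase, phrase)
    return phrase


def completely_match(a, b):
    """Phrases are identical as written."""
    return a == b


def effectively_match(a, b):
    """Phrases are identical once notation, conjugation and synonyms are normalized."""
    return canonical(a) == canonical(b)


print(completely_match("is removed", "comes off"))   # False
print(effectively_match("is removed", "comes off"))  # True
```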
In addition, the information memory 129 stores a group table shown in
When a subtree categorized into a given group represents an inferior concept to a subtree categorized into a group taken as the standard, the given group is considered an inferior group to the group that is the standard. In addition, the concepts represented by subtrees categorized into the group that is the standard contain the concepts represented by subtrees categorized into groups inferior to the group that is the standard. That is because superior concepts include inferior concepts.
Consequently, the information memory 129 stores an inclusion relationship table shown in
A set consisting of multiple groups into which subtrees representing common concepts are respectively categorized is called a group set. The multiple groups belonging to a group set form a tier structure because they have superior and inferior relationships with one another, as discussed above.
Consequently, the information memory 129 stores a tier structure table shown in
Next, the actions of the CPU 101 accomplished by the input part 120, the saving part 121, the syntax analyzing part 122, the subtree generating part 123, the match determination part 124, the categorizing part 125, the counting part 126 and the output part 127 shown in
The explanation takes as an example a case in which the counting device 100 has received a survey response sentence of “The dirt of socks comes off finely.”
The CPU 101 begins execution of the counting process shown in
First, the input part 120 inputs a response sentence of “The dirt of socks comes off finely” from the LAN card 106 of
Next, the saving part 121 generates a sentence ID identifying the input response sentence “The dirt of socks comes off finely”, and saves this in the input sentence table shown in
The explanation assumes that the following six sentences are stored in the input sentence table when the process of step S02 has ended.
The sentence “The dirt of socks comes off finely” identified by a sentence ID “ST1”, the sentence “The dirt of socks is removed finely” identified by a sentence ID “ST2”, the sentence “The DIRT comes off finely” identified by a sentence ID “ST3”, the sentence “The dirt came off” identified by a sentence ID “ST4”, the sentence “The dirt comes off” identified by a sentence ID “ST5”, and the sentence “The package is nice” identified by a sentence ID “ST6”.
Following step S02 in
Next, the syntax analyzing part 122 acquires morpheme strings of the sentences by accomplishing morpheme analysis on the six sentences read by the input part 120 (step S04a).
Next, the syntax analyzing part 122 accomplishes syntax analysis on the morpheme string obtained through morpheme analysis (step S04b). In this manner, the syntax analyzing part 122 specifies multiple phrases comprising the aforementioned six input sentences from the morpheme string of the aforementioned six input sentences.
Following this, the saving part 121 saves the sentence IDs identifying the sentences, the phrase IDs of phrases obtained from the sentences, and the phrases, associated together, in the phrase table shown in
In addition, the syntax analyzing part 122 generates a full syntax tree (that is to say, a full tree) FT1 shown in
Next, the subtree generating part 123 generates subtrees PT10 to PT15 (that is to say, all subtrees) shown in
Following this, for the multiple subtrees generated in step S06, the saving part 121 of
After the subtrees are generated by the process of step S06 of
When the subtree categorization process begins, the categorizing part 125 generates a new group and then references the subtree table shown in
At this time, all subtrees identified by subtree IDs stored in the subtree table are uncategorized trees. Here the explanation assumes that the categorizing part 125 selects the subtree ID “PT10” stored at the front of the subtree table and categorizes the subtree PT10 shown in
Next, the saving part 121 saves the subtree ID “PT10” of the subtree PT10 and the group ID “G10” of the group G10 in the group table shown in
Next, the categorizing part 125 determines whether or not all of the subtrees stored in the subtree table of
Following this, the categorizing part 125 takes one of the uncategorized trees as a categorization target tree to be a subtree categorized into a group (step S23).
Here, the explanation assumes that the categorizing part 125 selects the subtree PT11 shown in
Next, the categorizing part 125 determines whether or not all subtrees already categorized into groups (hereafter called already-categorized trees) have been focused on (step S24). At this time, the only already-categorized tree is the subtree PT10, and the subtree PT10 has not yet been focused on. Consequently, the categorizing part 125 determines that not all already-categorized trees have been focused on (step S24 of
Following this, the categorizing part 125 focuses on one of the unfocused already-categorized trees and makes this a focus already-categorized tree (step S25). Here, the explanation assumes that the categorizing part 125 focuses on the already-categorized tree PT10 shown in
Next, the match determination part 124 finds the sentence ID “ST1” associated with the subtree ID “PT11” of the categorization target tree PT11 of
Following this, the process returns to step S24 in
Next, the categorizing part 125 generates a new group and categorizes the categorization target trees in the generated group (step S28). Here, the explanation assumes that the categorizing part 125 generates a new group G11 and categorizes the categorization target tree PT11 of
Next, the saving part 121 associates the group ID “G11” of the group G11, the subtree ID “PT11” of the categorization target tree PT11, the height “2” of the categorization target tree PT11 and the number of branches “2” with each other and saves this information in the group table shown in
The explanation assumes that following this the categorizing part 125 takes the order of the categorization target trees to be from subtree PT12 to subtree PT15 from
Next, the explanation assumes that the categorizing part 125 takes an uncategorized tree PT20 shown in
In step S26b, the match determination part 124 determines whether or not the categorization target tree PT20 of
Here, the match determination part 124 does not determine that the categorization target tree PT20 representing the sentence “The dirt of socks is removed finely” and the focus already-categorized tree PT10 representing the sentence “The dirt of socks comes off finely” completely match. That is because the root node “is removed” of the categorization target tree PT20 and the root node “comes off” of the focus already-categorized tree PT10 are different.
Next, the match determination part 124 determines that “is removed” is a synonym for “comes off” because “is removed” and “comes off” are stored associated with each other in the synonym table of
Following this, the categorizing part 125 categorizes the categorization target tree PT20 of
Next, the saving part 121 saves the subtree ID “PT20” of the subtree PT20 and the group ID “G10” of the group G10 in the group table shown in
Next, the explanation assumes that the categorizing part 125 categorizes categorization target trees in order from the subtree PT21 through PT25 in
Next, the explanation assumes that the categorizing part 125 takes the uncategorized tree PT30 shown in
Next, the explanation assumes that the categorizing part 125 takes the already-categorized tree PT12 shown in
Here, the match determination part 124 determines that the categorization target tree PT30 representing the sentence “The DIRT comes off finely” and the focus already-categorized tree PT12 representing the sentence “The dirt comes off finely” do not completely match. That is because the modifier phrase “The DIRT” modifying the root node of the categorization target tree PT30 and the modifier phrase “The dirt” modifying the root node of the focus already-categorized tree PT12 are different.
Next, the match determination part 124 determines that the difference between the phrases is nothing more than a notational difference because “the DIRT” and “the dirt” are stored associated with each other in the notation table of
Following this, the categorizing part 125 categorizes the categorization target tree PT30 into the same group G12 as the focus already-categorized tree PT12 (step S27).
Next, the saving part 121 saves the subtree ID “PT30” of the subtree PT30 and the group ID “G12” of the group G12 in the group table shown in
Following this, the explanation assumes that the categorizing part 125 takes the subtree PT31 shown in
Next, the explanation assumes that the categorizing part 125 takes the uncategorized tree PT32 shown in
Next, the explanation assumes that the categorizing part 125 takes the already-categorized tree PT14 of
Here, the match determination part 124 determines that the categorization target tree PT32 representing the sentence “comes off finely” and the focus already-categorized tree PT14 representing the sentence “comes off finely” completely match. This is because the modifier phrase “finely” and the head phrase “comes off” of the categorization target tree PT32, and the modifier phrase “finely” and the head phrase “comes off” of the focus already-categorized tree PT14, match.
Following this, the categorizing part 125 categorizes the categorization target tree PT32 into the same group G14 as the focus already-categorized tree PT14 (step S27). Next, the saving part 121 saves the subtree ID “PT32” of the categorization target tree PT32 and the group ID “G14” of the group G14 in the group table shown in
Next, the explanation assumes that the categorizing part 125 takes an uncategorized tree PT40 shown in
Next, the explanation assumes that the categorizing part 125 takes the already-categorized tree PT13 shown in
Here, the match determination part 124 determines that the categorization target tree PT40 representing the sentence “The dirt came off” and the focus already-categorized tree PT13 representing the sentence “The dirt comes off” do not completely match. This is because the root node “came off” of the categorization target tree PT40 and the root node “comes off” of the focus already-categorized tree PT13 are different.
Next, the match determination part 124 determines that the difference between the root nodes is nothing more than a difference in conjugation, because “came off” and “comes off” are stored associated with each other in the conjugation table of
Following this, the categorizing part 125 categorizes the categorization target tree PT40 into the same group G13 as the focus already-categorized tree PT13 (step S27).
Following this, the explanation assumes that the categorizing part 125 takes the subtree PT50 shown in
Following this, the explanation assumes that the categorizing part 125 takes the subtree PT60 shown in
Next, when the categorizing part 125 determines that all subtrees have been categorized (step S22; Yes), execution of the subtree categorization process ends.
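The subtree categorization process walked through above (steps S22 to S28) can be summarized as the following sketch. It is an illustration under stated assumptions rather than the implementation of the categorizing part 125: subtrees are represented as sets of (head phrase, modifier phrase) branches, the single `CANONICAL` mapping stands in for the notation, synonym and conjugation tables, and only some of the subtrees of the example sentences are included.

```python
# Stand-in for the notation, synonym and conjugation tables (assumed contents).
CANONICAL = {"is removed": "comes off", "came off": "comes off", "the DIRT": "the dirt"}

DIRT = ("comes off", "the dirt")
SOCKS = ("the dirt", "of socks")
FINELY = ("comes off", "finely")

# Subtree ID -> branches, in the order they are stored in the subtree table.
SUBTREES = {
    "PT10": {DIRT, SOCKS, FINELY},
    "PT11": {DIRT, SOCKS},
    "PT12": {DIRT, FINELY},
    "PT13": {DIRT},
    "PT14": {FINELY},
    "PT15": {SOCKS},
    "PT20": {("is removed", "the dirt"), SOCKS, ("is removed", "finely")},
    "PT30": {("comes off", "the DIRT"), FINELY},
    "PT40": {("came off", "the dirt")},
    "PT60": {("is nice", "the package")},
}


def normalized(tree):
    """Branches of `tree` with every phrase replaced by its canonical form."""
    return frozenset((CANONICAL.get(h, h), CANONICAL.get(m, m)) for h, m in tree)


def matches(a, b):
    """Complete or effective match between two subtrees."""
    return normalized(a) == normalized(b)


def categorize(subtrees):
    """Return a mapping of subtree ID to group ID."""
    group_of = {}
    for tree_id, tree in subtrees.items():            # step S23: next target tree
        for other_id in list(group_of):               # steps S24, S25: each categorized tree
            if matches(tree, subtrees[other_id]):     # step S26: match determination
                group_of[tree_id] = group_of[other_id]  # step S27: same group
                break
        else:                                         # no match found
            new_id = "G%d" % (10 + len(set(group_of.values())))
            group_of[tree_id] = new_id                # step S28: new group
    return group_of


groups = categorize(SUBTREES)
print(groups["PT20"], groups["PT30"], groups["PT40"], groups["PT60"])
```

With this input the sketch reproduces the grouping described above: PT20 joins G10, PT30 joins G12, PT40 joins G13, and PT60 receives the new group G16.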
When execution of the subtree categorization process ends in step S07a of
After step S07a of
When the counting process starts, the counting part 126 references the group table shown in
Next, the counting part 126 focuses on one uncounted group and takes the focused-on group as a focus group (step S52). At this time, the explanation assumes that the counting part 126 focuses on the group G10, out of the uncounted groups G10 to G16.
Next, the counting part 126 counts the number of subtrees completely matching each other (that is to say, the number of completely matching trees) for all subtrees categorized in the focus group (step S53). In the group table shown in
Next, the counting part 126 takes one subtree having the largest number of matching trees as a representative tree (step S54). Here, the number of completely matching trees of the subtrees PT10 and PT20 are respectively “0”, so the explanation assumes that the counting part 126 takes as a representative tree the subtree PT10 having the lower subtree ID.
Next, the counting part 126 determines a name for the focus group based on the representative tree (step S55). Here, the explanation assumes that the counting part 126 sets as the name of the focus group G10 the sentence “The dirt of socks comes off finely” represented by the representative tree PT10.
Next, the counting part 126 counts the total number (hereafter referred to as the total tree number) of subtrees categorized into the focus group (step S56). Here, the explanation assumes that the counting part 126 has calculated a total tree number of “2” for the “PT10, PT20” associated with the group ID “G10”.
Following this, the saving part 121 saves the group ID “G10”, the subtree ID “PT10” of the representative tree, the group name “The dirt of socks comes off finely”, the group name complete match tree number “1” of the group and the total tree number “2” in the group table shown in
Next, the explanation assumes that the counting part 126 takes the groups G11 and G12 in order as the focus group. The counting part 126 repeatedly executes the processes from step S51 to step S56.
Through this, the group ID “G11”, the subtree ID “PT11” of the representative tree, the group name “The dirt of socks comes off”, the group name complete match tree number “1” and the total tree number “2” are stored in the group table associated with each other.
In addition, the group ID “G12”, the subtree ID “PT12” of the representative tree, the group name “The dirt comes off finely”, the group name complete match tree number “1” and the total tree number “3” are stored in the group table associated with each other.
Next, the explanation assumes that the counting part 126 takes the group G13 as the focus group. The counting part 126 executes the processes of step S51 and step S52.
Here, in the group table shown in
Consequently, the counting part 126 counts that the complete match tree numbers of the subtrees PT13 and PT50 categorized in the focus group G13 are respectively “1”, and the complete match tree numbers of the subtrees PT23, PT31 and PT40 are respectively “0” (step S53).
Next, because the complete match tree numbers of the subtrees PT13 and PT50 are respectively “1”, the counting part 126 takes the subtree PT13 with the smaller subtree ID as the representative tree (step S54).
Next, the counting part 126 executes step S55 and step S56. Through this, the group ID “G13”, the subtree ID “PT13” of the representative tree, the group name “the dirt comes off”, the group name complete match tree number “2” and the total tree number “5” are stored in the group table, associated with each other.
Next, the explanation assumes that the counting part 126 takes groups G14 to G16 in order as the focus group. The counting part 126 repeatedly executes the processes of step S51 to step S56.
Through this, the group ID “G14”, the subtree ID “PT14” of the representative tree, the group name “comes off finely”, the group name complete match tree number “2” and the total tree number “3” are stored in the group table, associated with each other. In addition, the group ID “G15”, the subtree ID “PT15” of the representative tree, the group name “the dirt of socks”, the group name complete match tree number “1” and the total tree number “2” are stored in the group table, associated with each other.
Furthermore, the group ID “G16”, the subtree ID “PT60” of the representative tree, the group name “The package is nice”, the group name complete match tree number “1” and the total tree number “1” are stored in the group table, associated with each other.
Following this, the counting part 126 determines that all of the groups G10 to G16 have been counted (step S51; Yes), and ends execution of the subtree counting process.
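The counting of one focus group (steps S53 to S56) can be sketched as follows, using the group G13 as the example. The function name and the data layout are hypothetical. The sketch also assumes that the group name complete match tree number is the number of subtrees in the group whose expression is identical to the group name, which is consistent with the values given above, and that the expression of the subtree PT23 is “The dirt is removed”.

```python
from collections import Counter


def count_group(members):
    """Count one group; `members` is a list of (subtree ID, expression) in ID order."""
    occurrences = Counter(expr for _, expr in members)
    # Step S53: for each subtree, the number of other subtrees that completely match it.
    complete_matches = {tid: occurrences[expr] - 1 for tid, expr in members}
    # Step S54: the largest number wins; max() keeps the first of equal
    # candidates, so ties go to the earlier subtree ID.
    rep_id, rep_expr = max(members, key=lambda m: complete_matches[m[0]])
    return {
        "representative": rep_id,
        "name": rep_expr,                          # step S55
        "name_match_number": occurrences[rep_expr],
        "total_tree_number": len(members),         # step S56
        "complete_matches": complete_matches,
    }


G13 = [
    ("PT13", "The dirt comes off"),
    ("PT23", "The dirt is removed"),   # assumed expression of PT23
    ("PT31", "The DIRT comes off"),
    ("PT40", "The dirt came off"),
    ("PT50", "The dirt comes off"),
]

result = count_group(G13)
print(result["representative"], result["name_match_number"], result["total_tree_number"])
```

For the group G13 this gives the representative tree PT13, a group name complete match tree number of 2 and a total tree number of 5.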
When execution of step S07b of
When the inclusion relationship specification process begins, the categorizing part 125 acquires the group IDs “G10” to “G16” from the group table shown in
Next, the categorizing part 125 determines whether or not all groups respectively identified by the multiple group IDs acquired from the group table have been focused on (step S61). At this time, the categorizing part 125 has just begun the inclusion relationship specification process so none of the groups has been focused on. Consequently, the categorizing part 125 determines that not all groups have been focused on (step S61; No).
Next, the categorizing part 125 takes one of the group IDs “G10” to “G16” of groups that have not yet been focused on as the focus group (step S62). Here, the explanation will assume that the group G10 with the lowest group ID number is taken as the focus group.
Next, the categorizing part 125 takes a group G11 to G16 different from the focus group G10 as a group for comparison with the focus group (hereafter referred to as the comparison target group) (step S63).
Following this, the categorizing part 125 determines whether or not all of the comparison target groups G11 to G16 have been focused on (step S64). At this time, the categorizing part 125 has not focused on any of the comparison target groups G11 to G16, so the determination is that not all of the comparison target groups G11 to G16 have been focused on (step S64; No).
Next, the categorizing part 125 focuses on one of the comparison target groups G11 to G16 not yet focused on, and takes the focused-on group as the focus comparison target group (step S65). Here, the explanation assumes that the categorizing part 125 takes the comparison target group G11 having the lowest group ID as the focus comparison target group.
Next, the match determination part 124 determines whether or not the focus comparison target group G11 is the closest superior group to the focus group G10 (step S66). Specifically, the match determination part 124 acquires the total branch number “3” associated with the focus group G10 from the group table shown in
Next, the match determination part 124 determines whether or not the representative tree PT11 of the focus comparison target group G11 is a partial syntax tree of the representative tree PT10 of the focus group G10. If the representative tree PT11 is a partial syntax tree of the representative tree PT10, the concept represented by the representative tree PT11 contains the concept represented by the representative tree PT10. Consequently, when the match determination part 124 determines that the representative tree PT11 is a partial syntax tree of the representative tree PT10, it is determined that the focus comparison target group G11 is the closest superior group to the focus group G10.
Specifically, the match determination part 124 acquires the height “2” associated with the group ID “G10” of the focus group G10 (hereafter called the focus group ID), from the group table. The height is the height of the representative tree PT10 representing the focus group G10 shown in
Because the representative tree PT11 of the focus comparison target group G11 and the representative tree PT10 of the focus group G10 have the same height, the match determination part 124 determines that there is a possibility that the concept represented by the representative tree PT11 is a superior concept to the concept represented by the representative tree PT10. This is because the height of a subtree representing a superior concept is the same as or lower than the height of a subtree representing an inferior concept.
Furthermore, the match determination part 124 determines that the root node “comes off” of the representative tree PT11 and the root node “comes off” of the representative tree PT10 match. In addition, the match determination part 124 determines that the phrase “the dirt” modifying the root node of the representative tree PT11 and the phrase “the dirt” modifying the root node of the representative tree PT10 match. Furthermore, the match determination part 124 determines that “of socks” modifying the phrase “the dirt” of the representative tree PT11 and “of socks” modifying the phrase “the dirt” of the representative tree PT10 match. That is to say, the match determination part 124 determines that the representative tree PT10 of the focus group G10 possesses all of the head phrases and modifier phrases possessed by the representative tree PT11 of the focus comparison target group G11.
Consequently, the match determination part 124 determines that the representative tree PT11 categorized in the focus comparison target group G11 is a partial syntax tree of the representative tree PT10 of the focus group G10. Accordingly, the match determination part 124 determines that the focus comparison target group G11 is the closest superior group to the focus group G10 (step S66 of
Even when the representative tree PT10 possesses head phrases and modifier phrases that completely match or effectively match all of the head phrases and modifier phrases possessed by the representative tree PT11, the match determination part 124 determines that the representative tree PT11 is a partial syntax tree of the representative tree PT10.
Next, the categorizing part 125 sets the group ID “G11” of the focus comparison target group G11 as the closest superior group ID for the focus group G10. Following this, the saving part 121 saves the group ID “G10” of the focus group G10 and the closest superior group ID “G11” associated with each other in the inclusion relationship table shown in
Following this, the categorizing part 125 returns to step S64 and determines that not all of the comparison target groups G11 to G16 have been focused on (step S64; No). This is because the categorizing part 125 has only focused on the comparison target group G11 out of the comparison target groups G11 to G16.
Next, the categorizing part 125 takes the comparison target group G12 as the focus comparison target group, from among the comparison target groups G12 to G16 not yet focused on (step S65).
Next, the match determination part 124 determines whether or not the focus comparison target group G12 is the closest superior group to the focus group G10 (step S66). Specifically, the match determination part 124 acquires the total branch number “3” associated with the focus group G10 and the total branch number “2” associated with the focus comparison target group G12, from the group table shown in
Next, the match determination part 124 acquires the height “2” associated with the group ID “G10” of the focus group G10 and the height “1” associated with the group ID “G12” of the focus comparison target group G12, from the group table. The height of the representative tree PT12 of the focus comparison target group G12 is lower than the height of the representative tree PT10 of the focus group G10, so the match determination part 124 determines that there is a possibility that the concept represented by the representative tree PT12 is a superior concept to the concept represented by the representative tree PT10.
Furthermore, the match determination part 124 determines that the root node “comes off” of the representative tree PT12 and the root node “comes off” of the representative tree PT10 match. In addition, the match determination part 124 determines that the phrase “the dirt” modifying the root node of the representative tree PT12 and the phrase “the dirt” modifying the root node of the representative tree PT10 match. Furthermore, the match determination part 124 determines that the phrase “finely” modifying the root node of the representative tree PT12 and the phrase “finely” modifying the root node of the representative tree PT10 match. That is to say, the match determination part 124 determines that the representative tree PT10 of the focus group G10 possesses all of the head phrases and modifier phrases possessed by the representative tree PT12 of the focus comparison target group G12.
Consequently, the match determination part 124 determines that the representative tree PT12 categorized in the focus comparison target group G12 is a partial syntax tree of the representative tree PT10 of the focus group G10, and that the focus comparison target group G12 is a closest superior group of the focus group G10 (step S66; Yes).
Next, the categorizing part 125 sets the group ID “G12” of the focus comparison target group G12 as the closest superior group ID of the focus group G10. Consequently, the saving part 121 saves the group ID “G10” of the focus group G10 and the closest superior group ID “G12” in the inclusion relationship table shown in
Following this, the categorizing part 125 executes the processes of steps S64 and S65 with the comparison target group G13 as the focus comparison target group.
Following this, the match determination part 124 determines that the focus comparison target group G13 is not the closest superior group of the focus group G10 (step S66; No). This is because in the group table, the total branch number “1” associated with the focus comparison target group G13 is at least two smaller than the total branch number “3” associated with the focus group G10.
Following this, the categorizing part 125 repeatedly executes the processes from steps S64 to S66 with the comparison target groups G14 to G16 respectively as the focus comparison target group. Through this, the categorizing part 125 determines that the focus comparison target groups G14 to G16 are not closest superior groups of the focus group G10.
Following this, the categorizing part 125 determines that all of the comparison target groups G11 to G16 have been focused on (step S64; Yes).
Next, the categorizing part 125 determines that the group ID “G10” of the focus group G10 and the group IDs “G11” and “G12” of the closest superior groups are stored associated with each other in the inclusion relationship table shown in
Next, the categorizing part 125 repeats the above-described processes from step S61 with the groups G11 and G12 in order as the focus group. Through this, the saving part 121 saves the group ID “G11” of the group G11 and the group IDs “G13” and “G15” of the closest superior groups of the group G11 associated with each other in the inclusion relationship table shown in
Next, the categorizing part 125 repeats the processes from step S61 to step S66 with the group G13 as the focus group and the groups G10 to G12 and G14 to G16 as focus comparison target groups. Following this, the categorizing part 125 determines that all of the comparison target groups G10 to G12 and G14 to G16 have been focused on (step S64; Yes).
Next, the categorizing part 125 determines that the group ID “G13” of the focus group G13 and the group ID of the closest superior group are not stored associated with each other in the inclusion relationship table shown in
Following this, the saving part 121 saves the group ID of the group G13 and a symbol “-” representing that the closest superior group of the group G13 does not exist, in the inclusion relationship table, associated with each other.
Following this, the categorizing part 125 repeats the processes from steps S61 to S66, and steps S68 and S69 with the groups G14 to G16 as focus groups. Through this, the saving part 121 saves the group IDs of the groups G14 to G16 and the symbol “-” representing that closest superior groups of these groups do not exist, in the inclusion relationship table, associated with each other.
Following this, the categorizing part 125 determines that all of the groups G10 to G16 have been focused on (step S61; Yes), and ends execution of the inclusion relationship specification process.
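The walkthrough of the inclusion relationship specification process above can be condensed into a short sketch. It is only an illustration under stated assumptions: each representative tree is modelled as a set of (modifier phrase, head phrase) branches, a group counts as a closest superior group when its tree has exactly one branch fewer than the focus tree and all of its branches also occur in the focus tree, and an empty list stands in for the symbol “-”. The function name, the data layout, the content assumed for the group G15 and the placeholder used for the group G16 are hypothetical and do not come from the specification.

```python
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]  # (modifier phrase, head phrase it modifies)


def closest_superior_groups(
    trees: Dict[str, FrozenSet[Edge]]
) -> Dict[str, List[str]]:
    """Map each group ID to the IDs of its closest superior groups."""
    table: Dict[str, List[str]] = {}
    for focus_id, focus in trees.items():
        table[focus_id] = [
            other_id
            for other_id, other in trees.items()
            # Exactly one branch fewer, and every branch also occurs in the
            # focus tree (the comparison tree is a partial tree of it).
            if len(other) == len(focus) - 1 and other < focus
        ]
    return table


SOCKS, DIRT, OFF, FINELY = "of socks", "the dirt", "comes off", "finely"
REPRESENTATIVE_TREES = {
    "G10": frozenset({(SOCKS, DIRT), (DIRT, OFF), (FINELY, OFF)}),
    "G11": frozenset({(SOCKS, DIRT), (DIRT, OFF)}),
    "G12": frozenset({(DIRT, OFF), (FINELY, OFF)}),
    "G13": frozenset({(DIRT, OFF)}),
    "G14": frozenset({(FINELY, OFF)}),
    "G15": frozenset({(SOCKS, DIRT)}),           # assumed content
    "G16": frozenset({("quickly", "dries")}),    # hypothetical placeholder
}

INCLUSION_TABLE = closest_superior_groups(REPRESENTATIVE_TREES)
```

With this data the group G10 maps to the groups G11 and G12, the group G11 maps to the groups G13 and G15, and the groups G13 to G16 map to empty lists, which mirrors the table entries described above.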
After step S08 of
When the group categorization process begins, the categorizing part 125 refers to the inclusion relationship table shown in
Next, the categorizing part 125 determines whether or not all of the most superior groups not yet categorized into group sets (hereafter referred to as uncategorized most superior groups) G13 to G16 have been focused on (step S71). At this time, the group categorization process has just begun, so none of the uncategorized most superior groups G13 to G16 have been focused on, and consequently the categorizing part 125 determines that all of the uncategorized most superior groups G13 to G16 have not been focused on (step S71; No).
Next, the categorizing part 125 focuses on one of the uncategorized most superior groups G13 to G16 and sets this as the focus group (step S72). Here, the explanation assumes the categorizing part 125 is focusing on the group G13.
Next, the categorizing part 125 creates a new group set SG1 for groups whose subtrees represent the concept of “the dirt comes off” represented by the representative tree PT13 of the focus group G13 (step S73). Following this, the categorizing part 125 categorizes the focus group G13 into the created group set SG1 (step S74). Next, the saving part 121 saves the group set ID “SG1” of the group set SG1 and the group ID “G13” of the focus group G13, associated with each other, in the tier structure table shown in
Next, the categorizing part 125 determines that the closest inferior groups of the focus group G13 have been searched for and that the groups G11 and G12 have been acquired, from the inclusion relationship table shown in
Next, the categorizing part 125 categorizes the acquired groups G11 and G12 into the new group set SG1 created in step S73 (step S76).
Next, the categorizing part 125 focusses on one of the acquired groups G11 and G12 and sets this as the focus group (step S77). Here, the explanation assumes that the categorizing part 125 sets the acquired group G11 as the focus group.
Following this, the categorizing part 125 executes steps S75 to S77, acquires the closest inferior group G10 of the focus group G11, categorizes the group G10 into the group set SG1 and sets the group G10 as the focus group.
Following this, the categorizing part 125 executes step S75, and determines that the closest inferior group of the focus group G10 cannot be acquired from the inclusion relationship table shown in
Next, the categorizing part 125 determines that there is an unfocussed-on group G12 out of the acquired groups G11, G12 and G10 acquired in step S76 (step S78; Yes).
Next, the categorizing part 125 repeats the processes from steps S75 to S77 with the unfocused-on group G12 as the focus group (step S79). Through this, the categorizing part 125 acquires the closest inferior group G10 of the focus group G12. In addition, the saving part 121 saves the group set ID “SG1”, the group ID “G10” and the closest superior group ID “G12”, associated with each other, in the tier structure table shown in
Following this, the categorizing part 125 determines in step S75 that the closest inferior group of the focus group G10 could not be acquired from the inclusion relationship table shown in
Next, the categorizing part 125 determines that there are no unfocussed-on groups out of the acquired groups G11, G12 and G10 acquired in step S76 (step S78; No).
Following this, the categorizing part 125 repeatedly executes steps S71 to S77. Through this, the categorizing part 125 creates new group sets SG2 to SG4. In addition, the categorizing part 125 categorizes the group G14, the closest inferior group G12 of the group G14, and the closest inferior group G10 of the group G12 in the group set SG2. Furthermore, the categorizing part 125 categorizes the group G15, the closest inferior group G11 of the group G15, and the closest inferior group G10 of the group G11 in the group set SG3. Furthermore, the categorizing part 125 categorizes the group G16 into the group set SG4.
Following this, the categorizing part 125 determines that there are no acquired groups that have not been focused on (step S78; No). Next, the categorizing part 125 determines that all of the most superior groups G13 to G16 have been focused on (step S71; Yes), and ends execution of the group categorization process.
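The group categorization process described above amounts to starting from each most superior group and repeatedly following closest inferior groups. The following is a minimal sketch of that traversal, not the flowchart itself: the function name and the dictionary layout are hypothetical, an empty list stands in for the symbol “-”, and the entries for the group G12 are inferred from the statements that the groups G13 and G14 each have the group G12 as a closest inferior group.

```python
from typing import Dict, List


def build_group_sets(closest_superiors: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Create one group set per most superior group and fill it downwards."""
    # Invert the inclusion relationship table: superior -> closest inferiors.
    inferiors: Dict[str, List[str]] = {gid: [] for gid in closest_superiors}
    for gid, superiors in closest_superiors.items():
        for superior in superiors:
            inferiors[superior].append(gid)

    # Most superior groups are those with no closest superior group.
    tops = [gid for gid, superiors in closest_superiors.items() if not superiors]

    group_sets: Dict[str, List[str]] = {}
    for number, top in enumerate(tops, start=1):
        members: List[str] = []
        stack = [top]
        while stack:
            gid = stack.pop()
            if gid not in members:  # G10 is reachable via both G11 and G12
                members.append(gid)
                stack.extend(reversed(inferiors[gid]))
        group_sets[f"SG{number}"] = members
    return group_sets


INCLUSION_TABLE = {
    "G10": ["G11", "G12"],
    "G11": ["G13", "G15"],
    "G12": ["G13", "G14"],
    "G13": [], "G14": [], "G15": [], "G16": [],
}
GROUP_SETS = build_group_sets(INCLUSION_TABLE)
```

Note that a group such as G10 ends up in several group sets, because membership is decided separately for each most superior group.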
When execution of the group categorization process ends in step S09 of FIG. 5, the groups G10, G12 and G14 whose subtrees represent the mutually common concept of “comes off finely” are categorized into the group set SG2. These groups G10, G12 and G14 form a tier structure like that shown in
The groups G10, G11, G12 and G13 categorized into the group set SG1, the groups G10, G11 and G15 categorized into the group set SG3 and the group G16 categorized into the group set SG4 respectively form different tier structures, although such are omitted from the drawings.
After the subtree counting process in step S09 of
When the counting results screen generation process begins, the output part 127 generates the counting results screen shown in
Specifically, first the output part 127 acquires the group ID (that is to say the group ID of the most superior group) “G13” associated with the symbol “-” representing that there is no closest superior group, from the tier structure table. Next, the output part 127 acquires the group name “the dirt comes off” associated with the group ID “G13”, the group name matching tree count “2” and the total tree number “5”, from the group table shown in
Next, the output part 127 acquires the group IDs (that is to say, the group IDs of the closest inferior groups of the group G13) “G11” and “G12” associated with the closest superior group ID “G13”, from the tier structure table. Next, the output part 127 acquires the group name “the dirt of socks comes off” associated with the group ID “G11”, the group name matching tree count “1” and the total tree number “2”, from the group table. In addition, the output part 127 acquires the group name “the dirt comes off finely” associated with the group ID “G12”, the group name matching tree count “1” and the total tree number “3” from the group table. Following this, the output part 127 generates leaves LF12 and LF13 representing the character string in which the group name matching tree counts of the respective groups are enclosed in parentheses, and the character string in which the total tree number is enclosed in brackets, at the end of the character string representing the acquired group name with respect to the two groups, respectively. Next, the output part 127 respectively generates branches linking the leaves LF12 and LF13, and the root RT1.
Following this, the output part 127 acquires the group ID associated with the closest superior group ID “G11” (that is to say the group ID of the closest inferior group of the group G11) “G10”, from the tier structure table. Next, the output part 127 acquires the group name “the dirt of socks comes off finely” associated with the group ID “G10”, the group name matching tree count “1”, and the total tree number “2”, from the group table. Following this, the output part 127 generates a leaf LF14 representing a character string in which the group name matching tree count of the respective groups is enclosed in parentheses and a character string in which the total tree number is enclosed in brackets, at the end of a character string representing the acquired group names. Following this, the output part 127 generates a branch linking the leaf LF14 and the leaf LF12.
Similarly, the output part 127 acquires the group ID associated with the closest superior group ID “G12” (that is to say, the group ID of the closest inferior group of the group G12) “G10”, from the tier structure table. Following this, the output part 127 generates a leaf LF15 representing a character string in which the group name matching tree count of the group G10 is enclosed in parentheses and a character string in which the total tree number is enclosed in brackets, at the end of a character string representing the group name of the group G10. Following this, the output part 127 generates a branch linking the leaf LF15 and the leaf LF13.
In this manner, the output part 127 generates a tree T1 having a root RT1 and leaves LF11 to LF15. In addition, the output part 127 similarly generates a tree T2 having a root RT2 and leaves LF22 and LF23, a tree T3 having a root RT3 and leaves LF31 and LF32, and a tree T4 having a root RT4. Following this, the output part 127 displays the trees T1 to T4 on the counting results screen.
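As a rough illustration of how the tree T1 could be assembled from the group table and the tier structure table, the sketch below renders each group as its group name followed by the group name matching tree count in parentheses and the total tree number in brackets. The function name, the tuple layout, the indentation and the exact spacing of the label are assumptions made for the example; only the group names and counts are taken from the description above.

```python
from typing import Dict, List, Tuple

# group ID -> (group name, group name matching tree count, total tree number)
GROUP_TABLE: Dict[str, Tuple[str, int, int]] = {
    "G13": ("the dirt comes off", 2, 5),
    "G11": ("the dirt of socks comes off", 1, 2),
    "G12": ("the dirt comes off finely", 1, 3),
    "G10": ("the dirt of socks comes off finely", 1, 2),
}
# group ID -> closest inferior groups within the group set SG1
CLOSEST_INFERIORS: Dict[str, List[str]] = {
    "G13": ["G11", "G12"],
    "G11": ["G10"],
    "G12": ["G10"],
}


def render_tree(group_id: str, depth: int = 0) -> List[str]:
    """Return one indented text line per root or leaf, parents first."""
    name, matching, total = GROUP_TABLE[group_id]
    lines = ["  " * depth + f"{name} ({matching}) [{total}]"]
    for inferior in CLOSEST_INFERIORS.get(group_id, []):
        lines.extend(render_tree(inferior, depth + 1))
    return lines


T1_LINES = render_tree("G13")
```

Because the group G10 is a closest inferior group of both G11 and G12, it is rendered twice, which corresponds to the separate leaves LF14 and LF15.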
Following this, the output part 127 outputs the generated counting result screen to the LAN card 106 (step S11 of
The terminal device 20, upon receiving the counting results screen, displays the counting results screen received on a display device.
In this preferred embodiment, the input part 120 was explained as inputting responses to questionnaires received from the terminal device 20 or 21, but the responses are not limited to Japanese sentences.
With this kind of composition, the counting device 100 executes the process of step S26b in
In addition, with this composition, the counting device 100 determines whether or not multiple subtrees represent the same expression based on the height of each subtree, its branch number, and the phrases respectively assigned to the root and to the leaves appended to the branches. Here, if the height of the subtree and the number of branches differ, the modifier-head relationships of phrases respectively assigned to the leaves appended to the root and branches differ. Consequently, it is possible for the counting device 100 to determine whether or not multiple subtrees represent the same expression of the modifier-head relationship.
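One way to picture this determination is sketched below. It is an assumption-laden illustration rather than the process of the flowcharts: a subtree is modelled as a node holding a phrase and the nodes that modify it, the height and branch number serve as a cheap first check, and the phrases are then compared through an order-independent canonical form. The class and function names are hypothetical, and synonym handling is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    phrase: str
    modifiers: List["Node"] = field(default_factory=list)


def height(node: Node) -> int:
    """Number of branches on the longest path from the root to a leaf."""
    if not node.modifiers:
        return 0
    return 1 + max(height(m) for m in node.modifiers)


def branch_number(node: Node) -> int:
    """Total number of branches in the subtree."""
    return sum(1 + branch_number(m) for m in node.modifiers)


def canonical(node: Node) -> tuple:
    """Nested tuple that ignores the order in which modifiers are listed."""
    return (node.phrase, tuple(sorted(canonical(m) for m in node.modifiers)))


def same_expression(a: Node, b: Node) -> bool:
    # Differing height or branch number implies differing modifier-head
    # relationships, so the phrase comparison can be skipped.
    if height(a) != height(b) or branch_number(a) != branch_number(b):
        return False
    return canonical(a) == canonical(b)


PT_A = Node("comes off", [Node("the dirt", [Node("of socks")]), Node("finely")])
PT_B = Node("comes off", [Node("finely"), Node("the dirt", [Node("of socks")])])
PT_C = Node("comes off", [Node("the dirt"), Node("finely")])
```

Here `same_expression(PT_A, PT_B)` holds although the modifiers are listed in a different order, while `PT_C` is rejected by the height and branch number check alone.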
Furthermore, with this composition, the counting device 100 determines whether or not an expression represented by the subtree PT10 and an expression represented by the subtree PT20 represent matching content, based on whether or not the modifier phrases of the subtree PT10 and the modifier phrases of the subtree PT20 match and whether or not the head phrases of the subtree PT10 and the head phrases of the subtree PT20 match, in step S26b of
Furthermore, with this composition, the counting device 100 determines the inclusion relationship between the group G10 and the group G11 in step S38 of
With this composition, the counting device 100 determines, in the group categorization process shown in
With this composition, in the group categorization process shown in
In addition, with this composition, in the subtree categorization process shown in
In this preferred embodiment, the explanation used as an example a case in which the counting device 100 comprises the input part 120, saving part 121, syntax analyzing part 122, subtree generating part 123, match determination part 124, categorizing part 125, counting part 126, output part 127 and information memory 129 shown in
In the first preferred embodiment, as explained with reference to
In the second preferred embodiment, when the first subtree has a head phrase such as a root node and a prescribed number of modifier phrases modifying the head phrase, and the second subtree has a head phrase such as the root node and a prescribed number of modifier phrases modifying the head phrase, the match determination part 124 determines that the head phrases match each other upon determining that the prescribed number of modifier phrases respectively match each other, and determines that the first subtree and the second subtree match. The ideal prescribed number can be established by one skilled in the art through experimentation.
The explanation will use as an example the first subtree and second subtree shown in
The match determination part 124 determines that the sentence “I am not so interested” represented by the first subtree shown in
That is because although the root node of the first subtree and the root node of the second subtree are different and are not synonyms of each other, when the phrases “so” and “interested” modifying the root node and the phrase “I” modifying the phrase “interested” match in the first subtree and the second subtree, the match determination part 124 determines that the first subtree and the second subtree effectively match.
Users frequently respond with standardized sentences to questions such as surveys and/or the like. These standardized sentences often include multiple sentences representing the same content in which the modifier (that is to say, the modifier phrase) is standardized but the modified term (that is to say, the head phrase) is not standardized. Consequently, with this kind of composition, when a prescribed number of modifier phrases match each other even if the head phrases do not match each other, the match determination part 124 deems the head phrases to match each other. Consequently, when multiple sentences respectively represented by multiple subtrees are standardized sentences, it is possible to determine with better accuracy than before whether or not multiple subtrees represent mutually matching content.
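The relaxed determination of the second preferred embodiment might be sketched as follows. This is one possible reading, stated as an assumption: the modifier phrases attached to each root are compared together with everything that modifies them, and the head phrases are deemed to match once at least the prescribed number of such modifiers coincide. The function names, the tuple layout and the root phrase “do not feel” of the second subtree are hypothetical, since the second example sentence is not reproduced in this passage.

```python
from collections import Counter
from typing import List, Tuple

# (phrase, list of subtrees whose roots modify that phrase)
Tree = Tuple[str, List["Tree"]]


def canon(tree: Tree) -> tuple:
    """Order-independent form of a subtree."""
    phrase, modifiers = tree
    return (phrase, tuple(sorted(canon(m) for m in modifiers)))


def effectively_match(first: Tree, second: Tree, prescribed_number: int) -> bool:
    """Deem the heads to match when enough modifier subtrees coincide."""
    first_mods = Counter(canon(m) for m in first[1])
    second_mods = Counter(canon(m) for m in second[1])
    matched = sum((first_mods & second_mods).values())
    return matched >= prescribed_number


FIRST = ("am not", [("so", []), ("interested", [("I", [])])])
SECOND = ("do not feel", [("so", []), ("interested", [("I", [])])])  # hypothetical
OTHER = ("do not feel", [("so", []), ("tired", [("I", [])])])        # hypothetical
```

With a prescribed number of 2, `FIRST` and `SECOND` are treated as matching even though their roots differ, whereas `FIRST` and `OTHER` share only one modifier and are not.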
In the first preferred embodiment, the CPU 101 shown in
When a subtree generated by the subtree generating part 123 satisfies all of the below pre-conversion conditions (1) to (3), the phrase conversion part 128 converts the subtree so as to satisfy the below post-conversion conditions (1) and (2).
Pre-conversion condition (1): The end of the modifier phrase of the subtree is the particle “to” or “to ha”.
Pre-conversion condition (2): The modifier phrase includes a verb, an adjective or a quasi-adjective (that is to say, an inflectable word) and represents an affirmative expression, and there is no comma between the modifier phrase and the head phrase modified by the modifier phrase.
Pre-conversion condition (3): The head phrase represents a negative expression.
Post-conversion condition (1): The head phrase represents an affirmative expression.
Post-conversion condition (2): The modifier phrase represents a negative expression.
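The conditions listed above can be read as a small rewrite rule. The sketch below is only an illustration: it assumes that a morphological analysis has already annotated each phrase with whether it is negated, whether it contains an inflectable word, its trailing particle and whether a comma follows it. The `Phrase` class, its field names and the function name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Phrase:
    lemma: str
    negated: bool = False       # True for a negative expression
    inflectable: bool = False   # contains a verb, adjective or quasi-adjective
    particle: str = ""          # trailing particle, romanized
    comma_after: bool = False   # a comma separates it from its head phrase


def move_negation_to_modifier(modifier: Phrase, head: Phrase) -> Tuple[Phrase, Phrase]:
    """Apply post-conversion conditions (1) and (2) when (1) to (3) all hold."""
    cond1 = modifier.particle in ("to", "to ha")
    cond2 = modifier.inflectable and not modifier.negated and not modifier.comma_after
    cond3 = head.negated
    if cond1 and cond2 and cond3:
        return replace(modifier, negated=True), replace(head, negated=False)
    return modifier, head  # left unchanged


COMES_OFF = Phrase("comes off", inflectable=True, particle="to")
NOT_THINK = Phrase("think", negated=True, inflectable=True)
NEW_MODIFIER, NEW_HEAD = move_negation_to_modifier(COMES_OFF, NOT_THINK)
```

In this example the negation moves from the head phrase to the modifier phrase, so a negated “think” modified by an affirmative “comes off” becomes an affirmative “think” modified by a negated “comes off”.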
The explanation will take as an example a case in which the generated subtree is the subtree shown in
The subtree shown in
This is because the end of the modifier phrase “comes off” modifying the phrase “do not think” in this subtree is the particle “to”, so the phrase conversion part 128 determines that the subtree of
In addition, the modifier phrase “comes off” of this subtree includes a verb (that is to say, an inflectable word). In addition, the modifier phrase “comes off” is an affirmative expression. Furthermore, there are no commas between the modifier phrase “comes off” and the head phrase “do not think” being modified by the modifier phrase “comes off”. Consequently, the phrase conversion part 128 determines that the subtree of
Furthermore, the head phrase “do not think” of this subtree includes a negative expression. Consequently, the phrase conversion part 128 determines that the subtree of
Because of this, the phrase conversion part 128 converts the head phrase “do not think” into “think” representing an affirmative expression and converts the modifier phrase “comes off” to “does not come off” representing a negative expression. Through this, the phrase conversion part 128 converts the subtree to the subtree shown in
The pre-conversion subtree shown in
Similarly, when the subtree generated by the subtree generating part 123 is a subtree such as is shown in
The subtree shown in
In contrast, when the subtree generated by the subtree generating part 123 is a subtree such as is shown in
The subtree shown in
The pre-conversion subtree shown in
Consequently, when the phrase “do not hear” is converted to the affirmative expression “hear” and the modifier phrase “comes off” is converted to the negative expression “does not come off”, the meaning changes. In other words, the sentence “I do not hear the dirt comes off” represented by the subtree shown in
Next, the actions of the CPU 101 accomplished by the various functional components such as the phrase conversion part 128 and/or the like shown in
The CPU 101 starts execution of the counting process shown in
Next, the phrase conversion part 128 executes a conversion process for converting subtrees satisfying all of the above-described pre-conversion conditions (1) to (3) into subtrees satisfying the above-described post-conversion conditions (1) and (2), from among multiple subtrees respectively generated in step S05 and step S06b.
Following this, the categorizing part 125 executes the subtree categorization process shown in
Here, the explanation takes as an example a case in which for step S26b of
The sentence represented by the pre-conversion categorization target tree is “I do not think the dirt comes off” as shown in
However, the sentence represented by the post-conversion categorization target tree is the same as the sentence represented by the focus already-categorized tree and is the sentence “I think the dirt does not come off” shown in
In this preferred embodiment, the explanation assumes the counting device 100 comprises the input part 120, saving part 121, syntax analyzing part 122, subtree generating part 123, match determination part 124, categorizing part 125, counting part 126, output part 127, phrase conversion part 128 and information memory 129 shown in
In the third preferred embodiment, the explanation assumes that the phrase conversion part 128 converted a subtree so as to satisfy the above-described post-conversion conditions (1) and (2) when a subtree generated by the subtree generating part 123 satisfies all of the above-described pre-conversion conditions (1) to (3).
In other words, the explanation assumes that the phrase conversion part 128 for example converts the subtree representing the sentence “I do not think the dirt comes off” such as is shown in
In this preferred embodiment, the phrase conversion part 128 converts a subtree so as to satisfy the below-described post-conversion conditions (3) and (4) when a subtree generated by the subtree generating part 123 satisfies all of the above-described pre-conversion conditions (1) and (2) and the below-described pre-conversion condition (4).
Pre-conversion condition (4): The modifier phrase represents a negative expression.
Post-conversion condition (3): The head phrase represents a negative expression.
Post-conversion condition (4): The modifier phrase represents an affirmative expression.
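The reverse conversion of this preferred embodiment can be sketched in the same spirit. Two points are assumptions made for the sketch and are not stated in the text: pre-conversion condition (2) is applied without its affirmative-expression clause, because pre-conversion condition (4) requires a negative modifier phrase, and the conversion is applied only when the head phrase is affirmative. The `Phrase` class and the function name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Phrase:
    lemma: str
    negated: bool = False
    inflectable: bool = False
    particle: str = ""
    comma_after: bool = False


def move_negation_to_head(modifier: Phrase, head: Phrase) -> Tuple[Phrase, Phrase]:
    """Apply post-conversion conditions (3) and (4) when the preconditions hold."""
    cond1 = modifier.particle in ("to", "to ha")
    # Condition (2) without its affirmative clause (assumption, see lead-in).
    cond2 = modifier.inflectable and not modifier.comma_after
    cond4 = modifier.negated
    # Assumption: only an affirmative head phrase receives the negation.
    if cond1 and cond2 and cond4 and not head.negated:
        return replace(modifier, negated=False), replace(head, negated=True)
    return modifier, head


NOT_COME_OFF = Phrase("comes off", negated=True, inflectable=True, particle="to")
THINK = Phrase("think", inflectable=True)
REV_MODIFIER, REV_HEAD = move_negation_to_head(NOT_COME_OFF, THINK)
```

Whichever direction is chosen, the point is that both surface forms are brought to one form before the subtree categorization process compares them.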
In other words, the phrase conversion part 128 converts the subtree representing the sentence “I think the dirt does not come off” such as is shown in
In the first preferred embodiment, the explanation was for execution of the counting process executed by the counting device 100 when a response sentence comprising two or more phrases is input. In contrast, in this preferred embodiment, the explanation is for execution of the counting process executed by the counting device 100 when a response sentence comprising two or more phrases and a response sentence comprising just one phrase are input. Below, the explanation primarily is for differences from the first preferred embodiment.
In this preferred embodiment, the explanation assumes that response sentences to the survey question “What are the good points about this product XXX?” are input into the counting device 100.
The counting device 100 upon starting execution of the counting process shown in
Here, the explanation assumes that when the process in step S02 has ended, eight sentences such as those shown in
The eight sentences shown in
The sentences respectively identified by the sentence IDs “ST11” to “ST13” are sentences composed of two or more phrases. In contrast, the sentences respectively identified by the sentence IDs “ST14” to “ST18” are sentences composed of only one phrase.
Following step S02 in
In this manner, the syntax analyzing part 122 generates full trees FT11 to FT13 representing the sentences respectively shown in
Next, the subtree generating part 123 generates subtrees PT110 to PT112 respectively shown in
In addition, in this preferred embodiment, the subtree generating part 123 generates a subtree (hereafter referred to as the head-added subtree) PT140 such as is shown in
Similarly, the subtree generating part 123 generates a head-added subtree PT150 shown in
Following this, the subtree categorization process shown in
When execution of the subtree categorization process ends, the subtree PT110 is categorized into a group G110 and the subtree PT111 is categorized into a group G111, as shown in
After step S07a of
When execution of step S07b of
Next, the categorizing part 125 executes the group categorization process shown in
Furthermore, the categorizing part 125 specifies a tier structure in which the group G110 categorized into the group set SG11 is included by the group G112 that is a superior group to the group G110. Similarly, the categorizing part 125 specifies a tier structure in which the group G110 categorized into the group set SG12 is included by the group G111 that is a superior group to the group G110.
Next, the output part 127 executes the counting results screen generation process generating the counting results screen shown in
When the counting results screen generation process begins, the output part 127 generates trees T11 to T13 shown in
Next, the output part 127 specifies the head-added subtree PT140 shown in
Next, the output part 127 specifies the group G112 into which are categorized the subtree PT112, composed of two phrases and representing the sentence “the price is low” and effectively matching the head-added subtree PT140 representing “the cost *”, and the subtree PT120, composed of two phrases and representing the sentence “the cost is low” and completely matching the head-added subtree PT140. In addition, the output part 127 specifies the group G130 into which is categorized the subtree PT130 composed of two phrases and representing “The price is attractive” and effectively matching the head-added subtree PT140 representing “the cost *”. Next, the output part 127 determines that a group into which are categorized subtrees completely or effectively matching the modifier-added subtree PT141 does not exist.
Following this, the output part 127 determines that the total tree number “2” of the subtrees categorized into the group G112 is larger than the total tree number “1” of the subtrees categorized into the group G130. The sentence “the cost” is conjectured to be a sentence provided as a response by a responder as a sentence representing the same content as content represented by the sentence “The cost is low” represented by the subtree PT120 categorized into the group G112. Consequently, the output part 127 determines that the meaning represented by the sentence “cost” composed of one phrase used in generating the head-added subtree PT140 is complementarily explained by the head phrase “low”.
Next, the output part 127 changes the root RT11 representing “The price is low, the cost is low” possessed by the tree T11 shown in
Next, the output part 127 specifies the head-added subtree PT150 shown in
Next, the output part 127 specifies the head-added subtree PT160 shown in
Next, the output part 127 determines that a group to which are categorized subtrees completely matching or effectively matching the head-added subtree PT160 representing “low *” does not exist. In addition, the output part 127 specifies the group G111 to which is categorized the subtree PT111 comprising two phrases and representing the sentence “very low” and completely matching the head-added subtree PT160 representing “* low”. Furthermore, the output part 127 specifies the group G112 to which are categorized PT120 representing the sentence “The cost is low” and the subtree PT112 representing the sentence “The price is low”, each comprising two phrases and completely matching the head-added subtree PT160 representing “* low”.
Following this, the output part 127 determines that the total tree number “2” of the subtrees categorized to the group G112 is larger than the total tree number “1” of the subtrees categorized to the group G111. Consequently, the output part 127 determines that the meaning represented by the sentence “low” comprising only one phrase used in generating the head-added subtree PT160 is complementarily explained by the head phrase “the price” or “the cost”.
Next, the output part 127 changes the root RT11 representing “The price is low, the cost is low, the cost, the price” possessed by the tree T11 shown in
Following this, the output part 127 specifies the head-added subtree PT170 shown in
Next, the output part 127 determines that a group into which are categorized subtrees completely matching or effectively matching the head-added subtrees PT170 and PT180 representing “detergency *” does not exist. In addition, the output part 127 determines that a group into which are categorized subtrees completely matching or effectively matching the modifier-added subtrees PT171 and PT181 representing “* detergency” does not exist. Following this, the output part 127 generates a tree T14 possessing a root RT14 representing “detergency” and the total tree number “2”.
Following this, the output part 127 displays the trees T11 to T14 on the counting results screen.
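The handling of one-phrase sentences described above can be summarized in a small sketch: a one-phrase sentence is matched against both positions of the two-phrase subtrees in every group, and the group with the largest total tree number is chosen. Several details are assumptions for illustration only: “the price” and “the cost” are hard-coded as synonyms to stand in for effective matching, the total tree number is taken to be the number of subtrees listed for a group, ties are resolved in favour of the first group found, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SYNONYMS = [{"the price", "the cost"}]  # assumed synonym set

# group ID -> two-phrase subtrees as (modifier phrase, head phrase)
GROUPS: Dict[str, List[Tuple[str, str]]] = {
    "G111": [("very", "low")],
    "G112": [("the price", "low"), ("the cost", "low")],
    "G130": [("the price", "attractive")],
}


def same_phrase(a: str, b: str) -> bool:
    """Complete match, or effective match through the synonym sets."""
    return a == b or any(a in s and b in s for s in SYNONYMS)


def best_group(phrase: str) -> Optional[str]:
    """Pick the group that most plausibly completes a one-phrase sentence."""
    candidates = [
        gid
        for gid, subtrees in GROUPS.items()
        # "phrase *" matches on the modifier side, "* phrase" on the head side.
        if any(same_phrase(phrase, m) or same_phrase(phrase, h) for m, h in subtrees)
    ]
    if not candidates:
        return None  # the phrase gets a tree of its own
    return max(candidates, key=lambda gid: len(GROUPS[gid]))
```

For example, `best_group("the cost")` and `best_group("low")` both select the group G112, while `best_group("detergency")` finds no group, which corresponds to generating the separate tree T14.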
Following this, the output part 127 outputs the generated counting result screen to the LAN card 106 (step S11 of
The first through fifth preferred embodiments can be combined with each other. It is possible to provide a counting device 100 provided with a composition for realizing the functions according to any of the first through fifth preferred embodiments, and it is also possible to provide a system comprising multiple devices and structured to realize functions according to any of the first through fifth preferred embodiments.
It is possible to provide a counting device 100 provided in advance with a composition for realizing the functions according to any of the first through fifth preferred embodiments, and by applying a program, it is possible to cause an existing counting device 100 to function as a counting device according to any of the first through fifth preferred embodiments. In other words, by having a computer (CPU and/or the like) that controls an existing counting device execute a counting program for realizing the various functional compositions of the counting device 100 illustrated by any of the first through fifth preferred embodiments, it is possible to cause this counting device to function as the counting device 100 according to any of the first through fifth preferred embodiments.
The distribution method of such a program is arbitrary, and it is possible to distribute the program by storing such for example on a recording medium such as a memory card, CD-ROM or DVD-ROM and/or the like, or to distribute the program via a communications medium such as the Internet and/or the like. In addition, the counting method according to the present invention can be implemented using a counting device 100 according to any of the first through fifth preferred embodiments.
The preferred embodiments of the present invention were described in detail above, but the present invention is not limited to the specified preferred embodiment, for various variations and changes are possible within the scope of the present invention as stated in the Claims.
Moreover, the above-described preferred embodiments are used to explain the present invention but are intended to be illustrative and not limiting on the scope of the present invention. In other words, the scope of the present invention is illustrated by the Claims and not the preferred embodiments. In addition, it is intended that the application be construed as including all such modifications and variations insofar as they come within the spirit and scope of the subject matter disclosed herein.
This application claims the benefit of Japanese Patent Application No. 2012-103996, filed on 27 Apr. 2012, the entire disclosure of which is incorporated by reference herein.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 06 2013 | Rakuten, Inc. | (assignment on the face of the patent) | / | |||
Jul 23 2014 | SHINZATO, KEIJI | RAKUTEN INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 033394 | /0845 | |
Sep 07 2015 | RAKUTEN, INC | RAKUTEN, INC | CHANGE OF ADDRESS | 037690 | /0315 | |
Sep 01 2021 | RAKUTEN, INC | RAKUTEN GROUP, INC | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 058314 | /0657 | |
Sep 01 2021 | RAKUTEN, INC | RAKUTEN GROUP, INC | CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE PATENT NUMBERS 10342096 10671117 10716375 10716376 10795407 10795408 AND 10827591 PREVIOUSLY RECORDED AT REEL: 58314 FRAME: 657 ASSIGNOR S HEREBY CONFIRMS THE ASSIGNMENT | 068066 | /0103 |