A compressed lexicon is built by receiving a word list, which includes word-dependent data associated with each word in the word list. A word is selected from the word list. A hash value is generated based on the selected word, and the hash value identifies an address in a hash table which, in turn, is written with a location in lexicon memory that is to hold the compressed form of the selected word, and the compressed word-dependent data associated with the selected word. The word is then encoded, or compressed, as is its associated word-dependent data. This information is written at the identified location in the lexicon memory.
12. A method of accessing word information related to a word stored in a compressed speech lexicon, comprising:
receiving the word;
accessing an index to obtain a word location in the compressed speech lexicon that contains information associated with the received word including word-dependent data selected from the group consisting of a pronunciation and a part-of-speech;
reading encoded word information from the word location; and
decoding the word information for use in a speech application.
1. A method of building a compressed speech lexicon for use in a speech application, comprising:
receiving a word list configured for use in the speech application, the word list including a plurality of words, with each word in the word list having associated word-dependent data selected from the group consisting of a pronunciation and part-of-speech;
selecting one of the words from the word list;
generating an index entry identifying a location in a compressed speech lexicon memory for holding the selected word;
encoding the selected word and its associated word-dependent data to obtain an encoded word and associated encoded word-dependent data; and
writing the encoded word and its associated word-dependent data at the identified location in the speech lexicon memory.
19. A compressed speech lexicon builder for building a compressed speech lexicon for use in a speech application based on a word list containing a plurality of domains, the domains including words and word-dependent data associated with each of the words, the compressed speech lexicon builder comprising:
a plurality of domain encoders, one domain encoder being associated with each domain in the word list, the domain encoders being configured to compress the words and the associated word-dependent data selected from the group consisting of a pronunciation and a part-of-speech, to obtain compressed words and compressed word-dependent data;
a hashing component configured to generate a hash value for each word in the word list;
a hash table generator, coupled to the hashing component, configured to determine a next available location in a speech lexicon memory and write, at an address in a hash table identified by the hash value, the next available location in the speech lexicon memory; and
a speech lexicon memory generator, coupled to the domain encoders and the hash table generator, configured to store in the speech lexicon memory, for use by the speech application, the compressed words and compressed word-dependent data, each compressed word and its associated compressed word-dependent data being stored at the next available location in the speech lexicon memory written in the hash table at the hash table address associated with the compressed word.
2. The method of
repeating the steps of selecting, generating, encoding and writing for each word in the word list and the associated word-dependent data.
3. The method of
writing codebooks corresponding to the encoded words and the encoded word-dependent data in the speech lexicon memory.
4. The method of
counting the words in the word list;
allocating a hash table memory based on a number of words in the word list; and
allocating a speech lexicon memory based on the number of words in the word list.
5. The method of
providing a word encoder to encode the words in the word list and encoding the words with the word encoder; and
providing word-dependent data encoders for each type of word-dependent data in the word list and encoding the word-dependent data with the word-dependent data encoders.
6. The method of
Huffman encoding the selected word and its associated word-dependent data.
7. The method of
writing a data structure comprising:
a word portion containing the encoded word;
a word-dependent data portion containing the encoded word-dependent data; and
wherein each word-dependent data portion has an associated last indicator portion and word-dependent data indicator portion, the last indicator portion containing an indication of a last portion of word-dependent data associated with the selected word, and the word-dependent data indicator portion containing an indication of the type of word-dependent data stored in the associated word-dependent data portion.
8. The method of
9. The method of
determining a next available location in the speech lexicon memory.
10. The method of
calculating a hash value for the selected word;
indexing into the hash table to an index location based on the hash value; and
writing location data identifying the next available location in the speech lexicon memory into the index location in the hash table.
11. The method of
writing an offset into the speech lexicon memory that corresponds to the next available location in the speech lexicon memory.
13. The method of
prior to reading the encoded word information, reading an encoded word from the word location;
decoding the encoded word; and
verifying that the decoded word is the same as the received word.
14. The method of
initializing decoders associated with the word and its associated information.
15. The method of
calculating a hash value based on the received word;
finding an index location in the index based on the hash value; and
reading from the index location a pointer value pointing to the word location in the compressed lexicon.
16. The method of
reading a plurality of fields from the word location containing variable length word information.
17. The method of
prior to reading each field, reading data type header information indicating a type of word information in an associated field.
18. The method of
reading a last field indicator indicating whether an associated one of the plurality of fields is a last field associated with the received word.
20. The compressed speech lexicon builder of
a codebook generator generating a codebook associated with each domain encoder.
The present application is based on and claims the benefit of U.S. Provisional Patent Application Ser. No. 60/219,861, filed Jul. 20, 2000, the content of which is hereby incorporated by reference in its entirety.
The present invention deals with a lexicon for use by speech recognition and speech synthesis technology. In particular, the present invention relates to an apparatus and method for compressing a lexicon and accessing the compressed lexicon, as well as the compressed lexicon data structure.
Speech synthesis engines typically include a decoder which receives textual information and converts it to audio information which can be synthesized into speech on an audio device. Speech recognition engines typically include a decoder which receives audio information in the form of a speech signal and identifies a sequence of words from the speech signal.
In speech recognition and text-to-speech (speech synthesis) systems, a lexicon is used. The lexicon can contain a word list and word-dependent data, such as pronunciation information and part-of-speech information (as well as a wide variety of other information). The lexicon is accessed by a text-to-speech system, for example, in order to determine the proper pronunciation of a word which is to be synthesized.
In such systems (speech recognition and text-to-speech), a large vocabulary lexicon is typically a highly desirable feature. However, it is also desirable to perform speech recognition and speech synthesis tasks very quickly. Due to the large number of words which can be encountered by such systems, the lexicon can be extremely large. This can take an undesirable amount of memory.
Compression of data, however, brings its own disadvantages. For example, many compression algorithms make it cumbersome to recover the compressed data. This often requires an undesirable amount of time, especially with respect to the desired time limitations imposed on speech recognition and speech synthesis tasks. Further, since a conventional lexicon may contain in excess of 100,000 words, along with each word's associated word-dependent data, it can take an undesirable amount of time to build the compressed lexicon based upon an input text file containing the uncompressed lexicon. Similarly, many compression algorithms can render the compressed text non-extensible, or can make it quite cumbersome to extend the compressed data. However, it may be desirable to change the lexicon, or modify the lexicon by adding or deleting words. Similarly, it may be desirable to add additional word-dependent data to the lexicon or delete certain types of word-dependent data from the lexicon. Therefore, limiting the extensibility of the lexicon is highly undesirable in speech-related systems.
A compressed lexicon is built by receiving a word list, which includes word-dependent data associated with each word in the word list. A word is selected from the word list. A hash value is generated based on the selected word, and the hash value identifies an address in a hash table which, in turn, is written with a location in lexicon memory that is to hold the compressed form of the selected word, and the compressed word-dependent data associated with the selected word. The word is then encoded, or compressed, as is its associated word-dependent data. This information is written at the identified location in the lexicon memory.
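For illustration, the build-and-access scheme just described can be sketched as follows. This is a minimal stand-in, not the patented implementation: Python's built-in `hash`, linear probing for collisions, and null-terminated UTF-8 records are simplifying assumptions chosen for brevity (the actual lexicon stores Huffman-compressed domains, and a deterministic hash function would be used so the table survives serialization).

```python
def build_lexicon(word_list):
    """word_list: dict mapping word -> word-dependent data (e.g. a pronunciation)."""
    table_size = 2 * len(word_list)          # hash table allocated based on word count
    hash_table = [None] * table_size         # slot -> offset into lexicon memory
    lexicon = bytearray()                    # the "lexicon memory"
    for word, data in word_list.items():
        slot = hash(word) % table_size
        while hash_table[slot] is not None:  # resolve collisions by linear probing
            slot = (slot + 1) % table_size
        hash_table[slot] = len(lexicon)      # next available location in lexicon memory
        # Stand-in "encoding": null-separated UTF-8 instead of Huffman bit streams.
        lexicon.extend((word + "\0" + data + "\0").encode("utf-8"))
    return hash_table, bytes(lexicon)

def lookup(word, hash_table, lexicon):
    """Hash the word, follow the table to an offset, verify, and return its data."""
    slot = hash(word) % len(hash_table)
    while hash_table[slot] is not None:
        offset = hash_table[slot]
        end = lexicon.index(b"\0", offset)
        if lexicon[offset:end].decode("utf-8") == word:   # verify (hash is imperfect)
            data_end = lexicon.index(b"\0", end + 1)
            return lexicon[end + 1:data_end].decode("utf-8")
        slot = (slot + 1) % len(hash_table)  # collision: probe the next slot
    return None
```

The table stores offsets rather than the records themselves, so the records in lexicon memory can be variable length.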
In one embodiment, each type of word-dependent data which is compressed and written in the lexicon memory is written in a word-dependent data portion and has an associated header portion. The header portion includes a last data portion indicator which indicates whether the associated word dependent data portion is the last one associated with the selected word. The header also includes a word-dependent data type indicator indicating the type of word-dependent data stored in the associated word-dependent data portion.
Another embodiment of the present invention includes accessing a compressed lexicon. In that embodiment, a word is received and an index is accessed to obtain a word location in the compressed lexicon that contains information associated with the received word. Encoded word information is read from the word location and is decoded. The header information can be read as well, to determine the type of word-dependent data being decoded as well as whether any additional word-dependent data is associated with the received word.
The present invention can be implemented as a compressed lexicon builder and a compressed lexicon accesser for building and accessing the compressed lexicon discussed above.
The present invention can also be implemented as a data structure for a compressed lexicon which includes a word portion storing a compressed word, a word-dependent data portion storing a first type of compressed word-dependent data, and a header portion associated with each word-dependent data portion storing a type indicator indicating the type of word-dependent data in the associated word-dependent data portion and a last field indicator indicating whether the word-dependent data portion is a last word-dependent data portion associated with the compressed word.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
A user may enter commands and information into the computer 110 through input devices such as a keyboard 162, a microphone 163, and a pointing device 161, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 190.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110. The logical connections depicted in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
In one illustrative embodiment, speech middleware component 204 is implemented in the operating system 134 illustrated in
Briefly, in operation, speech middleware component 204 resides between applications 202 and engines 206 and 208. Applications 202 can be speech recognition and speech synthesis applications which desire to invoke engines 206 and 208. In doing so, applications 202 make calls to speech middleware component 204 which, in turn, makes calls to the appropriate engines 206 and 208 in order to have speech recognized or synthesized. For example, applications 202 may provide the source of audio data for speech recognition. Speech middleware component 204 passes that information to speech recognition engine 206 which simply recognizes the speech and returns a recognition result to speech recognition middleware component 210. Speech recognition middleware component 210 places the result in a desired format and returns it to the application 202 which requested it. Similarly, an application 202 can provide a source of textual data to be synthesized. TTS middleware component 214 assembles that data, and provides it to TTS engine 208, for synthesis. TTS engine 208 simply synthesizes the data and returns audio information to TTS middleware component 214, which handles spooling of that information to an audio device, writing that information to memory, or placing that information in any other desired location, as specified by the application 202 which requested it.
CFG engine 212, briefly, assembles and maintains grammars which are to be used by speech recognition engine 206. This allows multiple applications and multiple grammars to be used with a single speech recognition engine 206.
In one embodiment, both engines 206 and 208 may wish to access a lexicon to perform recognition or synthesis, respectively. In one illustrative embodiment, speech middleware component 204 contains a lexicon which can be used by both engines. In that case, the engines 206 and 208 communicate with speech middleware component 204, illustratively through methods exposed by interfaces on speech middleware component 204. Thus, the compressed lexicon in accordance with one aspect of the present invention can be implemented in speech middleware component 204 as an instantiated object which is accessible by the engines 206 and 208.
Of course, it will be appreciated that the inventive aspects of the present invention with respect to the compressed lexicon and the apparatus and method for creating and accessing the compressed lexicon are independent of the specific structure shown in
More specifically,
In one specific embodiment, lexicon text file 222 is composed of a plurality of entries. Each entry includes a word 236 (in
In any case, a counter component 242 in component 220 first counts the number of words contained in the word list provided in text file 222. This is indicated by block 247 in
Memory generator 246 then allocates sufficient lexicon memory 228 to hold the compressed words in the word list, along with the word-dependent data in compressed form. Memory generator 246 also allocates sufficient memory 226 to hold codebooks 230 which can be used for decoding the compressed words and word-dependent data in lexicon memory 228. Allocating these memories is indicated by block 248 in
Recall that, in the present description, the lexicon includes a word and two types of word-dependent data (the word's pronunciation, and the word's part-of-speech). The present description proceeds by referring to each of those items as a domain. In other words, the word is a domain, the first type of word-dependent data (the pronunciation) is a separate domain, as is the second type of word-dependent data (the part-of-speech). Each domain has an associated alphabet. The alphabet for the word domain, in English, is the English alphabet (26 symbols). The alphabet for the pronunciation domain is illustratively the international phonetic alphabet (IPA), or a custom phone set. For a particular language, such as English, there are approximately 50 phones in the international phonetic alphabet. The alphabet for the parts-of-speech domain is the parts-of-speech in the language. Since the size of the alphabet in all three domains is small (e.g., less than 100 symbols), a suitable encoding algorithm can be chosen. In one illustrative embodiment, the well known Huffman encoding algorithm is used for compressing the members of each domain.
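As a sketch of per-domain encoding, the following builds Huffman codes for a small domain alphabet. The phone symbols and frequencies below are invented for the example; in practice a codebook would be trained on symbol frequencies from the whole word list.

```python
import heapq

def huffman_codes(symbol_freqs):
    """Return {symbol: bitstring} Huffman codes for a frequency table."""
    # Heap entries are [frequency, tie-break index, {symbol: partial code}].
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(symbol_freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate one-symbol alphabet
        return {sym: "0" for sym in symbol_freqs}
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # two least-frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], merged])
    return heap[0][2]

# Invented example: frequencies for a handful of phones in a tiny word list.
codes = huffman_codes({"D": 4, "EH": 2, "R": 2, "IY": 1, "AX": 1})
# More frequent symbols receive codes no longer than less frequent ones.
```

Because the codes are prefix-free, encoded fields can be concatenated into a bit stream and decoded unambiguously with the stored codebook.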
Therefore, compressed lexicon memory generator 246 first selects a word in the word list. That word is provided to hash table generator 252 which provides the word (e.g., word 253) to hashing component 224. Hashing component 224 implements any desired hashing algorithm to calculate a hash value based on word 253. This is indicated by block 254 in
Hash table generator 252, in turn, indexes into hash table 232 to an address indicated by the word hash value 256. This is indicated by block 258 in FIG. 4. In one illustrative embodiment, hashing component 224 implements an imperfect hash algorithm. In that case, two or more words may have the same hash value. If they do, this is identified as a collision, which must be resolved using any of a number of well-known collision-resolution algorithms. Resolving collisions is indicated by block 260 in
In any case, once the collisions have been resolved, if there were any, memory generator 246 determines the next available memory location in lexicon memory 228. This is indicated by block 262 in
Once the next available location in lexicon memory 228 has been identified and written at the index location in hash table 232, the word for which the hash value was calculated, along with its word-dependent data (pronunciation 238 and part-of-speech 240) are provided to domain encoders 266. In one illustrative embodiment, since there are three domains in the present lexicon, there are three domain encoders 266. Specifically, in the embodiment currently being discussed, domain encoders 266 include a word domain encoder 268, a pronunciation domain encoder 270 and a part-of-speech domain encoder 272. As discussed above, the domain encoders may illustratively implement the Huffman encoding algorithm to compress the word 236, pronunciation data 238 and part-of-speech data 240. Encoding the word and the word-dependent data domains is indicated by block 274 in
The encoded domains are then provided to memory generator 246 which writes them at the next available location in lexicon memory 228.
Memory generator 246 then determines whether there are any additional words in the word list 222. If so, the next word in the word list is selected and processing continues at block 250. This is indicated by block 278 in
However, if all of the words and word-dependent data in the word list have been encoded and written to lexicon memory 228, then codebook generator 280 writes the codebooks 230 associated with each of the domain encoders 268, 270 and 272 into memory 226. This is indicated by block 280. In this way, the compressed lexicon is created.
Each field will now be discussed in greater detail. Of course, encoded word field 290 simply contains the value indicative of the encoded or compressed form of the word from the word list. The word-dependent type field 292 and the last indicator field 294 form a simple header. In one illustrative embodiment, each field 292 and 294 is composed of a single bit. The word-dependent data type field 292 indicates which type of word-dependent data is contained in the following field 296. Therefore, if the bit is a one, the type of encoded word-dependent data in the following field 296 is illustratively pronunciation information. If the bit is a zero, the type of word-dependent data encoded in field 296 is part-of-speech information. Of course, if there are more than two types of word-dependent data in the compressed lexicon, then field 292 must contain additional bits.
The last indicator field 294 is also illustratively a single bit and simply indicates whether the encoded word-dependent data field 296 is the last field in the record associated with the encoded word 290. In other words, in one embodiment, if the last field indicator bit 294 is a one, then the encoded word-dependent data in field 296 is the last portion of information contained in the record associated with encoded word 290. However, if that bit is zero, this indicates that there are additional encoded word-dependent data fields (such as 296′) which are associated with encoded word 290.
Of course, the number of bits in the header fields 292 and 294 depends on the number of types of word-dependent data. If the number of types of word-dependent data is 2, then 1 bit is required for differentiating the type (in field 292) and 1 bit is needed for signaling the last field (in field 294), so 2 bits are needed for the header fields 292 and 294. If the number of types of word-dependent data is 3 or 4, then 2 bits are required for differentiating the types and 1 bit for signaling the last field. Therefore, the header fields 292 and 294 have 3 bits.
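The header arithmetic above generalizes to the ceiling of log2(number of types) bits for the type indicator, plus one bit for the last-field flag. A small sketch (the integer packing below is an illustrative stand-in; in the actual structure these bits are written directly into the record's bit stream):

```python
import math

def header_bits(num_types):
    """Total header width: type-indicator bits plus one last-field bit."""
    type_bits = max(1, math.ceil(math.log2(num_types)))
    return type_bits + 1

def pack_header(type_id, is_last):
    """Pack the type indicator and last-field flag into one small integer."""
    return (type_id << 1) | int(is_last)

def unpack_header(header):
    """Recover (type_id, is_last) from a packed header."""
    return header >> 1, bool(header & 1)
```

With 2 types the header is 2 bits; with 3 or 4 types it is 3 bits, matching the counts given above.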
In accessing the lexicon in memory 226, component 220 first receives a word at its input and is expected to provide the word-dependent data (such as the word pronunciation or its part-of-speech) at its output. Therefore, component 220 first receives a word. This is indicated by block 310 in
Component 220 provides the word to hash table generator 252, which, in turn, provides the word to hashing component 224. Hashing component 224 is illustratively the same hashing component used to create hash table 232. Therefore, hashing component 224 creates a word hash value for the word passed into it. This is indicated by block 312 in
The word hash value 256 is provided from hash table generator 252 to memory generator 246. Memory generator 246 indexes into hash table 232 based on the hash value computed in block 312. This is indicated by block 314 in
Memory generator 246 then initializes decoders 302 and causes codebook generator 280 to initialize codebooks 230 for each domain in the lexicon memory 228. This is indicated by block 318 in
Memory generator 246 then begins reading from the beginning of the record in lexicon memory 228. Therefore, memory generator 246 first reads from the offset value given in the hash table the encoded word field 290 until it encounters a null separator, or another separator 291. This is indicated by block 320. The value read from encoded word field 290 is passed to word decoder 304 which decodes the word using the codebook initialized by codebook generator 280. This is indicated by block 322. Memory generator 246 then compares the decoded word with the word which was received to ensure that they are the same. Of course, this is similar to collision avoidance. If the hashing algorithm is a perfect hashing algorithm, then it will be guaranteed that the hash value associated with a word will not lead to an erroneous location in memory. In that case, the verification step indicated by block 324 in
Once the proper word has been read from lexicon memory 228, and decoded, then the two bit header information in fields 292 and 294 is read by memory generator 246. Based on this information, memory generator 246 can determine what type of word-dependent data is encoded in the following word-dependent data field 296. In this way, memory generator 246 can pass the encoded word-dependent data from field 296 to the appropriate decoder 306 or 308. Based on the information in field 294, memory generator 246 can determine whether the word-dependent data in field 296 is the last word-dependent data associated with this record (or this word). This is indicated by block 326 in
Once this determination is made, memory generator 246 begins reading the encoded information in the encoded word-dependent data field 296 until a null separator or other separator 291 is encountered. This is indicated by block 328.
Memory generator 246 then passes the encoded word-dependent data from field 296 to the appropriate word-dependent data decoder 306 or 308 which decodes the information with the corresponding codebook 280 which has been initialized. This is indicated by block 330 in
Memory generator 246 then reads out the next header segment 292′ and 294′ if any additional word-dependent data is encoded for the word in the present record. If not, and the last item of word-dependent data has been read out, compressed lexicon memory generator 246 simply provides the decoded word and/or its associated decoded word-dependent data at its output. This is indicated by blocks 332 and 334 in
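The record walk described in the preceding paragraphs — read the encoded word, then alternately read a header and a word-dependent data field until the last-field indicator is set — can be sketched as follows. The byte-aligned layout, one-byte header (bit 0 = last-field flag, bit 1 = type), and null separators are simplifying assumptions; the actual records are Huffman bit streams.

```python
def read_record(buf, offset):
    """Walk one record: encoded word, then (header, field) pairs until last flag."""
    end = buf.index(b"\0", offset)           # encoded word up to its separator
    word = buf[offset:end].decode("utf-8")
    pos = end + 1
    fields = []
    while True:
        header = buf[pos]                    # 1-byte header: bit0 = last, bit1 = type
        is_last = bool(header & 1)
        dtype = "pronunciation" if header & 2 else "part-of-speech"
        pos += 1
        fend = buf.index(b"\0", pos)         # variable-length field up to separator
        fields.append((dtype, buf[pos:fend].decode("utf-8")))
        pos = fend + 1
        if is_last:                          # last-field indicator ends the record
            break
    return word, fields
```

The writing side mirrors this walk, which is what keeps the record format extensible: new fields are appended with their own headers, and only the final header carries the last-field flag.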
The following is but one illustrative API for accessing information in the lexicon. The API illustrates possible part-of-speech categories, lexicon types, word types, structures and interfaces for getting, adding, and removing pronunciations. Of course, any desired interface can be used.
//--- ISpLexicon ------------------------------------
typedef enum SPPARTOFSPEECH
{
    //--- SAPI5 public POS category values (bits 28-31)
    SPPS_NotOverriden   = -1,
    SPPS_Unknown        = 0,        // Probably from user lexicon
    SPPS_UnknownXMLTag  = 0x1000,   // Used when private tags are passed to engine
    SPPS_Noun           = 0x2000,
    SPPS_Verb           = 0x3000,
    SPPS_Modifier       = 0x4000,
    SPPS_Function       = 0x5000,
    SPPS_Interjection   = 0x6000,
} SPPARTOFSPEECH;

typedef enum SPLEXICONTYPE
{
    eLEXTYPE_USER        = (1L << 0),
    eLEXTYPE_APP         = (1L << 1),
    eLEXTYPE_RESERVED1   = (1L << 2),
    eLEXTYPE_RESERVED2   = (1L << 3),
    eLEXTYPE_RESERVED3   = (1L << 4),
    eLEXTYPE_RESERVED4   = (1L << 5),
    eLEXTYPE_RESERVED5   = (1L << 6),
    eLEXTYPE_RESERVED6   = (1L << 7),
    eLEXTYPE_RESERVED7   = (1L << 8),
    eLEXTYPE_RESERVED8   = (1L << 9),
    eLEXTYPE_RESERVED9   = (1L << 10),
    eLEXTYPE_RESERVED10  = (1L << 11),
    eLEXTYPE_PRIVATE1    = (1L << 12),
    eLEXTYPE_PRIVATE2    = (1L << 13),
    eLEXTYPE_PRIVATE3    = (1L << 14),
    eLEXTYPE_PRIVATE4    = (1L << 15),
    eLEXTYPE_PRIVATE5    = (1L << 16),
    eLEXTYPE_PRIVATE6    = (1L << 17),
    eLEXTYPE_PRIVATE7    = (1L << 18),
    eLEXTYPE_PRIVATE8    = (1L << 19),
    eLEXTYPE_PRIVATE9    = (1L << 20),
    eLEXTYPE_PRIVATE10   = (1L << 21),
    eLEXTYPE_PRIVATE11   = (1L << 22),
    eLEXTYPE_PRIVATE12   = (1L << 23),
    eLEXTYPE_PRIVATE13   = (1L << 24),
    eLEXTYPE_PRIVATE14   = (1L << 25),
    eLEXTYPE_PRIVATE15   = (1L << 26),
    eLEXTYPE_PRIVATE16   = (1L << 27),
    eLEXTYPE_PRIVATE17   = (1L << 28),
    eLEXTYPE_PRIVATE18   = (1L << 29),
    eLEXTYPE_PRIVATE19   = (1L << 30),
    eLEXTYPE_PRIVATE20   = (1L << 31),
} SPLEXICONTYPE;

typedef enum SPWORDTYPE
{
    eWORDTYPE_ADDED    = (1L << 0),
    eWORDTYPE_DELETED  = (1L << 1)
} SPWORDTYPE;

typedef [restricted] struct SPWORDPRONUNCIATION
{
    struct SPWORDPRONUNCIATION  *pNextWordPronunciation;
    SPLEXICONTYPE                eLexiconType;
    LANGID                       LangID;
    WORD                         wReserved;
    SPPARTOFSPEECH               ePartOfSpeech;
    WCHAR                        szPronunciation[1];
} SPWORDPRONUNCIATION;

typedef [restricted] struct SPWORDPRONUNCIATIONLIST
{
    ULONG                  ulSize;
    BYTE                  *pvBuffer;
    SPWORDPRONUNCIATION   *pFirstWordPronunciation;
} SPWORDPRONUNCIATIONLIST;

typedef [restricted] struct SPWORD
{
    struct SPWORD         *pNextWord;
    LANGID                 LangID;
    WORD                   wReserved;
    SPWORDTYPE             eWordType;
    WCHAR                 *pszWord;
    SPWORDPRONUNCIATION   *pFirstWordPronunciation;
} SPWORD;

typedef [restricted] struct SPWORDLIST
{
    ULONG    ulSize;
    BYTE    *pvBuffer;
    SPWORD  *pFirstWord;
} SPWORDLIST;

[
    object,
    uuid(DA41A7C2-5383-4db2-916B-6C1719E3DB58),
    helpstring("ISpLexicon Interface"),
    pointer_default(unique),
    restricted
]
interface ISpLexicon : IUnknown
{
    HRESULT GetPronunciations(
        [in] const WCHAR *pszWord,
        [in] LANGID LangID,
        [in] DWORD dwFlags,
        [in, out] SPWORDPRONUNCIATIONLIST *pWordPronunciationList
        );
    HRESULT AddPronunciation(
        [in] const WCHAR *pszWord,
        [in] LANGID LangID,
        [in] SPPARTOFSPEECH ePartOfSpeech,
        [in] const WCHAR *pszPronunciation
        );
    HRESULT RemovePronunciation(
        [in] const WCHAR *pszWord,
        [in] LANGID LangID,
        [in] SPPARTOFSPEECH ePartOfSpeech,
        [in] const WCHAR *pszPronunciation
        );
};
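Because the SPLEXICONTYPE values in the listing are single-bit flags, a caller can search several lexicons in one GetPronunciations call by OR-ing the flags into dwFlags. The sketch below reproduces only the two non-reserved flag values from the listing to show how such a mask is built; it runs outside COM and is purely illustrative.

```c
#include <assert.h>

/* Flag values copied from the SPLEXICONTYPE listing above; only the two
 * non-reserved lexicon types are reproduced here. */
enum {
    eLEXTYPE_USER = (1L << 0),   /* the user's personal lexicon      */
    eLEXTYPE_APP  = (1L << 1)    /* the application-supplied lexicon */
};

/* Build the dwFlags mask that would ask GetPronunciations to search
 * both the user and application lexicons in a single call. */
static long lexicon_search_flags(void)
{
    return eLEXTYPE_USER | eLEXTYPE_APP;
}
```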
It can thus be seen that, by decoding the domains separately and using hash table 232, an enormous amount of information can be compressed yet accessed very quickly. Similarly, the simple header scheme in fields 292 and 294, together with appropriate separators, makes the lexicon compressed in accordance with the present invention easily extensible: additional words can be added, and additional word-dependent data can be added to individual entries. This also allows the portions containing the compressed words and word-dependent data to be of variable length. Further, the present invention is designed to be language-independent: because the lexicon is extensible, and because the individual encoded domains of the lexicon can be of variable length, the fundamental operation of the present system remains unchanged regardless of the particular language being compressed. Therefore, the present invention provides significant advantages over prior art systems.
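The hash-table access path summarized above (hash the word, use the hash value as an address into the table, and read back the word's location in compressed lexicon memory) can be sketched as follows. The hash function, table size, and linear-probing collision strategy are assumptions for illustration; the patent does not mandate any particular choices, and a real implementation would also verify the stored word itself rather than matching on the hash value alone.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 64  /* toy capacity; real lexicon tables are far larger */

/* One hash-table slot: the word's hash key and its offset into compressed
 * lexicon memory. An offset of 0 marks an empty slot in this sketch. */
static unsigned long slot_key[TABLE_SIZE];
static long          slot_offset[TABLE_SIZE];

/* Simple illustrative string hash (djb2). */
static unsigned long hash_word(const char *w)
{
    unsigned long h = 5381;
    while (*w)
        h = h * 33 + (unsigned char)*w++;
    return h;
}

/* Record that `word` is stored at `offset` in lexicon memory, resolving
 * slot collisions by linear probing. */
static void index_word(const char *word, long offset)
{
    unsigned long h = hash_word(word);
    size_t i = h % TABLE_SIZE;
    while (slot_offset[i] != 0)          /* probe past occupied slots */
        i = (i + 1) % TABLE_SIZE;
    slot_key[i] = h;
    slot_offset[i] = offset;
}

/* Return the lexicon-memory offset for `word`, or -1 if it is absent.
 * (Matching on the hash alone is a simplification of this sketch.) */
static long find_word(const char *word)
{
    unsigned long h = hash_word(word);
    size_t i = h % TABLE_SIZE;
    while (slot_offset[i] != 0) {
        if (slot_key[i] == h)
            return slot_offset[i];
        i = (i + 1) % TABLE_SIZE;
    }
    return -1;                           /* hit an empty slot: not indexed */
}
```

This mirrors the build/access split in the claims: index_word corresponds to generating an index entry while building the lexicon, and find_word to accessing the index to obtain the word location before reading and decoding the record stored there.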
It should also be noted that any number of the components in
Although the present invention has been described with reference to particular embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.
Executed on | Assignor | Assignee | Conveyance | Reel/Frame
Dec 22, 2000 | Mohammed, Yunus | Microsoft Corporation | Assignment of assignors interest (see document for details) | 011429/0162
Dec 29, 2000 | Microsoft Corporation | (assignment on the face of the patent) | |
Oct 14, 2014 | Microsoft Corporation | Microsoft Technology Licensing, LLC | Assignment of assignors interest (see document for details) | 034541/0001