Task parallelization in a text-to-text system

US7389222

Parallelization of word alignment for a text-to-text operation. The training data is divided into multiple groups, and training is carried out on each group on separate processors. Different techniques can be carried out to increase the speed of the processing. The hookups can be done only once for all of multiple different iterations. Moreover, parallel operations can apply only to the counts, since this may be the most time-consuming part.

Patent 7389222
Priority Aug 02 2005
Filed Apr 26 2006
Issued Jun 17 2008
Expiry Sep 13 2025 Extension 42 days
Inventors Yamada, Ke…
Assg.orig LANGUAGE W…
Assg.curr SDL INC
Entity Large
Referenced by 62
References 64
Maint.: all paid

1. A method comprising:

dividing a corpus of information among multiple work units and carrying out a text-to text operation in each of said work units; and

maintaining a single parameter table for all the work carried out in all the work units, wherein said parameter table is a probability table with probabilities of word to word translation.

18. A method, comprising:

dividing a training corpus into at least a plurality of groups;

carrying out a training operation for a text to text application substantially simultaneously on each of said plurality of groups, using separate processors for each of said groups and using a single table of information indicative of word probabilities, for each of said groups, and using said training operation to update said single probability table based on training information obtained from each of said groups, wherein said single probability table comprises probabilities of word to word translations.

13. A computer system, comprising:

a master computer, connected to a corpus of training information about text-to-text operations, having a plurality of work unit computers, having separate processors from said master computer, and said master computer running a routine that maintains a table of information related to training based on said corpus, a routine that provides separated portions of said corpus and said work unit computers, and accumulates information indicative of training each of said work unit computers and maintains said table of information, wherein said table of information includes a probability of word to word translation.

2. A method as in claim 1, wherein said text-to-text determination operation is a word alignment.

3. A method as in claim 1, wherein said text to text operation that is carried out in each of said work units forms a table of counts based on probabilities of hookups for word to word pairing.

4. A method as in claim 1, wherein said text to text operation is carried out in multiple computing iterations.

5. A method as in claim 4, wherein at least one subsequent iteration uses a parameter table from a previous iteration.

6. A method as in claim 4, wherein said multiple computing iterations include a first iteration which computes word-to-word hookup information, and a subsequent iteration which uses said hookup information from the first iteration.

7. A method as in claim 1, wherein said text-to-text operation uses a model 1 algorithm.

8. A method as in claim 1, wherein said dividing comprises a random division of information.

9. A method as in claim 1, wherein said dividing comprises sorting information, and selecting units of information based on said sorting.

10. A method as in claim 1, further comprising monitoring said each of said work units to detect work units that are requiring longer calculation times than other work units.

11. A method as in claim 1, wherein said carrying out a determination comprises doing an initialization, and subsequently doing multiple iterations beyond said initialization.

12. A method as in claim 11, further comprising carrying out alignment after said iterations.

14. A computer system as in claim 13, wherein said master computer also processes at least one of said separated portions of said corpus.

15. A computer system as in claim 13, wherein said training in said working unit computers comprises a word alignment operation.

16. A computer system as in claim 13, wherein said training in said work unit computers comprises multiple computing iterations based on the same data.

17. A computer system as in claim 13, further comprising using a parameter table from a previous iteration in a subsequent iteration.

19. A method as in claim 18, wherein said training operation is a word alignment.

20. A method as in claim 18, wherein said training operation is a computation of counts.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 11/196,785, filed Aug. 2, 2005 now abandoned, and entitled “Task Parallelization in a Text-to-Text System,” which is herein incorporated by reference.

BACKGROUND

Text to text applications include machine translation and other machine intelligence systems such as speech recognition and automated summarization. These systems often rely on training that is carried out based on information from specified databases known as corpora.

A training corpus may include many millions of words. It is not uncommon for the training to take weeks. There is often a tradeoff between the speed of the processing and the accuracy of the obtained information.

It is desirable to speed up the training of such a system.

SUMMARY

The present application describes parallelization of certain aspects of training. Specifically, an embodiment describes how to parallelize a training task which requires knowledge about previous training portions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of a machine text training system; and

FIG. 2 illustrates a flowchart of operation.

DETAILED DESCRIPTION

The general structure and techniques, and more specific embodiments which can be used to effect different ways of carrying out the more general goals, are described herein.

A current training system may require as long as two weeks to train on 100 million words. Of course, faster processors may reduce that time. Parallelization of these operations by partitioning the input corpus is not straightforward, however, since certain operations may require accumulation of results from other operations. Multiple operations that are running on multiple processors would not have access to the results of the other processors.

In evaluating the entire training pipeline for machine translation, it was noticed that word alignment takes by far the most time of the entire process. For example, word alignment may take an order of magnitude longer than any of the other 11 processes that are used during training. Parallelization of word alignment can hence speed up training.

The embodiment shown in FIG. 1 uses multiple different known techniques to determine word alignment. For example, FIG. 1 shows the use of Model 1 and an HMM model to determine word alignment. The overall algorithm is the well-known expectation maximization algorithm, which determines the most likely hookups between the words in parallel corpora of data.

In operation, the expectation maximization algorithm collects counts which are formed from arbitrary choices of probabilities between words in the full corpus. The words in the corpus are analyzed, to find all word to word pairings. A determination of probabilities of hookups is used to form a table of counts. That table of counts is then used, along with the corpus, to determine further probabilities. This process is then iterated.
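As an illustration of the count-and-normalize loop just described, the following is a minimal single-processor sketch of Model 1 style expectation maximization on a toy corpus. The function names `collect_counts` and `normalize`, the toy sentence pairs, and the uniform starting probabilities are assumptions made for the example and do not come from the specification; the NULL word used by some Model 1 formulations is omitted.

```python
from collections import defaultdict

def collect_counts(corpus, t):
    """Accumulate fractional counts for each (source, target) hookup."""
    counts = defaultdict(float)
    for src, tgt in corpus:
        for f in tgt:
            # Share one unit of count for f among the source words,
            # weighted by the current probabilities.
            total = sum(t[(e, f)] for e in src)
            for e in src:
                counts[(e, f)] += t[(e, f)] / total
    return counts

def normalize(counts):
    """Turn counts into probabilities that sum to 1 for each source word."""
    totals = defaultdict(float)
    for (e, _f), c in counts.items():
        totals[e] += c
    return {(e, f): c / totals[e] for (e, f), c in counts.items()}

corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "book"], ["le", "livre"]),
    (["a", "house"], ["une", "maison"]),
]

# Arbitrary starting point: every co-occurring pair is equally likely.
t = normalize({(e, f): 1.0 for src, tgt in corpus for e in src for f in tgt})

# Each pass uses the table from the previous pass, as in the text.
for _ in range(10):
    t = normalize(collect_counts(corpus, t))
```

After a few passes the table concentrates probability on pairings that co-occur consistently, such as "house" with "maison" in this toy data.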

The task of determining the word alignments requires analysis of both the table of probabilities from the final iteration of the expectation maximization algorithm, as well as the corpus information.

Since the accumulation and normalization of count information is necessary, dividing this task into multiple processors is not a straightforward issue of simply dividing the work among processors and performing multiple isolated iterations of expectation maximization.

FIG. 1 illustrates an embodiment of the word alignment operations that can be carried out in multiple parallelized processors. The corpus 100 includes the training material, which includes, for example, parallel information in two different languages. This training material is used to create the final probability tables and alignment. A master computer 99 executes the operations flowcharted according to FIG. 2. The master computer maintains a “T table”, which is a table of probabilities of word to word translation and other model parameters.

The master computer 99 runs a T table manager 105 which updates the interim T table and other model parameters 110 with counts and probabilities. The T table manager accumulates all of the data from all of the different evaluation passes through the corpus. These evaluations may create parameters and information other than the T table. The embodiment emphasizes the T table, because it is usually very large and hence its manipulation and storage require significant resources, such as computer RAM. Many, if not all, word alignment models also share this set of parameters. The embodiment contemplates operation with other models such as HMM, model 2 and others. These models may use additional parameters, which may not be specifically discussed herein.

At 200, the master determines pieces of the corpus, shown as 120. Each of those pieces forms a sub corpus 121, 122, 123. These form one component of a “work unit”. The master also creates sub T tables at 210 that include only the word-to-word hookups that occur in the corresponding sub-corpus, shown as 125, 126, 127. The smaller tables minimize the memory requirements of the work unit.
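The restriction of the T table to the hookups of one sub-corpus can be sketched as follows. This is only an illustration: the dictionary representation of the table and the names `hookups` and `make_sub_table` are assumptions, and the probabilities are invented for the example.

```python
def hookups(sub_corpus):
    """Every (source word, target word) pair that co-occurs in a sentence pair."""
    return {(e, f) for src, tgt in sub_corpus for e in src for f in tgt}

def make_sub_table(t_big, sub_corpus):
    """Keep only the entries of the full table that this sub-corpus can use."""
    return {pair: t_big[pair] for pair in hookups(sub_corpus) if pair in t_big}

# Hypothetical full table held by the master.
t_big = {
    ("the", "la"): 0.5, ("the", "le"): 0.5,
    ("house", "maison"): 1.0, ("book", "livre"): 1.0,
}
sub_corpus = [(["the", "house"], ["la", "maison"])]
sub_t = make_sub_table(t_big, sub_corpus)
```

Here `sub_t` holds two entries instead of four, which is the memory saving the smaller tables are meant to provide.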

If the model has additional parameters, these are included in the work unit as well.

Computing which word-to-word hookups appear in a given sub-corpus is expensive in terms of computer resources. The system used herein uses multiple computing iterations. One aspect enables reusing the returned sub-T-table output from previous iterations, rather than recomputing those hookups for each iteration.

The first iteration must build the sub-T-tables from scratch. However, rather than creating all of those sub-T-tables on the master machine, the first iteration is made “special”. In the first iteration, only the sub-corpus is sent as a work unit. Each worker computes the hookups and creates its sub-T-table. Each worker machine then uses the sub-T-table and sub-corpus to compute parameter counts as per the normal expectation maximization operation. When all desired iterations are complete, the worker machines compute the final alignment of the sub-corpus, using the same sub-T-table and other parameters of the model.

These counts, in the form of sub T tables 131, 132, 133, and possibly other parameter tables shown generically as 136, are then returned to the T table manager 105 at 215. The T table manager 105 collects the count information, and normalizes using the new information, to form new probabilities at 220. The T table manager sends the new probabilities back to the work units for their use in evaluating their next units of work. After all iterations are complete, the work units return a final alignment of the sub-corpora. This allows the master machine to simply concatenate these in the proper order, completing the full word alignment process.
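A minimal sketch of the collect-and-normalize step at 220 follows, assuming each work unit returns its counts as a dictionary keyed by (source word, target word). The name `merge_and_normalize` and the sample counts are hypothetical. The point of the sketch is that normalization happens only after the counts from all units have been summed, since the same source word may appear in several sub-corpora.

```python
from collections import defaultdict

def merge_and_normalize(count_tables):
    """Sum the count tables from all work units, then normalize per source word."""
    merged = defaultdict(float)
    for counts in count_tables:
        for pair, c in counts.items():
            merged[pair] += c
    totals = defaultdict(float)
    for (e, _f), c in merged.items():
        totals[e] += c
    return {(e, f): c / totals[e] for (e, f), c in merged.items()}

# Hypothetical counts returned by two work units; "the" occurs in both.
counts_1 = {("the", "la"): 0.5, ("house", "maison"): 1.0}
counts_2 = {("the", "la"): 0.5, ("the", "le"): 1.0}
t_new = merge_and_normalize([counts_1, counts_2])
```

Normalizing each unit's counts separately and then combining the results would generally give different probabilities, which is why a single table is maintained on the master.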

The probabilities include word to word translation parameters and other model parameters. In operation, for example, the corpus may be passed through both the model 1 algorithm and the HMM algorithm five times. Each pass through the algorithm updates the probabilities in the T table and other tables. The tables are then used for further iterations and eventually alignment.

The T table manager is shown in FIG. 1 and in 200 as breaking the corpus into the sub corpora 121, 122, 123. Of course, this can be done by a separate process running within the master computer 99. The corpora can be broken up in a number of different ways.

The work units should each obtain roughly similar amounts of work. The amount of work to be done by a work unit may be proportional to the sentence lengths. Accordingly, it is desirable for the sub-corpora to contain roughly similar total sentence lengths.

A first way of breaking up the data relies on the corpora being probabilistically similar. Probabilistically, lengths of random sentences within the corpora should be approximately average. Therefore, a first way of effecting 200 in FIG. 2 is via a round robin between sentences. Each machine is assigned a different randomly selected sentence. The effectively random selection of the sentence is likely to produce sentences with roughly equal word lengths in each subunit.
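The round robin division might be sketched as below. The function name `round_robin_split` and the optional seeded shuffle are assumptions added for illustration; the shuffle is one way to make the selection effectively random when the corpus itself is ordered.

```python
import random

def round_robin_split(corpus, n, seed=None):
    """Deal sentence pairs to n pieces in turn; optionally shuffle first."""
    pairs = list(corpus)
    if seed is not None:
        # Shuffling guards against a corpus that is ordered in some way
        # that correlates with sentence length (for example, by document).
        random.Random(seed).shuffle(pairs)
    return [pairs[k::n] for k in range(n)]

# Toy corpus: sentence pairs of varying length (the words are placeholders).
corpus = [(["w"] * k, ["v"] * k) for k in (3, 7, 4, 6, 5, 5, 2, 8)]
pieces = round_robin_split(corpus, 3)
```

Every sentence pair lands in exactly one piece, and piece sizes differ by at most one sentence pair. How closely the word totals match depends on how random the sentence order is.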

Another embodiment of 200 sorts the corpus by sentence lengths, and assigns sentences in order from the sentence length sorted corpus. In this way, all work units receive roughly similar length sentences.
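One possible reading of the length-sorted division is sketched below: sentence pairs are sorted by length and then dealt to the pieces in that order. This interpretation, the name `length_sorted_split`, and the choice to return sentence indices rather than sentences are assumptions. Returning indices lets the master put the per-piece alignments back into the original corpus order later.

```python
def length_sorted_split(corpus, n):
    """Sort sentence indices by length, then deal them to n pieces in that order."""
    order = sorted(
        range(len(corpus)),
        key=lambda i: len(corpus[i][0]) + len(corpus[i][1]),
    )
    return [order[k::n] for k in range(n)]

# Toy corpus: sentence pairs of varying length (the words are placeholders).
corpus = [(["w"] * k, ["v"] * k) for k in (3, 7, 4, 6, 5, 5)]
pieces = length_sorted_split(corpus, 2)
total_words = [
    sum(len(corpus[i][0]) + len(corpus[i][1]) for i in piece) for piece in pieces
]
```

With this toy data the two pieces receive 28 and 32 words, so neither unit is loaded with all of the long sentences.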

The T table manager 105 normalizes between each iteration to produce new T table information from the sub T tables.

According to another embodiment, the T table manager may divide the information in N units, where N is different than the number of machines doing the actual computations. The units are queued up in the T table manager, and are used by the machines during their operation. A work unit queuing system, such as “Condor”, may be used to allocate and provide work to the different units, as each machine becomes available.
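The idea of queuing more units than there are machines can be illustrated with Python's standard `concurrent.futures` thread pool. The pool is only a local stand-in for a cluster scheduler such as Condor, and `process_unit` is a hypothetical placeholder for the real per-unit computation.

```python
from concurrent.futures import ThreadPoolExecutor

def process_unit(unit):
    """Stand-in for one work unit: here it only counts the words it was given."""
    return sum(len(sentence) for sentence in unit)

# N = 5 queued units, each a list of sentences.
units = [[["a", "b"]], [["c"]], [["d", "e", "f"]], [["g"]], [["h", "i"]]]
machines = 2  # fewer machines than units

with ThreadPoolExecutor(max_workers=machines) as pool:
    # map() queues every unit; each worker takes the next unit when it
    # becomes free, and results come back in submission order.
    results = list(pool.map(process_unit, units))
```

Using more units than machines means a machine that finishes early simply picks up another unit, which smooths out uneven unit sizes.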

The master processor may also carry out other operations in between accumulating the T table results. For example, the master processor may allocate the work units, and may itself act as a work unit, processing either a complete unit or some unit smaller than the usual work unit.

The calculations by the work units may also be time-monitored by either the master processor or some other processor. Some units may become stragglers, either because they are processing a particularly difficult work unit, or because the computer itself has a hardware or software error. According to another aspect, the work allocation unit maintains a time out unit shown as 225. If the processing time exceeds a specified time, then the unit may be allocated to another work machine. The first machine to return a result is accepted.
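The time out behavior can be sketched with `concurrent.futures.wait`. Everything here is illustrative: `run_with_backup` and `flaky_task` are hypothetical names, threads stand in for separate machines, and the delays are chosen only to trigger the reallocation in a small example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_with_backup(pool, task, unit, timeout):
    """Run task on unit; after timeout, start a second copy and take the first result."""
    attempts = {pool.submit(task, unit, 0)}
    done, _ = wait(attempts, timeout=timeout)
    if not done:
        # The first machine is a straggler: give the same unit to another one.
        attempts.add(pool.submit(task, unit, 1))
        done, _ = wait(attempts, return_when=FIRST_COMPLETED)
    return next(iter(done)).result()

def flaky_task(unit, attempt):
    """Hypothetical worker: the first attempt stalls, the second is fast."""
    if attempt == 0:
        time.sleep(0.5)
    return (sum(unit), attempt)

with ThreadPoolExecutor(max_workers=2) as pool:
    result = run_with_backup(pool, flaky_task, [1, 2, 3], timeout=0.05)
```

Because both attempts compute the same counts from the same unit, it does not matter which one finishes first; the later result can simply be discarded.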

The pseudocode for the word alignment follows:

INPUT: CORPUS C; OUTPUT: ALIGNMENT ABIG

- 1. INPUT: NUMBERIZED CORPUS C
- 2. INIT LARGE T TABLE TBIG AND OTHER MODEL PARAMETERS PBIG USING C (ZERO PROBABILITIES)
- 3. DIVIDE CORPUS INTO N PIECES {CI}, I=1, ..., N
  - a. C → {CI}
- 4. DO N WORK UNITS OF INITIALIZATION (CREATE SMALL T TABLES AND ASSIGN UNIFORM COUNTS)
  - a. CI → OI (COUNTS), I=1, ..., N
- 5. ADD ALL COUNTS AND NORMALIZE, AND WRITE NEW SUB T TABLES
  - a. TBIG, {OI} → TBIG, {TI}
- 6. DO N WORK UNITS OF ONE ITERATION OF A MODEL
  - a. CI, TI → OI (COUNTS), I=1, ..., N
- 7. REPEAT STEPS 5 AND 6 FOR EACH MODEL 1 ITERATION, THEN EACH HMM ITERATION, ETC. UNTIL ALL ITERATIONS ARE COMPLETE. END AFTER FINAL RUN OF STEP 5.
- 8. DO N WORK UNITS OF ALIGNMENT USING THE LAST-TRAINED MODEL
  - a. CI, TI → AI (ALIGNMENTS), I=1, ..., N
- 9. SIMPLY CONCATENATE THE ALIGNMENTS TO OBTAIN AN ALIGNMENT OF THE FULL CORPUS.
  - a. {AI} → ABIG
- 10. RETURN ABIG AS THE RESULT.

It may be useful to return some of the intermediate parameter tables themselves as well, which is commonly done in machine translation, for example.
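The numbered steps of the pseudocode can be traced in a compact Python sketch. It is a simplified illustration under several assumptions: only a Model 1 style update is used (no HMM iterations and no NULL word), the corpus holds word strings rather than numberized tokens, the list comprehensions over `pieces` stand in for work units that would run on separate processors, and all function names are hypothetical.

```python
from collections import defaultdict

def init_unit(piece):
    """Step 4: find the hookups in this piece and give each a uniform count."""
    return {(e, f): 1.0 for src, tgt in piece for e in src for f in tgt}

def count_unit(piece, t_i):
    """Step 6: one expectation pass over a piece using its sub T table."""
    counts = defaultdict(float)
    for src, tgt in piece:
        for f in tgt:
            total = sum(t_i[(e, f)] for e in src)
            for e in src:
                counts[(e, f)] += t_i[(e, f)] / total
    return counts

def combine(count_tables):
    """Step 5: add all counts, normalize, and write one sub T table per piece."""
    merged, totals = defaultdict(float), defaultdict(float)
    for counts in count_tables:
        for (e, f), c in counts.items():
            merged[(e, f)] += c
            totals[e] += c
    t_big = {(e, f): c / totals[e] for (e, f), c in merged.items()}
    # Each returned count table already lists exactly the hookups of its piece,
    # so the hookups never have to be recomputed.
    subs = [{pair: t_big[pair] for pair in counts} for counts in count_tables]
    return t_big, subs

def align_unit(piece, t_i):
    """Step 8: link each target word to its most probable source position."""
    return [
        [max(range(len(src)), key=lambda i: t_i[(src[i], f)]) for f in tgt]
        for src, tgt in piece
    ]

def word_align(corpus, n, iterations):
    size = len(corpus)
    # Step 3: contiguous pieces, so that step 9 can concatenate in order.
    pieces = [corpus[k * size // n:(k + 1) * size // n] for k in range(n)]
    counts = [init_unit(p) for p in pieces]                        # step 4
    t_big, subs = combine(counts)                                  # step 5
    for _ in range(iterations):                                    # step 7
        counts = [count_unit(p, t) for p, t in zip(pieces, subs)]  # step 6
        t_big, subs = combine(counts)                              # step 5
    aligned = [align_unit(p, t) for p, t in zip(pieces, subs)]     # step 8
    return [a for part in aligned for a in part]                   # step 9

corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "book"], ["le", "livre"]),
    (["a", "house"], ["une", "maison"]),
    (["a", "book"], ["un", "livre"]),
]
alignment = word_align(corpus, n=2, iterations=5)
```

In this sketch each alignment is a list giving, for every target word, the index of the source word it is linked to. The sub T tables are rebuilt from the keys of the returned count tables, mirroring the reuse of hookups across iterations.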

To summarize the above pseudocode, the operations of the computer are as follows: first, the corpus is split into pieces to form small T tables with uniform probabilities, as an initialization. The counts are added and normalized over multiple iterations of different models. After those iterations, alignment is carried out using the most-recently trained model, and the alignments are concatenated to obtain an alignment of the full corpus.

Although only a few embodiments have been disclosed in detail above, other embodiments are possible and are intended to be encompassed within this specification. The specification describes specific examples to accomplish a more general goal that may be accomplished in other ways. This disclosure is intended to be exemplary, and the claims are intended to cover any modification or alternative which might be predictable to a person having ordinary skill in the art. For example, while the above describes parallelizing a word alignment, it should be understood that any machine based text application that requires accumulation of probabilities can be parallelized in this way. While the above has described the work being broken up in a specified way, it should be understood that the work can be broken up in different ways. For example, the T-table manager can receive data other than counts and/or probabilities from the sub units and may compute information from raw data obtained from the T-table manager.

Also, only those claims which use the words “means for” are intended to be interpreted under 35 USC 112, sixth paragraph. Moreover, no limitations from the specification are intended to be read into any claims, unless those limitations are expressly included in the claims.

INVENTORS:

Yamada, Kenji, Knight, Kevin, Marcu, Daniel, Langmead, Greg

ASSIGNMENT RECORDS:

Executed on	Assignor	Assignee	Conveyance	Reel	Frame	Doc
Oct 11 2005	MARCU, DANIEL	LANGUAGE WEAVER, INC	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	017830	0075	pdf
Oct 12 2005	LANGMEAD, GREG	LANGUAGE WEAVER, INC	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	017830	0075	pdf
Oct 13 2005	KNIGHT, KEVIN	LANGUAGE WEAVER, INC	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	017830	0075	pdf
Oct 17 2005	YAMADA, KENJI	LANGUAGE WEAVER, INC	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	017830	0075	pdf
Apr 26 2006		Language Weaver, Inc.	(assignment on the face of the patent)
Dec 31 2015	LANGUAGE WEAVER, INC	SDL INC	MERGER SEE DOCUMENT FOR DETAILS	037745	0391	pdf

MAINTENANCE FEES AND DATES:

Date	Maintenance Fee Events
Nov 09 2011	M1551: Payment of Maintenance Fee, 4th Year, Large Entity.
Nov 15 2011	STOL: Pat Hldr no Longer Claims Small Ent Stat
Dec 10 2015	M1552: Payment of Maintenance Fee, 8th Year, Large Entity.
Nov 14 2019	M1553: Payment of Maintenance Fee, 12th Year, Large Entity.

Date	Maintenance Schedule
Jun 17 2011	4 years fee payment window open
Dec 17 2011	6 months grace period start (w surcharge)
Jun 17 2012	patent expiry (for year 4)
Jun 17 2014	2 years to revive unintentionally abandoned end. (for year 4)
Jun 17 2015	8 years fee payment window open
Dec 17 2015	6 months grace period start (w surcharge)
Jun 17 2016	patent expiry (for year 8)
Jun 17 2018	2 years to revive unintentionally abandoned end. (for year 8)
Jun 17 2019	12 years fee payment window open
Dec 17 2019	6 months grace period start (w surcharge)
Jun 17 2020	patent expiry (for year 12)
Jun 17 2022	2 years to revive unintentionally abandoned end. (for year 12)