Method and apparatus for correcting speech recognition result, device and computer-readable storage medium

US10380996

A method and apparatus for correcting a speech recognition result, a device and a computer-readable storage medium are provided. The method includes performing speech recognition on acquired speech data to obtain initial text information; and recognizing and correcting the initial text information by a neural machine translation (NMT) model to obtain a final text recognition result.


Patent 10380996
Priority Jun 28 2017
Filed Mar 28 2018
Issued Aug 13 2019
Expiry Mar 28 2038
Inventors Huang, Jun
Assg.orig Baidu Onli…
Assg.curr Baidu Onli…
Entity Large
Referenced by 0
References 12
Maint.: currently ok


5. An apparatus for correcting a speech recognition result, comprising:

one or more processors;

a memory;

one or more software modules stored in the memory and executed by the one or more processors, and comprising:

a speech recognizing module configured to perform speech recognition on acquired speech data to obtain initial text information; and

a text correcting module configured to recognize and correct the initial text information by a neural machine translation (NMT) model to obtain a final text recognition result;

wherein the text correcting module comprises:

a word segmenting unit configured to segment a text contained in the initial text information to obtain at least one word; and

a text correcting unit configured to encode the at least one word into a dense vector by an encoder in the NMT model and decode the dense vector by a decoder in the NMT model so as to obtain the final text recognition result;

wherein the text correcting unit is configured to convert the at least one word into a source hidden vector by the encoder in the NMT model; input the source hidden vector into the decoder in the NMT model and output a target hidden vector by the decoder in the NMT model; determine an attention mechanism hidden vector based on the target hidden vector and the source hidden vector; and obtain the final text recognition result based on the attention mechanism hidden vector.
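The module structure recited in this claim can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the class names, the `fake_asr` recognizer, and the `fake_nmt` lookup table that stands in for a trained NMT encoder and decoder. The claim does not specify any of these. Whitespace splitting is likewise only an assumed stand-in for a real word segmenter.

```python
from typing import Callable, List


class SpeechRecognizingModule:
    """Wraps any recognizer that maps speech data to initial text."""

    def __init__(self, recognize: Callable[[bytes], str]):
        self._recognize = recognize

    def run(self, speech_data: bytes) -> str:
        return self._recognize(speech_data)


class WordSegmentingUnit:
    """Splits the initial text into at least one word."""

    def segment(self, text: str) -> List[str]:
        # Assumption: whitespace splitting; unspaced text would need a real segmenter.
        return text.split()


class TextCorrectingUnit:
    """Delegates to an NMT-style function mapping source words to target words."""

    def __init__(self, translate: Callable[[List[str]], List[str]]):
        self._translate = translate

    def correct(self, words: List[str]) -> str:
        return " ".join(self._translate(words))


class TextCorrectingModule:
    """Combines the word segmenting unit and the text correcting unit."""

    def __init__(self, segmenter: WordSegmentingUnit, corrector: TextCorrectingUnit):
        self._segmenter = segmenter
        self._corrector = corrector

    def run(self, initial_text: str) -> str:
        return self._corrector.correct(self._segmenter.segment(initial_text))


def fake_asr(speech_data: bytes) -> str:
    # Hypothetical recognizer output containing a homophone error.
    return "i want to by a ticket"


def fake_nmt(words: List[str]) -> List[str]:
    # Hypothetical stand-in for the encoder and decoder: a fixed lookup table.
    fixes = {"by": "buy"}
    return [fixes.get(w, w) for w in words]


recognizer = SpeechRecognizingModule(fake_asr)
corrector = TextCorrectingModule(WordSegmentingUnit(), TextCorrectingUnit(fake_nmt))

initial_text = recognizer.run(b"\x00\x01")
final_text = corrector.run(initial_text)
print(final_text)
```

Passing the recognizer and the translation function in as callables keeps the two modules independent, so a real speech recognizer or a trained NMT model could be substituted without changing the wiring.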

1. A method for correcting a speech recognition result, comprising:

performing speech recognition on acquired speech data to obtain initial text information; and

recognizing and correcting the initial text information by a neural machine translation (NMT) model to obtain a final text recognition result;

wherein recognizing and correcting the initial text information by the neural machine translation (NMT) model to obtain the final text recognition result comprises:

segmenting a text contained in the initial text information to obtain at least one word; and

encoding the at least one word into a dense vector by an encoder in the NMT model and decoding the dense vector by a decoder in the NMT model so as to obtain the final text recognition result;

wherein encoding the word into the dense vector by the encoder in the NMT model and decoding the dense vector by the decoder in the NMT model so as to obtain the final text recognition result comprises:

converting the at least one word into a source hidden vector by the encoder in the NMT model;

inputting the source hidden vector into the decoder in the NMT model, and outputting a target hidden vector by the decoder in the NMT model;

determining an attention mechanism hidden vector based on the target hidden vector and the source hidden vector; and

determining an attention mechanism hidden vector based on the target hidden vector and the source hidden vector; and

obtaining the final text recognition result based on the attention mechanism hidden vector.
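The step of determining an attention mechanism hidden vector from the target and source hidden vectors can be sketched as follows. This is only one common formulation (dot-product scoring followed by a tanh projection, often called Luong-style attention), and it is an assumption: the claim does not state which scoring function or combination is used. The function name, the toy vectors, and the projection matrix `w_c` are all hypothetical.

```python
import math
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_hidden_vector(
    target_hidden: List[float],
    source_hiddens: List[List[float]],
    w_c: List[List[float]],
) -> Tuple[List[float], List[float]]:
    # 1. Score every source hidden vector against the target hidden vector.
    scores = [dot(target_hidden, h_s) for h_s in source_hiddens]
    weights = softmax(scores)

    # 2. Context vector: attention-weighted average of the source hidden vectors.
    dim = len(target_hidden)
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, source_hiddens))
        for i in range(dim)
    ]

    # 3. Project the concatenation [context; target_hidden] through tanh.
    combined = context + target_hidden
    attn = [math.tanh(dot(row, combined)) for row in w_c]
    return attn, weights


# Toy inputs: three source positions, hidden size 2 (assumed values).
source_hiddens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_hidden = [0.5, 2.0]
w_c = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]

attn, weights = attention_hidden_vector(target_hidden, source_hiddens, w_c)
print(weights, attn)
```

In a full decoder the resulting vector would typically feed an output layer that scores candidate target words at each step. That layer is omitted here because the claim only states that the final result is obtained based on the attention mechanism hidden vector.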

9. A non-transitory computer-readable storage medium having stored therein a computer program that, when executed by a processor, causes the processor to perform a method for correcting a speech recognition result, the method for correcting a speech recognition result comprising: