International evaluation workshops were established to support the development of speech-translation technology. The International Workshop on Spoken Language Translation (IWSLT), organized by C-STAR, an international consortium for research on speech translation, has been held since 2004. The concept of these workshops is a kind of contest: the organizers provide a common dataset, and the participating research institutes build systems that are then evaluated. The workshops thus allow research institutes to cooperate and compete against each other at the same time, which promotes efficient research. Research and development has gradually progressed from relatively simple to more advanced translation.

Apart from the problems involved in text translation, speech-to-speech translation has to deal with problems specific to speech, including the incoherence of spoken language, its looser grammatical constraints, unclear word boundaries, the correction of speech-recognition errors, and multiple optional inputs. On the other hand, speech-to-speech translation also has advantages over text translation: spoken language has a less complex structure and a smaller vocabulary.

In 1983, NEC Corporation demonstrated speech translation as a concept exhibit at the ITU Telecom World (Telecom '83). In 1999, the C-STAR-2 consortium demonstrated speech-to-speech translation of five languages: English, Japanese, Italian, Korean, and German.

The generated translation utterance is sent to the speech synthesis module, which estimates the pronunciation and intonation matching the string of words based on a corpus of speech data in language B. Waveforms matching the text are selected from this database, and the speech synthesis module concatenates and outputs them. Current systems do not use word-for-word translation but rather take the entire context of the input into account to generate an appropriate translation.
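The waveform-selection step just described can be sketched as a simple lookup-and-concatenate routine. This is a toy illustration only; `unit_db` is a hypothetical database mapping words to pre-recorded waveform samples, standing in for a real unit-selection inventory.

```python
from typing import Dict, List

def concatenate_units(words: List[str], unit_db: Dict[str, List[float]]) -> List[float]:
    """Select the stored waveform for each word and concatenate them."""
    waveform: List[float] = []
    for word in words:
        waveform.extend(unit_db[word])  # look up the recorded unit for this word
    return waveform

# Illustrative two-word "database" of tiny waveform fragments.
unit_db = {"hola": [0.1, 0.2], "mundo": [0.3, 0.4]}
print(concatenate_units(["hola", "mundo"], unit_db))  # [0.1, 0.2, 0.3, 0.4]
```

A real synthesizer would additionally smooth the joins between units and adjust pitch and duration to match the estimated intonation.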
The input is then converted into a string of words, using the dictionary and grammar of language A, based on a massive corpus of text in language A. The machine translation module then translates this string. Early systems simply replaced every word with a corresponding word in language B.
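The early word-for-word approach mentioned above amounts to a per-word dictionary lookup. The sketch below is illustrative, assuming a hypothetical bilingual lexicon; real early systems used much larger dictionaries and some reordering rules.

```python
# Hypothetical tiny English-to-German lexicon for illustration.
lexicon_a_to_b = {"good": "gut", "morning": "Morgen"}

def word_for_word(words):
    """Replace each source word with its dictionary counterpart,
    falling back to the original word when no entry exists."""
    return [lexicon_a_to_b.get(w, w) for w in words]

print(word_for_word(["good", "morning"]))  # ['gut', 'Morgen']
```

The weakness is obvious from the fallback line: anything outside the dictionary, and any construction whose word order differs between the two languages, comes out wrong, which is why current systems translate in context instead.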
It compares the input with a phonological model, consisting of a large corpus of speech data from multiple speakers.
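Reduced to its simplest form, comparing the input against a phonological model is a nearest-template search over acoustic feature vectors. The sketch below is a toy illustration under that simplification; the feature vectors and labels are invented, and real recognizers use probabilistic models over long feature sequences rather than a single distance comparison.

```python
import math

def nearest_phone(features, templates):
    """Return the label of the template closest to the input
    features by Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda label: dist(features, templates[label]))

# Hypothetical two-phone model built from speaker recordings.
templates = {"a": [1.0, 0.0], "o": [0.0, 1.0]}
print(nearest_phone([0.9, 0.1], templates))  # 'a'
```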
The speaker of language A speaks into a microphone and the speech recognition module recognizes the utterance.
A speech translation system would typically integrate the following three software technologies: automatic speech recognition (ASR), machine translation (MT), and voice synthesis (TTS).
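The three technologies compose into a simple pipeline. The sketch below shows only the chaining; every stage is a stub with hypothetical behavior standing in for a real ASR, MT, or TTS engine.

```python
def recognize(audio: bytes) -> str:
    """ASR stub: a real engine would decode the audio."""
    return "hello world"

def translate(text: str) -> str:
    """MT stub: a real engine would translate in context."""
    return {"hello world": "hallo Welt"}.get(text, text)

def synthesize(text: str) -> bytes:
    """TTS stub: a real engine would generate speech audio."""
    return text.encode("utf-8")

def speech_to_speech(audio: bytes) -> bytes:
    """The ASR -> MT -> TTS pipeline described above."""
    return synthesize(translate(recognize(audio)))

print(speech_to_speech(b"..."))  # b'hallo Welt'
```

In practice the stages are not fully independent: errors in recognition propagate into translation, which is one of the special problems of speech-to-speech translation noted earlier.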