A comparative case study in statistical machine translation

Németh, Noémi
In this thesis I aimed to demonstrate common translation mistakes of two statistical Machine Translation systems, Google Translate and Webfordítás, and to compare them through a case study. The 2nd chapter discusses the main notions and problems of Machine Translation. The notions include Machine-Aided Human Translation, Human-Aided Machine Translation, pre-editing, controlled language and post-editing. The main problems include lexical and structural ambiguity (within lexical ambiguity: categorical ambiguity, heteronyms, homonymy, polysemy and transfer ambiguity), anaphora and reducing dictionary sizes. The 3rd chapter describes methods of Machine Translation, which, according to Hutchins and Somers, can be rule-based or corpus-based. Rule-based methods include the direct approach, the interlingua method and the transfer method, while the statistics-based and the example-based methods belong to the corpus-based methods. There are also Machine Translation systems that incorporate more than one method; these are hybrid systems. The 4th chapter is about the evaluation of Machine Translation, both human and automatic. Human evaluation can be based on adequacy, fluency and style, while automatic evaluation is based on measures such as Word Error Rate and Position-independent Word Error Rate. The 5th chapter describes the main events that shaped the history of Machine Translation, such as the Georgetown-IBM experiment, the Bar-Hillel report and the ALPAC report, and also discusses online Machine Translation. I carried out research on the human evaluation of Machine Translation, in which I compared two statistical Machine Translation systems, Google Translate and Webfordítás. The 6th chapter describes the details of my questionnaire, such as the Machine Translation systems I chose to compare, the subjects, the number and genre of the sentences translated and the instructions that the subjects were given.
The 7th chapter looks at the evaluation results of this questionnaire sentence by sentence, and also by genre, by adequacy, by fluency and finally by Machine Translation system. The 8th chapter is the qualitative analysis of the subjects' answers. The 9th chapter gives the conclusion of this thesis, and the 10th chapter contains the questionnaire that the subjects filled in, together with the instructions that were given to them.
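As an illustration of the two automatic measures named above, the following is a minimal sketch and not the thesis's own implementation. It assumes whitespace tokenisation and a single reference translation, and it uses one common formulation of the Position-independent Word Error Rate; the function names and example sentences are hypothetical.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """Bag-of-words error rate: word order is ignored (assumed formulation)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection counts the words shared by both sentences.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    reordered = "on the mat the cat sat"
    print(word_error_rate(reference, reordered))
    print(position_independent_error_rate(reference, reordered))
```

For the reordered sentence the position-independent measure is zero because both sentences contain the same words, while the Word Error Rate is positive because it penalises the change in word order.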
machine translation