The motivation behind the discipline of Machine Translation (MT) is the dream of automatically overcoming language barriers using computing technology. Web-based translation services such as Google Translate are widely known. The ongoing globalization of the economy generates considerable interest in the further development of these applications. A relatively young but promising subfield of MT is syntax-based statistical MT.
We investigate applications of the theory of weighted tree automata and tree transducers to this subfield of MT. We regularly offer the lecture "Machine Translation" as well as a research-oriented course (a seminar or reading group). In addition, we offer study projects dealing with our software system Vanda MT.
The rest of this document offers a brief introduction to the field of syntax-based statistical MT and to the application of weighted tree automata and tree transducers in it.
Dealing with probabilities
A translation may be correct in content and syntax and yet not sound fluent at all. The fluency of a translation can only be judged by humans or by comparison with existing translations. However, human translation depends on a large number of contingencies (such as the translator's origin, education, or mood). Probability theory is used to model these contingencies.
Syntax-based statistical MT
In statistical MT, these probabilities are obtained by training, which, roughly speaking, means generalizing from a large number of known translation pairs. Syntax-based MT exploits the grammatical structure inherent in a sentence, which is represented by one or more parse trees. The additional structure that trees offer over plain word sequences facilitates certain aspects of translation, such as reordering words. Combining the statistical and the syntax-based approaches yields syntax-based statistical MT.
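As a minimal sketch of what training can mean in the simplest case, assume that each known translation pair has already been broken into aligned source/target units (in practice these alignments must themselves be inferred, for example with the EM algorithm). The probabilities can then be estimated by relative frequency, p(t | s) = count(s, t) / count(s). The function name `relative_frequencies` and the toy data are hypothetical; in syntax-based MT the counted units would be tree fragments rather than single words.

```python
from collections import Counter, defaultdict


def relative_frequencies(aligned_pairs):
    """Estimate p(target | source) by relative frequency (maximum likelihood).

    `aligned_pairs` is a list of (source_unit, target_unit) tuples. Here the
    units are words; this input format is an assumption of the sketch, and
    the pairs are assumed to be aligned already.
    """
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return table


# Toy English-Japanese data, made up for illustration.
pairs = [("house", "ie"), ("house", "ie"), ("house", "uchi"), ("book", "hon")]
probs = relative_frequencies(pairs)
print(probs["house"])  # {'ie': 0.6666666666666666, 'uchi': 0.3333333333333333}
```

Relative frequencies are the maximum-likelihood estimate for such counts, which is why they are a natural starting point; real systems add smoothing and alignment inference on top.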
Linguistic and operational models
A linguistic model describes the process of human translation on an abstract level. For instance, an English sentence might be translated into Japanese by reordering words, inserting (filler) words, and finally translating the remaining English words into Japanese.
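The three steps can be illustrated with a small, deterministic Python sketch on a parse tree for "he reads books". The tables `REORDER`, `INSERT_RIGHT`, and `LEXICON` are hypothetical and hand-written; in a statistical model each step would instead choose among alternatives according to learned probabilities.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                                      # syntactic category, e.g. "VP"
    children: list["Node"] = field(default_factory=list)
    word: Optional[str] = None                      # set only for leaves


# Hypothetical tables, for illustration only.
REORDER = {("VP", ("VB", "NN")): (1, 0)}            # verb-object -> object-verb
INSERT_RIGHT = {"PRP": "wa", "NN": "o"}             # particle inserted after a node
LEXICON = {"he": "kare", "reads": "yomu", "books": "hon"}


def reorder(node):
    """Step 1: permute the children of each node according to REORDER."""
    if node.word is not None:
        return node
    kids = [reorder(child) for child in node.children]
    key = (node.label, tuple(child.label for child in kids))
    perm = REORDER.get(key, tuple(range(len(kids))))  # default: keep the order
    return Node(node.label, [kids[i] for i in perm])


def insert(node):
    """Step 2: add a filler leaf to the right of nodes listed in INSERT_RIGHT."""
    if node.word is not None:
        return node
    kids = []
    for child in node.children:
        child = insert(child)
        kids.append(child)
        if child.label in INSERT_RIGHT:
            kids.append(Node("FILL", word=INSERT_RIGHT[child.label]))
    return Node(node.label, kids)


def translate(node):
    """Step 3: read off the leaves, translating every non-filler word."""
    if node.word is not None:
        return [node.word if node.label == "FILL" else LEXICON[node.word]]
    return [w for child in node.children for w in translate(child)]


english = Node("S", [
    Node("PRP", word="he"),
    Node("VP", [Node("VB", word="reads"), Node("NN", word="books")]),
])
print(" ".join(translate(insert(reorder(english)))))  # kare wa hon o yomu
```

Each step returns a new tree instead of modifying its input, so the intermediate results of the individual steps can be inspected separately.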
Operational models make such descriptions precise enough to be executed, more or less directly, by a machine. Compared with general-purpose programming languages, operational models tailored to MT offer advantages such as platform independence and better mathematical tractability; the latter enables optimizations at a much higher level of abstraction.
Application of tree automata and tree transducers
The mathematical framework of weighted tree automata and tree transducers provides candidate operational models for describing languages and their translations. The MT research team of Prof. Kevin Knight at the University of Southern California has developed a toolkit called Tiburon, which demonstrates that this approach is indeed practical. Major current challenges include adapting the models found in the theoretical literature to the needs of MT.
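To make the notion of a weighted tree automaton concrete, here is a minimal Python sketch of a bottom-up automaton over the probability semiring: the weight of a tree is the sum, over all runs, of the product of the transition weights, multiplied by the final weight of the state at the root. The automaton (states `np`, `n`, `v`, `vp`, `s`, and all weights) is a made-up toy example, and the data layout is an assumption of this sketch, not Tiburon's file format or API.

```python
from collections import defaultdict
from itertools import product

# Transitions of a toy bottom-up weighted tree automaton (hypothetical):
# (symbol, tuple of child states) -> {target state: weight}.
TRANSITIONS = {
    ("he", ()): {"np": 1.0},
    ("reads", ()): {"v": 1.0},
    ("books", ()): {"np": 0.6, "n": 0.4},
    ("VP", ("v", "np")): {"vp": 0.7},
    ("VP", ("v", "n")): {"vp": 0.2},
    ("S", ("np", "vp")): {"s": 0.9},
}
FINAL = {"s": 1.0}  # final (root) weights; all other states have weight 0


def state_weights(tree):
    """Map each state q to the total weight of all runs reaching q at the root.

    A tree is a tuple (symbol, child_1, ..., child_k).
    """
    symbol, *children = tree
    child_maps = [state_weights(child) for child in children]
    result = defaultdict(float)
    # Try every combination of child states (a leaf has one empty combination).
    for combo in product(*(m.items() for m in child_maps)):
        child_states = tuple(state for state, _ in combo)
        weight = 1.0
        for _, w in combo:
            weight *= w
        for state, w_rule in TRANSITIONS.get((symbol, child_states), {}).items():
            result[state] += weight * w_rule
    return result


def tree_weight(tree):
    """Combine the root's state weights with the final weights."""
    return sum(FINAL.get(q, 0.0) * w for q, w in state_weights(tree).items())


sentence = ("S", ("he",), ("VP", ("reads",), ("books",)))
print(tree_weight(sentence))
```

In this toy automaton the VP node is reached by two runs (via `np` with weight 0.6 · 0.7 = 0.42 and via `n` with 0.4 · 0.2 = 0.08), so the tree's weight is 1.0 · (0.42 + 0.08) · 0.9 = 0.45, up to floating-point rounding. Replacing the sum by a maximum would yield the weight of the best run instead (the Viterbi semiring).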