Machine translation systems, such as Google Translate, have become omnipresent. The continuous increase in accuracy is fueled by a thriving research community. Systems used in research, such as Moses, usually consist of several programs and scripts that have to be operated in concert for the system to work properly. The user interfaces to those programs and scripts are heterogeneous and thus constitute a considerable hurdle to understanding the underlying ideas and implementing new ones.
With Vanda, we pursue a modular approach to machine translation that focuses on conceptual clarity and the rapid development of ideas. This approach rests on two pillars:
Vanda-Haskell: A library of data structures and algorithms together with an easy-to-use command-line interface for the execution of some typical tasks in machine translation. The features currently implemented include: application of a string-to-tree transducer; extraction, training, and parsing of probabilistic context-free grammars; read-off, state-merging, and parsing of bottom-up deterministic probabilistic tree automata; extraction and binarization of linear context-free rewriting systems; and training and application of n-gram models. The source code of Vanda-Haskell is available on GitHub under the BSD3 license: https://github.com/tud-fop/vanda-haskell.
Vanda-Studio: An integrated development environment that allows for rapid incremental design of small-scale machine-translation experiments. In Vanda-Studio, an experiment is described in terms of a workflow, such as the one shown in the screenshot below. Vanda-Studio keeps track of all pieces of data that occur during the run of an experiment, and it offers visualizations for many data types, e.g., word alignments, syntax trees, and grammars. In addition to the functionality provided by Vanda-Haskell, one can also choose from a growing repertoire of third-party tools and easily integrate new tools. You can find further information on Vanda-Studio in our manuscript. The source code of Vanda-Studio is available on GitHub under the BSD3 license: https://github.com/tud-fop/vanda-studio.
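Of the Vanda-Haskell features listed above, the training and application of n-gram models is the easiest to illustrate in a few lines. The following is a minimal sketch of a bigram model with unsmoothed maximum-likelihood estimates. It is written in Python for brevity and is independent of Vanda-Haskell; the function names, the sentence markers <s> and </s>, and the toy corpus are assumptions made for this illustration and do not reflect Vanda's actual API.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences.

    Hypothetical helper for illustration; not part of Vanda-Haskell.
    """
    bigrams, contexts = Counter(), Counter()
    for tokens in sentences:
        # Pad with assumed sentence-boundary markers.
        padded = ["<s>"] + tokens + ["</s>"]
        for left, right in zip(padded, padded[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts


def probability(model, tokens):
    """Apply the model: product of relative bigram frequencies (no smoothing)."""
    bigrams, contexts = model
    padded = ["<s>"] + tokens + ["</s>"]
    p = 1.0
    for left, right in zip(padded, padded[1:]):
        if contexts[left] == 0:
            # Unseen context: the unsmoothed estimate is zero.
            return 0.0
        p *= bigrams[(left, right)] / contexts[left]
    return p


# Toy corpus, invented for this example.
corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"]]
model = train_bigram(corpus)
print(probability(model, ["the", "cat", "sleeps"]))
print(probability(model, ["the", "bird", "sleeps"]))
```

Because the sketch uses no smoothing, any sentence containing an unseen bigram receives probability zero; practical n-gram models usually redistribute some probability mass to unseen events.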
Call for Participation
During the course of their studies, students have several opportunities to participate in the development of Vanda: theses (bachelor, master, diploma, Belegarbeit), lab courses and projects, etc. Visit our teaching website to check for suitable courses, or ask a teaching assistant for a thesis topic.
Thomas Ruprecht, M.Sc.
Phone: +49 (0) 351 463-38469