Machine Translation of Natural Languages, Winter Semester 2019/20

Machine translation of texts from one natural language into another draws on various formalisms such as grammars and automata. This lecture gives an overview of how such formalisms can be used to model natural-language translation and how a translation system modeled in this way is trained from examples.

Schedule

The substitute exercise session for Buß- und Bettag (Day of Repentance and Prayer) has been rescheduled due to a misunderstanding. The new date is 28 November, 3rd DS.

Note: This winter semester the course is held in the V4/Ü1 format. The exercise session takes place once a week until the end of November. After that, an optional revision session (Repetitorium) is offered.

Mondays, 2nd DS (11:10 – 12:40), APB/E007: lecture
Thursdays, 2nd DS (09:20 – 10:50), APB/E007: lecture
Wednesdays, 3rd DS (11:10 – 12:40), APB/E006 or APB/3027: exercise session

Exercise and revision sessions in odd calendar weeks take place in room APB/E006 (23 Oct, 6 Nov, 4 Dec, 18 Dec, 15 Jan, 29 Jan). Exercise and revision sessions in even calendar weeks take place in room APB/3027 (30 Oct, 13 Nov, 11 Dec, 8 Jan, 22 Jan, 5 Feb). As exceptions, the exercise session on 27 November takes place in room APB/E007, and the one on 28 November (3rd DS, substitute session for Buß- und Bettag) takes place in room APB/3027.

All interested students are also cordially invited to the Friday seminar (Freitagsseminar).

Exercises

Accessible only from within the TU network; download via VPN if necessary. More will be added later.

Material

Accessible only from within the TU network; download via VPN if necessary.

The slides will be made available via this link at the start of the lecture and updated regularly to match the progress of the lecture.
Material on probability theory

Exercises

2019-10-23: Exercise sheet 1 (Preliminaries, IBM-1, BLEU)
2019-10-30: Exercise sheet 2 (n-gram models)
2019-11-06: Exercise sheet 3 (HMM, Decoding)
2019-11-13: Exercise sheet 4 (Yamada-Knight)
2019-11-27 and 2019-11-28: Exercise sheet 5 (PCFG)

Further material will be made available over the course of the lecture. To get an overview in advance, you can consult the material from the previous run of the lecture.

Literature

Baum, L.E., Petrie, T., Soules, G., and Weiss, N. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The annals of mathematical statistics, 164–171. [url]
Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., and Mercer, R.L. 1993. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19, 2, 263–311. [url]
Chiang, D. 2007. Hierarchical Phrase-Based Translation. Computational Linguistics 33, 2, 201–228. [doi, url]
Dempster, A.P., Laird, N.M., and Rubin, D.B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39, 1, 1–38. [url]
Dupont, P., Denis, F., and Esposito, Y. 2005. Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognition 38, 9, 1349–1371. [doi, url]
Hopcroft, J.E., Motwani, R., and Ullman, J.D. 2006. Introduction to Automata Theory, Languages, and Computation (3rd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. [url]
Huang, L. and Chiang, D. 2005. Better K-best Parsing. Proceedings of the Ninth International Workshop on Parsing Technology, Association for Computational Linguistics, 53–64. [url]
Hutchins, W.J. and Somers, H.L. 1992. An introduction to machine translation. London: Academic Press. [url]
Jurafsky, D. and Martin, J.H. 2000. Speech and Language Processing – An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall. [url]
Klein, D. and Manning, C.D. 2003. A* parsing: fast exact Viterbi parse selection. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, Association for Computational Linguistics, 40–47. [url]
Knaster, B. and Tarski, A. 1928. Un théorème sur les fonctions d’ensembles. Ann. Soc. Polon. Math 6, 133–134.
Knight, K. 1997. Automating knowledge acquisition for machine translation. AI Mag, 81–96. [url]
Knight, K. 1999. Decoding complexity in word-replacement translation models. Comput. Linguist. 25, 4, 607–615. [url]
Knuth, D.E. 1977. A Generalization of Dijkstra’s Algorithm. Inform. Process. Lett. 6, 1, 1–5. [doi]
Kuich, W. 1998. Formal power series over trees. 3rd International Conference on Developments in Language Theory, DLT 1997, Thessaloniki, Greece, Proceedings, Aristotle University of Thessaloniki, 61–101.
Lari, K. and Young, S.J. 1990. The estimation of stochastic context-free grammars using the Inside-Outside algorithm. Computer Speech and Language 4, 1, 35–56. [doi, url]
Lewis II, P.M. and Stearns, R.E. 1968. Syntax-Directed Transduction. Journal of the ACM 15, 3, 465–488. [doi, url]
Lopez, A. 2008. Statistical machine translation. ACM Comput. Surv. 40, 3, 8:1–8:49. [doi, url]
McLachlan, G.J. and Krishnan, T. 2008. The EM algorithm and extensions. Wiley, Hoboken, NJ. [url]
Mohri, M. 2009. Weighted automata algorithms. In: M. Droste, W. Kuich and H. Vogler, eds., Handbook of Weighted Automata. Springer-Verlag, 213–254.
Och, F.J. and Ney, H. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics 29, 1, 19–51. [doi, url]
Prescher, D. 2005. A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars. University of Amsterdam. [url]
Tarski, A. 1955. A lattice-theoretical fixpoint theorem and its applications. Pacific J. Math. 5, 2, 285–309. [url]
Wang, Y.-Y. and Waibel, A. 1997. Decoding algorithm in statistical machine translation. Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 366–372. [doi, url]
Yamada, K. and Knight, K. 2001. A Syntax-based Statistical Translation Model. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 523–530. [doi, url]
Yamada, K. and Knight, K. 2002. A Decoder for Syntax-based Statistical MT. Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 303–310. [doi, url]
Higuera, C. de la. 2010. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, New York, NY, USA.
Bar-Hillel, Y., Perles, M., and Shamir, E. 1961. On formal properties of simple phrase structure grammars. Z. Phonetik. Sprach. Komm. 14, 143–172. [doi]
Ésik, Z. and Kuich, W. 2003. Formal Tree Series. J. Autom. Lang. Comb. 8, 2, 219–285.

Contact

Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged
Heiko Vogler
Phone: +49 (0) 351 463-38232
Dr.-Ing. Kilian Gebhardt
Phone: +49 (0) 351 463-38237

Last updated: 19 August 2020, 15:18