Machine translation of texts from one natural language into another employs various formalisms such as grammars and automata. This lecture course gives an overview of how such formalisms can be used to model translations between natural languages, and of how a translation system modelled in this way is trained from examples.

Dates

A question-and-answer session for exam preparation will take place on 20.03.2017 at 13:00 in room APB/3027.

The first exercise session takes place on 20 October.

  • Mondays, 3rd period (11:10–12:40), APB/E010: lecture
  • Thursdays, 2nd period (09:20–10:50), APB/E007: lecture
  • Thursdays, 4th period (13:00–14:30), APB/E009: exercise

On Thursday, 02.02.2017, 2nd period (09:20–10:50), we will present topics for Bachelor's, Master's, and student research (Beleg) theses, as well as projects and internships currently offered by the Chair of Foundations of Programming. Here are the links to the slides of the talk and to the list of topics.

Exercises

Materials

Lecture slides (the slide set is regularly updated to the current state of the lecture)
Slides on the EM algorithm
Overview of EM algorithms

  • Machine Translation (2016-10-10)
    • Slides: "Natural language", "Statistical Machine Translation"
    • Literature: [HS92], [Bro+93], [Lop08], [Pre05]
  • Foundations of probability theory (2016-10-13)
  • IBM Model 1 (2016-10-17)
    • Slides: "IBM Model 1: Rules of the Game"
    • Literature: [Bro+93]
  • IBM Model 1 – Training (2016-10-20)
    • Slides: "Dictionary training algorithm for IBM model 1"
    • Literature: [Kni99], [Hig10].
      The dictionary training algorithm finds a global maximum, cf. [Bro+93, p. 303, Concavity].
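The dictionary training algorithm for IBM Model 1 is an instance of EM: the E-step distributes each source word's probability mass over the target words it could be aligned to, and the M-step renormalises the expected counts. A minimal sketch, assuming a toy corpus of (source, target) sentence pairs; the corpus and all variable names are illustrative, not taken from the lecture:

```python
# EM dictionary training for IBM Model 1 (sketch, no NULL word).
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    # uniform initialisation of the lexical probabilities t(f | e)
    f_vocab = {f for (fs, _) in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                # E-step: normalise over all alignment choices for f
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: relative-frequency re-estimate of t(f | e)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"]),
          (["ein", "buch"], ["a", "book"])]
t = train_ibm1(corpus)
```

On this toy corpus the probability mass for "the" concentrates on "das", since the two co-occur in every sentence pair containing "the".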
  • n-grams (2016-10-20), (2016-10-24)
    • Slides: "Examples for ngrams"
    • Literature: [JM00] (Smoothing)
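A small sketch of bigram estimation with add-one (Laplace) smoothing, in the spirit of the smoothing chapter of [JM00]; the toy corpus and sentence-boundary markers are assumptions for illustration:

```python
# Bigram language model with add-one smoothing (sketch).
from collections import Counter

def bigram_model(corpus):
    tokens = []
    for sent in corpus:
        # pad each sentence with boundary markers
        tokens += ["<s>"] + sent + ["</s>"]
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def p(w2, w1):
        # add-one smoothed conditional probability P(w2 | w1)
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)
    return p

p = bigram_model([["the", "house"], ["the", "book"]])
```

Smoothing shifts a little probability mass to unseen bigrams, so p("book", "the") stays nonzero even for word pairs never observed together.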
  • Hidden Markov Model (HMM) (2016-10-24), (2016-10-27), (2016-11-03)
    • Slides: "An example of a hidden Markov model", "Forward algorithm", "Backward algorithm", "Viterbi algorithm", "Baum-Welch algorithm"
    • Literature: [Bau+70]
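The forward algorithm from the slides above can be sketched compactly: it computes the probability of an observation sequence by summing over all state sequences, one time step at a time. The weather-style toy model below is an assumption, not the example from the lecture slides:

```python
# Forward algorithm for a discrete HMM (sketch).
def forward(states, start_p, trans_p, emit_p, obs):
    # alpha[s] = probability of emitting obs[..t] and being in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
prob = forward(states, start_p, trans_p, emit_p, ["walk", "shop"])
```

Replacing the sum over predecessor states by a maximum (and keeping backpointers) turns this into the Viterbi algorithm.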
  • Source-Channel Model (2016-11-07)
    • Derivation of the source-channel model via Bayes' theorem.
    • The source-channel model as a linear model.
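The Bayes derivation mentioned above can be written out in the standard SMT notation, with source sentence f and target sentence e; since p(f) does not depend on e, it drops out of the maximisation:

```latex
\hat{e} \;=\; \operatorname*{argmax}_{e}\, p(e \mid f)
        \;=\; \operatorname*{argmax}_{e}\, \frac{p(f \mid e)\, p(e)}{p(f)}
        \;=\; \operatorname*{argmax}_{e}\, \underbrace{p(f \mid e)}_{\text{translation model}} \cdot \underbrace{p(e)}_{\text{language model}}
```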
  • Decoding (2016-11-07)
    • Slides: "Decoding for Source-Channel Models"
    • Literature: Decoding with the source-channel model is NP-complete [Kni99].
      Incremental decoding [WW97].
  • Yamada-Knight translation model (2016-11-10), (2016-11-14)
    • Model formulation, Training, Decoding
    • Slides: examples for the transformation, model parameters, and the training algorithm
    • Literature: [YK01], [YK02]
      CFG Parsing: CYK algorithm [HMU06], A fast algorithm for Viterbi parsing [KM03]
    • Resources: Penn Treebank
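The CYK algorithm cited above ([HMU06]) decides membership for a CFG in Chomsky normal form by filling a triangular table over substrings. A minimal sketch; the tiny grammar is an assumption for illustration:

```python
# CYK recognition for a CFG in Chomsky normal form (sketch).
def cyk(grammar, start, word):
    # grammar: list of rules (lhs, rhs), rhs a 1-tuple (terminal)
    # or a 2-tuple (nonterminals)
    n = len(word)
    # table[i][j] = nonterminals deriving word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(word):
        for lhs, rhs in grammar:
            if rhs == (a,):
                table[i][0].add(lhs)
    for span in range(2, n + 1):          # substring length
        for i in range(n - span + 1):     # start position
            for k in range(1, span):      # split point
                for lhs, rhs in grammar:
                    if len(rhs) == 2 and rhs[0] in table[i][k - 1] \
                            and rhs[1] in table[i + k][span - k - 1]:
                        table[i][span - 1].add(lhs)
    return start in table[0][n - 1]

grammar = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",)),
           ("B", ("B", "B"))]
ok = cyk(grammar, "S", ["a", "b", "b"])
```

Replacing the sets by best rule weights per nonterminal yields Viterbi parsing; [KM03] speeds this up with A* search.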
  • Probabilistic Context-free Grammars (2016-11-21), (2016-11-24), (2016-11-28), (2016-12-01), (2016-12-05)
    • Slides: Example, Notational conventions, Parse Tree, Abstract Syntax Tree
    • Language Model Definition
    • (Supervised) Training
    • Evaluation and Parsing, Reduct, Inside weights
    • Hypergraphs and Knuth's algorithm
    • Unsupervised Training, Outside weights
    • Literature: [BPS61], [LY90], [KT28], [Tar55], [Knu77]
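Supervised training of a PCFG, as listed above, amounts to relative-frequency estimation: count each rule in the treebank and divide by the count of its left-hand side, which makes the grammar proper by construction. A sketch, assuming a toy treebank of tuple-encoded trees; labels and trees are illustrative:

```python
# Supervised PCFG training by relative-frequency estimation (sketch).
from collections import Counter

def extract_rules(tree):
    # tree encoded as (label, child, ...); string children are terminals
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            rules += extract_rules(c)
    return rules

def train_pcfg(treebank):
    rule_count, lhs_count = Counter(), Counter()
    for tree in treebank:
        for lhs, rhs in extract_rules(tree):
            rule_count[(lhs, rhs)] += 1
            lhs_count[lhs] += 1
    # probability of a rule = its count / count of its left-hand side
    return {r: c / lhs_count[r[0]] for r, c in rule_count.items()}

treebank = [("S", ("NP", "she"), ("VP", ("V", "reads"))),
            ("S", ("NP", "she"), ("VP", ("V", "reads"), ("NP", "books")))]
p = train_pcfg(treebank)
```

Unsupervised training replaces the observed rule counts with expected counts computed from inside and outside weights [LY90].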
  • Translation based on hierarchical phrases (2016-12-05), (2016-12-08), (2016-12-12)
    • transduction grammars [LS68], a.k.a. synchronous context-free grammars (SCFG) [Chi07]
    • Definitions: SCFG, derivation relation, probabilistic SCFGs, properness, Hiero translation model.
    • Decoding: apply the CFG reduct construction to one component of the SCFG; afterwards use the algorithm of [Knu77] to find the best derivation.
    • Training: GIZA++ [ON03], initial phrase pairs, rule extraction (see slides), rule probabilities.
    • Material: excerpt from the lecture notes
    • Literature: comparison of HMMs and probabilistic automata [DDE05]
  • Translation based on Tree Transducers (2016-12-12), (2016-12-14), (2016-12-19), (2017-01-05)
    • Unranked trees; definition of extended tree transducers (XTT).
    • Rule extraction: Frontier nodes, minimal graph fragments.
    • Training of probabilities. EM algorithm.
    • Material: Formal treatment of rule extraction
  • EM algorithms (2017-01-09), (2017-01-12), (2017-01-19), (2017-01-23)
    • Preliminaries
    • Regular Tree Grammars
    • Algorithmic skeleton for EM algorithms
    • Corpus-based EM algorithm
    • Simple-counting EM algorithm
    • IO EM algorithm
    • Literature: [DLR77], [Pre05], [MK08]

Further materials will be made available as the course progresses. In the meantime, you can get an overview from the previous run of the lecture.

Literature

[Bau+70]

L.E. Baum, T. Petrie, G. Soules and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41.1 (1970), 164–171.

[BPS61]

Y. Bar-Hillel, M. Perles and E. Shamir. On formal properties of simple phrase structure grammars. Z. Phonetik. Sprach. Komm. 14 (1961), 143–172.

[Bro+93]

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra and Robert L. Mercer. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19.2 (June 1993), 263–311. issn: 0891-2017.

[Chi07]

David Chiang. Hierarchical Phrase-Based Translation. Computational Linguistics 33.2 (June 2007), 201–228. issn: 0891-2017. doi: 10.1162/coli.2007.33.2.201.

[DDE05]

Pierre Dupont, François Denis and Yann Esposito. Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognition 38.9 (2005). Grammatical Inference, 1349–1371. issn: 0031-3203. doi: 10.1016/j.patcog.2004.03.020.

[DLR77]

A. P. Dempster, N. M. Laird and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39.1 (1977), 1–38. issn: 00359246.

[Hig10]

Colin de la Higuera. Grammatical Inference: Learning Automata and Grammars. New York, NY, USA: Cambridge University Press, 2010. isbn: 0521763169, 9780521763165.

[HMU06]

John E. Hopcroft, Rajeev Motwani and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation (3rd Edition). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2006. isbn: 0321462254.

[HS92]

W.J. Hutchins and H.L. Somers. An introduction to machine translation. London: Academic Press, 1992. isbn: 0-12-362830-X.

[JM00]

D. Jurafsky and J.H. Martin. Speech and Language Processing – An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, 2000.

[KM03]

Dan Klein and Christopher D. Manning. A* parsing: fast exact Viterbi parse selection. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1. Association for Computational Linguistics, 2003, 40–47.

[Kni99]

K. Knight. Decoding complexity in word-replacement translation models. Computational Linguistics 25.4 (1999), 607–615.

[Knu77]

D.E. Knuth. A Generalization of Dijkstra’s Algorithm. Inform. Process. Lett. 6.1 (Feb. 1977), 1–5. doi: 10.1016/0020-0190(77)90002-3.

[KT28]

Bronisław Knaster and A. Tarski. Un théorème sur les fonctions d'ensembles. Ann. Soc. Polon. Math. 6 (1928), 133–134.

[Lop08]

Adam Lopez. Statistical machine translation. ACM Comput. Surv. 40.3 (Aug. 2008), 8:1–8:49. issn: 0360-0300. doi: 10.1145/1380584.1380586.

[LS68]

Philip M. Lewis II and Richard Edwin Stearns. Syntax-Directed Transduction. Journal of the ACM 15.3 (July 1968), 465–488. issn: 0004-5411. doi: 10.1145/321466.321477.

[LY90]

K. Lari and S.J. Young. The estimation of stochastic context-free grammars using the Inside-Outside algorithm. Computer Speech and Language 4.1 (1990), 35–56. issn: 0885-2308. doi: 10.1016/0885-2308(90)90022-X.

[MK08]

Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm and extensions. 2nd ed. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley, 2008. XXVII, 359 pp. isbn: 978-0-471-20170-0.

[ON03]

Franz Josef Och and Hermann Ney. A systematic comparison of various statistical alignment models. Computational Linguistics 29.1 (March 2003), 19–51. issn: 0891-2017. doi: 10.1162/089120103321337421.

[Pre05]

D. Prescher. A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars. Tech. rep. University of Amsterdam, 2005.

[Tar55]

Alfred Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific J. Math. 5.2 (1955), 285–309.

[WW97]

Ye-Yi Wang and Alex Waibel. Decoding algorithm in statistical machine translation. Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics. EACL '97. Madrid, Spain: Association for Computational Linguistics, 1997, 366–372. doi: 10.3115/979617.979664.

[YK01]

Kenji Yamada and Kevin Knight. A Syntax-based Statistical Translation Model. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. ACL '01. Toulouse, France: Association for Computational Linguistics, 2001, 523–530. doi: 10.3115/1073012.1073079.

[YK02]

Kenji Yamada and Kevin Knight. A Decoder for Syntax-based Statistical MT. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, July 2002, 303–310. doi: 10.3115/1073083.1073134.

Further reading: ACL Anthology – "over 40,000 papers on the study of computational linguistics and natural language processing"

Last updated: 30.08.2017, 12:37