Machine Translation of Natural Languages, Winter Semester 2015/2016

Machine translation of texts from one natural language into another employs various formalisms such as grammars and automata. This lecture gives an overview of how such formalisms can be used to model natural-language translation and how a translation system modelled in this way is trained on examples.

Schedule

Mondays, 3rd DS (double period, 11:10–12:40), APB/E010: Lecture
Thursdays, 2nd DS (double period, 09:20–10:50), APB/E007: Lecture
Thursdays, 4th DS (double period, 13:00–14:30), APB/E009: Exercise session

The last regular lecture took place on 11 January 2016. An exam preparation session will be held on 1 February 2016.

Exercises

2015-10-22: Exercise sheet 1
2015-10-29: Exercise sheet 2
2015-11-05: Exercise sheet 3
2015-11-12: Exercise sheet 4
2015-11-19: Exercise sheet 5
2015-11-26: Exercise sheet 6
2015-12-03: Exercise sheet 7
2015-12-10: Exercise sheet 8
2015-12-17: Work on exercises left open from earlier sheets
2016-01-07: Exercise sheet 9

Exercise sheet 9 is the last exercise sheet.

Materials

Polylux slides (overhead transparencies; the slide set is updated regularly to match the current state of the lecture.)

Machine Translation (2015-10-12)
- Slides “Natural language”, “Statistical Machine Translation”
- Literature: [Lop08]
Arcturan-Centauri (2015-10-12)
- Literature: [Kni97]
- Further reading: decipherment of the “Copiale Cipher” using methods of computational linguistics
Statistical Machine Translation (2015-10-15)
- General approach of Statistical Machine Translation: modelling, training, evaluation
Lexical Translation: IBM Model 1 [Bro+93] (2015-10-15, 2015-10-19)
- Derivation of the model from the rules of the game (slides “IBM Model 1: Rules of the Game”)
- Training the length model
- Training the dictionary with an instance of the Expectation-Maximization algorithm (slide “Dictionary training algorithm for IBM model 1”) [Kni99]
- Example of dictionary training
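The dictionary training step for IBM Model 1 can be sketched in Python as follows. This is a minimal EM illustration under assumptions of our own and is not the exact algorithm on the slides. The corpus is a list of (foreign, English) token lists. t(f | e) starts out uniform. The NULL word of [Bro+93] is switched on by a hypothetical flag `use_null`. The three-sentence `TOY_CORPUS` is a common textbook toy example, not course data.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10, use_null=False):
    """EM training of the IBM Model 1 dictionary t(f | e).

    corpus:   list of (foreign_tokens, english_tokens) sentence pairs
    use_null: prepend a NULL token to every English sentence (assumption)
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    # uniform initialisation: every t(f | e) starts at 1 / |foreign vocabulary|
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            es = (["NULL"] if use_null else []) + list(es)
            for f in fs:
                # E-step: share one unit of count for f among all English words,
                # proportionally to the current t(f | e)
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: re-estimate t(f | e) as relative frequency of expected counts
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

TOY_CORPUS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

if __name__ == "__main__":
    t = train_ibm1(TOY_CORPUS, iterations=10)
    print(t[("das", "the")], t[("Haus", "house")])
```

On this toy corpus t(das | the) goes from 0.25 to 0.5 after the first iteration and keeps increasing with further iterations. The probability mass for "house" and "book" likewise concentrates on "Haus" and "Buch".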
Language Model: Bigrams (2015-10-19, 2015-10-22)
- Introductory example (slide “Examples for ngrams”)
- Modelling
- Training
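Modelling and training a bigram language model can be sketched as maximum-likelihood (relative-frequency) estimation of p(w_i | w_{i-1}). The sentence markers `<s>` and `</s>` and the absence of any smoothing are assumptions of this sketch, so unseen bigrams get probability 0.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram model {(prev, word): p(word | prev)}."""
    bigrams, contexts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return {bg: c / contexts[bg[0]] for bg, c in bigrams.items()}

def sentence_prob(model, words):
    """p(sentence) as product of bigram probabilities, including </s>."""
    padded = ["<s>"] + list(words) + ["</s>"]
    p = 1.0
    for bg in zip(padded, padded[1:]):
        p *= model.get(bg, 0.0)  # unseen bigram -> 0 (no smoothing)
    return p

if __name__ == "__main__":
    model = train_bigram([["the", "house"], ["the", "book"], ["a", "book"]])
    print(sentence_prob(model, ["the", "house"]))
```

For this three-sentence training set, p(the house) = p(the | <s>) · p(house | the) · p(</s> | house) = 2/3 · 1/2 · 1 = 1/3.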
The Source-Channel Model [Bro+93] (2015-10-22)
Decoding for Source-Channel Models [WW97] (2015-10-26)
- Slides “Decoding for Source-Channel Models”, “Example for Decoding”
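Decoding in a source-channel model means finding ê = argmax_e p(e) · p(f | e). The sketch below makes this objective concrete by brute force. It combines a bigram language model with IBM Model 1 and tries every word-for-word dictionary translation in every word order. This search is exponential, and it is not the stack decoder of [WW97]. The tables `TOY_T` and `TOY_LM` are made-up values. The constant length probability `epsilon` is an assumption.

```python
from itertools import permutations, product

def ibm1_prob(fs, es, t, epsilon=1.0):
    """p(f | e) under IBM Model 1 with a NULL word; epsilon = constant length prob."""
    es = ["NULL"] + list(es)
    p = epsilon / len(es) ** len(fs)
    for f in fs:
        p *= sum(t.get((f, e), 0.0) for e in es)
    return p

def bigram_prob(es, lm):
    """p(e) under a bigram model {(prev, word): prob} with <s> and </s>."""
    padded = ["<s>"] + list(es) + ["</s>"]
    p = 1.0
    for bg in zip(padded, padded[1:]):
        p *= lm.get(bg, 0.0)
    return p

def decode(fs, t, lm):
    """Brute-force argmax_e p(e) * p(f | e) over dictionary translations and orders."""
    options = [[e for (f2, e) in t if f2 == f] for f in fs]
    best, best_score = None, 0.0
    for choice in product(*options):
        for order in permutations(choice):
            score = bigram_prob(order, lm) * ibm1_prob(fs, order, t)
            if score > best_score:
                best, best_score = list(order), score
    return best

TOY_T = {("das", "the"): 0.7, ("das", "that"): 0.3,
         ("Haus", "house"): 0.8, ("Haus", "home"): 0.2}
TOY_LM = {("<s>", "the"): 0.6, ("<s>", "that"): 0.2, ("the", "house"): 0.5,
          ("the", "home"): 0.1, ("that", "house"): 0.3,
          ("house", "</s>"): 0.9, ("home", "</s>"): 0.8}

if __name__ == "__main__":
    print(decode(["das", "Haus"], TOY_T, TOY_LM))
```

The language model prefers "the house" over "house the" and "that house". The translation model prefers "the" and "house" as translations of "das" and "Haus". The product picks ["the", "house"].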
Syntax-based Translation: The Yamada-Knight Model [YK01; YK02] (2015-10-29, 2015-11-02)
- Modeling (2015-10-29)
- Training (2015-11-02)
- Decoding (2015-11-02)
Syntax-based Language Model: Probabilistic Context-Free Grammars (2015-11-05–2015-11-12)
- Modeling (2015-11-05, 2015-11-09)
  - Context-Free Grammars (2015-11-05)
  - Abstract Syntax Trees and Parse Trees (2015-11-05)
  - Probabilistic Context-Free Grammars (2015-11-05)
  - Language Model (2015-11-09)
- Training (2015-11-09)
- Evaluation and Parsing (2015-11-12)
  - Construction of the reduct (2015-11-12)
  - Computing the probability of an English sentence (2015-11-12, 2015-11-16)
    - Further reading: elimination of chain rules for PCFGs: [Kui98; ÉK03; Moh09]
  - Calculating the best derivation of a PCFG [Knu77; HC05] (2015-11-16)
- Training II: unsupervised training (2015-11-19)
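Computing the probability of a sentence, and the probability of its best derivation, can both be sketched with a CKY-style inside algorithm. The sketch has three assumptions. The PCFG must be in Chomsky normal form, so chain rules are already eliminated. Rules are stored in plain dictionaries of our own design. The toy grammar `LEXICAL`/`BINARY` is hypothetical. With `use_max=True` the sums become maxima, which gives the Viterbi score of the best derivation but not the tree itself.

```python
from collections import defaultdict

def inside(words, lexical, binary, start="S", use_max=False):
    """Inside score of `start` over `words` for a PCFG in Chomsky normal form.

    lexical: {(A, word): p} for rules A -> word
    binary:  {(A, B, C): p} for rules A -> B C
    """
    n = len(words)
    chart = defaultdict(float)  # (i, j, A) -> score of A deriving words[i:j]
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1, a)] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for (a, b, c), p in binary.items():
                for k in range(i + 1, j):  # split point between B and C
                    score = p * chart[(i, k, b)] * chart[(k, j, c)]
                    if use_max:
                        chart[(i, j, a)] = max(chart[(i, j, a)], score)
                    else:
                        chart[(i, j, a)] += score
    return chart[(0, n, start)]

LEXICAL = {("NP", "she"): 0.4, ("NP", "fish"): 0.2, ("NP", "forks"): 0.2,
           ("V", "eats"): 1.0, ("P", "with"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
          ("NP", "NP", "PP"): 0.2, ("PP", "P", "NP"): 1.0}
SENTENCE = "she eats fish with forks".split()

if __name__ == "__main__":
    print(inside(SENTENCE, LEXICAL, BINARY))                # sentence probability
    print(inside(SENTENCE, LEXICAL, BINARY, use_max=True))  # best derivation
```

The toy sentence has two derivations, one for each PP attachment. They have probabilities 0.00224 and 0.00336. The sum mode returns their total, 0.0056. The max mode returns 0.00336.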
Translation based on Hierarchical Phrases [LS68; Chi07] (2015-11-23)
- Modeling (2015-11-23)
- Training (2015-11-23, 2015-11-26)
  - Further reading: GIZA++ [ON03]
Tree Transducers: A Syntax-Based Translation Model (2015-11-26, 2015-11-30, 2015-12-03, 2015-12-07, 2015-12-10, 2015-12-14, 2015-12-17)
- Modeling (2015-11-26, 2015-11-30, 2015-12-03)
  - Probabilistic Extended Tree Transducers (2015-11-26, 2015-11-30)
  - Translation Model (2015-11-30)
  - Probabilistic Extended Tree-to-String Transducers (2015-12-03)
  - Example: Implementation of the Yamada-Knight translation model (2015-11-30)
- Training (2015-12-03, 2015-12-07, 2015-12-10, 2015-12-14, 2015-12-17)
  - Rule Extraction (2015-12-03, 2015-12-07)
  - Training of a Probability Assignment (2015-12-07, 2015-12-10, 2015-12-14, 2015-12-17)
    - Probabilistic Regular Tree Grammars (2015-12-10)
    - Input and output products of probabilistic tree transducers (2015-12-10, 2015-12-14)
      - Property of the input and output product
    - Inside- and outside-probabilities (2015-12-17)
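Inside and outside probabilities for training a probability assignment can be sketched for a probabilistic regular tree grammar, for instance the product of a transducer with one training pair. We assume the grammar is acyclic, so every nonterminal generates only finitely many trees. We write a rule A → σ(B1, ..., Bk) as a tuple `(lhs, symbol, rhs, prob)`. Both choices are ours, not the lecture's notation, and `TOY_RULES` is hypothetical. The function returns expected rule counts. An EM step would normalise these per left-hand side to get new rule probabilities.

```python
from collections import defaultdict

def inside_outside(rules, start):
    """Inside/outside probabilities and expected rule counts of an acyclic PRTG.

    rules: list of (lhs, symbol, rhs, prob) with rhs a tuple of nonterminals,
           standing for the rule lhs -> symbol(rhs[0], ..., rhs[k-1]).
    """
    by_lhs = defaultdict(list)
    for rule in rules:
        by_lhs[rule[0]].append(rule)

    inside = {}
    def beta(a):
        # inside probability: total weight of all trees derivable from a
        if a not in inside:
            total = 0.0
            for _, _, rhs, p in by_lhs[a]:
                weight = p
                for b in rhs:
                    weight *= beta(b)
                total += weight
            inside[a] = total
        return inside[a]
    for a in list(by_lhs):
        beta(a)

    # reversed DFS post-order = topological order (parents before children)
    order, seen = [], set()
    def visit(a):
        if a not in seen:
            seen.add(a)
            for _, _, rhs, _ in by_lhs[a]:
                for b in rhs:
                    visit(b)
            order.append(a)
    visit(start)

    outside = defaultdict(float)
    outside[start] = 1.0
    for a in reversed(order):
        for _, _, rhs, p in by_lhs[a]:
            for i, b in enumerate(rhs):
                rest = p  # rule probability times inside of b's siblings
                for j, c in enumerate(rhs):
                    if j != i:
                        rest *= inside[c]
                outside[b] += outside[a] * rest

    counts = {}
    for rule in rules:
        a, _, rhs, p = rule
        weight = outside[a] * p
        for b in rhs:
            weight *= inside[b]
        counts[rule] = weight / inside[start]  # expected number of uses of rule
    return inside, dict(outside), counts

TOY_RULES = [("S", "sigma", ("A", "B"), 0.5), ("S", "gamma", ("A",), 0.5),
             ("A", "x", (), 0.3), ("B", "y", (), 0.2)]
```

In the toy grammar, A occurs in both trees sigma(x, y) and gamma(x). Its rule therefore gets an expected count of 1. The rule for B is only used in the less probable tree, so its count is 1/6.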
Hidden Markov Models (2016-01-04, 2016-01-07, 2016-01-11)
- Modeling (2016-01-04)
- Forward and backward algorithms (2016-01-07)
- Decoding: Viterbi algorithm (2016-01-11)
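The forward, backward and Viterbi algorithms can be sketched for a discrete HMM. The model is given as nested dictionaries of initial, transition and emission probabilities. The two-state model (`STATES`, `INIT`, `TRANS`, `EMIT`) and the observation sequence `OBS` are hypothetical toy values. The sketch uses no scaling or log-space arithmetic, so it is only suitable for short sequences.

```python
def forward(obs, states, init, trans, emit):
    """alpha[t][s] = P(o_1 .. o_t, state at time t = s)."""
    alpha = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: sum(prev[r] * trans[r][s] for r in states) * emit[s][o]
                      for s in states})
    return alpha

def backward(obs, states, trans, emit):
    """beta[t][s] = P(o_{t+1} .. o_T | state at time t = s)."""
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans[s][r] * emit[r][o] * nxt[r] for r in states)
                        for s in states})
    return beta

def viterbi(obs, states, init, trans, emit):
    """Most probable state sequence and its joint probability with obs."""
    delta = {s: init[s] * emit[s][obs[0]] for s in states}
    backptrs = []
    for o in obs[1:]:
        ptr, new = {}, {}
        for s in states:
            best = max(states, key=lambda r: delta[r] * trans[r][s])
            ptr[s] = best
            new[s] = delta[best] * trans[best][s] * emit[s][o]
        backptrs.append(ptr)
        delta = new
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):  # follow back-pointers to the start
        path.insert(0, ptr[path[0]])
    return path, delta[last]

STATES = ["H", "C"]
INIT = {"H": 0.6, "C": 0.4}
TRANS = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
EMIT = {"H": {"a": 0.8, "b": 0.2}, "C": {"a": 0.3, "b": 0.7}}
OBS = ["a", "b"]
```

For these toy values, the forward and backward passes both give P(a b) = 0.228. The Viterbi path is H C, with probability 0.1008.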

Further materials will be made available during the course. In the meantime, you can get an overview from the previous run of the lecture.

References

[Bro+93]: Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra and Robert L. Mercer. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19.2 (June 1993), 263–311. issn: 0891-2017.
[Chi07]: David Chiang. Hierarchical Phrase-Based Translation. Computational Linguistics 33.2 (June 2007), 201–228. issn: 0891-2017. doi: 10.1162/coli.2007.33.2.201.
[ÉK03]: Z. Ésik and W. Kuich. Formal Tree Series. J. Autom. Lang. Comb. 8.2 (2003), 219–285.
[HC05]: Liang Huang and David Chiang. Better K-best Parsing. Proceedings of the Ninth International Workshop on Parsing Technology. Parsing ’05. Vancouver, British Columbia, Canada: Association for Computational Linguistics, 2005, 53–64.
[Kni97]: Kevin Knight. Automating knowledge acquisition for machine translation. AI Mag. (1997), 81–96.
[Kni99]: Kevin Knight. Decoding complexity in word-replacement translation models. Comput. Linguist. 25.4 (Dec. 1999), 607–615. issn: 0891-2017.
[Knu77]: D.E. Knuth. A Generalization of Dijkstra’s Algorithm. Inform. Process. Lett. 6.1 (Feb. 1977), 1–5.
[Kui98]: W. Kuich. Formal power series over trees. 3rd International Conference on Developments in Language Theory, DLT 1997, Thessaloniki, Greece, Proceedings. Ed. by S. Bozapalidis. Aristotle University of Thessaloniki, 1998, 61–101.
[Lop08]: Adam Lopez. Statistical machine translation. ACM Comput. Surv. 40.3 (Aug. 2008), 8:1–8:49. issn: 0360-0300. doi: 10.1145/1380584.1380586.
[LS68]: Philip M. Lewis II and Richard Edwin Stearns. Syntax-Directed Transduction. Journal of the ACM 15.3 (July 1968), 465–488. issn: 0004-5411. doi: 10.1145/321466.321477.
[Moh09]: M. Mohri. Weighted automata algorithms. Handbook of Weighted Automata. Ed. by M. Droste, W. Kuich and H. Vogler. Springer-Verlag, 2009. Chap. 6, 213–254.
[ON03]: Franz Josef Och and Hermann Ney. A systematic comparison of various statistical alignment models. Computational Linguistics 29.1 (Mar. 2003), 19–51. issn: 0891-2017. doi: 10.1162/089120103321337421.
[WW97]: Ye-Yi Wang and Alex Waibel. Decoding algorithm in statistical machine translation. Proceedings of the eighth conference on European chapter of the Association for Computational Linguistics. EACL ’97. Madrid, Spain: Association for Computational Linguistics, 1997, 366–372. doi: 10.3115/979617.979664.
[YK01]: Kenji Yamada and Kevin Knight. A Syntax-based Statistical Translation Model. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics. ACL ’01. Toulouse, France: Association for Computational Linguistics, 2001, 523–530. doi: 10.3115/1073012.1073079.
[YK02]: Kenji Yamada and Kevin Knight. A Decoder for Syntax-based Statistical MT. Proceedings of 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, July 2002, 303–310. doi: 10.3115/1073083.1073134.

Further reading: ACL Anthology (“over 21,200 papers on the study of computational linguistics and natural language processing”)

Last updated: 2017-10-16, 12:49