Ergänzungen zum Maschinelles Übersetzen natürlicher Sprachen im Sommersemester 2017
Um einen Satz in einer natürlichen Sprache maschinell zu verarbeiten, muss dieser in einer geeigneten Form im Computer repräsentiert werden. Diese Vorlesung befasst sich mit der Darstellung von natürlichsprachlichen Sätzen als sogenannte Hybridbäume. Es wird gezeigt, warum ein Hybridbaum eine geeignete Datenstruktur darstellt und mit welchen formalen Modellen ein Satz automatisch in einen Hybridbaum überführt werden kann. Zudem behandelt die Vorlesung, wie verschiedene Grammatikformalismen aus einer repräsentativen Menge von Hybridbäumen gewonnen werden können.
Termine
Die erste Vorlesung findet am 3. April statt, die erste Übung am 12. April.
- Montags, 2. DS (09:20–10:50 Uhr), APB/E006: Vorlesung
- Donnerstags, 5. DS (14:50–16:20 Uhr), APB/E006: Vorlesung
- Mittwochs, 3. DS (11:10–12:40 Uhr), APB/E008: Übung
In der Woche vom 26.06. bis 29.06. fallen die Vorlesungen und die Übung aus. Am 13.07. findet keine Vorlesung mehr statt.
Übungsaufgaben
- 2017-04-12: 1. Übungsblatt
- 2017-04-19: 2. Übungsblatt
- 2017-04-26: 3. Übungsblatt
- 2017-05-03: 4. Übungsblatt
- 2017-05-10: 5. Übungsblatt
- 2017-05-24: 6. Übungsblatt
- 2017-05-31, 2017-06-01: 7. Übungsblatt Notizen zu Aufgaben 1 und 3
- 2017-06-14: 8. Übungsblatt
- 2017-06-21: 9. Übungsblatt
- 2017-07-05: 10. Übungsblatt
- 2017-07-12: 11. Übungsblatt
Materialien
Foliensatz (Nur aus dem Netz der TU abrufbar; ggf. über VPN herunterladen. Der Foliensatz wird regelmäßig auf den aktuellen Stand der Vorlesung gebracht.) Folien von Heiko Voglers Vortrag Parsing of Natural Languages bei der CAI 2017, welche unter anderem die Induktion von Hybridgrammatiken schrittweise graphisch veranschaulichen.
- 2017-04-03: Einführung zu phrase structures, dependency structures [MM08] und hybrid trees [NV14; GNV17]
                      - Berkeley parser für phrase structures online und zum Download
- Stanford Core NLP (dependency parser) online und zum Download
 
- 2017-04-06: Hybridbäume (formale Definition), Preliminarien
- 2017-04-10: Quality Measures
- 2017-04-10, 2017-04-13: Transitionssysteme [KMN09]
- 2017-04-20: Regular tree grammars,  -algebras, and -algebras, and -homomorphisms [Gog+77] -homomorphisms [Gog+77]
- 2017-04-20: Context-free grammar as RTG + algebra
- 2017-04-24: Other RTG-based models
                      - Linear context-free rewriting systems [VWJ87; Kal10]
                          - Range concatenation grammars [Bou00]
- LCFRS-algebra over  , fan-out , fan-out
 
- Extended top-down tree transducer [Rou70; AD76; GKM08; Mal+09]
- Synchronous context-free grammars [Chi07] (transduction grammars [LS68])
- Synchronous tree substitution grammars [Sch90]
- Synchronous tree-adjoining grammars [ASJ90; SS90; JS97]
 
- Linear context-free rewriting systems [VWJ87; Kal10]
                          
- 2017-04-27, 2017-05-04: LCFRS to describe hybrid trees
                      - properties of LCFRS: simple, lexicalized
- derivation tree generated by a LCFRS, dependency graph
- tree traversal algorithm to obtain a hybrid tree from a derivation tree of a simple or lexicalized LCFRS
- grammar induction for LCFRS
 
- 2017-05-04, 2017-05-08: Probabilistic RTG and Training
- 2017-05-11: Parsing with LCFRS
- 2017-05-15: Simple definite clause programs (sDCP) [DM85] (see also attribute grammars [Knu68])
                      - synthesized attributes, inherited attributes
- simpleness (syntactic single use requirement) [Gie88]
 
- 2017-05-18, 2017-05-22: Hybrid grammars [NV14; GNV17].
                      - potential string grammar component: regular grammar, context-free grammar, LCFRS, macro grammar [Fis68]
- potential tree grammar component: regular tree grammar, context-free tree grammar [Rou70; ES78], tree adjoining grammar, sDCP
- we consider only (LCFRS,sDCP)-hybrid grammars
- shape of a rule of a hybrid grammar
- derivation relation of a hybrid grammar
- probabilistic hybrid grammars
- parsing a string with a probabilistic hybrid grammar
 
- 2017-05-22, 2017-05-29: (Recursive) Partitionings [NV14; GNV17]
                      - Recall parsing complexity of LCFRS (degree)
- Definition recursive partitioning
- Induction of a LCFRS from a string and a recursive partitioning
- Left branching and right branching recursive partitioning
- Extraction of a recursive partitioning from a hybrid tree
- Transformation of a recursive partitioning for reducing the fanout
 
- 2017-06-12: Definitions of top, bottom, gspans, and closure of a set of tree positions
- 2017-06-19: Induction of an sDCP from a phrase structure/dependency structure and a recursive partitioning [NV14; GNV17]
- 2017-06-22: Synchronization of induced LCFRS and sDCP to a hybrid grammar, induction on a corpus, naming scheme (strict labeling)
- 2017-07-02, 2017-07-06: n-best parsing
- 2017-07-10: Experiments
Korpora
- Penn Treebank [MSM94] Download (von TU lizenziert, Account wird benötigt)
- Universal dependencies Dependency Korpora in verschiedenen Sprachen in einem einheitlichen Format unter CC Lizenz.
- Tiger Korpus [For+04; Bra+04] Download
- Negra Korpus [Sku+97] Informationen zur Lizenzierung
Literatur
- [AD76]
- 
                      A. Arnold und M. Dauchet. Bi-transductions de forêts. Proc. 3rd Int. Coll. Automata, Languages and Programming. Hrsg. von S. Michaelson und R. Milner. Edinburgh University Press, 1976, 74–86. 
- [ASJ90]
- 
                      Anne Abeillé, Yves Schabes und Aravind K. Joshi. Using lexicalized TAGs for machine translation. Proc. 13th CoLing. Bd. 3. University of Helsinki, Finland, 1990, 1–6. 
- [BL70]
- 
                      G. Birkhoff und J. D. Lipson. Heterogeneous Algebras. Journal of Combinatorial Theory 8(1) (1970), 115–133. doi: 10.1016/s0021-9800(70)80014-x. 
- [Bla+91]
- 
                      E. Black, S. P. Abney, D. Flickinger, C. Gdaniec, R. Grishman, P. Harrison, D. Hindle, R. Ingria, F. Jelinek, J. Klavans, M. Liberman, M. P. Marcus, S. Roukos, B. Santorini und T. Strzalkowski. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars. Proceedings of the Workshop on Speech and Natural Language. HLT ’91. Pacific Grove, California: Association for Computational Linguistics, 1991, 306–311. doi: 10.3115/112405.112467. 
- [BM06]
- 
                      S. Buchholz und E. Marsi. CoNLL-X Shared Task on Multilingual Dependency Parsing. Proceedings of the Tenth Conference on Computational Natural Language Learning. CoNLL-X ’06. New York City, New York: Association for Computational Linguistics, 2006, 149–164. 
- [BNV11]
- 
                      M. Büchse, M.-J. Nederhof und H. Vogler. Tree parsing with synchronous tree-adjoining grammars. Proc. 12th Int. Conf. on Parsing Technologies (IWPT 2011). Association for Computational Linguistics, 2011, 14–25. 
- [Bou00]
- 
                      P. Boullier. Range concatenation grammars. Proc. of 6th Int. Workshop on Parsing Technologies (IWPT 2000). 2000. 
- [BPS61]
- 
                      Y. Bar–Hillel, M. Perles und E. Shamir. On formal properties of simple phrase structure grammars. Z. Phonetik. Sprach. Komm. 14 (1961), 143–172. 
- [Bra+04]
- 
                      S. Brants, S. Dipper, P. Eisenberg, S. Hansen-Schirra, E. König, W. Lezius, Chr. Rohrer, G. Smith und H. Uszkoreit. TIGER: Linguistic Interpretation of a German Corpus. Res. Lang. Comput. 2(4) (2004). doi: 10.1007/s11168-004-7431-3. 
- [Bra69]
- 
                      W. S. Brainerd. Tree generating regular systems. Inform. and Control 14 (1969), 217–231. doi: 10.1016/S0019-9958(69)90065-5. 
- [Büc+10]
- 
                      M. Büchse, D. Geisler, T. Stüber und H. Vogler. N-best Parsing Revisited. Proceedings of the 2010 Workshop on Applications of Tree Automata in Natural Language Processing. ATANLP ’10. Uppsala, Sweden: Association for Computational Linguistics, 2010, 46–54. isbn: 978-1-932432-79-4. 
- [Chi07]
- 
                      D. Chiang. Hierarchical phrase-based translation. Computational Linguistics 33(2) (2007), 201–228. 
- [DM85]
- 
                      P. Deransart und J. Maluszynski. Relating logic programs and attribute grammars. J. Logic Programming 2 (1985), 119–155. 
- [ES78]
- 
                      J. Engelfriet und E.M. Schmidt. IO and OI.II. J. Comput. System Sci. 16.1 (1978), 67–99. 
- [Fis68]
- 
                      M.J. Fischer. Grammars with macro–like productions. Diss. Harvard University, Massachusetts, 1968. 
- [For+04]
- 
                      M. Forst, N. Bertomeu, B. Crysmann, F. Fouvry, S. Hansen-Schirra und V. Kordoni. Towards a dependency-based gold standard for German parsers. Proceedings of the 5th Workshop on Linguistically Interpreted Corpora. 2004. 
- [Gie88]
- 
                      R. Giegerich. Composition and evaluation of attribute coupled grammars. Acta Inform. 25 (1988), 355–423. 
- [GKM08]
- 
                      J. Graehl, K. Knight und J. May. Training tree transducers. Computational Linguistics 34.3 (2008), 391–427. 
- [GNV17]
- 
                      K. Gebhardt, M.-J. Nederhof und H. Vogler. Hybrid grammars for parsing of discontinuous phrase structures and non-projective dependency structures. Computational Linguistics (2017). accepted for publication. doi: 10.1162/COLI_a_00291. 
- [Gog+77]
- 
                      J. A. Goguen, J. W. Thatcher, E. G. Wagner und J. B. Wright. Initial algebra semantics and continuous algebras. J. ACM 24 (1977), 68–95. doi: 10.1145/321992.321997. 
- [HC05]
- 
                      L. Huang und D. Chiang. Better K-best Parsing. Proceedings of the Ninth International Workshop on Parsing Technology. Parsing ’05. Vancouver, British Columbia, Canada: Association for Computational Linguistics, 2005, 53–64. 
- [JS97]
- 
                      A. K. Joshi und Y. Schabes. Tree-adjoining grammars. Handbook of Formal Languages. Hrsg. von G. Rozenberg und A. Salomaa. Bd. 3. vogler-2346: Springer-Verlag, 1997, 69–123. 
- [Kal10]
- 
                      L. Kallmeyer. Parsing beyond context-free grammars. Springer, 2010. doi: 10.1007/978-3-642-14846-0. 
- [KGS11]
- 
                      M. Kuhlmann, C. Gómez-Rodríguez und G. Satta. Dynamic Programming Algorithms for Transition-based Dependency Parsers. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. HLT ’11. Portland, Oregon: Association for Computational Linguistics, 2011, 673–682. isbn: 978-1-932432-87-9. 
- [KK11]
- 
                      A. Koller und M. Kuhlmann. A Generalized View on Parsing and Translation. Proceedings IWPT 2011. vogler-2443, 2011. 
- [KM10]
- 
                      L. Kallmeyer und W. Maier. Data-driven parsing with probabilistic linear context-free rewriting systems. 23rd International Conference on Computational Linguistics, Beijing, China. vogler-2500, 2010, 537–545. 
- [KM13]
- 
                      L. Kallmeyer und W. Maier. Data-driven parsing using probabilistic linear context-free rewriting systems. Computational Linguistics 39(1) (2013), 87–119. doi: 10.1162/COLI_a_00136. 
- [KMN09]
- 
                      S. Kübler, R. McDonald und J. Nivre. Dependency parsing. Morgan und Claypool Publishers, 2009. doi: 10.2200/S00169ED1V01Y200901HLT002. 
- [Knu68]
- 
                      D. E. Knuth. Semantics of context–free languages. Math. Systems Theory 2 (1968), 127–145. 
- [KS09]
- 
                      M. Kuhlmann und G. Satta. Treebank Grammar Techniques for Non-projective Dependency Parsing. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. EACL ’09. Athens, Greece: Association for Computational Linguistics, 2009, 478–486. 
- [LS68]
- 
                      P. M. Lewis und R. E. Stearns. Syntax-directed transduction. J. ACM 15.3 (1968), 465–488. 
- [Mal+09]
- 
                      A. Maletti, J. Graehl, M. Hopkins und K. Knight. The Power of Extended Top-down Tree Transducers. SIAM J. Comput. 39.2 (2009), 410–430. 
- [MM08]
- 
                      M.-C. de Marneffe und C. D. Manning. „Stanford typed dependencies manual“. unpublished. 2008. 
- [MMT05]
- 
                      Takuya Matsuzaki, Yusuke Miyao und Jun’ichi Tsujii. Probabilistic CFG with Latent Annotations. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. ACL ’05. Ann Arbor, Michigan: Association for Computational Linguistics, 2005, 75–82. doi: 10.3115/1219840.1219850. 
- [MS08]
- 
                      W. Maier und A. Søgaard. Tree-banks and mildly context-sensitivity. Proc. of Formal Grammar 2008. 2008, 61–76. 
- [MSM94]
- 
                      M. P. Marcus, B. Santorini und M. A. Marcinkiewicz. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19.2 (1994), 313–330. 
- [Ned03]
- 
                      M.-J. Nederhof. Weighted deductive parsing and Knuth’s algorithm. Computational Linguistics 29(1) (2003), 135–143. 
- [Niv08]
- 
                      J. Nivre. Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics 34(4) (2008), 513–553. doi: 10.1162/coli.07-056-R1-07-027. 
- [Niv09]
- 
                      J. Nivre. Non-Projective Dependency Parsing in Expected Linear Time. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Suntec, Singapore: Association for Computational Linguistics, Aug. 2009, 351–359. 
- [NV14]
- 
                      M.-J. Nederhof und H. Vogler. Hybrid grammars for discontinuous parsing. Proc. of 25th International Conference on Computational Linguistics (COLING 2014). 2014. 
- [Pet+06]
- 
                      Slav Petrov, Leon Barrett, Romain Thibaux und Dan Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Sydney, Australia: Association for Computational Linguistics, 2006, 433–440. 
- [Rou70]
- 
                      W. C. Rounds. Mappings and grammars on trees. Math. Systems Theory 4.3 (1970), 257–287. 
- [Sch90]
- 
                      Y. Schabes. Mathematical and computational aspects of lexicalized grammars. Diss. University of Pennsylvania, 1990. 
- [Sek+91]
- 
                      H. Seki, T. Matsumura, M. Fujii und T. Kasami. On multiple context-free grammars. Theoretical Computer Science 88 (1991), 191–229. 
- [Sku+97]
- 
                      W. Skut, B. Krenn, T. Brants und H. Uszkoreit. An annotation scheme for free word order languages. Fifth Conference on Applied Natural Language Processing. Washington, DC, USA, March–April. 1997, 88–95. doi: 10.3115/974557.974571. 
- [SS90]
- 
                      Stuart M. Shieber und Yves Schabes. Synchronous tree-adjoining grammars. Proc. 13th CoLing. ACL, 1990, 253–258. 
- [VWJ87]
- 
                      K. Vijay-Shanker, D. J. Weir und A. K. Joshi. Charaterizing structural descriptions produced by various grammatical formalisms. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1987, 104–111. doi: 10.3115/981175.981190. 
weiterführende Literatur: ACL Anthology – “over 40,000 papers on the study of computational linguistics and natural language processing”
 -sorted sets,
-sorted sets,  -sorted trees,
-sorted trees,  -algebras,
-algebras,  -homomorphisms,
-homomorphisms,  -term algebra
-term algebra -free (
-free ( -rules), simple
-rules), simple -free, and simple LCFRS [
-free, and simple LCFRS [