Introduction to Natural Language Parsing, Winter Semester 2021/22
Current notices:
- The lecture and the exercise session take place online. The links will be announced in the OPAL course.
- For further information on coronavirus-related regulations, please consult the Corona FAQ of TU Dresden.
Introduction
To process a natural-language sentence by machine, the sentence must be represented in the computer in a suitable form. This lecture deals with representing natural-language sentences as so-called hybrid trees. It shows why a hybrid tree is a suitable data structure and which formal models can be used to transform a sentence into a hybrid tree automatically. The lecture also covers how various grammar formalisms can be obtained from a representative set of hybrid trees.
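To give a rough idea of the data structure, the following is a minimal sketch in Python. It assumes the common informal description of a hybrid tree as a tree together with a total order on some of its nodes (the nodes that carry words). The formal definition used in the course is the one in the lecture materials and may differ in detail. All names (`Node`, `position`, `sentence`, `is_continuous`) and the example phrase are illustrative assumptions, not taken from the course.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """A tree node; `position` is set only for nodes that carry a word."""
    label: str
    position: Optional[int] = None  # index in the sentence (assumed encoding of the total order)
    children: List["Node"] = field(default_factory=list)


def nodes(tree: Node) -> Iterator[Node]:
    """Yield all nodes of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def sentence(tree: Node) -> List[str]:
    """Read off the words in the order given by `position`, not by tree traversal."""
    ordered = sorted(
        (n for n in nodes(tree) if n.position is not None),
        key=lambda n: n.position,
    )
    return [n.label for n in ordered]


def is_continuous(tree: Node) -> bool:
    """True if the word positions below `tree` form a gap-free interval."""
    pos = sorted(n.position for n in nodes(tree) if n.position is not None)
    return all(b - a == 1 for a, b in zip(pos, pos[1:]))


# Hypothetical example: the verb phrase "hat schnell gearbeitet",
# where the constituent V covers "hat" and "gearbeitet" but not "schnell".
v = Node("V", children=[Node("hat", 0), Node("gearbeitet", 2)])
adv = Node("ADV", children=[Node("schnell", 1)])
vp = Node("VP", children=[v, adv])

print(sentence(vp))       # ['hat', 'schnell', 'gearbeitet']
print(is_continuous(vp))  # True
print(is_continuous(v))   # False: V is a discontinuous constituent
```

Keeping the word order separate from the tree structure is what lets a single structure describe discontinuous constituents such as `V` above, which a plain ordered tree could not express directly.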
Schedule
Lecture: Thursday, 2nd double period (DS), 9:20 to 10:50, online
Exercise session: Wednesday, 4th double period (DS), 13:00 to 14:30, online
The exercise sessions start in calendar week 42 (i.e., on 20 October).
Materials
Note: You can get an overview from the previous iteration of the course. The audio recordings can still be accessed there. Caution: these recordings are not relevant for the exam of the current iteration!
Downloading the lecture materials is restricted to the TU Dresden network. For downloads from other networks, you can use the VPN access provided by the ZIH.
Lecture
The slide set will be updated over the course of the semester.
- Lecture slides (last updated on 13 January 2022)
- Definition of hybrid tree
- Definition of hybrid grammar
- Grammar induction of LCFRS from a hybrid tree corpus
Exercises
The exercise sheets will be made available here over the course of the semester.
- Exercise sheet 1 (calendar week 42)
- Exercise sheet 2 (calendar week 43)
- Exercise sheet 3 (calendar week 44), depparse.tar.gz
- Exercise sheet 4 (calendar week 45)
- Exercise sheet 5 (calendar week 47)
- Exercise sheet 6 (calendar week 48)
- Exercise sheet 7 (calendar week 49)
- Exercise sheet 8 (calendar week 2)
- Exercise sheet 9 (calendar week 3)
- Exercise sheet 10 (calendar week 4)
Literature
- Abeillé, A., Schabes, Y., and Joshi, A.K. 1990. Using lexicalized TAGs for machine translation. Proc. 13th CoLing, University of Helsinki, Finland, 1–6.
- Arnold, A. and Dauchet, M. 1976. Bi-transductions de forêts. Proc. 3rd Int. Coll. Automata, Languages and Programming, Edinburgh University Press, 74–86.
- Büchse, M., Nederhof, M.-J., and Vogler, H. 2011. Tree parsing with synchronous tree-adjoining grammars. Proc. 12th Int. Conf. on Parsing Technologies (IWPT 2011), Association for Computational Linguistics, 14–25.
- Büchse, M., Geisler, D., Stüber, T., and Vogler, H. 2010. N-best Parsing Revisited. Proceedings of the 2010 Workshop on Applications of Tree Automata in Natural Language Processing, Association for Computational Linguistics, 46–54. [url]
- Birkhoff, G. and Lipson, J.D. 1970. Heterogeneous Algebras. Journal of Combinatorial Theory 8(1), 115–133. [doi]
- Black, E., Abney, S.P., Flickinger, D., et al. 1991. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars. Proceedings of the Workshop on Speech and Natural Language, Association for Computational Linguistics, 306–311. [doi, url]
- Boullier, P. 2000. Range concatenation grammars. Proc. of 6th Int. Workshop on Parsing Technologies (IWPT 2000). [url]
- Brainerd, W.S. 1969. Tree generating regular systems. Inform. and Control 14, 217–231. [doi]
- Brants, S., Dipper, S., Eisenberg, P., et al. 2004. TIGER: Linguistic Interpretation of a German Corpus. Res. Lang. Comput. 2(4). [doi]
- Buchholz, S. and Marsi, E. 2006. CoNLL-X Shared Task on Multilingual Dependency Parsing. Proceedings of the Tenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, 149–164. [url]
- Chiang, D. 2007. Hierarchical phrase-based translation. Computational Linguistics 33(2), 201–228.
- Deransart, P. and Maluszynski, J. 1985. Relating logic programs and attribute grammars. J. Logic Programming 2, 119–155.
- Drewes, F., Gebhardt, K., and Vogler, H. 2016. EM-Training for Weighted Aligned Hypergraph Bimorphisms. Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata, Association for Computational Linguistics, 60–69. [url]
- Engelfriet, J. and Schmidt, E.M. 1978. IO and OI. II. J. Comput. System Sci. 16, 1, 67–99.
- Fischer, M.J. 1968. Grammars with macro-like productions.
- Forst, M., Bertomeu, N., Crysmann, B., Fouvry, F., Hansen-Schirra, S., and Kordoni, V. 2004. Towards a dependency-based gold standard for German parsers. Proceedings of the 5th Workshop on Linguistically Interpreted Corpora.
- Gebhardt, K., Nederhof, M.-J., and Vogler, H. 2017. Hybrid grammars for parsing of discontinuous phrase structures and non-projective dependency structures. Computational Linguistics. [doi]
- Gebhardt, K. 2018. Generic refinement of expressive grammar formalisms with an application to discontinuous constituent parsing. Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). [url]
- Giegerich, R. 1988. Composition and evaluation of attribute coupled grammars. Acta Inform. 25, 355–423.
- Goguen, J.A., Thatcher, J.W., Wagner, E.G., and Wright, J.B. 1977. Initial algebra semantics and continuous algebras. J. ACM 24, 68–95. [doi]
- Graehl, J., Knight, K., and May, J. 2008. Training tree transducers. Computational Linguistics 34, 3, 391–427.
- Huang, L. and Chiang, D. 2005. Better K-best Parsing. Proceedings of the Ninth International Workshop on Parsing Technology, Association for Computational Linguistics, 53–64. [url]
- Joshi, A.K. and Schabes, Y. 1997. Tree-adjoining grammars. In: G. Rozenberg and A. Salomaa, eds., Handbook of Formal Languages. Springer-Verlag, 69–123.
- Kübler, S., McDonald, R., and Nivre, J. 2009. Dependency parsing. Morgan and Claypool Publishers. [doi]
- Kallmeyer, L. 2010. Parsing beyond context-free grammars. Springer. [doi]
- Kallmeyer, L. and Maier, W. 2010. Data-driven parsing with probabilistic linear context-free rewriting systems. 23rd International Conference on Computational Linguistics, Beijing, China, 537–545. [url]
- Kallmeyer, L. and Maier, W. 2013. Data-driven parsing using probabilistic linear context-free rewriting systems. Computational Linguistics 39(1), 87–119. [doi]
- Knuth, D.E. 1968. Semantics of context-free languages. Math. Systems Theory 2, 127–145.
- Koller, A. and Kuhlmann, M. 2011. A Generalized View on Parsing and Translation. Proceedings IWPT 2011. [url]
- Kuhlmann, M. and Satta, G. 2009. Treebank Grammar Techniques for Non-projective Dependency Parsing. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 478–486. [url]
- Kuhlmann, M., Gómez-Rodríguez, C., and Satta, G. 2011. Dynamic Programming Algorithms for Transition-based Dependency Parsers. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Association for Computational Linguistics, 673–682. [url]
- Lewis, P.M. and Stearns, R.E. 1968. Syntax-directed transduction. J. ACM 15, 3, 465–488.
- Maier, W. and Søgaard, A. 2008. Treebanks and mild context-sensitivity. Proc. of Formal Grammar 2008, 61–76. [url]
- Maletti, A., Graehl, J., Hopkins, M., and Knight, K. 2009. The Power of Extended Top-down Tree Transducers. SIAM J. Comput. 39, 2, 410–430.
- Marcus, M.P., Santorini, B., and Marcinkiewicz, M.A. 1994. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19, 2, 313–330. [url]
- Matsuzaki, T., Miyao, Y., and Tsujii, J. 2005. Probabilistic CFG with Latent Annotations. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 75–82. [doi, url]
- Nederhof, M.-J. 2003. Weighted deductive parsing and Knuth’s algorithm. Computational Linguistics 29(1), 135–143.
- Nederhof, M.-J. and Vogler, H. 2014. Hybrid Grammars for Discontinuous Parsing. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin City University and Association for Computational Linguistics, 1370–1381. [url]
- Nivre, J. 2008. Algorithms for Deterministic Incremental Dependency Parsing. Computational Linguistics 34(4), 513–553. [doi]
- Nivre, J. 2009. Non-Projective Dependency Parsing in Expected Linear Time. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Association for Computational Linguistics, 351–359. [url]
- Petrov, S., Barrett, L., Thibaux, R., and Klein, D. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 433–440. [url]
- Rounds, W.C. 1970. Mappings and grammars on trees. Math. Systems Theory 4, 3, 257–287.
- Schabes, Y. 1990. Mathematical and computational aspects of lexicalized grammars.
- Seki, H., Matsumura, T., Fujii, M., and Kasami, T. 1991. On multiple context-free grammars. Theoretical Computer Science 88, 191–229.
- Shieber, S.M. and Schabes, Y. 1990. Synchronous tree-adjoining grammars. Proc. 13th CoLing, ACL, 253–258.
- Skut, W., Krenn, B., Brants, T., and Uszkoreit, H. 1997. An annotation scheme for free word order languages. Fifth Conference on Applied Natural Language Processing, 88–95. [doi]
- Marneffe, M.-C. de and Manning, C.D. 2008. Stanford typed dependencies manual. Stanford University. [url]
- Bar-Hillel, Y., Perles, M., and Shamir, E. 1961. On formal properties of simple phrase structure grammars. Z. Phonetik. Sprach. Komm. 14, 143–172.
- Vijay-Shanker, K., Weir, D.J., and Joshi, A.K. 1987. Characterizing structural descriptions produced by various grammatical formalisms. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 104–111. [doi]
Contact
- Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged Heiko Vogler
  Tel.: +49 (0) 351 463-38232
- Dr. rer. nat. Richard Mörbitz
  Tel.: +49 (0) 351 463-38487