
CLT seminar: Prasanth Kolachina - Extraction of Wide-coverage Grammars from annotated corpora


Over the last few decades, a number of projects have hand-crafted computational grammars, attempting to build comprehensive wide-coverage grammars for natural languages using different grammatical formalisms. Such formalisms have been designed to be richer than context-free grammars in terms of their generative capacity. While efforts in defining these formalisms have contributed to grammars with detailed linguistic analysis, such grammars lack the distributional information necessary for disambiguation in tasks such as parsing. In contrast, grammars constructed with the necessary distributional information from annotated corpora such as treebanks have been shown to be effective in a wide variety of NLP applications, but are typically not linguistically interesting.

However, efforts to construct grammars from annotated corpora often rely on language-specific and annotation-specific information to extract the linguistic units of the grammar. Such annotation-specific information can be abstracted away during grammar extraction, allowing uniform extraction of grammars for multiple languages. This has been verified in the case of context-free grammars, where language-independent methods to construct grammars from corpora have been proposed over time. In my talk, I will address this issue in the context of Tree Adjoining Grammars (TAGs). TAGs, proposed by Joshi et al. (1976), have been developed for a wide range of languages and put to use in a multitude of NLP applications ranging from parsing to generation.

I propose a ‘normative’ grammar extraction procedure that extracts multilingual TAGs by separating language- and annotation-specific details from the extraction procedure. As part of this, I will address the specific problem of inducing the argument/adjunct distinction in syntactic structures without using annotation-specific details. I will present the results of my experiments on the Swedish treebank Talbanken, and show that the procedure can indeed work in an annotation-neutral manner. The results show that the extracted grammars can serve as a first-order approximation to hand-crafted grammars, which is useful in creating wide-coverage grammars.

Date: 2014-02-27 10:30 - 11:30

Location: L308, Lennart Torstenssonsgatan 8



Page updated: 2014-02-25 08:34
