CLT seminar: Krasimir Angelov - Porting Penn Treebank to GF


So far GF has been used only for parsing small controlled languages, but improvements in parsing performance over the last few years have made it possible to dream about parsing open-domain, unrestricted text. In the resource libraries we already have wide-coverage grammars for many languages, but having a grammar is only part of the problem. However much we improve our grammars, it will always be possible to find syntactic constructions that they do not cover. Another problem is that adding more syntactic constructions to a grammar usually makes it more ambiguous. We need a parser that is robust and can rank analyses statistically when the grammar is ambiguous. I did some preliminary experiments in robust parsing, and the general conclusion is that we need a good treebank to use for statistical training. Since we don't want to build our own treebank, an attractive alternative is to convert an existing one.

In this talk I will present the current state of the GF port of the Penn Treebank. All parse trees in the treebank have been converted to abstract syntax trees for the English Resource Grammar. Where a syntactic construction is unknown, we simply leave a placeholder in the abstract tree. Currently we have matched 69% of the constructions with the grammar. More is possible, but it takes time.

Date: 2011-03-17 10:15 - 12:00

Location: L308, Lennart Torstenssonsgatan 8



Page updated: 2011-03-09 11:27
