Licentiate thesis defense: Shafqat Virk


Shafqat Virk (graduate student at Applied IT) will defend his thesis for the licentiate degree: "Computational Grammar Resources for Indo-Iranian Languages"

Discussion leader: Hans Leiß


A lot of research is being carried out on different aspects of natural language processing (NLP). As a result, there are state-of-the-art machine translation systems such as Google Translate. These systems are based on data-driven (i.e. statistical) approaches, which provide huge coverage, but at the cost of limited accuracy. Knowledge-intensive (i.e. grammar-based) approaches, on the other hand, provide high-quality translations, but their coverage is limited. Generally speaking, a combined solution with human-like accuracy and coverage is still far from being reached. One major reason for this limitation is that natural languages are complex and ambiguous, which makes developing a computational grammar of a natural language a challenging task. It requires, at a minimum, comprehensive knowledge of the language, the expertise to describe it, and practical skill with a grammar formalism tool. This thesis is devoted to the development of computational grammars of four Indo-Iranian languages: Nepali, Persian, Punjabi, and Urdu. We explore different lexical and syntactic features of these languages and develop their resource grammars according to the requirements of the Grammatical Framework (GF) resource grammar API.

Grammatical Framework (GF) is a grammar formalism tool that has been used to develop grammars of a number of natural languages. So far, most of these languages belong to the Germanic, Romance, or Slavic branches of the Indo-European family of languages. Indo-Iranian, however, is the largest branch of the Indo-European family. With 310 living languages, this branch comprises 70% of all the languages in the family. Most of these Indo-Iranian languages have either very limited computational resources or none at all. This is one reason for developing resource grammars of these resource-limited languages.

Furthermore, Indo-Iranian languages have some distinctive features, such as the Ezafe construction and partial ergativity. Previously, none of these features had been implemented in GF. Another reason for this study is to explore this dimension and demonstrate the implementation of these features in GF.

Date: 2012-01-27 13:15 - 15:15

Location: Steve Jobs, floor 2, Patricia, Forskningsgången 6, Lindholmen


Page updated: 2012-01-23 13:04