Part-of-speech tagging/syntactic parsing of emergent texts


The goal of this project is to implement a part-of-speech tagger, and to investigate the possibilities of developing a syntactic parser, that can handle emergent text, i.e. texts (or representations of texts) that are still being produced and thus change frequently. The purpose is to identify the syntactic location of, for example, pauses.


In research on language production, pauses and revisions are generally viewed as a window onto the underlying cognitive and linguistic processes. In research on written language production, these processes are captured by means of keystroke logging programs that record all keystrokes and mouse movements together with their temporal distribution. These programs generate vast amounts of data, which are time-consuming to analyse manually. A part-of-speech tagger that can handle emergent text would therefore be of great value for quantitative analyses of large language production corpora. A syntactic parser would add even more value.
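To make the notion of emergent text concrete, the following minimal sketch replays a keystroke log and records the state of the text at each long pause. The log format (a timestamp in milliseconds plus a key name), the `KeyEvent` and `replay_with_pauses` names, and the 2000 ms pause threshold are all assumptions made for illustration; real keystroke logging programs have their own formats, and cursor movements are ignored here.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    time_ms: int  # time of the keystroke (hypothetical field name)
    key: str      # a single character, or the string "BACKSPACE"


def replay_with_pauses(events, pause_threshold_ms=2000):
    """Rebuild the text from a keystroke log and note where long pauses fall.

    Returns (final_text, pauses). Each pause is a tuple
    (duration_ms, text_so_far), i.e. the emergent text at the moment
    the writer paused.
    """
    buffer = []
    pauses = []
    previous_time = None
    for event in events:
        if previous_time is not None:
            gap = event.time_ms - previous_time
            if gap >= pause_threshold_ms:
                pauses.append((gap, "".join(buffer)))
        if event.key == "BACKSPACE":
            if buffer:
                buffer.pop()  # a revision: the last character is removed
        else:
            buffer.append(event.key)
        previous_time = event.time_ms
    return "".join(buffer), pauses


# Invented example log: the writer pauses after "The ", types "car",
# deletes the "r" and types "t" instead.
log = [
    KeyEvent(0, "T"), KeyEvent(150, "h"), KeyEvent(300, "e"), KeyEvent(450, " "),
    KeyEvent(3000, "c"), KeyEvent(3150, "a"), KeyEvent(3300, "r"),
    KeyEvent(3450, "BACKSPACE"), KeyEvent(3600, "t"),
]
text, pauses = replay_with_pauses(log)
```

A tagger for emergent text would be applied to each `text_so_far` snapshot rather than only to the final text, so that the pause can be related to the word class of the surrounding material.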

Problem description

To develop an HMM tagger for emergent texts (primarily in Swedish, but English texts could also be made available).

To investigate the possibilities of implementing a discourse-based incremental parser for emergent texts and, if possible, to implement it.
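As an illustration of the first task, the sketch below shows one common way to build an HMM tagger: a bigram model estimated from a tagged corpus and decoded with the Viterbi algorithm. It is only a sketch under stated assumptions, not the design the project prescribes. The function names, the add-one smoothing, the tag labels and the three-sentence English toy corpus are all invented for the example; a real tagger would be trained on a proper Swedish or English corpus and would need a better treatment of unknown and incomplete words, which are frequent in emergent text.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions in a tagged corpus."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted words
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word.lower()] += 1
            vocabulary.add(word.lower())
            previous = tag
    return transitions, emissions, vocabulary


def log_prob(counter, item, n_outcomes):
    """Add-one smoothed log probability of item given the counts."""
    return math.log((counter[item] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions, vocabulary):
    """Return the most probable tag sequence for words."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocabulary) + 1  # one extra outcome for unknown words
    # best[i][tag] = (log probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, n_tags)
                 + log_prob(emissions[tag], words[0].lower(), n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i].lower(), n_words)
            best[i][tag] = max(
                (best[i - 1][prev][0]
                 + log_prob(transitions[prev], tag, n_tags) + emit, prev)
                for prev in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path


# Invented toy corpus, for illustration only.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(corpus)
predicted = viterbi(["the", "dog", "sleeps"], *model)
```

Because Viterbi decoding works on whatever word sequence it is given, the same function can in principle be called on each intermediate state of a text, although retagging from scratch after every change would be wasteful for long texts.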

