Spelling variation in Swedish text

Goal

Dealing with spelling variation in Swedish text in order to improve lemmatization, part-of-speech tagging and parsing.

Background

Språkbanken <http://språkbanken.gu.se> annotates its online corpora with a large in-house lexical resource that doubles as a morphological analyzer, together with an off-the-shelf part-of-speech tagger and dependency parser. These tools expect the texts to use standardized spellings, although the data-driven tools (the POS tagger and the parser) can still handle out-of-vocabulary items that the morphological analyzer does not recognize.

Problem description

Many of the texts in Språkbanken nevertheless contain non-standard spellings, either because they represent a pre-standardization stage of the language (medieval and 17th-century texts) or because they are full of spelling errors and variants, as is often the case with modern blog texts. The task is to develop and implement a (partial) solution for discovering and handling spelling variation in modern texts, for which we already have sufficiently large-scale language analysis tools. Preferably, the solution should be general and extensible to other text types. The work therefore includes a good deal of linguistic analysis of lemmatizer, POS tagger and parser output.
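
As an illustration only, one simple baseline for this kind of problem is to map an unrecognized token to the closest entry in a lexicon of standard word forms, measured by edit distance. The sketch below is not Språkbanken's method; the function names, the toy lexicon and the distance threshold are all hypothetical and chosen purely for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with the usual two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def suggest_standard_form(token, lexicon, max_distance=1):
    """Return the closest lexicon entry within max_distance edits, else None.

    Hypothetical helper: a real system would also need frequency
    information and context to choose between competing candidates.
    """
    word = token.lower()
    if word in lexicon:
        return word  # already a standard form
    # Sort by (distance, entry) so that ties are broken alphabetically.
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    if scored and scored[0][0] <= max_distance:
        return scored[0][1]
    return None


# Toy lexicon of standard Swedish word forms (assumed for the example).
TOY_LEXICON = {"egentligen", "intressant", "också", "mycket"}
```

With this toy lexicon, a misspelling such as "egenligen" is one edit away from "egentligen" and would be mapped to it, whereas a chat-style variant such as "oxå" (for "också") is three edits away and is left unresolved. That gap is one reason pure edit distance can only be a partial solution for blog text.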

Recommended skills

  • Good knowledge of Swedish grammatical analysis
  • General familiarity with POS tagging and parsing
  • Good programming skills

Supervisor(s)

Lars Borin and possibly others, Språkbanken

Page updated: 2012-11-26 23:36