Enriching IDS using Wiktionary and other multilingual resources


The IDS (Intercontinental Dictionary Series) list is a kind of “universal base vocabulary” containing about 1,500 word senses. See <http://lingweb.eva.mpg.de/ids/>, <http://spraakbanken.gu.se/eng/research/digital-areal-linguistics/word-lists> and <http://spraakbanken.gu.se/swe/sblex/resources#lwt>. The main editor of the IDS effort would like to collect IDS lists for as many languages as possible.

Problem description

This project should address the problem of using freely available multilingual resources, such as Wiktionary, to add new full or partial IDS lists to the collection. The work should include implementing a method for generating candidate IDS lists from, e.g., Wiktionary, and evaluating that method by using it to generate lists for languages that are already in the IDS collection, so that the generated lists can be compared against the existing ones.
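As a rough illustration of the two parts of the task, the following Python sketch generates candidate words per sense from a table of translation triples and scores them against an existing list. Everything in it is an assumption made for illustration: the sense IDs ("S1" and so on), the triple format, the function names and the tiny Swedish sample are hypothetical, and the extraction of translations from Wiktionary itself is not shown. Matching on the English gloss alone ignores word sense ambiguity, which a real method would have to handle.

```python
from collections import defaultdict


def generate_candidates(senses, translations, lang):
    """Map each sense ID to the set of candidate words in `lang`.

    senses:       dict of sense ID -> English gloss (hypothetical format)
    translations: iterable of (english_word, lang_code, target_word) triples,
                  assumed to have been extracted beforehand, e.g. from
                  Wiktionary translation tables
    """
    by_gloss = defaultdict(set)
    for english, code, target in translations:
        if code == lang:
            by_gloss[english.lower()].add(target.lower())
    return {sid: set(by_gloss.get(gloss.lower(), set()))
            for sid, gloss in senses.items()}


def evaluate(candidates, gold):
    """Return (precision, recall, coverage) against an existing list.

    Precision and recall are computed over (sense ID, word) pairs;
    coverage is the share of gold senses with at least one candidate.
    """
    cand_pairs = {(sid, w) for sid, words in candidates.items() for w in words}
    gold_pairs = {(sid, w.lower()) for sid, words in gold.items() for w in words}
    correct = cand_pairs & gold_pairs
    precision = len(correct) / len(cand_pairs) if cand_pairs else 0.0
    recall = len(correct) / len(gold_pairs) if gold_pairs else 0.0
    covered = sum(1 for sid in gold if candidates.get(sid))
    coverage = covered / len(gold) if gold else 0.0
    return precision, recall, coverage


# Illustrative data only: not taken from the IDS collection or Wiktionary.
senses = {"S1": "earth", "S2": "water", "S3": "head"}
translations = [
    ("earth", "sv", "jord"),
    ("earth", "sv", "mark"),
    ("water", "sv", "vatten"),
    ("water", "de", "Wasser"),
]
gold = {"S1": {"jord"}, "S2": {"vatten"}, "S3": {"huvud"}}

candidates = generate_candidates(senses, translations, "sv")
precision, recall, coverage = evaluate(candidates, gold)
```

In this toy case two of the three candidate pairs match the existing list, and one sense receives no candidate at all, which is the kind of partial list the project description allows for.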

Recommended skills

  • Fair knowledge of lexicography
  • Good programming skills


Lars Borin and possibly others, Språkbanken

2012-11-26

