
A multilingual corpus database for typological and genetic linguistics


Building a multilingual corpus database and interface for typological and genetic linguistics research


Over the last few years, linguists and computational linguists have started to explore the use of multilingual corpora (mainly parallel corpora) for typological and genetic linguistic research.

Problem description

The aims of this work are:

  • (1) to collect as many digitized Bible texts as possible and link them at the verse level;
  • (2) to apply linguistic annotation tools to those languages for which such tools are available (at least English and Swedish);
  • (3) to correlate linguistic units of varying granularity among the languages, using the linguistic annotations and freely available word alignment tools;
  • (4) to design the first version of a user interface for conducting research with the database;
  • (5) to conduct a small typological or genetic linguistic study as a showcase of the utility of the database and user interface.
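As a rough illustration of the first aim, verse-level linking could be sketched in Python as follows. Everything here is an assumption made for the example and not part of the project description: the input format (one verse per line, written as an ID such as `GEN 1:1`, a tab, then the text), the language codes, and the function names `parse_verses`, `link_verses` and `shared_verses`.

```python
from collections import defaultdict


def parse_verses(lines):
    """Parse lines of the assumed form 'GEN 1:1<TAB>text' into {verse_id: text}.

    Blank lines and lines without a tab are skipped.
    """
    verses = {}
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue
        verse_id, text = line.split("\t", 1)
        verses[verse_id.strip()] = text.strip()
    return verses


def link_verses(corpora):
    """Map each verse ID to {language: text} for every language that has the verse.

    `corpora` is a dict {language_code: {verse_id: text}}.
    """
    linked = defaultdict(dict)
    for lang, verses in corpora.items():
        for verse_id, text in verses.items():
            linked[verse_id][lang] = text
    return dict(linked)


def shared_verses(linked, languages):
    """Return the sorted verse IDs that are present in all the given languages."""
    return sorted(
        verse_id
        for verse_id, texts in linked.items()
        if all(lang in texts for lang in languages)
    )


if __name__ == "__main__":
    # Illustrative sample data only; a real corpus would be read from files.
    english = parse_verses([
        "GEN 1:1\tIn the beginning God created the heaven and the earth.",
        "GEN 1:3\tAnd God said, Let there be light: and there was light.",
    ])
    swedish = parse_verses([
        "GEN 1:1\tI begynnelsen skapade Gud himmel och jord.",
    ])
    linked = link_verses({"eng": english, "swe": swedish})
    for verse_id in shared_verses(linked, ["eng", "swe"]):
        print(verse_id, linked[verse_id])
```

Keying every translation on the same verse ID keeps the linking step independent of the number of languages, and verses missing from a translation simply do not appear under that language. Translations that number verses differently would need an extra mapping step that this sketch does not attempt.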

Recommended skills

  • Good knowledge of typological and possibly genetic linguistics
  • Very good programming skills


Lars Borin and possibly others, Språkbanken


Page updated: 2012-11-26 23:38
