Final PhD seminar: Taraka Rama – Studies in Computational historical linguistics


Computational analysis of historical and typological data has made great progress in the last fifteen years. In this thesis, we use vocabulary lists to address several classical problems in historical linguistics: discriminating related languages from unrelated ones, assigning possible dates to the splits in a language family, using structural similarity for language classification, and inferring the internal structure of a language family.

We compare the internal structure inferred from vocabulary lists with the family trees given in Ethnologue. We also explore the ranking of lexical items in the widely used Swadesh word list and compare our ranking with another quantitative reranking method and with short word lists compiled for discovering long-distance genetic relationships. We further show that the choice of string similarity measure matters both for internal classification and for discriminating related from unrelated languages. The dating system presented in this thesis can be used to assign age estimates to any new language group, and it overcomes the criticism directed at the constant rate of lexical change assumed by glottochronology. Finally, we train and test a linear classifier based on gap-weighted subsequence features for cognate identification. An important conclusion from these results is that n-gram approaches can be applied to a range of tasks in historical linguistics.
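The gap-weighted subsequence features mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes the common formulation in which every subsequence of up to `max_len` characters is weighted by λ raised to the span it covers in the word, so that contiguous matches count more than gappy ones. The function names and the parameter values `max_len=2` and `lam=0.5` are hypothetical, and the pair score (a normalized dot product, as in string subsequence kernels) is an illustrative assumption rather than the linear classifier described in the abstract.

```python
import math
from collections import defaultdict
from itertools import combinations


def gap_weighted_subsequences(word, max_len=2, lam=0.5):
    """Hypothetical helper: map each subsequence of length 1..max_len to the
    sum of lam**span over all its occurrences in `word`, where span is
    (last index - first index + 1). Larger gaps give smaller weights."""
    feats = defaultdict(float)
    for n in range(1, max_len + 1):
        # combinations() yields index tuples in increasing order, i.e. subsequences
        for idx in combinations(range(len(word)), n):
            sub = "".join(word[i] for i in idx)
            span = idx[-1] - idx[0] + 1
            feats[sub] += lam ** span
    return dict(feats)


def normalized_similarity(w1, w2, max_len=2, lam=0.5):
    """Hypothetical pair score: cosine of the two gap-weighted feature vectors
    (1.0 for identical words, 0.0 when no subsequence is shared)."""
    f1 = gap_weighted_subsequences(w1, max_len, lam)
    f2 = gap_weighted_subsequences(w2, max_len, lam)
    dot = sum(v * f2[k] for k, v in f1.items() if k in f2)
    norm = math.sqrt(sum(v * v for v in f1.values()) * sum(v * v for v in f2.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(gap_weighted_subsequences("ab"))
    print(normalized_similarity("hand", "hant"))
```

In a cognate-identification setting, features of this kind (or scores derived from them) for each word pair would be fed to a linear classifier trained on pairs labeled as cognate or non-cognate; how the thesis combines the two words' features is not specified here.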

Examiner: Jörg Tiedemann, Department of Linguistics and Philology, Uppsala University

Link to thesis: http://spraakdata.gu.se/taraka/slut_seminar_thesis.pdf

Date: 2015-05-28 13:15 - 16:00

Location: L308, Lennart Torstenssonsgatan 8


Page updated: 2015-05-28 13:15
