  • Topic models in Swedish Literature and other Collections

Topic modeling is a simple way to analyze large volumes of unlabeled text. A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents (Wikipedia <http://en.wikipedia.org/wiki/Topic_model>). A "topic" thus consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings. For a general introduction to topic modeling, see for example Steyvers and Griffiths (2007).
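
To make the idea of co-occurring word clusters concrete, here is a minimal sketch of latent Dirichlet allocation (LDA), one common topic model, fitted with collapsed Gibbs sampling in plain Python. It is an illustration only and not the method of any particular tool; the function names lda_gibbs and top_words, the hyperparameter values and the tiny English example corpus are all assumptions made for this sketch.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized weight of each topic given all other tokens.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return up to n most frequent words for each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


if __name__ == "__main__":
    # Made-up toy corpus (assumption): two "medical" and two "literary" texts.
    corpus = [
        "heart blood patient heart blood".split(),
        "patient blood treatment heart".split(),
        "novel poem author poem novel".split(),
        "author novel reader poem".split(),
    ]
    doc_topic, topic_word = lda_gibbs(corpus, num_topics=2)
    print(top_words(topic_word))
```

On real collections one would normally rely on an established implementation rather than a hand-written sampler; the sketch only shows how word co-occurrence within documents drives the assignment of words to topics.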

Material and applications

The topic modeling resources will be applied to two kinds of textual material: i) Swedish literature collections and ii) Swedish biomedical texts. The purpose is, for example, to identify topics that rose or fell in popularity; classify text passages (cf. Jockers, 2011); visualize topics with authors (cf. Meeks, 2011); and identify potential issues of interest for historians, literary scholars or others (cf. Yang et al., 2011).
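
As a rough illustration of how rising or falling topics might be traced, the sketch below averages each topic's share of a document per publication year. It assumes that some topic model has already produced per-document topic counts and that each document carries a year; the function name topic_share_by_year and the example numbers are hypothetical.

```python
def topic_share_by_year(doc_years, doc_topic):
    """Average each topic's share of a document's tokens, grouped by year.

    doc_years: one publication year per document (assumed metadata).
    doc_topic: per-document lists of topic counts from any topic model.
    """
    sums = {}
    counts = {}
    for year, row in zip(doc_years, doc_topic):
        total = sum(row)
        if total == 0:
            continue  # skip empty documents
        shares = [c / total for c in row]
        if year not in sums:
            sums[year] = [0.0] * len(row)
            counts[year] = 0
        sums[year] = [a + b for a, b in zip(sums[year], shares)]
        counts[year] += 1
    # Mean share per topic for every year, in chronological order.
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


if __name__ == "__main__":
    # Made-up counts for three documents and two topics.
    years = [1880, 1880, 1900]
    counts_per_doc = [[3, 1], [1, 3], [0, 4]]
    for year, shares in topic_share_by_year(years, counts_per_doc).items():
        print(year, shares)
```

A topic whose average share grows from one year to the next is a candidate for a topic that "rose in popularity"; whether the change is meaningful still has to be judged against the size and composition of the collection.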

Available software to be used:

  • MALLET <http://mallet.cs.umass.edu/topics.php>
  • Gensim – Topic Modelling for Humans (Python) <http://radimrehurek.com/gensim/>
  • topicmodels in R <http://cran.r-project.org/web/packages/topicmodels/vignettes/topicmodels.pdf>
  • Comprehensive list of topic modeling software <http://www.cs.princeton.edu/~blei/topicmodeling.html>


Good programming skills are required.
It is not necessary to have Swedish as your mother tongue!


Dimitrios Kokkinakis
Richard Johansson
Mats Malm


Page updated: 2014-11-12 15:08

