
Topic models in Swedish Literature and other Collections

 

Topic modeling is a simple way to analyze large volumes of unlabeled text. A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents (Wikipedia <http://en.wikipedia.org/wiki/Topic_model>). In this sense, a "topic" is a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between the different senses of words with multiple meanings. For a general introduction to topic modeling, see, for example, Steyvers and Griffiths (2007).
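To make the idea of "clusters of words that frequently occur together" concrete, here is a minimal sketch of latent Dirichlet allocation (LDA) fitted with collapsed Gibbs sampling, the kind of inference procedure described in Steyvers and Griffiths (2007). It is a toy illustration in plain Python, not the implementation used by MALLET, Gensim or topicmodels. The function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the four-document Swedish toy corpus are all hypothetical choices for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit a toy LDA model with collapsed Gibbs sampling.

    docs: list of documents, each a list of (already tokenised) words.
    Returns (nkw, ndk, vocab): topic-word counts, document-topic counts,
    and the sorted vocabulary that indexes the columns of nkw.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]      # tokens of each topic in each document
    nkw = [[0] * V for _ in range(K)]  # times each word is assigned to each topic
    nk = [0] * K                       # total tokens assigned to each topic
    z = []                             # current topic assignment of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts ...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ... and resample its topic from the full conditional
                # p(z = t | rest) ∝ (ndk + alpha) * (nkw + beta) / (nk + V * beta).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return nkw, ndk, vocab


def top_words(nkw, vocab, n=3):
    """The n most frequent words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


# Hypothetical toy corpus: two "sea" documents and two "forest" documents.
docs = [
    ["hav", "båt", "fisk", "hav", "båt"],
    ["fisk", "hav", "båt", "fisk"],
    ["skog", "träd", "svamp", "skog"],
    ["träd", "svamp", "skog", "träd"],
]
nkw, ndk, vocab = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(nkw, vocab)):
    print("topic", k, words)
```

On a corpus this small and this clearly separated, the sampler usually ends up with one topic dominated by the sea words and one by the forest words, although the exact outcome depends on the seed and the number of iterations. Real literary or biomedical collections would also need tokenisation, stop-word removal and many more iterations, which the tools listed below handle at scale.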

Material and applications

The topic modeling resources will be applied to two kinds of textual material: i) Swedish literature collections and ii) Swedish biomedical texts. The purpose is, for example, to identify topics that rose or fell in popularity; classify text passages (cf. Jockers, 2011); visualize topics together with authors (cf. Meeks, 2011); and identify potential issues of interest for historians, literary scholars and others (cf. Yang et al., 2011).
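The first of these purposes, finding topics that rose or fell in popularity, is often approached by averaging the per-document topic proportions of a fitted model over time. The sketch below assumes each document carries a publication year and that its topic proportions have already been computed (for instance by normalising the document-topic counts of an LDA model). The function `topic_trends`, the five documents and their years are hypothetical, and a least-squares slope is just one simple way to summarise a trend.

```python
from collections import defaultdict


def topic_trends(doc_topics, years):
    """Mean topic proportion per year, plus a least-squares slope per topic.

    doc_topics: per-document topic proportions (each list sums to 1), e.g.
                the normalised rows of a fitted model's document-topic counts.
    years:      publication year of each document (assumed metadata).
    Returns (means, slopes): means[year][k] and one slope per topic k.
    """
    by_year = defaultdict(list)
    for props, year in zip(doc_topics, years):
        by_year[year].append(props)

    K = len(doc_topics[0])
    xs = sorted(by_year)
    means = {
        y: [sum(p[k] for p in by_year[y]) / len(by_year[y]) for k in range(K)]
        for y in xs
    }

    # Ordinary least-squares slope of the yearly mean against the year:
    # positive = the topic rose in popularity, negative = it fell.
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slopes = []
    for k in range(K):
        y_bar = sum(means[x][k] for x in xs) / len(xs)
        sxy = sum((x - x_bar) * (means[x][k] - y_bar) for x in xs)
        slopes.append(sxy / sxx if sxx else 0.0)
    return means, slopes


# Hypothetical proportions for five documents over two topics.
doc_topics = [[0.8, 0.2], [0.6, 0.4], [0.5, 0.5], [0.3, 0.7], [0.2, 0.8]]
years = [1850, 1850, 1880, 1910, 1910]
means, slopes = topic_trends(doc_topics, years)
print(slopes)  # topic 0 falls, topic 1 rises
```

Averaging per year before fitting keeps years with many documents from dominating the trend; with sparse data one might instead group by decade and plot the yearly means rather than fit a line.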
 

Available software to be used:

  • MALLET <http://mallet.cs.umass.edu/topics.php>
  • Gensim – Topic Modelling for Humans (Python) <http://radimrehurek.com/gensim/>
  • topicmodels (R package) <http://cran.r-project.org/web/packages/topicmodels/vignettes/topicmodels.pdf>
  • Comprehensive list of topic modeling software <http://www.cs.princeton.edu/~blei/topicmodeling.html>

Requirements

Good programming skills
Swedish as a native language is not required!
 

Supervisors

Dimitrios Kokkinakis
Richard Johansson
Mats Malm

References

Blei D.M. (2012). Probabilistic topic models. Communications of the ACM, 55(4). <http://www.cs.princeton.edu/~blei/papers/Blei2012.pdf>

Jockers M. (2011). Who's your DH Blog Mate: Match-Making the Day of DH Bloggers with Topic Modeling. Matthew L. Jockers (blog), posted 19 March 2010.

Meeks E. (2011). Comprehending the Digital Humanities. Digital Humanities Specialist (blog), posted 19 February 2011.

Steyvers M. and Griffiths T. (2007). Probabilistic Topic Models. In T. Landauer, D. McNamara, S. Dennis and W. Kintsch (eds), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum. <http://psiexp.ss.uci.edu/research/papers/SteyversGriffithsLSABookFormatted.pdf>

Yang T., Torget A. and Mihalcea R. (2011). Topic Modeling on Historical Newspapers. In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 96–104. Association for Computational Linguistics, Madison, WI.

Extensive Topic Modeling bibliography: <http://www.cs.princeton.edu/~mimno/topics.html>


Page updated: 2014-11-12 15:08
