Given a number of Swedish novels taken from the Swedish Literature Bank (<http://litteraturbanken.se/#!om/inenglish>) and pre-annotated with named entities (i.e. person names with their gender: male, female or unknown), the purpose of this work is to:
i) find pronominal and other references associated with these person entities and link them to each other; and
ii) apply different visualization techniques for analyzing the characters in these novels, e.g. a network representation that makes it easy to identify possible clusters such as "communities" of people.
Goal (i) amounts to developing a (simple) coreference resolution system for Swedish, which may be rule-based, machine-learning-based or hybrid. According to Wikipedia: "co-reference occurs when multiple expressions in a sentence or document refer to the same thing; or in linguistic jargon, they have the same 'referent.' For example, in the sentence 'Mary said she would help me', 'she' and 'Mary' are most likely referring to the same person or group, in which case they are coreferent. Similarly, in 'I saw Scott yesterday. He was fishing by the lake,' Scott and he are most likely coreferent." For (ii), any available visualization software can be used; several tools exist, such as Visone, TouchGraph or Gephi.
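To make the rule-based option for (i) concrete, the following is a minimal sketch and not a proposed design. It assumes the text is already tokenized and that each annotated name carries a token position and a gender label; the `Mention` class, the `resolve_pronouns` function and the small pronoun table are hypothetical names introduced only for this illustration. Each pronoun is linked to the closest preceding name whose gender is compatible, which is a deliberately naive heuristic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: a small hand-written table of Swedish third-person
# singular pronouns and the gender they signal.
PRONOUN_GENDER = {
    "han": "male", "honom": "male", "hans": "male",
    "hon": "female", "henne": "female", "hennes": "female",
}


@dataclass
class Mention:
    """A pre-annotated person name (hypothetical structure)."""
    position: int  # index of the name in the token list
    text: str
    gender: str    # "male", "female" or "unknown"


def resolve_pronouns(tokens: List[str],
                     entities: List[Mention]) -> Dict[int, Optional[Mention]]:
    """Link each pronoun to the nearest preceding gender-compatible name.

    `entities` must be sorted by position. Names of unknown gender are
    treated as compatible with any pronoun.
    """
    links: Dict[int, Optional[Mention]] = {}
    for i, token in enumerate(tokens):
        gender = PRONOUN_GENDER.get(token.lower())
        if gender is None:
            continue  # not a pronoun in our table
        antecedent = None
        for mention in entities:
            if mention.position >= i:
                break  # only names before the pronoun are candidates
            if mention.gender in (gender, "unknown"):
                antecedent = mention  # keep the most recent match
        links[i] = antecedent
    return links


# Illustrative input: "Anna met Karl yesterday. He was fishing by the
# lake. She helped him."
tokens = "Anna träffade Karl igår . Han fiskade vid sjön . Hon hjälpte honom .".split()
entities = [Mention(0, "Anna", "female"), Mention(2, "Karl", "male")]
links = resolve_pronouns(tokens, entities)
for index, mention in links.items():
    print(tokens[index], "->", mention.text if mention else None)
```

A real system would need to handle further cases that this sketch ignores, for example definite descriptions, plural pronouns, direct speech and names of unknown gender that compete with closer candidates.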
As a practical application, the resulting software will serve as a supporting technology for literature scholars who want a bird's-eye view when analyzing literature, e.g. for social network analysis.
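As an illustration of how resolved character mentions could feed a social network view, the sketch below counts how often two characters occur in the same text segment and writes the result as a weighted edge list. The segmentation into sentences or paragraphs, the function names `cooccurrence_edges` and `edges_to_csv`, and the example data are assumptions made for this sketch. The `Source`, `Target`, `Weight` header is likewise an assumption about what a tool such as Gephi would accept on CSV import and should be checked against the chosen tool.

```python
import csv
import io
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def cooccurrence_edges(segments: Iterable[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each pair of characters, the segments in which both occur.

    Each segment is the list of character names mentioned in it (after
    coreference resolution). Pairs are stored in alphabetical order so
    that (A, B) and (B, A) are counted as the same undirected edge.
    """
    edges: Counter = Counter()
    for names in segments:
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return edges


def edges_to_csv(edges: Dict[Tuple[str, str], int]) -> str:
    """Serialize the weighted edge list as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buffer.getvalue()


# Made-up example: four segments and the characters mentioned in each.
segments = [["Anna", "Karl"], ["Karl", "Anna", "Karl"], ["Karl", "Erik"], ["Erik"]]
edges = cooccurrence_edges(segments)
print(edges_to_csv(edges))
```

Co-occurrence within a segment is only one possible definition of an interaction; dialogue-based or distance-weighted links are alternatives that may suit some novels better.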
This project deals with "name linking and visualization" in digital collections (e.g. novels). Theoretically, the project is framed around the term "distant reading" (Moretti, 2005), or "macro analysis". Distant reading means that "the reality of the text undergoes a process of deliberate reduction and abstraction". According to this view, understanding literature is accomplished not by studying individual texts but by aggregating and analyzing massive amounts of data. This makes it possible to detect otherwise hidden aspects of plots, and the structure and interactions of characters become easier to follow, enabling experimentation and exploration of new uses and developments that would otherwise be impossible. Moretti advocated the use of visual representations such as graphs, maps, and trees for literature analysis.
Some Swedish language skills; being a native speaker is probably not necessary.
Very good programming skills.
Dimitrios Kokkinakis, PhD, Department of Swedish
Richard Johansson, PhD, Department of Swedish
Mats Malm, Prof., Department of Language and Literature
Some Relevant Links
Matthew L. Jockers' website <http://www.matthewjockers.net/>
Franco Moretti. (2005). Graphs, Maps, Trees: Abstract Models for a Literary History. R. R. Donnelley & Sons.
Daniela Oelke, Dimitrios Kokkinakis, Mats Malm. (2012). Advanced Visual Analytics Methods for Literature Analysis. Proceedings of the Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH). An EACL 2012 workshop. Avignon, France. <http://demo.spraakdata.gu.se/svedk/pbl/FINAL_eacl2012-1.pdf>