


The 7th annual CLT workshop brings together researchers in language technology and computational linguistics from the University of Gothenburg and Chalmers. We will exchange research results and ideas, discuss the future of CLT, and (last but not least) socialize.


Thursday 30/11

  • 09.00 embark on bus at Olof Wijksgatan 6 (outside FLoV)
  • 09.15 bus leaves
  • 10.45 bus arrives at Gullmarsstrand
  • 10.45-11.15 coffee
  • 11.15-12.00 welcome + presentation session (2 talks; chair: Staffan Larsson)
    • Elena Volodina: SweLL - an upcoming infrastructure for Swedish as a Second Language
    • Dan Rosén: The SweLL normalization editor for learner texts
  • 12.00 lunch
  • 13.20-14.20 presentations (3 talks; chair: Gerlof Bouma)
    • Yuri Bizzoni and Shalom Lappin: Predicting Gradient Metaphor Paraphrase Judgments with a Composite DNN
    • Jacobo Rouces: Sentiment Analysis in Swedish
    • Simon Dobnik: KILLE: a Framework for Situated Agents for Learning Language Through Interaction
  • 14.20-14.30 poster/demo madness (chair: Robin Cooper)
  • 14.30-15.30 poster/demo session 1
  • 15.30-16.10 coffee + check-in
  • 16.10-17.10 presentations (3 talks; chair: Peter Ljunglöf)
    • Ellen Breitholtz and Chris Howes: Incremental Reasoning in Dialogue Involving Patients with Schizophrenia
    • Haris Themistocleous: Deciphering the speech signal: speakers and communities of speakers
    • Robin Cooper: Playing games with types
  • 17.30 Glögg
  • 18.00 Dinner

Friday 1/12

  • before 09.00: check out
  • 09.00-10.00 presentations (3 talks; chair: Jacobo Rouces)
    • Aarne Ranta: Developing a Mobile Translation App for Healthcare
    • Katie Fraser: Detecting cognitive impairment from speech
    • Asad Sayeed: Semantic roles and event knowledge
  • 10.00-10.10 poster/demo madness (chair: Markus Forsberg)
  • 10.10-11.20 poster/demo session 2 + coffee
  • 11.20-11.50 wrap up, discussion, planning (chair: Peter Ljunglöf)
  • 11.50 lunch
  • 13.00 bus leaves Gullmarsstrand
  • 14.30 bus arrives at Olof Wijksgatan 6

Poster/demo session 1 (chair: Robin Cooper)

  • Staffan Larsson: Approaches to compositionality for perceptual meanings
  • Ildikó Pilán: Identifying correction candidates for Swedish learners’ spelling errors
  • Markus Forsberg: Strix: A new bird at Språkbanken
  • Dana Dannélls: Second language learners acquisition of Swedish constructions – A case study
  • Inari Listenmaa: Testing GF grammars
  • Christine Howes: Feedback relevance spaces: The organisation of increments in conversation.
  • Malin Ahlberg: News from Karp - Språkbankens lexical infrastructure
  • Gerlof Bouma and Yvonne Adesam: Eukalyptus treebank of written Swedish
  • Peter Ljunglöf: Interactive correction of speech recognition errors

Poster/demo session 2 (chair: Markus Forsberg)

  • Mehdi Ghanimifard: Spatial Relations in Visually Grounded Neural Language Models
  • Richard Johansson: Introduction to the EPE shared task
  • Richard Johansson: Training Word Sense Embeddings With Lexicon-based Regularization
  • Herbert Lange: Language Learning with MUSTE
  • Prasanth Kolachina: TBD
  • Vladislav Maraev: Laughter-infused dialogue systems
  • Stergios Chatzikyriakidis: Coq for Natural Language Semantics
  • Sylvie Saget: Cooperative Speaker Revisited


  • Yvonne Adesam, Department of Swedish (Språkbanken)
  • Malin Ahlberg, Department of Swedish (Språkbanken)
  • Yuri Bizzoni, FLoV, CLASP
  • Gerlof Bouma, Department of Swedish (Språkbanken)
  • Ellen Breitholtz, FLoV, CLASP
  • Stergios Chatzikyriakidis, FLoV, CLASP
  • Robin Cooper, FLoV, CLASP
  • Dana Dannélls, Department of Swedish (Språkbanken)
  • Simon Dobnik, FLoV, CLASP
  • Markus Forsberg, Department of Swedish (Språkbanken)
  • Katie Fraser, Department of Swedish (Språkbanken)
  • Mehdi Ghanimifard, FLoV, CLASP
  • Christine Howes, FLoV, CLASP
  • Richard Johansson, CSE
  • Prasanth Kolachina, CSE
  • Herbert Lange, CSE
  • Shalom Lappin, FLoV, CLASP
  • Staffan Larsson, FLoV, CLASP
  • Inari Listenmaa, CSE
  • Peter Ljunglöf, CSE
  • Vladislav Maraev, FLoV, CLASP
  • Bengt Nordström, CSE
  • Ildikó Pilán, Department of Swedish (Språkbanken)
  • Aarne Ranta, CSE
  • Dan Rosén, Department of Swedish (Språkbanken)
  • Jacobo Rouces, Department of Swedish (Språkbanken)
  • Sylvie Saget, FLoV, CLASP
  • Asad Sayeed, FLoV, CLASP
  • Haris Themistocleous, FLoV, CLASP
  • Elena Volodina, Department of Swedish, UGOT

Date: 2017-11-30 09:00 - 2017-12-01 14:30

Location: Gullmarsstrand, Fiskebäckskil



The sixth annual Språkbanken Autumn Workshop will be held on the 17th of October. The workshop theme this year is content (semantics).

The language infrastructure of Språkbanken is freely available to all researchers. Our web-based tools give access to all kinds of texts, from historical and modern newspapers, novels and poetry to social media such as blogs and discussion forums. Use our tools to efficiently wade through billions of sentences and produce mesmerising visualisations. At our annual autumn workshop you can try the tools out! We’ll demo the new features, show you how they’re used, and get a discussion going around your particular research questions.

We will start at 13.15 with presentations featuring our research and research infrastructure and finish with some practical exercises combined with demo and poster presentations. This will be followed by a social gathering with some bubbly and snacks.

A programme is available here: https://spraakbanken.gu.se/swe/Om%20oss/hoestworkshop. Note that the workshop language is Swedish. In order to participate in the practical exercises you must bring a laptop, but this is not a requirement for participation in the workshop.

For planning purposes, we kindly ask you to register no later than 9 October if you plan to attend: https://spraakbanken.gu.se/swe/Om%20oss/hoestworkshop/registration


Date: 2016-10-17 13:15 - 18:00

Location: L100, Lennart Torstenssonsgatan 8



We will show the new version of the Swe-Clarin toolbox at an inauguration ceremony. During the day, researchers from different disciplines in digital humanities will talk about their experiences of using language data as primary research data. There will be stations where our tools are presented, with the opportunity to try them out with guidance. The evening will end with mingling and refreshments.

You can read more about the event and indicate your interest in participation here: https://sweclarin.se/eng/Inauguration_of_the_Swe-Clarin_toolbox_webform.

Date: 2016-10-07 10:00 - 20:00

Location: Ågrenska villan



Date: 2016-06-03 10:00 - 12:00

Location: room EE, Campus Johanneberg



Join us for this one-day workshop where researchers in the Gothenburg area (and guests) will share how they use machine learning to solve complex research questions in medicine, transport, biology, language technology and urban planning.

Event Website: http://bit.ly/1QXey0u

Please register here: http://doodle.com/poll/pm88pp6yvt469h97

Date: 2016-04-14 09:00 - 16:00

Location: Chalmers Johanneberg Campus, Palmstedt (Student Union Building)



Olof Mogren (Department of Computer Science and Engineering) will defend his licentiate thesis Multi-Document Summarization and Semantic Relatedness.

Automatic summarization is the process of presenting the contents of written documents in a short, comprehensive fashion. Many approaches have been proposed for this problem, some of which extract content from the input documents (extractive methods), while others generate the language in the summary based on some representation of the document contents (abstractive methods).

This thesis is concerned with extractive summarization in the multi-document setting, and we define the problem as choosing the most informative sentences from the input documents, while minimizing the redundancy in the summary. This definition calls for a way of measuring the similarity between sentences that captures as much as possible of the meaning. We present novel ways of measuring the similarity between sentences, based on neural word embeddings and sentiment analysis. We also show that combining multiple sentence similarity scores, by multiplicative aggregation, helps in the process of creating better extractive summaries.

We also discuss the use of information extraction for improving the quality of automatic summarization by providing ways of assessing the salience of information elements, as well as helping with the fluency of the output and providing the temporal dimension.

Furthermore, we present graph-based algorithms for clustering words by co-occurrence, and for summarizing short online user reviews by computing bicliques. The biclique approach provides a fast, simple algorithm for summarization in many e-commerce settings.

Tapani Raiko from Aalto University.

Thesis fulltext: http://www.cse.chalmers.se/~mogren/lic/mogren2015licentiate.pdf

Date: 2015-11-20 10:00 - 12:00

Location: ML2, Hörsalsvägen 7B, Chalmers



The fifth annual Språkbanken autumn workshop (höstworkshop) is held on Monday the 5th of October, starting at 13.15. The theme this year is historical resources and tools.

Read more about the workshop here: http://spraakbanken.gu.se/eng/Om%20oss/hoestworkshop

Date: 2015-10-05 13:15 - 19:00

Location: T307, Olof Wijksgatan 6



Jessica Villing, Department of Philosophy, Linguistics and Theory of Science, is defending her thesis "Towards Dialogue Strategies for Cognitive Workload Management".

Although it has been shown that drivers are less distracted when using speech interfaces compared to traditional interfaces, using voice control instead of manual controls does not completely solve the problem of driver distraction. The interaction with the dialogue system may itself add to the driver’s cognitive workload and may therefore be a safety issue. The main purpose of this thesis is to learn more about in-vehicle dialogue during various types of cognitive workload, and to use this knowledge to enable safe and non-distracting dialogue system interaction in vehicles. We do this by analysing a corpus of human-human in-vehicle dialogue to learn more about the dialogue strategies used by drivers and passengers during various types of workload. We discuss the types of cognitive workload that we believe are most important to consider when studying the multitasking activity of driving and interacting with a dialogue system, and suggest a method for distinguishing different types of workload by using information about the driver’s workload and driving behaviour. We found that dialogue strategies such as interruptions (in the form of silent pauses and domain switches) and resumption of unfinished discussions are used in response to the driver’s cognitive workload. These behaviours are analysed in order to find strategies for preventing, or shortening the duration of, high cognitive workload. We also indicate how these strategies can be implemented in in-vehicle dialogue systems.

Opponent: Associate Professor Andrew Kun, University of New Hampshire

Link to the dissertation: https://gupea.ub.gu.se/handle/2077/40178

Date: 2015-10-15 13:15 - 16:00

Location: Lilla Hörsalen, Humanisten, Renströmsgatan 6



In this work, I present a linguistic investigation of the language of Swedish textbooks in the natural sciences, i.e., biology, physics and chemistry. The textbooks, which are used in secondary and upper secondary school, are examined with respect to traditional readability measures, e.g., LIX, OVIX and nominal ratio. I also extract typical linguistic features of the texts, typicality being determined using a proposed quantitative method, labelled the index principle. This empirical, corpus-based method relies on automatic linguistic annotations produced by language technology tools to calculate what I call index lists, rank-ordered lists of characteristic linguistic features of specific text corpora as compared to reference texts.

I produce index lists for typical vocabulary, noun phrase structures and syntactic structures, extracted from a 5.2 million word textbook corpus, compiled as a part of the work presented. As well as being frequent and well dispersed, the linguistic variables selected for the index lists are also characteristic of the text type in question, as is evident when they are compared to a reference corpus, comprising textbooks in the social sciences and mathematics, as well as narrative and academic (university-level) texts.

The results show that textbooks in natural science contain a lot of content-specific, technical vocabulary. This characteristic not only distinguishes natural scientific language from everyday language, but also from social scientific language, which on the lexical level has more in common with narrative texts. On the other hand, the textbook language as a whole is structurally distinguishable from narrative texts, as clearly seen, e.g., in its noun phrase complexity.

In the transition between secondary and upper secondary school, the scores of almost every readability measure go up, indicating an increase in linguistic demands on the readers. In the upper secondary textbooks the words are longer, the vocabulary more varied, the noun phrases longer and more elaborate, and the most typical syntactic structures more complex. Notably, the linguistic development between the form levels is more marked in the natural-science textbooks than in those for social sciences and mathematics. Nevertheless, the textbook language overall shows a relatively low complexity in comparison to academic language.

Mats Wirén, Stockholm University

Date: 2015-12-04 13:15 - 16:00

Location: Lilla Hörsalen, Humanisten, Renströmsgatan 6



Computational analysis of historical and typological data has made great progress in the last fifteen years. In this thesis, I work with vocabulary lists for addressing some classical problems in historical linguistics such as cognate identification, discriminating related languages from unrelated languages, assigning possible dates to splits in a language family, and providing an internal structure to a language family. I compare the internal structure inferred from vocabulary lists with the family trees given in Ethnologue. I explore the ranking of lexical items in the widely used Swadesh word list and compare my ranking to another quantitative reranking method and to short word lists composed for discovering long-distance genetic relationships. I show that the choice of string similarity measures is important for internal classification and for discriminating related from unrelated languages. The dating system presented in this thesis can be used for assigning age estimates to any new language group and avoids the assumption of a constant rate of lexical replacement made by glottochronology. I train and test a linear classifier based on gap-weighted subsequence features for the purpose of cognate identification. An important conclusion from these results is that n-gram approaches can be used for different historical linguistic purposes.

Gerhard Jäger, Professor of General Linguistics, University of Tübingen

Date: 2015-11-13 13:15 - 16:00

Location: Lilla Hörsalen, Humanisten, Renströmsgatan 6