
Ticnet dialogue agent for social media platforms


The goal of the project is to evaluate and further develop an existing TDM interface to the Ticnet ticket booking service. The app communicates with users through text interaction in social media services, and optionally also through spoken interaction in a smartphone app.

Problem description

Talkamatic have developed a rough prototype for a Ticnet application which allows written (in a terminal window) or spoken (on a smartphone) interaction. Ticnet want a more extensive prototype which communicates with users through text interaction in social media services. The prototype should be deployed to a test group and evaluated using a variety of methods, including user surveys.

The role of Talkamatic will be (1) technical support concerning TDM application development and (2) to formulate requirements and give feedback on ideas and prototypes.

The role of Ticnet will be (1) technical support concerning their APIs and services, and (2) to formulate requirements and give feedback on ideas and prototypes.

Recommended skills

Python programming. Familiarity with other languages (C++, Java, PHP) is a plus. Familiarity with the concepts of APIs as well as guidelines, tools and processes for software development is also a plus.


  • Staffan Larsson (FLoV, GU)
  • External supervisor from Talkamatic AB.
  • Requirements, feedback, comments from Ticnet.

Ticnet is the leading marketplace in Sweden for events in sports, culture, music and entertainment. Since 2004, Ticnet has been a wholly owned subsidiary of the American company Ticketmaster. Ticnet handles about 12 million tickets per year across 25,000 events. Ticnet.se has 1,100,000 unique visitors each month.

Infrastructure for safe in-vehicle speech interaction


The goal of the project is to develop an existing API for the Talkamatic dialogue system, as well as guidelines, tools and processes for app development. The development would build on ongoing work in a couple of EU FP7 projects underway at Talkamatic, with input from Volvo Trucks.

Problem description

How do we enable app developers of in-vehicle apps to improve the safety of their apps using voice recognition? Talkamatic AB have developed a dialogue system for in-vehicle use. Currently, an API is being developed as part of ongoing EU projects.

Volvo Trucks are interested in the development of APIs, Guidelines, Tools and Processes to enable developers to add safe speech interaction to their apps. The role of Volvo will be to formulate requirements and give feedback on ideas and prototypes.

Recommended skills

Python programming. Familiarity with other languages (C++, Java) is a plus. Familiarity with the concepts of APIs as well as guidelines, tools and processes for software development is also a plus.


  • Staffan Larsson (FLoV, GU)
  • External supervisor from Talkamatic AB.
  • Requirements, feedback, comments: contact at Volvo Group Trucks Technology.

Volvo Group Trucks Technology (VGTT) is part of the Volvo Group. The Volvo Group is one of the world’s leading manufacturers of trucks, buses, construction equipment and marine and industrial engines under the leading brands Volvo, Renault Trucks, Mack, UD Trucks, Eicher, SDLG, Terex Trucks, Prevost, Nova Bus, UD Bus, Sunwin Bus and Volvo Penta. Volvo Group Trucks Technology provides Volvo Group Trucks and Business Areas with state-of-the-art research, cutting-edge engineering, product planning and purchasing services, as well as aftermarket product support.

Mapping open-domain ASR output to dialogue systems grammars


Build a dialogue systems module which, given a grammar defining the strings that the system can understand, takes input from an open online ASR (e.g. Google) and maps it onto the (phonetically) nearest string generated by the grammar.

This thesis project will be carried out within Talkamatic's EU FP7 Alfred project.

Problem description

Open ASR is available over the internet, but the results are hard to use with dialogue systems with limited language understanding capabilities. Often, ASR output contains errors caused by the ASR not knowing the vocabulary of the domain which the system can deal with. The task of this project is to come up with innovative and practically useful ways of mapping ASR output to the nearest sentence (or sentences) produced by a grammar.

As a resource, the student will have a Wizard-of-Oz corpus collected in EU FP7 project Alfred, containing ASR output and transcribed speech (to be mapped to nearest in-grammar sentence).

Some ideas towards possible solutions (there may well be other, better ideas!):

  • See it as a machine translation problem?
  • Store memory of human corrections, cached as FSTs for quick application?
  • Text simplification algorithms using Integer Linear Programming
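As a point of reference for the ideas above, the simplest possible baseline can be sketched as a nearest-neighbour search under edit distance. This is only an illustration under strong assumptions: the grammar is taken to generate a small finite set of sentences (the hypothetical list `GRAMMAR_SENTENCES` below stands in for strings enumerated from a GF grammar), and character-level Levenshtein distance is used as a crude stand-in for phonetic distance.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_in_grammar(asr_output: str,
                       grammar_sentences: Iterable[str]) -> Tuple[str, int]:
    """Return the in-grammar sentence closest to the ASR output, with its distance."""
    hypothesis = asr_output.lower()
    return min(((s, edit_distance(hypothesis, s.lower())) for s in grammar_sentences),
               key=lambda pair: pair[1])


# Hypothetical in-grammar sentences, invented for illustration only.
GRAMMAR_SENTENCES = [
    "book two tickets",
    "cancel my booking",
    "show events in gothenburg",
]

print(nearest_in_grammar("look to tickets", GRAMMAR_SENTENCES))
```

A real solution would need a proper phonetic representation and a way to search grammars that generate very large or infinite sets of strings, which is where the ideas listed above come in.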

Recommended skills

Python, GF, machine translation/ILP/other method.


  • Staffan Larsson, Christos Koniaris (FLoV, GU)
  • External supervisor from Talkamatic AB.

Sub-corpus topic modeling and Swedish literature

The goal of the Master's thesis is to: i) process a large Swedish text collection; ii) experiment with and apply topic modeling, and then sub-corpus topic modeling (following the description by Tangherlini & Leonard, 2013); iii) adapt or create a visual, web-based environment for exploring the results. The visualization can be done in various ways, preferably as a) network graphs (Smith et al., 2014; see for instance figure 1), integrated into b) a web-based exploratory environment such as a dashboard (see figure 2).

For more see the link below:


Analytics tools for dialogue systems


To improve the performance of dialogue systems, analyses of interactions are an important source of knowledge. When dialogue systems are deployed, the interaction can be logged for later analysis. Talkamatic AB are about to begin collection of logs resulting from dialogue system interactions, which will be available to the student.


The task is to build a toolbox for analysing logs of dialogue system interactions and presenting results from such analyses. Examples of relevant analysis features include:

  • speech recognition error % (requires transcribing user utterances)
  • task completion time
  • number of turns
  • dialogue complexity (e.g. number of subdialogues, degree of subdialogue embedding)
  • number of grounding subdialogues
  • estimated task success (measured by observing behaviour after system response)

Such basic analysis dimensions can then be used to detect and diagnose problems with systems which need fixing.
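A few of the measures listed above can be sketched in a handful of lines. The log format used here is purely an assumption made for illustration (a list of turns with hypothetical `speaker`, `time`, `asr` and `transcript` fields); the actual Talkamatic log format is not described in this proposal.

```python
from typing import Dict, List


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance between a transcript and the ASR output."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def summarize(log: List[Dict]) -> Dict[str, float]:
    """Compute number of turns, task completion time and word error rate."""
    user_turns = [t for t in log if t["speaker"] == "U" and "transcript" in t]
    errors = sum(word_errors(t["transcript"], t["asr"]) for t in user_turns)
    ref_words = sum(len(t["transcript"].split()) for t in user_turns)
    return {
        "turns": len(log),
        "completion_time": log[-1]["time"] - log[0]["time"],
        "word_error_rate": errors / ref_words if ref_words else 0.0,
    }


# Hypothetical log of one dialogue; times are seconds from the start.
LOG = [
    {"speaker": "S", "time": 0.0, "text": "What do you want to do?"},
    {"speaker": "U", "time": 2.5, "asr": "book to tickets",
     "transcript": "book two tickets"},
    {"speaker": "S", "time": 4.0, "text": "Which event?"},
    {"speaker": "U", "time": 7.0, "asr": "the concert",
     "transcript": "the concert"},
]

print(summarize(LOG))
```

Measures such as dialogue complexity or estimated task success would need richer annotations in the logs than this toy format provides.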


Staffan Larsson, Dialogue Technology Lab, and Talkamatic AB.

A poor man's approach to integrating POS-tagging and parsing.


In the by now traditional NLP processing setup, part-of-speech tagging and syntactic parsing are separate, ordered tasks. A sentence is first POS-tagged, after which the results are used as the input for parsing. This model is convenient because it allows one to use a more efficient technique for the "simpler" task of POS-tagging, and it helps to keep the search space of the expensive parsing task down.

On the downside, however, we note that a POS-tagger is missing out on possibly beneficial syntactic information -- POS-tagging precedes parsing and therefore syntactic information cannot be used to choose between alternative tag sequences. In turn, we can expect parsing to suffer from a resulting decreased accuracy in POS-tagging.

Indeed, in PCFG-based parsing, parsing and POS-tagging have long been one and the same processing step. More recently, in the data-driven dependency parsing literature, algorithms for combined parsing and POS-tagging have been proposed, and they have been shown to lead to improved results.

In this project, you will investigate a simpler, more general approach to integrating POS-tagging and parsing, by letting the POS-tagger and the parser entertain multiple hypotheses about the analysis of a sentence, from which the best analysis can then be chosen. This way one can achieve a free flow of information between the two processes -- hopefully improving accuracy -- without having to radically change the NLP setup (POS-tagging still precedes parsing). Existing tools can be used with very little alteration, which makes it a poor man's solution.
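One possible realization of this idea, given here only as a minimal sketch, is to let the tagger return its k best tag sequences with scores, parse each of them, and pick the combination with the highest total score. Everything concrete below is an assumption for illustration: the function `best_joint_analysis`, the `weight` parameter, the toy parser and the scores are all invented, and a real setup would plug in an existing tagger and parser.

```python
from typing import Callable, List, Sequence, Tuple

TagSeq = Tuple[str, ...]
# A parser is modelled as a function from words and tags to (tree, log-score).
Parser = Callable[[Sequence[str], TagSeq], Tuple[object, float]]


def best_joint_analysis(words: Sequence[str],
                        tag_hypotheses: List[Tuple[TagSeq, float]],
                        parse: Parser,
                        weight: float = 1.0) -> Tuple[float, TagSeq, object]:
    """Pick the tag sequence and parse with the highest combined score.

    tag_hypotheses holds the tagger's k-best tag sequences with log-scores;
    weight balances the parser score against the tagger score.
    """
    candidates = []
    for tags, tag_score in tag_hypotheses:
        tree, parse_score = parse(words, tags)
        candidates.append((tag_score + weight * parse_score, tags, tree))
    return max(candidates, key=lambda c: c[0])


def toy_parse(words: Sequence[str], tags: TagSeq) -> Tuple[object, float]:
    """Hypothetical parser: prefers a noun subject followed by a verb."""
    if tags == ("NN", "VB"):
        return [2, 0], -0.3   # heads: word 1 depends on word 2, word 2 is root
    return [0, 1], -2.0


# The tagger slightly prefers VB NN, but the parser score overturns that choice.
hypotheses = [(("VB", "NN"), -1.0), (("NN", "VB"), -1.2)]
score, tags, tree = best_joint_analysis(["time", "flies"], hypotheses, toy_parse)
print(tags)
```

How many hypotheses to keep at each stage and how to weight the two scores are exactly the kind of design choices the project would need to evaluate.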

As part of the project, you will investigate, design and implement different ways of realizing the setup sketched above, and present experiments showing the impact of your choices on analysis accuracy and efficiency.

This MA project combines theoretical aspects (literature study and design of a realization of the system outlined above), implementation, and empirical study (evaluation of the system).

Programming skills and an NLP background are a prerequisite. Knowledge of statistical methods is a big plus, as is affinity with the linguistic side of processing, as this will allow you to do more insightful error analysis. Since the development material will be Swedish text, some passive knowledge of the Swedish language is assumed.

The project would be supervised by Gerlof Bouma and Richard Johansson, Yvonne Adesam or possibly others at Språkbanken.

Building a sentiment lexicon for Swedish

The goal of this project is the semi-automatic construction of a sentiment lexicon for Swedish. For more information see link below.


Adding valency information to a dependency parser

The goal of this project is to improve a Swedish dependency parser by integrating a valency lexicon. For more information see link below.


Part-of-speech tagging/syntactic parsing of emergent texts

The goal of this project is to implement a part-of-speech tagger and investigate the possibilities of developing a syntactic parser that could handle emergent text, i.e. texts (or representations of texts) that are being produced and thus frequently changed, in order to identify the syntactic location of, for example, pauses. For more information see link below.


Collocations for learners of Swedish



Generate a list of collocations, phrasal verbs, set phrases and idioms important for learners of Swedish, linked to proficiency levels, for use in Lärka.


The application Lärka, www.spraakbanken.gu.se/larka, currently under development, is intended for computer-assisted language learning of L2 Swedish. Lärka generates a number of exercises based on corpora available through Korp, one of them focusing on vocabulary. It has been mentioned on several occasions that we should include multi-word expressions in our exercise generator. This also complies with the CEFR “can-do” statements at different levels of proficiency (http://www.coe.int/t/dg4/linguistic/Source/Framework_en.pdf). It is, however, a non-trivial task to identify the items that should be included in the curriculum, and even more uncertain how the selected items can be assigned to different proficiency levels.

Problem description

The aims of this work are the following:

  • to study literature on collocations etc. in general and in the L2 context especially, paying special attention to the CEFR guidelines; to make an overview of the practices for training collocations etc. used in other applications and in (online) dictionaries/lexicons
  • to generate a list of collocations, (primarily) by automatic analysis of COCTAILL, a corpus of coursebook texts used for teaching Swedish. A study of materials available outside COCTAILL, e.g. books written by Anna Hallström and multi-word expressions in Saldo and Lexin, may also prove beneficial; the challenge, however, would be to define at which level these items should be introduced. To get some inspiration, have a look at English Vocabulary Profile: http://vocabulary.englishprofile.org/staticfiles/about.html (user: englishprofile, password: vocabulary)
  • (potentially) to implement one or more of the suggested exercise formats as web services + user interface in Lärka
  • to evaluate/test on users (language learners, teachers, linguists, etc.)
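The automatic analysis mentioned in the aims could start from a standard association measure. The sketch below ranks adjacent word pairs by pointwise mutual information (PMI); it is one common technique, not a method prescribed by this proposal, and the function name `pmi_bigrams`, the `min_count` threshold and the toy sentences are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def pmi_bigrams(sentences: Sequence[Sequence[str]],
                min_count: int = 2) -> List[Tuple[Tuple[str, str], float]]:
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # rare pairs give unreliable PMI estimates
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy tokenized sentences, invented for illustration (not taken from COCTAILL).
SENTENCES = [
    ["jag", "tycker", "om", "kaffe"],
    ["hon", "tycker", "om", "te"],
    ["jag", "dricker", "kaffe"],
]

for pair, score in pmi_bigrams(SENTENCES):
    print(pair, round(score, 2))
```

Assigning the extracted items to proficiency levels is a separate problem that this sketch does not address.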

Recommended skills:

  • Python
  • interest in Lexical Semantics and Second Language Acquisition


  • Elena Volodina
  • potentially others from Språkbanken/FLOV