

Hybrid dialogue systems platform (2016)


To connect the Talkamatic Dialogue Manager (TDM) to a commercially available dialogue systems infrastructure, such as Microsoft's Cortana, Nuance Mix, Wit.ai, or IBM Watson, and evaluate the resulting hybrid platform in terms of usability for app designers and end users.


Voice interfaces give users the possibility to interact with a device without using their eyes or hands. In recent years, several commercial platforms for dialogue systems have been released by major players in the field. In most of these platforms, the focus has been on high quality speech recognition (ASR) and natural language understanding (NLU), while the dialogue management (DM) and natural language generation (NLG) components are less developed. Fortunately, DM and NLG are exactly the strengths of TDM, and so the integration of TDM with other commercial platforms is an attractive avenue to explore.

Problem description

The overall goal is to integrate TDM with an available infrastructure in a way that (1) allows developers to build new applications as easily as possible, without the need for "doing the same thing twice" despite working on a hybrid platform, and (2) allows users of the commercial dialogue systems platform in question to get full access to the advanced dialogue capabilities of TDM, and if possible also to the multilingual and multimodal features of TDM.

  • Read up on TDM functionality and APIs
  • Read up on the (non-TDM) commercial dialogue systems platform selected
  • Draw up a plan for integration, with regard to both development and deployment requirements
  • Implement the integration
  • Evaluate the integrated platform with respect to development and deployment
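The steps above can be made concrete with a sketch of what the integration layer might look like. The code below assumes, purely for illustration, that the commercial platform delivers NLU results as an intent/entity dictionary and that TDM accepts a list of dialogue moves; neither the NLU result schema nor the move format shown here is an actual vendor or TDM API.

```python
# Hypothetical adapter between a commercial NLU service and TDM.
# The NLU result format and the TDM "move" format below are assumptions
# made for illustration; the real APIs would have to be consulted.

def to_tdm_move(nlu_result):
    """Translate an intent/entity dict into a list of TDM-style dialogue moves."""
    intent = nlu_result["intent"]
    entities = nlu_result.get("entities", {})
    if intent == "request":
        # e.g. "book a ticket to Stockholm" -> one request move plus
        # one answer move per recognised entity
        moves = [("request", nlu_result["action"])]
        moves += [("answer", (slot, value)) for slot, value in entities.items()]
        return moves
    if intent == "answer":
        return [("answer", (slot, value)) for slot, value in entities.items()]
    return [("icm", "not_understood")]  # fall back to a clarification move

example = {"intent": "request", "action": "book_ticket",
           "entities": {"destination": "Stockholm"}}
print(to_tdm_move(example))
```

A real adapter would also have to map in the other direction, from TDM's output moves to the commercial platform's response format, which is where the "doing the same thing twice" risk lies.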

Recommended background knowledge

  • Python
  • XML


Staffan Larsson, FLoV, together with Talkamatic AB. Talkamatic is a university research spin-off company based in Göteborg.

Prosody and emotion: Towards the development of an emotional agent (2016)


To explore prosody as a communicative channel that conveys linguistic, social, and emotional meanings, and to provide a classification model of the emotional properties of speech, using acoustic information from the speech signal, e.g., information about duration, fundamental frequency, formants, and voice quality.


Emotional communicative agents rely on prosodic information for the identification of emotional states. Previous research using such emotional robots has demonstrated robust techniques for identifying affective intent in robot-directed speech. For example, by analyzing the prosody of a person’s speech, robots such as Kismet and Leonardo can determine whether the robot was scolded, praised, or given an attentional bid.

Most importantly, the robot can discern these affective intents from neutral indifferent speech. Nevertheless, much more work needs to be done to explore the potentials of prosodic information in speech interaction under a computational framework. These models may potentially be included in robots and discourse agents, such as personal assistants.

Problem description

The aims of this work include the following:

  • to study the literature on prosody and emotion.
  • to identify the prosodic categories in speech corpora.
  • to train a classifier on corpora developed for this purpose and assess its performance on existing prosodic corpora.
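The classification step can be sketched as follows, assuming prosodic features (duration, fundamental frequency statistics, voice-quality measures) have already been extracted per utterance, e.g. with Praat or openSMILE. The feature values, the two emotion categories, and the class distributions below are synthetic placeholders, not real measurements.

```python
# Sketch: classifying emotional categories from prosodic features.
# Feature extraction is assumed done; the numbers below are synthetic
# placeholders chosen for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Each row: [mean F0 (Hz), F0 range (Hz), duration (s), jitter proxy]
rng = np.random.default_rng(0)
neutral = rng.normal([180, 40, 1.0, 0.01], [15, 8, 0.20, 0.003], (50, 4))
excited = rng.normal([260, 110, 0.7, 0.02], [20, 15, 0.15, 0.005], (50, 4))
X = np.vstack([neutral, excited])
y = np.array([0] * 50 + [1] * 50)  # 0 = neutral, 1 = excited

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

With real corpora the interesting questions begin here: which prosodic features generalise across speakers and corpora, and how well a model trained on one corpus performs on another.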

Recommended skills

  • Classification/Machine learning
  • Python, R

Supervisor: Charalambos (Haris) Themistocleous

Towards a state of the art platform for Natural Language Inference (2016)


To propose a methodology for constructing a wide coverage, state of the art NLI platform. To construct a small NLI platform building on this methodology that could be extended in the future.


Natural Language Inference (NLI), roughly put, is the task of determining whether an NL hypothesis can be inferred from an NL premise. Inferential ability, according to Cooper et al. (1996), is the best way to test the semantic adequacy of NLP systems. In this context, and given the importance of NLI to computational semantics, a number of NLI platforms have been proposed over the years, the most important being the FraCaS test suite, the Recognizing Textual Entailment (RTE) platforms and the Stanford NLI corpus (SNLI). Despite their merits, all three platforms concentrate on specific aspects of inference, while NLI seems to be a much more complex phenomenon. The project will concentrate on tackling the needs of a wider-coverage NLI platform, both theoretically and implementationally.

Project description

  • Learn about NLI and the three main platforms for it (FraCaS, RTE and SNLI)
  • Describe the merits as well as drawbacks of each platform from both a theoretical and practical perspective. Discuss any aspects of NLI that are not covered in these platforms
  • Propose a methodology for constructing a state of the art NLI platform that will remedy the problems associated with earlier platforms. Justify the choices made.
  • Construct a small NLI platform based on the proposed methodology that is machine readable. Discuss any potential challenges that platforms constructed using this methodology will cause to NLI systems. 
  • (optional) Implement an NLI system and evaluate against a part of your constructed test suite. Provide documentation for it.
  • (optional) Evaluate current state of the art NLI systems against part or the whole constructed NLI platform. Discuss the results, ideas for improvement as well as the prospect of hybrid systems (combining both a machine learning/deep learning component as well as a symbolic (logical) component)
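To make the "machine readable" requirement in the steps above concrete, a test-suite item might pair a premise, a hypothesis, a gold label, and the inference phenomenon involved (FraCaS-style), with a loop that scores any system against the gold labels. The item schema and the word-overlap baseline below are illustrative assumptions, not a proposed design.

```python
# Sketch: a machine-readable NLI test suite as a list of items, plus a
# simple evaluation loop. The item schema is an assumption for illustration.
import string

suite = [
    {"id": 1, "premise": "Some delegates finished the survey on time.",
     "hypothesis": "Some delegates finished the survey.",
     "label": "entailment", "phenomenon": "generalized quantifiers"},
    {"id": 2, "premise": "No delegate finished the survey.",
     "hypothesis": "Some delegate finished the survey.",
     "label": "contradiction", "phenomenon": "generalized quantifiers"},
]

def words(sentence):
    """Lowercased, punctuation-stripped token set."""
    return {w.strip(string.punctuation) for w in sentence.lower().split()}

def baseline_system(premise, hypothesis):
    """Toy word-overlap baseline: not a serious NLI system."""
    return "entailment" if words(hypothesis) <= words(premise) else "unknown"

def evaluate(system, suite):
    correct = sum(system(it["premise"], it["hypothesis"]) == it["label"]
                  for it in suite)
    return correct / len(suite)

print("baseline accuracy:", evaluate(baseline_system, suite))
```

Note how the overlap baseline gets the entailment item right but has no way to detect the contradiction, which is exactly the kind of gap a phenomenon-annotated test suite makes visible.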

Recommended skills

  • Knowledge of semantics and pragmatics
  • XML
  • Programming skills, preferably Python in case of implementation
  • Knowledge of current techniques used in Machine Learning, Deep Learning and Logical approaches in case of evaluation

Supervisor: Stergios Chatzikyriakidis

(Towards) a TTR parser: from strings of utterance events to types (2016)

Type Theory with Records (TTR) is a formal semantic framework that allows representing meaning closely related to action and perception. As such, we have argued [1], it is ideally suited as a unified knowledge representation system for situated dialogue systems. In understanding language, two kinds of events are involved: events in the world and speech events of utterances. Recognising the former as types allows us to model the sense and reference of words; recognising the latter as types allows us to model the syntactic structure of linguistic utterances [2].

The primary goal of the project is to explore parsing open text (which may be fragmented and incomplete, i.e. dialogue) into record-type representations, which are represented as feature structures. The task might be accomplished in several different ways: (i) exploring how shallow information extraction techniques can be used to identify entities and events in the text; (ii) adapting existing semantic parsers (e.g. the C&C tools for CCG or the MALT parser for dependency parsing) and rewriting their output into the desired type representations; (iii) implementing new, independent semantic parsing techniques that would return types directly. As types will represent discourse rather than isolated sentences, one could also explore different discourse referent/pronoun resolution methods and named entity identification.
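As an illustration of the target representation, a record type can be encoded as a mapping from labels to types, which is essentially a feature structure. The encoding below is a deliberate simplification for illustration (real TTR distinguishes basic types from ptypes with dependent fields, among much else); it shows roughly what a parser might emit for "a dog runs".

```python
# Sketch: a TTR-style record type as a Python dict (feature structure).
# A simplification for illustration: dependent fields are shown as
# (predicate, argument-labels) pairs rather than full ptypes.
record_type = {
    "x":     "Ind",             # an individual
    "c_dog": ("dog", ["x"]),    # a proof object: x is a dog
    "e":     ("run", ["x"]),    # a running event with agent x
}

def labels(rt):
    """Return the labels (features) of a record type."""
    return list(rt.keys())

def dependent_fields(rt):
    """Fields whose type depends on other labels in the record type."""
    return {lab: ty for lab, ty in rt.items() if isinstance(ty, tuple)}

print(labels(record_type))
print(dependent_fields(record_type))
```

Whatever parsing route (i)-(iii) is taken, the output format would be something in this spirit: labels for discourse referents and events, with dependent fields recording the constraints that hold between them.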

The next step... "(Towards a) TTR parser: from types to perception". Once we have type representations of linguistic events, how can they be linked to what we perceive? See the proposal "Situated Learning Agents": http://clt.gu.se/masterthesisproposal/situated-learning-agents

[1] http://gup.ub.gu.se/publication/190853

[2] http://gup.ub.gu.se/publication/205229

Recommended skills:

Good Python programming skills, both for processing text and for processing logic formalisms.


Simon Dobnik, FLoV and other members of Dialogue Technology Lab or Centre for Linguistic Theory and Studies in Probability (CLASP).

Ticnet dialogue agent for social media platforms


The goal of the project is to evaluate and develop further an existing TDM interface to the Ticnet ticket booking service. The app communicates with users through text interaction in social media services, and optionally also using spoken interaction in a smartphone app.

Problem description

Talkamatic have developed a rough prototype for a Ticnet application which allows written (in a terminal window) or spoken (on a smartphone) interaction. Ticnet want a more extensive prototype which communicates with users through text interaction in social media services. The prototype should be tried out by a test group and evaluated using a variety of methods, including user surveys.

The role of Talkamatic will be (1) technical support concerning TDM application development and (2) to formulate requirements and give feedback on ideas and prototypes.

The role of Ticnet will be (1) technical support concerning their APIs and services, and (2) to formulate requirements and give feedback on ideas and prototypes.

Recommended skills

Python programming. Familiarity with other languages (C++, Java, PHP) is a plus. Familiarity with the concepts of APIs, as well as guidelines, tools and processes for software development, is also a plus.


  • Staffan Larsson(FLoV, GU)
  • External supervisor from Talkamatic AB.
  • Requirements, feedback, comments from Ticnet.

Ticnet is the leading marketplace in Sweden for events in sports, culture, music and entertainment. Since 2004, Ticnet has been a wholly owned subsidiary of the American company Ticketmaster. Ticnet sells about 12 million tickets per year, spread over 25,000 events. Ticnet.se has 1,100,000 unique visitors each month.

Infrastructure for safe in-vehicle speech interaction


The goal of the project is to develop an existing API for the Talkamatic dialogue system, as well as guidelines, tools and processes for app development. The development would build on ongoing work in a couple of EU FP7 projects underway at Talkamatic, with input from Volvo Trucks.

Problem description

How do we enable app developers of in-vehicle apps to improve the safety of their apps using voice recognition? Talkamatic AB have developed a dialogue system for in-vehicle use. Currently, an API is being developed as part of ongoing EU projects.

Volvo Trucks are interested in the development of APIs, Guidelines, Tools and Processes to enable developers to add safe speech interaction to their apps. The role of Volvo will be to formulate requirements and give feedback on ideas and prototypes.

Recommended skills

Python programming. Familiarity with other languages (C++, Java) is a plus. Familiarity with the concepts of APIs, as well as guidelines, tools and processes for software development, is also a plus.


  • Staffan Larsson(FLoV, GU)
  • External supervisor from Talkamatic AB.
  • Requirements, feedback, comments: contact at Volvo Groups Truck Technology.

Volvo Group Trucks Technology (VGTT) is part of the Volvo Group. The Volvo Group is one of the world’s leading manufacturers of trucks, buses, construction equipment and marine and industrial engines under the leading brands Volvo, Renault Trucks, Mack, UD Trucks, Eicher, SDLG, Terex Trucks, Prevost, Nova Bus, UD Bus, Sunwin Bus and Volvo Penta. Volvo Group Trucks Technology provides Volvo Group Trucks and Business Areas with state-of-the-art research, cutting-edge engineering, product planning and purchasing services, as well as aftermarket product support.

Mapping open-domain ASR output to dialogue systems grammars


Build a dialogue systems module which, given a grammar defining the strings that the system can understand, takes input from an open online ASR (e.g. Google) and maps it onto the (phonetically) nearest string generated by the grammar.

This thesis project will be carried out within Talkamatic's EU FP7 Alfred project.

Problem description

Open ASR is available over the internet, but the results are hard to use with dialogue systems with limited language understanding capabilities. Often, ASR output contains errors caused by the ASR not knowing the vocabulary of the domain which the system can deal with. The task of this project is to come up with innovative and practically useful ways of mapping ASR output to the nearest sentence (or sentences) produced by a grammar.

As a resource, the student will have a Wizard-of-Oz corpus collected in EU FP7 project Alfred, containing ASR output and transcribed speech (to be mapped to nearest in-grammar sentence).

Some ideas towards possible solutions (there may well be other, better ideas!):

  • See it as a machine translation problem?
  • Store memory of human corrections, cached as FSTs for quick application?
  • Text simplification algorithms using Integer Linear Programming
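A simple baseline against which the ideas above could be compared: assuming the grammar's coverage can be enumerated as a finite list of sentences (which holds for small application grammars), pick the in-grammar sentence with the smallest edit distance to the ASR hypothesis. This sketch uses word-level orthographic distance; the phonetic nearness mentioned in the goal would additionally need a pronunciation lexicon.

```python
# Sketch: map an ASR hypothesis to the nearest in-grammar sentence by
# word-level Levenshtein distance. A real system would likely use phonetic
# distance (via a pronunciation lexicon) rather than orthographic distance.

def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution
        prev = cur
    return prev[-1]

def nearest_in_grammar(asr_output, grammar_sentences):
    """Return the grammar sentence closest to the ASR hypothesis."""
    asr = asr_output.lower().split()
    return min(grammar_sentences,
               key=lambda s: edit_distance(asr, s.lower().split()))

grammar = ["turn on the radio", "turn off the radio", "call home"]
print(nearest_in_grammar("call hom", grammar))  # -> "call home"
```

The Wizard-of-Oz corpus would then serve both as a tuning set (e.g. weighting substitution costs by acoustic confusability) and as an evaluation set for comparing this baseline against the machine translation or ILP approaches.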

Recommended skills

Python, GF, machine translation/ILP/other method.


  • Staffan Larsson, Christos Koniaris (FLoV, GU)
  • External supervisor from Talkamatic AB.

Analytics tools for dialogue systems


To improve the performance of dialogue systems, analyses of interactions are an important source of knowledge. When dialogue systems are deployed, the interaction can be logged for later analysis. Talkamatic AB are about to begin collection of logs resulting from dialogue system interactions, which will be available to the student.


The task is to build a toolbox for analysing logs of dialogue system interactions and presenting results from such analyses. Examples of relevant analysis features include:

  • speech recognition error rate (requires transcribing user utterances)
  • task completion time
  • number of turns
  • dialogue complexity (e.g. number of subdialogues, degree of subdialogue embedding)
  • number of grounding subdialogues
  • estimated task success (measured by observing behaviour after system response)

Such basic analysis dimensions can then be used to detect and diagnose problems with systems which need fixing.
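Since the Talkamatic log format is not yet fixed, the toolbox can be sketched against an assumed schema. Here each log entry is a timestamped turn with a speaker field, from which turn counts and task completion time follow directly; the schema and the example dialogue are hypothetical, for illustration only.

```python
# Sketch: computing basic metrics from a dialogue log. The log schema
# (a list of timestamped turns) is a hypothetical format for illustration;
# the real TDM log format would dictate the parsing code.
from datetime import datetime

log = [
    {"time": "2016-03-01T10:00:00", "speaker": "user",   "utterance": "book a ticket"},
    {"time": "2016-03-01T10:00:03", "speaker": "system", "utterance": "to where?"},
    {"time": "2016-03-01T10:00:06", "speaker": "user",   "utterance": "stockholm"},
    {"time": "2016-03-01T10:00:08", "speaker": "system", "utterance": "booked."},
]

def number_of_turns(log):
    return len(log)

def task_completion_seconds(log):
    fmt = "%Y-%m-%dT%H:%M:%S"
    start = datetime.strptime(log[0]["time"], fmt)
    end = datetime.strptime(log[-1]["time"], fmt)
    return (end - start).total_seconds()

print(number_of_turns(log), task_completion_seconds(log))
```

The more interesting metrics in the list (dialogue complexity, grounding subdialogues, estimated task success) would need the log to record dialogue-state information as well, which is itself a design question for the thesis.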


Staffan Larsson, Dialogue Technology Lab, and Talkamatic AB.

Situated learning agents (2016)

Situated agents must be able to interact with the physical environment that they are located in, together with their conversational partner. Such an agent receives information both from its conversational partner and from the physical world, and it must integrate the two appropriately. Furthermore, since both the world and the language change from one context to another, it must be able to adapt to such changes or to learn from new information. Embodied and situated language processing tries to solve challenges in natural language processing such as word sense disambiguation and the interpretation of words in discourse, and it also gives us new insights about human cognition, knowledge, meaning and its representation. Research in vision relies on information represented in natural language, for example in the form of ontologies, as this captures how humans partition and reason about the world. Conversely, gestures and sign language are languages that are expressed and interpreted as visual information.

The masters thesis could be undertaken independently or as an extension of an existing project from the Embodied and Situated Language Processing (ESLP) course. Experience with dialogue systems and good Python programming skills is a plus.

Several projects are available, subject to the approval of the potential supervisors. The main thread of the research would be how a linguistically inquisitive robot can update its representation of the world by engaging in dialogue with a human. The sensory observations of a robot may be incomplete due to errors that the robot's sensors or actuators introduce, or simply because the robot has not explored and mapped the entire world yet. Can a robot query a human about the missing knowledge linguistically, with clarification questions? A robot's view of the world is quite different from that of a human. How can we find a mapping between the representations that a robot builds using its sensors and the representations that result from the human take on the world? The latter is challenging but necessary if robots and humans are to have a meaningful conversation.

Here are some suggested tasks:

A Lego robot, a miniature environment with blocks in a room

  • Online linguistic annotation of objects and situations that a robot discovers "Please tell me. What is this? And this?"
  • The ability to reason about the discovered objects (i.e. creating more complex propositions from simple ones) using some background knowledge "Aha, this is a chair... so I would expect to find a table here as well."
  • Extracting the ontological information used in the previous task from external text resources (e.g. Wikipedia).

Microsoft Kinect or Microsoft robot studio, a table situation with objects

  • Learning of spatial relations between objects on the table in interaction with humans (using, for example, the Attentional Vector Sum Model of Regier and Carlson)
  • Integrating and testing the effects of adding non-spatial features (the influence of dialogue and the knowledge about the objects) in the learning model.

Generating route descriptions in a complex building

  • How to generate route descriptions that provide the right kind of information so that a person finds the objects or location referred to?
  • Using a map of a complicated building (DG4) and a representation of salient features in the building, build a computational model that would generate such descriptions.
  • Connect that system with a dialogue system and explore the interaction of referring expressions with the structure and the content of dialogue.

Grounded meaning representations

  • Work towards a novel model of grounded meaning representations and validate it in an experiment such as that of Roy (2002) and others
  • How can information from vector space models be integrated with perceptual information?
  • What are good and effective models of information fusion: interaction between different dimensions of meaning, for example, how to incorporate world knowledge with perceptional meaning to deal with spatial cognition cases described in the work by Coventry and our own work

Earlier project (which this project could build on)


Simon Dobnik and other members of the Dialogue Technology Lab; for extracting ontological information also members of the Text Technology Lab

Networks and Types

Networks and Types (Vetenskapsrådet/Swedish Research Council project VR 2013-4873) is a project led by Robin Cooper at the Department of Philosophy, Linguistics, and Theory of Science and the Centre for Language Technology at the University of Gothenburg. The project started in 2014 and will run for 3 years.

The purpose of this project is to relate types of events external to an agent (e-events) to types of events in a neural network (n-events) and to bring our work on Type Theory with Records (TTR) and Transparent Neural Networks (TNN) together in a precise way corresponding to the intuitive relationships that we have so far conjectured between them. In order to do this rigorously we will pursue three main aims.

Firstly, we will show how TTR can be used to model both e-events and n-events and an interpretation relationship between them.

Secondly, we will use these techniques to explore possibilities for mappings between TTR and TNN. This would provide us with a "neural interpretation" for the type theory and a "logical interpretation" for the neural nets. One reason for thinking that this is feasible is that TTR uses complex record structures for single concepts that would be represented as atoms in many standard logics. This corresponds to the fact that concepts appear to be represented in neural structure by patterns of activation rather than the activation of a single neuron. TNN is particularly suitable for this as the nets are constructed from modules which have intuitive significance.

The third aim relates to evaluating the usefulness of the mappings achieved in the first two: We will explore two related potential applications: dialogue semantics and meaning acquisition in dialogue; perceptual reasoning and meaning acquisition by robots.

Project homepage: https://sites.google.com/site/networksandtypes/home