To connect the Talkamatic Dialogue Manager (TDM) to a commercially available dialogue systems infrastructure, such as Microsoft's Cortana, Nuance Mix, Wit.ai, or IBM Watson, and evaluate the resulting hybrid platform in terms of usability for app designers and end users.
Voice interfaces allow users to interact with a device without using their eyes or hands. In recent years, several commercial platforms for dialogue systems have been released by major players in the field. Most of these platforms focus on high-quality speech recognition (ASR) and natural language understanding (NLU), while the dialogue management (DM) and natural language generation (NLG) components are less developed. DM and NLG are exactly the strengths of TDM, and so integrating TDM with these commercial platforms is an attractive avenue to explore.
The overall goal is to integrate TDM with an available infrastructure in a way that (1) allows developers to build new applications as easily as possible, without "doing the same thing twice" despite working on a hybrid platform, and (2) allows users of the commercial dialogue systems platform in question full access to the advanced dialogue capabilities of TDM and, if possible, also its multilingual and multimodal features.
Staffan Larsson, FLoV, together with Talkamatic AB. Talkamatic is a university research spin-off company based in Göteborg.
To explore prosody as a communicative channel that conveys linguistic, social, and emotional meaning, and to provide a classification model of the emotional properties of speech, using multimodal information from the speech signal, e.g., information about duration, fundamental frequency, formants, and voice quality.
Emotional communicative agents rely on prosodic information to identify emotional states. Previous research using such emotional robots has demonstrated robust techniques for identifying affective intent in robot-directed speech. For example, by analysing the prosody of a person's speech, robots such as Kismet and Leonardo can determine whether they are being scolded, praised, or given an attentional bid.
Most importantly, such robots can distinguish these affective intents from neutral, indifferent speech. Nevertheless, much more work is needed to explore the potential of prosodic information in speech interaction within a computational framework. The resulting models could be included in robots and discourse agents, such as personal assistants.
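As a minimal illustration of the classification task described above, the sketch below assigns an affective-intent label to an utterance using a nearest-centroid classifier over prosodic features. The feature vectors (mean F0 in Hz, mean segment duration in seconds, F0 range in Hz) and their values are purely illustrative assumptions, not measured data from Kismet or any other system:

```python
import math

# Hypothetical prosodic feature vectors per affective-intent class:
# (mean F0 in Hz, mean segment duration in s, F0 range in Hz).
# The numbers are illustrative, not measured data.
TRAINING_DATA = {
    "praise":   [(280.0, 0.18, 120.0), (260.0, 0.20, 110.0)],
    "scolding": [(180.0, 0.10, 60.0),  (170.0, 0.12, 55.0)],
    "neutral":  [(210.0, 0.15, 40.0),  (220.0, 0.14, 45.0)],
}

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def classify(features):
    """Label an utterance by its nearest class centroid (Euclidean distance)."""
    centroids = {label: centroid(vecs) for label, vecs in TRAINING_DATA.items()}
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

print(classify((270.0, 0.19, 115.0)))  # falls closest to the "praise" centroid
```

In practice the features would be extracted from the speech signal (e.g. with a tool such as Praat) and the classifier would be trained on labelled data, but the basic pipeline — feature extraction followed by classification — is the same.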
The aims of this work include the following:
Supervisor: Charalambos (Haris) Themistocleous
To propose a methodology for constructing a wide-coverage, state-of-the-art NLI platform, and to construct a small NLI platform building on this methodology that could be extended in the future.
Natural Language Inference (NLI), roughly put, is the task of determining whether an NL hypothesis can be inferred from an NL premise. Inferential ability, according to Cooper et al. (1996), is the best way to test the semantic adequacy of NLP systems. In this context, and given the importance of NLI to computational semantics, a number of NLI platforms have been proposed over the years, the most important being the FraCaS test suite, the Recognizing Textual Entailment (RTE) challenges, and the Stanford NLI corpus (SNLI). Despite their merits, all three concentrate on specific aspects of inference, while NLI is a much more complex phenomenon. The project will concentrate on tackling the needs of a wider-coverage NLI platform, both theoretically and implementationally.
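To make the task concrete, here is a deliberately naive word-overlap baseline of the kind often used as a point of comparison on SNLI-style data. The threshold value is an illustrative assumption; a serious system would of course go far beyond lexical overlap:

```python
def overlap_score(premise, hypothesis):
    """Fraction of hypothesis tokens that also occur in the premise."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0

def predict(premise, hypothesis, threshold=0.8):
    """Naive two-way decision: 'entailment' vs 'unknown' (threshold is illustrative)."""
    return "entailment" if overlap_score(premise, hypothesis) >= threshold else "unknown"

print(predict("A man is playing a guitar on stage", "A man is playing a guitar"))
# → "entailment"
```

Such a baseline fails on exactly the phenomena a wide-coverage platform must cover (negation, quantifiers, monotonicity), which is one way to motivate the broader methodology this project aims at.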
Supervisor: Stergios Chatzikyriakidis
Type Theory with Records (TTR) is a formal semantic framework that allows representing meaning in a way closely related to action and perception. As such, we have argued, it is ideally suited as a unified knowledge representation system for situated dialogue systems. In understanding language, two kinds of events are involved: events in the world and speech events of utterances. Recognising the former as types allows us to model the sense and reference of words; recognising the latter as types allows us to model the syntactic structure of linguistic utterances.
The primary goal of the project is to explore parsing open text (which may be fragmented and incomplete, i.e. dialogue) into record-type representations, which are represented as feature structures. The task might be accomplished in several different ways: (i) exploring how shallow information extraction techniques can be used to identify entities and events in the text; (ii) adapting existing semantic parsers (e.g. the C&C tools for CCG or the MALT parser for dependency parses) and rewriting their output into the desired type representations; (iii) implementing new, independent semantic parsing techniques that return types directly. As types will represent discourse rather than isolated sentences, one would also explore different discourse referent/pronoun resolution methods and named entity identification.
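To give a feel for the target representation, the sketch below encodes a record type for "a boy hugs a dog" as a nested Python feature structure. The field labels (`x`, `c1`, `e`, ...) follow common TTR conventions, but this particular encoding is an illustrative assumption, not a fixed standard output format for the project:

```python
# A hypothetical encoding of a TTR record type as a nested feature structure
# for the utterance "a boy hugs a dog".
record_type = {
    "x":  {"type": "Ind"},                       # an individual
    "c1": {"type": "boy", "args": ["x"]},        # constraint: x is a boy
    "y":  {"type": "Ind"},
    "c2": {"type": "dog", "args": ["y"]},        # constraint: y is a dog
    "e":  {"type": "hug", "args": ["x", "y"]},   # a hugging event with agent x, patient y
}

def predicates(rtype):
    """Collect the predicate constraints (fields with arguments) from a record type."""
    return [(f["type"], tuple(f["args"])) for f in rtype.values() if "args" in f]

print(predicates(record_type))
# → [('boy', ('x',)), ('dog', ('y',)), ('hug', ('x', 'y'))]
```

A parser built in any of the three ways listed above would produce structures of roughly this shape, which later stages (e.g. discourse referent resolution) could then traverse and update.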
The next step, "(Towards a) TTR parser: from types to perception": once we have type representations of linguistic events, how can they be linked to what we perceive? See the proposal "Situated Learning Agents": http://clt.gu.se/masterthesisproposal/situated-learning-agents
Good Python programming skills, both for processing text and for working with logical formalisms.
Simon Dobnik, FLoV and other members of Dialogue Technology Lab or Centre for Linguistic Theory and Studies in Probability (CLASP).
The goal of the project is to evaluate and develop further an existing TDM interface to the Ticnet ticket booking service. The app communicates with users through text interaction in social media services, and optionally also using spoken interaction in a smartphone app.
Talkamatic have developed a rough prototype for a Ticnet application which allows written (in a terminal window) or spoken (on a smartphone) interaction. Ticnet want a more extensive prototype which communicates with users through text interaction in social media services. The prototype should be deployed with a test group and evaluated using a variety of methods, including user surveys.
The role of Talkamatic will be (1) technical support concerning TDM application development and (2) to formulate requirements and give feedback on ideas and prototypes.
The role of Ticnet will be (1) technical support concerning their APIs and services, and (2) to formulate requirements and give feedback on ideas and prototypes.
Python programming. Familiarity with other languages (C++, Java, PHP) is a plus. Familiarity with the concepts of APIs, as well as with guidelines, tools and processes for software development, is also a plus.
Ticnet is the leading marketplace in Sweden for events in sports, culture, music and entertainment. Since 2004, Ticnet has been a wholly owned subsidiary of the American company Ticketmaster. Ticnet sells about 12 million tickets per year, spread over 25,000 events, and Ticnet.se has 1,100,000 unique visitors each month.
The goal of the project is to further develop an existing API for the Talkamatic dialogue system, as well as guidelines, tools and processes for app development. The development would build on ongoing work in a couple of EU FP7 projects underway at Talkamatic, with input from Volvo Trucks.
How do we enable developers of in-vehicle apps to improve the safety of their apps using voice recognition? Talkamatic AB have developed a dialogue system for in-vehicle use. Currently, an API is being developed as part of ongoing EU projects.
Volvo Trucks are interested in the development of APIs, guidelines, tools and processes to enable developers to add safe speech interaction to their apps. The role of Volvo will be to formulate requirements and give feedback on ideas and prototypes.
Python programming. Familiarity with other languages (C++, Java) is a plus. Familiarity with the concepts of APIs, as well as with guidelines, tools and processes for software development, is also a plus.
Volvo Group Trucks Technology (VGTT) is part of the Volvo Group. The Volvo Group is one of the world's leading manufacturers of trucks, buses, construction equipment and marine and industrial engines under the leading brands Volvo, Renault Trucks, Mack, UD Trucks, Eicher, SDLG, Terex Trucks, Prevost, Nova Bus, UD Bus, Sunwin Bus and Volvo Penta. Volvo Group Trucks Technology provides Volvo Group Trucks and Business Areas with state-of-the-art research, cutting-edge engineering, product planning and purchasing services, as well as aftermarket product support.
Build a dialogue systems module which, given a grammar defining the strings that the system can understand, takes input from an open online ASR (e.g. Google) and maps it onto the (phonetically) nearest string generated by the grammar.
This thesis project will be carried out within Talkamatic's EU FP7 Alfred project.
Open ASR is available over the internet, but the results are hard to use in dialogue systems with limited language understanding capabilities. ASR output often contains errors caused by the ASR not knowing the vocabulary of the domain that the system can deal with. The task of this project is to come up with innovative and practically useful ways of mapping ASR output to the nearest sentence (or sentences) produced by a grammar.
As a resource, the student will have a Wizard-of-Oz corpus collected in EU FP7 project Alfred, containing ASR output and transcribed speech (to be mapped to nearest in-grammar sentence).
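One simple baseline for this mapping is orthographic edit distance: pick the in-grammar sentence with the smallest Levenshtein distance to the ASR hypothesis. The sketch below implements this; the grammar sentences are invented examples, not material from the Alfred corpus, and a real solution would likely work on phonetic rather than orthographic representations:

```python
def levenshtein(a, b):
    """Standard edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free if equal)
        prev = curr
    return prev[-1]

def nearest_in_grammar(asr_output, grammar_sentences):
    """Map an ASR hypothesis to the closest sentence the grammar can generate."""
    return min(grammar_sentences, key=lambda s: levenshtein(asr_output, s))

# Illustrative in-grammar sentences (not from the Alfred corpus):
grammar = ["call john", "call mary", "play music", "stop the music"]
print(nearest_in_grammar("cal jon", grammar))  # → "call john"
```

This baseline gives something to beat; the more interesting approaches hinted at in this proposal (e.g. grammar-aware or machine-translation-style models) could be evaluated against it on the Wizard-of-Oz corpus.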
Some ideas towards possible solutions (there may well be other, better ideas!):
Python, GF, machine translation/ILP/other method.
To improve the performance of dialogue systems, analyses of interactions are an important source of knowledge. When dialogue systems are deployed, interactions can be logged for later analysis. Talkamatic AB are about to begin collecting logs of dialogue system interactions, which will be available to the student.
The task is to build a toolbox for analysing logs of dialogue system interactions and presenting results from such analyses. Examples of relevant analysis features include:
Such basic analysis dimensions can then be used to detect and diagnose problems with systems which need fixing.
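A toolbox of this kind could start from very simple per-dialogue statistics. The sketch below assumes a hypothetical log format (one speaker/utterance/dialogue-act triple per turn); the actual Talkamatic log format will differ, so this is only an illustration of the analysis step:

```python
from collections import Counter

# Hypothetical log format: one (speaker, utterance, dialogue_act) triple per turn.
log = [
    ("system", "Where do you want to go?", "ask"),
    ("user",   "To Paris",                "answer"),
    ("system", "Sorry, I didn't catch that. Where do you want to go?", "repeat"),
    ("user",   "Paris",                   "answer"),
    ("system", "Booking a trip to Paris.", "confirm"),
]

def analyse(turns):
    """Basic per-dialogue statistics: turn counts and dialogue-act distribution."""
    acts = Counter(act for _, _, act in turns)
    return {
        "turns": len(turns),
        "user_turns": sum(1 for spk, _, _ in turns if spk == "user"),
        "repeats": acts["repeat"],   # re-prompts often indicate ASR/NLU problems
        "acts": dict(acts),
    }

stats = analyse(log)
print(stats["turns"], stats["repeats"])  # → 5 1
```

Aggregating such statistics over many logged dialogues (e.g. a high re-prompt rate for a particular system question) is exactly the kind of signal that points to problems needing fixing.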
Staffan Larsson, Dialogue Technology Lab, and Talkamatic AB.
Situated agents must be able to interact with the physical environment they are located in, together with their conversational partner. Such an agent receives information both from its conversational partner and from the physical world, and it must integrate the two appropriately. Furthermore, since both the world and language change from one context to another, the agent must be able to adapt to such changes and learn from new information. Embodied and situated language processing addresses challenges in natural language processing such as word sense disambiguation and the interpretation of words in discourse, and it also gives us new insights into human cognition, knowledge, meaning and its representation. Research in vision relies on information represented in natural language, for example in the form of ontologies, as this captures how humans partition and reason about the world. Conversely, gestures and sign language are languages that are expressed and interpreted as visual information.
The master's thesis could be undertaken independently or as an extension of an existing project from the Embodied and Situated Language Processing (ESLP) course. Experience with dialogue systems and good Python programming skills are a plus.
Several projects are available, subject to the approval of the potential supervisors. The main thread of the research would be how a linguistically inquisitive robot can update its representation of the world by engaging in dialogue with a human. A robot's sensory observations may be incomplete due to errors that its sensors or actuators introduce, or simply because the robot has not yet explored and mapped the entire world. Can a robot query a human about the missing knowledge linguistically, with clarification questions? A robot's view of the world is quite different from a human's. How can we find a mapping between the representations that a robot builds using its sensors and the representations that result from the human take on the world? The latter is challenging but necessary if robots and humans are to have a meaningful conversation.
Here are some suggested tasks:
A Lego robot, a miniature environment with blocks in a room
Microsoft Kinect or Microsoft robot studio, a table situation with objects
Generating route descriptions in a complex building
Grounded meaning representations
Earlier project (which this project could build on)
Simon Dobnik and other members of the Dialogue Technology Lab; for extracting ontological information also members of the Text Technology Lab
Networks and Types (Vetenskapsrådet/Swedish Research Council project VR 2013-4873) is a project led by Robin Cooper at the Department of Philosophy, Linguistics, and Theory of Science and the Centre for Language Technology at the University of Gothenburg. The project started in 2014 and will run for 3 years.
The purpose of this project is to relate types of events external to an agent (e-events) to types of events in a neural network (n-events) and to bring our work on Type Theory with Records (TTR) and Transparent Neural Networks (TNN) together in a precise way corresponding to the intuitive relationships that we have so far conjectured between them. In order to do this rigorously we will pursue three main aims.
Firstly, we will show how TTR can be used to model both e-events and n-events and an interpretation relationship between them.
Secondly, we will apply the techniques used to explore possibilities for mappings between TTR and TNN. This would provide us with a "neural interpretation" for the type theory and a "logical interpretation" for the neural nets. One reason for thinking that this is feasible is that TTR uses complex record structures for single concepts that would be represented as atoms in many standard logics. This corresponds to the fact that concepts appear to be represented in neural structure by patterns of activation rather than the activation of a single neuron. TNN is particularly suitable for this as the nets are constructed from modules which have intuitive significance.
The third aim relates to evaluating the usefulness of the mappings achieved in the first two. We will explore two related potential applications: dialogue semantics and meaning acquisition in dialogue; and perceptual reasoning and meaning acquisition by robots.
Project homepage: https://sites.google.com/site/networksandtypes/home