Temporal information in Swedish - identification, resolution, normalization and standardization

Background

Identification and resolution of temporal (and numerical) information in natural language text is important in many tasks in artificial intelligence (temporal reasoning) and natural language processing (information extraction and retrieval, Q&A).

A temporal expression in a text is a sequence of tokens (words, numbers and characters) that denotes time, that is, expresses a point in time, a duration or a frequency.

Purpose

The purpose of this work is to study temporal information processing and to develop algorithms for identifying, resolving, normalizing and standardizing temporal information in Swedish text material, using TIMEX3/TimeML (or an equivalent format).

For instance, the examples below illustrate how the TIMEX3 format is used:

  • "June 7, 2003": <TIMEX3 tid="t1" type="DATE" value="2003-06-07">June 7, 2003</TIMEX3>
  • "the dawn of 2000": <TIMEX3 tid="t2" type="DATE" value="2000" mod="START">the dawn of 2000</TIMEX3>
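
As a rough illustration of how identification and normalization might fit together for Swedish, the sketch below tags fully specified dates of the form "7 juni 2003" and wraps them in TIMEX3 markup. The function name `tag_dates`, the regular expression and the restriction to a single date pattern are assumptions made for this example; they are not part of any existing tool, and a real system would need to cover far more expression types.

```python
import re
from datetime import date

# Swedish month names, in calendar order.
SWEDISH_MONTHS = [
    "januari", "februari", "mars", "april", "maj", "juni",
    "juli", "augusti", "september", "oktober", "november", "december",
]
MONTH_NUMBER = {name: i for i, name in enumerate(SWEDISH_MONTHS, start=1)}

# Hypothetical pattern: day, month name, four-digit year, e.g. "7 juni 2003".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2}) (" + "|".join(SWEDISH_MONTHS) + r") (\d{4})\b",
    re.IGNORECASE,
)


def tag_dates(text):
    """Wrap each recognized date in a TIMEX3 element with an ISO 8601 value."""
    counter = 0

    def replace(match):
        nonlocal counter
        day, month_name, year = match.groups()
        try:
            value = date(int(year), MONTH_NUMBER[month_name.lower()], int(day))
        except ValueError:
            # Not a valid calendar date (e.g. "31 februari"): leave untouched.
            return match.group(0)
        counter += 1
        return (
            f'<TIMEX3 tid="t{counter}" type="DATE" '
            f'value="{value.isoformat()}">{match.group(0)}</TIMEX3>'
        )

    return DATE_PATTERN.sub(replace, text)


print(tag_dates("Patienten kom in den 7 juni 2003."))
```

Relative expressions ("i går", "förra veckan"), durations and frequencies are deliberately left out here; they require resolution against a reference time, which is part of the work described in this proposal.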

A more complex example can look like this:

  • "two weeks from June 7, 2003": <TIMEX3 tid="t6" type="DURATION" value="P2W" beginPoint="t61" endPoint="t62">two weeks</TIMEX3> from <TIMEX3 tid="t61" type="DATE" value="2003-06-07">June 7, 2003</TIMEX3><TIMEX3 tid="t62" type="DATE" value="2003-06-21" temporalFunction="true" anchorTimeID="t6"/>
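
The end point t62 in the example above is not stated in the text; it is derived from the begin point and the duration. A minimal sketch of that resolution step is shown below. The helper `resolve_end_point` is hypothetical, and it only handles durations expressed in whole weeks or days (`PnW`, `PnD`), which is a small subset of the ISO 8601 durations that TIMEX3 allows.

```python
import re
from datetime import date, timedelta


def resolve_end_point(begin_value, duration_value):
    """Return the ISO date reached by adding a PnW/PnD duration to a begin date."""
    match = re.fullmatch(r"P(\d+)([WD])", duration_value)
    if match is None:
        raise ValueError(f"unsupported duration: {duration_value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    delta = timedelta(weeks=amount) if unit == "W" else timedelta(days=amount)
    return (date.fromisoformat(begin_value) + delta).isoformat()


# Begin point t61 and duration t6 from the example above.
print(resolve_end_point("2003-06-07", "P2W"))  # 2003-06-21
```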

Depending on the background and interests of the student, the work can be given a different focus and scope, e.g. implementing a temporal information processing system from scratch, adapting available software to Swedish, or comparing the effect of different resources and module combinations for temporal processing.

Application

As a practical application, the resulting software will be used as a supporting technology for de-identifying temporal information in patient data. Normalized and standardized temporal occurrences in authentic text (patient history) will be used to "mask" the temporal information in the text. For instance, a text occurrence of the date "2011-12-15" will be converted to e.g. "start date + 4 months 7 days" (under the assumption that 'start date' is a relevant point in time from which a patient history started to be recorded). Note: the application will be developed on non-authentic texts, but the intention is to use the developed software on real data.
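
The masking step can be sketched as plain calendar arithmetic. In the example below, the start date 2011-08-08 is an assumption chosen so that the result matches the offset quoted above; the source does not state which start date was intended. The function names `add_months` and `mask_date`, and the convention of counting whole calendar months first and remaining days second, are likewise illustrative choices rather than a prescribed method.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add whole calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def mask_date(target, start):
    """Express target as 'start date + N months M days' relative to start."""
    if target < start:
        raise ValueError("target date precedes start date")
    months = (target.year - start.year) * 12 + (target.month - start.month)
    if add_months(start, months) > target:
        months -= 1  # the last month is not yet complete
    days = (target - add_months(start, months)).days

    parts = []
    if months:
        parts.append(f"{months} month" + ("s" if months != 1 else ""))
    if days:
        parts.append(f"{days} day" + ("s" if days != 1 else ""))
    return "start date" + (" + " + " ".join(parts) if parts else "")


# Assumed start date: 2011-08-08 (hypothetical, not given in the text).
print(mask_date(date(2011, 12, 15), date(2011, 8, 8)))
```

Because months differ in length, a month-and-day offset is only unambiguous once the counting convention is fixed; a day-only offset would be simpler but less readable in a patient history.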

Supervisors

Dimitrios Kokkinakis, PhD, Department of Swedish

Staffan Svensson, PhD, MD, specialist in clinical pharmacology

Prerequisites

Native Swedish or good Swedish language skills.

Good programming skills.

Relevant Links

TempEval Temporal Relation Identification <http://timeml.org/tempeval/>

TempEval2: Evaluating Events, Time Expressions, and Temporal Relations <http://www.timeml.org/tempeval2/>

TempEval3: Temporal Annotation <http://www.cs.york.ac.uk/semeval-2013/task1/>

TimeML: Markup Language for Temporal and Event Expressions <http://www.timeml.org/site/index.html>

TIMEX at MUC-6 <http://www.timexportal.info/timexmuc6>

Guidelines for Temporal Expression Annotation for English for TempEval 2010. <http://www.timeml.org/tempeval2/tempeval2-trial/guidelines/timex3guidelines-072009.pdf>


Page updated: 2014-11-12 15:13
