  Number Sense Disambiguation for Swedish - "Assign each Number a Sense"


Word Sense Disambiguation is a well-studied field in the natural language processing community, and it has produced a wide range of successful methods and software. The identification and disambiguation of numerical information in natural language text is much less studied. To the best of our knowledge, there has not yet been research in Sweden providing empirical evidence on the linguistic variation of numerical expressions. This work is therefore a good opportunity to investigate the topic, which matters for many natural language processing tasks that require an understanding of quantities, e.g. information extraction or question answering.

A numerical expression in a text is a sequence or combination of digits, possibly with operators, identifiers or mathematical symbols. Numerals in text can express a variety of senses, much as words are used in different senses. For instance, "11" can denote:

  • the age of a person "11 years of age"
  • a reference to time "11 hours"
  • a reference to a published article "see [11]"
  • a quantity "11 women"
  • a part of a phone number "011-726 11 28"
  • a frequency "11 Hz"
  • a latitude "11 degrees"
  • an area "11 km2"
  • a dose "11 mg/ml"
  • ...
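
The senses listed above can be approximated with simple context patterns. The following is a minimal sketch, not a proposed solution: the sense labels, the `RULES` table and the `tag_numbers` function are hypothetical names introduced here for illustration, and the patterns cover only the English glosses of the examples in the list. A real system for Swedish would need Swedish context words and far broader coverage. "11 degrees" is deliberately left without a rule, since the unit alone does not separate a latitude from, say, a temperature.

```python
import re

# Hypothetical sense labels with context patterns, derived only from the
# examples listed above. Order matters: more specific rules come first.
NUM = r"\d+(?:[.,]\d+)?"
RULES = [
    ("phone_number", re.compile(r"\b0\d{1,3}-\d[\d ]*\d\b")),
    ("citation", re.compile(r"\[\d+\]")),
    ("dose", re.compile(r"\b" + NUM + r"\s*mg/ml\b")),
    ("frequency", re.compile(r"\b" + NUM + r"\s*Hz\b")),
    ("area", re.compile(r"\b" + NUM + r"\s*km2\b")),
    ("age", re.compile(r"\b\d+\s+years of age\b")),
    ("time", re.compile(r"\b\d+\s+hours\b")),
    # Fallback: any number that no rule above has claimed.
    ("unknown", re.compile(NUM)),
]


def tag_numbers(text):
    """Return (expression, sense) pairs in order of appearance in text."""
    tagged = []
    taken = []  # character spans already claimed by an earlier rule
    for sense, pattern in RULES:
        for m in pattern.finditer(text):
            # Skip matches that overlap a span claimed by an earlier rule.
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append(m.span())
            tagged.append((m.start(), m.group(0), sense))
    tagged.sort()
    return [(expr, sense) for _, expr, sense in tagged]


if __name__ == "__main__":
    for pair in tag_numbers("She is 11 years of age, see [11], dose 11 mg/ml"):
        print(pair)
```

Numbers that match no context rule, such as the "11" in "11 women", come back with the `unknown` label. These ambiguous cases are exactly where hand-written rules stop and a disambiguation method has to take over.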


The purpose of this work is to study numerical information processing and to develop new algorithms, or adapt existing ones, for identifying and disambiguating numerical information in Swedish text. Depending on the background and interests of the student, the work can be given a different focus and scope, e.g. implementing a numerical information processing component from scratch, adapting available software to Swedish, or comparing the effect of different resources and module combinations for numerical processing.


As a practical application, the resulting software will be used as a supporting technology for number sense disambiguation of medical data, perhaps using the LOINC ontology.


Dimitrios Kokkinakis, PhD, Department of Swedish, and possibly others.


Native Swedish or good Swedish language skills.

Good programming skills.

Relevant Links and References

NUMEX: SPECIFIC GUIDELINES - Message Understanding Conferences MUC-6 <http://www.cs.nyu.edu/cs/faculty/grishman/NEtask20.book_17.html#HEADING44>

LOINC: Logical Observation Identifiers Names and Codes (LOINC®) Users' Guide. Clem McDonald, Stan Huff, Kathy Mercer, Jo Anna Hernandez, Daniel J. Vreeman

Definition of Sekine’s Extended Named Entity, Version 6.1.0 (English). 2003. <http://qallme.fbk.eu/SekineENE_Definition_v6.pdf>

Stuart Moore, Anna Korhonen and Sabine Buchholz. 2009. Number Sense Disambiguation. In Proceedings of the 12th Conference of the Pacific Association for Computational Linguistics. Sapporo, Japan.


Page updated: 2014-11-12 15:13

