
Medication extraction from "Dirty data"


This project deals with spelling variation in Swedish medical texts, specifically in the names of drugs and related information, with the aim of improving indexing and aggregation. Extracting medication-related information is an important task in the biomedical domain, but drug vocabularies cannot be updated fast enough to keep pace with drug development. Several methods can be used, e.g. combining internal clues (the form of the word itself) with contextual clues (the words around it).
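As a rough illustration of the idea above, the sketch below matches possibly misspelled tokens against a small drug lexicon and checks whether a dosage expression follows. It is only a sketch under stated assumptions: the lexicon, the dosage pattern, the similarity cutoff and the function names `normalize` and `extract_medications` are all hypothetical, and the string similarity from Python's standard `difflib` module stands in for whatever method the project would actually adopt.

```python
import re
from difflib import get_close_matches

# Hypothetical mini-lexicon; a real system would load a full drug vocabulary.
LEXICON = ["alvedon", "ipren", "citalopram", "sertralin", "omeprazol"]

# Contextual clue (assumed pattern): a dosage expression such as "500 mg".
DOSAGE = re.compile(r"\d+(?:[.,]\d+)?\s*(?:mg|g|ml|tabletter|tabl)\b", re.IGNORECASE)

# Alphabetic tokens, including the Swedish letters å, ä and ö.
TOKEN = re.compile(r"[A-Za-zÅÄÖåäö]+")


def normalize(token, cutoff=0.8):
    """Return the closest lexicon entry for a possibly misspelled token, or None."""
    matches = get_close_matches(token.lower(), LEXICON, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_medications(text):
    """Return (surface form, normalized name, followed_by_dosage) triples."""
    results = []
    for m in TOKEN.finditer(text):
        name = normalize(m.group())
        if name is None:
            continue
        # Look at the text right after the candidate for a dosage expression.
        following = text[m.end():].lstrip()
        results.append((m.group(), name, bool(DOSAGE.match(following))))
    return results


if __name__ == "__main__":
    for hit in extract_medications("Tog alvedoon 500 mg igår och en ipreen"):
        print(hit)
```

A lower cutoff would catch more spelling variants at the cost of more false matches on ordinary words, so the dosage flag can serve as additional evidence when deciding whether to keep a candidate.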

The application will primarily be based on "dirty" data (blogs, Twitter, logs) and, if necessary, on "clean" scientific data for comparison.

Recommended skills

  • You do not have to be a native speaker of Swedish, but some basic knowledge of Swedish would be helpful.
  • Good programming skills


Supervisors

Dimitrios Kokkinakis and possibly others from Språkbanken


References

Chen E, Hripcsak G, Xu H, Markatou M, and Friedman C. Automated acquisition of disease-drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc 2008;15(1):87–98.

Chhieng D, Day T, Gordon G, and Hicks J. Use of natural language programming to extract medication from unstructured electronic medical records. In: AMIA Annual Symposium Proceedings, 2007:908.

Segura-Bedmar I, Martinez P, and Segura-Bedmar M. Drug name recognition and classification in biomedical texts. Drug Discovery Today 2008;13(17-18):816–23.

Sibanda T and Uzuner O. Role of local context in deidentification of ungrammatical, fragmented text. In: Proceedings of the North American Chapter of the Association for Computational Linguistics/Human Language Technology (NAACL-HLT 2006), New York, USA, 2006.


Page updated: 2014-11-12 15:11
