• Home
  • A best-first anagram hashing filter for approximate string matching with generalized edit distance

A best-first anagram hashing filter for approximate string matching with generalized edit distance

Sourcetitle: 
Proceedings of COLING 2012
Year of publication: 
2012
PublicationType: 
Conference paper - peer reviewed

This paper presents an efficient method for approximate string matching against a lexicon. Wedefine a filter that for each source word selects a small set of target lexical entries, from whichthe best match is then selected using generalized edit distance, where edit operations can beassigned an arbitrary weight. The filter combines a specialized hash function with best-firstsearch. Our work extends and improves upon a previously proposed hash-based filter, developedfor matching with uniform-weight edit distance. We evaluate an approximate matching systemimplemented with the new best-first filter, by conducting several experiments on a historicalcorpus and a set of weighted rules taken from the literature. We present running times anddiscuss how performance varies using different stopping criteria and target lexica. The resultsshow that the filter is suitable for large rule sets and million word corpora, and encouragefurther development.

http://gup.ub.gu.se/records/fulltext/172769/172769.pdf

Sourcepages: 
13-22
To the top

Page updated: 2013-02-04 13:27

Send as email
Print page
Show as pdf

X
Loading