• Home
  • Robust Parsing using Minimum Edit Distance

Robust Parsing using Minimum Edit Distance

Assume that you have a grammar for parsing utterances into some kind of semantic interpretation, such as voice commands or dialogue acts. Can you use this grammar for handling also non-grammatical utterances?

A possible example application is if you want to implement a dialogue system for an Android device. Android has built-in speech recognition, but the ASR engine is not grammar driven, so it returns a sentence which does not have to be grammatical.

The idea of this project is to use a Minimum Edit Distance metric, such as the Levenshtein Distance, and try to find the closest sentence that is accepted by the grammar.

Suggested workflow:

1. Start with a grammar G and a test set T of sentences with associated semantics.

2. Generate all possible sentences from the grammar G (up to a given maximum length), with associated semantics. This will give a corpus C, a semantic treebank.

3. For each sentence S in the test set T, find the closest sentence S' that is in C.

4. Evaluate each sentence S by comparing its intended semantics T(S), with the semantics C(S') of the grammatical sentence S'. This will give a Semantic Error Rate (SER).

5. Do this for different possible distance metrics.

The different distance metrics can be
- Levenshtein distance on word level
- Levenshtein distance on character level
- possibly the distance on phoneme or morpheme level

Supervisor: Peter Ljunglöf

To the top

Page updated: 2012-11-26 23:32

Send as email
Print page
Show as pdf