
Automatic classification of texts by readability

Goal

Develop an algorithm for automatically assigning texts to the appropriate language learner levels (to be used in Lärka and eventually Korp).

Background

Text readability measures assign scores to texts based on surface features such as sentence length and word length. Such measures alone are not enough to judge whether a text is appropriate for language learners or, potentially, for other user groups with limited proficiency in a language. Recent PhD research at Språkbanken (Katarina Heimann Mühlenbock) has studied a range of textual aspects that bear on readability. However, no implementation of this work has been publicly released.
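To illustrate the kind of surface-based measure described above, here is a minimal sketch of LIX, a widely used Swedish readability index: average sentence length in words plus the percentage of words longer than six letters. LIX is chosen here only as a familiar example; it is not taken from the thesis. The sentence splitting is a simplifying assumption (terminal `.`, `!` or `?` only), and the function name `lix` is hypothetical.

```python
import re


def lix(text: str) -> float:
    """LIX = words / sentences + 100 * long_words / words (long = more than 6 letters)."""
    # Words are runs of letters; [^\W\d_] matches Unicode letters, including å, ä and ö.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0
    # Simplifying assumption: each run of terminal punctuation ends one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / sentences + 100 * long_words / len(words)


print(lix("Jag har en katt. Katten är mycket gammal."))  # → 4.0
```

A score like this only counts lengths. Two texts with the same LIX can differ greatly in vocabulary and syntax, which is why such measures are not enough to place a text at a learner level.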

Problem description

The aims of this work are thus:

  1. to study the above-mentioned PhD thesis, as well as a number of other research papers, and identify a feasible implementation approach;
  2. to implement a program in Python for automatic categorization of texts into CEFR levels;
  3. to implement a user interface for working with different text parameters (e.g. switching them on and off);
  4. to evaluate the results by comparing the classifier's output with the known CEFR levels of a set of texts.
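As a rough starting point, aims 2 to 4 could be prototyped along the following lines. This is a minimal sketch under clearly stated assumptions. The four features (average sentence length, average word length, share of words longer than six letters, type-token ratio) are hypothetical stand-ins for a richer feature set. The classifier is a plain nearest-centroid model, not a method from the thesis. The toy sentences and their levels are invented for illustration. The `enabled` set plays the role of the on/off parameter switches in aim 3, and `accuracy` corresponds to the evaluation in aim 4.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Return (lower-cased words, sentence count); sentences approximated by . ! ?"""
    words = re.findall(r"[^\W\d_]+", text.lower())
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return words, sentences


def extract(text, enabled):
    """Compute only the (hypothetical) features whose names are in `enabled`."""
    words, sentences = tokenize(text)
    n = max(1, len(words))
    all_features = {
        "avg_sentence_len": len(words) / sentences,
        "avg_word_len": sum(map(len, words)) / n,
        "long_word_ratio": sum(len(w) > 6 for w in words) / n,
        "type_token_ratio": len(set(words)) / n,
    }
    # Sorted names give a stable feature order across texts.
    return [all_features[name] for name in sorted(enabled)]


def train(labelled_texts, enabled):
    """Nearest-centroid model: the mean feature vector for each CEFR level."""
    sums, counts = {}, defaultdict(int)
    for text, level in labelled_texts:
        vec = extract(text, enabled)
        acc = sums.setdefault(level, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[level] += 1
    return {lvl: [s / counts[lvl] for s in acc] for lvl, acc in sums.items()}


def classify(text, centroids, enabled):
    """Assign the level whose centroid is closest in Euclidean distance."""
    vec = extract(text, enabled)
    return min(centroids, key=lambda lvl: math.dist(vec, centroids[lvl]))


def accuracy(test_texts, centroids, enabled):
    """Share of texts whose predicted level equals their known CEFR level."""
    hits = sum(classify(t, centroids, enabled) == lvl for t, lvl in test_texts)
    return hits / len(test_texts)


# Invented toy data, for illustration only.
train_set = [
    ("Jag heter Anna. Jag bor i Lund.", "A1"),
    ("Hon har en hund. Den är liten.", "A1"),
    ("Regeringens utredning analyserar konsekvenserna av arbetsmarknadspolitiken.", "C1"),
    ("Forskningsprojektet undersöker språkinlärningens kognitiva förutsättningar.", "C1"),
]
enabled = {"avg_word_len", "long_word_ratio"}  # features switched "on"
centroids = train(train_set, enabled)
print(classify("Vi har en katt. Den är svart.", centroids, enabled))  # → A1
```

The features are left unscaled here, so a feature with a large range (such as sentence length) would dominate the distance. A real implementation would normalize features, use a stronger learner, and evaluate on held-out texts of known CEFR level.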

Recommended skills:

  • Python
  • jQuery

Supervisor(s)

  • Elena Volodina/Katarina Heimann Mühlenbock
  • possibly others from Språkbanken

Page updated: 2014-11-12 15:10
