Developing an algorithm (web services) for automatic classification of Swedish learner essays by the proficiency level they have reached.
The suggested approach is to use machine learning for essay classification. The challenge is to identify features that are both grounded in Second Language Acquisition (SLA) research and informative for the task at hand.
The classification will be made in terms of the proficiency levels of the Common European Framework of Reference (CEFR), which covers six learner levels: A1 (beginner), A2, B1, B2, C1, C2 (near-native). At the moment we have electronic corpora of essays at levels B1, B2, and C1. Essays at A2 are handwritten and have not yet been digitized and annotated (which can presumably be done in time for the project, if someone picks this topic).
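To make the feature-based approach described above more concrete, here is a minimal sketch of how an essay might be turned into a feature vector before being passed to a classifier. The three surface features (mean sentence length, mean word length, type-token ratio) and the function name extract_features are assumptions chosen purely for illustration; they are not the SLA-informed features the project is meant to identify, and the sentence splitting and tokenization are deliberately naive.

```python
import re
from statistics import mean

# The six CEFR levels named above, in order of increasing proficiency.
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def extract_features(essay: str) -> dict[str, float]:
    """Compute a few illustrative surface features for one essay.

    Hypothetical helper: the features are placeholders, not a
    validated SLA feature set.
    """
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # \w matches Unicode letters, so Swedish å, ä, ö are kept in tokens.
    tokens = re.findall(r"\w+", essay.lower())

    if not sentences or not tokens:
        return {
            "mean_sentence_length": 0.0,
            "mean_word_length": 0.0,
            "type_token_ratio": 0.0,
        }

    return {
        # Average number of tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Average number of characters per token.
        "mean_word_length": mean(len(t) for t in tokens),
        # Distinct tokens divided by total tokens (lexical variety).
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


if __name__ == "__main__":
    # Invented example text, not taken from any learner corpus.
    sample = "Jag bor i Göteborg. Jag läser svenska."
    print(extract_features(sample))
```

Feature dictionaries of this kind, paired with the CEFR label of each essay, could then serve as training data for whichever machine learning method is chosen.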
The steps for this project would include: