Studies have shown that up to 25% of the population sometimes have difficulties reading normal informational text, and 7–8% have reading difficulties so severe that easy-to-read material is their only way to access written texts. (http://www.lattlast.se/om-oss)
The group of people with reading difficulties is very diverse: they can be dyslexic, deaf, elderly, immigrants, or school children. They can also have cognitive disabilities, such as autism, aphasia, dementia, or intellectual disabilities. Research has shown that especially this latter group can be helped by supportive AAC symbols, such as Blissymbolics (http://www.blissymbolics.org).
There is an easy-to-read variant of the English Wikipedia, called the "Simple English Wikipedia" (http://simple.wikipedia.org). Its articles are not automatically simplified from the standard Wikipedia; they are written and crowd-sourced manually, like all other Wikipedias.
In this project you will enhance the Simple English Wikipedia with supportive AAC symbols. You will create a tool (a server-based web service, a stand-alone program, or an auto-generated web site) that mirrors Wikipedia but enhances the text with AAC symbols. Like this:
The tool should read the text of a given Wikipedia article and process it to decide which symbols should be attached, and where. The text must be sentence-segmented, POS-tagged and parsed, to find the main verbs, other important words, and grammatical relations. The tool can then look up each word (or a synonym or hypo-/hypernym) in a symbol dictionary. The resulting text should then be displayed to the user.
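The pipeline above can be sketched roughly as follows. This is a minimal toy version: the symbol dictionary, the synonym fallback table, and the regex-based sentence segmentation are all placeholder assumptions; a real implementation would use NLTK's tokenizers and tagger, a Blissymbolics lexicon, and WordNet for the synonym/hypernym lookup.

```python
import re

# Toy symbol dictionary: lemma -> symbol identifier.
# A real tool would load a Blissymbolics lexicon here.
SYMBOL_DICT = {"dog": "BLISS:dog", "eat": "BLISS:eat", "house": "BLISS:house"}

# Hypothetical synonym/hypernym fallback table (a stand-in for WordNet).
FALLBACK = {"hound": "dog", "devour": "eat", "cottage": "house"}

def segment_sentences(text):
    """Naive sentence segmentation; NLTK's sent_tokenize would do better."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def lookup_symbol(word):
    """Look up a word directly, then via the synonym fallback."""
    w = word.lower()
    if w in SYMBOL_DICT:
        return SYMBOL_DICT[w]
    if w in FALLBACK:
        return SYMBOL_DICT.get(FALLBACK[w])
    return None

def annotate(text):
    """Attach a symbol (or None) to every token of every sentence."""
    annotated = []
    for sentence in segment_sentences(text):
        tokens = re.findall(r"\w+", sentence)
        annotated.append([(t, lookup_symbol(t)) for t in tokens])
    return annotated
```

In the real tool the POS tags and parse would filter this lookup, so that only main verbs and other content words are annotated rather than every token.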
There are lots of extra things that can be added. One example is to use a Named Entity Extraction module to find names, and then try to match them with pictures or logos. Another example is to make use of the Wikipedia links to find a suitable picture that can be shown together with the link.
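The named-entity idea could start out as simply as this sketch, which matches runs of capitalised words against a picture table. The `ENTITY_PICTURES` table and its image paths are made-up examples, and the regex is only a crude stand-in for a proper Named Entity Recognition module (e.g. NLTK's chunker):

```python
import re

# Hypothetical lookup table mapping entity names to pictures or logos.
ENTITY_PICTURES = {
    "London": "img/london.jpg",
    "Albert Einstein": "img/einstein.jpg",
}

def find_name_pictures(text):
    """Find runs of capitalised tokens (a crude stand-in for real NER)
    and match each run against the picture table."""
    matches = []
    for m in re.finditer(r"(?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*", text):
        name = m.group(0)
        if name in ENTITY_PICTURES:
            matches.append((name, ENTITY_PICTURES[name]))
    return matches
```

For the Wikipedia-link idea, the same lookup table could instead be populated from the lead image of each linked article.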
You will only use existing NLP components for tagging, parsing, etc., but there will be a lot of programming to make the components work well together. NLTK can probably be used for much of this, but there will certainly be things you have to solve outside of NLTK.
If you work hard and produce a good project, you stand a good chance of getting a paper published in the International Workshop on Speech and Language Processing for Assistive Technologies (http://www.slpat.org/slpat2013).
Peter Ljunglöf, Department of Computer Science and Engineering (Data- och informationsteknik).