Extra seminar: Barbara Plank – Fortuitous data for improving language technology

SEMINAR

Current successful approaches to NLP are for the most part based on supervised learning, which in turn relies critically on the availability of annotated data. Such data is usually scarce, because annotating it requires time and expertise. This is the problem of data sparsity. At the same time, the samples that are available usually come from specific domains and languages, e.g., English newswire, and thus suffer from data bias.

In this talk I will present techniques to overcome data sparsity and bias by leveraging fortuitous data, i.e., data from various sources that is readily available, often created as a by-product, but frequently neglected. I will argue that fortuitous data, combined with weakly supervised learning techniques, helps to improve language technology for tasks such as POS tagging and dependency parsing. Examples include building more robust taggers for Twitter by exploiting information from hyperlinks, and using doubly-annotated data to improve chunking and parsing. Instead of glossing over such data, it is more fruitful to embrace it during learning. Finally, I will present recent (ongoing) work on exploiting cognitive processing data to improve language technology.

Date: 2015-10-23 13:15 - 15:00

Location: K332, Lennart Torstenssonsgatan 6

Page updated: 2015-10-14 10:23
