
Extra seminar: Preparing for RANLP

SEMINAR

Three talks in preparation for the RANLP conference in Hissar, Bulgaria: http://lml.bas.bg/ranlp2015/.

(1) Mehdi Ghanimifard (FLoV): "Enriching word-sense embeddings with translational context".

Vector-space models derived from corpora are an effective way to learn a representation of word meaning directly from data, and these models have many uses in practical applications. A number of unsupervised approaches have been proposed to learn representations of word senses directly from corpora, but since these methods use no information other than the words themselves, they sometimes miss distinctions that could be made if more information were available. In this paper, we present a general framework called context enrichment that incorporates external information during the training of multi-sense vector-space models. Our approach is agnostic as to which external signal is used to enrich the context; here, we use translations as the source of enrichment. We evaluated the models trained on the translation-enriched context on several similarity benchmarks and a word analogy test set. In all our evaluations, the enriched model soundly outperformed the purely word-based baseline.

(2) Luis Nieto Piña (Språkbanken): "A simple and efficient method to generate word sense representations".

Distributed representations of words have boosted the performance of many Natural Language Processing tasks. However, usually only one representation per word is obtained, which ignores the fact that some words have multiple meanings. This has a negative effect on the individual word representations and on the language model as a whole. In this paper, we present a simple model that enables recent techniques for building word vectors to represent the distinct senses of polysemous words. In our assessment of this model, we show that it effectively discriminates between the senses of a word and does so in a computationally efficient manner.

(3) Olof Mogren (CSE, Chalmers): "Extractive summarization by aggregating multiple similarities".

News reports, social media streams, blogs, digitized archives and books are part of a plethora of reading sources that people face every day. This raises the question of how best to generate automatic summaries. Many existing methods for extracting summaries rely on comparing the similarity of two sentences in some way. We present new ways of measuring this similarity, based on sentiment analysis and continuous vector-space representations, and show that combining these with similarity measures from existing methods helps to create better summaries. The finding is demonstrated with MULTSUM, a novel summarization method that uses ideas from kernel methods to combine sentence similarity measures. Submodular optimization is then used to produce summaries that take several different similarity measures into account. Our method improves over the state of the art on standard benchmark datasets, and the results are statistically significant; it is also fast and scales to large document collections.

Date: 2015-08-31 10:15 - 12:00

Location: L307, Lennart Torstenssonsgatan 8


Page updated: 2015-08-24 15:26
