Improving the Compositionality of Word Embeddings (2017)(About) (MS thesis, a paper accepted at [TheWebConf 2018](https://www2018.thewebconf.org/program/web-content-analysis/))
> This thesis explores a method to find better encodings of meaning a computer can work with. We specifically want to combine encodings of word meanings in such a way that a good encoding of their joint meaning is created. The act of combining multiple representations of meaning into a new representation of meaning is called semantic composition.
Analyzes four word embeddings (Word2Vec, GloVe, fastText and Paragram) in terms of their semantic compositionality, and proposes a method to tune these embeddings towards better compositionality, using a simple neural network architecture with definitions and lemmas from WordNet.
> Since dictionary definitions are semantically similar to their associated lemmas, they are the ideal candidate for our tuning method, as well as evaluating for compositionality. Our architecture allows for the embeddings to be composed using simple arithmetic operations, which makes these embeddings specifically suitable for production applications such as web search and data mining. We also explore more elaborate and involved compositional models, such as recurrent composition and convolutional composition.
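The "simple arithmetic operations" above amount to composing a definition's word vectors by summation or averaging and comparing the result to the lemma's vector. A minimal sketch with toy, made-up vectors (a real setup would load trained embeddings such as fastText):

```python
import numpy as np

# Toy embeddings with hypothetical values, for illustration only.
emb = {
    "hot": np.array([1.0, 0.0, 0.5]),
    "dog": np.array([0.0, 1.0, 0.5]),
    "hotdog": np.array([0.45, 0.55, 0.5]),
}

def compose_additive(words, emb):
    """Additive composition: average the word vectors of a definition."""
    return np.mean([emb[w] for w in words], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare the composed definition to the lemma's own embedding.
composed = compose_additive(["hot", "dog"], emb)
sim = cosine(composed, emb["hotdog"])
```

Tuning for compositionality then means training the embeddings so that this similarity between a composed definition and its lemma is high.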
Pseudo relevance feedback(About) retrieve an initial set of documents for the query, assume that the top-ranked documents are relevant, and then perform relevance feedback under this assumption.
[It's said here](https://www.zbw.eu/fileadmin/pdf/forschung/2017-colloquium-galke-word-embeddings.pdf) that pseudo relevance feedback is not included in Lucene
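A minimal sketch of the idea, using Rocchio-style query expansion over plain term-frequency vectors (toy documents and parameter values are illustrative):

```python
import math
from collections import Counter

docs = [
    "word embeddings for information retrieval",
    "neural word embeddings capture semantics",
    "cooking recipes for dinner",
]

def tf_vector(text):
    return Counter(text.split())

def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def prf(query, docs, k=2, alpha=1.0, beta=0.5):
    """Pseudo relevance feedback:
    1. rank documents for the original query,
    2. assume the top-k documents are relevant,
    3. expand the query with their terms and re-rank."""
    q = tf_vector(query)
    doc_vecs = [tf_vector(d) for d in docs]
    initial = sorted(range(len(docs)),
                     key=lambda i: cosine(q, doc_vecs[i]), reverse=True)
    expanded = Counter({t: alpha * w for t, w in q.items()})
    for i in initial[:k]:
        for t, w in doc_vecs[i].items():
            expanded[t] += beta * w / k
    return sorted(range(len(docs)),
                  key=lambda i: cosine(expanded, doc_vecs[i]), reverse=True)

ranking = prf("word embeddings", docs)
```

The expanded query pulls in terms from the top-ranked documents, which can help with vocabulary mismatch without any user interaction.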
Word Representations via Gaussian Embedding (2014)(About) > Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages
> Novel word embedding algorithms that embed words directly as Gaussian distributional potential functions in an infinite dimensional function space. This allows us to map word types not only to vectors but to soft regions in space, modeling uncertainty, inclusion, and entailment, as well as providing a rich geometry of the latent space.
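One way to compare two Gaussian word representations is the expected likelihood kernel, the inner product of the two densities, which for diagonal Gaussians has a closed form. A sketch with hypothetical 2-d means and variances (the numbers are illustrative, not learned):

```python
import numpy as np

def log_energy(mu_a, var_a, mu_b, var_b):
    """Log expected-likelihood kernel between two diagonal Gaussians,
    i.e. log N(mu_a - mu_b; 0, var_a + var_b). Higher = more similar."""
    var = var_a + var_b
    diff = mu_a - mu_b
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + diff ** 2 / var))

# Hypothetical embeddings: mean position plus per-dimension variance.
dog = (np.array([1.0, 0.0]), np.array([0.3, 0.3]))
puppy = (np.array([1.1, 0.1]), np.array([0.5, 0.5]))  # near "dog", broader
car = (np.array([-2.0, 3.0]), np.array([0.3, 0.3]))   # far from "dog"

sim_related = log_energy(*dog, *puppy)
sim_unrelated = log_energy(*dog, *car)
```

The variance term is what lets such models express uncertainty and (asymmetrically, via KL divergence) entailment, rather than only distance between points.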
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical Information Retrieval (2017)(About) > Transferring the success of word embeddings to Information Retrieval (IR) task is currently an active research topic. While embedding-based retrieval models could tackle the vocabulary mismatch problem by making use of the embedding’s inherent similarity between distinct words, most of them struggle to compete with the prevalent strong baselines such as TF-IDF and BM25.
Considering a practical ad-hoc IR task composed of two steps, matching and scoring, the paper compares the performance of several techniques that leverage word embeddings to compute the similarity between the query and the documents (namely word centroid similarity, paragraph vectors, Word Mover's distance, as well as a novel inverse document frequency (IDF) re-weighted word centroid similarity).
> We confirm that word embeddings can be successfully employed in a practical information retrieval setting. The proposed cosine similarity of IDF re-weighted, aggregated word vectors is competitive to the TF-IDF baseline.
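The IDF re-weighted word centroid similarity can be sketched as follows: weight each word vector by its IDF before averaging, then score query against document by cosine similarity. The corpus and random "embeddings" below are toy stand-ins for trained vectors:

```python
import math
import numpy as np

# Toy corpus; random vectors stand in for trained word embeddings.
docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["stock", "market", "fell"]]
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=5) for d in docs for w in d}

def idf_weights(docs):
    """IDF from document frequencies: rarer words get larger weights."""
    n = len(docs)
    df = {}
    for d in docs:
        for w in set(d):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / df[w]) for w in df}

def idf_centroid(tokens, emb, idf):
    """IDF re-weighted word centroid: average word vectors weighted by
    IDF, down-weighting frequent, uninformative words such as "the"."""
    weights = [idf.get(w, 0.0) for w in tokens if w in emb]
    vecs = [idf.get(w, 0.0) * emb[w] for w in tokens if w in emb]
    return np.sum(vecs, axis=0) / max(sum(weights), 1e-9)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

idf = idf_weights(docs)
score = cosine(idf_centroid(["cat", "sat"], emb, idf),
               idf_centroid(docs[0], emb, idf))
```

Because the centroid is a single dense vector per document, scoring is a cheap dot product, which is what makes the method competitive for practical retrieval.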
Web Content Analysis, Semantics and Knowledge – TheWebConf 2018 - Research Track(About) [CFP](https://www2018.thewebconf.org/call-for-papers/research-tracks-cfp/web-content-analysis/)
> In previous years, ‘content analysis’ and ‘semantics and knowledge’ were in separate tracks. This year, we combined these tracks to emphasize the close relationship between these topics: the use of content to curate knowledge and the use of knowledge to guide content analysis and intelligent usage.
Some of the accepted papers:
- A paper by [David Blei](/tag/david_blei): Dynamic Embeddings for Language Evolution
- Large-Scale [Hierarchical Text Classification](/tag/nlp_hierarchical_text_classification) with Recursively Regularized Deep Graph-CNN
- Improving Word Embedding Compositionality using Lexicographic Definitions ([Github](https://github.com/tscheepers/CompVec), [Thesis 2017](https://esc.fnwi.uva.nl/thesis/centraal/files/f1554608041.pdf))
- Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations