2018-10-02T10:08:30Z · An Overview of Multi-Task Learning for Deep Learning

2018-10-10T11:22:58Z · Python 3's f-Strings: An Improved String Formatting Syntax (Guide) – Real Python

2018-10-22T16:25:41Z · Scientists Have Reconstructed Ancient Greek Music And You Can Listen To It | IFLScience

2018-10-11T08:37:16Z · Found in translation: Building a language translator from scratch with deep learning - FloydHub

2018-10-11T08:39:11Z · Jupyter Notebook Enhancements, Tips And Tricks - Part 1

2018-10-25T08:39:17Z · Practical Text Classification With Python and Keras – Real Python

2018-10-05T08:26:47Z · SOTAWHAT - A script to keep track of state-of-the-art AI research

2018-10-20T15:27:05Z · This 3D Human 'Mini-Brain' Is Made of Stem Cells and Can Live For Months - Motherboard

2018-10-25T19:51:19Z · Caetano Veloso: Dark Times Are Coming for My Country - The New York Times

2018-10-18T14:18:46Z · [some tweets](https://twitter.com/enahpets/status/1052537794764128257)
#franceisai: a short recap of things seen and heard this morning.

2018-10-15T18:06:56Z · Conference - France is AI

2018-10-27T14:59:43Z · Time-Contrastive Networks: Self-Supervised Learning from Video (2017)
A self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and a study of how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses.
> We train our representations using a metric learning loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. In other words, the model simultaneously learns to recognize what is common between different-looking images, and what is different between similar-looking images.
> This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm.

2018-10-31T18:11:21Z · Writing Code for NLP Research, AllenNLP's tutorial at #emnlp2018

2018-10-01T15:06:12Z · Ask HN: What are some of the best documentaries you've seen? | Hacker News

2018-10-13T10:46:21Z · Comment Trump a manipulé l'Amérique | ARTE

2018-10-13T11:01:58Z · Ontotext | Semantic Technology Developer
> Ontotext transforms how organizations **identify meaning across** diverse databases and massive amounts of unstructured data by **combining a semantic graph database with text mining, and machine learning**.

2018-10-27T17:38:20Z · Julie Grollier - personal website: Nanodevices - Bio-inspired computing - Spin Transfer Torque - Memristors

2018-10-21T16:42:53Z · Overcoming device unreliability with continuous learning in a population coding based computing system (2018 - Journal of Applied Physics)

2018-10-20T17:49:59Z · Aetherial Symbols

2018-10-26T16:31:02Z · TensorFlow: how to load and save models at every epoch so you never lose time or data.
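For the last bookmark above, a minimal sketch of per-epoch checkpointing with the tf.keras ModelCheckpoint callback (TF 2 style). The toy model, data and file pattern are illustrative assumptions, not code from the article.

```python
# Minimal sketch: save a tf.keras model after every epoch so training can be
# resumed from the last checkpoint. Paths and the toy model are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="model-epoch-{epoch:02d}.h5",  # one file per epoch
    save_weights_only=False,                # save the full model, not just weights
    save_freq="epoch",
)

X = np.random.rand(256, 4)
y = np.random.randint(0, 2, size=256)
model.fit(X, y, epochs=3, callbacks=[checkpoint])

# Later: resume from the most recent checkpoint instead of retraining.
restored = tf.keras.models.load_model("model-epoch-03.h5")
```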
2018-10-16T00:24:46Z · Chiribiquete National Park - Wikipedia

2018-10-16T00:20:49Z · Capturing meaning: Toward an abstract Wikipedia (ISWC 2018)

2018-10-23T09:38:41Z · python - Pythonic way of detecting outliers in one dimensional observation data - Stack Overflow

2018-10-12T19:10:45Z · The Annotated Transformer
an "annotated" version of the "Attention is All You Need" paper in the form of a line-by-line implementation

2018-10-12T18:50:14Z · [1706.03762] Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (arXiv 1706.03762, submitted 2017-06-12, revised 2017-12-06)
> The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the **Transformer**, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

2018-10-19T14:48:15Z · Monstrous moonshine - Wikipedia

2018-10-05T08:19:26Z · Training on TPU

2018-10-18T13:53:35Z · Time, Context and Causality in Recommender Systems

2018-10-07T12:50:42Z · Deploy Your First Deep Learning Model On Kubernetes With Python, Keras, Flask, and Docker
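The Kubernetes bookmark above pairs Keras with Flask and Docker. Below is a hypothetical minimal Flask scoring endpoint of the kind such tutorials containerize; the model file name and the JSON schema are assumptions, not the article's code.

```python
# Hypothetical minimal Flask scoring service of the kind the Kubernetes
# tutorial wraps in a Docker image. The model file and JSON format are
# assumptions made for illustration.
import numpy as np
from flask import Flask, jsonify, request
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("model.h5")  # assumed pre-trained model file

@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"instances": [[0.1, 0.2, 0.3, 0.4], ...]}
    payload = request.get_json(force=True)
    x = np.array(payload["instances"], dtype="float32")
    preds = model.predict(x).tolist()
    return jsonify({"predictions": preds})

if __name__ == "__main__":
    # In a container this would typically listen on 0.0.0.0 behind gunicorn.
    app.run(host="0.0.0.0", port=5000)
```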
2018-10-26T01:22:41Z · Small Data is Big in AI : Train-spotting at France is AI

2018-10-18T13:27:09Z · Cracking big data with statistical physics
> Models studied in statistical physics are mathematically equivalent to some of those in high-dimensional statistics.
> Statistical physics is often concerned with phase transitions, i.e., abrupt changes in behaviour. Interestingly, there is a deep correspondence between physical phases such as liquid, super-cooled liquid or glass, and solid, and regions of parameters for which a given data analysis task is algorithmically impossible, hard or easy.

2018-10-31T15:56:28Z · Tutorials - EMNLP 2018

2018-10-27T23:55:46Z · Cost-Sensitive Boosting for Classification of Imbalanced Data (2007)
The proposed cost-sensitive boosting algorithms are applicable to any base classifier where AdaBoost can be applied.

2018-10-22T08:28:19Z · Effect of Non-linear Deep Architecture in Sequence Labeling
> we show the close connection between CRF and “sequence model” neural nets, and present an empirical investigation to compare their performance on two sequence labeling tasks – Named Entity Recognition and Syntactic Chunking. Our results suggest that **non-linear models are highly effective in low-dimensional distributional spaces. Somewhat surprisingly, we find that a non-linear architecture offers no benefits in a high-dimensional discrete feature space**.

2018-10-02T10:02:54Z · A Review of the Recent History of Natural Language Processing - AYLIEN
[slides included here](/doc/?uri=https%3A%2F%2Fdrive.google.com%2Ffile%2Fd%2F15ehMIJ7wY9A7RSmyJPNmrBMuC7se0PMP%2Fview)

2018-10-28T01:00:34Z · How to boost a Keras based neural network using AdaBoost? - Stack Overflow

2018-10-09T10:15:02Z · Unsupervised Text Summarization using Sentence Embeddings
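One common recipe behind titles like the summarization bookmark above: embed every sentence, cluster the embeddings, and keep the sentence nearest each cluster centroid as the summary. A sketch of that general idea only, with TF-IDF vectors standing in for real sentence embeddings and a made-up toy document; this is not the article's code.

```python
# Sketch of a common unsupervised extractive-summarization recipe:
# embed sentences, cluster the embeddings, keep the sentence closest to
# each centroid. TF-IDF stands in for real sentence embeddings here.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize(sentences, n_sentences=3):
    vectors = TfidfVectorizer().fit_transform(sentences).toarray()
    km = KMeans(n_clusters=n_sentences, n_init=10, random_state=0).fit(vectors)
    picked = []
    for centroid in km.cluster_centers_:
        # keep the sentence whose embedding is nearest to this centroid
        picked.append(int(np.argmin(np.linalg.norm(vectors - centroid, axis=1))))
    return [sentences[i] for i in sorted(set(picked))]

doc = [
    "The quarterly report shows revenue grew by ten percent.",
    "Growth was driven mostly by the new subscription service.",
    "The weather in the region was unusually mild this autumn.",
    "Costs rose slightly because of one-off infrastructure spending.",
    "Management expects the subscription trend to continue next year.",
]
print(summarize(doc, n_sentences=2))
```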
2018-10-21T11:15:04Z · Julie Grollier (@julie_grollier) | Twitter

2018-10-26T01:45:25Z · Teaching Machines to Understand Natural Language (2018)
Mentions [Building machines that learn and think like people](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1604.00289).

2018-10-28T17:08:00Z · [1604.00289] Building Machines That Learn and Think Like People
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman (arXiv 1604.00289, submitted 2016-04-01, revised 2016-11-02)
> we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations
Abstract: Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.

2018-10-23T20:17:35Z · [1503.08895] End-To-End Memory Networks
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus (arXiv 1503.08895, submitted 2015-03-31, revised 2015-11-24)
Neural network with a recurrent attention model over a possibly large external memory. Cited by [#A. Bordes](/tag/antoine_bordes) at the [#ParisIsAI conf 2018](/tag/france_is_ai_2018.html).
Abstract: We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network (Weston et al., 2015) but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNsearch to the case where multiple computational steps (hops) are performed per output symbol. The flexibility of the model allows us to apply it to tasks as diverse as (synthetic) question answering and to language modeling. For the former our approach is competitive with Memory Networks, but with less supervision. For the latter, on the Penn TreeBank and Text8 datasets our approach demonstrates comparable performance to RNNs and LSTMs. In both cases we show that the key concept of multiple computational hops yields improved results.
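A toy numpy rendering of the single memory "hop" the abstract describes: soft attention of an embedded query over memory slots, then a weighted read-out that is added back to the query. The dimensions and random vectors stand in for learned embeddings and are assumptions for illustration, not values from the paper.

```python
# Toy sketch of one memory "hop" as in end-to-end memory networks:
# attention of a query over memory slots, then a weighted read-out.
import numpy as np

rng = np.random.default_rng(0)
d, n_memories = 32, 10

memory_keys = rng.normal(size=(n_memories, d))    # input memory representations m_i
memory_values = rng.normal(size=(n_memories, d))  # output memory representations c_i
query = rng.normal(size=d)                        # embedded question u

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# attention over memories: p_i = softmax(u . m_i)
attention = softmax(memory_keys @ query)

# response vector: o = sum_i p_i * c_i, then combined with the query
o = attention @ memory_values
next_query = query + o   # fed to the next hop or to the answer layer

print(attention.round(3), next_query.shape)
```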
2018-10-27T17:28:16Z · Julie Grollier - Spin Transfer Torque
The classical way to perform data processing is to reduce all sources of noise to the maximum. An interesting alternative strategy is, on the contrary, to exploit noise for computing. In this trend, stochastic computing has great potential for the implementation of **low power information processing systems**. Indeed, noise is often seen as a key element of neural computation, beneficial for a number of operations such as near-threshold signaling and decision making. And spin torque devices, just like neurons, can exhibit noise-induced sensitivity improvement, for example via stochastic resonance. We are working on the development of **probabilistic bio-inspired hardware** exploiting the controlled stochasticity provided by spin torque.

2018-10-26T00:33:06Z · Grounded Language Learning and Understanding — MIT Media Lab (1999-2001)
Language is grounded in experience. Unlike dictionaries which define words in terms of other words, humans understand many basic words in terms of associations with sensory-motor experiences. People must interact physically with their world to grasp the essence of words like "red," "heavy," and "above."

2018-10-23T22:41:09Z · Towards bridging the gap between deep learning and brains
> Underlying Assumption: There are principles giving rise to intelligence (machine, human or animal) via learning, simple enough that they can be described compactly, similarly to the laws of physics, i.e., our intelligence is not just the result of a huge bag of tricks and pieces of knowledge, but of general mechanisms to acquire knowledge.

2018-10-22T14:23:00Z · [1810.07150] Subword Semantic Hashing for Intent Classification on Small Datasets
Marcus Liwicki, Kumar Shridhar, Vinaychandran Pondenkandath, Gustav Grund Pihlgren, Foteini Simistira, Gyorgy Kovacs, Amit Sahu, Ayushman Dash, Pedro Alonso (arXiv 1810.07150, submitted 2018-10-16, revised 2019-09-14)
Abstract: In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise by the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: AskUbuntu, Chatbot, and Web Application. Our benchmarks are available online: https://github.com/kumar-shridhar/Know-Your-Intent

2018-10-09T10:39:39Z · Solid: Linked Data for personal data management

2018-10-15T14:25:14Z · Deep Learning for Named Entity Recognition #1: Public Datasets and Annotation Methods

2018-10-26T00:50:33Z · Grounded Language Learning: Where Robotics and NLP Meet (IJCAI 2018)
When trained only on large corpuses of text, but not on real-world representations, statistical methods for NLP and NLU lack true understanding of what words mean.

2018-10-12T14:36:01Z · [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (arXiv 1810.04805, submitted 2018-10-11, revised 2019-05-24)
**The "Devlin et al 2019" paper.** [Paper Dissected](https://datasciencetoday.net/index.php/en-us/nlp/211-paper-dissected-bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding-explained)
Abstract: We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
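The "one additional output layer" in the abstract is a classification head on top of the pre-trained encoder. A minimal fine-tuning sketch using the Hugging Face transformers library, which is an assumption here (the bookmark points to the paper, whose released code is separate TensorFlow code); the model name, data and hyperparameters are illustrative only.

```python
# Sketch of fine-tuning BERT with "just one additional output layer",
# using the Hugging Face transformers library (an assumption, not the
# paper's own code release). Data and hyperparameters are toy values.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2   # the extra classification head
)

texts = ["a great movie", "a waste of time"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)   # loss comes from the added head
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```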
2018-10-10T23:55:25Z · Universal Basic Income Is Silicon Valley's Latest Scam
The plan is no gift to the masses, but a tool for our further enslavement.

2018-10-25T11:21:04Z · Gilberto Gil and Caetano Veloso in London | Music | The Guardian

2018-10-15T13:27:09Z · Outlier Detection with Isolation Forest – Towards Data Science

2018-10-20T17:56:34Z · Thought vector - Wikipedia

2018-10-10T11:30:10Z · Patrick Boucheron bouscule l'histoire | CNRS Le journal

2018-10-12T19:05:09Z · Paper Dissected: "Attention is All You Need" Explained | Machine Learning Explained
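Both the Annotated Transformer bookmark earlier in this list and this "Paper Dissected" post walk through the same core operation, scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A small numpy rendering of just that step, with made-up shapes and random inputs.

```python
# Scaled dot-product attention, the core step of "Attention is All You Need":
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of queries to keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, d_k = 8
K = rng.normal(size=(5, 8))   # 5 key/value positions
V = rng.normal(size=(5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)
```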
2018-10-28T14:35:29Z · Comprendre l'informatique quantique (a series of blog posts)

2018-10-27T15:12:16Z · Self-Supervised Learning, Yann LeCun, Facebook AI Research | Dartmouth News
I will propose the hypothesis that **self-supervised learning of predictive world models is an essential missing ingredient of current approaches to AI**. With such models, one can predict outcomes and plan courses of actions. One could argue that prediction is the essence of intelligence. Good predictive models may be the basis of intuition, reasoning and "common sense", allowing us to fill in missing information: predicting the future from the past and present, or inferring the state of the world from noisy percepts.

2018-10-26T00:36:36Z · 4 Approaches To Natural Language Processing & Understanding
The antithesis of grounded language is inferred language. Inferred language derives meaning from words themselves rather than what they represent. When trained only on large corpuses of text, but not on real-world representations, statistical methods for NLP and NLU lack true understanding of what words mean.

2018-10-31T23:27:31Z · [Seminar] Deep Latent Variable Models of Natural Language
Both GANs and VAEs have been remarkably effective at modeling images, and the learned latent representations often correspond to interesting, semantically-meaningful representations of the observed data. In contrast, GANs and VAEs have been less successful at modeling natural language, but for different reasons.
- GANs have difficulty dealing with discrete output spaces (such as natural language) as the resulting objective is no longer differentiable with respect to the generator.
- VAEs can deal with discrete output spaces, but when a powerful model (e.g. an LSTM) is used as a generator, the model learns to ignore the latent variable and simply becomes a language model.

2018-10-16T09:36:31Z · RDF(-DEV), back to the future (was Re: Semantic Web Interest Group now closed) from Dan Brickley on 2018-10-16 (semantic-web@w3.org from October 2018)

2018-10-09T15:08:40Z · [1710.06632] Towards a Seamless Integration of Word Senses into Downstream NLP Applications
Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli, Nigel Collier (arXiv 1710.06632, submitted 2017-10-18)
Abstract: Lexical ambiguity can impede NLP systems from accurate understanding of semantics. Despite its potential benefits, the integration of sense-level information into NLP systems has remained understudied. By incorporating a novel disambiguation algorithm into a state-of-the-art classification model, we create a pipeline to integrate sense-level information into downstream NLP applications. We show that a simple disambiguation of the input text can lead to consistent performance improvement on multiple topic categorization and polarity detection datasets, particularly when the fine granularity of the underlying sense inventory is reduced and the document is sufficiently large. Our results also point to the need for sense representation research to focus more on in vivo evaluations which target the performance in downstream NLP applications rather than artificial benchmarks.

2018-10-09T10:02:39Z · Neural Network Embeddings Explained – Towards Data Science
> How deep learning can represent War and Peace as a vector
The set-up is based on the assumption that books whose Wikipedia page links to similar Wikipedia pages are similar to one another.

2018-10-10T11:27:13Z · TensorFlow.js

2018-10-08T09:42:43Z · pandas Multi-index and groupbys (article) - DataCamp
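For the pandas bookmark just above, a tiny self-contained example of how grouping by two columns produces a MultiIndex and how to select from it; the DataFrame is made up for illustration.

```python
# Grouping by two columns yields a MultiIndex, which can then be selected
# with .loc or flattened back to columns with reset_index().
import pandas as pd

sales = pd.DataFrame({
    "region": ["EU", "EU", "US", "US", "US"],
    "product": ["a", "b", "a", "a", "b"],
    "amount": [10, 20, 30, 40, 50],
})

by_region_product = sales.groupby(["region", "product"])["amount"].sum()
print(by_region_product)                    # Series with a (region, product) MultiIndex
print(by_region_product.loc[("US", "a")])   # select a single group: 70
print(by_region_product.reset_index())      # back to flat columns
```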
2018-10-23T12:36:58Z · [1703.03129] Learning to Remember Rare Events
Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio (arXiv 1703.03129, submitted 2017-03-09)
> a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training.
> Our memory module can be easily added to any part of a supervised neural network
Abstract: Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and it can successfully generalize from them. We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task.

2018-10-08T00:31:14Z · Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline (Ethayarajh 2018)
> we show that word vector length has a confounding effect on the probability of a sentence being generated in Arora et al.'s model ([SIF embeddings](tag:sif_embeddings)). We propose a random walk model that is robust to this confound... Our approach beats Arora et al.'s by up to 44.4% on textual similarity tasks... Unlike Arora et al.'s method, ours requires no hyperparameter tuning
[Github](https://github.com/kawine/usif)

2018-10-20T14:41:37Z · GitHub - kawine/usif: Implementation of unsupervised smoothed inverse frequency
Github project associated to the [USIF paper](doc:?uri=http%3A%2F%2Fwww.aclweb.org%2Fanthology%2FW18-3012%2F). [About the GEM paper](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1810.00438): GEM compared with [#SIF](/tag/sif_embeddings), "[Sentences as subspaces](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1704.05358)", ["USIF"](doc:?uri=http%3A%2F%2Fwww.aclweb.org%2Fanthology%2FW18-3012%2F).

2018-10-06T11:22:58Z · [1704.05358] Representing Sentences as Low-Rank Subspaces
Jiaqi Mu, Suma Bhat, Pramod Viswanath (arXiv 1704.05358, submitted 2017-04-18)
> We observe a simple geometry of sentences -- the word representations of a given sentence roughly lie in a low-rank subspace (roughly, rank 4). Motivated by this observation, we represent a sentence by the low-rank subspace spanned by its word vectors.
A sentence of N words is a matrix (300, N) (if 300 is the dimension of the word embedding space). Take the, e.g., 4 (a hyperparameter) heaviest singular values -> a subspace of dimension 4. Similarity between docs: the principal angle between the subspaces (reminiscent of cosine similarity).
Abstract: Sentences are important semantic units of natural language. A generic, distributional representation of sentences that can capture the latent semantics is beneficial to multiple downstream applications. We observe a simple geometry of sentences -- the word representations of a given sentence (on average 10.23 words in all SemEval datasets with a standard deviation 4.84) roughly lie in a low-rank subspace (roughly, rank 4). Motivated by this observation, we represent a sentence by the low-rank subspace spanned by its word vectors. Such an unsupervised representation is empirically validated via semantic textual similarity tasks on 19 different datasets, where it outperforms the sophisticated neural network models, including skip-thought vectors, by 15% on average.
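A short numpy sketch of the recipe described in the note above: stack a sentence's word vectors into a (300, N) matrix, keep the top-4 left singular vectors as an orthonormal basis of its subspace, and compare two sentences through the principal angles between their subspaces. Random vectors stand in for real word embeddings, and collapsing the angles into a single score by averaging their cosines is a simplification for illustration, not necessarily the paper's exact formula.

```python
# Sketch of "sentences as low-rank subspaces": SVD of the (d, N) word-vector
# matrix, keep a rank-4 basis, compare sentences via principal angles.
import numpy as np

def sentence_subspace(word_vectors, rank=4):
    # word_vectors: (d, N) matrix of the sentence's word embeddings
    U, _, _ = np.linalg.svd(word_vectors, full_matrices=False)
    return U[:, :rank]                      # (d, rank) orthonormal basis

def subspace_similarity(basis_a, basis_b):
    # cosines of the principal angles are the singular values of A^T B;
    # average them here for a single cosine-like score (a simplification).
    sigma = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(np.clip(sigma, 0.0, 1.0)))

rng = np.random.default_rng(0)
d = 300
sent1 = rng.normal(size=(d, 10))   # a 10-word sentence (random stand-in vectors)
sent2 = rng.normal(size=(d, 7))    # a 7-word sentence
sim = subspace_similarity(sentence_subspace(sent1), sentence_subspace(sent2))
print(round(sim, 3))
```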
2018-10-06T18:01:18Z · [1810.00438] Parameter-free Sentence Embedding via Orthogonal Basis
Ziyi Yang, Chenguang Zhu, Weizhu Chen (arXiv 1810.00438, submitted 2018-09-30, revised 2019-12-06)
A **training-free approach for building sentence representations**, "Geometric Embedding" (GEM), based on the **geometric structure** of word embedding space.
> we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence. **We model the semantic meaning of a word in a sentence** based on two aspects. One is its relatedness to the word vector subspace already spanned by its contextual words. The other is the word's novel semantic meaning which shall be introduced as a new basis vector perpendicular to this existing subspace
[on www.groundai.com](https://www.groundai.com/project/zero-training-sentence-embedding-via-orthogonal-basis/) ; [Open Review](/doc/?uri=https%3A%2F%2Fopenreview.net%2Fforum%3Fid%3DrJedbn0ctQ) ; [Related to this paper](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1704.05358)
Abstract: We propose a simple and robust non-parameterized approach for building sentence representations. Inspired by the Gram-Schmidt Process in geometric theory, we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence. We model the semantic meaning of a word in a sentence based on two aspects. One is its relatedness to the word vector subspace already spanned by its contextual words. The other is the word's novel semantic meaning which shall be introduced as a new basis vector perpendicular to this existing subspace. Following this motivation, we develop an innovative method based on orthogonal basis to combine pre-trained word embeddings into sentence representations. This approach requires zero parameters, along with efficient inference performance. We evaluate our approach on 11 downstream NLP tasks. Our model shows superior performance compared with non-parameterized alternatives and it is competitive to other approaches relying on either large amounts of labelled data or prolonged training time.

2018-10-20T14:59:02Z · Zero-training Sentence Embedding via Orthogonal Basis | OpenReview
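Several of the sentence-embedding bookmarks above (SIF, USIF, GEM) are measured against the same weighted-average baseline. A toy sketch of that SIF-style baseline as commonly described (weights a / (a + p(w)), then removal of the first principal direction); the vocabulary, vectors and word frequencies are made up, and in practice the common direction is estimated over a large set of sentences rather than three.

```python
# Toy sketch of the SIF-style baseline: frequency-weighted average of word
# vectors, then removal of the projection on the first singular direction
# ("common component"). All vectors and probabilities are made up.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
word_vec = {w: rng.normal(size=50) for w in vocab}
word_prob = dict(zip(vocab, [0.5, 0.1, 0.1, 0.2, 0.05, 0.05]))

def sif_embeddings(sentences, a=1e-3):
    # weighted average of word vectors per sentence, weight = a / (a + p(w))
    emb = np.stack([
        np.mean([a / (a + word_prob[w]) * word_vec[w] for w in s.split()], axis=0)
        for s in sentences
    ])
    # remove the common component (first right singular vector of the matrix);
    # in practice this direction is estimated over a large corpus of sentences
    u = np.linalg.svd(emb, full_matrices=False)[2][0]
    return emb - np.outer(emb @ u, u)

sents = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat sat on the dog",
]
e = sif_embeddings(sents)
cos = e[0] @ e[1] / (np.linalg.norm(e[0]) * np.linalg.norm(e[1]))
print(round(float(cos), 3))
```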