Using Word2Vec for topic modeling - Stack Overflow 2017-05-19T00:22:06Z 2017-05-19 datquocnguyen/LFTM: Improving Topic Models with Latent Feature Word Representations (GitHub) 2017-05-22 2017-05-22T14:53:21Z On APIs, JSON, Linked Data, attitude and opportunities | Linked Data Orchestration 2017-05-15T11:07:13Z 2017-05-15 We investigate whether determining short text similarity is possible using only semantic features. A novel feature of our approach is that an arbitrary number of word embedding sets can be incorporated. 2017-05-18 2017-05-18T01:58:44Z Short Text Similarity with Word Embeddings Text Classification With Word2Vec - DS lore (2016) 2017-05-18T23:42:46Z 2017-05-18 > Overall, we won’t be throwing away our SVMs any time soon in favor of word2vec but it has it’s place in text classification. > > 1. SVM’s are pretty great at text classification tasks > 2. Models based on simple averaging of word-vectors can be surprisingly good too (given how much information is lost in taking the average) > 3. but they only seem to have a clear advantage when there is ridiculously little labeled training data > > Update 2017: actually, the best way to utilise the pretrained embeddings would probably be this [using keras](https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html) Sample code to benchmark a few text categorization models to test whether word embeddings like word2vec can improve text classification accuracy. Sample code (based on scikit-learn) includes an embedding vectorizer that is given an embedding dataset and vectorizes texts by taking the mean of all the vectors corresponding to individual words. 2017-05-28 2017-05-28T18:58:27Z osx - How do I upgrade to Python 3.6 with conda? 
- Stack Overflow L’homme préhistorique aimait les pains moutardés | Dans les pas des archéologues 2017-05-13T18:39:53Z 2017-05-13 GloVe: Global Vectors for Word Representation 2017-05-18T22:49:32Z 2017-05-18 Spark Framework: A tiny Java web framework 2017-05-15T18:21:16Z 2017-05-15 How to find semantic similarity between two documents? (researchgate) 2017-05-18T09:46:08Z 2017-05-18 Jump into Java microframeworks, Part 1: Introduction | JavaWorld 2017-05-16T02:06:58Z 2017-05-16 2017-05-20T14:50:46Z 2017-05-20 Improving Topic Models with Latent Feature Word Representations (slides) Stratégie d’architecture API | OCTO talks ! 2017-05-15 2017-05-15T11:16:17Z 2017-05-27 2017-05-27T13:08:52Z javascript - How does Firefox reader view operate - Stack Overflow 2017-05-20T14:05:12Z Improving Topic Models with Latent Feature Word Representations | Nguyen | Transactions of the Association for Computational Linguistics 2017-05-20 2017-05-22 Lingo3G: real-time text clustering engine | Carrot Search Instant analysis of small-to-medium quantities of text. Organizes collections of text documents into clearly-labeled hierarchical folders. In real-time, fully automatically, without external knowledge bases 2017-05-22T13:59:23Z 2017-05-28 2017-05-28T18:55:56Z Anaconda | Continuum 2017-05-26T00:45:00Z 2017-05-26 Sentinelese - Wikipedia Hackers Came, but the French Were Prepared - The New York Times 2017-05-10 2017-05-10T21:00:59Z 2017-05-15 2017-05-15T11:32:18Z Designer une API REST | OCTO talks ! 2017-05-13T18:30:00Z 2017-05-13 Chelsea Manning prepares for freedom: 'I want to breathe the warm spring air' | US news | The Guardian pretty basic, use word frequency, stemming and stopwords. 
Swayy | Blog — An algorithm for generating automatic hashtags 2017-05-24T18:07:27Z 2017-05-24 2017-05-24 2017-05-24T17:58:13Z An Efficient Way to Extract the Main Topics from a Sentence | The Tokenizer based on simple POS tagging (using the Brown corpus), less accurate than the default NLTK tools, but faster Head transplant = body transplant 2017-05-04 2017-05-04T20:07:04Z Un pas de plus vers la greffe de tête | Passeur de sciences 2017-05-23 2017-05-23T15:16:18Z Stanford Topic Modeling Toolbox Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors (2014) 2017-05-18 2017-05-18T23:30:46Z (good presentation in the intro of context-counting vs. context-predicting vectors) android - How is 'reader mode' in Firefox triggered? - Stack Overflow 2017-05-27T13:07:29Z 2017-05-27 2017-05-28 2017-05-28T12:26:10Z CNRS - Quand les souvenirs refont surface grâce à la stimulation électrique cérébrale… Latent semantic analysis and indexing - EduTech Wiki 2017-05-26 2017-05-26T01:26:35Z What are the best open source tools for unsupervised clustering of text documents? - Quora 2017-05-22T12:00:39Z 2017-05-22 Zanj Rebellion - Wikipedia 2017-05-18 2017-05-18T22:24:19Z 2017-05-24 2017-05-24T17:32:42Z Extract Subject Matter of Documents Using NLP – Alexander Crosson – Medium Carrot2 manual 2017-05-23T17:42:55Z 2017-05-23 Summarize Documents using Tf-Idf – Alexander Crosson – Medium 2017-05-24T17:10:17Z 2017-05-24 Comment Jacko, le petit singe savant, retrouva sa maman - John S. Goodall 2017-05-20T15:56:31Z 2017-05-20 Premier neurone artificiel monocomposant - CNRS 2017-05-05T01:08:28Z 2017-05-05 Semantic UI 2017-05-10T07:54:40Z 2017-05-10 2017-05-22 2017-05-22T11:37:25Z Topic modeling made just simple enough. | The Stone and the Shell 2017-05-23 2017-05-23T15:06:24Z alternatives to word2vec? 
- Quora Use Elasticsearch in your Java applications 2017-05-15T19:05:12Z 2017-05-15 2017-05-23 2017-05-23T10:54:18Z Graph databases and RDF: It's a family affair | ZDNet Author: Maali Mnasri (PhD @ CEA) First transform text units to vectors? Not always (e.g. a sentence similarity task using lexical word alignment). But vectors are efficient to process, and benefit from existing clustering algorithms such as k-means. Sentence level or document level? Sentence clustering to summarise large documents. Thematic clustering vs semantic clustering: depends on the similarity measure. Text similarity measures: - Cosine similarity of tf-idf (suitable to produce thematic clusters) - Knowledge-based measures (WordNet) (quantify semantic relatedness of words) - Word embeddings Examples, sample code: - using WordNet with NLTK, and the formula to compute sentence similarities from word similarities. - computing similarities between docs using gensim/word2vec Which clustering algorithm? - when we have an approximation of the number of clusters, and when the similarity measure is not expensive in terms of computation time, clustering algorithms such as k-means are suitable and fast. Sample code of k-means clustering using tf-idf vectors with scikit-learn - Hierarchical clustering algorithms - don't need to be given the number of clusters - but are time consuming (they calculate a similarity matrix for the sentences) - for voluminous data, use an incremental clustering algorithm: sentences are processed one at a time; each new sentence is compared to each of the already formed clusters. 
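The incremental clustering idea in the note above (process sentences one at a time, compare each new sentence to the clusters formed so far) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the bag-of-words vectors, the summed-count centroid and the `threshold` value are all hypothetical choices, not taken from the reviewed article, which uses tf-idf and embedding vectors.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased term counts; a simple stand-in for tf-idf or embedding vectors."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_clustering(sentences, threshold=0.3):
    """Single pass: attach each sentence to its most similar cluster, or open a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for sentence in sentences:
        vector = bag_of_words(sentence)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(sentence)
            best["centroid"].update(vector)  # summed counts act as the centroid
        else:
            clusters.append({"centroid": Counter(vector), "members": [sentence]})
    return [cluster["members"] for cluster in clusters]


sentences = [
    "word embeddings capture word similarity",
    "k-means needs the number of clusters",
    "word similarity from word embeddings",
    "hierarchical clustering needs no number of clusters",
]
groups = incremental_clustering(sentences)
for group in groups:
    print(group)
```

The number of clusters is not fixed in advance; it falls out of the threshold, and the result depends on the order in which sentences arrive, which is the usual trade-off of a single-pass scheme.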
2017-05-18 2017-05-18T01:31:31Z Quick review on Text Clustering and Text Similarity Approaches 2017-05-19 2017-05-19T08:24:26Z Topic Modeling in the Humanities: An Overview - Maryland Institute for Technology in the Humanities 2017-05-15T10:59:28Z JSON Hypertext Application Language The JSON Hypertext Application Language (HAL) is a standard which establishes conventions for expressing hypermedia controls, such as links, with JSON 2017-05-15 2017-05-26 2017-05-26T00:37:54Z Ishi - Wikipedia Ishi (c. 1861 – March 25, 1916) was the last known member of the Yahi, a group of the Yana of the U.S. state of California. Widely acclaimed in his time as the "last wild Indian" in America, Ishi lived most of his life completely outside modern culture. At 50 years of age, in 1911, he emerged near the present-day foothills of Lassen Peak, also known as Wa ganu p'a. Carrot2: Text Clustering Algorithms and Applications 2017-05-23T12:12:49Z 2017-05-23 Open Source Search Results Clustering Engine. It can automatically organize small collections of documents (like, ehm, search results), into thematic categories. 2017-05-24T18:20:50Z NLP keyword extraction tutorial with RAKE and Maui 2017-05-24 2 tools: - simple keyword extraction with a Python library (RAKE) - Java tool (Maui) that uses a machine-learning technique. Focus on 2 tasks: - Extracting the most significant words and phrases that appear in given text - Identifying a set of topics from a predefined vocabulary that match a given text Typical steps: - Candidate selection (extract all possible words, phrases, terms or concepts that can potentially be keywords). - Properties calculation (for each candidate, properties that indicate that it may be a keyword) - Scoring and selecting keywords RAKE: finding multi-word phrases containing frequent words. +: simplicity, ease of use -: limited accuracy, parameter configuration requirement, throws away many valid phrases, doesn’t normalize candidates (no stemming). 
Maui: ("Multi-purpose automatic topic indexing"). Based on [Weka](/semanlink/tag/weka) (GPL, java, maven, github). Compared to RAKE: - Extract keywords not just from text, but also with a reference to a controlled vocabulary - Improve the accuracy by training Maui on manually chosen keywords - but requires a training model. Maui can use a controlled vocabulary expressed in SKOS - so I could use it in semanlink! 2017-05-15 2017-05-15T09:19:09Z Stephen Wolfram: A New Kind of Science java, not free LingPipe 2017-05-23T11:48:43Z 2017-05-23 Both learn geometrical encodings (vectors) of words from their co-occurrence information. Word2vec is a "predictive" model, whereas GloVe is a "count-based" model. 2017-05-18 2017-05-18T23:20:04Z How is GloVe different from word2vec? - Quora Topic Modeling for Humanists: A Guided Tour 2017-05-19T08:26:01Z 2017-05-19 2017-05-23 2017-05-23T11:57:01Z Result Clustering - Apache Solr Reference Guide - Apache Software Foundation Build your own summary tool! | The Tokenizer 2017-05-24T17:56:43Z 2017-05-24
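The three typical steps listed in the RAKE and Maui tutorial notes (candidate selection, properties calculation, scoring and selecting keywords) can be illustrated with a small RAKE-style sketch. It is a simplified approximation, not the RAKE library itself: the tiny stopword list, the function names and the `top_n` parameter are assumptions made for the example, and the word score used is the degree-to-frequency ratio commonly described for RAKE.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real stoplists are much larger).
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
             "of", "on", "that", "the", "to", "with"}


def candidate_phrases(text):
    """Step 1: split on punctuation and stopwords; the remaining runs are candidates."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Step 2: compute a property per word, here degree / frequency."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences in the phrase, itself included
    return {word: degree[word] / freq[word] for word in freq}


def rank_keywords(text, top_n=3):
    """Step 3: score each phrase as the sum of its word scores and keep the best."""
    phrases = candidate_phrases(text)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in set(phrases)}
    return sorted(ranked.items(), key=lambda item: (-item[1], item[0]))[:top_n]


text = ("Keyword extraction is the task of finding significant phrases in a text. "
        "Topic indexing matches a text with a controlled vocabulary.")
keywords = rank_keywords(text)
for phrase, score in keywords:
    print(phrase, score)
```

Because candidates are not normalized (no stemming) and any run of non-stopwords counts as a phrase, a verb such as "matches" ends up inside a keyword here, which is the kind of limited accuracy the notes attribute to RAKE.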