Tim Bouma on Twitter: "I take the bus home. I stand behind two people. I notice the weird-looking shoes tied to their backpacks. After a few moments, I realize they are rock-climbers and those are grippy shoes..." (2019-02-06)

Patent finding using free search tools (2019-02-20)

Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction - ACL Anthology (2019-02-09)
Attempts to answer questions such as: "What is the task described in this paper?", "What method was used to solve the task?", "What dataset did the paper use?". The multi-task setup reduces cascading errors between tasks and leverages cross-sentence relations through coreference links.

How to Measure and Draw Causal Inferences with Patent Scope, by Jeffrey M. Kuhn and Neil Thompson :: SSRN (2019-02-21)
An easy-to-use measure of patent scope: the number of words in the patent's first claim (the fewer the words, the broader the scope) (!)

GraphDB Free Download | Ontotext (2019-02-02)

> **BigQuery ML**, a capability inside BigQuery that allows building and deploying machine learning models on massive structured or semi-structured datasets
> Because the core components of gradient descent can be implemented using common SQL operations, we were able to repurpose the existing BigQuery SQL processing engine for BigQuery ML.
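A hedged sketch of the point in the quote above: one step of gradient descent for linear regression reduces to per-row arithmetic plus a SUM aggregate, which is exactly the shape of computation a SQL engine can already run. This is a pure-Python illustration of that idea, not BigQuery ML's actual implementation; all names are mine.

```python
# One gradient-descent step for linear regression, written only with
# per-row expressions and SUM aggregates -- the kind of computation a
# SQL engine (like BigQuery's) can execute natively.

rows = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]  # (x, y) pairs, roughly y = 2x + 1

def gd_step(w, b, rows, lr=0.05):
    n = len(rows)
    # per-row "SELECT" expressions: the residual for each row
    residuals = [(w * x + b - y, x) for x, y in rows]
    # "SUM(...)" aggregates give the gradients of the squared error
    grad_w = sum(r * x for r, x in residuals) / n
    grad_b = sum(r for r, _ in residuals) / n
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = gd_step(w, b, rows)
print(w, b)  # converges near the least-squares fit
```

In a real SQL engine each step would be one query over the training table, with the loop driven by the `CREATE MODEL` machinery.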
Google AI Blog: Machine Learning in Google BigQuery (2018) (2019-02-19)
- Paper (2017): [SQML: large-scale in-database machine learning with pure SQL](https://dl.acm.org/citation.cfm?doid=3127479.3132746)
- [User guide](https://cloud.google.com/bigquery/docs/bigqueryml-intro)

A Look at the USPTO's AI Development Efforts (2019-02-18)

Tesla Model 3 = #1 Best Selling Electric Car in World (2019-02-15)

Artificial intelligence: DeepMind takes an interest in the French card game Hanabi (2019-02-07)

Microsoft Academic (2019-02-25)

Measuring patent claim breadth using Google Patents Public Datasets | Google Cloud Blog (2019-02-21)
A tutorial on how to use Google Patents Public Datasets, along with Apache Beam, Cloud Dataflow, TensorFlow, and Cloud ML Engine, to create a machine learning model that estimates the 'breadth' of patent claims.

Insects could disappear entirely within a hundred years (2019-02-11)

Neural Transfer Learning for Natural Language Processing - Seb Ruder's PhD thesis (2019-02-27)

ReviewNB: Jupyter Notebook Diff for GitHub (2019-02-14)

Successes and Challenges in Neural Models for Speech and Language - Michael Collins - YouTube (2019-02-24)
> 3 problems, 3 architectures:
> - Speech recognition (feed-forward networks)
> - NL parsing (word embeddings and feed-forward networks)
> - QA (transformers)

A collection of notebooks for Natural Language Processing from NLP Town (2019-02-07)

Nashorn (JavaScript engine) (2019-02-26)

[Jaeyoung2018] Patent Document Clustering with Deep Embeddings (2019-02-23)
Uses [this method](/doc/?uri=https%3A%2F%2Farxiv.org%2Fabs%2F1511.06335): Unsupervised Deep Embedding for Clustering Analysis.
[1511.06335] Unsupervised Deep Embedding for Clustering Analysis (Junyuan Xie, Ross Girshick, Ali Farhadi; submitted 2015-11-19, updated 2016-05-24; bookmarked 2019-02-19)
> Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.

The state-of-the-art on Intellectual Property Analytics (IPA) - ScienceDirect (2018) (2019-02-13)
A literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property data.
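A rough sketch of the DEC clustering objective described in the abstract above, as I read it (the embedding network is omitted: `z` stands for already-embedded points, and all variable names are mine): soft assignments via a Student's t kernel, a sharpened target distribution, and a KL divergence to minimize.

```python
import math

# DEC-style clustering objective (sketch). z: embedded points, mu: cluster centers.
z  = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
mu = [[0.0, 0.0], [5.0, 5.0]]

def soft_assign(z, mu):
    """Student's t kernel: q_ij proportional to (1 + ||z_i - mu_j||^2)^-1, rows normalized."""
    q = []
    for zi in z:
        row = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(zi, mj))) for mj in mu]
        s = sum(row)
        q.append([v / s for v in row])
    return q

def target_dist(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, f_j being the soft cluster size."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        raw = [row[j] ** 2 / f[j] for j in range(len(row))]
        s = sum(raw)
        p.append([v / s for v in raw])
    return p

def kl(p, q):
    """KL(P || Q): the loss that DEC iteratively minimizes while refining z and mu."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)

q = soft_assign(z, mu)
p = target_dist(q)
print(kl(p, q))
```

In the paper the gradient of this loss is backpropagated through the embedding network, so representations and assignments improve together.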
Intellectual Property Analytics (IPA): the data science of analysing large amounts of IP information to discover relationships, trends and patterns for decision making.

CNRS - Grasses can acquire the genes of their neighbours (2019-02-22)

Online Patent Search Tools | Patent Analytics - XLPAT Labs (2019-02-15)

[1901.03136] Automating the search for a patent's prior art with a full text similarity search (Lea Helmers, Franziska Horn, Franziska Biegler, Tim Oppermann, Klaus-Robert Müller; submitted 2019-01-10; bookmarked 2019-02-15) - [github](https://github.com/helmersl/patent_similarity_search)
Meh.
> More than ever, technical inventions are the symbol of our society's advance. Patents guarantee their creators protection against infringement. For an invention to be patentable, its novelty and inventiveness have to be assessed. Therefore, a search for published work that describes inventions similar to a given patent application needs to be performed. Currently, this so-called search for prior art is carried out with semi-automatically composed keyword queries, which is not only time-consuming but also prone to errors. In particular, errors may arise systematically from the fact that different keywords for the same technical concepts may exist across disciplines. In this paper, a novel approach is proposed in which the full text of a given patent application is compared to existing patents, using machine learning and natural language processing techniques, to automatically detect inventions similar to the one described in the submitted document. Various state-of-the-art approaches for feature extraction and document comparison are evaluated. In addition, the quality of the current search process is assessed based on ratings of a domain expert.
> The evaluation results show that our automated approach, besides accelerating the search process, also improves the quality of the search results for prior art.

[1902.10618] Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition (Vered Shwartz, Ido Dagan; submitted 2019-02-27, updated 2019-05-19; bookmarked 2019-02-28)
> Building meaningful phrase representations is challenging because phrase meanings are not simply the sum of their constituent meanings. Lexical composition can shift the meanings of the constituent words and introduce implicit information. We tested a broad range of textual representations for their capacity to address these issues. We found that, as expected, contextualized word representations perform better than static word embeddings, more so in detecting meaning shift than in recovering implicit information, where their performance is still far from that of humans. Our evaluation suite, comprising 5 tasks related to lexical composition effects, can serve future research aiming to improve such representations.

How well do contextualized word embeddings address lexical composition? They are good at recognizing meaning shift ("give in" is different from "give") but much worse at revealing implicit meaning ("hot tea" is about temperature, "hot debate" isn't).
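A minimal stdlib sketch of the kind of full-text similarity ranking the Helmers et al. prior-art entry above describes: here plain bag-of-words cosine similarity, whereas the paper evaluates a range of feature extraction and document comparison methods. The corpus, function names, and example texts are all mine.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector from lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_prior_art(application, corpus):
    """Rank existing documents by full-text similarity to an application."""
    qa = bow(application)
    scored = [(cosine(qa, bow(doc)), doc_id) for doc_id, doc in corpus.items()]
    return sorted(scored, reverse=True)

corpus = {
    "p1": "a rotor blade for a wind turbine with serrated trailing edge",
    "p2": "method of brewing coffee using pressurized water",
}
application = "wind turbine rotor blade having a serrated edge"
print(rank_prior_art(application, corpus))
```

The appeal over keyword queries is that the whole application text is matched at once, instead of relying on a hand-picked (and discipline-dependent) set of keywords.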
Article recommendation with Personalized PageRank and Full Text Search (2019-02-28)

nlp-notebooks/Simple Sentence Similarity.ipynb at master · nlptown/nlp-notebooks (2019-02-07)
[blog post](/doc/?uri=http%3A%2F%2Fnlp.town%2Fblog%2Fsentence-similarity%2F)

Zinder reconnects with its past | CNRS Le journal (2019-02-25)
[Zinder (Camille Lefebvre | Langarchiv)](doc:2021/04/camille_lefebvre_%7C_langarchiv)

Machine learning and natural language processing on the patent corpus: Data, tools, and new measures (2015) (2019-02-19)

Olaf Hartig on Twitter: "Here are typical examples of how people do data integration in the GraphQL context. Everything is explicitly implemented in the program code. No flexibility. Reminds me of the API mash-up apps that were popular 15 years ago. https://t.co/3qJMKXoWDt https://t.co/GWqPdmeFIP" (2019-02-16)

Generalized Language Models (2019-02-10)

A Supervised Requirement-oriented Patent Classification Scheme Based on the Combination of Metadata and Citation Information (2015) (2019-02-14)

Plotting Similar Patents | Kaggle (2019-02-19)
A Kaggle kernel to get started using the **patent embeddings** in Python.

Enhancing Binary Classification by Modeling Uncertain Boundary in Three-Way Decisions - IEEE Journals & Magazine (2019-02-02)

Using BERT for state-of-the-art pre-training for natural language processing (2019-02-14)

ISWC 2019 Tutorial "An Introduction To GraphQL" - IDA (2019-02-17)

What Is Google Patents Search? (2019-02-09)

Patent Similarity. A Big Data Method for Patent Analysis (2015) (2019-02-19)

Keywords2vec (2019-02-09)
To generate a word2vec model, but using keywords instead of single words.
Tokenize on stopwords + non-word characters. (This reminds me of the author of the [FlashText algorithm](tag:flashtext_algorithm.html) saying he had developed it to create word2vec models.)
> We found that tokenizing using stopwords + non word characters was really useful for "finding" the keywords

Jeremy Howard on Twitter: "Such a ridiculously simple idea couldn't possibly work, could it? Or... could it?" (2019-02-09)
[keywords2vec](/doc/2019/02/keywords2vec)

Better Language Models and Their Implications (2019-02-14)
> Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model.

Programmatic Patent Searches Using Google's BigQuery & Public Patent Data (2019-02-19)

Aquarius (film) (2019-02-27)

> In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN not only leverages large amounts of cross-task data, but also benefits from a regularization effect that leads to more general representations, helping it adapt to new tasks and domains. MT-DNN extends the model proposed in Liu et al. (2015) by incorporating a pre-trained bidirectional transformer language model, known as BERT (Devlin et al., 2018). MT-DNN obtains new state-of-the-art results on ten NLU tasks, including SNLI, SciTail, and eight of the nine GLUE tasks, pushing the GLUE benchmark to 82.7% (a 2.2% absolute improvement). We also demonstrate, using the SNLI and SciTail datasets, that the representations learned by MT-DNN allow domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations. The code and pre-trained models are publicly available at https://github.com/namisan/mt-dnn.
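Going back to the Keywords2vec entry above: its tokenization heuristic (split on stopwords and non-word characters, keeping the runs in between as multi-word keywords) might look roughly like this. The tiny stopword list and all names are mine, not the project's.

```python
import re

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "with", "and", "to", "is"}

def keyword_tokenize(text):
    """Split text into word tokens, then cut the stream at every stopword,
    so the runs between stopwords come out as multi-word keywords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    keywords, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                keywords.append(" ".join(current))
            current = []
        else:
            current.append(w)
    if current:
        keywords.append(" ".join(current))
    return keywords

print(keyword_tokenize("Automating the search for a patent's prior art"))
```

Feeding these multi-word tokens to word2vec in place of single words is the whole trick.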
[1901.11504] Multi-Task Deep Neural Networks for Natural Language Understanding (Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao; submitted 2019-01-31, updated 2019-05-30; bookmarked 2019-02-17)
Outperforms BERT in nine of eleven benchmark NLP tasks.

[1902.05196] Categorical Metadata Representation for Customized Text Classification (Reinald Kim Amplayo, Kyungjae Lee, Sua Sung, Seung-won Hwang, Jihyeok Kim; submitted 2019-02-14; bookmarked 2019-02-18)
> The performance of text classification has improved tremendously using intelligently engineered neural-based models, especially those injecting categorical metadata as additional information, e.g., using user/product information for sentiment classification. This information has been used to modify parts of the model (e.g., word embeddings, attention mechanisms) so that results can be customized according to the metadata. We observe that current representation methods for categorical metadata, which are devised for human consumption, are not as effective as claimed in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as the available context only indirectly describes the category, and even such context is often scarce (for tail categories). To this end, we propose to use basis vectors to effectively incorporate categorical metadata in various parts of a neural-based model. This additionally decreases the number of parameters dramatically, especially when the number of categorical features is large. Extensive experiments on various datasets with different properties show that, through our method, we can represent categorical metadata more effectively, customize parts of the model (including unexplored ones), and greatly increase its performance.
> We observe that **current representation methods for categorical metadata... are not as effective as claimed** in popular classification methods, outperformed even by simple concatenation of categorical features in the final layer of the sentence encoder. We conjecture that categorical features are harder to represent for machine use, as available context only indirectly describes the category

[1902.05196] Categorical Metadata Representation for Customized Text Classification (Jihyeok Kim, Minji Seo, et al.; submitted 2019-02-14)

Gaussian Processes (2019-02-11)

Understanding building blocks of ULMFIT – Kerem Turgutlu – Medium (2019-02-10)

[1902.05309] Transfer Learning for Sequence Labeling Using Source Model and Target Data (Lingzhen Chen, Alessandro Moschitti; submitted 2019-02-14; bookmarked 2019-02-18)
Use-case example: NER when the target data contains new categories.
> In this paper, we propose an approach for transferring the knowledge of a neural model for sequence labeling, learned from the source domain, to a new model trained on a target domain where new label categories appear. Our transfer learning (TL) techniques enable adapting the source model using the target data and new categories, without accessing the source data. Our solution consists in adding new neurons in the output layer of the target model and transferring parameters from the source model, which are then fine-tuned with the target data. Additionally, we propose a neural adapter to learn the difference between the source and the target label distributions, which provides additional important information to the target model.
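The "simple concatenation of categorical features in the final layer" baseline that the categorical-metadata excerpt above singles out as a surprisingly strong competitor can be sketched in a few lines; the encoding and names here are mine, a stand-in for a real sentence encoder and classifier.

```python
def one_hot(category, categories):
    """One-hot encode a categorical feature such as a user or product id."""
    return [1.0 if c == category else 0.0 for c in categories]

def final_layer_input(sentence_encoding, category, categories):
    """The concatenation baseline: append the one-hot category vector to the
    sentence encoder's output, right before the classification layer."""
    return sentence_encoding + one_hot(category, categories)

users = ["alice", "bob", "carol"]
sent_vec = [0.2, -0.7, 0.5]   # stand-in for a sentence encoder output
x = final_layer_input(sent_vec, "bob", users)
print(x)
```

The paper's point is that fancier injection schemes often fail to beat even this, which motivates their basis-vector representation.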
> Our experiments on Named Entity Recognition show that (i) the learned knowledge in the source model can be effectively transferred when the target data contains new categories, and (ii) our neural adapter further improves such transfer.

Automated patent landscaping (google/patents-public-data) (2019-02-20)
[github](https://github.com/google/patents-public-data/tree/master/models/landscaping)

Jacob Devlin talks about BERT at the Stanford NLP seminar (2019-02-11)
Includes new results such as the effect of the masking strategy, using synthetic training data, ...

Have we forgotten where the swastika leads? - Video, Ina.fr (2019-02-17)
> You want to play with the swastika? Look. Millions of men died of it... You want to start again? You are mad.

Google explores AI's mysterious polytope | ZDNet (2019-02-09)

[1711.09677] Binary classification models with "Uncertain" predictions (Damjan Krstajic et al.; submitted 2017-11-27; bookmarked 2019-02-02)
> Binary classification models which can assign probabilities to categories such as "the tissue is 75% likely to be tumorous" or "the chemical is 25% likely to be toxic" are well understood statistically, but their utility as an input to decision making is less well explored. We argue that users need to know which is the most probable outcome, how likely that is to be true and, in addition, whether the model is capable enough to provide an answer. It is the last case, where the potential outcomes of the model explicitly include "don't know", that is addressed in this paper. Including this outcome would better separate those predictions that can lead directly to a decision from those where more data is needed.
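A sketch of the output-layer transfer step described in the Chen & Moschitti abstract above: rows for labels already seen in the source model are copied over (to be fine-tuned on target data), while rows for brand-new label categories are freshly initialized. The initialization scheme and all names are my assumptions, not the paper's exact recipe.

```python
import random

def extend_output_layer(source_weights, source_labels, target_labels, hidden_dim):
    """Build the target model's output layer: transfer rows for labels the
    source model knows, add randomly initialized rows for new categories."""
    rng = random.Random(0)
    by_label = dict(zip(source_labels, source_weights))
    target = []
    for label in target_labels:
        if label in by_label:
            target.append(list(by_label[label]))   # transferred, then fine-tuned
        else:
            target.append([rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)])
    return target

src_labels = ["O", "PER", "LOC"]
src_w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]       # hidden_dim = 2
tgt_labels = ["O", "PER", "LOC", "ORG"]            # ORG is a new category
W = extend_output_layer(src_w, src_labels, tgt_labels, hidden_dim=2)
print(len(W), len(W[0]))
```

Note that the source training data is never touched, only the source weights: that is the constraint the paper works under.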
> Where models produce an "Uncertain" answer, similar to a human reply of "don't know" or "50:50", in the examples we refer to earlier this would translate to actions such as "operate on tumour" or "remove compound from use" where the models give a "more true than not" answer. Where the models judge the result "Uncertain", the practical decision might be "carry out more detailed laboratory testing of the compound" or "commission new tissue analyses". The paper presents several examples where we first analyse the effect of its introduction, then present a methodology for separating "Uncertain" from binary predictions and, finally, provide arguments for its use in practice.

(Full author list: Damjan Krstajic, Simon Thomas, David E Leahy, Ljubomir Buturovic; updated 2017-12-04)

How to find out if an idea I want to patent is already patented by someone else - Quora (2019-02-20)

An Animated Reconstruction of Ancient Rome: Take A 30-Minute Stroll Through the City's Virtually-Recreated Streets | Open Culture (2019-02-25)

Google Patents Public Datasets: connecting public, paid, and private patent data | Google Cloud Blog (2019-02-09)

(((ل()(ل() 'yoav)))) on Twitter: "These explanation slides by Mike Collins on the transformer ..." (2019-02-24)

W3C Workshop on Web Standardization for Graph Data (2019-02-28)
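One simple way to realize the "Uncertain" outcome discussed in the Krstajic et al. entry above is to abstain inside a band around 50:50. The band width here is an arbitrary choice of mine; the paper itself presents a proper methodology for separating "Uncertain" from binary predictions.

```python
def decide(p_positive, band=0.15):
    """Map a model probability to a three-way decision: positive, negative,
    or 'Uncertain' when the probability falls inside a band around 0.5."""
    if abs(p_positive - 0.5) <= band:
        return "Uncertain"
    return "positive" if p_positive > 0.5 else "negative"

for p in (0.75, 0.25, 0.55):
    print(p, decide(p))
```

In the paper's examples, the two confident outcomes map to direct actions ("operate on tumour", "remove compound from use") while "Uncertain" maps to gathering more data.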