The natural neural-network architecture for dealing with sequences.
NN where **connections between units form a directed cycle**. This creates an **internal state of the network** which allows it to exhibit **dynamic temporal behavior**. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition or speech recognition.
2 broad classes: finite impulse and infinite impulse (a finite impulse RNN can be unrolled and replaced with a strictly feedforward neural network)
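The "directed cycle" and "internal state" above can be sketched as a minimal Elman-style RNN cell (all sizes and weight names here are illustrative, not from any particular library):

```python
import numpy as np

# Minimal sketch of a recurrent cell: the hidden state h feeds back
# into the next step -- this feedback loop is the "directed cycle"
# that gives the network its internal memory.
rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden (the cycle)
b_h = np.zeros(n_hidden)

def rnn_forward(xs):
    """Run the cell over a sequence; return one hidden state per step."""
    h = np.zeros(n_hidden)
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # state depends on the past
        states.append(h)
    return np.stack(states)

seq = rng.normal(size=(5, n_in))  # a toy sequence of 5 input vectors
hs = rnn_forward(seq)             # shape (5, n_hidden)
```

Unrolling this loop for a fixed number of steps is exactly what turns a finite impulse RNN into a strictly feedforward network.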
RNNs suffer from the **vanishing gradient problem**, which prevents them from learning long-range dependencies. [#LSTMs](/tag/lstm_networks) improve upon this by using a gating mechanism that allows for explicit memory updates and deletions.
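A toy illustration of the vanishing gradient (my sketch, using a hypothetical one-unit RNN, not from the text): backpropagating through T steps multiplies T Jacobians, each bounded by the recurrent weight times the tanh derivative, so the product shrinks geometrically.

```python
import numpy as np

# One-unit RNN with recurrent weight w < 1: the gradient of h_T with
# respect to h_0 is a product of per-step factors w * (1 - tanh^2),
# each at most w, so it decays geometrically with sequence length.
rng = np.random.default_rng(1)
w = 0.5      # scalar recurrent weight (illustrative)
h = 0.0
g = 1.0      # running product d h_t / d h_0
grads = []
for t in range(50):
    h = np.tanh(w * h + rng.normal())
    g *= w * (1 - h ** 2)   # per-step Jacobian, <= w in magnitude
    grads.append(abs(g))

# grads[-1] is the gradient contribution from 50 steps back:
# effectively zero, which is why long-range dependencies are hard.
```

LSTM gates mitigate this by giving the cell state a nearly multiplicative-identity path through time, so gradients need not shrink at every step.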
RNN in NLP:
- Goal: representing a sequence of words as dense vectors
- input: a sequence of words (or characters)
- output: a sequence of hidden states, each a representation of the sequence from the beginning up to a specific position
- advantages: encoding sequential relationships and dependencies among words
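The input/output contract above can be sketched as follows (embedding table, weight names, and sizes are all hypothetical):

```python
import numpy as np

# Sketch: word ids -> embedding lookup -> RNN hidden states, where
# state t is a dense representation of the sequence from the beginning
# up to position t.
rng = np.random.default_rng(2)
vocab, d_emb, d_hid = 10, 4, 5
E = rng.normal(scale=0.1, size=(vocab, d_emb))   # embedding table (hypothetical)
W_x = rng.normal(scale=0.1, size=(d_hid, d_emb))
W_h = rng.normal(scale=0.1, size=(d_hid, d_hid))

def encode(word_ids):
    h = np.zeros(d_hid)
    states = []
    for wid in word_ids:
        h = np.tanh(W_x @ E[wid] + W_h @ h)  # fold in the next word
        states.append(h)
    return np.stack(states)  # one hidden state per position

states = encode([3, 1, 4, 1, 5])  # toy "sentence" of 5 word ids
```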
How does one apply deep learning to time series forecasting? - Quora(About) > I would use the state-of-the-art [recurrent nets](/tag/recurrent_neural_network.html) (using gated units and multiple layers) to make predictions at each time step for some future horizon of interest. The RNN is then updated with the next observation to be ready for making the next prediction
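The predict-then-update loop described in the answer can be sketched generically. The "model" below is a trivial stand-in (exponential smoothing rather than a trained gated RNN); the point is the rolling structure: predict the next step, then fold in the actual observation before predicting again.

```python
# Rolling one-step-ahead forecasting loop (illustrative stand-in model).
series = [1.0, 2.0, 3.0, 4.0, 5.0]
state = series[0]   # in a real RNN this would be the hidden state
alpha = 0.5
predictions = []
for obs in series[1:]:
    predictions.append(state)                   # predict from current state
    state = alpha * obs + (1 - alpha) * state   # update with the observation
```

With a real recurrent net, the update step would run the observation through the trained cell instead of a smoothing formula.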
Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models | Blog | Explosion AI(About) > A four-step strategy for deep learning with text
> Word embeddings let you treat individual words as related units of meaning, rather than entirely distinct IDs. However, most NLP problems require understanding of longer spans of text, not just individual words. There's now a simple and flexible solution that is achieving excellent performance on a wide range of problems. After embedding the text into a sequence of vectors, bidirectional RNNs are used to encode the vectors into a sentence matrix. The rows of this matrix can be understood as token vectors — they are sensitive to the sentential context of the token. The final piece of the puzzle is called an attention mechanism. This lets you reduce the sentence matrix down to a sentence vector, ready for prediction.
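The "attend" step of the formula, reducing a sentence matrix to a sentence vector, can be sketched with a simple dot-product attention (my illustration under assumed shapes, not Explosion AI's code; the `query` vector stands in for learned attention parameters):

```python
import numpy as np

# Attention pooling: one score per token row, softmax the scores,
# then take the weighted sum of rows -> a single sentence vector.
rng = np.random.default_rng(3)
n_tokens, d = 6, 8
sentence_matrix = rng.normal(size=(n_tokens, d))  # e.g. biRNN outputs, one row per token
query = rng.normal(size=d)                        # hypothetical learned query

scores = sentence_matrix @ query                  # one score per token
weights = np.exp(scores - scores.max())
weights /= weights.sum()                          # softmax over tokens
sentence_vector = weights @ sentence_matrix       # weighted sum of token vectors
```

The resulting fixed-size vector is what gets passed to the final "predict" step.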
Recurrent Memory Network for Language Modeling (2016)(About) Recurrent Neural Networks (RNN) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge.
In this paper, we propose Recurrent Memory Network (RMN), a novel RNN architecture, that not only amplifies the power of RNN but also facilitates our understanding of its internal functioning and allows us to discover underlying patterns in data.
We demonstrate the power of RMN on language modeling and sentence completion tasks.
On language modeling, RMN outperforms the Long Short-Term Memory (LSTM) network on three large German, Italian, and English datasets. Additionally, we perform in-depth analysis of various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state of the art by a large margin.
Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs | WildML(About) The idea behind RNNs is to make use of sequential information. In a traditional neural network we assume that all inputs (and outputs) are independent of each other. But for many tasks that’s a very bad idea. If you want to predict the next word in a sentence, you’d better know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps.