4.1 The Model
The Recurrent Neural Network (RNN) is a natural generalization of feed-forward neural networks to sequences. Given a sequence of inputs (x1, x2, ..., xn), an RNN computes a sequence of outputs (y1, y2, ..., yn) by iterating the following equations:
h_t = sigm(W_hx x_t + W_hh h_{t-1})    (1)
y_t = W_yh h_t    (2)
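The recurrence in equations (1) and (2) can be sketched directly in code. The dimensions and randomly initialised weights below are illustrative assumptions, not values from any trained model:

```python
import numpy as np

# Minimal sketch of the RNN recurrence in equations (1)-(2).
# Dimensions and weights are illustrative, not a trained model.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2

W_hx = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_yh = rng.standard_normal((output_dim, hidden_dim)) * 0.1

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs):
    """Iterate h_t = sigm(W_hx x_t + W_hh h_{t-1}); y_t = W_yh h_t."""
    h = np.zeros(hidden_dim)
    ys = []
    for x in xs:
        h = sigm(W_hx @ x + W_hh @ h)   # equation (1)
        ys.append(W_yh @ h)             # equation (2)
    return ys

xs = [rng.standard_normal(input_dim) for _ in range(5)]
ys = rnn_forward(xs)
print(len(ys), ys[0].shape)  # one output per input step
```

Note that the same weight matrices are reused at every time step, which is what lets the network handle input sequences of arbitrary length.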
RNNs can map sequences to sequences, but it is not clear how to apply them when the input and output sequences have different lengths. A general approach is to map the input sequence to a fixed-size vector using one RNN, and then to map this vector to the target sequence with another RNN. However, RNNs are difficult to train in this setting because of long-term dependencies; the Long Short-Term Memory (LSTM) is known to learn problems with long-range temporal dependencies, which makes it a natural choice here.
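The two-RNN scheme described above, encoding into a fixed-size vector and then decoding from it, can be sketched as follows. The dimensions, tanh nonlinearity, and weights are illustrative assumptions only:

```python
import numpy as np

# Hypothetical sketch: one RNN encodes the input sequence into a
# fixed-size vector (its last hidden state); a second RNN is then
# initialised with that vector to consume the target sequence.
rng = np.random.default_rng(1)
in_dim, hid_dim, out_dim = 3, 4, 3

def make_rnn_params(x_dim, h_dim):
    return (rng.standard_normal((h_dim, x_dim)) * 0.1,
            rng.standard_normal((h_dim, h_dim)) * 0.1)

enc_Whx, enc_Whh = make_rnn_params(in_dim, hid_dim)
dec_Whx, dec_Whh = make_rnn_params(out_dim, hid_dim)

def step(W_hx, W_hh, x, h):
    # Illustrative nonlinearity; the exact choice is not essential here.
    return np.tanh(W_hx @ x + W_hh @ h)

def encode(xs):
    h = np.zeros(hid_dim)
    for x in xs:
        h = step(enc_Whx, enc_Whh, x, h)
    return h  # fixed-size summary of the whole input sequence

def decode(v, ys):
    h = v  # decoder starts from the encoder's summary vector
    states = []
    for y in ys:
        h = step(dec_Whx, dec_Whh, y, h)
        states.append(h)
    return states

v = encode([rng.standard_normal(in_dim) for _ in range(6)])
states = decode(v, [rng.standard_normal(out_dim) for _ in range(4)])
print(v.shape, len(states))
```

The key point is that the encoder's final hidden state is the only channel through which information about the input reaches the decoder.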
Figure 1: Overview of the Recurrent Neural Network
Source: https://leonardoaraujosantos.gitbooks.io/artificialinteligence/content/recurrent_neural_networks.html

Figure 2: Unfolding of the Recurrent Neural Network

The goal is to estimate the conditional probability p(y1, ..., yT' | x1, ..., xT), where x1, ..., xT is the input sequence and y1, ..., yT' is the corresponding output sequence, whose lengths T and T' may differ. The RNN computes this conditional probability by first obtaining the fixed-dimensional representation v of the input sequence x1, ..., xT, given by the last hidden state of the RNN, and then computing the probability of y1, ..., yT' with a standard LSTM formulation whose initial hidden state is set to the representation v of x1, ..., xT:

p(y1, ..., yT' | x1, ..., xT) = ∏_{t=1}^{T'} p(y_t | v, y1, ..., y_{t-1})    (3)
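Equation (3) factorises the output probability with the chain rule: the probability of the whole sequence is the product of per-step conditionals. A toy sketch, with a made-up vocabulary and a hypothetical per-step distribution standing in for the LSTM-plus-softmax:

```python
import numpy as np

# Toy sketch of equation (3): the probability of an output sequence is
# the product of per-step conditionals p(y_t | v, y_1, ..., y_{t-1}).
# The per-step distributions here are made up for illustration; a real
# model would compute them with an LSTM followed by a softmax layer.
vocab = ["x", "y", "z", "<eos>"]

def step_distribution(v, prefix):
    # Hypothetical conditional: puts extra weight on one vocabulary
    # entry depending on how many tokens have been produced so far.
    logits = np.array([float(len(prefix) == i) for i in range(len(vocab))])
    exp = np.exp(logits)
    return exp / exp.sum()

def sequence_prob(v, ys):
    prob = 1.0
    for t, y in enumerate(ys):
        dist = step_distribution(v, ys[:t])
        prob *= dist[vocab.index(y)]   # p(y_t | v, y_1..y_{t-1})
    return prob

p = sequence_prob(np.zeros(4), ["x", "y", "z", "<eos>"])
print(p)
```

In practice one sums log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow on long sequences.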
Each p(y_t | v, y1, ..., y_{t-1}) distribution is represented with a softmax over all the words in the vocabulary. The model uses two different RNNs, one for the input sequence and another for the output sequence, because doing so increases the number of model parameters at negligible computational cost and makes it natural to train the model on multiple language pairs. It is also valuable to reverse the order of the words of the input sequence: for example, instead of mapping the sequence a, b, c to the sequence x, y, z, the RNN maps c, b, a to x, y, z, where x, y, z is the translation of a, b, c. Reading the input sequence in reverse order makes the optimization problem much easier.
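The softmax layer mentioned above turns a vector of decoder scores into a distribution over the vocabulary. A minimal sketch, with an illustrative vocabulary and made-up scores:

```python
import numpy as np

# Sketch of the softmax that turns decoder scores into a distribution
# p(y_t | ...) over the vocabulary; vocabulary and scores are made up.
vocab = ["x", "y", "z", "<eos>"]
scores = np.array([2.0, 0.5, -1.0, 0.1])  # hypothetical decoder output

probs = np.exp(scores - scores.max())  # subtract max for stability
probs /= probs.sum()
print(dict(zip(vocab, probs.round(3))))
```

Subtracting the maximum score before exponentiating leaves the result unchanged but avoids overflow when scores are large, which matters with real vocabulary sizes.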
The main property of this model is that it converts an input sentence of variable length into a fixed-dimensional vector representation, which can serve as the basis of the translation. Since a translation tends to be a paraphrase of the input sequence, the translation objective encourages the RNN to find sentence representations that capture meaning: sentences with similar meanings end up close to one another, while sentences with different meanings end up far apart. The model can be explained with the figures given above: Figure 1 shows an overview of the recurrent neural network, i.e. how each hidden state depends on the previous hidden state, and Figure 2 shows the unfolding of the network, i.e. how this process is carried out over time.