
# 4.1 The Model

The Recurrent Neural Network (RNN) is a natural generalization of the feed-forward neural network to sequences. Given a sequence of inputs $(x_1, x_2, \ldots, x_n)$, an RNN computes a sequence of outputs $(y_1, y_2, \ldots, y_n)$ by iterating the following equations:

$h_t = \mathrm{sigm}(W^{hx} x_t + W^{hh} h_{t-1})$  (1)

$y_t = W^{yh} h_t$  (2)
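
To make the recurrence concrete, here is a minimal NumPy sketch of equations (1) and (2). The layer sizes, the random weights, and the `sigm` helper are illustrative assumptions, not a prescribed setup.

```python
import numpy as np

def sigm(z):
    """Elementwise logistic sigmoid, the nonlinearity in eq. (1)."""
    return 1.0 / (1.0 + np.exp(-z))

input_dim, hidden_dim, output_dim = 8, 16, 8          # hypothetical sizes
rng = np.random.default_rng(0)
W_hx = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
W_yh = rng.normal(scale=0.1, size=(output_dim, hidden_dim))

def rnn_forward(xs):
    """Iterate eqs. (1) and (2) over the input sequence xs."""
    h = np.zeros(hidden_dim)                          # h_0 = 0
    ys = []
    for x_t in xs:
        h = sigm(W_hx @ x_t + W_hh @ h)               # eq. (1): h_t
        ys.append(W_yh @ h)                           # eq. (2): y_t
    return ys, h                                      # outputs and last hidden state

xs = [rng.normal(size=input_dim) for _ in range(5)]
ys, h_last = rnn_forward(xs)
```

The last hidden state `h_last` is the quantity that the encoder-decoder construction described below reuses as a fixed-size summary of the whole input sequence.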

RNNs can readily map sequences to sequences when the alignment between inputs and outputs is known in advance; however, it is not clear how to apply an RNN when the input and output sequences have different lengths. The general strategy is to map the input sequence to a fixed-size vector using one RNN and then to map this vector to the target sequence with another RNN. As stated, though, such a network would be difficult to train because of the resulting long-term dependencies. However, the Long Short-Term Memory (LSTM) is known to be able to learn problems with long-range temporal dependencies.
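
The following is a schematic NumPy sketch of this two-RNN (encoder-decoder) idea, with a plain RNN standing in for the LSTM. The weight names, the fixed output length, and the omission of output feedback are simplifying assumptions made only for illustration.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(xs, W_hx, W_hh):
    """First RNN: its last hidden state is the fixed-size vector v."""
    h = np.zeros(W_hh.shape[0])
    for x_t in xs:
        h = sigm(W_hx @ x_t + W_hh @ h)
    return h                                          # v, a summary of xs

def decode(v, W_hh_dec, W_yh_dec, n_steps):
    """Second RNN: unrolled from v to produce the target sequence."""
    h, ys = v, []
    for _ in range(n_steps):                          # fixed length, for brevity
        h = sigm(W_hh_dec @ h)                        # no feedback of y_{t-1} here
        ys.append(W_yh_dec @ h)
    return ys
```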

The goal of the RNN is to estimate the conditional probability $p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T)$, where $(x_1, \ldots, x_T)$ is the input sequence and $(y_1, \ldots, y_{T'})$ is the output sequence, whose lengths $T$ and $T'$ may differ. The RNN computes this conditional probability by first obtaining the fixed-dimensional representation $v$ of the input sequence $(x_1, \ldots, x_T)$, given by the last hidden state of the RNN, and then computing the probability of $y_1, \ldots, y_{T'}$ with the standard LSTM formulation whose initial hidden state is set to the representation $v$ of $(x_1, \ldots, x_T)$:

$p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T) = \prod_{t=1}^{T'} p(y_t \mid v, y_1, \ldots, y_{t-1})$  (3)

Figure 1: Overview of the Recurrent Neural Network. Source: https://leonardoaraujosantos.gitbooks.io/artificialinteligence/content/recurrent_neural_networks.html

Figure 2: Unfolding of the Recurrent Neural Network
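
As a concrete reading of equation (3), the sketch below scores a candidate output sequence by accumulating the per-step log-probabilities $\log p(y_t \mid v, y_1, \ldots, y_{t-1})$ under a softmax output layer. The weight names (`W_hh`, `W_he`, `W_hy`), the tanh nonlinearity, and the one-hot feedback of the previous word are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                           # numerically stable softmax
    return e / e.sum()

def sequence_log_prob(v, targets, W_hh, W_he, W_hy):
    """log p(y_1..y_T' | x) = sum_t log p(y_t | v, y_1..y_{t-1}), per eq. (3)."""
    vocab_size = W_hy.shape[0]
    h = v                                             # decoder initialized with v
    prev = np.zeros(vocab_size)                       # stands in for a start token
    log_prob = 0.0
    for y_t in targets:                               # y_t is an integer word index
        h = np.tanh(W_hh @ h + W_he @ prev)           # depends on v (via h) and y_{<t}
        p = softmax(W_hy @ h)                         # distribution over the vocabulary
        log_prob += np.log(p[y_t])
        prev = np.eye(vocab_size)[y_t]                # feed y_t back as a one-hot vector
    return log_prob
```

Because the product in equation (3) becomes a sum of logarithms, this is also the quantity that training maximizes.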

Each $p(y_t \mid v, y_1, \ldots, y_{t-1})$ distribution is represented with a softmax over all the words in the vocabulary. The actual model uses two different RNNs, one for the input sequence and another for the output sequence, because doing so increases the number of model parameters at negligible computational cost and makes it natural to train the model on multiple language pairs. It is also valuable to reverse the order of the words of the input sequence: for example, instead of mapping the sequence $a, b, c$ to the sequence $x, y, z$, the RNN maps $c, b, a$ to $x, y, z$, where $x, y, z$ is the translation of $a, b, c$. The model reads the input sequence in reverse order because this introduces many short-term dependencies between the source and the target, which makes the optimization problem much easier.
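
In data-preparation terms, the reversal trick is just a list reversal on the source side; the token lists below are placeholders.

```python
# Source reversal: the encoder reads the source backwards; the target is unchanged.
src = ["a", "b", "c"]                                 # placeholder source tokens
tgt = ["x", "y", "z"]                                 # placeholder target tokens
encoder_input = list(reversed(src))                   # ["c", "b", "a"]
decoder_target = tgt                                  # ["x", "y", "z"]
```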

A useful property of this model is that it converts an input sentence of variable length into a fixed-size vector representation. Given that the translation tends to be a paraphrase of the input sequence, the translation objective encourages the RNN to find sentence representations that capture meaning: sentences with similar meanings end up close to one another, while sentences with different meanings end up far apart.
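
One way to probe this property (a sketch, assuming `encode` from the earlier snippet returns the last hidden state $v$ for a sentence) is to compare the representations of two sentences with cosine similarity; paraphrases should score higher than unrelated sentences.

```python
import numpy as np

def cosine(v1, v2):
    """Cosine similarity between two fixed-size sentence representations."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# e.g. cosine(encode(sent_a, W_hx, W_hh), encode(sent_b, W_hx, W_hh)) is
# expected to be larger when sent_a and sent_b are paraphrases.
```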

The Recurrent Neural Network can be understood with the diagrams given above: Figure 1 shows an overview of the recurrent neural network, i.e., how each hidden layer depends on the previous hidden layer, and Figure 2 shows the unfolding of the network, i.e., how the same computation is carried out at every time step.