Recurrent Neural Networks (RNNs)
A family of neural architectures. Core idea: apply the same weights W repeatedly. A chain of hidden states reads an input sequence (of any length) one step at a time, optionally producing an output at every step.

A Simple RNN Language Model
[Figure: an RNN unrolled over the input "the students opened their", producing an output distribution over the vocabulary (books, laptops, ..., a, ..., zoo).]
• Words / one-hot vectors: x^(t) in R^|V|
• Word embeddings: e^(t) = E x^(t)
• Hidden states: h^(t) = σ(W_h h^(t-1) + W_e e^(t) + b_1), where h^(0) is the initial hidden state
• Output distribution: ŷ^(t) = softmax(U h^(t) + b_2) in R^|V|
• Note: this input sequence could be much longer now! (A code sketch of this model appears later in this section.)

RNN Language Models
RNN Advantages:
• Can process input of any length
• Computation for step t can (in theory) use information from many steps back
• Model size doesn't increase for longer input context
• The same weights are applied on every timestep, so there is symmetry in how inputs are processed
RNN Disadvantages:
• Recurrent computation is slow (the steps must be computed one after another)
• In practice, it is difficult to access information from many steps back
More on these later in the course.

Training an RNN Language Model
• Get a big corpus of text, which is a sequence of words x^(1), ..., x^(T).
• Feed it into the RNN-LM and compute the output distribution ŷ^(t) for every step t, i.e. predict the probability distribution of every word given the words so far.
• The loss function on step t is the cross-entropy between the predicted probability distribution ŷ^(t) and the true next word y^(t) (a one-hot vector for x^(t+1)):
    J^(t)(θ) = CE(y^(t), ŷ^(t)) = -Σ_{w in V} y^(t)_w log ŷ^(t)_w = -log ŷ^(t)_{x_{t+1}}
• Average this over all timesteps to get the overall loss for the entire training set:
    J(θ) = (1/T) Σ_{t=1}^{T} J^(t)(θ)
[Figure: the RNN-LM run over the corpus "the students opened their exams ...". Each predicted probability distribution is scored against the actual next word: J^(1) = negative log prob of "students", J^(2) = negative log prob of "opened", J^(3) = negative log prob of "their", J^(4) = negative log prob of "exams", and so on; the per-step losses are summed and averaged to give the total loss. Feeding the model the true corpus words as inputs, rather than its own predictions, is called "teacher forcing".]
• However: computing the loss and gradients across the entire corpus at once is too expensive!
• In practice, treat x^(1), ..., x^(T) as a sentence (or a document).
• Recall: Stochastic Gradient Descent allows us to compute the loss and gradients for a small chunk of data, then update.
• So: compute the loss for a sentence (actually, a batch of sentences), compute gradients, and update the weights. Repeat. (A sketch of one such training step follows below.)

Backpropagation for RNNs
• Question: how do we calculate the gradient of the loss J^(t) with respect to the repeated weight matrix W_h?
• Answer: backpropagate over timesteps i = t, ..., 0, summing gradients as you go:
    ∂J^(t)/∂W_h = Σ_{i=1}^{t} ∂J^(t)/∂W_h |_(i)
  where |_(i) denotes the gradient through the copy of W_h used at timestep i.
• This algorithm is called "backpropagation through time" [Werbos, P.G., 1988, Neural Networks 1, and others].

Generating text with an RNN Language Model
• Just like an n-gram Language Model, you can use an RNN Language Model to generate text by repeated sampling: the sampled output becomes the next step's input.
[Figure: starting from "my", the model samples "favorite", then "season", then "is", then "spring", feeding each sampled word back in as the next input: "my favorite season is spring".]
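Below is a minimal PyTorch sketch of the simple RNN language model above. The class name, variable names, and hyperparameters (embed_dim, hidden_dim, and so on) are illustrative assumptions rather than anything fixed by the lecture; nn.RNN implements the recurrence h^(t) = tanh(W_h h^(t-1) + W_e e^(t) + b_1), and the linear output layer plays the role of U h^(t) + b_2.

```python
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    """Simple RNN-LM: word indices -> embeddings -> recurrent hidden states -> distribution over the vocab."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)            # e^(t) = E x^(t)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)  # h^(t) = tanh(W_h h^(t-1) + W_e e^(t) + b_1)
        self.out = nn.Linear(hidden_dim, vocab_size)                # logits for y-hat^(t) = softmax(U h^(t) + b_2)

    def forward(self, word_ids, h0=None):
        # word_ids: (batch, seq_len) integer word indices; the same weights are used at every timestep.
        emb = self.embed(word_ids)          # (batch, seq_len, embed_dim)
        hidden, h_last = self.rnn(emb, h0)  # hidden: (batch, seq_len, hidden_dim)
        logits = self.out(hidden)           # (batch, seq_len, vocab_size)
        return logits, h_last
```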
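Continuing the sketch, here is one SGD training step on a batch of sentences. This is a hedged illustration, not official course code: it assumes the batch is a tensor of equal-length word-index sequences, feeds the true corpus words as inputs (teacher forcing), averages the per-step cross-entropy J^(t) over all timesteps, and relies on loss.backward() to perform backpropagation through time (autograd sums the gradients over timesteps for the shared weights).

```python
import torch
import torch.nn.functional as F

model = RNNLanguageModel(vocab_size=10_000)              # vocabulary size is an assumed placeholder
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # learning rate is an assumption too

def training_step(batch):
    """One SGD update on a batch of sentences; batch is a (num_sentences, seq_len) tensor of word indices."""
    inputs  = batch[:, :-1]   # feed the true corpus words x^(1)..x^(T-1) as inputs ("teacher forcing")
    targets = batch[:, 1:]    # predict the next word x^(2)..x^(T) at every step

    logits, _ = model(inputs)                             # (num_sentences, T-1, vocab_size)
    # Per-step cross-entropy J^(t), averaged over all timesteps (and all sentences in the batch):
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

    optimizer.zero_grad()
    loss.backward()    # backpropagation through time: autograd sums gradients over timesteps
    optimizer.step()
    return loss.item()

# e.g. training_step(torch.tensor([[1, 4, 7, 2, 9]]))  # made-up word indices for "the students opened their exams"
```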
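And a sketch of generation by repeated sampling, reusing the model above: sample a word from the output distribution ŷ^(t), feed it back in as the next step's input, and repeat. The function name generate, the prefix handling, and the word2id helper in the usage comment are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def generate(model, prefix_ids, max_len=20):
    """Generate word indices by repeatedly sampling from the model's output distribution."""
    model.eval()
    ids = list(prefix_ids)
    with torch.no_grad():
        # Run the prefix through the model to get the current hidden state.
        logits, h = model(torch.tensor([ids]))
        for _ in range(max_len):
            probs = F.softmax(logits[0, -1], dim=-1)                   # y-hat^(t): distribution over the vocab
            next_id = torch.multinomial(probs, num_samples=1).item()   # sample the next word
            ids.append(next_id)
            # The sampled word becomes the next step's input.
            logits, h = model(torch.tensor([[next_id]]), h)
    return ids

# e.g. generate(model, prefix_ids=[word2id[w] for w in "my favorite season is".split()])
# (word2id is a hypothetical word-to-index mapping)
```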
Generating text with an RNN Language Model: let's have some fun!
• You can train an RNN-LM on any kind of text, then generate text in that style.
• RNN-LM trained on Obama speeches: Source: https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0
• RNN-LM trained on Harry Potter: Source: https://medium.com/deep-writing/harry-potter-written-by-artificial-intelligence-8a9431803da6
• RNN-LM trained on recipes: Source: https://gist.github.com/nylki/1efbaa36635956d35bcc
• RNN-LM trained on paint color names: Source: http://aiweirdness.com/post/160776374467/new-paint-colors-invented-by-neural-network
• The paint-color model is an example of a character-level RNN-LM: it predicts what character comes next.
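For a character-level RNN-LM like the paint-color model, the only change is what counts as a token: the "vocabulary" is the set of characters, and the model predicts the next character. A small sketch of that setup, where paint_colors.txt is a stand-in path for whatever training text is used:

```python
import torch

# Character-level vocabulary: the tokens are characters rather than words.
text = open("paint_colors.txt").read()        # stand-in path for the training corpus
chars = sorted(set(text))
char2id = {c: i for i, c in enumerate(chars)}
id2char = {i: c for c, i in char2id.items()}

# The same RNNLanguageModel, training step, and sampling loop apply unchanged,
# just with vocab_size=len(chars) and sequences of character indices:
model = RNNLanguageModel(vocab_size=len(chars))
char_ids = torch.tensor([[char2id[c] for c in text[:100]]])   # first 100 characters as one training sequence
```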