DEEP LEARNING
M. Lukac

WHY NOW?
GPUs!
Good data, good annotated data (ImageNet).
Some great, simple ideas.
Most of the techniques are 20-30 years old.
99% of the computation is matrix multiplication.

NEURAL NETWORKS
Beginnings in the 1950s and 60s.
Biologically inspired by the brain and its neurons.
A boom every ten years...
State of the art for multimedia data processing!
Problems?

NEURON
Perceptron: the output is y = f(Wx + b), where f is the activation function.
Input x = [x0 = 1.0, x1, x2, ..., xn]
Weights W = [w0, w1, w2, ..., wn]

ACTIVATION FUNCTION
Step function, tanh, sigmoid.

SIMPLE IMPLEMENTATION
(a NumPy sketch of a single neuron appears further below)

NEURAL NETWORK: SIMPLE IMPLEMENTATION

PLAYGROUND.TENSORFLOW.ORG
Overfitting?

HOW TO LEARN WEIGHTS? #VERY SIMPLIFIED
Weights are learned with the backpropagation algorithm (gradient-based learning).
Update the weights (alpha is the learning rate, e.g. 0.01):
    w_i ← w_i − alpha · dL/dw_i
Objective function for binary classification [0, 1] is the binary cross-entropy:
    L = −[ y · log(ŷ) + (1 − y) · log(1 − ŷ) ], where y is the label and ŷ = f(Wx + b) the prediction.
Gradient for the last layer (sigmoid output):
    dL/dw_i = (ŷ − y) · x_i
You need to propagate the error back through the network.
"The gradient points in the direction of the greatest rate of increase of the function; its components are the partial derivatives."

TRAINING
Monitor the objective function: it should decrease over time.
Play with the learning rate, alpha = [0.1, 0.001, 0.05, ...].
Train with mini-batches of samples.
Normalize your data to <0, 1>, ...

SOFTMAX LAYER
A softmax layer exponentiates the activations and divides each of them by the sum of all the exponentiated activations, thereby forcing the outputs of the layer to take the form of a probability distribution (non-negative, summing to 1).

OVERFITTING

DROPOUT LAYER
A regularization technique.
Active only during training.
With some probability, set the output of a unit to zero.

GRADIENT PROBLEM
Vanishing gradient: the gradients get smaller in every next layer when backpropagating the error.
Exploding gradient: the gradients get larger in every next layer when backpropagating the error.
Result: unable to learn deeper models (the lower layers).
Why? The weights and activation functions squeeze the gradients.
[Understanding the difficulty of training deep feedforward neural networks, X. Glorot, 2010]

WHY NOW? SOLVING THE GRADIENT PROBLEM #2
Rectified Linear Unit (ReLU) as the activation function: f(x) = max(0, x).
Intelligent initialization of the weights at the beginning of training.
This doesn't solve the problem, it only mitigates it.

HOW TO INITIALIZE WEIGHTS?
1. Random uniform from [-e, e]
2. Gaussian distribution
3. Xavier initialization
4. Pretraining with RBM models

AUTOENCODER
Non-linear dimensionality reduction (compression).
An encoder part and a decoder part.
After training, throw away the decoder part.
It can work better than PCA.
Training often ends in a local optimum...
[Reducing the Dimensionality of Data with Neural Networks, Science, 2006, Hinton]

AUTOENCODER PRETRAINED BY RBM
https://github.com/Cospel/rbm-ae-tf

AUTOENCODER VS PCA
(figure: reconstructions of the original input by the autoencoder and by PCA)

CONVOLUTIONAL NN
A stack of convolution, pooling, ReLU and fully connected layers.
State of the art in computer vision.
Convolutional layer: weight sharing, local connectivity.
It is impractical to connect neurons to all neurons in the previous volume.

INPUT, CONV, POOLING, RELU LAYERS

LEARNED CNN FEATURES

POOLING LAYER
Subsampling the image.
Smaller outputs = faster learning.

MANY CNN ARCHITECTURES
1. AlexNet
2. VGG
3. ResNet
4. SqueezeNet
5. GoogLeNet
6. ...
[ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky, G. Hinton, 2012]

TRANSFER LEARNING
1. Use existing weights or the entire NN and fine-tune it on new data from a similar domain.
2. Use CNN descriptors as input for algorithms such as KNN, SVM, ...
Many pretrained models (weights) are available to download on GitHub:
VGG with face descriptors
Models for places
...
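Below are a few code sketches for the slides above; each one states its assumptions. First, a minimal NumPy sketch of the single neuron from the NEURON slide; the input values and weights are made up.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, W, b):
    # y = f(Wx + b), with f = sigmoid here
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.2, 3.0])    # inputs x1..xn (made-up values)
W = np.array([0.1, -0.4, 0.25])   # weights w1..wn (made-up values)
b = 1.0                           # bias term (w0 * x0 with x0 = 1.0)
print(neuron(x, W, b))
```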
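A sketch of the update rule from HOW TO LEARN WEIGHTS?: one sigmoid output unit trained with binary cross-entropy and plain gradient descent. The sample, label and learning rate are illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p):
    # Binary cross-entropy for a label y in {0, 1} and a prediction p in (0, 1)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

alpha = 0.01                       # learning rate
x = np.array([0.2, 0.7, -0.5])     # one training sample (illustrative)
y = 1.0                            # its binary label
w = np.zeros(3)                    # weights
b = 0.0                            # bias

for step in range(100):
    p = sigmoid(np.dot(w, x) + b)  # forward pass
    grad_w = (p - y) * x           # dL/dw for a sigmoid unit with BCE loss
    grad_b = (p - y)               # dL/db
    w -= alpha * grad_w            # gradient-descent update
    b -= alpha * grad_b

print(bce(y, sigmoid(np.dot(w, x) + b)))  # the loss should have decreased
```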
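The computation from the SOFTMAX LAYER slide as a small NumPy sketch; the logits are arbitrary.

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw activations of the last layer
probs = softmax(logits)
print(probs, probs.sum())            # non-negative values that sum to 1
```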
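A sketch of the DROPOUT LAYER, written as the common "inverted dropout" variant that rescales the surviving units at training time; the drop probability is an assumption.

```python
import numpy as np

def dropout(a, p_drop=0.5, training=True):
    # Inverted dropout: at training time, zero each unit with probability p_drop
    # and rescale the survivors so the expected activation stays the same.
    if not training:
        return a                             # identity at test time
    mask = np.random.rand(*a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)

activations = np.array([0.3, 1.2, 0.7, 2.1])
print(dropout(activations))                  # roughly half the units are zeroed
```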
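ReLU and Xavier initialization from the gradient-problem slides, sketched in NumPy; the layer sizes are illustrative.

```python
import numpy as np

def relu(x):
    # ReLU: f(x) = max(0, x); its gradient is 1 for positive inputs,
    # so it squeezes gradients less than sigmoid or tanh
    return np.maximum(0.0, x)

def xavier_uniform(n_in, n_out):
    # Xavier/Glorot initialization: uniform in [-e, e] with e = sqrt(6 / (n_in + n_out)),
    # chosen to keep the activation/gradient variance roughly constant across layers
    e = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(-e, e, size=(n_in, n_out))

W1 = xavier_uniform(784, 256)          # e.g. first hidden layer of an MNIST-sized net
h1 = relu(np.random.rand(784) @ W1)    # forward pass through that layer
```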
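A minimal Keras sketch of a plain autoencoder (without the RBM pretraining used in the linked repo); the 784-dimensional input and the layer sizes are assumptions.

```python
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
encoded = Dense(128, activation='relu')(inputs)
code = Dense(32, activation='relu')(encoded)        # low-dimensional code (compression)
decoded = Dense(128, activation='relu')(code)
outputs = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(inputs, outputs)                # encoder + decoder, trained to reconstruct the input
encoder = Model(inputs, code)                       # the part you keep after training
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)  # x_train normalized to <0, 1>
```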
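A sketch of option 2 from the TRANSFER LEARNING slide: a pretrained CNN used as a fixed descriptor extractor whose features feed a classical classifier such as an SVM. VGG16 with ImageNet weights is just one commonly available pretrained model, and `train_paths`/`labels` are hypothetical placeholders for your own data.

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.preprocessing import image

# Pretrained convolutional base without the classification head;
# global average pooling gives one 512-d descriptor per image.
model = VGG16(weights='imagenet', include_top=False, pooling='avg')

def descriptor(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

# 'train_paths' and 'labels' are hypothetical:
# feats = np.stack([descriptor(p) for p in train_paths])
# from sklearn.svm import SVC
# clf = SVC().fit(feats, labels)
```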
RECURRENT NN
Good for time-series data, NLP, video sequences, ...
Trained with backpropagation through time ...
It has an internal hidden state (a memory for the sequence).

RECURRENT NN: SIMPLE IMPLEMENTATION

FRAMEWORKS

KERAS FEEDFORWARD NET

KERAS RECURRENT NET

KERAS CONVOLUTIONAL NETWORK
(minimal Keras sketches of these three networks follow below)

THANK YOU...
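A minimal sketch of the KERAS FEEDFORWARD NET slide: a fully connected classifier with dropout. The 784-dimensional input and 10 classes are assumptions (an MNIST-sized problem).

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(256, activation='relu', input_shape=(784,)),
    Dropout(0.5),                       # regularization, active only during training
    Dense(10, activation='softmax'),    # class probabilities that sum to 1
])
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=32, epochs=10)  # mini-batch training
```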
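A minimal sketch of the KERAS RECURRENT NET slide for binary sequence classification; the vocabulary size, sequence length and LSTM width are assumptions.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=64, input_length=100),  # token ids -> vectors
    LSTM(64),                          # hidden state carries memory across the sequence
    Dense(1, activation='sigmoid'),    # one binary label per sequence
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```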
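A minimal sketch of the KERAS CONVOLUTIONAL NETWORK slide, following the Conv - ReLU - Pool - Fully Connected stack described earlier; the 28x28x1 input shape is an assumption (MNIST-sized images).

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # weight sharing, local connectivity
    MaxPooling2D((2, 2)),                                            # subsampling: smaller outputs, faster learning
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```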