(Primitive) Mathematical Model of Neuron
Formal neuron
► $x_1, \ldots, x_n$ real inputs
► $x_0$ special input, always 1
► $w_0, w_1, \ldots, w_n$ real weights
► $\xi = w_0 + \sum_{i=1}^{n} w_i x_i$ inner potential; in general, other potentials are considered (e.g. Gaussian), more on this in PV021.
► $y$ output defined by $y = \sigma(\xi)$, where $\sigma$ is an activation function.
Sigmoid Functions
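As a minimal sketch of the formal neuron, the following computes the inner potential as the bias weight plus the weighted sum of the inputs, and then applies an activation function. The logistic sigmoid 1 / (1 + e^(-xi)) is used here as one common sigmoid choice; that choice, the function names, and the example weights are assumptions made for illustration, not taken from the slides.

```python
import math

def logistic_sigmoid(xi):
    """Logistic sigmoid 1 / (1 + e^(-xi)); assumed here as the activation."""
    return 1.0 / (1.0 + math.exp(-xi))

def formal_neuron(weights, inputs, activation=logistic_sigmoid):
    """Return y = activation(xi), where xi = w_0 + sum_i w_i * x_i.

    weights: [w_0, w_1, ..., w_n]
    inputs:  [x_1, ..., x_n]  (the special input x_0 = 1 is added here)
    """
    x = [1.0] + list(inputs)
    if len(x) != len(weights):
        raise ValueError("expected one weight per input plus the bias w_0")
    xi = sum(w * v for w, v in zip(weights, x))  # inner potential
    return activation(xi)

# Hypothetical weights and inputs: xi = -1 + 2*0.5 + 2*0.0 = 0
y = formal_neuron([-1.0, 2.0, 2.0], [0.5, 0.0])
print(y)
```

Passing a different `activation` (for example the identity) turns the same code into a plain linear unit. The sketch is not hardened against overflow for very negative potentials.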
Multilayer Perceptron (MLP)
[Figure: a multilayer perceptron with input, hidden, and output layers (input at the bottom, output at the top)]
► Neurons are organized in layers (input layer, output layer, possibly several hidden layers)
► Layers are numbered from 0; the input layer is the 0-th
► Neurons in the $\ell$-th layer are connected with all neurons in the $(\ell+1)$-th layer
Intuition: The network computes a function as follows: Assign input values to the input neurons and 0 to the rest. Proceed upwards through the layers, one layer per step. In the $\ell$-th step, consider the output values of neurons in the $(\ell-1)$-th layer as inputs to neurons of the $\ell$-th layer, and compute the output values of neurons in the $\ell$-th layer.
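The layer-by-layer computation described above can be sketched as follows. The data layout (one list of weight rows per layer, each row starting with the bias weight), the function names, and the example weights are assumptions for illustration only; the logistic sigmoid is assumed as the default activation.

```python
import math

def sigmoid(xi):
    """Logistic sigmoid, assumed as the default activation."""
    return 1.0 / (1.0 + math.exp(-xi))

def mlp_forward(layers, inputs, activation=sigmoid):
    """Compute the network output one layer per step.

    layers[l-1] describes layer l (l = 1, 2, ...): one row
    [w_0, w_1, ..., w_k] per neuron, where k is the size of layer l-1.
    """
    values = list(inputs)  # outputs of layer 0 (the input layer)
    for layer in layers:   # step l: layer l reads the outputs of layer l-1
        for row in layer:
            if len(row) != len(values) + 1:
                raise ValueError("each neuron needs a bias plus one weight per input")
        values = [
            activation(row[0] + sum(w * v for w, v in zip(row[1:], values)))
            for row in layer
        ]
    return values

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output neuron
example_layers = [
    [[0.0, 1.0, 1.0], [1.0, 1.0, -1.0]],  # hidden layer
    [[0.5, 2.0, 3.0]],                    # output layer
]
identity_out = mlp_forward(example_layers, [1.0, 2.0], activation=lambda xi: xi)
sigmoid_out = mlp_forward(example_layers, [1.0, 2.0])
print(identity_out, sigmoid_out)
```

With the identity activation the hidden values are 3 and 0, so the output is 0.5 + 2*3 + 3*0 = 6.5, which makes the per-layer bookkeeping easy to check by hand.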
Classical Example - ALVINN
[Figure: ALVINN architecture. A 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units ranging from Sharp Left through Straight Ahead to Sharp Right.]
► One of the first autonomous car driving systems (in the 90s)
► ALVINN drives a car
► The net has $30 \times 32 = 960$ input neurons (the input space is $\mathbb{R}^{960}$).
► The value of each input captures the shade of gray of the corresponding pixel.
► Output neurons indicate where to turn (the steering direction is the center of gravity of the output activations).
Source: http://jmvidal.cse.sc.edu/talks/ann/alvin.html
$P_1: -1 + 2x_1 + 2x_2 = 0$
$P_2: 3 - 2x_1 - 2x_2 = 0$
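As a hedged illustration of how the two lines P1 and P2 can be combined, the sketch below treats each line as the boundary of one hidden neuron and adds an output neuron on top. The unit step activation and the output weights (-2, 1, 1) are assumptions made for this example; the source gives only the two line equations.

```python
def step(xi):
    """Unit step activation (assumed): 1 if xi >= 0, else 0."""
    return 1 if xi >= 0 else 0

def two_line_network(x1, x2):
    h1 = step(-1 + 2 * x1 + 2 * x2)  # half-space bounded by P1
    h2 = step(3 - 2 * x1 - 2 * x2)   # half-space bounded by P2
    # Assumed output weights: fires only when both hidden neurons fire
    return step(-2 + h1 + h2)

# On the four binary inputs the strip between P1 and P2 gives XOR
table = {(a, b): two_line_network(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

Points with 0.5 <= x1 + x2 <= 1.5 lie on the positive side of both lines, so only (0, 1) and (1, 0) among the binary inputs are classified as 1.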
The output neuron performs an intersection of half-spaces.
ImageNet classification with deep convolutional neural networks, by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton (2012).
Trained on two GPUs (NVIDIA GeForce GTX 580)
Results:
► Accuracy 84.7% in top-5 (second best alg. at the time: 73.8%)
► 63.3% in "perfect" classification (top-1)
Then I organized a labeling party of intense labeling effort only among the (expert labelers) in our lab. Then I developed a modified interface that used GoogLeNet predictions to prune the number of categories from 1000 to only about 100. It was still too hard - people kept missing categories and getting up to ranges of 13-15% error rates. In the end I realized that to get anywhere competitively close to GoogLeNet, it was most efficient if I sat down and went through the painfully long training process and the subsequent careful annotation process myself... The labeling happened at a rate of about 1 per minute, but this decreased over time... Some images are easily recognized, while some images (such as those of fine-grained breeds of dogs, birds, or monkeys) can require multiple minutes of concentrated effort. I became very good at identifying breeds of dogs... Based on the sample of images I worked on, the GoogLeNet classification error turned out to be 6.8%...
ILSVRC 2015
► Microsoft network ResNet: 152 layers, complex architecture
► Trained on 8 GPUs
► 96.43% accuracy in top-5
Deeper Insight into the Logistic Sigmoid
So if we use the logistic sigmoid as the activation function and turn the neuron into a classifier as follows:
classify a given input x as 1 iff y > 1/2,
then the neuron basically works as the Bayes classifier!
This is the basis of logistic regression.
Given training data, we may compute the weights w that maximize the likelihood of the training data (w.r.t. the probabilities returned by the neuron). Maximizing this likelihood is equivalent to minimizing the cross-entropy between the training labels and the neuron's outputs. The least-squares counterpart holds for the corresponding linear function (that is, the same neuron but with identity as the activation function): there, under Gaussian noise, the w maximizing the likelihood coincides with the minimum of least squares.
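To make the likelihood maximization concrete, here is a minimal sketch that fits the weights of a single logistic neuron by gradient ascent on the log-likelihood. Gradient ascent, the learning rate, the step count, the function names, and the tiny data set are all assumptions for illustration; the slides do not prescribe an optimization method.

```python
import math

def sigmoid(xi):
    """Logistic sigmoid, written to avoid overflow for large |xi|."""
    if xi >= 0:
        return 1.0 / (1.0 + math.exp(-xi))
    e = math.exp(xi)
    return e / (1.0 + e)

def neuron(w, x):
    """Probability of class 1 returned by the neuron with weights w."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def log_likelihood(w, data):
    """Sum over examples of log P(label | input) under the neuron."""
    total = 0.0
    for x, c in data:
        y = min(max(neuron(w, x), 1e-12), 1.0 - 1e-12)  # guard log(0)
        total += c * math.log(y) + (1 - c) * math.log(1.0 - y)
    return total

def fit(data, n_inputs, rate=0.1, steps=500):
    """Gradient ascent on the average log-likelihood (assumed method)."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(steps):
        grad = [0.0] * (n_inputs + 1)
        for x, c in data:
            err = c - neuron(w, x)  # derivative w.r.t. the inner potential
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi + rate * g / len(data) for wi, g in zip(w, grad)]
    return w

# Hypothetical one-dimensional data with overlapping classes
data = [([0.0], 0), ([1.0], 0), ([2.0], 1), ([3.0], 0), ([4.0], 1), ([5.0], 1)]
w_start = [0.0, 0.0]
w_fit = fit(data, n_inputs=1)
print(log_likelihood(w_start, data), log_likelihood(w_fit, data))

def classify(w, x):
    """Classify x as 1 iff the neuron output exceeds 1/2."""
    return 1 if neuron(w, x) > 0.5 else 0
```

The classes overlap on purpose: with perfectly separable data the likelihood has no finite maximizer and the weights would keep growing.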