# Tutorial 12 – Neural Networks

In this tutorial, we will learn how to use neural networks to solve classification and regression problems.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

sns.set()  # make plots nicer

A neural network is nothing more than a parametrized function $f(x, \Theta)$. The parameters of the function are called *weights* and are denoted $\Theta$. Initially, we do not know the optimal weights, so we have to find them. We refer to the search for the optimal weights as *neural network training*; it corresponds to the model fitting that we already know.
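
To make the idea of a parametrized function concrete, here is a minimal NumPy sketch of $f(x, \Theta)$ for a tiny network with one hidden layer. The layer sizes (3-4-2), the ReLU activation, and the random weights are assumptions chosen only for illustration; training would replace the random weights with fitted ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Theta: weights and biases of a 3-4-2 network (sizes chosen for illustration)
theta = {
    "W1": rng.normal(size=(3, 4)),
    "b1": np.zeros(4),
    "W2": rng.normal(size=(4, 2)),
    "b2": np.zeros(2),
}


def f(x, theta):
    """Evaluate the network on a batch x of shape (n_samples, 3)."""
    hidden = np.maximum(0, x @ theta["W1"] + theta["b1"])  # ReLU hidden layer
    return hidden @ theta["W2"] + theta["b2"]  # linear output layer


x = rng.normal(size=(5, 3))
print(f(x, theta).shape)  # one 2-dimensional output per input sample
```

Changing any entry of `theta` changes the function that `f` computes, which is exactly what training exploits.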

A neural network can represent a wide range of nonlinear functions. The network topology determines the family of functions that the network can represent. In this course, we will work mainly with [Feed Forward Neural Networks](https://en.wikipedia.org/wiki/Feedforward_neural_network). This type is used to solve supervised classification or regression tasks. There are many other topologies and neural network schemes in the current research literature. The most important ones are [Convolutional Neural Networks (CNN)](https://en.wikipedia.org/wiki/Convolutional_neural_network) for image or signal processing, [Recurrent Neural Networks (RNN)](https://en.wikipedia.org/wiki/Recurrent_neural_network) for processing time series, and [Generative Adversarial Networks (GANs)](https://en.wikipedia.org/wiki/Generative_adversarial_network) for generating artificial data.

## Multi Layer Perceptron

The simplest network is called a Perceptron. We can see it as a type of linear classifier. If we want to represent more complex functions (in general, non-linear functions), we can stack perceptrons together to create a Multi-Layer Perceptron (MLP). The MLP is also what people often mean by Neural Networks (NN). In scikit-learn, this model is implemented as the class [MLPClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html). The usage of this model is very similar to that of any other classification model from scikit-learn. It has many parameters, which govern both the network topology and the training process. As far as the network topology is concerned, the most interesting parameters for us are the number of hidden layers and their sizes.

![multi layer perceptron schema](https://www.fi.muni.cz/ib031/tutorial12/assets/FNN.png)

In the image above, we can see an MLP with three hidden layers. The first two hidden layers have six neurons each. The third hidden layer has eight neurons. The number of neurons in the input layer corresponds to the number of features/attributes in the dataset. The size of the output layer corresponds to the number of classes we want to predict.
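
The relation between topology and layer sizes can be checked on a fitted model. The sketch below is only an illustration: the Iris dataset, the two hidden layers of 6 and 8 neurons, and the `max_iter` value are assumptions, not part of the exercises. It relies on the `hidden_layer_sizes` parameter and the `coefs_` attribute of `MLPClassifier`.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

# two hidden layers with 6 and 8 neurons (sizes chosen only for illustration)
clf = MLPClassifier(hidden_layer_sizes=(6, 8), max_iter=2000, random_state=0)
clf.fit(X, y)

# one weight matrix per pair of consecutive layers
shapes = [w.shape for w in clf.coefs_]
print(shapes)  # [(4, 6), (6, 8), (8, 3)]
```

The first dimension of the first matrix matches the number of features, and the last dimension of the last matrix matches the number of classes.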

In [None]:
from sklearn import datasets
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

### MLP Classifier
Let's try classification with an MLP on the MNIST dataset. The dataset contains images of handwritten digits, and we have already used it in tutorial 04. Scikit-learn includes a simplified version of this dataset with 1797 samples. Each sample is 8x8 pixels in size.

In [None]:
def show_digits(digits, labels, num_col=0, img_dim=8):
    """ display digits with labels """

    sns.set_style("whitegrid", {"axes.grid": False})
    digits = np.reshape(digits, (digits.shape[0], img_dim, img_dim))
    num_row = 1
    if num_col == 0:
        num_col = 5
    # squeeze=False keeps `axes` two-dimensional even when num_col == 1
    fig, axes = plt.subplots(
        num_row, num_col, figsize=(1.5 * num_col, 2 * num_row), squeeze=False
    )
    for i in range(num_col):
        ax = axes[0, i]
        ax.imshow(digits[i], cmap="gray")
        ax.set_title("Label: {}".format(labels[i]))
        ax.set(yticklabels=[])
        ax.set(xticklabels=[])
    plt.tight_layout()
    plt.show()

In [None]:
mnist = datasets.load_digits()
mnist_X, mnist_y = mnist.data, mnist.target

digits = np.reshape(mnist_X, (1797, 8, 8))

mnist_train_X, mnist_test_X, mnist_train_y, mnist_test_y = train_test_split(
    mnist_X, mnist_y, test_size=0.2, random_state=1
)


show_digits(mnist_train_X, mnist_train_y, num_col=8)

<div class="alert alert-block alert-warning"><h5><b>Exercise 1</b></h5></div>

Create an [MLP classifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html) with ***three hidden layers*** with $256$, $128$ and $64$ neurons and train it on the MNIST dataset. Compare the result with an [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) model. Evaluate both models on the test set using the ***F1 measure***. Use ***42 as the random state*** for the MLP model initialization. Do not forget to scale the values using [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).

HINT:
- MLP training can take some time. To see the training progress, you can set `verbose` to `True`.
- The loss measures how far the network's predictions are from the targets. The training process decreases the loss to improve the network performance.

The expected results:  
MLP weighted average F1 score: $0.9917$  
SVM weighted average F1 score: $0.9833$


In [None]:
# TODO: your code goes here...
# mlp = ...
# mlp_pred_y = ...
# svm_pred_y = ...

We were able to train a neural network that recognizes handwritten digits with high accuracy. To increase the accuracy further, we can tune the hyper-parameters and increase the number of training iterations (train a bit longer). You can try that later. For now, the performance of our model is comparable with that of the SVM. The SVM belongs to the 30 best performing methods on the MNIST dataset.

We can also check our results using a confusion matrix or by visually inspecting some of the incorrectly classified digits.

In [None]:
from sklearn import metrics

sns.set_style("whitegrid", {"axes.grid": False})
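# ConfusionMatrixDisplay.from_estimator replaces plot_confusion_matrix,
# which was removed in scikit-learn 1.2
disp = metrics.ConfusionMatrixDisplay.from_estimator(mlp, mnist_test_X, mnist_test_y)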
disp.figure_.suptitle("Confusion Matrix MLP")
digits = mnist_test_X[mlp_pred_y != mnist_test_y]
labels = mlp_pred_y[mlp_pred_y != mnist_test_y]

show_digits(digits, labels, num_col=len(labels))

It may look like our network significantly outperformed the SVM. In the second exercise, we show that this is not completely true. The neural network is initialized with random weights, and when it is trained only for a short time, the performance of the trained network can be sensitive to the initial weights.
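
The sensitivity to the initial weights can be illustrated by training the same network with different random seeds. The following is a minimal sketch, not a solution to any exercise: the small hidden layer, the short training (`max_iter=20`), and the three seeds are assumptions chosen to keep the run fast, and scikit-learn will likely emit convergence warnings because the training is stopped early.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

seed_scores = []
for seed in range(3):
    # small network and few iterations: assumptions chosen to keep the demo fast
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=20, random_state=seed),
    )
    model.fit(X_train, y_train)
    seed_scores.append(model.score(X_test, y_test))  # test accuracy for this seed

print(seed_scores)  # accuracies usually differ slightly between seeds
```

A single score is therefore only one sample from a distribution of possible outcomes.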

<div class="alert alert-block alert-warning"><h5><b>Exercise 2</b></h5></div>

Approximate the real performance of the network from exercise 1 by ***10-fold cross-validation***. Plot the cross-validation scores as a ***histogram***. You can use the parameter `n_jobs` to run the cross-validation in parallel and make the computation faster. Do not fix the random state.

Expected results: $0.975 \pm 0.02$

In [None]:
# TODO: your code goes here...

### MLP Regressor

As the second example, we will use an MLP for regression (the `MLPRegressor` class). The only difference from `MLPClassifier` is that the output is a real value instead of a class label. We will try the regression on the California Housing dataset.

In [None]:
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()

housing_X = pd.DataFrame(housing.data, columns=housing.feature_names)
housing_y = housing.target


housing_train_X, housing_test_X, housing_train_y, housing_test_y = train_test_split(
    housing_X, housing_y, test_size=0.1, random_state=1
)

print(housing.DESCR)

<div class="alert alert-block alert-warning"><h5><b>Exercise 3</b></h5></div>

Use [MLPRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html) to predict the ***median house value*** from the California Housing dataset. Compare the result with a [LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) model. For both models, use [PowerTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html) as a preprocessing step that will make the data more Gaussian. Store the predictions in the variables `mlpr_pred_y` and `lr_pred_y`. Measure the models' performance using ***RMSE***.

TIPS:
- set the `early_stopping` parameter of MLPRegressor to `True` to shorten the training
- use the network topology (256, 128, 64) or try to find a better one
- RMSE measures performance in the opposite direction to the F1 score used in the previous exercises: the smaller the value, the better


Expected results:  
MLPRegression RMSE: around $0.5$  
LinearRegression RMSE: $0.7135$  
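
As a reminder, RMSE is the square root of the mean squared difference between true and predicted values. The sketch below computes it with plain NumPy; the helper name `rmse` and the toy values are hypothetical and are not taken from the housing data.

```python
import numpy as np


def rmse(true_y, pred_y):
    """Root mean squared error: square root of the mean squared residual."""
    true_y = np.asarray(true_y, dtype=float)
    pred_y = np.asarray(pred_y, dtype=float)
    return np.sqrt(np.mean((true_y - pred_y) ** 2))


# toy values, not taken from the housing data
toy_true = [3.0, 1.0, 2.0]
toy_pred = [2.0, 1.0, 4.0]
print(rmse(toy_true, toy_pred))  # about 1.29
```

The same quantity can be obtained by taking the square root of `mean_squared_error` from `sklearn.metrics`.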


In [None]:
# TODO: your code goes here...

We can visualize the results using a scatter plot of expected vs. predicted values. In the ideal case, all the points would lie on the green diagonal line.

In [None]:
plt.scatter(housing_test_y, lr_pred_y, marker=".", color="red")
plt.scatter(housing_test_y, mlpr_pred_y, marker=".", color="blue")
plt.legend(["LinearRegression", "MLPRegression"], loc="upper left")

plt.xlabel("true values")
plt.ylabel("predicted values")
plt.plot([0, 5], [0, 5], "g-", linewidth=4, markersize=12)
plt.show()

Another way to visually compare the performance of two models is to plot histograms of their errors. In this case, we can see that the red model (LinearRegression) has fewer small errors and a higher variability of error values. We can deduce that the blue model (MLPRegression) performs better.

In [None]:
b = plt.hist(housing_test_y - lr_pred_y, color="red", histtype="step", bins=50)
c = plt.hist(housing_test_y - mlpr_pred_y, color="blue", histtype="step", bins=50)
plt.legend(["LinearRegression", "MLPRegression"], loc="upper left")

plt.show()

## Deep Learning Libraries

The scikit-learn package offers only limited support for deep learning. For more control over the architecture of neural networks, we need to use more advanced libraries.

The first library is [TensorFlow](https://www.tensorflow.org/). It is a symbolic math library that supports complex graph computations and evaluates the operations of complex neural networks efficiently.
The second library is [Keras](https://keras.io/). It used to be a separate library, but it has been part of TensorFlow since version 2.0. It is a deep learning API that simplifies the process of building common neural networks. Both libraries are open-source. Make sure you have TensorFlow with the Keras module installed before continuing. The library is available on PyPI and can be installed with pip.

You can check that TensorFlow is correctly installed by running the following imports.

In [None]:
from tensorflow import keras
import tensorflow as tf

Now we can build a neural network model using Keras. To practice the syntax, we will build the same model as in exercise 1. It will have $256$, $128$, and $64$ neurons in its hidden layers. The number of neurons in the input and output layers will correspond to the number of features/attributes and to the number of class labels, respectively. For training, we use the full version of the MNIST dataset, where each image has a size of $28 \times 28$ pixels.

Let's load the datasets as numpy arrays and one-hot encode class labels (digits represented in images).

In [None]:
import requests
import io

from tensorflow.keras.utils import to_categorical


def load_numpy_array_from_url(url):
    response = requests.get(url)
    response.raise_for_status()
    return np.load(io.BytesIO(response.content))


train_X = load_numpy_array_from_url(
    "https://www.fi.muni.cz/ib031/datasets/train-images-idx3-ubyte.npy"
).reshape((60000, 28 * 28))
train_y = load_numpy_array_from_url(
    "https://www.fi.muni.cz/ib031/datasets/train-labels-idx1-ubyte.npy"
).reshape((60000,))
test_X = load_numpy_array_from_url(
    "https://www.fi.muni.cz/ib031/datasets/t10k-images-idx3-ubyte.npy"
).reshape((10000, 28 * 28))
test_y = load_numpy_array_from_url(
    "https://www.fi.muni.cz/ib031/datasets/t10k-labels-idx1-ubyte.npy"
).reshape((10000,))

train_y = to_categorical(train_y)
test_y = to_categorical(test_y)

print(train_y.shape)
print(test_y.shape)
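
One-hot encoding turns each integer label into a vector with a single 1 at the position of the class. The following NumPy sketch shows the idea; the helper `one_hot` and the toy labels are hypothetical, and the function only approximates what `to_categorical` does for integer labels.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a (n_samples, num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


toy_labels = [7, 2, 1]  # toy digit labels, not read from the dataset
toy_encoded = one_hot(toy_labels, num_classes=10)
print(toy_encoded.shape)
print(np.argmax(toy_encoded, axis=1))  # decoding recovers the original labels
```

With ten digit classes, every encoded label has ten columns, which is why the second dimension of `train_y` and `test_y` is 10.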

MLP models in Keras are created using the `Sequential` class, and the basic layers are created using the `Dense` class.

<div class="alert alert-block alert-warning"><h5><b>Exercise 4</b></h5></div>

Replace all placeholder ***None*** values with integers to properly define a neural network model with the same architecture as the MLP in exercise 1.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()

model.add(Dense(units=None, activation='relu', input_dim=None))  # first hidden layer
model.add(Dense(units=None, activation='relu'))  # second hidden layer
model.add(Dense(units=None, activation='relu'))  # third hidden layer
model.add(Dense(units=None, activation='softmax'))  # output layer

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.summary()

Now we can train the model. Notice that the syntax is very similar to that of the scikit-learn library.

In [None]:
model.fit(train_X, train_y, epochs=5, batch_size=32)

In [None]:
pred_y = model.predict(test_X)
pred_y[0]

The variable `pred_y` now stores the predictions of the network in the form of probabilities that each image belongs to class $0$ to $9$.
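
These probabilities come from the `softmax` activation of the output layer, which rescales the raw scores of the ten output neurons so that they are positive and sum to 1. A minimal NumPy sketch of softmax follows; the function and the toy scores are written here for illustration and are not Keras code.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


toy_logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])  # made-up scores
toy_probs = softmax(toy_logits)
print(toy_probs.sum(axis=1))  # each row sums to 1
```

Equal scores, as in the second toy row, give equal probabilities for all classes.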

<div class="alert alert-block alert-warning"><h5><b>Exercise 5</b></h5></div>

Create a new variable ***pred_y_labels*** with the actual predicted class labels. The predicted class label for each image is the one with the highest probability. For example, for the first image the class label should be 7.

In [None]:
# TODO: pred_y_labels = your code goes here...
assert pred_y_labels[0] == 7

We can now evaluate the predictions using a confusion matrix or any other metric used for classification problems.

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

true_y_labels = np.argmax(test_y, axis=1)
print("\nConfusion Matrix")
print(confusion_matrix(true_y_labels, pred_y_labels))
print("\nClassification Report")
target_names = [str(i) for i in range(10)]
print(classification_report(true_y_labels, pred_y_labels, target_names=target_names))

We can also plot images with incorrect labeling and print the probabilities of each label.

In [None]:
def show_prediction(pred_y, test_X, test_y, index=0):
    """ function to visualize MNIST sample and label probabilities"""

    labels = list(range(10))
    fig, axs = plt.subplots(1, 2, figsize=(9, 3), sharey=False)
    img = test_X[index]
    img = img.reshape((28, 28))

    axs[0].imshow(img, cmap="gray")
    axs[0].set(yticklabels=[])
    axs[0].set(xticklabels=[])
    orig_class = np.argmax(test_y[index])
    axs[0].set_title(f"original class: {orig_class}")

    axs[1].bar(labels, pred_y[index])
    axs[1].set_ylim([0, 1])
    axs[1].set_xticks(np.arange(len(labels)))
    axs[1].set_xticklabels(labels)
    pred_class = np.argmax(pred_y[index])
    axs[1].set_title(f"predicted class: {pred_class}")
    axs[1].grid()

In [None]:
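true_y_labels = np.argmax(test_y, axis=1)
# indices of the test images that the network classified incorrectly
incorrect = np.flatnonzero(pred_y_labels != true_y_labels)

# show at most the first ten misclassified images
for index in incorrect[:10]:
    show_prediction(pred_y, test_X, test_y=test_y, index=index)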

<div class="alert alert-block alert-danger"><h5><b>(Optional) Exercise 6</b></h5></div>

Build and train a convolutional neural network with the architecture from the figure below. Use the classes [Conv2D](https://keras.io/layers/convolutional/#conv2d), [MaxPooling2D](https://keras.io/layers/pooling/#maxpooling2d), [Flatten](https://keras.io/layers/core/#flatten) and [Dense](https://keras.io/layers/core/#dense). Do not forget to set the kernel sizes properly. The convolutional layers have $32$ and $64$ filters (feature maps). The dense layers have $3136$ and $128$ neurons. Compare the performance with your MLP network.

![CNN for MNIST classification schema](https://www.fi.muni.cz/ib031/tutorial12/assets/CNN.png)

In [None]:
# TODO: your code goes here...