This notebook is supplemental to Lecture 1 of the video series "Introduction to Neural Networks". These lectures are adapted from Michael Nielsen's free online book "Neural Networks and Deep Learning".
The video lecture can be accessed here.
from mnist_loader import load_data_wrapper
import numpy as np
import matplotlib.pyplot as plt
training_data, validation_data, test_data = load_data_wrapper()
len(training_data)
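In Nielsen's mnist_loader, load_data_wrapper splits MNIST into 50,000 training, 10,000 validation, and 10,000 test examples; the other two lists can be checked the same way:
print(len(validation_data), len(test_data))  # expected: 10000 10000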
img1 = training_data[0][0] # array of pixels
lb1 = training_data[0][1] # one-hot vector label
# print out the shape of img1, lb1
print(img1.shape)
print(lb1.shape)
# or equivalently, unpacking the 2-tuple (image, label)
# img1, lb1 = training_data[0]
Both the image and its label are rank 2 numpy arrays of shape (784,1) and (10,1), respectively. A label is a one-hot encoding of the digit.
print(lb1)
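Because the label is one-hot, the digit itself can be recovered as the index of the 1 entry:
np.argmax(lb1)  # the digit this image depicts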
The function plot_image below draws an MNIST image using the matplotlib library.
def plot_image(image):
    """Plot a single MNIST image.

    Argument image is a numpy.ndarray of shape (784,1).
    """
    image = image.reshape(28, 28)
    fig, axes = plt.subplots()
    axes.matshow(image, cmap=plt.cm.binary)
    plt.show()
plot_image(img1)
def plot_images(images):
    """Plot a list of MNIST images.

    Argument images is a list of (image, label) tuples.
    """
    fig, axes = plt.subplots(nrows=1, ncols=len(images))
    for j, ax in enumerate(axes):
        ax.matshow(images[j][0].reshape(28, 28), cmap=plt.cm.binary)
        ax.set_xticks([])
        ax.set_yticks([])
    plt.show()
plot_images(training_data[0:10])
def sigmoid(x):
    """Returns the output of the sigmoid or logistic function."""
    return 1/(1+np.exp(-x))
Given a vector $\vec{x}\in\mathbb{R}^n$, the sigmoid function $\sigma:\mathbb{R}\rightarrow\mathbb{R}$ can be extended to a vector-valued function $\sigma:\mathbb{R}^n\rightarrow\mathbb{R}^n$ by applying $\sigma$ elementwise. That is, if $$\vec{x}=\left[ \begin{array}{cccc} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array} \right]$$ then $$\sigma(\vec{x})=\left[ \begin{array}{cccc} \sigma(x_{1}) \\ \sigma(x_{2}) \\ \vdots \\ \sigma(x_{n}) \end{array} \right].$$
Similarly, $\sigma$ can be applied elementwise to an $m\times n$ matrix.
For example, if $$\vec{x}=\left[ \begin{array}{cccc} 1 \\ 2 \\ 3 \end{array} \right]$$ then $$\sigma(\vec{x})=\left[ \begin{array}{cccc} \sigma(1) \\ \sigma(2) \\ \sigma(3) \end{array} \right]\approx\left[ \begin{array}{cccc} 0.73 \\ 0.88 \\ 0.95 \end{array} \right]$$
x = np.array([[1],[2],[3]])
sigmoid(x)
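As noted above, sigmoid also applies elementwise to a matrix; np.exp broadcasts over every entry, so the same function works unchanged:
M = np.array([[1, 2], [3, 4]])
sigmoid(M)  # applied entrywise; output has the same shape as M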
Define $f_1(\vec{x})=W_1\vec{x}+\vec{b}_1$ and $f_2(\vec{x})=W_2\vec{x}+\vec{b}_2$ for some $W_1, W_2, \vec{b}_1, \text{and } \vec{b}_2.$
Consider the classifier or score function $f=\sigma\circ f_2\circ\sigma\circ f_1:\mathbb{R}^{784}\rightarrow\mathbb{R}^{10}$. This is a two-layer neural network. The score function takes a flattened MNIST image of shape (784,1) and outputs a vector of ten scores of shape (10,1), one per digit class. The class with the highest score is the label predicted by the classifier.
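For these compositions to be well-defined, the shapes must be compatible: $W_1\in\mathbb{R}^{n\times 784}$ and $\vec{b}_1\in\mathbb{R}^{n}$ for some hidden-layer size $n$, while $W_2\in\mathbb{R}^{10\times n}$ and $\vec{b}_2\in\mathbb{R}^{10}$.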
Training a neural network amounts to producing a set of parameters $W_1, W_2, \vec{b}_1, \text{and } \vec{b}_2$ whose score function $f(\vec{x}; W_1, W_2, \vec{b}_1, \vec{b}_2)$ can accurately classify unseen images. Here we skip the training and load a set of pre-trained parameters from disk.
with open("parameters.npy", mode="rb") as r:
    # the four arrays have different shapes, so they are stored as an object array,
    # which recent versions of NumPy refuse to load unless allow_pickle=True
    parameters = np.load(r, allow_pickle=True)
W1, B1, W2, B2 = parameters
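The exact hidden-layer size depends on how the network was trained, so a quick shape check of the loaded parameters is useful:
for name, p in zip(["W1", "B1", "W2", "B2"], [W1, B1, W2, B2]):
    print(name, p.shape)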
def f(x, W1, W2, B1, B2):
    """Return the output of the network if ``x`` is the input image and
    W1, W2, B1 and B2 are the learnable parameters or weights."""
    Z1 = np.dot(W1, x) + B1   # pre-activation of the hidden layer
    A1 = sigmoid(Z1)          # hidden-layer activation
    Z2 = np.dot(W2, A1) + B2  # pre-activation of the output layer
    A2 = sigmoid(Z2)          # output scores, shape (10,1)
    return A2
f(training_data[0][0],W1,W2,B1,B2)
np.argmax(f(training_data[10][0],W1,W2,B1,B2))
plot_image(training_data[10][0])
def predict(images, W1, W2, B1, B2):
    """Return the predictions for a list of images given the parameters.
    """
    predictions = []  # empty list
    for im in images:
        a = f(im[0], W1, W2, B1, B2)
        predictions.append(np.argmax(a))  # add prediction to predictions list
    return predictions
predict(training_data[0:10],W1,W2,B1,B2)
plot_images(training_data[0:10])
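To gauge how well the loaded parameters generalize, we can compute the classification accuracy on the test set. A minimal sketch, assuming (as in Nielsen's mnist_loader) that each element of test_data is an (image, label) tuple whose label is a plain integer rather than a one-hot vector:
correct = sum(int(np.argmax(f(x, W1, W2, B1, B2)) == y) for x, y in test_data)
print(correct / len(test_data))  # fraction of test images classified correctly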