A simple network to classify handwritten digits

Introduction:
  This is a program that uses stochastic gradient descent and the MNIST (Modified National Institute of Standards and Technology) training data.

Reference: "Neural Networks and Deep Learning" by Michael Nielsen.

The program aims to recognize handwritten digits like the ones shown below.

[Figure: sample handwritten digits from the MNIST data set]

Dependencies:

Python 2.7 with the NumPy library
Kali Linux 4.6.0 (Debian-derived)
5000 images as training data
1000 images as test data

Run:
[Screenshot: program output from a training run]

The input layer has 784 sigmoid neurons, the hidden layer has 30, and the output layer has 10.
The network is trained for 30 epochs with a mini-batch size of 10 and a learning rate of 3.0.

The accuracy is about 95%.
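
For reference, the run above corresponds to a short driver script along the following lines (a sketch; it assumes the two modules listed under "Code:" below are saved as mnist_loader.py and network.py, with the data file at ../data/mnist.pkl.gz):

import mnist_loader
import network

# Load the (input, label) pairs in the format expected by Network.
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

# 784 input neurons, 30 hidden neurons, 10 output neurons.
net = network.Network([784, 30, 10])

# 30 epochs, mini-batch size 10, learning rate 3.0; report accuracy on test_data after each epoch.
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)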


Code:
"""
mnist_loader
~~~~~~~~~~~~

A library to load the MNIST image data.  For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``.  In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""

#### Libraries
# Standard library
import cPickle
import gzip

# Third-party libraries
import numpy as np

def load_data():
   """Return the MNIST data as a tuple containingthe training data,
   the validation data, and the testdata.

   The ``training_data`` is returned as a tuplewith two entries.
   The first entry contains the actual trainingimages.  This is a
   numpy ndarray with 50,000 entries. Each entry is, in turn, a
   numpy ndarray with 784 values, representing the28 * 28 = 784
   pixels in a single MNIST image.

   The second entry in the ``training_data`` tupleis a numpy ndarray
   containing 50,000 entries. Those entries are just the digit
   values (0...9) for the corresponding imagescontained in the first
   entry of the tuple.

   The ``validation_data`` and ``test_data`` aresimilar, except
   each contains only 10,000 images.

   This is a nice data format, but for use inneural networks it's
   helpful to modify the format of the``training_data`` a little.
   That's done in the wrapper function``load_data_wrapper()``, see
   below.
   """
   f = gzip.open('../data/mnist.pkl.gz','rb')
   training_data, validation_data, test_data =cPickle.load(f)
   f.close()
   return (training_data, validation_data,test_data)

def load_data_wrapper():
    """Return a tuple containing ``(training_data, validation_data,
    test_data)``.  Based on ``load_data``, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, ``training_data`` is a list containing 50,000
    2-tuples ``(x, y)``.  ``x`` is a 784-dimensional numpy.ndarray
    containing the input image.  ``y`` is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for ``x``.

    ``validation_data`` and ``test_data`` are lists containing 10,000
    2-tuples ``(x, y)``.  In each case, ``x`` is a 784-dimensional
    numpy.ndarray containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to ``x``.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data.  These formats
    turn out to be the most convenient for use in our neural network
    code."""
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere.  This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
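
To sanity-check the loader (an illustrative snippet only; it assumes mnist.pkl.gz is present at ../data/), the shapes of the returned entries can be inspected like this:

import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
x, y = training_data[0]
print x.shape   # (784, 1): column vector of pixel intensities
print y.shape   # (10, 1): one-hot vector produced by vectorized_result
print len(training_data), len(validation_data), len(test_data)
# Per the docstrings above: 50000 10000 10000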
"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network.  Gradients are calculated
using backpropagation.  Note that I have focused on making the code
simple, easily readable, and easily modifiable.  It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np

class Network(object):

    def __init__(self, sizes):
        """The list ``sizes`` contains the number of neurons in the
        respective layers of the network.  For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron.  The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1.  Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers."""
        self.num_layers = len(sizes)
        self.sizes = sizes
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

    def SGD(self, training_data, epochs, mini_batch_size, eta,
            test_data=None):
        """Train the neural network using mini-batch stochastic
        gradient descent.  The ``training_data`` is a list of tuples
        ``(x, y)`` representing the training inputs and the desired
        outputs.  The other non-optional parameters are
        self-explanatory.  If ``test_data`` is provided then the
        network will be evaluated against the test data after each
        epoch, and partial progress printed out.  This is useful for
        tracking progress, but slows things down substantially."""
        if test_data: n_test = len(test_data)
        n = len(training_data)
        for j in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k+mini_batch_size]
                for k in xrange(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, eta)
            if test_data:
                print "Epoch {0}: {1} / {2}".format(
                    j, self.evaluate(test_data), n_test)
            else:
                print "Epoch {0} complete".format(j)

    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]

    def backprop(self, x, y):
        """Return a tuple ``(nabla_b, nabla_w)`` representing the
        gradient for the cost function C_x.  ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result.  Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

#### Miscellaneous functions
def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
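
Since backpropagation relies on sigmoid_prime, a quick numerical check against a central-difference approximation can catch sign or algebra mistakes (an illustrative sketch, run alongside network.py):

import numpy as np
from network import sigmoid, sigmoid_prime

z = np.array([[0.5], [-1.2], [3.0]])
eps = 1e-6
# Central difference approximates the derivative of sigmoid at z.
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print np.allclose(numeric, sigmoid_prime(z))   # expected: True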

