Neural Networks and Deep Learning CH1
Source: Internet  Editor: 程序博客网  Time: 2024/05/21 13:55
CHAPTER 1
Using neural nets to recognize handwritten digits
- Perceptrons
- Sigmoid neurons
- The architecture of neural networks
- A simple network to classify handwritten digits
- Learning with gradient descent
- Implementing our network to classify digits
This chapter introduces and builds a plain neural network that recognizes handwritten digits with an accuracy of about 96%.
Perceptrons
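This section starts with perceptrons. The activation function in common use today is the sigmoid, and I also learned about ReLU in an NTU course, but as the book says, understanding perceptrons helps us see how the other activation functions are defined.
A perceptron works in a simple way:
compute the weighted sum of the inputs, compare it with a threshold, and output 0 or 1.
If we let $b \equiv -\text{threshold}$ and write $w \cdot x \equiv \sum_j w_j x_j$, the rule becomes:
$$\text{output} = \begin{cases} 0 & \text{if } w \cdot x + b \le 0 \\ 1 & \text{if } w \cdot x + b > 0 \end{cases}$$
In practice, a perceptron makes decisions by weighing up evidence. Going further, it can simulate all kinds of logical computation.
Here I let my imagination run: in the game Minecraft, redstone can be used to simulate all kinds of logic circuits, and people have built in-game computers on that basis. Thinking about it this way, I feel that even the simplest perceptron already goes beyond logic circuit design in hardware. By extension, could future neural networks replace today's hardware circuit design?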
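As a concrete instance of a perceptron computing a logic function, here is a minimal sketch of a NAND gate. The weights (-2, -2) and bias 3 are the values the book uses for this example; the function names `perceptron` and `nand` are my own, and the code is written for Python 3.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total + bias > 0 else 0


def nand(x1, x2):
    # Weights -2, -2 and bias 3: only the input (1, 1) gives a
    # non-positive sum (-4 + 3 = -1), so only (1, 1) outputs 0.
    return perceptron([x1, x2], [-2, -2], 3)


# Print the full truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, nand(x1, x2))
```

Since NAND is universal for computation, networks of such perceptrons can in principle build any logic circuit.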
Sigmoid neurons
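I think the key point of this section is that it shows what makes training a neural network possible:
if a small change in a weight shows up as a small change in the output, then training has a direction to follow.
If the network is built from perceptrons, this does not hold: a small change in a weight either leaves the output unchanged or, at times, flips it completely (0 to 1, or 1 to 0).
This is why the sigmoid neuron is introduced. It is much like a perceptron, but improved so that a small change in a weight is reflected as a small change in the output.
The sigmoid function is already familiar:
$$\sigma(z) \equiv \frac{1}{1 + e^{-z}}$$
There is also a formula for how the output changes:
$$\Delta \text{output} \approx \sum_j \frac{\partial\, \text{output}}{\partial w_j} \Delta w_j + \frac{\partial\, \text{output}}{\partial b} \Delta b$$
This formula tells us that the change in the output is a linear function of the changes in the weights and the bias, so the relationship between them is easy to measure.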
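To check the linear relationship numerically, the sketch below compares the actual change in a sigmoid neuron's output with the estimate from the partial derivatives. The single-input neuron, the helper name `neuron_output`, and all numeric values are illustrative assumptions, not taken from the book. The code is written for Python 3.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(w, b, x):
    """Output of a sigmoid neuron with a single input x."""
    return sigmoid(w * x + b)


w, b, x = 0.6, -0.2, 1.5      # illustrative values only
dw, db = 0.001, -0.002        # small changes to weight and bias

z = w * x + b
sp = sigmoid(z) * (1 - sigmoid(z))   # sigma'(z)

# Linear estimate: d(output)/dw = sigma'(z) * x, d(output)/db = sigma'(z)
estimate = sp * x * dw + sp * db
actual = neuron_output(w + dw, b + db, x) - neuron_output(w, b, x)

print(estimate, actual)
```

The two numbers agree closely, and the agreement improves as `dw` and `db` shrink, which is what the approximation sign in the formula means.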
The architecture of neural networks
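Neural network architecture is already familiar to me.
This section mentions RNNs. I used not to think much of RNNs, but after a talk by my teacher and some further reading I learned that LSTM, a kind of RNN, has been described as the neural network closest so far to how the human brain thinks.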
A simple network to classify handwritten digits
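This section proposes a neural network for recognizing handwritten digit images of 28×28 pixels.
What I found thought-provoking is the idea raised at the end: use only a 4-bit binary output to represent the ten digits, since $2^4 = 16 \ge 10$.
Unfortunately, this is hard to do directly with 3 layers. The reason given in the book is that the binary bits have no easy logical correspondence to the features of handwritten digits.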
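One way to still get a 4-bit answer, in the spirit of the exercise the book attaches to this section, is to keep the ten-neuron output layer and put a fixed conversion layer after it. The sketch below is my own assumption about how such a layer could look, not the book's solution: each bit neuron has weight 1 from every digit whose binary form has that bit set, weight 0 from the others, and fires when the sum exceeds 0.5. The name `to_binary_layer` is hypothetical. The code is written for Python 3.

```python
def to_binary_layer(activations):
    """Map 10 output activations to 4 bits, most significant bit first."""
    bits = []
    for k in (3, 2, 1, 0):
        # Sum the activations of all digits whose k-th bit is 1.
        z = sum(a for digit, a in enumerate(activations) if (digit >> k) & 1)
        # Perceptron-style threshold at 0.5 (equivalently, bias -0.5).
        bits.append(1 if z > 0.5 else 0)
    return bits


# Assume the ten-neuron layer is confident: 0.99 for the right digit,
# 0.01 for every other digit.
activations = [0.01] * 10
activations[6] = 0.99
print(to_binary_layer(activations))
```

This works because at most five wrong digits feed any one bit, so their combined activation stays far below the 0.5 threshold.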
Learning with gradient descent
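This section explains in detail where gradient descent comes from.
First, the definition of the cost function:
$$C(w, b) \equiv \frac{1}{2n} \sum_x \| y(x) - a \|^2$$
where $w$ denotes all the weights in the network, $b$ all the biases, $n$ the total number of training inputs, $a$ the vector of network outputs when $x$ is the input, and $y(x)$ the desired output for $x$.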
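The quadratic cost can be computed directly from its definition. In this minimal sketch the function name `quadratic_cost` and the sample outputs and targets are made up for illustration. The code is written for Python 3.

```python
def quadratic_cost(outputs, targets):
    """C = 1/(2n) * sum over training inputs of ||y(x) - a||^2."""
    n = len(outputs)
    total = 0.0
    for a, y in zip(outputs, targets):
        # Squared Euclidean distance between desired and actual output.
        total += sum((yi - ai) ** 2 for ai, yi in zip(a, y))
    return total / (2.0 * n)


# Two training inputs with two output neurons each (illustrative numbers).
outputs = [[0.8, 0.2], [0.1, 0.9]]
targets = [[1.0, 0.0], [0.0, 1.0]]
cost = quadratic_cost(outputs, targets)
print(cost)
```

The cost is zero only when every output matches its target, and it grows as the outputs move away from the targets.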
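Suppose $C$ is a function of two parameters, $v_1$ and $v_2$. A small change in them changes $C$ by approximately:
$$\Delta C \approx \frac{\partial C}{\partial v_1} \Delta v_1 + \frac{\partial C}{\partial v_2} \Delta v_2$$
We want to find a way of choosing $\Delta v_1$ and $\Delta v_2$ that makes $\Delta C$ negative. Define $\Delta v \equiv (\Delta v_1, \Delta v_2)^T$ and the gradient $\nabla C \equiv \left( \frac{\partial C}{\partial v_1}, \frac{\partial C}{\partial v_2} \right)^T$.
Therefore:
$$\Delta C \approx \nabla C \cdot \Delta v$$
Note which way the triangle points: $\Delta$ denotes a change, while $\nabla$ denotes the gradient.
From the above, there is a choice that makes $\Delta C$ negative:
$$\Delta v = -\eta \nabla C$$
where $\eta$ is a small positive learning rate, because:
$$\Delta C \approx -\eta \nabla C \cdot \nabla C = -\eta \| \nabla C \|^2 \le 0$$
This gives a rule for updating the parameters $v$:
$$v \to v' = v - \eta \nabla C$$
The same update rule extends from two dimensions to many.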
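The update rule can be tried on a toy cost function. The sketch below assumes $C(v_1, v_2) = v_1^2 + v_2^2$, a function I chose only because its gradient is easy to write by hand. The starting point, learning rate, and step count are likewise arbitrary. The code is written for Python 3.

```python
def cost(v):
    # Toy cost C(v1, v2) = v1^2 + v2^2, minimized at (0, 0).
    return v[0] ** 2 + v[1] ** 2


def grad(v):
    # Gradient of the toy cost: (2*v1, 2*v2).
    return [2 * v[0], 2 * v[1]]


def gradient_descent(v, eta, steps):
    """Repeatedly apply v -> v - eta * grad(C) and record the cost."""
    history = [cost(v)]
    for _ in range(steps):
        g = grad(v)
        v = [vi - eta * gi for vi, gi in zip(v, g)]
        history.append(cost(v))
    return v, history


v_final, history = gradient_descent([3.0, -4.0], eta=0.1, steps=50)
print(v_final, history[0], history[-1])
```

With this learning rate the cost falls at every step. A learning rate that is too large would break the linear approximation behind the rule, and the cost could then rise instead.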
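Returning to neural network training, the update rule is:
$$w_k \to w_k' = w_k - \eta \frac{\partial C}{\partial w_k}, \qquad b_l \to b_l' = b_l - \eta \frac{\partial C}{\partial b_l}$$
The update rule for stochastic gradient descent is:
$$w_k \to w_k' = w_k - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial w_k}, \qquad b_l \to b_l' = b_l - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial b_l}$$
where the sums run over the $m$ training examples $X_j$ in the current mini-batch.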
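To see the mini-batch update in isolation, here is a minimal sketch that fits a single weight $w$ to data generated from $y = 3x$, using the per-example cost $C_x = (wx - y)^2 / 2$. The model, the data, the function name `sgd_fit`, and the hyperparameters are all assumptions chosen to keep the example small. The structure (shuffle, split into mini-batches, step by $\eta / m$ times the summed gradient) mirrors the rule above. The code is written for Python 3.

```python
import random


def sgd_fit(data, eta, epochs, mini_batch_size, seed=0):
    """Fit w in y = w * x by mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for k in range(0, len(data), mini_batch_size):
            batch = data[k:k + mini_batch_size]
            # Gradient of C_x = (w*x - y)^2 / 2 with respect to w is
            # (w*x - y) * x; sum it over the batch and divide by m.
            g = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= eta * g
    return w


# Noise-free samples of y = 3x for x = 0.1, 0.2, ..., 2.0.
data = [(i / 10.0, 3.0 * (i / 10.0)) for i in range(1, 21)]
w = sgd_fit(data, eta=0.1, epochs=50, mini_batch_size=5)
print(w)
```

Each step uses only five of the twenty examples, so it is cheaper than a full-gradient step while still moving `w` toward 3.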
Implementing our network to classify digits
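This section defines a plain version of the neural network in Python for handwritten digit recognition, trained with stochastic gradient descent (SGD). Backpropagation, which is not explained in detail here, is discussed in the next chapter.
There is a lot in the book's code that could be improved. First are the optimizations introduced in later chapters, which I will not go into here.
Second, the for loops in the code, for example the loop over the examples in a mini-batch, could be replaced entirely by matrix operations.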
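As an illustration of replacing a loop with a matrix operation, the sketch below feeds a small mini-batch through one sigmoid layer in two ways: one example at a time, and all at once with the inputs stacked as columns of a matrix. The layer sizes, the random values, and the variable names are assumptions for the demonstration. It covers only the forward pass, not backpropagation. The code is written for Python 3 with NumPy.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.RandomState(0)
w = rng.randn(3, 2)                            # layer: 2 inputs, 3 neurons
b = rng.randn(3, 1)
batch = [rng.randn(2, 1) for _ in range(4)]    # four column-vector inputs

# Loop version: one example at a time, as in the book's code.
looped = [sigmoid(np.dot(w, x) + b) for x in batch]

# Matrix version: stack the inputs as columns of X and do one product.
X = np.hstack(batch)                 # shape (2, 4)
A = sigmoid(np.dot(w, X) + b)        # shape (3, 4); b broadcasts over columns

print(np.allclose(A, np.hstack(looped)))
```

Column `j` of `A` is the activation for example `j`, so both versions compute the same numbers. The matrix version hands the work to one optimized routine instead of a Python-level loop.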
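One thing I picked up from the code is the use of xrange: in Python 2 code with many loops, xrange saves both time and space compared with range. The reason is that xrange does not build a list of the enumerated numbers, whereas range does.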
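The difference described above is specific to Python 2, which the book's code targets. In Python 3, `xrange` no longer exists and `range` itself is lazy, so the same saving comes for free. The short sketch below, written for Python 3, compares the memory footprint of a lazy `range` with a fully built list. The size `n` is arbitrary.

```python
import sys

n = 10 ** 6
lazy = range(n)           # Python 3: a lazy sequence, no list is built
eager = list(range(n))    # materializes all n integers in a list

# The range object has a small fixed size; the list grows with n.
print(sys.getsizeof(lazy), sys.getsizeof(eager))
```

Note that `sys.getsizeof` reports only the container itself, not the integer objects a list refers to, so the real gap is larger still.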
Code:

    """
    network.py
    ~~~~~~~~~~

    A module to implement the stochastic gradient descent learning
    algorithm for a feedforward neural network.  Gradients are calculated
    using backpropagation.  Note that I have focused on making the code
    simple, easily readable, and easily modifiable.  It is not optimized,
    and omits many desirable features.
    """

    #### Libraries
    # Standard library
    import random

    # Third-party libraries
    import numpy as np

    class Network(object):

        def __init__(self, sizes):
            """The list ``sizes`` contains the number of neurons in the
            respective layers of the network.  For example, if the list
            was [2, 3, 1] then it would be a three-layer network, with the
            first layer containing 2 neurons, the second layer 3 neurons,
            and the third layer 1 neuron.  The biases and weights for the
            network are initialized randomly, using a Gaussian
            distribution with mean 0, and variance 1.  Note that the first
            layer is assumed to be an input layer, and by convention we
            won't set any biases for those neurons, since biases are only
            ever used in computing the outputs from later layers."""
            self.num_layers = len(sizes)
            self.sizes = sizes
            self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
            self.weights = [np.random.randn(y, x)
                            for x, y in zip(sizes[:-1], sizes[1:])]

        def feedforward(self, a):
            """Return the output of the network if ``a`` is input."""
            for b, w in zip(self.biases, self.weights):
                a = sigmoid(np.dot(w, a)+b)
            return a

        def SGD(self, training_data, epochs, mini_batch_size, eta,
                test_data=None):
            """Train the neural network using mini-batch stochastic
            gradient descent.  The ``training_data`` is a list of tuples
            ``(x, y)`` representing the training inputs and the desired
            outputs.  The other non-optional parameters are
            self-explanatory.  If ``test_data`` is provided then the
            network will be evaluated against the test data after each
            epoch, and partial progress printed out.  This is useful for
            tracking progress, but slows things down substantially."""
            if test_data: n_test = len(test_data)
            n = len(training_data)
            for j in xrange(epochs):
                random.shuffle(training_data)
                mini_batches = [
                    training_data[k:k+mini_batch_size]
                    for k in xrange(0, n, mini_batch_size)]
                for mini_batch in mini_batches:
                    self.update_mini_batch(mini_batch, eta)
                if test_data:
                    print "Epoch {0}: {1} / {2}".format(
                        j, self.evaluate(test_data), n_test)
                else:
                    print "Epoch {0} complete".format(j)

        def update_mini_batch(self, mini_batch, eta):
            """Update the network's weights and biases by applying
            gradient descent using backpropagation to a single mini batch.
            The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``
            is the learning rate."""
            nabla_b = [np.zeros(b.shape) for b in self.biases]
            nabla_w = [np.zeros(w.shape) for w in self.weights]
            for x, y in mini_batch:
                delta_nabla_b, delta_nabla_w = self.backprop(x, y)
                nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
                nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            self.weights = [w-(eta/len(mini_batch))*nw
                            for w, nw in zip(self.weights, nabla_w)]
            self.biases = [b-(eta/len(mini_batch))*nb
                           for b, nb in zip(self.biases, nabla_b)]

        def backprop(self, x, y):
            """Return a tuple ``(nabla_b, nabla_w)`` representing the
            gradient for the cost function C_x.  ``nabla_b`` and
            ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
            to ``self.biases`` and ``self.weights``."""
            nabla_b = [np.zeros(b.shape) for b in self.biases]
            nabla_w = [np.zeros(w.shape) for w in self.weights]
            # feedforward
            activation = x
            activations = [x] # list to store all the activations, layer by layer
            zs = [] # list to store all the z vectors, layer by layer
            for b, w in zip(self.biases, self.weights):
                z = np.dot(w, activation)+b
                zs.append(z)
                activation = sigmoid(z)
                activations.append(activation)
            # backward pass
            delta = self.cost_derivative(activations[-1], y) * \
                sigmoid_prime(zs[-1])
            nabla_b[-1] = delta
            nabla_w[-1] = np.dot(delta, activations[-2].transpose())
            # Note that the variable l in the loop below is used a little
            # differently to the notation in Chapter 2 of the book.  Here,
            # l = 1 means the last layer of neurons, l = 2 is the
            # second-last layer, and so on.  It's a renumbering of the
            # scheme in the book, used here to take advantage of the fact
            # that Python can use negative indices in lists.
            for l in xrange(2, self.num_layers):
                z = zs[-l]
                sp = sigmoid_prime(z)
                delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
                nabla_b[-l] = delta
                nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
            return (nabla_b, nabla_w)

        def evaluate(self, test_data):
            """Return the number of test inputs for which the neural
            network outputs the correct result.  Note that the neural
            network's output is assumed to be the index of whichever
            neuron in the final layer has the highest activation."""
            test_results = [(np.argmax(self.feedforward(x)), y)
                            for (x, y) in test_data]
            return sum(int(x == y) for (x, y) in test_results)

        def cost_derivative(self, output_activations, y):
            """Return the vector of partial derivatives \partial C_x /
            \partial a for the output activations."""
            return (output_activations-y)

    #### Miscellaneous functions
    def sigmoid(z):
        """The sigmoid function."""
        return 1.0/(1.0+np.exp(-z))

    def sigmoid_prime(z):
        """Derivative of the sigmoid function."""
        return sigmoid(z)*(1-sigmoid(z))
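The book also runs a few experiments here. One set compares how the learning rate affects the result. Another compares the accuracy of three baselines: random guessing, guessing from the fraction of dark pixels in the image, and an SVM.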
Loading the data:

    """
    mnist_loader
    ~~~~~~~~~~~~

    A library to load the MNIST image data.  For details of the data
    structures that are returned, see the doc strings for ``load_data``
    and ``load_data_wrapper``.  In practice, ``load_data_wrapper`` is the
    function usually called by our neural network code.
    """

    #### Libraries
    # Standard library
    import cPickle
    import gzip

    # Third-party libraries
    import numpy as np

    def load_data():
        """Return the MNIST data as a tuple containing the training data,
        the validation data, and the test data.

        The ``training_data`` is returned as a tuple with two entries.
        The first entry contains the actual training images.  This is a
        numpy ndarray with 50,000 entries.  Each entry is, in turn, a
        numpy ndarray with 784 values, representing the 28 * 28 = 784
        pixels in a single MNIST image.

        The second entry in the ``training_data`` tuple is a numpy ndarray
        containing 50,000 entries.  Those entries are just the digit
        values (0...9) for the corresponding images contained in the first
        entry of the tuple.

        The ``validation_data`` and ``test_data`` are similar, except
        each contains only 10,000 images.

        This is a nice data format, but for use in neural networks it's
        helpful to modify the format of the ``training_data`` a little.
        That's done in the wrapper function ``load_data_wrapper()``, see
        below.
        """
        f = gzip.open('../data/mnist.pkl.gz', 'rb')
        training_data, validation_data, test_data = cPickle.load(f)
        f.close()
        return (training_data, validation_data, test_data)

    def load_data_wrapper():
        """Return a tuple containing ``(training_data, validation_data,
        test_data)``. Based on ``load_data``, but the format is more
        convenient for use in our implementation of neural networks.

        In particular, ``training_data`` is a list containing 50,000
        2-tuples ``(x, y)``.  ``x`` is a 784-dimensional numpy.ndarray
        containing the input image.  ``y`` is a 10-dimensional
        numpy.ndarray representing the unit vector corresponding to the
        correct digit for ``x``.

        ``validation_data`` and ``test_data`` are lists containing 10,000
        2-tuples ``(x, y)``.  In each case, ``x`` is a 784-dimensional
        numpy.ndarray containing the input image, and ``y`` is the
        corresponding classification, i.e., the digit values (integers)
        corresponding to ``x``.

        Obviously, this means we're using slightly different formats for
        the training data and the validation / test data.  These formats
        turn out to be the most convenient for use in our neural network
        code."""
        tr_d, va_d, te_d = load_data()
        training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
        training_results = [vectorized_result(y) for y in tr_d[1]]
        training_data = zip(training_inputs, training_results)
        validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
        validation_data = zip(validation_inputs, va_d[1])
        test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
        test_data = zip(test_inputs, te_d[1])
        return (training_data, validation_data, test_data)

    def vectorized_result(j):
        """Return a 10-dimensional unit vector with a 1.0 in the jth
        position and zeroes elsewhere.  This is used to convert a digit
        (0...9) into a corresponding desired output from the neural
        network."""
        e = np.zeros((10, 1))
        e[j] = 1.0
        return e
E-book URL: http://neuralnetworksanddeeplearning.com/chap1.html#the_architecture_of_neural_networks