Artificial Neural Networks

Source: Internet | Published by: plc编程入门电子书 | Editor: 程序博客网 | Time: 2024/04/29 18:07

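I. The neural network model:

Figure 1: A two-layer fully connected neural network

This is a neural network with two fully connected layers. By convention, the input layer is not counted when stating the number of layers.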

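1. Neurons:

Figure 2: Mathematical model of a neuron

Viewed individually, each neuron can be regarded as a perceptron that makes a decision. As the figure shows, it takes a linear combination of its inputs and passes the result through a function f to make a prediction. With the Sigmoid function, for example, an output greater than 0.5 can be classified as the positive class and an output less than 0.5 as the negative class. What does this resemble? Logistic regression! Here is an example of a neuron:

Figure 3: Example of a neuron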
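The neuron described above can be sketched in a few lines of Python. This is only an illustration: the function names `neuron` and `classify` and the input, weight and bias values are hypothetical and are not taken from Figure 3.

```python
import math


def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Linear combination of the inputs followed by the activation f."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def classify(inputs, weights, bias, threshold=0.5):
    """Positive class (1) if the output exceeds the threshold, else 0."""
    return 1 if neuron(inputs, weights, bias) > threshold else 0


# Hypothetical values, chosen only for illustration.
example_output = neuron([1.0, 2.0], [0.5, 0.25], -0.5)
example_label = classify([1.0, 2.0], [0.5, 0.25], -0.5)
print(example_output, example_label)
```

With a Sigmoid activation and a 0.5 threshold this is the decision rule of logistic regression, which is the resemblance pointed out above.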

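2. Activation functions

There are many kinds of activation functions, such as the linear function, Sigmoid, tanh and ReLU. This article covers only three of them:

Figure 4: The Sigmoid and tanh functions

The Sigmoid and tanh functions share a common problem: when the input has a large magnitude (the two directions marked in the figure), the gradient vanishes during backpropagation. When a neuron's activation is close to 0 or 1 (close to -1 or 1 for tanh), its gradient is almost zero, so the gradient gets "killed" on the way back. This is easy to see: in both marked directions the curve becomes nearly flat, so the derivative there is zero or very close to zero. The gradient passed backward is then (approximately) zero as well, the weights can barely be updated, and for this reason these two functions are no longer commonly used in neural networks. Of the two, tanh is more popular than Sigmoid because its output is zero-centered.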
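The saturation effect can be checked numerically. The sketch below evaluates the derivatives of Sigmoid and tanh at a few sample inputs; the sample points in `zs` are arbitrary choices for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    """Derivative of the Sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - np.tanh(z) ** 2


# Arbitrary sample inputs, from the centre of the curve out to the flat tail.
zs = np.array([0.0, 2.0, 5.0, 10.0])
sig_grads = sigmoid_grad(zs)
tanh_grads = tanh_grad(zs)

for z, sg, tg in zip(zs, sig_grads, tanh_grads):
    print("z = %5.1f  sigmoid' = %.2e  tanh' = %.2e" % (z, sg, tg))
```

Both derivatives peak at z = 0 (0.25 for Sigmoid, 1 for tanh) and shrink rapidly as |z| grows. That shrinking factor is what multiplies the gradient during backpropagation.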

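Figure 5: The ReLU function

The advantages and disadvantages of ReLU are listed in the figure. As for the disadvantage, my understanding is this: when a very large gradient flows through a neuron, its weights can be updated to very large negative values (think of negative infinity). The next time data passes through, the neuron's input is negative and the ReLU output is 0. For a negative input the local gradient of ReLU is also 0, so the data loss sends essentially no gradient back to those weights, and whatever small update remains is not enough to make the neuron's input positive again (just as negative infinity plus an integer N is still a very small number). The neuron is then irreversibly deactivated.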
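A minimal sketch of a "dead" ReLU neuron follows. The inputs `x`, the weights `w` and the bias `b` are hypothetical values, chosen so that the pre-activation is negative for every sample; they do not come from the article's figures.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def relu_grad(z):
    """Local gradient of ReLU: 1 where z > 0, otherwise 0."""
    return (z > 0).astype(float)


# Hypothetical non-negative inputs (three samples, two features each).
x = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [2.0, 0.1]])
# Hypothetical weights and bias after an overly large update.
w = np.array([-3.0, -4.0])
b = -1.0

z = x.dot(w) + b                # pre-activation, negative for every sample
out = relu(z)                   # so the output is 0 everywhere
upstream = np.ones_like(z)      # stand-in for the gradient arriving from above
dw = x.T.dot(upstream * relu_grad(z))  # gradient of the data loss w.r.t. w

print(out, dw)
```

Because the local gradient is zero for every sample, `dw` is exactly zero and the data loss cannot move the weights back toward a region where the neuron fires.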

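At this point we can form an understanding of neural networks: a neural network defines a family of functions built from a series of functions, and the network's weights are the parameters of that family. With enough hidden units, a neural network can approximate any continuous function arbitrarily well on a bounded input range (this has been proven and is known as the universal approximation theorem).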

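II. Forward propagation and backpropagation:

Forward propagation is simple. Consider the following example:

Figure 6: Example neural network

The blue numbers are the values of the corresponding weights and biases. Our goal is to make the network's output equal to 0.5, or very close to 0.5.

Compute the input of the hidden neuron h1: 0.15 * 0.05 + 0.2 * 0.1 + 0.35 = 0.3775 ≈ 0.38. The output of h1 is f(0.38) = 0.59, where f is the Sigmoid function. The input and output of neuron h2 are computed in the same way, which gives the following figure:

Figure 7: Forward propagation example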
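The arithmetic for h1 can be reproduced directly. In this sketch the variable names are my own labels, and I assume that 0.15 and 0.2 are the weights, 0.05 and 0.1 the inputs, and 0.35 the bias (the weighted sum is the same if weights and inputs are swapped).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Values from the worked example; the names are assumed labels.
x1, x2 = 0.05, 0.10   # inputs
w1, w2 = 0.15, 0.20   # weights into h1
b1 = 0.35             # bias of the hidden layer

net_h1 = w1 * x1 + w2 * x2 + b1   # weighted input of h1
out_h1 = sigmoid(net_h1)          # output of h1

print(round(net_h1, 4), round(out_h1, 2))
```

The exact input is 0.3775, which the text rounds to 0.38, and the Sigmoid output rounds to 0.59.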

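Since the output is 0.63 rather than the 0.5 we want (or at least it is not very close to 0.5), we need to update the weights and biases. The purpose of training a neural network is to learn the weights and biases that make the network produce the output we want. So how are the weights updated? With backpropagation, which is really quite simple: put plainly, it is a series of applications of the chain rule. This article does not go through the theoretical derivation. For that, see:

http://blog.csdn.net/hearthougan/article/details/55812728

I also do not recommend only reading the theoretical derivation. Working through a small example by hand gives a more thorough understanding than reading alone. This article only briefly shows how backpropagation proceeds. Suppose we have the loss and the loss function and want to update w5, w6 and b3. To update them we need the gradient of each. Take the gradient of w5 as an example, shown in the figure below:

Figure 8: Backpropagation example

Once w5 is understood, the others follow in the same way. Assuming a learning rate of 0.5, the weight is updated as w5 ← w5 - 0.5 * ∂L/∂w5, where L is the loss.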
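The chain rule for w5 can be sketched as follows. This is an illustration under stated assumptions, not the figure's calculation: I assume a squared-error loss L = 0.5 * (out - target)^2 and a Sigmoid output neuron, and the hidden outputs `h1`, `h2` and the parameters `w5`, `w6`, `b3` are hypothetical values rather than the ones in Figure 8. Only the target of 0.5 and the learning rate of 0.5 come from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w5, w6, b3, h1, h2):
    """Output neuron: Sigmoid of the weighted sum of the hidden outputs."""
    return sigmoid(w5 * h1 + w6 * h2 + b3)


def loss(out, target):
    """Assumed squared-error loss."""
    return 0.5 * (out - target) ** 2


# Hypothetical values for illustration only.
h1, h2 = 0.59, 0.60
w5, w6, b3 = 0.40, 0.45, 0.10
target = 0.5          # desired output, from the text
learning_rate = 0.5   # from the text

out = forward(w5, w6, b3, h1, h2)

# Chain rule: dL/dw5 = dL/dout * dout/dnet * dnet/dw5
dL_dout = out - target
dout_dnet = out * (1.0 - out)
dnet_dw5 = h1
grad_w5 = dL_dout * dout_dnet * dnet_dw5

# Central-difference check of the analytic gradient.
eps = 1e-6
numeric_w5 = (loss(forward(w5 + eps, w6, b3, h1, h2), target)
              - loss(forward(w5 - eps, w6, b3, h1, h2), target)) / (2 * eps)

# Gradient-descent update.
w5_new = w5 - learning_rate * grad_w5
print(grad_w5, numeric_w5, w5_new)
```

The gradients of w6 and b3 use the same first two factors; only the last factor changes (h2 for w6, 1 for b3).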

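III. Experimental results on the CIFAR10 and MNIST datasets

To test how well the neural network works, I ran it on both CIFAR10 and MNIST. The accuracy is 97.2% on MNIST but only 52% on CIFAR10. Compared with SVM and Softmax, however, the neural network's accuracy is still considerably higher. I believe the accuracy on CIFAR10 is lower because its images have more complex backgrounds and three channels, whereas MNIST images have simple backgrounds and a single channel.

The experiment ran 75 hyperparameter-search trials in total, with the number of hidden neurons taken from [50, 75, 100]. For each hidden size, the search over learning rate and regularization strength recorded the first-layer weights W1, the corresponding loss, and the accuracy on the training and validation sets. These are shown as GIFs: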
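The count of 75 follows from the search grid: 3 hidden sizes times 5 learning rates times 5 regularization strengths. The sketch below rebuilds that grid with the sampling ranges used in the classification script at the end of this article. The fixed seed and the use of `np.random.default_rng` are my own assumptions, added only to make the sketch repeatable.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for repeatability only

hidden_sizes = [50, 75, 100]
# Sample on a log scale: exponents drawn uniformly, then raised to base 10.
learning_rates = 10 ** rng.uniform(-3.5, -2.5, 5)
regularization_strengths = 10 ** rng.uniform(-5, 1, 5)

# Every combination of the three hyperparameters is one trial.
trials = list(itertools.product(hidden_sizes,
                                learning_rates,
                                regularization_strengths))
print(len(trials))
```

Sampling the exponent rather than the value itself spreads the candidates evenly across orders of magnitude, which suits learning rates and regularization strengths.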

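1. Results on CIFAR10:

Figure 9: W1 with 50 hidden neurons, and the loss and accuracy during the search with 50 neurons

Figure 10: W1 with 75 hidden neurons, and the loss and accuracy during the search with 75 neurons

Figure 11: W1 with 100 hidden neurons, and the loss and accuracy during the search with 100 neurons

After the full search on CIFAR10, the best W1 and the corresponding loss and accuracy are as follows:

Figure 12: Best parameters

With the best parameters found, the accuracy on the test set is 52%.

2. Results on MNIST:

(Sorry) The MNIST GIFs exceed 2 MB and cannot be uploaded, so here is just a static image of the best parameters:

The accuracy on the test set is 97.2%.

Fully connected network code: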

# -*- coding: utf-8 -*-
"""
Created on Fri May 19 23:23:42 2017

@author: Abner
"""
import numpy as np


class TwoLayerNet(object):
    """
    A two-layer fully connected neural network. The input layer has D
    neurons, the hidden layer has H neurons and the output layer has C
    neurons. The network is trained with a Softmax loss and an L2
    regularization term. The first fully connected layer uses ReLU as its
    activation function.

    The structure of the network is:
    input layer - fully connected layer (the hidden layer) - ReLU -
    fully connected layer (output layer) - Softmax

    The second fully connected layer is the output layer; its outputs are
    the scores for each class.
    """

    def __init__(self, input_size, hidden_size, output_size, std=1e-4):
        """
        Initialize the model.

        Weights are initialized to small random values and biases to zero.
        Weights and biases are stored in self.params, a dictionary:

        W1: weights of the first fully connected layer, shape (D, H)
        b1: biases of the first layer, shape (H,)
        W2: weights of the second fully connected layer, shape (H, C)
        b2: biases of the second layer, shape (C,)

        input_size: dimension of the input layer, D
        hidden_size: number of hidden neurons, H
        output_size: number of output classes, C
        """
        self.params = {}
        # np.random.rand samples uniformly from [0, 1), so the initial
        # weights are small and non-negative.
        self.params['W1'] = std * np.random.rand(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.rand(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def loss(self, X, y=None, reg=0.0):
        """
        Compute the loss and gradients of the two-layer network.

        Inputs:
        X: input data of shape (N, D); each X[i] is one sample
        y: training labels; y[i] is the label of X[i]. This argument is
           optional. If it is omitted, only the scores are returned; if it
           is given, the loss and gradients are returned.
        reg: regularization strength

        Returns:
        If y is None, a scores matrix of shape (N, C), where scores[i, c]
        is the score of sample X[i] for class c.

        If y is not None, a tuple of:
        - loss: loss of this batch (data loss plus regularization loss)
        - grads: dictionary mapping each weight (or bias) name to its gradient
        """
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        N, D = X.shape

        # Forward pass
        h1 = np.maximum(0, np.dot(X, W1) + b1)  # ReLU hidden layer, (N, H)
        scores = np.dot(h1, W2) + b2            # class scores, (N, C)

        if y is None:
            return scores

        # Softmax loss; subtract the row maximum for numerical stability
        shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
        softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
        loss = -np.sum(np.log(softmax_output[np.arange(N), list(y)]))
        loss /= N
        loss += reg * 0.5 * (np.sum(W1 * W1) + np.sum(W2 * W2))

        # Backward pass: differentiate with respect to the weights and
        # biases and store the results in a dictionary. grads['W1'] has the
        # same shape as W1, grads['b1'] the same shape as b1, and so on.
        grads = {}

        # Gradient of the loss with respect to the scores
        descores = softmax_output.copy()       # (N, C)
        descores[np.arange(N), list(y)] -= 1   # (N, C)
        grads['W2'] = 1.0 / N * h1.T.dot(descores) + reg * W2  # (H, C)
        grads['b2'] = 1.0 / N * np.sum(descores, axis=0)       # (C,)

        dh1 = descores.dot(W2.T)    # (N, H)
        dh1_ReLU = (h1 > 0) * dh1   # gradient passes only where ReLU is active, (N, H)
        grads['W1'] = 1.0 / N * X.T.dot(dh1_ReLU) + reg * W1   # (D, H)
        grads['b1'] = 1.0 / N * np.sum(dh1_ReLU, axis=0)       # (H,)

        return loss, grads

    def train(self, X, y, X_val, y_val, learning_rate=1e-3,
              learning_rate_decay=0.95, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train the network with stochastic gradient descent.

        Inputs:
        X: (N, D)
        y: (N,)
        X_val: validation data, (N_val, D)
        y_val: validation labels, (N_val,)
        learning_rate: step size of the optimization
        learning_rate_decay: scalar by which the learning rate is
            multiplied after each epoch
        reg: regularization strength
        num_iters: number of optimization iterations
        batch_size: number of samples per batch
        verbose: boolean; if true, print progress during optimization
        """
        num_train = X.shape[0]
        iterations_per_epoch = max(num_train // batch_size, 1)

        # Use stochastic gradient descent (SGD) to optimize self.params
        loss_history = []
        train_acc_history = []
        val_acc_history = []

        for it in range(num_iters):
            # Draw a random minibatch of training data and labels
            idx = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[idx]
            y_batch = y[idx]

            # Compute the loss and gradients on the current minibatch
            loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
            loss_history.append(loss)

            # Update the network parameters using the gradients
            self.params['W1'] = self.params['W1'] - learning_rate * grads['W1']
            self.params['W2'] = self.params['W2'] - learning_rate * grads['W2']
            self.params['b1'] = self.params['b1'] - learning_rate * grads['b1']
            self.params['b2'] = self.params['b2'] - learning_rate * grads['b2']

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

            # Once per epoch, check training and validation accuracy and
            # decay the learning rate
            if it % iterations_per_epoch == 0:
                train_acc = (self.predict(X_batch) == y_batch).mean()
                val_acc = (self.predict(X_val) == y_val).mean()
                train_acc_history.append(train_acc)
                val_acc_history.append(val_acc)

                # Decay the learning rate
                learning_rate *= learning_rate_decay

        return {
            'loss_history': loss_history,
            'train_acc_history': train_acc_history,
            'val_acc_history': val_acc_history,
        }

    def predict(self, X):
        """
        Use the trained weights of the two-layer network to predict labels.
        For each sample we compute the scores of the C classes and assign
        the label with the highest score.

        Input:
        - X: input data, (N, D)
        Returns:
        - y_pred: predicted labels for the given data
        """
        h1 = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])
        scores = np.dot(h1, self.params['W2']) + self.params['b2']
        y_pred = np.argmax(scores, axis=1)
        return y_pred
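Analytic gradients such as those in `loss` above are easy to get subtly wrong, so a numerical check is a useful habit. The sketch below is self-contained and checks only the Softmax part (the gradient of the loss with respect to the scores). The helper names `softmax_loss` and `numerical_gradient`, the random scores and the labels are all assumptions made for the illustration, not part of the original code.

```python
import numpy as np


def softmax_loss(scores, y):
    """Mean Softmax cross-entropy loss and its gradient w.r.t. the scores."""
    N = scores.shape[0]
    shifted = scores - np.max(scores, axis=1, keepdims=True)  # stability
    probs = np.exp(shifted)
    probs /= np.sum(probs, axis=1, keepdims=True)
    loss = -np.mean(np.log(probs[np.arange(N), y]))
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    return loss, dscores


def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of the scalar function f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


# Hypothetical data: 4 samples, 3 classes.
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 3))
y = np.array([0, 2, 1, 1])

loss_value, analytic = softmax_loss(scores, y)
numeric = numerical_gradient(lambda s: softmax_loss(s, y)[0], scores)
max_diff = np.max(np.abs(analytic - numeric))
print(loss_value, max_diff)
```

A very small `max_diff` suggests the analytic expression is consistent with the loss. The same idea extends to W1, b1, W2 and b2 by perturbing one parameter entry at a time.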

Loading the CIFAR10 dataset:

# -*- coding: utf-8 -*-
"""
Created on Sun May  7 19:32:30 2017

@author: admin
"""
import os
import pickle

import numpy as np


def Load_CIFAR_Batch(filename):
    """Load a single CIFAR10 batch file."""
    with open(filename, 'rb') as f:
        # The CIFAR10 batch files were pickled under Python 2, so Python 3
        # needs an explicit encoding to read them.
        datadict = pickle.load(f, encoding='latin1')
        X = datadict['data']
        Y = datadict['labels']
        # (10000, 3072) -> (10000, 32, 32, 3)
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype('float')
        Y = np.array(Y)
        return X, Y


def Load_CIFAR10(Root):
    """Load the five training batches and the test batch from Root."""
    xs = []
    ys = []

    for b in range(1, 6):
        f = os.path.join(Root, 'data_batch_%d' % (b, ))
        X, Y = Load_CIFAR_Batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)

    del X, Y
    Xte, Yte = Load_CIFAR_Batch(os.path.join(Root, 'test_batch'))
    return Xtr, Ytr, Xte, Yte

Loading the MNIST dataset:

# -*- coding: utf-8 -*-
"""
Created on Sun May 21 16:58:49 2017

@author: Abner

Parse the MNIST binary files with Python.
"""
import os
import struct

import numpy as np


def Load_MNIST(Data_dir, label_dir):
    # Read the image data
    with open(Data_dir, 'rb') as binfile:  # open the binary file
        buffers = binfile.read()
    head = struct.unpack_from('>IIII', buffers, 0)  # first 4 integers, as a tuple
    offset = struct.calcsize('>IIII')  # position where the pixel data starts
    imgNum = head[1]
    # The header stores the number of rows before the number of columns;
    # both are 28 for MNIST, so only the product matters here.
    width = head[2]
    height = head[3]
    bits = imgNum * width * height  # 60000 * 28 * 28 pixel values for the training set
    bitsString = '>' + str(bits) + 'B'  # format string, e.g. '>47040000B'
    imgs = struct.unpack_from(bitsString, buffers, offset)  # pixel data, as a tuple
    imgs = np.reshape(imgs, [imgNum, width * height])  # reshape to [60000, 784]

    # Read the labels
    with open(label_dir, 'rb') as binfile:  # open the binary file
        buffers = binfile.read()
    head = struct.unpack_from('>II', buffers, 0)  # first 2 integers of the label file
    labelNum = head[1]
    offset = struct.calcsize('>II')  # position where the label data starts
    numString = '>' + str(labelNum) + 'B'  # format string, e.g. '>60000B'
    labels = struct.unpack_from(numString, buffers, offset)  # label data
    labels = np.reshape(labels, [labelNum])  # convert to a one-dimensional array

    return imgs, labels


def Load_MNIST_Data():
    train_path_lable_dir = os.path.join('MNIST', 'train-labels.idx1-ubyte')
    train_path_Data_dir = os.path.join('MNIST', 'train-images.idx3-ubyte')
    test_path_Data_dir = os.path.join('MNIST', 't10k-images.idx3-ubyte')
    test_path_lable_dir = os.path.join('MNIST', 't10k-labels.idx1-ubyte')

    Xtr, ytr = Load_MNIST(train_path_Data_dir, train_path_lable_dir)
    Xte, yte = Load_MNIST(test_path_Data_dir, test_path_lable_dir)

    return Xtr, ytr, Xte, yte


if __name__ == "__main__":
    Xtr, ytr, Xte, yte = Load_MNIST_Data()
    print("Xtr: ", Xtr.shape)
    print("ytr: ", ytr.shape)
    print('----------separator-----------')
    print("Xte: ", Xte.shape)
    print("yte: ", yte.shape)

Weight visualization:

# -*- coding: utf-8 -*-
"""
Created on Sun May 21 19:08:24 2017

@author: Abner
"""
from math import sqrt, ceil

import numpy as np


def visualize_grid(Xs, ubound=255.0, padding=1):
    """
    Reshape a 4D tensor of image data to a grid for easy visualization.

    Inputs:
    - Xs: Data of shape (N, H, W, C)
    - ubound: Output grid will have values scaled to the range [0, ubound]
    - padding: The number of blank pixels between elements of the grid
    """
    (N, H, W, C) = Xs.shape
    grid_size = int(ceil(sqrt(N)))
    grid_height = H * grid_size + padding * (grid_size - 1)
    grid_width = W * grid_size + padding * (grid_size - 1)
    grid = np.zeros((grid_height, grid_width, C))
    next_idx = 0
    y0, y1 = 0, H
    for y in range(grid_size):
        x0, x1 = 0, W
        for x in range(grid_size):
            if next_idx < N:
                img = Xs[next_idx]
                low, high = np.min(img), np.max(img)
                grid[y0:y1, x0:x1] = ubound * (img - low) / (high - low)
                # grid[y0:y1, x0:x1] = Xs[next_idx]
                next_idx += 1
            x0 += W + padding
            x1 += W + padding
        y0 += H + padding
        y1 += H + padding
    return grid


def visualize_grid1(Xs, ubound=255.0, padding=1):
    """
    Reshape a 3D tensor of single-channel image data to a grid for easy
    visualization.

    Inputs:
    - Xs: Data of shape (N, H, W)
    - ubound: Output grid will have values scaled to the range [0, ubound]
    - padding: The number of blank pixels between elements of the grid
    """
    (N, H, W) = Xs.shape
    grid_size = int(ceil(sqrt(N)))
    grid_height = H * grid_size + padding * (grid_size - 1)
    grid_width = W * grid_size + padding * (grid_size - 1)
    grid = np.zeros((grid_height, grid_width))
    next_idx = 0
    y0, y1 = 0, H
    for y in range(grid_size):
        x0, x1 = 0, W
        for x in range(grid_size):
            if next_idx < N:
                img = Xs[next_idx]
                low, high = np.min(img), np.max(img)
                grid[y0:y1, x0:x1] = ubound * (img - low) / (high - low)
                # grid[y0:y1, x0:x1] = Xs[next_idx]
                next_idx += 1
            x0 += W + padding
            x1 += W + padding
        y0 += H + padding
        y1 += H + padding
    return grid


def vis_grid(Xs):
    """ visualize a grid of images """
    (N, H, W, C) = Xs.shape
    A = int(ceil(sqrt(N)))
    G = np.ones((A * H + A, A * W + A, C), Xs.dtype)
    G *= np.min(Xs)
    n = 0
    for y in range(A):
        for x in range(A):
            if n < N:
                G[y * H + y:(y + 1) * H + y, x * W + x:(x + 1) * W + x, :] = Xs[n, :, :, :]
                n += 1
    # normalize to [0,1]
    maxg = G.max()
    ming = G.min()
    G = (G - ming) / (maxg - ming)
    return G


def vis_nn(rows):
    """ visualize array of arrays of images """
    N = len(rows)
    D = len(rows[0])
    H, W, C = rows[0][0].shape
    Xs = rows[0][0]
    G = np.ones((N * H + N, D * W + D, C), Xs.dtype)
    for y in range(N):
        for x in range(D):
            G[y * H + y:(y + 1) * H + y, x * W + x:(x + 1) * W + x, :] = rows[y][x]
    # normalize to [0,1]
    maxg = G.max()
    ming = G.min()
    G = (G - ming) / (maxg - ming)
    return G

Classification with the fully connected neural network:

# -*- coding: utf-8 -*-
"""
Created on Sat May 20 10:55:11 2017

@author: Abner
"""
import numpy as np
import matplotlib
matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

from vis_uitls import visualize_grid
from vis_uitls import visualize_grid1
from LoadData import Load_CIFAR10
from Load_MNIST import Load_MNIST_Data
from Fullc_NN import TwoLayerNet

# %matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
# %load_ext autoreload
# %autoreload 2


# Read the MNIST dataset
##############################################################################
def get_MNIST_data(num_training=59000, num_validation=1000, num_test=1000):
    """
    Load the MNIST dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw MNIST data
    X_train, y_train, X_test, y_test = Load_MNIST_Data()
    X_train = X_train.reshape(60000, 28, 28)
    X_test = X_test.reshape(10000, 28, 28)
    print("Before reshape Data:")
    print("X_train: ", X_train.shape)
    print("y_train: ", y_train.shape)
    print("X_test: ", X_test.shape)
    print("y_test: ", y_test.shape)

    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train = X_train - mean_image
    X_val = X_val - mean_image
    X_test = X_test - mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


##########################################################################
# CIFAR10 counterpart of the function above. It is not called in the MNIST
# run below.
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cifar-10-batches-py'
    X_train, y_train, X_test, y_test = Load_CIFAR10(cifar10_dir)
    print("Before reshape Data:")
    print("X_train: ", X_train.shape)
    print("y_train: ", y_train.shape)
    print("X_test: ", X_test.shape)
    print("y_test: ", y_test.shape)

    # Subsample the data
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test


# Invoke the above function to get our data.
# CIFAR10:
# X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
# MNIST:
X_train, y_train, X_val, y_val, X_test, y_test = get_MNIST_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

'''
##############################################################################
# Disabled: a single run without hyperparameter search, 50 hidden neurons.
# The number of input neurons is the data dimension, the hidden layer has
# 50 neurons and the output layer has 10 neurons.
#input_size = 32 * 32 * 3
input_size = 28*28
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
            num_iters=1000, batch_size=200,
            learning_rate=1e-4, learning_rate_decay=0.95,
            reg=0.5, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)

plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
train_acc, = plt.plot(stats['train_acc_history'], label='train')
val_acc, = plt.plot(stats['val_acc_history'], label='val')
plt.legend([train_acc, val_acc], ['Training Accuracy', 'Validation Accuracy'], loc='lower right')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
#plt.show()
plt.savefig('E:\\MNIST\\loss.jpg')
plt.close()

#def show_net_weights(net):
#  W1 = net.params['W1']
#  W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
#  plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
#  plt.gca().axis('off')
##  plt.show()
#  plt.savefig('E:\\MNIST\\weight.jpg')

def show_net_weights(net):
  W1 = net.params['W1']
  W1 = W1.reshape(28, 28, -1).transpose(2, 0, 1)
  plt.imshow(visualize_grid1(W1, padding=1).astype('uint8'))
  plt.gca().axis('off')
#  plt.show()
  plt.savefig('E:\\MNIST\\weight.jpg')

show_net_weights(net)
##############################################################################
'''

#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################
best_net = None  # store the best model into this
stats = {}
results = {}
best_val = -1
best_stats = None

# input_size = 32 * 32 * 3
input_size = 28 * 28
num_classes = 10

# hidden_sizes = (100 * np.random.rand(5)).round().astype(int)
# learning_rates = (5e-3 - 5e-5) * np.random.rand(5) + 5e-5
# regularization_strengths = np.random.rand(5)
# hidden_sizes = [50, 75, 100]
# learning_rates = [5e-4]
# regularization_strengths = [0.65, 0.75, 0.85]
# hidden_sizes = np.round(10 ** np.random.uniform(1.7,2.3,3)).astype(int)
# hidden_sizes = [100]

# CIFAR10 version:
# def show_net_weights(net, k):
#     W1 = net.params['W1']
#     W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
#     plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
#     plt.gca().axis('off')
#     plt.savefig('E:\\NN\\%d times_W.jpg' % k)
#     plt.show()


def show_net_weights(net, k):
    W1 = net.params['W1']
    W1 = W1.reshape(28, 28, -1).transpose(2, 0, 1)
    plt.imshow(visualize_grid1(W1, padding=1).astype('uint8'))
    plt.gca().axis('off')
    plt.savefig('E:\\MNIST\\%d times_W1.jpg' % k)
    plt.close()  # start the next plot from a clean figure


hidden_sizes = [50, 75, 100]
learning_rates = 10 ** np.random.uniform(-3.5, -2.5, 5)
regularization_strengths = 10 ** np.random.uniform(-5, 1, 5)

k = 0
for hidden_size in hidden_sizes:
    for learning_rate in learning_rates:
        for regularization_strength in regularization_strengths:
            key = (hidden_size, learning_rate, regularization_strength)

            # Print hyperparameters
            print('Size = %d, Learning rate = %e, Reg. Strength = %e' % key)

            # Initialize net
            net = TwoLayerNet(input_size, hidden_size, num_classes)

            # Training
            stats[key] = net.train(X_train, y_train, X_val, y_val,
                                   num_iters=2000, batch_size=500,
                                   learning_rate=learning_rate,
                                   learning_rate_decay=0.95,
                                   reg=regularization_strength, verbose=True)

            # Testing
            y_train_pred = net.predict(X_train)
            y_val_pred = net.predict(X_val)

            # Evaluation
            train_num_correct = np.sum(y_train_pred == y_train)
            training_accuracy = float(train_num_correct) / X_train.shape[0]
            val_num_correct = np.sum(y_val_pred == y_val)
            validation_accuracy = float(val_num_correct) / X_val.shape[0]

            results[key] = training_accuracy, validation_accuracy

            if validation_accuracy > best_val:
                best_val = validation_accuracy
                best_net = net
                best_stats = stats[key]

            current_stats = stats[key]

            # Plot the loss function and train / validation accuracies
            plt.subplot(2, 1, 1)
            plt.plot(current_stats['loss_history'])
            plt.title('Loss history')
            plt.xlabel('Iteration')
            plt.ylabel('Loss')

            plt.subplot(2, 1, 2)
            train_acc, = plt.plot(current_stats['train_acc_history'], label='train')
            val_acc, = plt.plot(current_stats['val_acc_history'], label='val')
            plt.legend([train_acc, val_acc], ['Training Accuracy', 'Validation Accuracy'], loc='lower right')
            plt.title('Classification accuracy history')
            # Training and validation accuracy for each run of the search
            plt.xlabel('Epoch')
            plt.ylabel('Classification accuracy')
            # plt.show()
            k += 1
            plt.savefig('E:\\MNIST\\%d times_Loss.jpg' % (k))
            plt.close()

            # visualize the weights of the current network
            print("Weights for search trial %d" % k)
            show_net_weights(net, k)

# Print out results.
for hid_size, lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(hid_size, lr, reg)]
    print('size %d lr %e reg %e train accuracy: %f val accuracy: %f' % (
        hid_size, lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)

plt.close()
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(best_stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
train_acc, = plt.plot(best_stats['train_acc_history'], label='train')
val_acc, = plt.plot(best_stats['val_acc_history'], label='val')
plt.legend([train_acc, val_acc], ['Training Accuracy', 'Validation Accuracy'], loc='lower right')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
# plt.show()
plt.savefig('E:\\MNIST\\Best_Loss.jpg')
plt.close()

# visualize the weights of the best network
print("Weights of the best network:")
show_net_weights(best_net, 0)
#################################################################################
#                               END OF YOUR CODE                                #
#################################################################################
test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: ', test_acc)
