Caffe (1): Training a handwritten digit recognition model with the built-in MNIST example

Source: Internet  Editor: 程序博客网 (Programmer Blog Net)  Date: 2024/06/05 03:32

Since my work requires it, I have recently started studying deep learning in earnest, using Caffe as the framework. To get a quicker feel for the overall workflow of training a model in Caffe, I began with the MNIST example that ships with it.

Preparing the Data

Because the Linux machine I use cannot reach the Internet, I could not write a script to download the dataset; instead I downloaded the files elsewhere and copied them into my home directory.
Download page:
the MNIST dataset
Among the downloaded files, those whose names start with train belong to the training set and those starting with t10k to the test set; filenames containing images hold the image data, and those containing labels hold the labels.
Once downloaded, the files need to be decompressed. The Linux command is:

$ gzip -d train-images-idx3-ubyte.gz

Decompress each of the four files in the same way to obtain the raw data files.
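The decompressed files are in the idx binary format: a big-endian magic number whose last byte encodes the number of dimensions, followed by each dimension size as a 32-bit big-endian integer, then the raw data bytes. As a sanity check before converting, the header can be inspected with a short Python sketch (`read_idx_header` is just an illustrative helper name, not part of Caffe):

```python
import io
import struct

def read_idx_header(f):
    """Parse the header of an idx file (the MNIST on-disk format).

    The third byte of the magic number gives the element type
    (0x08 = unsigned byte) and the fourth byte the number of
    dimensions; each dimension size follows as a big-endian uint32.
    """
    magic = struct.unpack(">I", f.read(4))[0]
    ndim = magic & 0xFF
    dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    return magic, dims

# Demo with an in-memory header mimicking train-images-idx3-ubyte:
# magic 0x00000803, then 60000 images of 28x28 pixels.
fake = io.BytesIO(struct.pack(">IIII", 0x00000803, 60000, 28, 28))
magic, dims = read_idx_header(fake)
print(magic, dims)  # 2051 (60000, 28, 28)
```

On the real files, opening e.g. `train-images-idx3-ubyte` in binary mode should yield dimensions of (60000, 28, 28) for the training images.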
Next, the data must be converted into LMDB, the format Caffe consumes. The shell script for this is as follows:

#!/usr/bin/env sh
# This script converts the MNIST data into lmdb or leveldb format,
# depending on the value assigned to $BACKEND.

EXAMPLE=examples/mnist
DATA=examples/mnist
BUILD=build/examples/mnist

BACKEND="lmdb"

echo "Creating ${BACKEND}..."

rm -rf $EXAMPLE/mnist_train_${BACKEND}
rm -rf $EXAMPLE/mnist_test_${BACKEND}

$BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte \
  $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
$BUILD/convert_mnist_data.bin $DATA/t10k-images-idx3-ubyte \
  $DATA/t10k-labels-idx1-ubyte $EXAMPLE/mnist_test_${BACKEND} --backend=${BACKEND}

echo "Done."

EXAMPLE is the output path for the LMDB databases, DATA is the path to the raw data, and BUILD is the directory containing convert_mnist_data.bin, a data-conversion tool that ships with Caffe. Replace these with the correct paths for your own setup.
With that, the data preparation is complete.

Defining the Network

Caffe already ships with a network definition for this task, the file lenet_train_test.prototxt. Its contents are:

name: "LeNet"
layer {
    name: "mnist"
    type: "Data"
    top: "data"
    top: "label"
    include {
        phase: TRAIN
    }
    transform_param {
        scale: 0.00390625
    }
    data_param {
        source: "examples/mnist/mnist_train_lmdb"
        batch_size: 64
        backend: LMDB
    }
}
layer {
    name: "mnist"
    type: "Data"
    top: "data"
    top: "label"
    include {
        phase: TEST
    }
    transform_param {
        scale: 0.00390625
    }
    data_param {
        source: "examples/mnist/mnist_test_lmdb"
        batch_size: 100
        backend: LMDB
    }
}
layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    param {
        lr_mult: 1
    }
    param {
        lr_mult: 2
    }
    convolution_param {
        num_output: 20
        kernel_size: 5
        stride: 1
        weight_filler {
            type: "xavier"
        }
        bias_filler {
            type: "constant"
        }
    }
}
layer {
    name: "pool1"
    type: "Pooling"
    bottom: "conv1"
    top: "pool1"
    pooling_param {
        pool: MAX
        kernel_size: 2
        stride: 2
    }
}
layer {
    name: "conv2"
    type: "Convolution"
    bottom: "pool1"
    top: "conv2"
    param {
        lr_mult: 1
    }
    param {
        lr_mult: 2
    }
    convolution_param {
        num_output: 50
        kernel_size: 5
        stride: 1
        weight_filler {
            type: "xavier"
        }
        bias_filler {
            type: "constant"
        }
    }
}
layer {
    name: "pool2"
    type: "Pooling"
    bottom: "conv2"
    top: "pool2"
    pooling_param {
        pool: MAX
        kernel_size: 2
        stride: 2
    }
}
layer {
    name: "ip1"
    type: "InnerProduct"
    bottom: "pool2"
    top: "ip1"
    param {
        lr_mult: 1
    }
    param {
        lr_mult: 2
    }
    inner_product_param {
        num_output: 500
        weight_filler {
            type: "xavier"
        }
        bias_filler {
            type: "constant"
        }
    }
}
layer {
    name: "relu1"
    type: "ReLU"
    bottom: "ip1"
    top: "ip1"
}
layer {
    name: "ip2"
    type: "InnerProduct"
    bottom: "ip1"
    top: "ip2"
    param {
        lr_mult: 1
    }
    param {
        lr_mult: 2
    }
    inner_product_param {
        num_output: 10
        weight_filler {
            type: "xavier"
        }
        bias_filler {
            type: "constant"
        }
    }
}
layer {
    name: "accuracy"
    type: "Accuracy"
    bottom: "ip2"
    bottom: "label"
    top: "accuracy"
    include {
        phase: TEST
    }
}
layer {
    name: "loss"
    type: "SoftmaxWithLoss"
    bottom: "ip2"
    bottom: "label"
    top: "loss"
}
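A few things are worth noting in the definition above: the transform_param scale of 0.00390625 is 1/256, mapping raw pixel bytes into [0, 1), and the layer parameters fully determine the intermediate blob shapes. A quick sketch of that arithmetic (plain Python, no Caffe needed; the formula below matches Caffe's convolution layer, and Caffe's pooling layer, which rounds up instead of down, gives the same numbers here because every division is exact):

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution with the given parameters."""
    return (size + 2 * pad - kernel) // stride + 1

h = 28                  # MNIST input: 1 channel, 28x28
h = out_size(h, 5)      # conv1 (num_output 20): 20 x 24 x 24
h = out_size(h, 2, 2)   # pool1: 20 x 12 x 12
h = out_size(h, 5)      # conv2 (num_output 50): 50 x 8 x 8
h = out_size(h, 2, 2)   # pool2: 50 x 4 x 4
print(50 * h * h)       # 800 inputs flattened into ip1 (500 outputs),
                        # then ip2 maps 500 -> 10 class scores
```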

Configuring the Solver

Caffe also ships with a solver file for this task, lenet_solver.prototxt. Its contents are:

# The train/test net protocol buffer definition
net: "examples/mnist/lenet_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# Snapshot intermediate results
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# Solver mode: CPU or GPU
solver_mode: GPU
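For reference, Caffe's "inv" policy decays the learning rate as base_lr * (1 + gamma * iter)^(-power). With the values above, a small sketch of how the rate falls over the configured 10,000 iterations:

```python
def inv_lr(it, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under Caffe's "inv" policy at iteration `it`."""
    return base_lr * (1.0 + gamma * it) ** (-power)

for it in (0, 500, 5000, 10000):
    print(it, round(inv_lr(it), 6))
# The rate starts at 0.01 and decays smoothly to about 0.0059 by iter 10000.
```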

Training

I ran the code on a server cluster with a GPU. From the Caffe root directory, launch the training script as follows:

$ srun -p K15G12 -J MNIST -c 4 --gres=gpu:1 sh examples/mnist/train_lenet.sh

The contents of train_lenet.sh are:

#!/usr/bin/env sh
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt

Finally, several new files appear under ./examples/mnist: per the snapshot and snapshot_prefix settings in the solver, these are the snapshots lenet_iter_5000.caffemodel and lenet_iter_10000.caffemodel with their matching .solverstate files.
The .caffemodel files are the trained models for handwritten digit recognition.

Note

Since Caffe was already set up on the server, I simply copied a Caffe tree from a colleague, and it kept failing at runtime. It turned out that ./build/tools/caffe was a symlink into that colleague's Caffe build, which I had no permission to use, so I needed to recompile and relink it myself.
From the Caffe root directory I first ran make clean to remove the existing build directory, then make to recompile and relink, after which everything ran normally.

Afterword

Having only just started with Linux and the Caffe framework, there is a great deal I do not yet understand and much more to learn. Running into problems along the way can be maddening, but it is precisely by working through them that one grows, step by step. A reminder to myself: do not shy away from problems and challenges; the road ahead is long.
