Using Caffe, and Installing and Running DIGITS on Windows

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 07:06

1. Basic components of a model

Training a Caffe model requires two configuration files: the network definition (*.prototxt) and the solver parameters (****_solver.prototxt).

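Anatomy of the Caffe model files:

Building a leveldb from preprocessed images

Input: a batch of images and their labels (items 2 and 3 below)
Output: a leveldb (item 4 below)
The command contains the following pieces:

  1. convert_imageset (the executable that builds the leveldb)
  2. train/ (the directory holding the JPEG or other-format images to process)
  3. label.txt (image file names and their labels)
  4. the name of the output leveldb folder
  5. CPU/GPU (whether the code runs on the CPU or the GPU)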
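
As a rough illustration of items 2 to 4 above, the sketch below writes label lines in the `filename label` format and assembles a `convert_imageset` argument list. The helper names, the sample file names and the `--backend=leveldb` choice are assumptions for illustration; check the flags against your own Caffe build before relying on them.

```python
def make_label_lines(samples):
    """Format (filename, label) pairs as 'filename label' lines, one per image."""
    return ["%s %d" % (name, label) for name, label in samples]


def build_convert_command(image_dir, list_file, db_name, backend="leveldb"):
    """Assemble the argument list: convert_imageset [flags] ROOT/ LISTFILE DB_NAME."""
    if not image_dir.endswith("/"):
        image_dir += "/"  # the root folder is joined to each file name as-is
    return ["convert_imageset", "--backend=" + backend, image_dir, list_file, db_name]


# Hypothetical sample data: two images with integer class labels.
lines = make_label_lines([("cat_001.jpg", 0), ("dog_001.jpg", 1)])
command = build_convert_command("train", "label.txt", "train_leveldb")

print("\n".join(lines))
print(" ".join(command))
```

The command is only built, not executed, so the sketch runs without Caffe installed.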

CNN network configuration files

  1. Imagenet_solver.prototxt (the global solver parameters)
  2. Imagenet.prototxt (the training network)
  3. Imagenet_val.prototxt (the test network)

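Network model:

DATA: the input layer, which usually comes in two variants, a training data layer and a test data layer. It contains source (the path to the data), batch_size (the number of samples per batch) and scale, which maps pixel values into [0, 1); 0.00390625 is 1/256.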
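
A quick check of the scale arithmetic, as a minimal sketch: 0.00390625 is exactly 1/256, so 8-bit pixel values end up in [0, 1). The function name and the sample pixel values are assumptions for illustration.

```python
SCALE = 0.00390625  # the value used in transform_param


def scale_pixels(pixels, scale=SCALE):
    """Multiply each raw 8-bit pixel value by the scale factor."""
    return [p * scale for p in pixels]


is_one_over_256 = (SCALE == 1.0 / 256)
scaled = scale_pixels([0, 128, 255])

print(is_one_over_256)
print(scaled)
```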

Training data layer:

    layer {
      name: "mnist"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TRAIN
      }
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "examples/mnist/mnist_train_lmdb"
        batch_size: 64
        backend: LMDB
      }
    }

Test data layer:

    layer {
      name: "mnist"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TEST
      }
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "examples/mnist/mnist_test_lmdb"
        batch_size: 100
        backend: LMDB
      }
    }

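CONVOLUTION: the convolution layer. The two lr_mult entries (blobs_lr in older prototxt files), 1 and 2, set the learning-rate multipliers for the weights and the bias. The weights use the learning rate defined in solver.prototxt and the bias uses twice that rate, which usually gives good convergence speed.

num_output is the number of filters, kernel_size is the filter size, stride is the step, and weight_filler selects how the filter weights are initialized.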

    layer {
      name: "conv1"
      type: "Convolution"
      bottom: "data"
      top: "conv1"
      param {
        lr_mult: 1  # learning-rate multiplier for the weights
      }
      param {
        lr_mult: 2  # learning-rate multiplier for the bias, usually twice that of the weights
      }
      convolution_param {
        num_output: 20  # number of filters
        kernel_size: 5
        stride: 1  # step
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
    }
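
To make the conv1 numbers concrete, here is a small sketch of the arithmetic involved. The 28x28 single-channel input is an assumption (the usual MNIST image size, not stated in the post), the base learning rate 0.01 is the base_lr from the solver file shown in this post, and the function names are made up for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


def conv_param_count(num_output, kernel_size, in_channels):
    """Weights plus one bias per filter."""
    return num_output * kernel_size * kernel_size * in_channels + num_output


def effective_lr(base_lr, lr_mult):
    """The rate a parameter blob actually uses: base_lr scaled by lr_mult."""
    return base_lr * lr_mult


out_size = conv_output_size(28, kernel_size=5, stride=1)  # assumed 28x28 input
params = conv_param_count(num_output=20, kernel_size=5, in_channels=1)
weight_lr = effective_lr(0.01, 1)
bias_lr = effective_lr(0.01, 2)

print(out_size, params, weight_lr, bias_lr)
```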

POOLING: the pooling layer

    layer {
      name: "pool1"
      type: "Pooling"
      bottom: "conv1"
      top: "pool1"
      pooling_param {
        pool: MAX
        kernel_size: 2
        stride: 2
      }
    }
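
What MAX pooling with kernel_size 2 and stride 2 does can be sketched in plain Python. This is an illustration on a tiny 4x4 grid, not Caffe's implementation; it assumes the input divides evenly into windows (as far as I know Caffe rounds up when a partial window is left over, which this sketch does not handle).

```python
def max_pool(grid, kernel=2, stride=2):
    """Max pooling over a 2D list; assumes the windows tile the grid exactly."""
    rows = (len(grid) - kernel) // stride + 1
    cols = (len(grid[0]) - kernel) // stride + 1
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect every value inside the current kernel x kernel window.
            window = [grid[r * stride + i][c * stride + j]
                      for i in range(kernel) for j in range(kernel)]
            row.append(max(window))
        pooled.append(row)
    return pooled


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
pooled = max_pool(grid)
print(pooled)
```

Each output value is the largest entry of one 2x2 window, so both spatial dimensions are halved.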

INNER_PRODUCT: this is simply a fully connected layer; don't let the name mislead you

    layer {
      name: "ip1"
      type: "InnerProduct"
      bottom: "pool2"
      top: "ip1"
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 2
      }
      inner_product_param {
        num_output: 500
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
    }
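
The name makes sense once you write the layer out: every output is the inner product of the input vector with one row of weights, plus a bias. A minimal sketch with made-up numbers (three inputs, two outputs, where the real ip1 has 500 outputs):

```python
def inner_product(x, weights, bias):
    """Fully connected layer: one dot product plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Hypothetical values chosen so the result is easy to check by hand.
x = [1.0, 2.0, 3.0]
weights = [[1.0, 0.0, 0.0],
           [0.5, 0.5, 0.5]]
bias = [0.0, 1.0]

y = inner_product(x, weights, bias)
print(y)
```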

RELU: the activation function, a nonlinear layer computing max(0, x). It usually appears paired with a CONVOLUTION layer.

    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }
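
The max(0, x) rule is short enough to write directly. The sketch below overwrites the list it is given, which loosely mirrors the layer above using the same blob ("ip1") as both bottom and top; the function name and sample values are assumptions.

```python
def relu_in_place(values):
    """Replace each entry with max(0, entry), modifying the list itself."""
    for i, v in enumerate(values):
        values[i] = max(0.0, v)
    return values


activations = [-2.0, -0.5, 0.0, 1.5]
relu_in_place(activations)
print(activations)
```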

SOFTMAX:

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip2"
      bottom: "label"
      top: "loss"
    }
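
Conceptually, a softmax-with-loss layer turns the scores from ip2 into probabilities and reports the negative log of the probability given to the correct label. A minimal sketch for a single sample follows; the scores are made up, and a real layer averages the loss over the batch.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, label):
    """Negative log-likelihood of the true label under the softmax."""
    return -math.log(softmax(scores)[label])


probs = softmax([2.0, 1.0, 0.1])
uniform_loss = softmax_loss([0.0, 0.0], label=0)  # two equally likely classes

print(probs)
print(uniform_loss)
```

With two equally scored classes the loss is ln 2, a handy sanity check for an untrained binary classifier.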

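Solver configuration file:

***_solver.prototxt defines the parameters needed during training, such as the learning rate, the weight decay coefficient, the number of iterations, and whether to use the GPU or the CPU.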

    # The train/test net protocol buffer definition
    net: "examples/mnist/lenet_train_test.prototxt"
    # test_iter specifies how many forward passes the test should carry out.
    # In the case of MNIST, we have test batch size 100 and 100 test iterations,
    # covering the full 10,000 testing images.
    test_iter: 100
    # Carry out testing every 500 training iterations.
    test_interval: 500
    # The base learning rate, momentum and the weight decay of the network.
    base_lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    # The learning rate policy
    lr_policy: "inv"
    gamma: 0.0001
    power: 0.75
    # Display every 100 iterations
    display: 100
    # The maximum number of iterations
    max_iter: 10000
    # snapshot intermediate results
    snapshot: 5000
    snapshot_prefix: "examples/mnist/lenet"
    # solver mode: CPU or GPU
    solver_mode: GPU
    device_id: 0  # with the cmdcaffe interface GPU indices start at 0; with a single GPU use device_id: 0
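
Two pieces of arithmetic are hidden in these solver settings. The "inv" policy, as described in Caffe's solver documentation, computes base_lr * (1 + gamma * iter) ^ (-power), and test_iter times the test batch size gives the number of test images covered. A small sketch using the values from the file above (the function name is an assumption):

```python
def inv_learning_rate(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under the "inv" policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


lr_start = inv_learning_rate(0)
lr_end = inv_learning_rate(10000)  # max_iter in the solver file

test_images_covered = 100 * 100  # test_iter * test batch size

print(lr_start, lr_end, test_images_covered)
```

The rate decays smoothly from base_lr and never reaches zero, which is why no step size needs to be configured for this policy.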

The trained model is saved as *.caffemodel for later use.
A complete network looks like this:
[Figure: diagram of the complete network]


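Steps

  • Prepare the data
    Prepare three sets of data:
    1. Training Set: used to train the network
    2. Validation Set: used to check the network's accuracy during training
    3. Test Set: used to measure the final accuracy once training is complete
  • Build the lmdb/leveldb files. Caffe accepts three input data formats: images, leveldb, lmdb

Although lmdb uses about 1.1 times the memory of leveldb, it is 10% to 15% faster, and more importantly it lets several training models read the same dataset at the same time.
lmdb has therefore replaced leveldb as Caffe's default dataset format.

  • Define the name.prototxt and name_solver.prototxt files
  • Train the model

    Training on Windows is a real hassle: you need a way to run .sh files on Windows.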
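
The three-way split from the data preparation step can be sketched in a few lines. The fractions, the seed and the function name are assumptions for illustration, not values from this post.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut into training, validation and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val_set = shuffled[:n_val]
    test_set = shuffled[n_val:n_val + n_test]
    train_set = shuffled[n_val + n_test:]
    return train_set, val_set, test_set


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting matters when the source files are sorted by class; otherwise a set could end up with only a few of the labels.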

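Running .sh files on Windows

Install cygwin.
Packages can be installed from within its setup program. If a package turns out to be missing, reopen setup and download it; this usually works, and any other problem can be dealt with as it comes up.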

Testing with a .bat file

  1. Download the MNIST dataset from the official site http://yann.lecun.com/exdb/mnist/ and extract it to C:\caffe-master\data\mnist
  2. In the Caffe root directory, create create_mnist.bat containing the script below. This step may fail because the extracted file may be named train-images-idx3-ubyte rather than train-images.idx3-ubyte; adjust the names in the script to match.

    .\Build\x64\Release\convert_mnist_data.exe .\data\mnist\mnist_train_lmdb\train-images.idx3-ubyte .\data\mnist\mnist_train_lmdb\train-labels.idx1-ubyte .\examples\mnist\mnist_train_lmdb
    echo.
    .\Build\x64\Release\convert_mnist_data.exe .\data\mnist\mnist_test_lmdb\t10k-images.idx3-ubyte .\data\mnist\mnist_test_lmdb\t10k-labels.idx1-ubyte .\examples\mnist\mnist_test_lmdb
    pause

    Then double-click the script to run it; the lmdb data files are generated under E:\caffe\examples\mnist.

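  3. In the Caffe root directory, create train_mnist.bat containing the following script:

    .\Build\x64\Release\caffe.exe train --solver=.\examples\mnist\lenet_solver.prototxt
    pause

Then double-click it to start training. When training finishes you get the resulting accuracy and loss.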
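
For the file-name mismatch mentioned in step 2, a small check like the following can tell you which spelling you actually have before you edit the .bat script. The helper name is hypothetical, and the demo creates an empty stand-in file in a temporary directory rather than touching real MNIST data.

```python
import os
import tempfile


def resolve_mnist_file(directory, expected_name):
    """Return the path of expected_name or of its '-idx' spelling, else None."""
    candidates = [expected_name, expected_name.replace(".idx", "-idx")]
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


# Demo: only the hyphenated spelling exists in this temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "train-images-idx3-ubyte"), "wb").close()
    found = resolve_mnist_file(demo_dir, "train-images.idx3-ubyte")
    found_name = os.path.basename(found)
    missing = resolve_mnist_file(demo_dir, "t10k-images.idx3-ubyte")

print(found_name)
print(missing)
```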

Next, install DIGITS:

Just follow the instructions here:

https://github.com/NVIDIA/DIGITS/blob/digits-5.0/docs/BuildDigitsWindows.md

Finally, run python -m digits from the DIGITS directory.
Quite a few bugs showed up along the way.

bug1

pycaffe cannot be found.
This usually means Python has not picked up the caffe package. Copying the CAFFE_ROOT\Build\x64\Release\pycaffe\caffe folder into Anaconda's site-packages directory fixes it.
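
Before and after copying the folder, it can help to ask Python directly whether it can see the package. The sketch below uses importlib.util.find_spec, which assumes Python 3; on Python 2 the equivalent check is imp.find_module, as used in the DIGITS caffe.py file. The helper name is made up.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None


stdlib_found = is_importable("json")
bogus_found = is_importable("surely_not_a_real_module_xyz")
caffe_found = is_importable("caffe")  # True only once pycaffe is on the path

print(stdlib_found, bogus_found, caffe_found)
```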

bug2

The error pkg_resources._vendor.packaging.version.InvalidVersion: Invalid version: 'CAFFE_VERSION' appears.
Open caffe.py under \DIGITS-master\digits\config
and modify it as indicated by the annotated comments in the listing below.

    from __future__ import absolute_import

    import imp
    import os
    import platform
    import re
    import subprocess
    import sys

    from . import option_list
    from digits import device_query
    from digits.utils import parse_version


    def load_from_envvar(envvar):
        """
        Load information from an installation indicated by an environment variable
        """
        value = os.environ[envvar].strip().strip("\"' ")

        # NOTE: the paths here need changing so that they match CAFFE_HOME
        if platform.system() == 'Windows':
            # executable_dir = os.path.join(value, 'install', 'bin')
            executable_dir = os.path.join(value)
            # python_dir = os.path.join(value, 'install', 'python')
            python_dir = os.path.join(value, 'pycaffe')
        else:
            executable_dir = os.path.join(value, 'build', 'tools')
            python_dir = os.path.join(value, 'python')

        try:
            executable = find_executable_in_dir(executable_dir)
            if executable is None:
                raise ValueError('Caffe executable not found at "%s"'
                                 % executable_dir)
            if not is_pycaffe_in_dir(python_dir):
                raise ValueError('Pycaffe not found in "%s"'
                                 % python_dir)
            import_pycaffe(python_dir)
            version, flavor = get_version_and_flavor(executable)
        except:
            print ('"%s" from %s does not point to a valid installation of Caffe.'
                   % (value, envvar))
            print 'Use the envvar CAFFE_ROOT to indicate a valid installation.'
            raise
        return executable, version, flavor


    def load_from_path():
        """
        Load information from an installation on standard paths (PATH and PYTHONPATH)
        """
        try:
            executable = find_executable_in_dir()
            if executable is None:
                raise ValueError('Caffe executable not found in PATH')
            if not is_pycaffe_in_dir():
                raise ValueError('Pycaffe not found in PYTHONPATH')
            import_pycaffe()
            version, flavor = get_version_and_flavor(executable)
        except:
            print 'A valid Caffe installation was not found on your system.'
            print 'Use the envvar CAFFE_ROOT to indicate a valid installation.'
            raise
        return executable, version, flavor


    def find_executable_in_dir(dirname=None):
        """
        Returns the path to the caffe executable at dirname
        If dirname is None, search all directories in sys.path
        Returns None if not found
        """
        if platform.system() == 'Windows':
            exe_name = 'caffe.exe'
        else:
            exe_name = 'caffe'

        if dirname is None:
            dirnames = [path.strip("\"' ") for path in os.environ['PATH'].split(os.pathsep)]
        else:
            dirnames = [dirname]

        for dirname in dirnames:
            path = os.path.join(dirname, exe_name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                return path
        return None


    def is_pycaffe_in_dir(dirname=None):
        """
        Returns True if you can "import caffe" from dirname
        If dirname is None, search all directories in sys.path
        """
        old_path = sys.path
        if dirname is not None:
            sys.path = [dirname]  # temporarily replace sys.path
        try:
            imp.find_module('caffe')
        except ImportError:
            return False
        finally:
            sys.path = old_path
        return True


    def import_pycaffe(dirname=None):
        """
        Imports caffe
        If dirname is not None, prepend it to sys.path first
        """
        if dirname is not None:
            sys.path.insert(0, dirname)
            # Add to PYTHONPATH so that build/tools/caffe is aware of python layers there
            os.environ['PYTHONPATH'] = '%s%s%s' % (
                dirname, os.pathsep, os.environ.get('PYTHONPATH'))

        # Suppress GLOG output for python bindings
        GLOG_minloglevel = os.environ.pop('GLOG_minloglevel', None)
        # Show only "ERROR" and "FATAL"
        os.environ['GLOG_minloglevel'] = '2'

        # for Windows environment, loading h5py before caffe solves the issue mentioned in
        # https://github.com/NVIDIA/DIGITS/issues/47#issuecomment-206292824
        import h5py  # noqa
        try:
            import caffe
        except ImportError:
            print 'Did you forget to "make pycaffe"?'
            raise

        # Strange issue with protocol buffers and pickle - see issue #32
        sys.path.insert(0, os.path.join(
            os.path.dirname(caffe.__file__), 'proto'))

        # Turn GLOG output back on for subprocess calls
        if GLOG_minloglevel is None:
            del os.environ['GLOG_minloglevel']
        else:
            os.environ['GLOG_minloglevel'] = GLOG_minloglevel


    def get_version_and_flavor(executable):
        """
        Returns (version, flavor)
        Should be called after import_pycaffe()
        """
        version_string = get_version_from_pycaffe()
        if version_string is None:
            version_string = get_version_from_cmdline(executable)
        if version_string is None:
            version_string = get_version_from_soname(executable)

        if version_string is None:
            raise ValueError('Could not find version information for Caffe build ' +
                             'at "%s". Upgrade your installation' % executable)

        # NOTE: this part is not needed here but triggers the bug, so I commented it out
        # version = parse_version(version_string)
        # if parse_version(0, 99, 0) > version > parse_version(0, 9, 0):
        #     flavor = 'NVIDIA'
        #     minimum_version = '0.11.0'
        #     if version < parse_version(minimum_version):
        #         raise ValueError(
        #             'Required version "%s" is greater than "%s". Upgrade your installation.'
        #             % (minimum_version, version_string))
        # else:
        #     flavor = 'BVLC'
        flavor = 'BVLC'

        return version_string, flavor


    def get_version_from_pycaffe():
        try:
            from caffe import __version__ as version
            return version
        except ImportError:
            return None


    def get_version_from_cmdline(executable):
        command = [executable, '-version']
        p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if p.wait():
            print p.stderr.read().strip()
            raise RuntimeError('"%s" returned error code %s' % (command, p.returncode))

        pattern = 'version'
        for line in p.stdout:
            if pattern in line:
                return line[line.find(pattern) + len(pattern) + 1:].strip()
        return None


    def get_version_from_soname(executable):
        command = ['ldd', executable]
        p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if p.wait():
            print p.stderr.read().strip()
            raise RuntimeError('"%s" returned error code %s' % (command, p.returncode))

        # Search output for caffe library
        libname = 'libcaffe'
        caffe_line = None
        for line in p.stdout:
            if libname in line:
                caffe_line = line
                break

        if caffe_line is None:
            raise ValueError('libcaffe not found in linked libraries for "%s"'
                             % executable)

        # Read the symlink for libcaffe from ldd output
        symlink = caffe_line.split()[2]
        filename = os.path.basename(os.path.realpath(symlink))

        # parse the version string
        match = re.match(r'%s(.*)\.so\.(\S+)$' % (libname), filename)
        if match:
            return match.group(2)
        else:
            return None


    # NOTE: look here, this is a path issue.
    # Declare CAFFE_ROOT or CAFFE_HOME (either works) as an environment variable
    # pointing to the compiled Caffe output directory ./Build/x64/Release
    if 'CAFFE_ROOT' in os.environ:
        executable, version, flavor = load_from_envvar('CAFFE_ROOT')
    elif 'CAFFE_HOME' in os.environ:
        executable, version, flavor = load_from_envvar('CAFFE_HOME')
    else:
        executable, version, flavor = load_from_path()

    option_list['caffe'] = {
        'executable': executable,
        'version': version,
        'flavor': flavor,
        'multi_gpu': (flavor == 'BVLC' or parse_version(version) >= parse_version(0, 12)),
        'cuda_enabled': (len(device_query.get_devices()) > 0),
    }
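
Commenting out the version check works, but it hard-codes the flavor. A gentler alternative, shown here only as a standalone sketch and not as DIGITS code, is to tolerate an unparseable version string such as 'CAFFE_VERSION' and fall back to 'BVLC' only in that case. The two helper functions are hypothetical, and the 0.9 to 0.99 range is taken from the commented-out check in the listing.

```python
import re


def parse_numeric_version(version_string):
    """Return a tuple of ints for strings like '0.15.14' or '1.0.0-rc3', else None."""
    match = re.match(r"^(\d+(?:\.\d+)*)", version_string)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def guess_flavor(version_string):
    """'NVIDIA' for versions strictly between 0.9.0 and 0.99.0, otherwise 'BVLC'."""
    version = parse_numeric_version(version_string)
    if version is not None and (0, 9, 0) < version < (0, 99, 0):
        return "NVIDIA"
    return "BVLC"


placeholder_flavor = guess_flavor("CAFFE_VERSION")  # the string from the error message
nvidia_flavor = guess_flavor("0.15.14")
bvlc_flavor = guess_flavor("1.0.0-rc3")

print(placeholder_flavor, nvidia_flavor, bvlc_flavor)
```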

Run it again:
[Figure: output of the second run]

Training:
see the official tutorial

[Figure: training]

Perfect~