基于caffe的多标签端到端车牌识别
来源:互联网 发布:淘宝店店铺转让 编辑:程序博客网 时间:2024/06/06 20:46
最近利用深度学习框架caffe做了一个端到端车牌识别的模型,所用的网络为AlexNet稍加了一些修改,由于数据集太少、网络结构不完善,效果并没有很好,但是一些基本的思路摸清楚了。
一、数据处理
- 所用的数据为EasyPR提供的数据集,共有1541张车牌图片,按照8:1:1的比例分为train、val和test三个数据集。(数据集太少,后期还会用GAN和真实车牌生成数据)
- caffe自带的lmdb数据库没有存储多标签数据的功能,所以用hdf5对数据进行存储。车牌共有7个字符,第一个字符为汉字,第二个字符为字母,后面五个字符为数字和字母,所以将标签分为三类:31个汉字标签,24个字母标签,以及34个字母数字标签。
- opencv读取的图片为H*W*C的形式,根据blob对图片reshape为C*H*W维度。
# 读取labelmaplabeldict_Chi = dict()labelmap_Chi = open("labelmap_Chinese.txt", "r")lines = labelmap_Chi.readlines()for line in lines: line = line.strip("\n") temp = line.split(" ") labeldict_Chi[temp[1]] = temp[0]labelmap_Chi.close()labeldict_Letter = dict()labelmap_Letter = open("labelmap_Letter.txt", "r")lines = labelmap_Letter.readlines()for line in lines: line = line.strip("\n") temp = line.split(" ") labeldict_Letter[temp[1]] = temp[0]labelmap_Letter.close()labeldict_NL = dict()labelmap_NL = open("labelmap_Num&Letter.txt", "r")lines = labelmap_NL.readlines()for line in lines: line = line.strip("\n") temp = line.split(" ") labeldict_NL[temp[1]] = temp[0]labelmap_NL.close()for i in range(2): imgs_file = os.listdir(IMG_FILE[i]) for img_file in imgs_file: filename = os.path.join(IMG_FILE[i], img_file) image = cv2.imread(filename) image = image.reshape(3, 36, 136) label = [] char_Chi = img_file[:3] label.append(int(labeldict_Chi[char_Chi])) char_Spe = img_file[3:4] label.append(int(labeldict_Letter[char_Spe])) for char in img_file[4:-4]: label.append(int(labeldict_NL[char])) IMAGE[i].append(image) LABEL[i].append(np.array(label)) IMAGE[i] = np.array(IMAGE[i], dtype=int) LABEL[i] = np.array(LABEL[i], dtype=int) meanValue = mean(IMAGE[0]) print "Creating hdf5..." with h5py.File(HDF5_FILE[i], "w") as h5: data = IMAGE[i].astype(np.float32) - meanValue label = LABEL[i].astype(np.float32) h5["data"] = data h5["label"] = label h5.close() print "Creating txt..." with open(TXT_FILE[i], "w") as txt: txt.write(os.path.join(os.getcwd(), HDF5_FILE[i])) txt.close()np.save("mean_e2e.npy", meanValue)
将图片存入h5文件的时候,需要加上图片的均值,由于caffe计算均值的文件是针对lmdb格式的,所以均值就自己写了一个:
# 计算均值def mean(images): images = images.astype(np.float32) meanValue = sum(images) / len(images) return meanValue
最后要将h5文件读入到一个txt文件里,这个txt文件存储的就是h5文件的路径。
二、生成网络结构
- 网络结构使用的是AlexNet结构,对网络结构进行了一些修改:
- 数据层用的是hdf5数据;
- 多标签分类,添加了slice层;
- 最后的输出根据标签数量的不同,输出的特征数有变化。
name: "Plate_E2E"layer { name: "data" type: "HDF5Data" top: "data" top: "label" include { phase: TRAIN } hdf5_data_param { source: "examples/plate_e2e/train_e2e.txt" batch_size: 10 }}layer { name: "data" type: "HDF5Data" top: "data" top: "label" include { phase: TEST } hdf5_data_param { source: "examples/plate_e2e/val_e2e.txt" batch_size: 10 }}layer { name: "slicers" type: "Slice" bottom: "label" top: "label_1" top: "label_2" top: "label_3" top: "label_4" top: "label_5" top: "label_6" top: "label_7" slice_param { axis: 1 slice_point: 1 slice_point: 2 slice_point: 3 slice_point: 4 slice_point: 5 slice_point: 6 }}layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1"}layer { name: "norm1" type: "LRN" bottom: "conv1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }}layer { name: "pool1" type: "Pooling" bottom: "norm1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 }}layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2"}layer { name: "norm2" type: "LRN" bottom: "conv2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }}layer { name: "pool2" type: "Pooling" bottom: "norm2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 }}layer { name: "conv3" type: "Convolution" bottom: "pool2" top: "conv3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3"}layer { name: "conv4" type: "Convolution" bottom: "conv3" top: "conv4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4"}layer { name: "conv5" type: "Convolution" bottom: "conv4" top: "conv5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5"}layer { name: "fc6" type: "InnerProduct" bottom: "conv5" top: "fc6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" } bias_filler { type: "constant" value: 0 } }}layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6"}layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc7_1" type: "InnerProduct" bottom: "fc6" top: "fc7_1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu7_1" type: "ReLU" bottom: "fc7_1" top: "fc7_1"}layer { name: "drop7_1" type: "Dropout" bottom: "fc7_1" top: "fc7_1" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc7_2" type: "InnerProduct" bottom: "fc6" top: "fc7_2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu7_2" type: "ReLU" bottom: "fc7_2" top: "fc7_2"}layer { name: "drop7_2" type: "Dropout" bottom: "fc7_2" top: "fc7_2" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc7_3" type: "InnerProduct" bottom: "fc6" top: "fc7_3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu7_3" type: "ReLU" bottom: "fc7_3" top: "fc7_3"}layer { name: "drop7_3" type: "Dropout" bottom: "fc7_3" top: "fc7_3" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc7_4" type: "InnerProduct" bottom: "fc6" top: "fc7_4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu7_4" type: "ReLU" bottom: "fc7_4" top: "fc7_4"}layer { name: "drop7_4" type: "Dropout" bottom: "fc7_4" top: "fc7_4" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc7_5" type: "InnerProduct" bottom: "fc6" top: "fc7_5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu7_5" type: "ReLU" bottom: "fc7_5" top: "fc7_5"}layer { name: "drop7_5" type: "Dropout" bottom: "fc7_5" top: "fc7_5" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc7_6" type: "InnerProduct" bottom: "fc6" top: "fc7_6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu7_6" type: "ReLU" bottom: "fc7_6" top: "fc7_6"}layer { name: "drop7_6" type: "Dropout" bottom: "fc7_6" top: "fc7_6" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc7_7" type: "InnerProduct" bottom: "fc6" top: "fc7_7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }}layer { name: "relu7_7" type: "ReLU" bottom: "fc7_7" top: "fc7_7"}layer { name: "drop7_7" type: "Dropout" bottom: "fc7_7" top: "fc7_7" dropout_param { dropout_ratio: 0.5 }}layer { name: "fc8_1" type: "InnerProduct" bottom: "fc7_1" top: "fc8_1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 31 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "fc8_2" type: "InnerProduct" bottom: "fc7_2" top: "fc8_2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 24 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "fc8_3" type: "InnerProduct" bottom: "fc7_3" top: "fc8_3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "fc8_4" type: "InnerProduct" bottom: "fc7_4" top: "fc8_4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "fc8_5" type: "InnerProduct" bottom: "fc7_5" top: "fc8_5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "fc8_6" type: "InnerProduct" bottom: "fc7_6" top: "fc8_6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "fc8_7" type: "InnerProduct" bottom: "fc7_7" top: "fc8_7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 34 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }}layer { name: "accuracy_1" type: "Accuracy" bottom: "fc8_1" bottom: "label_1" top: "accuracy_1" include { phase: TEST }}layer { name: "accuracy_2" type: "Accuracy" bottom: "fc8_2" bottom: "label_2" top: "accuracy_2" include { phase: TEST }}layer { name: "accuracy_3" type: "Accuracy" bottom: "fc8_3" bottom: "label_3" top: "accuracy_3" include { phase: TEST }}layer { name: "accuracy_4" type: "Accuracy" bottom: "fc8_4" bottom: "label_4" top: "accuracy_4" include { phase: TEST }}layer { name: "accuracy_5" type: "Accuracy" bottom: "fc8_5" bottom: "label_5" top: "accuracy_5" include { phase: TEST }}layer { name: "accuracy_6" type: "Accuracy" bottom: "fc8_6" bottom: "label_6" top: "accuracy_6" include { phase: TEST }}layer { name: "accuracy_7" type: "Accuracy" bottom: "fc8_7" bottom: "label_7" top: "accuracy_7" include { phase: TEST }}layer { name: "loss_1" type: "SoftmaxWithLoss" bottom: "fc8_1" bottom: "label_1" top: "loss_1" loss_weight: 0.2}layer { name: "loss_2" type: "SoftmaxWithLoss" bottom: "fc8_2" bottom: "label_2" top: "loss_2" loss_weight: 0.2}layer { name: "loss_3" type: "SoftmaxWithLoss" bottom: "fc8_3" bottom: "label_3" top: "loss_3" loss_weight: 0.2}layer { name: "loss_4" type: "SoftmaxWithLoss" bottom: "fc8_4" bottom: "label_4" top: "loss_4" loss_weight: 0.2}layer { name: "loss_5" type: "SoftmaxWithLoss" bottom: "fc8_5" bottom: "label_5" top: "loss_5" loss_weight: 0.2}layer { name: "loss_6" type: "SoftmaxWithLoss" bottom: "fc8_6" bottom: "label_6" top: "loss_6" loss_weight: 0.2}layer { name: "loss_7" type: "SoftmaxWithLoss" bottom: "fc8_7" bottom: "label_7" top: "loss_7" loss_weight: 0.2}
- 有两个问题需要总结下:
- AlexNet在conv5之后又一个pool5层,但是到conv5之后,图片的边界已经不足以再做池化,所以去掉了pool5层;
- 第一次训练的时候忘记修改最后fc层输出的num_output参数,导致特征分类出现问题。七个fc层的num_output分别是31、24、34、34、34、34、34,分别对应各自标签的类别。
solver修改为如下所示,再修改相应的deploy网络结构,就可以开始训练了。
net: "examples/plate_e2e/train_val_e2e.prototxt"test_iter: 130test_interval: 100base_lr: 0.001lr_policy: "step"gamma: 0.1stepsize: 3000display: 10max_iter: 15000momentum: 0.9weight_decay: 0.0005snapshot: 3000snapshot_prefix: "examples/plate_e2e/plate_e2e"solver_mode: GPU
三、训练
#!/usr/bin/env shTOOLS=./build/tools$TOOLS/caffe train --solver=examples/plate_e2e/solver_e2e.prototxt -gpu all
四、测试
def Test(img): # 加载模型 net = caffe.Net(deploy, caffe_model, caffe.TEST) # 注意可以调节预处理批次的大小 # 由于是处理一张图片,所以把原来的10张的批次改为1 net.blobs['data'].reshape(1, 3, 36, 136) # 加载图片到数据层 im = cv2.imread(img) images = np.array([im.reshape(3, 36, 136)], dtype=np.float32) meanValue = mean() net.blobs['data'].data[...] = images - meanValue # 前向计算 out = net.forward() # 预测分类 result = [] result.append(out['prob_1'].argmax()) result.append(out['prob_2'].argmax()) result.append(out['prob_3'].argmax()) result.append(out['prob_4'].argmax()) result.append(out['prob_5'].argmax()) result.append(out['prob_6'].argmax()) result.append(out['prob_7'].argmax()) return result
最后的测试结果并不好,还需要对网络结构进行修改和构建,数据集也需要重新扩充,后期会进行更新。
阅读全文
0 0
- 基于caffe的多标签端到端车牌识别
- 基于OpenCV的车牌识别—车牌定位
- 基于Opencv的汽车车牌识别
- 基于SVM和神经网络的车牌识别
- 基于matlab的蓝色车牌识别
- 基于SVM和神经网络的车牌识别
- 基于Opencv的汽车车牌识别
- 基于EasyPR的车牌识别android实现
- 基于Opencv的汽车车牌识别
- 基于opencv的车牌识别(三)车牌ROI提取,字符分割及识别
- 一种基于BP神经网络的车牌字符识别方法
- 我的毕设 图像处理 基于ARM车牌识别
- 基于opencv的车牌识别解析与代码
- 基于Halcon的车牌识别实例及验证
- 基于SVM与人工神经网络的车牌识别系统
- 基于SVM和神经网络的车牌识别:项目启动
- 基于easyPR和openalpr的车牌识别研究
- 【opencv机器学习】基于SVM和神经网络的车牌识别
- linux redis安装
- [吴恩达 DL] Class1 Week3 浅层神经网络+代码实现
- padding和margin区别以及常用法
- 如何去掉 Liferay 页面下方的 Powered by Lifray
- 前端涉及的知识体系
- 基于caffe的多标签端到端车牌识别
- CS 400 Path Union 思维+贪心
- 常用的工具(一 显示小红点)
- 基于深度学习的视频检测(六) 行人计数,监控视频人员管理
- lab5 shellLabNote
- 97. Interleaving String
- php时间
- 沈询所有资源的索引(会不断更新)
- HTML+JS获取系统实时时间并显示