Once the network architecture is designed, the log printed during training typically contains, for each reporting interval, the iteration number, the training loss, the test loss, the test accuracy, and so on. This post walks through a short example of how to plot a training curve from such a log.

First, look at a snippet of a training log; skipping the network-parameter dump, we jump straight to the training iterations:
```
I0627 21:30:06.004370 15558 solver.cpp:242] Iteration 0, loss = 21.6953
I0627 21:30:06.004420 15558 solver.cpp:258]     Train net output #0: loss = 21.6953 (* 1 = 21.6953 loss)
I0627 21:30:06.004426 15558 solver.cpp:571] Iteration 0, lr = 0.01
I0627 21:30:28.592690 15558 solver.cpp:242] Iteration 100, loss = 13.6593
I0627 21:30:28.592730 15558 solver.cpp:258]     Train net output #0: loss = 13.6593 (* 1 = 13.6593 loss)
I0627 21:30:28.592733 15558 solver.cpp:571] Iteration 100, lr = 0.01
...
I0627 21:37:47.926597 15558 solver.cpp:346] Iteration 2000, Testing net (#0)
I0627 21:37:48.588079 15558 blocking_queue.cpp:50] Data layer prefetch queue empty
I0627 21:40:40.575474 15558 solver.cpp:414]     Test net output #0: loss = 13.07728 (* 1 = 13.07728 loss)
I0627 21:40:40.575477 15558 solver.cpp:414]     Test net output #1: loss/top-1 = 0.00226
I0627 21:40:40.575487 15558 solver.cpp:414]     Test net output #2: loss/top-5 = 0.01204
I0627 21:40:40.708261 15558 solver.cpp:242] Iteration 2000, loss = 13.1739
I0627 21:40:40.708302 15558 solver.cpp:258]     Train net output #0: loss = 13.1739 (* 1 = 13.1739 loss)
I0627 21:40:40.708307 15558 solver.cpp:571] Iteration 2000, lr = 0.01
...
I0628 01:28:47.426129 15558 solver.cpp:242] Iteration 49900, loss = 0.960628
I0628 01:28:47.426177 15558 solver.cpp:258]     Train net output #0: loss = 0.960628 (* 1 = 0.960628 loss)
I0628 01:28:47.426182 15558 solver.cpp:571] Iteration 49900, lr = 0.01
I0628 01:29:10.084050 15558 solver.cpp:449] Snapshotting to binary proto file train_net/net_iter_50000.caffemodel
I0628 01:29:10.563587 15558 solver.cpp:734] Snapshotting solver state to binary proto file train_net/net_iter_50000.solverstate
I0628 01:29:10.692239 15558 solver.cpp:346] Iteration 50000, Testing net (#0)
I0628 01:29:13.192075 15558 blocking_queue.cpp:50] Data layer prefetch queue empty
I0628 01:31:00.595120 15558 solver.cpp:414]     Test net output #0: loss = 0.6404232 (* 1 = 0.6404232 loss)
I0628 01:31:00.595124 15558 solver.cpp:414]     Test net output #1: loss/top-1 = 0.953861
I0628 01:31:00.595127 15558 solver.cpp:414]     Test net output #2: loss/top-5 = 0.998659
I0628 01:31:00.727577 15558 solver.cpp:242] Iteration 50000, loss = 0.680951
I0628 01:31:00.727618 15558 solver.cpp:258]     Train net output #0: loss = 0.680951 (* 1 = 0.680951 loss)
I0628 01:31:00.727623 15558 solver.cpp:571] Iteration 50000, lr = 0.0096
```
This is an ordinary training log with a single loss. From it we can read off some of the solver.prototxt parameters:

```
test_interval: 2000
base_lr: 0.01
lr_policy: "step"    # or "multistep"
gamma: 0.96
display: 100
stepsize: 50000      # if "multistep", the first stepvalue is set to 50000
snapshot_prefix: "train_net/net"
```
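As a sanity check on the inferred parameters: under Caffe's "step" policy the learning rate is base_lr * gamma^floor(iter/stepsize). A minimal sketch (assuming exactly the solver values above; `step_lr` is just an illustrative helper) reproduces the lr values seen in the log:

```python
import math

def step_lr(iteration, base_lr=0.01, gamma=0.96, stepsize=50000):
    """Caffe "step" policy: lr = base_lr * gamma^floor(iter / stepsize)."""
    return base_lr * gamma ** math.floor(iteration / stepsize)

print(step_lr(0))      # 0.01, matches "Iteration 0, lr = 0.01"
print(step_lr(49900))  # still 0.01, matches "Iteration 49900, lr = 0.01"
print(step_lr(50000))  # ~0.0096, matches "Iteration 50000, lr = 0.0096"
```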
Of course, even if you ignore the analysis above, the code below is unaffected: plotting a training curve is essentially a file-parsing task. From the log above we can observe that:

- every line containing the fields `] Iteration ` and `loss = ` carries a training iteration number and the corresponding training loss;
- every line containing the fields `] Iteration` and `Testing net (#0)` carries the training iteration number at which a test pass runs;
- every line containing the fields `#2:` and `loss/top-5` carries the test top-5 accuracy.
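These extraction rules can be tried on a single log line first. The regex `r'ion \b\d+\b,'` grabs the `ion <number>,` tail of `Iteration <number>,`, and slicing off the first four characters leaves the number itself (the sample line is copied from the log above):

```python
import re

# one training line copied from the log above
line = "I0627 21:30:28.592690 15558 solver.cpp:242] Iteration 100, loss = 13.6593"

# 'ion \b\d+\b,' matches the "ion 100," piece of "Iteration 100,"
arr = re.findall(r'ion \b\d+\b,', line)
iteration = int(arr[0].strip(',')[4:])        # "ion 100," -> "ion 100" -> 100
loss = float(line.strip().split(' = ')[-1])   # text after the last " = "
print(iteration, loss)  # 100 13.6593
```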
Based on these observations, the log file can be processed as follows:
```python
import os
import sys
import math
import re
import numpy as np
import matplotlib.pyplot as plt
import pylab
from pylab import figure, show, legend
from mpl_toolkits.axes_grid1 import host_subplot

# parse the training log
fp = open('log.txt', 'r')

train_iterations = []
train_loss = []
test_iterations = []
test_accuracy = []

for ln in fp:
    # lines with training iteration number and training loss
    if '] Iteration ' in ln and 'loss = ' in ln:
        arr = re.findall(r'ion \b\d+\b,', ln)
        train_iterations.append(int(arr[0].strip(',')[4:]))
        train_loss.append(float(ln.strip().split(' = ')[-1]))
    # lines marking the iteration at which a test pass runs
    if '] Iteration' in ln and 'Testing net (#0)' in ln:
        arr = re.findall(r'ion \b\d+\b,', ln)
        test_iterations.append(int(arr[0].strip(',')[4:]))
    # lines with the test top-5 accuracy
    if '#2:' in ln and 'loss/top-5' in ln:
        test_accuracy.append(float(ln.strip().split(' = ')[-1]))

fp.close()

# one x axis (iterations), two y axes: loss on the left, accuracy on the right
host = host_subplot(111)
plt.subplots_adjust(right=0.8)  # leave room for the second y-axis label
par1 = host.twinx()

host.set_xlabel("iterations")
host.set_ylabel("log loss")
par1.set_ylabel("validation accuracy")

# plot both curves
p1, = host.plot(train_iterations, train_loss, label="training log loss")
p2, = par1.plot(test_iterations, test_accuracy, label="validation accuracy")

host.legend(loc=5)  # legend at center right

# color each axis label like its curve
host.axis["left"].label.set_color(p1.get_color())
par1.axis["right"].label.set_color(p2.get_color())

# axis ranges
host.set_xlim([-1500, 160000])
par1.set_ylim([0., 1.05])

plt.draw()
plt.show()
```
The sample code carries brief comments. If your training log differs from the one shown in this post, slightly adjusting a few of the match strings and axis settings is enough to plot the training curve.
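To make such adjustments easier, the parsing loop can be factored so the match strings are parameters. This is a hypothetical `parse_log` helper, not part of the original script; the default key strings are the ones used above, and you would swap them for whatever fields your own log prints:

```python
import re

def parse_log(path, train_keys=('] Iteration ', 'loss = '),
              acc_keys=('#2:', 'loss/top-5')):
    """Collect (iteration, loss) pairs and accuracy values from a Caffe-style
    log; change the key strings to match a differently formatted log."""
    iters, losses, accs = [], [], []
    with open(path) as fp:
        for ln in fp:
            if all(k in ln for k in train_keys):
                # "Iteration 100," -> "ion 100," -> 100
                iters.append(int(re.findall(r'ion \b\d+\b,', ln)[0].strip(',')[4:]))
                losses.append(float(ln.strip().split(' = ')[-1]))
            elif all(k in ln for k in acc_keys):
                accs.append(float(ln.strip().split(' = ')[-1]))
    return iters, losses, accs
```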
Finally, the resulting training-curve plot: