LRCN (2)


classify_video.py

classify_video.py will classify a video using
(1) singleFrame RGB model
(2) singleFrame flow model
(3) 0.5/0.5 singleFrame RGB/singleFrame flow fusion
(4) 0.33/0.67 singleFrame RGB/singleFrame flow fusion
(5) LRCN RGB model
(6) LRCN flow model
(7) 0.5/0.5 LRCN RGB/LRCN flow model
(8) 0.33/0.67 LRCN RGB/LRCN flow model
The script outputs eight predictions: the individual models and their fusions. So the question is: how are these models fused?

action_hash[compute_fusion(predictions_RGB_singleFrame, predictions_flow_singleFrame, 0.33)]

1. The compute_fusion function

The fusion function is very simple.
Inputs: the two prediction matrices and a weight p.
predictions_flow_singleFrame: a prediction matrix of shape 156x101; UCF101 has 101 classes, and 156 is the number of frames extracted from the video.
Output: the index of the fused class prediction.

def compute_fusion(RGB_pred, flow_pred, p):
  return np.argmax(p*np.mean(RGB_pred,0) + (1-p)*np.mean(flow_pred,0))
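As a usage sketch (toy data; the real matrices come from the classifier functions below), note that with p = 0.33 the flow stream gets the larger weight of 0.67, which matches fusions (4) and (8) in the list above:

import numpy as np

# Toy stand-ins for the real (frames x classes) prediction matrices.
RGB_pred = np.random.rand(156, 101)
flow_pred = np.random.rand(156, 101)

# Each stream is averaged over its frames (axis 0) into a 101-dim
# score vector; the vectors are blended with weight p, and argmax
# picks the fused class index.
fused_class = compute_fusion(RGB_pred, flow_pred, 0.33)
print(fused_class)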

2. The caffe.Net(a, b, c) function

Inputs: a deploy prototxt, a caffemodel, and a phase flag.
a: the .prototxt file, i.e. the network structure
b: the caffemodel file, i.e. the pretrained parameters learned as in "Long-term Recurrent Convolutional Networks for Visual Recognition and Description"
c: the phase; caffe.TEST (= 1) puts the net in test/inference mode, as opposed to caffe.TRAIN
Output: the assembled net

Models and weights

singleFrame_model = 'deploy_singleFrame.prototxt'
lstm_model = 'deploy_lstm.prototxt'
RGB_singleFrame = 'single_frame_all_layers_hyb_RGB_iter_5000.caffemodel'
flow_singleFrame = 'single_frame_all_layers_hyb_flow_iter_50000.caffemodel'
RGB_lstm = 'RGB_lstm_model_iter_30000.caffemodel'
flow_lstm = 'flow_lstm_model_iter_50000.caffemodel'

The two single-frame models share the same singleFrame_model prototxt, but different weights are passed to caffe.Net. Before the trained weights (caffemodel) are loaded, the network structure is only a model; once the learned parameters are added it becomes a net.
From this we can see that the job of caffe.Net is to combine the network structure with the network weights.

RGB_singleFrame_net = caffe.Net(singleFrame_model, RGB_singleFrame, caffe.TEST)
flow_singleFrame_net = caffe.Net(singleFrame_model, flow_singleFrame, caffe.TEST)
RGB_lstm_net = caffe.Net(lstm_model, RGB_lstm, caffe.TEST)
flow_lstm_net = caffe.Net(lstm_model, flow_lstm, caffe.TEST)

3. The singleFrame_classify_video function

Single-frame RGB and single-frame flow are both classified with this function; only the model passed in differs.
Inputs: frames, net, transformer, is_flow
net: the caffe.Net loaded above with the matching single-frame weights
transformer: the caffe.io.Transformer that preprocesses each frame for the net
Outputs:
np.mean(output_predictions,0).argmax(): an int64; the 156 per-frame predictions are averaged and the position of the maximum is returned (a toy numpy sketch of this computation follows)
output_predictions: the 156x101 prediction matrix
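As promised, a toy sketch of what the return value computes (shapes only; the real scores come from the net):

import numpy as np

# Pretend per-frame predictions: 156 frames x 101 classes.
output_predictions = np.random.rand(156, 101)

# Average the class scores over frames (axis 0) to get one 101-dim
# vector, then take the index of the largest entry.
class_idx = np.mean(output_predictions, 0).argmax()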
Function invocation

RGB_singleFrame_net = caffe.Net(singleFrame_model, RGB_singleFrame, caffe.TEST)
class_RGB_singleFrame, predictions_RGB_singleFrame = \
        singleFrame_classify_video(RGB_frames, RGB_singleFrame_net, transformer_RGB, False)
del RGB_singleFrame_net
flow_singleFrame_net = caffe.Net(singleFrame_model, flow_singleFrame, caffe.TEST)
class_flow_singleFrame, predictions_flow_singleFrame = \
        singleFrame_classify_video(flow_frames, flow_singleFrame_net, transformer_flow, True)
del flow_singleFrame_net

Function definition

def singleFrame_classify_video(frames, net, transformer, is_flow):
  batch_size = 16
  input_images = []
  # load the images and resize them
  for im in frames:
    input_im = caffe.io.load_image(im)
    if (input_im.shape[0] < 240):
      input_im = caffe.io.resize_image(input_im, (240,320))
    input_images.append(input_im)
  vid_length = len(input_images)

  output_predictions = np.zeros((len(input_images),101))
  for i in range(0,len(input_images), batch_size):
    # split the input images into batches
    clip_input = input_images[i:min(i+batch_size, len(input_images))]
    # caffe.io.oversample: 10 crops per image (4 corners + center, each with its mirror)
    clip_input = caffe.io.oversample(clip_input,[227,227])
    # e.g. if clip_input.shape[0] == 160, this builds a (160,1,1,1) all-ones array
    clip_clip_markers = np.ones((clip_input.shape[0],1,1,1))
    clip_clip_markers[0:10,:,:,:] = 0
    if is_flow:  # need to negate the x-flow channel for the mirrored crops
      clip_input[5:,:,:,0] = 1 - clip_input[5:,:,:,0]
    # rearrange from (N,H,W,C) to Caffe's (N,C,H,W)
    caffe_in = np.zeros(np.array(clip_input.shape)[[0,3,1,2]], dtype=np.float32)
    for ix, inputs in enumerate(clip_input):
      caffe_in[ix] = transformer.preprocess('data',inputs)
    net.blobs['data'].reshape(caffe_in.shape[0], caffe_in.shape[1], caffe_in.shape[2], caffe_in.shape[3])
    out = net.forward_all(data=caffe_in)
    # average the 10 crops of each frame back into one prediction per frame
    output_predictions[i:i+batch_size] = np.mean(out['probs'].reshape(10,caffe_in.shape[0]/10,101),0)
  return np.mean(output_predictions,0).argmax(), output_predictions
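The last line of the batch loop collapses the 10 oversampled crops back into one prediction per frame. A minimal numpy sketch of that reshape-and-mean step (toy data; note the original relies on Python 2 integer division for caffe_in.shape[0]/10):

import numpy as np

# Toy stand-in for out['probs']: 16 frames x 10 crops = 160 rows.
probs = np.random.rand(160, 101)

# Reshape to (10, 16, 101) and average over the first axis so each
# frame ends up with a single 101-dim score vector.
per_frame = np.mean(probs.reshape(10, 160 // 10, 101), 0)
print(per_frame.shape)  # (16, 101)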

4. The LRCN_classify_video function

LRCN RGB and LRCN flow are also both classified with this function; only the net passed in differs.

Inputs: frames, net, transformer, is_flow
net: the caffe.Net loaded above with the matching LSTM weights
transformer: the caffe.io.Transformer that preprocesses each frame for the net
Outputs:
np.mean(output_predictions,0).argmax(): an int64; the per-frame predictions are averaged and the position of the maximum is returned
output_predictions: the prediction matrix, one 101-dim row per sampled clip frame (clips of 16 frames are taken every 8 frames, so this can have more rows than the 156 input frames)
Function invocation

RGB_lstm_net = caffe.Net(lstm_model, RGB_lstm, caffe.TEST)
class_RGB_LRCN, predictions_RGB_LRCN = \
        LRCN_classify_video(RGB_frames, RGB_lstm_net, transformer_RGB, False)
del RGB_lstm_net
flow_lstm_net = caffe.Net(lstm_model, flow_lstm, caffe.TEST)
class_flow_LRCN, predictions_flow_LRCN = \
        LRCN_classify_video(flow_frames, flow_lstm_net, transformer_flow, True)
del flow_lstm_net

Function definition

# classify video with LRCN model
def LRCN_classify_video(frames, net, transformer, is_flow):
  clip_length = 16
  offset = 8
  input_images = []
  for im in frames:
    input_im = caffe.io.load_image(im)
    if (input_im.shape[0] < 240):
      input_im = caffe.io.resize_image(input_im, (240,320))
    input_images.append(input_im)
  vid_length = len(input_images)
  input_data = []
  # sample overlapping 16-frame clips every 8 frames
  for i in range(0,vid_length,offset):
    if (i + clip_length) < vid_length:
      input_data.extend(input_images[i:i+clip_length])
    else:  # video may not be divisible by clip_length
      input_data.extend(input_images[-clip_length:])
  output_predictions = np.zeros((len(input_data),101))
  for i in range(0,len(input_data),clip_length):
    clip_input = input_data[i:i+clip_length]
    clip_input = caffe.io.oversample(clip_input,[227,227])
    # clip markers reset the LSTM state at the start of each clip
    clip_clip_markers = np.ones((clip_input.shape[0],1,1,1))
    clip_clip_markers[0:10,:,:,:] = 0
#    if is_flow:  # need to negate the values when mirroring
#      clip_input[5:,:,:,0] = 1 - clip_input[5:,:,:,0]
    caffe_in = np.zeros(np.array(clip_input.shape)[[0,3,1,2]], dtype=np.float32)
    for ix, inputs in enumerate(clip_input):
      caffe_in[ix] = transformer.preprocess('data',inputs)
    out = net.forward_all(data=caffe_in, clip_markers=np.array(clip_clip_markers))
    output_predictions[i:i+clip_length] = np.mean(out['probs'],1)
  return np.mean(output_predictions,0).argmax(), output_predictions
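To see how the frames are grouped into overlapping clips, here is a minimal sketch of the windowing loop above, with frame indices standing in for loaded images and a toy 36-frame video:

clip_length = 16
offset = 8
vid_length = 36  # toy video length

clips = []
for i in range(0, vid_length, offset):
  if (i + clip_length) < vid_length:
    clips.append(list(range(i, i + clip_length)))
  else:  # tail windows reuse the last 16 frames
    clips.append(list(range(vid_length - clip_length, vid_length)))

# Clips start at frames 0, 8, 16, ...; every clip fed to the LSTM is
# a full 16 frames, with the tail clips repeating the final frames.
for c in clips:
  print(c[0], '...', c[-1])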

action hash

action_hash_rev.p is a file provided by the author; I'd guess it maps the predicted class indices to a uniform label representation.

# Load activity label hash
action_hash = pickle.load(open('action_hash_rev.p','rb'))
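A sketch of how the lookup is presumably used (the layout of action_hash_rev.p is my assumption: a dict from class index to label string):

# Hypothetical layout: {class_index: 'ActivityName'}.
idx = compute_fusion(predictions_RGB_singleFrame, predictions_flow_singleFrame, 0.33)
print(action_hash[idx])  # e.g. a UCF101 activity name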

Supplement

1. numpy

np.argmax:
Returns the indices of the maximum values along an axis.

>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argmax(a)
5    # the maximum is the 6th element of the flattened array, a.flat[5]
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])

np.mean: Compute the arithmetic mean along the specified axis. Passing 0 as the second argument averages over the rows, i.e. down each column of the matrix.

>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([ 2.,  3.])
>>> np.mean(a, axis=1)
array([ 1.5,  3.5])

np.array: Create an ndarray from an array-like object (e.g., a nested list).
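In the same example style as above:

>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
       [3, 4]])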

2. caffe

caffe.io.load_image(im)
im is the image filename; the image comes back as a float array scaled to [0, 1].
caffe.io.oversample
Crops each input image into the four corners and the center, plus the horizontal mirror of each, i.e. 10 crops per image (a small sketch follows).
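A small sketch of the crop counts (toy image; caffe must be importable):

import numpy as np
import caffe

# One toy 240x320 RGB image in [0, 1], as caffe.io.load_image returns.
img = np.random.rand(240, 320, 3).astype(np.float32)

# oversample yields 10 crops per input image:
# 4 corners + center, each plus its horizontal mirror.
crops = caffe.io.oversample([img], [227, 227])
print(crops.shape)  # (10, 227, 227, 3)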

When training, you run

caffe train -solver xxx.prototxt

and the net referenced by that solver is train_test_xxx.prototxt.

For inference, as here, a deploy_xxx.prototxt is used instead. Both define the network as a stack of layers, so what is the difference?

singleFrame_model = 'deploy_singleFrame.prototxt'
caffe.Net(singleFrame_model, flow_singleFrame, caffe.TEST)

*_train_test.prototxt: the network configuration used for training and testing
*_deploy.prototxt: the network definition used to build the model for inference
Concretely, the deploy file replaces the data layers with a plain input specification and drops the loss and accuracy layers; see the referenced article for details.

3. Others

Extract list of frames in video

RGB_frames = glob.glob('%s%s/*.jpg' %(RGB_video_path, video))
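One caveat: glob does not guarantee any particular order, so in practice the frame list should be sorted to keep temporal order (a small sketch; RGB_video_path and video are assumed defined as in the script):

import glob

# Sort so frames are processed in temporal order; glob's own
# ordering is filesystem-dependent.
RGB_frames = sorted(glob.glob('%s%s/*.jpg' % (RGB_video_path, video)))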