LRCN (2)
Source: Internet  Editor: 程序博客网  Time: 2024/05/29 13:31
classify_video.py
classify_video.py will classify a video using
(1) singleFrame RGB model
(2) singleFrame flow model
(3) 0.5/0.5 singleFrame RGB/singleFrame flow fusion
(4) 0.33/0.67 singleFrame RGB/singleFrame flow fusion
(5) LRCN RGB model
(6) LRCN flow model
(7) 0.5/0.5 LRCN RGB/LRCN flow model
(8) 0.33/0.67 LRCN RGB/LRCN flow model
The script outputs eight predictions, one per model or fusion of models. So how are these models fused?
action_hash[compute_fusion(predictions_RGB_singleFrame, predictions_flow_singleFrame, 0.33)]
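1. Function compute_fusion
The fusion function is very simple.
Function input: two prediction matrices and a weight
predictions_flow_singleFrame: a prediction matrix of shape 156 x 101. UCF101 has 101 classes; the 156 is because this video has 156 frames.
Output: the fused result (the index of the winning class)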
def compute_fusion(RGB_pred, flow_pred, p):
    return np.argmax(p*np.mean(RGB_pred,0) + (1-p)*np.mean(flow_pred,0))
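As a small worked example of the fusion rule above, the sketch below feeds compute_fusion toy scores instead of real network outputs. The 156 x 101 shape follows the example in the text; the random data, the seed, and the choice of classes 7 and 42 are assumptions made purely for illustration.

```python
import numpy as np

def compute_fusion(RGB_pred, flow_pred, p):
    # Average each stream over frames, blend with weight p, return the best class index
    return np.argmax(p * np.mean(RGB_pred, 0) + (1 - p) * np.mean(flow_pred, 0))

# Toy stand-ins for real predictions: 156 frames x 101 classes (assumed data)
rng = np.random.RandomState(0)
rgb_pred = rng.rand(156, 101)
flow_pred = rng.rand(156, 101)
rgb_pred[:, 7] += 1.0    # the RGB stream strongly favors class 7
flow_pred[:, 42] += 1.0  # the flow stream strongly favors class 42

fused_rgb_only = compute_fusion(rgb_pred, flow_pred, 1.0)   # RGB decides alone
fused_flow_only = compute_fusion(rgb_pred, flow_pred, 0.0)  # flow decides alone
fused_033 = compute_fusion(rgb_pred, flow_pred, 0.33)       # flow carries weight 0.67
```

With p = 0.33 the flow stream carries the larger weight, so when the two streams disagree this strongly the fused prediction follows the flow stream.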
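2. Function caffe.Net(a,b,c)
Function input: deploy prototxt, caffemodel
Output: a net that combines the network structure with the trained weights
a: the .prototxt file, i.e. the network structure
b: the caffemodel file, i.e. the pretrained learnable parameters (from 《Long-term Recurrent Convolutional Networks for Visual Recognition and Description》)
c: the phase; caffe.TEST (= 1) runs the network in test mode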
Models and weights
singleFrame_model = 'deploy_singleFrame.prototxt'
lstm_model = 'deploy_lstm.prototxt'
RGB_singleFrame = 'single_frame_all_layers_hyb_RGB_iter_5000.caffemodel'
flow_singleFrame = 'single_frame_all_layers_hyb_flow_iter_50000.caffemodel'
RGB_lstm = 'RGB_lstm_model_iter_30000.caffemodel'
flow_lstm = 'flow_lstm_model_iter_50000.caffemodel'
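The two single-frame models share one singleFrame_model, but the network weights passed to caffe.Net differ. Before the trained weights (the caffemodel) are added, the network structure can only be called a model; only once the learnable parameters are added is it a net.
So the role of caffe.Net is to combine the network structure with the network weights.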
RGB_singleFrame_net = caffe.Net(singleFrame_model, RGB_singleFrame, caffe.TEST)
flow_singleFrame_net = caffe.Net(singleFrame_model, flow_singleFrame, caffe.TEST)
RGB_lstm_net = caffe.Net(lstm_model, RGB_lstm, caffe.TEST)
flow_lstm_net = caffe.Net(lstm_model, flow_lstm, caffe.TEST)
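3. Function singleFrame_classify_video
Single-frame RGB and single-frame optical flow are both classified with this function; only the net passed in differs.
Function input: frames, net, transformer, is_flow
net: the caffe.Net to run (RGB or flow weights)
transformer: the preprocessor whose preprocess('data', ...) is applied to every crop
Function output:
np.mean(output_predictions,0).argmax(): an int64; the predictions are averaged over the 156 frames and the index of the maximum is returned
output_predictions: the prediction matrix, 156 x 101
Function call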
RGB_singleFrame_net = caffe.Net(singleFrame_model, RGB_singleFrame, caffe.TEST)
class_RGB_singleFrame, predictions_RGB_singleFrame = \
    singleFrame_classify_video(RGB_frames, RGB_singleFrame_net, transformer_RGB, False)
del RGB_singleFrame_net

flow_singleFrame_net = caffe.Net(singleFrame_model, flow_singleFrame, caffe.TEST)
class_flow_singleFrame, predictions_flow_singleFrame = \
    singleFrame_classify_video(flow_frames, flow_singleFrame_net, transformer_flow, True)
del flow_singleFrame_net
Function definition
import numpy as np
import caffe

def singleFrame_classify_video(frames, net, transformer, is_flow):
    batch_size = 16
    input_images = []
    # load the images, resizing any that are too small
    for im in frames:
        input_im = caffe.io.load_image(im)
        if (input_im.shape[0] < 240):
            input_im = caffe.io.resize_image(input_im, (240,320))
        input_images.append(input_im)
    vid_length = len(input_images)
    output_predictions = np.zeros((len(input_images),101))
    for i in range(0,len(input_images), batch_size):
        # split the input images into batches
        clip_input = input_images[i:min(i+batch_size, len(input_images))]
        # caffe.io.oversample: 10 crops per image (4 corners + center, and their mirrors)
        clip_input = caffe.io.oversample(clip_input,[227,227])
        # e.g. if clip_input.shape[0] = 160, this builds a 4-D all-ones array of shape (160,1,1,1)
        clip_clip_markers = np.ones((clip_input.shape[0],1,1,1))
        clip_clip_markers[0:10,:,:,:] = 0
        if is_flow:  # need to negate the values when mirroring
            clip_input[5:,:,:,0] = 1 - clip_input[5:,:,:,0]  # mirroring?
        caffe_in = np.zeros(np.array(clip_input.shape)[[0,3,1,2]], dtype=np.float32)
        for ix, inputs in enumerate(clip_input):
            caffe_in[ix] = transformer.preprocess('data',inputs)
        net.blobs['data'].reshape(caffe_in.shape[0], caffe_in.shape[1], caffe_in.shape[2], caffe_in.shape[3])
        out = net.forward_all(data=caffe_in)
        # integer division (//) so the reshape also works under Python 3
        output_predictions[i:i+batch_size] = np.mean(out['probs'].reshape(10,caffe_in.shape[0]//10,101),0)
    return np.mean(output_predictions,0).argmax(), output_predictions
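To make the batching in the loop above concrete, here is a minimal sketch that reproduces only the index arithmetic, with no Caffe involved. The helper name batch_slices is hypothetical; the 156-frame count is the example used in this post, and the factor of 10 is the number of crops caffe.io.oversample produces per image.

```python
def batch_slices(n_frames, batch_size=16):
    # Same slicing as the loop in singleFrame_classify_video:
    # (start, end) pairs, where the last batch may be shorter
    return [(i, min(i + batch_size, n_frames))
            for i in range(0, n_frames, batch_size)]

slices = batch_slices(156)

# Each frame becomes 10 crops after oversampling, so this is the
# number of images sent through the net for each batch
n_crops = [10 * (end - start) for start, end in slices]
```

For 156 frames this gives ten batches: nine full ones of 16 frames (160 crops each) and a final one of 12 frames (120 crops).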
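4. Function LRCN_classify_video
LRCN RGB and LRCN flow are also both classified with this function; only the net passed in differs.
Function input: frames, net, transformer, is_flow
net: the caffe.Net to run (RGB or flow LSTM weights)
transformer: the preprocessor whose preprocess('data', ...) is applied to every crop
Function output:
np.mean(output_predictions,0).argmax(): an int64; the predictions are averaged over all rows and the index of the maximum is returned
output_predictions: the prediction matrix, len(input_data) x 101; because the clips overlap, it has more rows than the video has frames
Function call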
RGB_lstm_net = caffe.Net(lstm_model, RGB_lstm, caffe.TEST)
class_RGB_LRCN, predictions_RGB_LRCN = \
    LRCN_classify_video(RGB_frames, RGB_lstm_net, transformer_RGB, False)
del RGB_lstm_net

flow_lstm_net = caffe.Net(lstm_model, flow_lstm, caffe.TEST)
class_flow_LRCN, predictions_flow_LRCN = \
    LRCN_classify_video(flow_frames, flow_lstm_net, transformer_flow, True)
del flow_lstm_net
Note:
Function definition
import numpy as np
import caffe

#classify video with LRCN model
def LRCN_classify_video(frames, net, transformer, is_flow):
    clip_length = 16
    offset = 8
    input_images = []
    for im in frames:
        input_im = caffe.io.load_image(im)
        if (input_im.shape[0] < 240):
            input_im = caffe.io.resize_image(input_im, (240,320))
        input_images.append(input_im)
    vid_length = len(input_images)
    input_data = []
    for i in range(0,vid_length,offset):
        if (i + clip_length) < vid_length:
            input_data.extend(input_images[i:i+clip_length])
        else:  #video may not be divisible by clip_length
            input_data.extend(input_images[-clip_length:])
    output_predictions = np.zeros((len(input_data),101))
    for i in range(0,len(input_data),clip_length):
        clip_input = input_data[i:i+clip_length]
        clip_input = caffe.io.oversample(clip_input,[227,227])
        clip_clip_markers = np.ones((clip_input.shape[0],1,1,1))
        clip_clip_markers[0:10,:,:,:] = 0
#        if is_flow:  #need to negate the values when mirroring
#            clip_input[5:,:,:,0] = 1 - clip_input[5:,:,:,0]
        caffe_in = np.zeros(np.array(clip_input.shape)[[0,3,1,2]], dtype=np.float32)
        for ix, inputs in enumerate(clip_input):
            caffe_in[ix] = transformer.preprocess('data',inputs)
        out = net.forward_all(data=caffe_in, clip_markers=np.array(clip_clip_markers))
        output_predictions[i:i+clip_length] = np.mean(out['probs'],1)
    return np.mean(output_predictions,0).argmax(), output_predictions
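The way the function above cuts a video into overlapping 16-frame clips can be sketched without Caffe by tracking frame indices only. The helper name clip_windows is hypothetical, and the sketch assumes the video has at least clip_length frames, so that input_images[-clip_length:] is exactly the last 16 frames.

```python
def clip_windows(vid_length, clip_length=16, offset=8):
    # Mirror the windowing loop in LRCN_classify_video, returning
    # (start, end) frame indices instead of the images themselves
    windows = []
    for i in range(0, vid_length, offset):
        if (i + clip_length) < vid_length:
            windows.append((i, i + clip_length))
        else:
            # tail case: fall back to the last clip_length frames
            windows.append((vid_length - clip_length, vid_length))
    return windows

windows = clip_windows(156)

# Total number of frames placed in input_data, counting overlaps
total_frames = sum(end - start for start, end in windows)
```

For the 156-frame example this yields 20 clips: 18 regular ones starting every 8 frames, plus two identical tail clips covering frames 140 to 155. input_data therefore holds 320 frames, which is the number of rows in output_predictions.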
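action hash
action_hash_rev.p is a file provided by the author. Judging from how it is used (action_hash[...] is indexed with the fused class index) and from the comment in the code, it maps a predicted class index to its activity label.
import pickle

#Load activity label hash
with open('action_hash_rev.p','rb') as f:
    action_hash = pickle.load(f)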
Supplement
1. numpy
np.argmax:
Returns the indices of the maximum values along an axis.
>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argmax(a)
5
With no axis given, the array is flattened first: the maximum is its 6th value, a.flat[5].
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])
np.mean: Compute the arithmetic mean along the specified axis. A second argument of 0 averages down each column, i.e. over the rows of the matrix.
>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([ 2., 3.])
>>> np.mean(a, axis=1)
array([ 1.5, 3.5])
np.array
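2. caffe
caffe.io.load_image(im)
im is the image file name
caffe.io.oversample
Training is run with caffe train -solver xxx.prototxt
The net referenced by this solver is train_test_xxx.prototxt
When the network is used for classification, however, deployxxx.prototxt is loaded instead. Both files describe the network as a stack of layers, so what is the difference?
singleFrame_model = 'deploy_singleFrame.prototxt'
caffe.Net(singleFrame_model, flow_singleFrame, caffe.TEST)
*_train_test.prototxt: the network configuration file for training and testing
*_deploy.prototxt: the model construction file
What exactly is the difference?
See this article
3. others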
Extract list of frames in video
RGB_frames = glob.glob('%s%s/*.jpg' %(RGB_video_path, video))