Action recognition: studying the Fisher Vector encoding code for improved dense trajectories (iDT) features


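The paper "Action Recognition with Improved Trajectories" improves on Dense Trajectories in two ways: it estimates camera motion so that the extracted optical flow is more accurate, and it encodes the features with Fisher Vectors.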

The iDT source code is publicly available for download.

The code that applies FV encoding to iDT features is available in the dtfv repository.

For a walkthrough of the iDT code, see this blog post: http://blog.csdn.net/wzmsltw/article/details/53023363

Fisher Vector implementation for Dense Trajectories (DTFV)

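DTFV is a C++ implementation of Fisher Vector encoding for iDT features. The source is written in C++; after compiling, place the resulting executables in the script folder. The demo is a Python script that takes the path of a video as input and outputs its Fisher Vector encoded features.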

First, a recap of the Fisher Vector encoding process. The authors write in the paper:

Unlike bag of features, Fisher vector encodes both first and second order statistics between the video descriptors and a Gaussian Mixture Model (GMM). In recent evaluations, this shows an improved performance over bag of features for both image and action classification. Differently from the bag-of-features encoding, we first reduce the descriptor dimensionality by a factor of two using Principal Component Analysis (PCA). We set the number of Gaussians to K = 256 and randomly sample a subset of 256,000 features from the training set to estimate the GMM. Each video is, then, represented by a 2DK dimensional Fisher vector for each descriptor type, where D is the descriptor dimension after performing PCA. Finally, we apply power and L2 normalization to the Fisher vector. To combine different descriptor types, we concatenate their normalized Fisher vectors. A linear SVM is used for classification.

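In summary, the Fisher Vector encoding process is:

1. Each iDT sample originally has 426 dimensions. Each descriptor is first reduced to half its dimensionality with PCA. The descriptors are the trajectory descriptor Traj, plus HOF, HOG and MBH.

2. A GMM with K = 256 Gaussians is estimated from 256,000 randomly sampled features, which gives the codebook.

3. Each video is then represented by a 2DK-dimensional vector per descriptor type. D is that descriptor's dimension after PCA (213 in total across all descriptors), and K is the number of Gaussian components in the GMM, not a K-means cluster count. Concatenating all descriptor types gives 2 × 213 × 256 = 109,056 dimensions. Every video therefore ends up with a vector of the same length regardless of how many trajectories it contains, which makes the subsequent classification straightforward.

4. Power normalization and L2 normalization are applied to the Fisher Vector.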
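The encoding and normalization steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard improved Fisher vector for a diagonal-covariance GMM (gradients with respect to the means and standard deviations), not necessarily how DTFV implements it in C++. The function names and the toy GMM parameters are made up for the example, and the descriptors are assumed to be PCA-reduced already.

```python
import math

def fisher_vector(descriptors, weights, means, sigmas):
    """Fisher vector of T descriptors under a diagonal-covariance GMM.

    descriptors: T vectors of length D (assumed already PCA-reduced)
    weights:     K mixture weights that sum to 1
    means:       K mean vectors of length D
    sigmas:      K vectors of per-dimension standard deviations
    Returns a list of length 2 * D * K.
    """
    T, K, D = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * D for _ in range(K)]
    grad_sigma = [[0.0] * D for _ in range(K)]

    for x in descriptors:
        # log(w_k * N(x | mu_k, diag(sigma_k^2))) for every component k.
        log_p = []
        for k in range(K):
            s = math.log(weights[k])
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                s -= 0.5 * z * z + math.log(sigmas[k][d]) + 0.5 * math.log(2 * math.pi)
            log_p.append(s)
        # Posterior (soft assignment), shifted by the max for numerical stability.
        m = max(log_p)
        unnorm = [math.exp(v - m) for v in log_p]
        total = sum(unnorm)
        for k in range(K):
            gamma = unnorm[k] / total
            for d in range(D):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                grad_mu[k][d] += gamma * z                 # first-order statistic
                grad_sigma[k][d] += gamma * (z * z - 1.0)  # second-order statistic

    fv = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(K):
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k])
    return fv

def power_l2_normalize(fv, alpha=0.5):
    """Signed power normalization followed by L2 normalization."""
    powered = [math.copysign(abs(v) ** alpha, v) for v in fv]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered

# Toy setting: D = 2, K = 2, three descriptors (values are illustrative only).
toy_descriptors = [[0.1, -0.2], [1.9, 2.1], [0.0, 0.3]]
toy_weights = [0.5, 0.5]
toy_means = [[0.0, 0.0], [2.0, 2.0]]
toy_sigmas = [[1.0, 1.0], [1.0, 1.0]]

fv = power_l2_normalize(
    fisher_vector(toy_descriptors, toy_weights, toy_means, toy_sigmas)
)
print(len(fv))  # 2 * D * K entries
```

With the settings from the paper, each descriptor type would be encoded this way with K = 256 and its own D, and the normalized vectors would then be concatenated.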

Code analysis:

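After downloading DTFV from GitHub you will find three folders: src, which holds the program's source code; data, which holds the PCA projection matrices used for FV encoding and the GMM codebooks the author trained on two datasets; and script, which holds the Python script used for the author's demo.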

The rest of this post shows how to run Fisher Vector encoding on your own dataset, using MCAD (Multi-camera Action Dataset, URL: http://mmas.comp.nus.edu.sg/MCAD/MCAD.html) as the example.

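The first step is to randomly sample features. Note that the original select_pts.cpp reads from standard input (cin), uses a random-number test to sample 10,000 points, and then keeps every 100th of those as the final sample. Running select_pts.cpp unchanged therefore yields only 100 samples, which is not enough to train the GMM in the next step. To get enough samples, either increase the number of sampled points or skip the every-100th subsampling. This step produces the Traj, HOF, HOG and MBH (MBHx and MBHy) features of the random sample, saved for example as Sample.traj, Sample.hof, Sample.hog, Sample.mbhx and Sample.mbhy.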
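If you would rather do the sampling outside select_pts.cpp, one option is reservoir sampling, which draws a fixed number of lines uniformly from a stream without knowing its length in advance. The sketch below is a substitute technique, not the logic of select_pts.cpp; the function name is made up, and it assumes one feature vector per text line.

```python
import io
import random

def reservoir_sample(lines, n, seed=None):
    """Return up to n lines drawn uniformly at random from an iterable of lines."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < n:
            # Fill the reservoir with the first n lines.
            sample.append(line)
        else:
            # Keep the new line with probability n / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < n:
                sample[j] = line
    return sample

# Stand-in for a feature stream: 1,000 fake one-number "feature" lines.
stream = io.StringIO("".join(f"{i}\n" for i in range(1000)))
picked = reservoir_sample(stream, 50, seed=0)
print(len(picked))
```

Because the function accepts any iterable of lines, the same call works with sys.stdin or with a file opened for reading.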

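Next, use the sampled features Sample.traj, Sample.hof, Sample.hog, Sample.mbhx and Sample.mbhy as input to train PCA. This produces the projection matrices mcad.traj.mat, mcad.hof.mat, mcad.hog.mat, mcad.mbhx.mat and mcad.mbhy.mat, together with the dimension-reduced sample features Sample.pca.traj, Sample.pca.hog, Sample.pca.hof, Sample.pca.mbhx and Sample.pca.mbhy. The arguments of train_pca are inputFile outputMat projDimension [projectedFile]: the path of the input features, the output path of the PCA projection matrix, the target dimension, and the file that receives the input features after projection. The fourth argument is optional, but GMM training needs the reduced features, so they should be saved in this step.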
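The five train_pca calls can be generated rather than typed by hand. The sketch below only builds the argument lists, in the order inputFile outputMat projDimension projectedFile described above. The ./train_pca path is an assumption about where the compiled binary lives, and the per-descriptor sizes are the ones that add up to 426 under default iDT settings; check both against your own build and feature files.

```python
# Assumed descriptor sizes for default iDT settings (30 + 96 + 108 + 96 + 96 = 426).
IDT_DIMS = {"traj": 30, "hog": 96, "hof": 108, "mbhx": 96, "mbhy": 96}

def train_pca_commands(binary="./train_pca", prefix="mcad"):
    """Return one argv list per descriptor, halving each descriptor's dimension."""
    commands = []
    for name, dim in IDT_DIMS.items():
        commands.append([
            binary,
            f"Sample.{name}",        # inputFile: sampled raw features
            f"{prefix}.{name}.mat",  # outputMat: PCA projection matrix
            str(dim // 2),           # projDimension: half the original size
            f"Sample.pca.{name}",    # projectedFile: reduced features for GMM training
        ])
    return commands

pca_cmds = train_pca_commands()
for argv in pca_cmds:
    print(" ".join(argv))
```

The halved sizes (15, 48, 54, 48, 48) sum to 213, which matches the total reduced dimension quoted earlier.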

Then train the codebook with train_gmm. Its arguments are inputData outputCodeBook numClusters: inputData is the reduced features (Sample.pca.traj, Sample.pca.hog, Sample.pca.hof, Sample.pca.mbhx, Sample.pca.mbhy), outputCodeBook is the trained codebook (saved here as mcad.traj.gmm, mcad.hof.gmm, mcad.hog.gmm, mcad.mbhx.gmm and mcad.mbhy.gmm), and numClusters is the number of Gaussian components.
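The train_gmm calls follow the same pattern, one per descriptor. As a hedged sketch: the ./train_gmm path is an assumed location for the compiled binary, the file names are the ones chosen in this post, and 256 is the number of Gaussians used in the paper.

```python
DESCRIPTORS = ["traj", "hog", "hof", "mbhx", "mbhy"]

def train_gmm_commands(num_clusters=256, binary="./train_gmm", prefix="mcad"):
    """Return one argv list per descriptor: inputData outputCodeBook numClusters."""
    return [
        [binary, f"Sample.pca.{name}", f"{prefix}.{name}.gmm", str(num_clusters)]
        for name in DESCRIPTORS
    ]

gmm_cmds = train_gmm_commands()
for argv in gmm_cmds:
    print(" ".join(argv))
```

Once the binary is built, each list can be passed to subprocess.run(argv, check=True) to run the training.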

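At this point the PCA projection matrices and the codebooks have been replaced with ones trained on MCAD. Next, edit the .lst file so that it points to the parameters you generated; the iDT features of the MCAD dataset can then be Fisher Vector encoded.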

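(Some of the original author's inputs are read line by line from standard input via cin. For convenience, these can be changed to use fstream and read from a file instead.)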
