Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors: Source Code Walkthrough


Over the past few days I read Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors (CVPR 2015) and ran the source code provided by the authors (linked as "Fisher Vector Code").


The paper mainly uses four datasets: Pascal1K, COCO, Flickr30K, and Flickr8K.

The main task is bidirectional image-text retrieval (I2T and T2I).

Image features: extracted with VGG-19.

Text features: word2vec embeddings, encoded as Fisher vectors using the GMM+HGLMM method described in the paper; this yields the input on the text side.

CCA is then used to map the image and text features into a common space.

The paper's evaluation metrics are r@1, r@5 and r@10; I use MAP instead to judge the quality of the I2T and T2I retrieval results.


Below is a detailed, step-by-step walkthrough of the code.

Package version: 1.6
---------------


Disclaimer
----------
You are free to use our HGLMM/LMM code for any purpose.
This package contains external software packages, as detailed in the References section below.
Using each of these packages is according to the package's terms of use.


References
----------
Our paper:
Ben Klein, Guy Lev, Gil Sadeh, Lior Wolf.
Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors.
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015
http://www.cs.tau.ac.il/~wolf/papers/Klein_Associating_Neural_Word_2015_CVPR_paper.pdf

We use the word2vec word embedding. word2vec homepage:
https://code.google.com/p/word2vec/

This package contains VLFeat version 0.9.18. It was downloaded from:
http://www.vlfeat.org/

This package contains FastICA version 2.5. It was downloaded from:
http://research.ics.aalto.fi/ica/fastica/

We use the VGG convolutional network (AKA Oxfordnet) for image feature extraction:
http://www.robots.ox.ac.uk/~vgg/research/very_deep/

We use a modified version of the CCA implementation originally published by Magnus Borga. The original code:
http://www.mathworks.com/matlabcentral/fileexchange/47496-l1mccaforssvep-demo-zip/content/L1MCCAforSSVEP_Demo/cca.m
The modified version of this code is in the file cca/cca_alg.m in this package. Usage of this file is for non-commercial purposes only, as determined by Magnus Borga.


Data
----
Following are links for downloading the datasets which we used for evaluating our models.

Pascal1K:
http://vision.cs.uiuc.edu/pascal-sentences/

Flickr8K:
http://nlp.cs.illinois.edu/HockenmaierGroup/8k-pictures.html
https://illinois.edu/fb/sec/1713398
(It is recommended to use the second link rather than the first one. In the first one some image links are broken).

Flickr30K:
http://shannon.cs.illinois.edu/DenotationGraph/

COCO:
http://mscoco.org/


Overview
--------
This package contains matlab code for reproducing the HGLMM (or GMM or LMM) Fisher vector (FV) sentence representation, as described in our paper.
The word embedding that we used was word2vec (the negative-sampling version). We got the best results after applying the ICA transformation on the word2vec vectors (staying in the same dimension, 300).
The FV which obtained the best results was based on HGLMM with 30 clusters.
This package also contains the code which we used for:
- VGG feature extraction, for image representation.
- Canonical Correlation Analysis (CCA), for mapping the sentence vectors and image vectors to a common vector space.
This package also contains explanation and example for using our pre-trained CCA model, trained on the COCO dataset. This is the model which is referred to as GMM+HGLMM in our paper.
The following sections contain instructions for each required step.
In the following, if a file/folder is mentioned without a full path, it means this file/folder is in the fv folder.


Compiling HGLMM/LMM mex files
-------------------------------
The code of the EM algorithm and of the FV computation is written in C++.
This package contains code for both Linux and Windows.
Linux code is in the folders:
- HGLMM_linux
- LMM_linux
Windows code is in the folders:
- HGLMM_win
- LMM_win
These folders contain pre-compiled matlab mex files.
If you would like to compile mex files on your machine, refer to the README.txt files within those folders.


Required code changes
---------------------
- In the file data_dir_base.m: change the function so that it returns the path of the folder where you want to store the data files (which will be created in the following steps); a minimal sketch is shown below.
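
A minimal sketch of what data_dir_base.m might look like (the folder path is an example, and the return-variable name is an assumption; keep whatever the original function signature uses):

function dir_base = data_dir_base()
% Returns the folder where this package stores its data files.
% The path below is an example; replace it with your own data folder.
dir_base = 'E:\wz\hglmm_fv\data';
end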


Initial step
------------
Go to the fv folder.
Run matlab.
Call:
fv_init;


Word2vec
--------
Download the pre-trained word2vec word embedding (the GoogleNews vectors; direct download link):
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing
Unzip the bin file into your data folder.
In order to convert the bin file to matlab file, call (in matlab):
word2vec_binfile_to_matlab;
This script also normalizes the vectors before saving them to file.
In order to create a sample (of 300,000 vectors, out of 3,000,000) of the word2vec vectors, call:
word2vec_sample;
You might want to use this sample in the next steps.
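
As a rough illustration of the normalization that word2vec_binfile_to_matlab performs, L2-normalizing each word vector might look like the sketch below (the matrix name and orientation are assumptions; the actual script may differ):

% Sketch: L2-normalize each word vector (one vector per column).
% 'vectors' is assumed to be d x n (d = 300, n = vocabulary size).
norms = sqrt(sum(vectors .^ 2, 1));          % 1 x n column norms
norms(norms == 0) = 1;                       % guard against zero vectors
vectors = bsxfun(@rdivide, vectors, norms);  % each column now has unit length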


ICA
---
Here we apply the ICA transformation on the word2vec vectors.
Our experiments show that it is enough to calculate ICA on the sampled word2vec (which we created in the previous step). It will run faster, and will not cause degradation in results.
To do it, call:
calc_ica;
2 files will be created:
- A file containing the transformed vectors.
- A file containing the ICA transition matrix.
Remarks:
- If you want to perform ICA on the entire word2vec (without sampling), change the variable is_sampled to false (in the script calc_ica.m).
- If you want PCA instead of ICA, use the script calc_pca.
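
If you later need to move new word vectors into the same ICA space, applying the saved transition matrix is a single linear map. A sketch, where the file and variable names are assumptions (check what calc_ica actually saves):

% Sketch: apply the learned ICA transformation to new vectors.
% 'ica_mat' (300 x 300) is assumed to be the transition matrix saved by
% calc_ica; 'vecs' is a 300 x n matrix of word2vec vectors.
load('ica_transition_matrix.mat');   % hypothetical file name
vecs_ica = ica_mat * vecs;           % dimension stays 300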


Computing HGLMM/LMM/GMM model
-----------------------------
In order to compute an HGLMM/LMM/GMM model on the word2vec vectors, use the function hglmm/lmm/gmm.
For example, to compute a 30-cluster HGLMM on the ICA-transformed sampled word2vec vectors, call:
hglmm(30, true, 'ica', 300);
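
Presumably gmm and lmm take the same argument list as hglmm (number of clusters, whether to use the sampled vectors, the transformation name, and the dimension); under that assumption, the analogous calls would be:

gmm(30, true, 'ica', 300);   % 30-cluster GMM on the ICA-transformed sampled vectors
lmm(30, true, 'ica', 300);   % 30-cluster LMM, same setup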


Encoding Sentences as Fisher Vectors
------------------------------------
See the script fv_example.m.
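
fv_example.m drives the package's C++ implementation. For intuition, below is a minimal sketch of a Fisher vector for a diagonal-covariance GMM (the standard gradients with respect to the means and standard deviations; a generic illustration, not the package's HGLMM code):

function fv = gmm_fisher_vector(X, w, mu, sigma)
% Sketch: Fisher vector of one sentence under a diagonal-covariance GMM.
% X: d x T matrix of the word vectors of one sentence.
% w: 1 x K mixture weights; mu, sigma: d x K means and std deviations.
% Returns the 2*d*K FV (gradients w.r.t. means and std deviations).
[d, T] = size(X);
K = numel(w);
% posterior (soft assignment) of each word to each cluster: K x T
logp = zeros(K, T);
for k = 1:K
    z = bsxfun(@rdivide, bsxfun(@minus, X, mu(:, k)), sigma(:, k));
    % the constant -(d/2)*log(2*pi) is omitted; it cancels in the softmax
    logp(k, :) = log(w(k)) - sum(log(sigma(:, k))) - 0.5 * sum(z .^ 2, 1);
end
gamma = exp(bsxfun(@minus, logp, max(logp, [], 1)));  % numerically stable
gamma = bsxfun(@rdivide, gamma, sum(gamma, 1));
fv = zeros(2 * d * K, 1);
for k = 1:K
    z = bsxfun(@rdivide, bsxfun(@minus, X, mu(:, k)), sigma(:, k));
    g_mu  = (z * gamma(k, :)') / (T * sqrt(w(k)));               % d x 1
    g_sig = ((z .^ 2 - 1) * gamma(k, :)') / (T * sqrt(2 * w(k)));
    fv((k-1) * 2 * d + (1:d))     = g_mu;
    fv((k-1) * 2 * d + d + (1:d)) = g_sig;
end
end

With d = 300 and 30 clusters this gives 2*300*30 = 18,000 dimensions per model, so the GMM+HGLMM concatenation is 36,000-dimensional, matching the coordinate-sampling remark (20,000 out of 36,000) in the CCA example further below. In practice the FV is typically also power- and L2-normalized.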


VGG feature extraction
----------------------
We used the VGG convolutional network (Oxfordnet) for image feature extraction.
We used their 19-layer network.
We used this network via the matcaffe interface.
For instructions on how to use the VGG network, and to download the network weights file (VGG_ILSVRC_19_layers.caffemodel) and the layer configuration file (VGG_ILSVRC_19_layers_deploy.prototxt), see these links:
http://www.robots.ox.ac.uk/~vgg/research/very_deep/
https://gist.github.com/ksimonyan/3785162f95cd2d5fee77#file-readme-md

In addition, refer to the folder vgg in this package.
In this folder there is the file VGG_ILSVRC_19_layers_deploy.feature_extarct.prototxt.
This file is the same as VGG's original configuration file, except that we removed the last 4 layers (since we want feature extraction rather than classification).
Also, refer to these files:
- vgg_init.m (matcaffe one-time initialization)
- vgg_image_file_to_features.m
- vgg_feature_extract.m
The code in these files is based on VGG's demo file matcaffe_demo_vgg_mean_pix.m.
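
A usage sketch under assumed signatures (I have not verified the exact argument lists of vgg_init and vgg_image_file_to_features; check the files themselves):

% One-time matcaffe initialization (the argument list is an assumption).
vgg_init('VGG_ILSVRC_19_layers_deploy.feature_extarct.prototxt', ...
         'VGG_ILSVRC_19_layers.caffemodel');
% Extract one 4096-d feature vector per image (4096 matches the
% PASCAL_4096_1000.mat features used in the CCA code below).
img_files = {'dog.jpg', 'cat.jpg'};        % hypothetical image list
features = zeros(4096, numel(img_files));
for k = 1:numel(img_files)
    features(:, k) = vgg_image_file_to_features(img_files{k});
end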


CCA
---
Relevant files are in the folder cca.
See the script cca_example.m in this folder.
This script shows how to:
1. Train a CCA model.
2. Use our pre-trained CCA model, trained on the COCO dataset. This is the model which is referred to as GMM+HGLMM in our paper.

If you are interested in using our pre-trained CCA model, you have to download the following additional files, which are not part of this package (due to their size) but are available for download from the same location as this package:
- cca_model_g30i_h30i_eta_5e-05_cntr_1_sampled.mat
- sent_vec_sampled_features_g30i_h30i.mat
The content of these files is explained in cca_example.m.


% The code below splits the Pascal Sentences texts into a training set and a
% test set and reads them into matlab for the word embedding step later on.
TrainList = importdata('\\172.31.222.30\DataSet\pascal sentence\trainset_txt.list');
TrainLines = TrainList.textdata;
TrainDf = cell(1, 800);   % renamed from Df so the test loop below does not overwrite it
for i = 0:799
    path = TrainLines{i+1};
    fullpath = fullfile('\\172.31.222.30', 'DataSet', 'pascal sentence', 'data', path);
    fid = fopen(fullpath, 'rt');
    content = '';
    while feof(fid) ~= 1
        line = fgetl(fid);
        content = [content ' ' line];   % keep a space so words at line breaks are not glued together
    end
    TrainDf{i+1} = content;
    fclose(fid);
end

TestList = importdata('\\172.31.222.30\DataSet\pascal sentence\testset_txt.list');
TestLines = TestList.textdata;
TestDf = cell(1, 100);
for i = 0:99
    path = TestLines{i+1};
    fullpath = fullfile('\\172.31.222.30', 'DataSet', 'pascal sentence', 'data', path);
    fid = fopen(fullpath, 'rt');
    content = '';
    while feof(fid) ~= 1
        line = fgetl(fid);
        content = [content ' ' line];
    end
    TestDf{i+1} = content;
    fclose(fid);
end

Correlation
-----------
% ---- training the CCA model ----
% the input for the CCA training is the representations of all the correct
% <sentence, image> pairs of the training set.
% so you should prepare 2 matrices, X_trn and Y_trn:
% X_trn: a d*n matrix where n is number of training sentences and d is the
%   dimension of sentence representation. this matrix contains the
%   representations of all training sentences.
% Y_trn: a m*n matrix where m is the dimension of image representation.
%   this matrix should satisfy that for all i, X_trn(:,i) and Y_trn(:,i)
%   are vectors of a correct <sentence, image> pair.
load('\\172.31.222.30\DataSet\yuanyuxin\TMM_fea\PASCAL_4096_1000.mat');
load('E:\wz\hglmm_fv\hglmm_fv_v1.6\hglmm_fv_v1.6\fv\train_hglmm_30_ica_sent_vecs.mat');
load('E:\wz\hglmm_fv\hglmm_fv_v1.6\hglmm_fv_v1.6\fv\train_gmm_30_ica_sent_vecs.mat');

% X_trn = T_tr';
X_trn = [gmm_30_ica_sent_vecs ; hglmm_30_ica_sent_vecs];
Y_trn = I_tr';

% create an object of class CCAModel
cca_m = CCAModel;

% regularization value (should be determined using the validation set).
eta = 0.001;

% setting center to true enables centering of the data (reducing the mean)
% we got better results with centering enabled
center = true;

% train
lg(1, 'cca train start\n');
cca_m.train(X_trn, Y_trn, eta, center);
lg(1, 'cca train done\n');

% ---- mapping the test samples using the trained CCA model ----

% setting apply_r to true will enable scaling by the eigenvalues
% we got better results with this scaling
cca_m.set_apply_r(true);

% a matrix containing the vectors of the test sentences (each column is a
% vector representation of a sentence)
load('E:\wz\hglmm_fv\hglmm_fv_v1.6\hglmm_fv_v1.6\fv\test_hglmm_30_ica_sent_vecs.mat');
load('E:\wz\hglmm_fv\hglmm_fv_v1.6\hglmm_fv_v1.6\fv\test_gmm_30_ica_sent_vecs.mat');
tst_sentences = [gmm_30_ica_sent_vecs ; hglmm_30_ica_sent_vecs];

% a matrix containing the vectors of the test images (each column is a
% vector representation of an image)
tst_images = I_te';

% map the sentences vectors and normalize
tst_sentences = cca_m.map(tst_sentences, true, true);
% map the images vectors and normalize
tst_images = cca_m.map(tst_images, false, true);

save('tst_sentences.mat', 'tst_sentences');
save('tst_images.mat', 'tst_images');

% D1 = pdist([tst_sentences; tst_images], 'cosine');
% D1 = D1(~isnan(D1));
% W1 = mean(D1(:));
% after this mapping, we used the cosine similarity to score the similarity
% between each image and each sentence

% ---- using our provided pre-trained CCA model ----

% % below is an example for using our pre-trained CCA model, trained on the COCO dataset.
% % this is the model which is referred to as GMM+HGLMM in our paper.
% % this model expects the following image and sentence representations:
% %   image representation: the usual VGG representation, as described in the file README.txt.
% %   sentence representation: the GMM+HGLMM representation, which means concatenation of GMM FV
% %     and HGLMM FV (both with 30 clusters, and both on top of ICA-transformed word2vec). due to memory
% %     limitation, we had to reduce the representation dimension before CCA training. we did it
% %     by random sampling of 20,000 coordinates out of 36,000. the code below exemplifies this and
% %     uses the same coordinate sample (which is the one that our CCA model expects).
%
% f = '\\172.31.222.30\DataSet\yuanyuxin\TMM_fea\PASCAL_4096_1000.mat';
% load(f);
% % pre-trained CCA model file name (change the folder to your own)
% cca_model_file_name = 'E:\yuanyuxin\Fisher\hglmm_fv\cca_model_g30i_h30i_eta_5e-05_cntr_1_sampled.mat';
% % this will load a variable named cca_m of type CCAModel
% load(cca_model_file_name);
%
% % the sampled coordinates are stored in this file (change the folder to your own)
% sent_vec_sampled_features_file_name = 'E:\yuanyuxin\Fisher\hglmm_fv\sent_vec_sampled_features_g30i_h30i.mat';
% % a variable named sent_vec_sampled_features will be loaded
% load(sent_vec_sampled_features_file_name);
%
% % in fv_example.m we have generated hglmm_30_ica_sent_vecs (the needed HGLMM representation
% % for sentences) and gmm_30_ica_sent_vecs (the needed GMM representation for sentences)
% % now let's concatenate these representations:
% tst_sentences = [gmm_30_ica_sent_vecs ; hglmm_30_ica_sent_vecs];
%
% % now we apply the coordinate sampling explained above
% tst_sentences = tst_sentences(sent_vec_sampled_features, :);
%
% % now we can map the sentences vectors as shown in the previous example above:
%
% cca_m.set_apply_r(true);
% % map the sentences vectors and normalize
% tst_sentences = cca_m.map(tst_sentences, true, true);
%
% % and we can map images vectors as well:
%
% % map the images vectors and normalize
% tst_images = cca_m.map(tst_images, false, true);

CALMAP
------
The script below fuses the two CCA similarity matrices with a scanned weight and evaluates MAP in both retrieval directions.
load('./t2i/img_fea.mat'); imgfea1 = x;
load('./t2i/txt_fea.mat'); txtfea1 = x;
load('./i2t/img_fea.mat'); imgfea2 = x;
load('./i2t/txt_fea.mat'); txtfea2 = x;
load('test_lab.mat');

imgfea1 = pre_cnn(imgfea1);
txtfea1 = pre_cnn(txtfea1);
imgfea2 = pre_cnn(imgfea2);
txtfea2 = pre_cnn(txtfea2);

% imgfea1 = sqrtNorm(imgfea1);
% txtfea1 = sqrtNorm(txtfea1);
% imgfea2 = sqrtNorm(imgfea2);
% txtfea2 = sqrtNorm(txtfea2);

imgcat = test_lab;
txtcat = test_lab;
te_n_I = 8000;
te_n_T = 8000;

% cosine similarity matrices for the two feature sets
D1 = pdist([imgfea1; txtfea1], 'cosine');
W1 = -squareform(D1);
D2 = pdist([imgfea2; txtfea2], 'cosine');
W2 = -squareform(D2);

% scan for the best weighted combination of the two similarity matrices
best_map = 0;   % renamed from 'max', which shadows the built-in function
aa = 0;
IT = 0;
TI = 0;
for i = 0:0.01:1
    W = i*W1 + (1-i)*W2;
    WIA = W(1:te_n_I, :);                         % (unused)
    WTA = W(te_n_I+1:end, :);                     % (unused)
    WII = W(1:te_n_I, 1:te_n_I);                  % (unused)
    WTT = W(te_n_I+1:end, te_n_I+1:end);          % (unused)
    WIT = W(1:te_n_I, te_n_I+1:end);
    WTI = W(te_n_I+1:end, 1:te_n_I);
    % Image->Text
    [mapIT, prIQ_IT, mapICategory_IT, ap_IT] = QryonTestBi(WIT, imgcat, txtcat);
    % Text->Image
    [mapTI, prIQ_TI, mapICategory_TI, ap_TI] = QryonTestBi(WTI, txtcat, imgcat);
    map = (mapIT + mapTI) / 2;
    if map > best_map
        best_map = map;
        aa = i;
        IT = mapIT;
        TI = mapTI;
    end
end

disp(['Image->Text Query MAP: ' num2str(IT)]);
disp(['Text->Image Query MAP: ' num2str(TI)]);
disp(['max avg: ' num2str(best_map)]);
disp(['weight: ' num2str(aa)]);
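
QryonTestBi is an external evaluation function and is not included in this package. For reference, a minimal self-contained MAP computation, assuming relevance means matching class labels (which is what imgcat/txtcat encode), could look like:

function map = mean_avg_prec(W, qry_lab, db_lab)
% W: n_qry x n_db similarity matrix (higher = more similar).
% qry_lab, db_lab: class labels of the queries / database items.
n_qry = size(W, 1);
ap = zeros(n_qry, 1);
for q = 1:n_qry
    [~, order] = sort(W(q, :), 'descend');     % rank database items
    rel = (db_lab(order) == qry_lab(q));       % binary relevance per rank
    if any(rel)
        hits = cumsum(rel);
        ap(q) = mean(hits(rel) ./ find(rel));  % precision at each relevant rank
    end
end
map = mean(ap);
end

With such a helper, the two directions above would be scored as mean_avg_prec(WIT, imgcat, txtcat) and mean_avg_prec(WTI, txtcat, imgcat).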


