Image Super-Resolution as Sparse Representation of Raw Image Patches

来源：互联网发布：模拟股指期货交易软件编辑：程序博客网时间：2024/05/21 08:39

本文为杨建超CVPR08上文章Image Super-Resolution as Sparse Representation of Raw Image Patches的读书笔记，针对如何运动压缩感知的理论、稀疏表示来进行超分辨重建。

Image Super-Resolution as Sparse Representation of Raw Image Patches

To_捭阖_youth

一、Theoretical basis and Motivation

1、Previous methods

①Conventional approaches

require multiple low-resolution images of the same scene,typically aligned with sub-pixel accuracy

The SR task—recover the original high-resolution image by fusing the low-resolution images.

Problem —much information is lost in the high-to-low generation process,the reconstruction problem is severely undetermined,

the solution is not unique.

methods —MAP、BTV、MRF（【2】new problems：the performance degrades rapidly if the magnification factor is large or if the is not enough low-resolution images to constrain the solution）

===>overcome(【12】an example-based learning(require enormous databases)、【22】LEE from manifolds learning(result in blurring effects,due to over-or under-fitting))

②Our method

not require any learning on high-resolution patches,insteadworking directly with the low-resolution training patches or features.

Two steps:

①a local model from the sparse prior to recover lost high-frequency for local details

(using the sparse prior,find the sparse representation for each local patch)

②the global model from the reconstruction constraint to remove possible artifact

(regularize and refine the entire image and global optimization to eliminate the reconstruction errors,suppressing noise and ensuring consistency with the low-resolution input)

The advantage:

①require a much smaller database

②superior performance,both qualitatively and quantitatively(the couputed sparse representation adaptively selects the most relevant patches in the dictionary to the best represent each patch of the given low-resolution image)

二、Super-resolution from Sparisity

1、Reconstruction constraint

Y=DHX (1)

(Y:a low-resolution image;X:a higher-resolution image X of the same scene;

H represent a burring filter;D the downsampling operator.)

2、Sparse representation prior

the patches x of the high-resolution image X:

2.1、Local Model from Sparse Representation

【Remarks】

①Subtract the mean pixel value for each patch,so that the dictionary represents image textures rather than abolute intensities

②（3）式F penalize visually salient high-frequency errors;F is chaosen as some kind of high-pass filter.

(more sensitive to high-frequency content image;the high-frequency components of low-resolution image are the most important for predicting the lost high-frequency content)

2.2、Enforcing Global Reconstruction Constraint

Notice do not demand exact equality between y and Dl*a；and noise,

Eliminate this discrepency by projecting X0 onto DHX=Y,computing

Algorithm:

三、Dictionary Preparation

3.1、Random Raw Patches from Training Images

Generate dictionaries by simply randomly sampling raw patchsfrom training images of similar statistical nature;

For each high-resolution training image X,generate the correspond-ing low-resolution image Y bybluring and downsampling.

3.2、Derivative Features

Previous methods:

①Freeman 【12】use a high-pass filter

②Sun【23】use a set of Gaussion derivativefilters

③Chang 【5】use the first-order and second-order gradients of the patches

Our method:also use the first-order and second-order derivatives as the feature for the low-resolution patch.

finally concatenate the four description feature vectors as one vector as the final representation of the low-resolution patch.

四、Experiments

Experimental settings:

Magnify factor:3

Low-reolution patches:3x3 with overlap of 1 pixel between adjacent patches

High-reolution patches:9x9 with overlap of 3 pixel between adjacent patches

Freature:not extracted directly from the 3x3 low-resolution patch,but rather from an upsampled version produced by bicubic interpolation.

λ:balance sparisity of the solution with fidelity to the reconstruction constraint,λ=50 x dim(patch feature)

Experimental results:

NE【5】:generate sharp edges,but blurs the texture

BP:many jiagged effects along the edges

Bicubic:

SE【6】:give a decent reconstruction,but undeesired smoothing

Our:gives clearer fur and sharper edger;is more faithful to the original facial texture

五、Discussion

From a more practical standpoint, it would be desirabl to have a way of effectively combining dictionaries to work with images containing multiple types of textures or multiple object categories. One approach to this would integrate supervised image segmentation and super-resolution,applying the appropriate dictionary within each segment.

六、附录代码

下面为杨建超主页CVPR08代码：

main.m

% Image super-resolution using sparse representation% Example code%% Nov. 2, 2007. Jianchao Yang% IFP @ UIUC%% Revised version. April, 2009.%% Reference% Jianchao Yang, John Wright, Thomas Huang and Yi Ma. Image superresolution% via sparse representation of raw image patches. IEEE Computer Society% Conference on Computer Vision and Pattern Recognition (CVPR), 2008. %% For any questions, email me by jyang29@illinois.educlear all;clc;addpath('Solver');addpath('Sparse coding');% =====================================================================% specify the parameter settingspatch_size = 3; % patch size for the low resolution input imageoverlap = 1; % overlap between adjacent patcheslambda = 0.1; % sparsity parameterzooming = 3; % zooming factor, if you change this, the dictionary needs to be retrained.tr_dir = 'Data/training'; % path for training imagesskip_smp_training = true; % sample training patchesskip_dictionary_training = true; % train the coupled dictionarynum_patch = 50000; % number of patches to sample as the dictionarycodebook_size = 1024; % size of the dictionaryregres = 'L1'; % 'L1' or 'L2', use the sparse representation directly, or use the supports for L2 regression% =====================================================================% training coupled dictionaries for super-resolutionif ~skip_smp_training,    disp('Sampling image patches...');    [Xh, Xl] = rnd_smp_dictionary(tr_dir, patch_size, zooming, num_patch);    save('Data/Dictionary/smp_patches.mat', 'Xh', 'Xl');    skip_dictionary_training = false;end;if ~skip_dictionary_training,    load('Data/Dictionary/smp_patches.mat');    [Dh, Dl] = coupled_dic_train(Xh, Xl, codebook_size, lambda);    save('Data/Dictionary/Dictionary.mat', 'Dh', 'Dl');else    load('Data/Dictionary/Dictionary.mat');end;% =====================================================================% Process the test image fname = 'Data/Test/1.bmp';testIm = imread(fname); % testIm is a high resolution image, we downsample it and do super-resolutionif rem(size(testIm,1),zooming) ~=0,    nrow = floor(size(testIm,1)/zooming)*zooming;    testIm = testIm(1:nrow,:,:);end;if rem(size(testIm,2),zooming) ~=0,    ncol = floor(size(testIm,2)/zooming)*zooming;    testIm = testIm(:,1:ncol,:);end;imwrite(testIm, 'Data/Test/high.bmp', 'BMP');lowIm = imresize(testIm,1/zooming, 'bicubic');imwrite(lowIm,'Data/Test/low.bmp','BMP');interpIm = imresize(lowIm,zooming,'bicubic');imwrite(uint8(interpIm),'Data/Test/bb.bmp','BMP');% work with the illuminance domain only RGB转化成YCBCRlowIm2 = rgb2ycbcr(lowIm);lImy = double(lowIm2(:,:,1));%Y成分指的是亮度分量；Cb指的是蓝色色度分量；Cr指的是红色色度分量% bicubic interpolation for the other two channelsinterpIm2 = rgb2ycbcr(interpIm);hImcb = interpIm2(:,:,2);hImcr = interpIm2(:,:,3);%这里我有一个问题就是为什么要进行对Cb和Cr通道就进行双插值法？？？？？% ======================================================================% Super-resolution using sparse representationdisp('Start superresolution...');[hImy] = L1SR(lImy, zooming, patch_size, overlap, Dh, Dl, lambda, regres);ReconIm(:,:,1) = uint8(hImy);ReconIm(:,:,2) = hImcb;ReconIm(:,:,3) = hImcr;nnIm = imresize(lowIm, zooming, 'nearest');figure, imshow(nnIm);title('Input image');pause(1);figure, imshow(interpIm);title('Bicubic interpolation');pause(1)ReconIm = ycbcr2rgb(ReconIm);figure,imshow(ReconIm,[]);title('Our method');imwrite(uint8(ReconIm),'Data/Test/L1SR.bmp','BMP');

coupled_dic_train.m

function [Dh, Dl] = coupled_dic_train(Xh, Xl, codebook_size, lambda)addpath('Sparse coding/sc2');hDim = size(Xh, 1);lDim = size(Xl, 1);% joint learning of the dictionaryX = [1/sqrt(hDim)*Xh; 1/sqrt(lDim)*Xl];X = X(:, 1:80000);%取出X中从第1列到第80000列的元素Xnorm = sqrt(sum(X.^2, 1));%求出X范数  得到的是一个行向量1×80000clear Xh Xl;X = X(:, Xnorm > 1e-5);%取Xnorm大于一个界限的列X = X./repmat(sqrt(sum(X.^2, 1)), hDim+lDim, 1);%归一化处理，这里的归一化处理利用范数，repmat指的是平铺矩阵idx = randperm(size(X, 2));%设置索引或者说是标签Binit = X(:, idx(1:codebook_size));%由于字典的大小只有1024，故取这些列作为求解D的初始的X%这里有一个问题要思考：为什么不直接从X中取出第一列到1024列来作为Binit呢，而要先随机排列下，之后再选择呢？%我的思考是这样做的目地是为了保持一个随机性。[D] = sparse_coding(X, codebook_size, lambda/2, 'L1', [], 50, 5000, [], [], Binit);Dh = D(1:hDim, :);%分配好高分辨与低分辨；操作方法是取行Dl = D(hDim+1:end, :);% normalize the dictionaryDh = Dh./repmat(sqrt(sum(Dh.^2, 1)), hDim, 1);Dl = Dl./repmat(sqrt(sum(Dl.^2, 1)), lDim, 1);

rnd_smp_dictionary.m

function [Xh, Xl] = rnd_smp_dictionary(tr_dir, patch_size, zooming, num_patch)fpath = fullfile(tr_dir, '*.bmp');img_dir = dir(fpath);%常见的输入图像操作Xh = [];Xl = [];img_num = length(img_dir);nums = zeros(1, img_num);for num = 1:length(img_dir),%for语句打逗号和不打逗号一样    im = imread(fullfile(tr_dir, img_dir(num).name));    nums(num) = prod(size(im));%每个图像的大小end;nums = floor(nums*num_patch/sum(nums));%确定每个图为１０万块贡献的块数for ii = 1:img_num,        patch_num = nums(ii);%用 patch_num来表示91副图中每一幅图为10万块贡献的块数    im = imread(fullfile(tr_dir, img_dir(ii).name));        [H, L] = sample_patches(im, patch_size, zooming, patch_num);        Xh = [Xh, H];    Xl = [Xl, L];        fprintf('Sampled...%d\n', size(Xh, 2));end;function [HP, LP] = sample_patches(im, patch_size, zooming, patch_num)lz = 2;if size(im, 3) == 3,    hIm = rgb2gray(im);else    hIm = im;end;%把行和列变成能被zooming整除的数，这里是将176变成174if rem(size(hIm,1),zooming)%求余数函数    nrow = floor(size(hIm,1)/zooming)*zooming;    hIm = hIm(1:nrow,:);end;if rem(size(hIm,2),zooming)    ncol = floor(size(hIm,2)/zooming)*zooming;    hIm = hIm(:,1:ncol);end;%由高分辨训练图像块得到低分辨训练图像块，这里直接缩放zooming即可得到lIm = imresize(hIm,1/zooming);[nrow, ncol] = size(lIm);%低分辨训练图像取块（行、列）,把行列从1随机打乱来，之后形成网格x = randperm(nrow-patch_size-lz-1);y = randperm(ncol-patch_size-lz-1);[X,Y] = meshgrid(x,y);%将网格中X，Y都排成列向量，并且从第一列再到第二列这样排xrow = X(:);ycol = Y(:);xrow = xrow(1:patch_num);ycol = ycol(1:patch_num);% zoom the original imagelIm = imresize(lIm, lz,'bicubic');hIm = double(hIm);lIm = double(lIm);H = zeros(zooming^2*patch_size^2,patch_num);L = zeros(lz^2*4*patch_size^2,patch_num); % compute the first and second order gradientshf1 = [-1,0,1];vf1 = [-1,0,1]'; lImG11 = conv2(lIm,hf1,'same');lImG12 = conv2(lIm,vf1,'same'); hf2 = [1,0,-2,0,1];vf2 = [1,0,-2,0,1]'; lImG21 = conv2(lIm,hf2,'same');lImG22 = conv2(lIm,vf2,'same');count = 1;%高分辨训练样本取块和低分辨样本区块；这里有一个很有意思的东西，就是他取特征的时候，每次进行两列，内部存在一个转置的关系for pnum = 1:patch_num,        hrow = (xrow(pnum)-1)*zooming + 1;    hcol = (ycol(pnum)-1)*zooming + 1;    Hpatch = hIm(hrow:hrow+zooming*patch_size-1,hcol:hcol+zooming*patch_size-1);%高分辨训练图像块区块9×9        lrow = (xrow(pnum)-1)*lz + 1;    lcol = (ycol(pnum)-1)*lz + 1;    %     fprintf('(%d, %d), %d, [%d, %d]\n', lrow, lcol, lz*patch_size,%     size(lImG11));    Lpatch1 = lImG11(lrow:lrow+lz*patch_size-1,lcol:lcol+lz*patch_size-1);%6×6    Lpatch2 = lImG12(lrow:lrow+lz*patch_size-1,lcol:lcol+lz*patch_size-1);    Lpatch3 = lImG21(lrow:lrow+lz*patch_size-1,lcol:lcol+lz*patch_size-1);    Lpatch4 = lImG22(lrow:lrow+lz*patch_size-1,lcol:lcol+lz*patch_size-1);         Lpatch = [Lpatch1(:),Lpatch2(:),Lpatch3(:),Lpatch4(:)];    Lpatch = Lpatch(:);%低分辨取块加特征提取；这里可以说明文中一个说法得到四个梯度特征最后归一为一个方向的特征         HP(:,count) = Hpatch(:)-mean(Hpatch(:));%将高分辨训练样本每一块都减去均值（按列）    LP(:,count) = Lpatch;        count = count + 1;        Hpatch = Hpatch';    Lpatch1 = Lpatch1';    Lpatch2 = Lpatch2';    Lpatch3 = Lpatch3';    Lpatch4 = Lpatch4';    Lpatch = [Lpatch1(:),Lpatch2(:),Lpatch3(:),Lpatch4(:)];        HP(:,count) = Hpatch(:)-mean(Hpatch(:));    LP(:,count) = Lpatch(:);    count = count + 1;    end;

L1SR.m

function [hIm, ww] = L1SR(lIm, zooming, patch_size, overlap, Dh, Dl, lambda, regres)% Use sparse representation as the prior for image super-resolution% Usage%       [hIm] = L1SR(lIm, zooming, patch_size, overlap, Dh, Dl, lambda)% % Inputs%   -lIm:           low resolution input image, single channel, e.g.%   illuminance%   -zooming:       zooming factor, e.g. 3%   -patch_size:    patch size for the low resolution image%   -overlap:       overlap among patches, e.g. 1%   -Dh:            dictionary for the high resolution patches%   -Dl:            dictionary for the low resolution patches%   -regres:       'L1' use the sparse representation directly to high%                   resolution dictionary;%                   'L2' use the supports found by sparse representation%                   and apply least square regression coefficients to high%                   resolution dictionary.% Ouputs%   -hIm:           the recovered image, single channel%% Written by Jianchao Yang @ IFP UIUC% April, 2009% Webpage: http://www.ifp.illinois.edu/~jyang29/% For any questions, please email me by jyang29@uiuc.edu%% Reference% Jianchao Yang, John Wright, Thomas Huang and Yi Ma. Image superresolution% as sparse representation of raw image patches. IEEE Computer Society% Conference on Computer Vision and Pattern Recognition (CVPR), 2008. %[lhg, lwd] = size(lIm);hhg = lhg*zooming;hwd = lwd*zooming;mIm = imresize(lIm, 2,'bicubic');[mhg, mwd] = size(mIm);hpatch_size = patch_size*zooming;mpatch_size = patch_size*2;% extract gradient feature from lImhf1 = [-1,0,1];vf1 = [-1,0,1]';hf2 = [1,0,-2,0,1];vf2 = [1,0,-2,0,1]';lImG11 = conv2(mIm,hf1,'same');lImG12 = conv2(mIm,vf1,'same');lImG21 = conv2(mIm,hf2,'same');lImG22 = conv2(mIm,vf2,'same');lImfea(:,:,1) = lImG11;lImfea(:,:,2) = lImG12;lImfea(:,:,3) = lImG21;lImfea(:,:,4) = lImG22;lgridx = 2:patch_size-overlap:lwd-patch_size;%2：3-1：85-3变成1×41的行向量；大小从2开始，每间隔2，最大值为82lgridx = [lgridx, lwd-patch_size];%补充并重复最后列的值，%我提出一个问题，这样做的目地是？？？？lgridy = 2:patch_size-overlap:lhg-patch_size;%2：3-1：86-3变成1×42的行向量；大小从2开始，每间隔2，最大值为84lgridy = [lgridy, lhg-patch_size];%补充并重复最后列的值，%将lgridx、dy每一元素进行一个变换得到mgridx、dymgridx = (lgridx - 1)*2 + 1;mgridy = (lgridy - 1)*2 + 1;%我提出一个问题，这样做的目地是？？？？我认为应该是由于放大倍数的关系，导致索引的变化；分析可以看下面对于高分辨的处理% using linear programming to find sparse solutionbhIm = imresize(lIm, 3, 'bicubic');hIm = zeros([hhg, hwd]);nrml_mat = zeros([hhg, hwd]);hgridx = (lgridx-1)*zooming + 1;hgridy = (lgridy-1)*zooming + 1;disp('Processing the patches sequentially...');count = 0;% loop to recover each patchfor xx = 1:length(mgridx),    for yy = 1:length(mgridy),                mcolx = mgridx(xx);        mrowy = mgridy(yy);                count = count + 1;        if ~mod(count, 100),            fprintf('.\n');        else            fprintf('.');        end;        mpatch = mIm(mrowy:mrowy+mpatch_size-1, mcolx:mcolx+mpatch_size-1);%取mIm（3：3+6-1，3：3+6-1） 6×6        mmean = mean(mpatch(:));                mpatchfea = lImfea(mrowy:mrowy+mpatch_size-1, mcolx:mcolx+mpatch_size-1, :);%取lImfea（3：3+6-1，3：3+6-1）6×6        mpatchfea = mpatchfea(:);                mnorm = sqrt(sum(mpatchfea.^2));                if mnorm > 1,            y = mpatchfea./mnorm;%这就相当于归一化        else            y = mpatchfea;        end;                w = SolveLasso(Dl, y, size(Dl, 2), 'nnlasso', [], lambda);%解决Lasso问题得出的结果为w为1022×1，由于是稀疏表示系数；肯定大部分都是0%         w = feature_sign(Dl, y, lambda);                if isempty(w),            w = zeros(size(Dl, 2), 1);        end;        switch regres,            case 'L1'                if mnorm > 1,%这里mnorm求解得84.6018                    hpatch = Dh*w*mnorm;                else                    hpatch = Dh*w;                end;%我这里提出一个问题为什么乘以一个mnorm，它怎么又跟高分辨hpatch联系起来了，这个是什么意思？？？？            case 'L2'                idx = find(w);                lsups = Dl(:, idx);                hsups = Dh(:, idx);                w = inv(lsups'*lsups)*lsups'*mpatchfea;                hpatch = hsups*w;            otherwise                error('Unknown fitting!');        end;        hpatch = reshape(hpatch, [hpatch_size, hpatch_size]);        hpatch = hpatch + mmean;%注意这里加上了均值                hcolx = hgridx(xx);        hrowy = hgridy(yy);                hIm(hrowy:hrowy+hpatch_size-1, hcolx:hcolx+hpatch_size-1)...            = hIm(hrowy:hrowy+hpatch_size-1, hcolx:hcolx+hpatch_size-1) + hpatch;%取hIm（4：4+9-1，4：4+9-1） 9×9        nrml_mat(hrowy:hrowy+hpatch_size-1, hcolx:hcolx+hpatch_size-1)...            = nrml_mat(hrowy:hrowy+hpatch_size-1, hcolx:hcolx+hpatch_size-1) + 1;%我提出一个问题nrml_mat到底在这里面作用是？？？因为里面除了1就是0，后面又将0变成1，之后hIm每个元素除以它的每个元素的1    end;end;fprintf('done!\n');% fill the emptyhIm(1:3, :) = bhIm(1:3, :);hIm(:, 1:3) = bhIm(:, 1:3);hIm(end-2:end, :) = bhIm(end-2:end, :);hIm(:, end-2:end) = bhIm(:, end-2:end);nrml_mat(nrml_mat < 1) = 1;hIm = hIm./nrml_mat;hIm = uint8(hIm);

code: Jianchao Yang webpage

Study notes:CVPR08-SR

本文尚不成熟，希望大家提出宝贵意见。

敬请关注本博客和新浪微博To_捭阖_youth.