Derivatives of Common Norms

Source: Internet  Published: 软件开发零基础  Editor: 程序博客网  Date: 2024/05/16 12:50

Partial derivatives of vector norms

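  1. The L1 norm is not differentiable wherever a component of x is zero. It does have a subgradient, however; that is, it is subdifferentiable.
    A subgradient of the L1 norm is:
    $$\frac{\partial}{\partial x}\|x\|_1 = \operatorname{sign}(x)$$

    where sign(x) is applied elementwise:
    $$\operatorname{sign}(x_i) = \begin{cases} +1, & x_i > 0 \\ -1, & x_i < 0 \\ [-1, 1], & x_i = 0 \end{cases}$$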

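In experiments we often meet a function whose expression contains several absolute-value terms. To remove the absolute-value signs and simplify, we usually assume that the expression inside each absolute value is > 0 or < 0. With many variables, however, it becomes hard to partition the space this way, since n variables give 2^n sign regions. The L1 norm above is one such example:

$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$$

In the one-dimensional case:
$$|x| = \begin{cases} x, & x \ge 0 \\ -x, & x < 0 \end{cases}$$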

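In the high-dimensional case, however, it is hard to write out an explicit expansion like the one above. In numerical computation, apart from so-called symbolic computation, everything is solved at a known, concrete value.
So once the concrete value is known, the gradient at that value is easy to determine. Take the 3-dimensional case as an example. If the value is $x_0 = [3, -2, 5]^T$, the function expands at that point as:
$$\|x\|_1 = x_1 - x_2 + x_3$$

Its gradient is therefore $[1, -1, 1]^T$, that is, sign(x).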
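The point made above, that a concrete value fixes the sign pattern and hence the gradient, can be sketched numerically. This is a minimal illustration in plain Python, not part of the original post. The helper names (`l1_norm`, `l1_subgradient`, `numerical_gradient`) and the sample point `[3, -2, 5]` are assumptions chosen for the example, and at a zero component the code simply picks 0 as one admissible value from [-1, 1].

```python
def l1_norm(x):
    """Sum of the absolute values of the components."""
    return sum(abs(v) for v in x)


def l1_subgradient(x):
    """Elementwise sign of x; returns 0.0 at zero components (one valid choice in [-1, 1])."""
    return [float((v > 0) - (v < 0)) for v in x]


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for k in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[k] += h
        xm[k] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


# Sample point with no zero component, so the L1 norm is differentiable here.
x0 = [3.0, -2.0, 5.0]
g = l1_subgradient(x0)
g_num = numerical_gradient(l1_norm, x0)
print(g)
print(g_num)
```

Away from zero components the finite-difference estimate should match sign(x) up to rounding; at a zero component the central difference is no longer meaningful, which is exactly where the subgradient interval [-1, 1] takes over.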
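  2. L2 norm:
$$\frac{\partial}{\partial x}\|x - a\|_2 = \frac{x - a}{\|x - a\|_2}$$

$$\frac{\partial}{\partial x}\,\frac{x - a}{\|x - a\|_2} = \frac{I}{\|x - a\|_2} - \frac{(x - a)(x - a)^T}{\|x - a\|_2^3}$$

$$\frac{\partial \|x\|_2^2}{\partial x} = \frac{\partial (x^T x)}{\partial x} = 2x$$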
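As a sanity check on the first two L2 formulas, the sketch below compares the closed forms with central finite differences. It is written in plain Python and is only an illustration: the function names and the test vectors `x` and `a` are assumptions introduced here, and the point x must differ from a for the formulas to apply.

```python
import math


def l2_dist(x, a):
    """Euclidean norm of x - a."""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))


def grad_l2_dist(x, a):
    """Closed form (x - a) / ||x - a||_2."""
    r = l2_dist(x, a)
    return [(xi - ai) / r for xi, ai in zip(x, a)]


def hess_l2_dist(x, a):
    """Closed form I / ||x - a||_2 - (x - a)(x - a)^T / ||x - a||_2^3."""
    r = l2_dist(x, a)
    d = [xi - ai for xi, ai in zip(x, a)]
    n = len(d)
    return [[(1.0 if p == q else 0.0) / r - d[p] * d[q] / r ** 3
             for q in range(n)] for p in range(n)]


def shifted(x, k, h):
    """Copy of x with component k moved by h."""
    y = list(x)
    y[k] += h
    return y


def numerical_jacobian(F, x, h=1e-6):
    """Central-difference Jacobian of a list-valued F; entry [p][k] is dF_p/dx_k."""
    cols = []
    for k in range(len(x)):
        fp = F(shifted(x, k, h))
        fm = F(shifted(x, k, -h))
        cols.append([(u - v) / (2 * h) for u, v in zip(fp, fm)])
    return [[cols[k][p] for k in range(len(x))] for p in range(len(cols[0]))]


x = [1.0, 2.0, -1.0]
a = [0.5, -1.0, 2.0]

grad = grad_l2_dist(x, a)
grad_num = numerical_jacobian(lambda z: [l2_dist(z, a)], x)[0]
hess = hess_l2_dist(x, a)
hess_num = numerical_jacobian(lambda z: grad_l2_dist(z, a), x)

max_grad_err = max(abs(u - v) for u, v in zip(grad, grad_num))
max_hess_err = max(abs(hess[p][q] - hess_num[p][q])
                   for p in range(3) for q in range(3))
print(max_grad_err, max_hess_err)
```

Both printed errors should be tiny (limited only by finite-difference and rounding error), which is consistent with the two formulas.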

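Example: find the partial derivatives of the following function:

$$f(W) = \frac{1}{2}\sum_{i,j \in S} \frac{1}{\gamma_{i,j}} \|w_i^T X - w_j^T X\|_2^2$$

Here W is a matrix of size D×L and X is a matrix of size D×N, where D is the dimension of the feature vectors, L is the number of tasks, and N is the number of samples. A vector times a matrix, $w_i^T X$, is a vector, and the difference of two vectors is again a vector, so what we need is the partial derivative of the L2 norm of a vector.
$$\begin{aligned}
\frac{\partial f(W)}{\partial w_i}
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_i^T X - w_j^T X)\,\frac{\partial (w_i^T X - w_j^T X)}{\partial w_i} \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_i^T X - w_j^T X)\,X^T \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_i^T - w_j^T)(X X^T)
\end{aligned}$$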

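Note that the result here is in row-vector form, so it still has to be transposed. For a fixed i, only the pairs in S that contain i contribute to the sum.
Taking the partial derivative with respect to $w_j$:

$$\begin{aligned}
\frac{\partial f(W)}{\partial w_j}
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_j^T X - w_i^T X)\,\frac{\partial (w_j^T X - w_i^T X)}{\partial w_j} \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_j^T X - w_i^T X)\,X^T \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_j^T - w_i^T)(X X^T)
\end{aligned}$$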
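The transposition step can be made explicit with a short worked derivation for a single pair (i, j). This is a sketch with the weight factor left out for brevity; it only uses the identity $\|v\|_2^2 = v v^T$ for a row vector v and the symmetry of $X X^T$.

```latex
\begin{aligned}
\tfrac{1}{2}\,\|w_i^T X - w_j^T X\|_2^2
  &= \tfrac{1}{2}\,(w_i - w_j)^T X X^T (w_i - w_j), \\
\frac{\partial}{\partial w_i}\Big[\tfrac{1}{2}\,(w_i - w_j)^T X X^T (w_i - w_j)\Big]
  &= X X^T (w_i - w_j)
   = \big[(w_i^T - w_j^T)(X X^T)\big]^T, \\
\frac{\partial}{\partial w_j}\Big[\tfrac{1}{2}\,(w_i - w_j)^T X X^T (w_i - w_j)\Big]
  &= X X^T (w_j - w_i)
   = \big[(w_j^T - w_i^T)(X X^T)\big]^T .
\end{aligned}
```

In column form the two gradients are each other's negatives, so only one of them needs to be computed per pair.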

Verification in MATLAB on low-dimensional data:

% verify (requires the Symbolic Math Toolbox)
syms x11 x12 x21 x22 x31 x32 wi1 wi2 wi3 wj1 wj2 wj3 real; % define symbols as real values

% method 1: differentiate f symbolically, one component at a time
X = [x11 x12; x21 x22; x31 x32]; % X is 3*2: two samples, feature dimension 3
wi = [wi1 wi2 wi3]';
wj = [wj1 wj2 wj3]';
fw = 1/2*norm(wi'*X - wj'*X, 2).^2;
grad_wi1 = diff(fw, wi1); % \partial wi1
grad_wi2 = diff(fw, wi2); % \partial wi2
grad_wi3 = diff(fw, wi3); % \partial wi3
grad_wi0 = [grad_wi1; grad_wi2; grad_wi3];

% method 2: closed-form gradient, transposed into a column vector
grad_wi = (wi'*X - wj'*X)*X';
grad_wi = grad_wi';

clc;
disp('method 1:'); disp(grad_wi0);
disp('method 2:'); disp(grad_wi);

disp('%%%%%%%%%%%%%%%%%%%%%% \partial wj %%%%%%%%%%%%%%%%%%');
% method 1
grad_wj1 = diff(fw, wj1);
grad_wj2 = diff(fw, wj2);
grad_wj3 = diff(fw, wj3);
grad_wj0 = [grad_wj1; grad_wj2; grad_wj3];

% method 2
grad_wj = (wj'*X - wi'*X)*X';
grad_wj = grad_wj';

disp('method 1:'); disp(grad_wj0);
disp('method 2:'); disp(grad_wj);

Result:

method 1:
 x11*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x12*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 x21*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x22*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 x31*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x32*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)

method 2:
 x11*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x12*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 x21*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x22*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 x31*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x32*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)

%%%%%%%%%%%%%%%%%%%%%% \partial wj %%%%%%%%%%%%%%%%%%
method 1:
 - x11*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x12*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 - x21*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x22*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 - x31*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x32*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)

method 2:
 - x11*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x12*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 - x21*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x22*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
 - x31*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x32*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)

Since abs(u)*sign(u) = u for any real u, the method 1 and method 2 expressions are identical, which confirms the derivation.
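The symbolic comparison can also be mirrored with concrete numbers. The following is a rough sketch in plain Python for a single pair (w_i, w_j) without the weight factor, matching the setup of the MATLAB check (D = 3, N = 2). The numeric values of `X`, `wi`, `wj` and all function names are made up for illustration.

```python
def row_times_matrix(w, X):
    """Return w^T X for w of length D and X given as D rows of N entries."""
    return [sum(w[d] * X[d][n] for d in range(len(X))) for n in range(len(X[0]))]


def f_pair(wi, wj, X):
    """0.5 * ||wi^T X - wj^T X||_2^2 for one pair of weight vectors."""
    ri = row_times_matrix(wi, X)
    rj = row_times_matrix(wj, X)
    return 0.5 * sum((p - q) ** 2 for p, q in zip(ri, rj))


def grad_wi(wi, wj, X):
    """Closed form (wi^T X - wj^T X) X^T, returned as a list of length D."""
    ri = row_times_matrix(wi, X)
    rj = row_times_matrix(wj, X)
    r = [p - q for p, q in zip(ri, rj)]
    return [sum(r[n] * X[d][n] for n in range(len(r))) for d in range(len(X))]


def numerical_grad_wi(wi, wj, X, h=1e-6):
    """Central-difference gradient of f_pair with respect to wi."""
    g = []
    for k in range(len(wi)):
        wp = list(wi)
        wm = list(wi)
        wp[k] += h
        wm[k] -= h
        g.append((f_pair(wp, wj, X) - f_pair(wm, wj, X)) / (2 * h))
    return g


X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # D = 3 features, N = 2 samples
wi = [0.2, -0.4, 1.0]
wj = [1.5, 0.3, -0.7]

g_closed = grad_wi(wi, wj, X)            # gradient with respect to wi
g_numeric = numerical_grad_wi(wi, wj, X)
g_wj = grad_wi(wj, wi, X)                # gradient with respect to wj (roles swapped)
print(g_closed)
print(g_numeric)
print(g_wj)
```

Because f is quadratic in w_i, the central difference agrees with the closed form up to rounding, and the gradient with respect to w_j comes out as the negative of the one with respect to w_i.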

References:

  1. The Matrix Cookbook