[First order method] Gradient descent tools

Source: Internet. Editor: 程序博客网. Time: 2024/05/23 17:22

  • Lipschitz gradient
  • Strong convexity
  • Co-coercivity of gradient
    • 1 Lipschitz gradient
    • 2 Strong convexity

1. Lipschitz gradient

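If the gradient $\nabla f(x)$ is L-Lipschitz continuous and $x^*$ is a minimizer of $f$, then we have

$$\frac{1}{2L}\|\nabla f(x)\|_2^2 \le f(x) - f(x^*) \le \frac{L}{2}\|x - x^*\|_2^2$$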

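The right-hand inequality holds because $\nabla f(x^*) = 0$ and

$$f(x) \le f(x^*) + \nabla f(x^*)^T (x - x^*) + \frac{L}{2}\|x - x^*\|_2^2 = f(x^*) + \frac{L}{2}\|x - x^*\|_2^2$$

The left-hand inequality holds because

$$f(x^*) = \inf_y f(y) \le \inf_y \left\{ f(x) + \nabla f(x)^T (y - x) + \frac{L}{2}\|y - x\|_2^2 \right\} = f(x) - \frac{1}{2L}\|\nabla f(x)\|_2^2$$

where the infimum is attained at $y = x - \frac{1}{L}\nabla f(x)$.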

This shows that a function with a Lipschitz continuous gradient is upper bounded by a quadratic function.
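The two-sided bound above can be checked numerically. The following is a minimal sketch, not part of the original post: it assumes a hypothetical test function $f(x) = \frac{1}{2}\sum_i a_i x_i^2$ with $0 < a_i \le L$, whose minimizer is $x^* = 0$ with $f(x^*) = 0$. The names `A`, `L`, and `lipschitz_sandwich` are illustrative only.

```python
import random

# Assumption for illustration: f(x) = 0.5 * sum(a_i * x_i^2).
# Its gradient (a_i * x_i) is L-Lipschitz with L = max(a_i),
# and the minimizer is x* = 0 with f(x*) = 0.
A = [0.5, 1.0, 3.0]
L = max(A)

def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A, x))

def grad_f(x):
    return [a * xi for a, xi in zip(A, x)]

def sq_norm(v):
    return sum(vi * vi for vi in v)

def lipschitz_sandwich(x):
    """Return (lower, gap, upper) for the bound at x, using x* = 0."""
    lower = sq_norm(grad_f(x)) / (2.0 * L)   # (1/2L) * ||grad f(x)||^2
    gap = f(x) - 0.0                         # f(x) - f(x*)
    upper = 0.5 * L * sq_norm(x)             # (L/2) * ||x - x*||^2
    return lower, gap, upper

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in A]
    lower, gap, upper = lipschitz_sandwich(x)
    # Small tolerance guards against floating-point rounding.
    assert lower <= gap + 1e-9 and gap <= upper + 1e-9
```

For this quadratic, both inequalities become equalities exactly when only the coordinates with $a_i = L$ are nonzero.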

2. Strong convexity

If f(x) is m-strongly convex and $x^*$ is its minimizer, then we have

$$\frac{m}{2}\|x - x^*\|_2^2 \le f(x) - f(x^*) \le \frac{1}{2m}\|\nabla f(x)\|_2^2$$

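The left-hand inequality holds because $\nabla f(x^*) = 0$ and

$$f(x) \ge f(x^*) + \nabla f(x^*)^T (x - x^*) + \frac{m}{2}\|x - x^*\|_2^2 = f(x^*) + \frac{m}{2}\|x - x^*\|_2^2$$

The right-hand inequality holds because, for every $y$,

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|_2^2 \ge \inf_z \left\{ f(x) + \nabla f(x)^T (z - x) + \frac{m}{2}\|z - x\|_2^2 \right\} = f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2$$

so, taking $y = x^*$,

$$f(x) \le f(x^*) + \frac{1}{2m}\|\nabla f(x)\|_2^2$$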

This shows that a strongly convex function is lower bounded by a quadratic function.
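The strong-convexity bounds can be checked in the same way. This is a minimal sketch under assumptions that are not in the original post: the test function is a hypothetical shifted quadratic $f(x) = \frac{1}{2}\sum_i a_i (x_i - c_i)^2$ with $a_i \ge m > 0$, so that $x^* = c$. The names `A`, `C`, `m`, and `strong_convexity_sandwich` are illustrative only.

```python
import random

# Assumption for illustration: f(x) = 0.5 * sum(a_i * (x_i - c_i)^2).
# It is m-strongly convex with m = min(a_i), and its minimizer is x* = C.
A = [0.5, 1.0, 3.0]
C = [1.0, -2.0, 0.5]
m = min(A)

def f(x):
    return 0.5 * sum(a * (xi - ci) ** 2 for a, xi, ci in zip(A, x, C))

def grad_f(x):
    return [a * (xi - ci) for a, xi, ci in zip(A, x, C)]

def sq_norm(v):
    return sum(vi * vi for vi in v)

def strong_convexity_sandwich(x):
    """Return (lower, gap, upper) for the bound at x, using x* = C."""
    diff = [xi - ci for xi, ci in zip(x, C)]
    lower = 0.5 * m * sq_norm(diff)           # (m/2) * ||x - x*||^2
    gap = f(x) - f(C)                         # f(x) - f(x*)
    upper = sq_norm(grad_f(x)) / (2.0 * m)    # (1/2m) * ||grad f(x)||^2
    return lower, gap, upper

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in A]
    lower, gap, upper = strong_convexity_sandwich(x)
    # Small tolerance guards against floating-point rounding.
    assert lower <= gap + 1e-9 and gap <= upper + 1e-9
```

The right-hand bound is what lets the gradient norm serve as a stopping criterion: a small $\|\nabla f(x)\|_2$ certifies a small suboptimality gap.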

3. Co-coercivity of gradient

3.1 Lipschitz gradient

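If f(x) is convex and $\frac{L}{2}\|x\|_2^2 - f(x)$ is convex (that is, $\nabla f(x)$ is L-Lipschitz continuous), then we have

$$0 \le (\nabla f(x) - \nabla f(y))^T (x - y) \le L\|x - y\|_2^2$$

The right-hand inequality can be rewritten as

$$\left( (Lx - \nabla f(x)) - (Ly - \nabla f(y)) \right)^T (x - y) \ge 0$$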

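which says that

$$g(z) = \frac{L}{2}\|z\|_2^2 - f(z),$$

whose gradient $Lz - \nabla f(z)$ is monotone, is convex. Adding a linear function preserves convexity, so both of

$$g(z) + \nabla f(x)^T z = \frac{L}{2}\|z\|_2^2 - \underbrace{\left( f(z) - \nabla f(x)^T z \right)}_{f_x(z)}$$

$$g(z) + \nabla f(y)^T z = \frac{L}{2}\|z\|_2^2 - \underbrace{\left( f(z) - \nabla f(y)^T z \right)}_{f_y(z)}$$

are convex, so both $f_x(z)$ and $f_y(z)$ have L-Lipschitz gradients. Moreover, $f_x$ is convex with $\nabla f_x(x) = 0$, so $x$ minimizes $f_x$, and the left-hand inequality of Section 1 applied to $f_x$ at the point $y$ gives

$$f(y) - f(x) - \nabla f(x)^T (y - x) = \left( f(y) - \nabla f(x)^T y \right) - \left( f(x) - \nabla f(x)^T x \right) = f_x(y) - f_x(x) \ge \frac{1}{2L}\|\nabla f_x(y)\|_2^2 = \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|_2^2$$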

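In the same way, using $f_y$,

$$f(x) - f(y) - \nabla f(y)^T (x - y) \ge \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|_2^2$$

Adding these two inequalities gives the co-coercivity of an L-Lipschitz gradient:

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \frac{1}{L}\|\nabla f(y) - \nabla f(x)\|_2^2$$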
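The co-coercivity inequality can also be checked on a non-quadratic function. The sketch below is an illustration, not part of the original post: it assumes the hypothetical test function $f(x) = \sum_i \log(1 + e^{x_i})$, which is convex and whose gradient (the elementwise sigmoid) is Lipschitz with $L = 1/4$. The name `cocoercivity_slack` is illustrative only.

```python
import math
import random

# Assumption for illustration: f(x) = sum_i log(1 + exp(x_i)).
# Its gradient is the elementwise sigmoid, whose derivative is at most 1/4,
# so grad f is L-Lipschitz with L = 0.25.
L = 0.25

def grad_f(x):
    return [1.0 / (1.0 + math.exp(-xi)) for xi in x]

def cocoercivity_slack(x, y):
    """(grad f(x) - grad f(y))^T (x - y) - (1/L) * ||grad f(x) - grad f(y)||^2."""
    dg = [gx - gy for gx, gy in zip(grad_f(x), grad_f(y))]
    d = [xi - yi for xi, yi in zip(x, y)]
    inner = sum(a * b for a, b in zip(dg, d))
    return inner - sum(a * a for a in dg) / L

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(3)]
    y = [random.uniform(-5.0, 5.0) for _ in range(3)]
    # Co-coercivity says the slack is nonnegative (up to rounding).
    assert cocoercivity_slack(x, y) >= -1e-12
```

The slack is close to zero when both points are near the origin, where the sigmoid's slope is close to its maximum of 1/4.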

3.2 Strong convexity

If f(x) is m-strongly convex, then

$$h(x) = f(x) - \frac{m}{2}\|x\|_2^2$$

is convex.

By the theorem in the blog post
http://blog.csdn.net/comeyan/article/details/50541596#2-strong-convex
there exists an M such that $\nabla f(x)$ is M-Lipschitz. So we have

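$$\frac{M - m}{2}\|x\|_2^2 - h(x) = \frac{M}{2}\|x\|_2^2 - f(x),$$

which is convex because $\nabla f$ is M-Lipschitz. Together with the convexity of $h$, this means $\nabla h(x) = \nabla f(x) - mx$ is (M-m)-Lipschitz continuous. The triangle inequality alone gives only the looser constant M+m, which is not enough for the bound below. Assuming $M > m$, the last subsection applied to $h$ gives

$$(\nabla h(x) - \nabla h(y))^T (x - y) \ge \frac{1}{M - m}\|\nabla h(y) - \nabla h(x)\|_2^2$$

that is,

$$(\nabla f(x) - mx - \nabla f(y) + my)^T (x - y) \ge \frac{1}{M - m}\|\nabla f(x) - mx - \nabla f(y) + my\|_2^2$$

Expanding the squared norm and collecting the terms in $(\nabla f(x) - \nabla f(y))^T (x - y)$ on the left gives

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \frac{1}{M + m}\|\nabla f(x) - \nabla f(y)\|_2^2 + \frac{mM}{M + m}\|x - y\|_2^2$$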
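The final inequality can be checked numerically as well. The sketch below is an illustration under assumptions that are not in the original post: the hypothetical test function is $f(x) = \sum_i \log(1 + e^{x_i}) + \frac{m}{2}\|x\|_2^2$, which is m-strongly convex with an M-Lipschitz gradient for $M = m + 1/4$. The values of `m` and `M` and the name `strong_cocoercivity_slack` are illustrative only.

```python
import math
import random

# Assumption for illustration:
#   f(x) = sum_i log(1 + exp(x_i)) + (m/2) * ||x||^2.
# The softplus part has a (1/4)-Lipschitz gradient, so f is m-strongly convex
# and grad f is M-Lipschitz with M = m + 0.25.
m = 0.5
M = m + 0.25

def grad_f(x):
    return [1.0 / (1.0 + math.exp(-xi)) + m * xi for xi in x]

def strong_cocoercivity_slack(x, y):
    """Left-hand side minus right-hand side of the final inequality."""
    dg = [gx - gy for gx, gy in zip(grad_f(x), grad_f(y))]
    d = [xi - yi for xi, yi in zip(x, y)]
    inner = sum(a * b for a, b in zip(dg, d))
    grad_term = sum(a * a for a in dg) / (M + m)
    dist_term = m * M / (M + m) * sum(b * b for b in d)
    return inner - grad_term - dist_term

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(3)]
    y = [random.uniform(-5.0, 5.0) for _ in range(3)]
    # The inequality says the slack is nonnegative (up to rounding).
    assert strong_cocoercivity_slack(x, y) >= -1e-9
```

Setting `m = 0` in this sketch (with `M = 0.25`) reduces the check to the co-coercivity inequality of subsection 3.1.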
