Conjugate Gradient
Before diving into Haskell, let's go over exactly what the conjugate gradient method is and why it works. The "normal" conjugate gradient method is a method for solving systems of linear equations. However, this extends to a method for minimizing quadratic functions, which we can subsequently generalize to minimizing arbitrary differentiable functions.
Suppose we have some quadratic function

$$f(x) = \frac{1}{2} x^T A x + b^T x + c$$

for $x \in \mathbb{R}^n$, with $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$.

We can write any quadratic function in this form, as the $x^T A x$ term generates all the coefficients $x_i x_j$ while $b^T x$ and $c$ supply the linear and constant terms. In addition, we may assume that $A = A^T$ is symmetric: replacing $A$ with $\frac{1}{2}(A + A^T)$ leaves $f$ unchanged, since $x^T A x = x^T A^T x$.
Taking the gradient of $f$, we obtain

$$\nabla f(x) = A x + b,$$
which you can verify by writing out the terms in summation notation.
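Concretely, differentiating the summation form with respect to a single coordinate $x_k$ (a quick check spelled out here, using the symmetry $A = A^T$) gives

$$f(x) = \frac{1}{2} \sum_{i,j} A_{ij} x_i x_j + \sum_i b_i x_i + c,
\qquad
\frac{\partial f}{\partial x_k}
  = \frac{1}{2} \sum_j A_{kj} x_j + \frac{1}{2} \sum_i A_{ik} x_i + b_k
  = \sum_j A_{kj} x_j + b_k,$$

which is exactly the $k$-th component of $A x + b$.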
If we evaluate $f$ along the ray $x_i + \alpha d_i$, where $x_i$ is our current guess and $d_i$ is the direction we intend to move in, we obtain a function of the step size $\alpha$:

$$f(x_i + \alpha d_i) = \frac{1}{2} \alpha^2\, d_i^T A d_i + \alpha\, d_i^T (A x_i + b) + f(x_i).$$

Note that computing this expansion uses only the definition of $f$ and the symmetry of $A$ (expand the quadratic and collect powers of $\alpha$). Since this is a quadratic function in $\alpha$, with a positive leading coefficient whenever $A$ is positive definite, it has a unique minimum. The minimum of this function occurs when its derivative with respect to $\alpha$ vanishes:

$$\alpha\, d_i^T A d_i + d_i^T (A x_i + b) = 0.$$

Solving this for $\alpha$ gives the optimal step size along $d_i$:

$$\alpha = -\frac{d_i^T (A x_i + b)}{d_i^T A d_i}.$$
Note that since the direction is the negative of the gradient, a.k.a. the direction of steepest descent, we have $d_i = -(A x_i + b)$, so the numerator above equals $d_i^T d_i > 0$; with $A$ positive definite the denominator is positive as well, so $\alpha > 0$ and we genuinely step forward along the descent direction.
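As a tiny worked example (the specific numbers are chosen here for illustration and are not from the original text), take

$$A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} -1 \\ -2 \end{pmatrix}, \qquad x_0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

Then $d_0 = -\nabla f(x_0) = -b = (1, 2)^T$ and $A d_0 = (6, 7)^T$, so

$$\alpha = -\frac{d_0^T (A x_0 + b)}{d_0^T A d_0} = \frac{5}{20} = \frac{1}{4}, \qquad x_1 = x_0 + \alpha d_0 = (0.25,\ 0.5)^T,$$

and one can check that $d_0^T \nabla f(x_1) = 0$: the new gradient is orthogonal to the direction we just minimized along.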
If this were simple gradient descent, we would iterate this procedure, computing the gradient at each next point and moving in that direction. However, this has a problem - by moving along the new gradient at each step, we may undo some of the progress made during previous minimizations, so the same directions get revisited again and again (the familiar zig-zag of steepest descent). The conjugate gradient method avoids this by requiring each new search direction to be conjugate to all previous ones.
We define two vectors $u$ and $v$ to be conjugate with respect to $A$ (or $A$-orthogonal) if $u^T A v = 0$.

Since we have already moved in the directions $d_0, \ldots, d_i$, we require the next direction $d_{i+1}$ to be conjugate to all of them; that way, minimizing along $d_{i+1}$ cannot spoil the minimizations we have already completed.

This leaves us with the obvious question - what is $d_{i+1}$? We build it from the new steepest-descent direction, subtracting off a multiple of the previous direction:

$$d_{i+1} = -\nabla f(x_{i+1}) + \beta_i d_i, \qquad \beta_i = \frac{\nabla f(x_{i+1})^T A d_i}{d_i^T A d_i}.$$

Choosing this $\beta_i$ makes $d_{i+1}$ conjugate to $d_i$, and it turns out (less obviously) that $d_{i+1}$ is then conjugate to all of the earlier directions as well.
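To see why this choice of $\beta_i$ works (a one-line check, spelled out here for completeness), multiply the update by $d_i^T A$:

$$d_i^T A d_{i+1} = -d_i^T A \nabla f(x_{i+1}) + \beta_i\, d_i^T A d_i = -\nabla f(x_{i+1})^T A d_i + \frac{\nabla f(x_{i+1})^T A d_i}{d_i^T A d_i}\, d_i^T A d_i = 0,$$

where the second equality uses $A = A^T$.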
Thus, the full Conjugate Gradient algorithm for quadratic functions:
Let $f$ be a quadratic function

$$f(x) = \frac{1}{2} x^T A x + b^T x + c$$

which we wish to minimize.

1. Initialize: Let $i = 0$ and $x_i = x_0$ be our initial guess, and compute $d_i = d_0 = -\nabla f(x_0)$.

2. Find best step size: Compute $\alpha$ to minimize the function $f(x_i + \alpha d_i)$ via the equation

$$\alpha = -\frac{d_i^T (A x_i + b)}{d_i^T A d_i}.$$

3. Update the current guess: Let $x_{i+1} = x_i + \alpha d_i$.

4. Update the direction: Let $d_{i+1} = -\nabla f(x_{i+1}) + \beta_i d_i$, where $\beta_i$ is given by

$$\beta_i = \frac{\nabla f(x_{i+1})^T A d_i}{d_i^T A d_i}.$$

5. Iterate: Repeat steps 2-4 until we have looked in $n$ directions, where $n$ is the size of your vector space (the dimension of $x$).
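Before the Haskell treatment proper, here is a minimal list-based sketch of the algorithm above. The representation (vectors as `[Double]`, a matrix as a list of rows) and names like `conjGradQuad` are illustrative choices for this sketch, not from the original post:

```haskell
-- Conjugate gradient for f(x) = 1/2 x^T A x + b^T x + c.
-- Vectors are [Double]; a matrix is a list of rows.

type Vector = [Double]
type Matrix = [Vector]

dot :: Vector -> Vector -> Double
dot u v = sum (zipWith (*) u v)

mulMV :: Matrix -> Vector -> Vector
mulMV m v = map (`dot` v) m

add :: Vector -> Vector -> Vector
add = zipWith (+)

scale :: Double -> Vector -> Vector
scale s = map (s *)

-- Gradient of the quadratic: grad f(x) = Ax + b.
gradQ :: Matrix -> Vector -> Vector -> Vector
gradQ a b x = mulMV a x `add` b

-- One iteration: optimal step size, new guess, new conjugate direction.
cgStep :: Matrix -> Vector -> (Vector, Vector) -> (Vector, Vector)
cgStep a b (x, d) = (x', d')
  where
    ad    = mulMV a d
    alpha = negate (d `dot` gradQ a b x) / (d `dot` ad)
    x'    = x `add` scale alpha d
    beta  = (gradQ a b x' `dot` ad) / (d `dot` ad)
    d'    = scale (-1) (gradQ a b x') `add` scale beta d

-- Run n iterations, where n is the dimension of x.
conjGradQuad :: Matrix -> Vector -> Vector -> Vector
conjGradQuad a b x0 =
  fst (iterate (cgStep a b) (x0, scale (-1) (gradQ a b x0)) !! length x0)

main :: IO ()
main = print (conjGradQuad [[4, 1], [1, 3]] [-1, -2] [0, 0])
-- Prints approximately [9.09e-2, 0.636], the solution of Ax = -b
-- for the matrix and vector from the worked example above.
```

In exact arithmetic $n$ iterations suffice; in floating point one typically iterates until the gradient is small instead.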
Nonlinear Conjugate Gradient
So, now that we’ve derived this for quadratic functions, how are we going to use this for general nonlinear optimization of differentiable functions? To do this, we’re going to reformulate the above algorithm in slightly more general terms.
First of all, we will revise step two. Instead of
Find best step size: Compute $\alpha$ to minimize the function $f(x_i + \alpha d_i)$ via the equation

$$\alpha = -\frac{d_i^T (A x_i + b)}{d_i^T A d_i},$$
we will simply use a line search:
Find best step size: Compute $\alpha$ to minimize the function $f(x_i + \alpha d_i)$ via a line search in the direction $d_i$.
In addition, we must reformulate the computation of $\beta_k$, since the expression above uses the matrix $A$, which we no longer have for a general nonlinear $f$. The key observation is that for the quadratic, the gradient change across a step is exactly $\nabla f(x_{k+1}) - \nabla f(x_k) = A(x_{k+1} - x_k) = \alpha A d_k$; for a general smooth function this holds approximately near $x_k$. Therefore,

$$A d_k \approx \frac{\nabla f(x_{k+1}) - \nabla f(x_k)}{\alpha}.$$

Conveniently enough, the value of $\alpha$ appears in both the numerator and denominator of $\beta_k$ and cancels, so substituting this approximation gives a formula free of both $A$ and $\alpha$:
Update the direction: Let $d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k$, where $\beta_k$ is given by

$$\beta_k = \frac{\nabla f(x_{k+1})^T\,(\nabla f(x_{k+1}) - \nabla f(x_k))}{d_k^T\,(\nabla f(x_{k+1}) - \nabla f(x_k))}.$$
We can now apply this algorithm to any nonlinear, differentiable function! This reformulation of $\beta$ is known as the Hestenes-Stiefel formula; know that there are others (such as Fletcher-Reeves and Polak-Ribiere), similar in form and also in use.
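Here is a matching Haskell sketch of the nonlinear algorithm, reusing the list-based conventions of the earlier sketch. The line search is passed in as an argument, and the name `nonlinearCG` and the explicit iteration count are illustrative assumptions, not part of the original post:

```haskell
-- Nonlinear CG with the Hestenes-Stiefel beta above.
-- f is the objective, gradF its gradient, and search a line-search
-- routine returning an alpha approximately minimizing f (x + alpha d).
nonlinearCG
  :: ([Double] -> Double)                                      -- f
  -> ([Double] -> [Double])                                    -- grad f
  -> (([Double] -> Double) -> [Double] -> [Double] -> Double)  -- line search
  -> Int                                                       -- iterations
  -> [Double]                                                  -- initial guess
  -> [Double]
nonlinearCG f gradF search n x0 = go n x0 (map negate (gradF x0))
  where
    dot u v = sum (zipWith (*) u v)
    go 0 x _ = x
    go k x d =
      let alpha = search f x d
          x'    = zipWith (+) x (map (alpha *) d)
          y     = zipWith (-) (gradF x') (gradF x)   -- gradient change
          beta  = dot (gradF x') y / dot d y
          d'    = zipWith (+) (map negate (gradF x')) (map (beta *) d)
      in go (k - 1) x' d'
```

In practice one also restarts with $d = -\nabla f(x)$ every $n$ iterations or whenever $\beta$ turns negative; the sketch omits these safeguards for brevity.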
Line Search
The one remaining bit of this process that we haven't covered is step two: the line search. As you can see above, we are given a point $x_k$ and a direction $d_k$, and must choose $\alpha$ to minimize the univariate function $g(\alpha) = f(x_k + \alpha d_k)$. Since a minimum of $g$ occurs where its derivative $g'(\alpha) = d_k^T \nabla f(x_k + \alpha d_k)$ vanishes, the methods below are phrased as root-finding methods applied to $g'$.
There are many ways to do this line search, and they can range from relatively simple linear methods (like the secant method) to more complex (using quadratic or cubic polynomial approximations).
One simple method for a line search is known as the bisection method. The bisection method is simply a binary search. To minimize a univariate function $g$, we look for a root of its derivative $g'$. We start with two points $a$ and $b$ with $g'(a) < 0$ and $g'(b) > 0$; since $g'$ changes sign on $[a, b]$, a continuous $g'$ must have a root between them. We evaluate $g'$ at the midpoint $(a + b)/2$ and, depending on the sign we find, replace whichever endpoint keeps the sign change inside the interval, halving the bracket at every step.
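A minimal Haskell sketch of this, assuming we can evaluate $g'$ directly (the tolerance `1e-10` is an arbitrary illustrative choice):

```haskell
-- Bisection: find a root of g' in [a, b], assuming g' a and g' b
-- have opposite signs. The bracket is halved until it is tiny.
bisection :: (Double -> Double) -> Double -> Double -> Double
bisection g' a b
  | b - a < 1e-10      = mid
  | g' a * g' mid <= 0 = bisection g' a mid   -- sign change in left half
  | otherwise          = bisection g' mid b   -- sign change in right half
  where mid = (a + b) / 2
```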
Another simple method is known as the secant method. Like the bisection method, the secant method requires two initial points $a$ and $b$; unlike bisection, it uses the actual values $g'(a)$ and $g'(b)$ rather than just their signs, approximating $g'$ by the straight line through the points $(a, g'(a))$ and $(b, g'(b))$.

It then finds the root of this linear approximation, setting

$$c = b - g'(b)\,\frac{b - a}{g'(b) - g'(a)}.$$

It then evaluates $g'(c)$ and iterates with the two most recent points. Because it exploits the magnitude of $g'$ and not only its sign, the secant method typically converges faster than bisection when $g'$ is nearly linear near the root.
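The corresponding sketch in Haskell (again with an illustrative stopping tolerance):

```haskell
-- Secant method: repeatedly jump to the root of the line through
-- (a, g' a) and (b, g' b), keeping the two most recent points.
secant :: (Double -> Double) -> Double -> Double -> Double
secant g' a b
  | abs (b - a) < 1e-10 = c
  | otherwise           = secant g' b c
  where c = b - g' b * (b - a) / (g' b - g' a)
```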
There are more line search methods, but the last one we will examine is one known as Dekker's method. Dekker's method is a combination of the secant method and the bisection method. Unlike the previous two methods, Dekker's method keeps track of three points:

- $a_k$: the current "contrapoint", the endpoint that keeps the root bracketed
- $b_k$: the current guess for the root
- $b_{k-1}$: the previous guess for the root

Dekker's method then computes the two possible next values: the secant step $s$ computed from $b_k$ and $b_{k-1}$, and the bisection midpoint $m = (a_k + b_k)/2$. If $s$ lies between $b_k$ and $m$, it becomes the next guess $b_{k+1}$; otherwise the safer midpoint is used.

After choosing $b_{k+1}$, the new contrapoint $a_{k+1}$ is picked from $\{a_k, b_k\}$ so that $g'(a_{k+1})$ and $g'(b_{k+1})$ have opposite signs, keeping the root bracketed; and if $|g'(a_{k+1})| < |g'(b_{k+1})|$, the two points are swapped so that $b_{k+1}$ is always the better guess.
Dekker's method is effectively a heuristic, but it is nice in practice: it has the reliability of the bisection method and gains a boost of speed from its use of the secant method.
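A compact Haskell sketch following the description above; the tolerance and the bookkeeping details are illustrative simplifications (production code, like Brent's later refinement of this method, adds further safeguards):

```haskell
-- Dekker's method: secant step with a bisection fallback, keeping a
-- contrapoint a that brackets the root with the current guess b.
-- Assumes g' a0 and g' b0 have opposite signs.
dekker :: (Double -> Double) -> Double -> Double -> Double
dekker g' a0 b0 = go a0 b0 a0   -- the "previous guess" starts as a0
  where
    go a b bPrev
      | abs (b - a) < 1e-10 = b
      | otherwise           = go a'' b'' b
      where
        m = (a + b) / 2
        -- Secant step from the two most recent guesses (fall back to
        -- the midpoint if the denominator would vanish).
        s | g' b /= g' bPrev = b - g' b * (b - bPrev) / (g' b - g' bPrev)
          | otherwise        = m
        -- Accept the secant step only if it lands between b and m.
        b' = if (s - b) * (s - m) < 0 then s else m
        -- New contrapoint: whichever old point still brackets the root.
        a' = if g' a * g' b' < 0 then a else b
        -- Swap if the contrapoint became the better guess, so b stays best.
        (a'', b'') = if abs (g' a') < abs (g' b') then (b', a') else (a', b')
```

Any of these three functions could serve as the line search in the nonlinear CG sketch above, wrapped so that $g'(\alpha) = d^T \nabla f(x + \alpha d)$ is built from the gradient, the current point, and the search direction.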