Rank-one updates for faster matrix inversion


Making a “rank-one update” to a matrix {\mathbf{A}} means adding to the matrix another matrix of rank one:

\displaystyle  \mathbf{B}=\mathbf{A}+\mathbf{vv^{t}}

This actually occurs quite a bit in statistics and optimisation – I’m interested in the topic because rank-one updates occur in certain applications of Expectation Propagation, but a perhaps more important case is in sequential regression problems. Imagine doing linear regression, but being given the datapoints one-by-one. Because the regression coefficients {\mathbf{w}} are given by the famous least-squares equation:

\displaystyle  \mathbf{w}=\left(\mathbf{X}^{t}\mathbf{X}\right)^{-1}\mathbf{X}^{t}\mathbf{y}

you seem to need to invert the {\mathbf{X^{t}X}} matrix each time you get a new datapoint. This is expensive because matrix inversion is an {\mathcal{O}\left(n^{3}\right)} operation. Thanks to rank-one updates, we can bring that cost down to {\mathcal{O}\left(n^{2}\right)}. At each step, if {\mathbf{X}} is the current design matrix and {\mathbf{v}} is the new datapoint, then the updated version of {\mathbf{X}} is:

\displaystyle  \mathbf{X}'=\left[\begin{array}{c} \mathbf{X}\\ \mathbf{v}^{t} \end{array}\right]

and

\displaystyle  \mathbf{X}'^{t}\mathbf{X}'=\mathbf{X}^{t}\mathbf{X}+\mathbf{vv}^{t}

so that each new datapoint gives us a rank-one update to {\mathbf{X}^{t}\mathbf{X}} (a quick numerical check of this identity appears right after the list below). There are many ways we can take advantage of that, but I’ll only look at two:

  1. Updating the inverse of {\mathbf{X}^{t}\mathbf{X}} using the matrix inversion lemma (AKA the Woodbury-Sherman-Morrison formula)
  2. Updating the Cholesky decomposition of {\mathbf{X}^{t}\mathbf{X}}
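
As a quick sanity check, the following R snippet (with made-up data) confirms that appending the row {\mathbf{v}^{t}} to {\mathbf{X}} adds {\mathbf{vv}^{t}} to {\mathbf{X}^{t}\mathbf{X}}:

# Numerical check that X'^t X' == X^t X + v v^t (illustrative data)
set.seed(1)
X <- matrix(rnorm(10 * 3), 10, 3)  # current design matrix
v <- rnorm(3)                      # new datapoint
Xp <- rbind(X, v)                  # updated design matrix X'
max(abs(crossprod(Xp) - (crossprod(X) + tcrossprod(v))))  # should be ~ machine precision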

For the first one, say that {\mathbf{A}=\mathbf{X}^{t}\mathbf{X}}, and we have already computed the inverse of {\mathbf{A}}. The matrix inversion lemma tells us that:

\displaystyle  \left(\mathbf{A}+\mathbf{vv}^{t}\right)^{-1}=\mathbf{A}^{-1}-\frac{1}{\left(1+\mathbf{v}^{t}\mathbf{A}^{-1}\mathbf{v}\right)}\left(\mathbf{A}^{-1}\mathbf{vv}^{t}\mathbf{A}^{-1}\right)

(what this formula does becomes clearer if you imagine that {\mathbf{v}} is an eigenvector of {\mathbf{A}}). From a computational point of view, the important point is that we can update the inverse using just matrix-vector products, an outer product, a division and a subtraction, which brings the cost down to {\mathcal{O}\left(n^{2}\right)}.
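
In R, the update might look something like this minimal sketch (the rank1_inv_update name is made up for illustration). It assumes {\mathbf{A}} is symmetric, as {\mathbf{X}^{t}\mathbf{X}} is, so that {\mathbf{A}^{-1}\mathbf{v}} and {\mathbf{v}^{t}\mathbf{A}^{-1}} are transposes of each other:

# Sketch: given Ainv = solve(A) for symmetric A, return solve(A + v %*% t(v))
# via the matrix inversion lemma; only O(n^2) work per update
rank1_inv_update <- function(Ainv, v) {
  Av <- Ainv %*% v                 # A^{-1} v, a matrix-vector product
  Ainv - tcrossprod(Av) / as.numeric(1 + crossprod(v, Av))
}

Each new datapoint then costs one call to this function rather than a full solve().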

The second solution is often recommended by linear algebra specialists because it has better numerical stability. It is a bit slower in practice, and I haven’t seen it make much of a difference in my own problems, but YMMV. It might become important when the matrices are poorly conditioned. Inverting a positive-definite matrix {\mathbf{A}} can be done via the Cholesky decomposition:

\displaystyle  \mathbf{U}^{t}\mathbf{U}=\mathbf{A}

where {\mathbf{U}} is an upper-triangular matrix. The trick is simply that:

\displaystyle  \left(\mathbf{U}^{t}\mathbf{U}\right)^{-1}=\mathbf{U}^{-1}\left(\mathbf{U}^{t}\right)^{-1}

and triangular matrices are easy to invert (in practice one just solves two triangular systems, without ever forming the inverses explicitly), so the expensive part is getting {\mathbf{U}} in the first place.

In R, I find it useful to define a cholsolve function like so:


# Compute solve(A, b), given U = chol(A):
# t(U) is lower triangular and U is upper triangular,
# so we do a forward solve followed by a back solve
cholsolve <- function(U, b) {
  backsolve(U, forwardsolve(t(U), b))
}
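
As a quick usage sketch (again with made-up data), the least-squares coefficients can then be obtained from the Cholesky factor of {\mathbf{X}^{t}\mathbf{X}}:

# Usage sketch: least-squares coefficients via cholsolve (illustrative data)
set.seed(2)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- rnorm(100)
U <- chol(crossprod(X))            # crossprod(X) is X^t X; U is upper triangular
w <- cholsolve(U, crossprod(X, y)) # should agree with solve(crossprod(X), crossprod(X, y))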

It is possible to update a Cholesky decomposition following a rank-one update in {\mathcal{O}\left(n^{2}\right)} time; see, for example, the tech report by Matthias Seeger. A year ago I wrote my own C function for doing that, which managed to be slower than the {\mathcal{O}\left(n^{3}\right)} {chol} for most practical values of {n} (a bit of a disappointment). Fortunately, the SamplerCompare package has a function called chud which is very fast, so there’s no excuse anymore for not using that trick.
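
For reference, here is a minimal plain-R sketch of such a rank-one Cholesky update (the chol_update name is made up for illustration; this is not SamplerCompare’s API). It implements the standard updating recurrence in {\mathcal{O}\left(n^{2}\right)} operations, though as an interpreted loop it won’t beat the compiled {chol} either, so chud remains the practical choice:

# Sketch: given upper-triangular U with t(U) %*% U == A, return the
# upper-triangular Cholesky factor of A + v %*% t(v) in O(n^2) operations
chol_update <- function(U, v) {
  n <- length(v)
  for (k in 1:n) {
    r  <- sqrt(U[k, k]^2 + v[k]^2)  # new diagonal entry
    ck <- r / U[k, k]
    sk <- v[k] / U[k, k]
    U[k, k] <- r
    if (k < n) {
      idx <- (k + 1):n
      U[k, idx] <- (U[k, idx] + sk * v[idx]) / ck  # update the rest of row k
      v[idx]    <- ck * v[idx] - sk * U[k, idx]    # fold the correction into v
    }
  }
  U
}

In the sequential regression setting one would call such an update for each new datapoint and reuse cholsolve to get the updated coefficients.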
