Rank-one updates for faster matrix inversion


Making a “rank-one update” to a matrix {\mathbf{A}} means adding to the matrix another matrix of rank one:

\displaystyle  \mathbf{B}=\mathbf{A}+\mathbf{vv^{t}}

This actually occurs quite a bit in statistics and optimisation – I’m interested in the topic because rank-one updates occur in certain applications of Expectation Propagation, but a perhaps more important case is in sequential regression problems. Imagine doing linear regression, but being given the datapoints one-by-one. Because the regression coefficients {\mathbf{w}} are given by the famous least-squares equation:

\displaystyle  \mathbf{w}=\left(\mathbf{X}^{t}\mathbf{X}\right)^{-1}\mathbf{X}^{t}\mathbf{y}

you seem to need to invert the {\mathbf{X^{t}X}} matrix each time you get a new datapoint. This is expensive because matrix inversion is an {\mathcal{O}\left(n^{3}\right)} operation. Thanks to rank-one updates, we can bring that cost down to {\mathcal{O}\left(n^{2}\right)}. At each step, if {\mathbf{X}} is the current design matrix and {\mathbf{v}} is the new datapoint, then the updated version of {\mathbf{X}} is:

\displaystyle  \mathbf{X}'=\left[\begin{array}{c} \mathbf{X}\\ \mathbf{v}^{t} \end{array}\right]

and

\displaystyle  \mathbf{X}'^{t}\mathbf{X}'=\mathbf{X}^{t}\mathbf{X}+\mathbf{vv}^{t}

so that each new datapoint gives us a rank-one update to {\mathbf{X}^{t}\mathbf{X}} (a quick numerical check of this identity appears right after the list below). There are many ways we can take advantage of that, but I’ll only look at two:

  1. Updating the inverse of {\mathbf{X}^{t}\mathbf{X}} using the matrix inversion lemma (AKA the Woodbury-Sherman-Morrison formula)
  2. Updating the Cholesky decomposition of {\mathbf{X}^{t}\mathbf{X}}
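
As a quick sanity check, the following R snippet (with made-up data) confirms that appending the row {\mathbf{v}^{t}} to {\mathbf{X}} adds {\mathbf{vv}^{t}} to {\mathbf{X}^{t}\mathbf{X}}:

# Numerical check that X'^t X' == X^t X + v v^t (illustrative data)
set.seed(1)
X <- matrix(rnorm(10 * 3), 10, 3)  # current design matrix
v <- rnorm(3)                      # new datapoint
Xp <- rbind(X, v)                  # updated design matrix X'
max(abs(crossprod(Xp) - (crossprod(X) + tcrossprod(v))))  # should be ~ machine precision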

For the first one, say that {\mathbf{A}=\mathbf{X}^{t}\mathbf{X}}, and we have already computed the inverse of {\mathbf{A}}. The matrix inversion lemma tells us that:

\displaystyle  \left(\mathbf{A}+\mathbf{vv}^{t}\right)^{-1}=\mathbf{A}^{-1}-\frac{1}{\left(1+\mathbf{v}^{t}\mathbf{A}^{-1}\mathbf{v}\right)}\left(\mathbf{A}^{-1}\mathbf{vv}^{t}\mathbf{A}^{-1}\right)

(what this formula does becomes clearer if you imagine that {\mathbf{v}} is an eigenvector of {\mathbf{A}}). From a computational point of view, the important point is that we can update the inverse using just matrix-vector products, an outer product, a division and a subtraction, which brings the cost down to {\mathcal{O}\left(n^{2}\right)}.
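
In R, the update might look something like this minimal sketch (the rank1_inv_update name is made up for illustration). It assumes {\mathbf{A}} is symmetric, as {\mathbf{X}^{t}\mathbf{X}} is, so that {\mathbf{A}^{-1}\mathbf{v}} and {\mathbf{v}^{t}\mathbf{A}^{-1}} are transposes of each other:

# Sketch: given Ainv = solve(A) for symmetric A, return solve(A + v %*% t(v))
# via the matrix inversion lemma; only O(n^2) work per update
rank1_inv_update <- function(Ainv, v) {
  Av <- Ainv %*% v                 # A^{-1} v, a matrix-vector product
  Ainv - tcrossprod(Av) / as.numeric(1 + crossprod(v, Av))
}

Each new datapoint then costs one call to this function rather than a full solve().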

The second solution is often recommended by linear algebra specialists because it has better numerical stability. It is a bit slower in practice, and I haven’t seen it make much of a difference in my own problems, but YMMV. It might become important when the matrices are poorly conditioned. Inverting a positive-definite matrix {\mathbf{A}} can be done via the Cholesky decomposition:

\displaystyle  \mathbf{U}^{t}\mathbf{U}=\mathbf{A}

where {\mathbf{U}} is an upper-triangular matrix. The trick is simply that:

\displaystyle  \left(\mathbf{U}^{t}\mathbf{U}\right)^{-1}=\mathbf{U}^{-1}\left(\mathbf{U}^{t}\right)^{-1}

and triangular matrices are easy to invert (in practice one just solves two triangular systems, without ever forming the inverses explicitly), so the expensive part is getting {\mathbf{U}} in the first place.

In R, I find it useful to define a cholsolve function like so:


# Compute solve(A, b), given U = chol(A):
# t(U) is lower triangular and U is upper triangular,
# so we do a forward solve followed by a back solve
cholsolve <- function(U, b) {
  backsolve(U, forwardsolve(t(U), b))
}
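
As a quick usage sketch (again with made-up data), the least-squares coefficients can then be obtained from the Cholesky factor of {\mathbf{X}^{t}\mathbf{X}}:

# Usage sketch: least-squares coefficients via cholsolve (illustrative data)
set.seed(2)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- rnorm(100)
U <- chol(crossprod(X))            # crossprod(X) is X^t X; U is upper triangular
w <- cholsolve(U, crossprod(X, y)) # should agree with solve(crossprod(X), crossprod(X, y))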

It is possible to update a Cholesky decomposition following a rank-one update in {\mathcal{O}\left(n^{2}\right)} time; see, for example, the tech report by Matthias Seeger. A year ago I wrote my own C function for doing that, which managed to be slower than the {\mathcal{O}\left(n^{3}\right)} {chol} for most practical values of {n} (a bit of a disappointment). Fortunately, the SamplerCompare package has a function called chud which is very fast, so there’s no excuse anymore for not using that trick.
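
For reference, here is a minimal plain-R sketch of such a rank-one Cholesky update (the chol_update name is made up for illustration; this is not SamplerCompare’s API). It implements the standard updating recurrence in {\mathcal{O}\left(n^{2}\right)} operations, though as an interpreted loop it won’t beat the compiled {chol} either, so chud remains the practical choice:

# Sketch: given upper-triangular U with t(U) %*% U == A, return the
# upper-triangular Cholesky factor of A + v %*% t(v) in O(n^2) operations
chol_update <- function(U, v) {
  n <- length(v)
  for (k in 1:n) {
    r  <- sqrt(U[k, k]^2 + v[k]^2)  # new diagonal entry
    ck <- r / U[k, k]
    sk <- v[k] / U[k, k]
    U[k, k] <- r
    if (k < n) {
      idx <- (k + 1):n
      U[k, idx] <- (U[k, idx] + sk * v[idx]) / ck  # update the rest of row k
      v[idx]    <- ck * v[idx] - sk * U[k, idx]    # fold the correction into v
    }
  }
  U
}

In the sequential regression setting one would call such an update for each new datapoint and reuse cholsolve to get the updated coefficients.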
