Paper reading notes - Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and D


Title: Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection

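This paper was one of those singled out at ICML 2011, where it received a Distinguished Paper Award (I have no real sense of what that involves). It covers the use of greedy algorithms in subset selection, sparse approximation, and dictionary selection. I read "Submodular meets Spectral" literally as "when submodularity meets the spectrum". Spectral methods seem to have quite a few uses in machine learning, spectral clustering among them, but I am still rather lost on the topic; more study is the only way forward.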

Below are some key points excerpted from the paper. Many of the formulas were new to me, so I may have left out quite a bit.

1. Subset selection

Select a subset of k variables from a given set of observation variables which, taken together, "best" predict another variable of interest.

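The above is the statement of the subset selection problem. The selection follows a clear criterion: the chosen variables should best predict another variable of interest that is related to the observations. This idea is widely used in feature extraction for pattern recognition (our teacher Ma Bo covered exactly this on Monday this week), sparse learning, and dictionary selection. As the authors put it, we wish to predict the phenomenon using only a small subset from the high-dimensional feature space. The problem can also be cast in the following mathematical form: Given the covariances between n variables Xi (observable) and a variable Z (to be predicted), select a subset of k << n of the variables Xi and a linear prediction function of Z from the selected Xi that maximizes the (squared multiple correlation) fit.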

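There are generally two approaches to this problem: greedy algorithms and convex relaxation schemes. The authors point out that the second approach does not offer explicit control over the selection, whereas greedy algorithms, which iteratively add or remove variables based on simple measures of fit with Z, are more widely used. In practice, Forward Regression and Orthogonal Matching Pursuit are the two best-known greedy algorithms.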
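To make the greedy idea concrete, here is a minimal sketch of Orthogonal Matching Pursuit on raw data rather than on covariances. The names `omp`, `X` (one column per observed variable) and `z` (samples of the target) are my own, not the paper's, and the sketch assumes the columns of `X` have been scaled to comparable norms so that the scores are comparable.

```python
import numpy as np

def omp(X, z, k):
    """Orthogonal matching pursuit: pick k columns of X to approximate z.

    Assumption: columns of X are on a comparable scale (e.g. unit norm).
    """
    selected = []
    residual = z.astype(float)
    coef = np.zeros(0)
    for _ in range(k):
        # Score each column by |inner product with the current residual|.
        scores = np.abs(X.T @ residual)
        if selected:
            scores[selected] = -np.inf  # never pick the same column twice
        selected.append(int(np.argmax(scores)))
        # Refit least squares on all selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(X[:, selected], z, rcond=None)
        residual = z - X[:, selected] @ coef
    return selected, coef
```

Forward Regression differs only in the selection rule: instead of scoring by correlation with the residual, it tries each candidate and keeps the one that improves the fit the most.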

2.

The matrix of covariances between the Xi and Xj is denoted by C, with entries ci,j = Cov(Xi, Xj). Similarly, we use b to denote the covariances between Z and the Xi, with entries bi = Cov(Z, Xi). With these definitions in place, subset selection can be stated as follows:

Given pairwise covariances among all variables, as well as a parameter k, find a set S ⊆ V of at most k variables Xi and a linear predictor Z' = Σ_{i∈S} αi Xi of Z, maximizing the squared multiple correlation R^2_{Z,S} = (Var(Z) − E[(Z − Z')^2]) / Var(Z).

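For a set S, we use C_S to denote the submatrix of C with row and column set S, and b_S to denote the vector with only entries bi for i ∈ S. With all variables normalized to zero mean and unit variance, as the paper assumes, the problem reduces to: Given C, b, and k, select a set S of at most k variables to maximize R^2_{Z,S} = b_S^T (C_S)^{-1} b_S.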
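The covariance form of the problem can be sketched in a few lines. The sketch assumes the objective is R^2_{Z,S} = b_S^T (C_S)^{-1} b_S, which holds when Z has unit variance and C_S is invertible; the function names `r_squared` and `forward_regression` are hypothetical, chosen here for illustration.

```python
import numpy as np

def r_squared(C, b, S):
    """R^2_{Z,S} = b_S^T C_S^{-1} b_S (unit-variance Z); 0 for the empty set."""
    S = list(S)
    if not S:
        return 0.0
    C_S = C[np.ix_(S, S)]  # rows and columns of C restricted to S
    b_S = b[S]
    return float(b_S @ np.linalg.solve(C_S, b_S))

def forward_regression(C, b, k):
    """Greedily add the variable that increases R^2 the most, k times."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(b)) if i not in selected]
        best = max(candidates, key=lambda i: r_squared(C, b, selected + [i]))
        selected.append(best)
    return selected, r_squared(C, b, selected)
```

With uncorrelated variables (C equal to the identity), R^2 is just the sum of the squared entries of b_S, so the greedy choice is exactly optimal; the interesting cases are those where C has strong off-diagonal entries.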

3.

Dictionary selection is restated in the same way:

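Given all pairwise covariances among the Zj and Xi, and parameters d and k, find a set D of at most d variables from {X1, ..., Xn}, maximizing F(D) = Σ_{j=1}^{s} max_{S ⊆ D, |S| = k} R^2_{Zj,S}, where Z1, ..., Zs are the variables to be predicted.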
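As a rough illustration of the dictionary selection objective, the sketch below evaluates F(D) = Σ_j max_{S ⊆ D, |S| = k} R^2_{Zj,S} by brute force, which is how I understand the definition. It is only practical for tiny D. The names `dictionary_objective` and `B` (one row of covariances Cov(Zj, Xi) per target Zj) are my own, and unit-variance targets are assumed.

```python
from itertools import combinations

import numpy as np

def r_squared(C, b, S):
    """R^2_{Z,S} = b_S^T C_S^{-1} b_S (unit-variance Z); 0 for the empty set."""
    S = list(S)
    if not S:
        return 0.0
    return float(b[S] @ np.linalg.solve(C[np.ix_(S, S)], b[S]))

def dictionary_objective(C, B, D, k):
    """Sum over targets of the best R^2 achievable with k variables from D."""
    D = list(D)
    size = min(k, len(D))  # a dictionary smaller than k is used in full
    total = 0.0
    for b in B:  # one covariance vector per target Z_j
        total += max(r_squared(C, b, S) for S in combinations(D, size))
    return total
```

Each target picks its own best k-subset of the shared dictionary D, which is what distinguishes this problem from running subset selection once.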

4. Submodularity ratio

The submodularity ratio is arguably one of the key concepts introduced in this paper. The authors describe it as follows: We introduce the notion of submodularity ratio for a general set function, which captures "how close" to submodular the function is.

Let f be a non-negative set function. The submodularity ratio of f with respect to a set U and a parameter k ≥ 1 is γ_{U,k}(f) = min_{L ⊆ U, S: |S| ≤ k, S ∩ L = ∅} [Σ_{x∈S} (f(L ∪ {x}) − f(L))] / [f(L ∪ S) − f(L)], thus it captures how much more f can increase by adding any subset S of size k to L, compared to the combined benefits of adding its individual elements to L. If f is specifically the R^2 objective defined on the variables Xi, then we omit f and simply write γ_{U,k}; one then substitutes the R^2 expression given earlier and simplifies. If f is submodular, then γ_{U,k} ≥ 1 for all U and k.
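For very small problems the submodularity ratio can be computed by direct enumeration, which helps build intuition. The sketch below assumes the definition γ_{U,k}(f) = min over L ⊆ U and disjoint non-empty S with |S| ≤ k of Σ_{x∈S} (f(L ∪ {x}) − f(L)) / (f(L ∪ S) − f(L)). Skipping the pairs whose denominator is zero is my own convention, not something taken from the paper, and all function names are hypothetical.

```python
from itertools import chain, combinations

import numpy as np

def r_squared(C, b, S):
    """R^2_{Z,S} = b_S^T C_S^{-1} b_S (unit-variance Z); 0 for the empty set."""
    S = list(S)
    if not S:
        return 0.0
    return float(b[S] @ np.linalg.solve(C[np.ix_(S, S)], b[S]))

def subsets(items, max_size):
    """All subsets of items with at most max_size elements (including empty)."""
    return chain.from_iterable(combinations(items, r) for r in range(max_size + 1))

def submodularity_ratio(f, ground, U, k, tol=1e-12):
    """Brute-force gamma_{U,k}(f); exponential cost, for tiny examples only."""
    U = list(U)
    ratio = np.inf
    for L in subsets(U, len(U)):
        L = list(L)
        rest = [x for x in ground if x not in L]
        for S in subsets(rest, k):
            if not S:
                continue
            joint = f(L + list(S)) - f(L)
            if joint <= tol:
                continue  # assumption: ignore pairs where S adds nothing to L
            singles = sum(f(L + [x]) - f(L) for x in S)
            ratio = min(ratio, singles / joint)
    return ratio
```

For example, with two strongly correlated variables (correlation 0.9) where only the first is correlated with Z, adding both together helps far more than the sum of adding each alone, so the ratio drops well below 1; with uncorrelated variables R^2 is additive and the ratio is exactly 1.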

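Summary: The paper is heavy on formulas and derivation notation, and I found it somewhat hard going. The points above are about all I could follow; I only skimmed the bound analysis and the experimental results in the second half. Finally, thanks to my teacher Cui Rui for helping print the paper; I expect to keep troubling him in the future.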