SVM Theory and Experiments, Part 21: Using a Custom Kernel Function


Dr. Xu Haijiao (徐海蛟)

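In real-world applications the features of the data can be complex, and the four built-in kernel functions may not give the best results. In that case a custom kernel function is needed. Many experts have already designed such kernels, and we can plug their work in through the custom-kernel mechanism.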

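How is it used? The input files no longer hold the raw feature vectors of the training and testing data. Instead they hold the kernel matrix, that is, the kernel value of each instance against every training instance. In LIBSVM this is the precomputed kernel type, selected with -t 4.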

Assume there are L training instances x1, ..., xL (that is, L rows of training samples).
Let K(x, y) be the kernel value of two instances x and y. The input formats are:
New training instance for xi:
<label> 0:i 1:K(xi,x1) ... L:K(xi,xL)

New testing instance for any x:
<label> 0:? 1:K(x,x1) ... L:K(x,xL)

That is, in the training file the first column must be the "ID" of xi. In testing, ? can be any value.

All kernel values including ZEROs must be explicitly provided. Any permutation or random subsets of the training/testing files are also valid (see examples below).

Note: the format is slightly different from the precomputed kernel
package released in libsvmtools earlier.
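
The line layout described above can be sketched in a few lines of Python. The helper name `format_instance` and its parameters are hypothetical and are not part of LIBSVM. It assumes the kernel values for one instance have already been computed and are passed in order x1, ..., xL.

```python
def format_instance(label, serial, kernel_row):
    """Build one line: '<label> 0:serial 1:K(x,x1) ... L:K(x,xL)'.

    Hypothetical helper for illustration only.
    """
    # Column 0 carries the instance ID (any value is allowed for test data).
    fields = [f"0:{serial}"]
    # Kernel columns are numbered from 1, and zeros are written explicitly.
    fields += [f"{j}:{value:g}" for j, value in enumerate(kernel_row, start=1)]
    return f"{label} " + " ".join(fields)


if __name__ == "__main__":
    # One instance with label 15, ID 1, and three kernel values.
    print(format_instance(15, 1, [4, 6, 1]))
```

Because the function only formats values, any kernel can be used to produce `kernel_row`, including a custom one.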

Example:
Assume the original training data has three four-feature instances and the testing data has one instance:
15 1:1 2:1 3:1 4:1
45 2:3 4:3
25 3:1
-----------------------------------
15 1:1 3:1

If the linear kernel is used, we have the following new training/testing sets:
15 0:1 1:4 2:6 3:1
45 0:2 1:6 2:18 3:0
25 0:3 1:1 2:0 3:1
-------------------------------------
15 0:? 1:2 2:0 3:1

? can be any value.
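
The kernel values in this example can be checked with a short script. This is a minimal sketch: the function names `parse_sparse`, `linear_kernel`, and `kernel_rows` are made up for illustration, and features missing from a sparse line are assumed to be zero.

```python
def parse_sparse(line):
    """Split '<label> idx:val ...' into (label, {idx: val})."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        idx, val = pair.split(":")
        features[int(idx)] = float(val)
    return label, features


def linear_kernel(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b[i] for i, v in a.items() if i in b)


def kernel_rows(samples, train, kernel):
    """For each sample, list its kernel value against every training instance."""
    return [[kernel(x, xt) for _, xt in train] for _, x in samples]


train = [parse_sparse(s) for s in ["15 1:1 2:1 3:1 4:1", "45 2:3 4:3", "25 3:1"]]
test = [parse_sparse(s) for s in ["15 1:1 3:1"]]

train_K = kernel_rows(train, train, linear_kernel)
test_K = kernel_rows(test, train, linear_kernel)

if __name__ == "__main__":
    for row in train_K:
        print(row)
    print(test_K[0])
```

Passing a different function in place of `linear_kernel` is how a custom kernel would enter this sketch. The rest of the procedure stays the same.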

Any subset of the above training file is also valid. For example,
25 0:3 1:1 2:0 3:1
45 0:2 1:6 2:18 3:0
implies that the kernel matrix is:
[K(2,2) K(2,3)] = [18 0]
[K(3,2) K(3,3)] = [0 1]
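
To illustrate how the ID in column 0 identifies the kernel sub-matrix for a subset, here is a small sketch. The function name `sub_kernel_matrix` is hypothetical, and the code only mirrors the reading of the format given above. It is not LIBSVM's own implementation.

```python
def sub_kernel_matrix(lines):
    """Return (sorted IDs, sub-matrix restricted to those IDs)."""
    rows = {}
    for line in lines:
        _, *pairs = line.split()
        cols = {int(k): float(v) for k, v in (p.split(":") for p in pairs)}
        # Column 0 holds the ID of the training instance this row describes.
        serial = int(cols.pop(0))
        rows[serial] = cols
    ids = sorted(rows)
    # Keep only the kernel columns that belong to instances in the subset.
    return ids, [[rows[i][j] for j in ids] for i in ids]


subset = ["25 0:3 1:1 2:0 3:1", "45 0:2 1:6 2:18 3:0"]
ids, matrix = sub_kernel_matrix(subset)

if __name__ == "__main__":
    print(ids)
    for row in matrix:
        print(row)
```

The order of the lines in the file does not matter here, because each row is placed by its ID rather than by its position.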