A Brief Look at sklearn (7): Support Vector Machines

Source: Internet. Editor: 程序博客网. Date: 2024/06/06 03:46

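Support vector machines grew out of the perceptron but are far more capable. SVMs are widely used for regression, classification and anomaly detection, and the kernel trick lets them perform nonlinear classification by implicitly mapping the data into a high-dimensional feature space. For a detailed introduction to SVMs, look it up yourself; good references are [统计学习方法, 李航], [cs229, Andrew Ng] and July's blog post 理解svm的三层境界.
sklearn provides a number of SVMs and variants for different use cases, including: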

SVC
LinearSVC
NuSVC
SVR
LinearSVR
NuSVR
OneClassSVM

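SVC, LinearSVC and NuSVC are classifiers, and SVR, LinearSVR and NuSVR are regressors. LinearSVC and LinearSVR correspond to SVC and SVR with kernel='linear'. SVC and NuSVC are similar methods and both support kernels; the latter is parameterized by nu instead of C.
The last one, OneClassSVM, is for unsupervised novelty detection. Note that this is different from outlier detection: in novelty detection the training set contains no anomalies, and the fitted model is used to predict whether new samples are anomalous, whereas in outlier detection the training samples themselves contain anomalies.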
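To make the novelty-detection setting concrete, here is a minimal sketch. The data is synthetic and the values of gamma and nu are arbitrary choices for illustration, not recommendations: the model is fitted on "clean" points only and then asked about new points.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Training set: only "normal" points, clustered around the origin.
X_train = 0.3 * rng.randn(200, 2)

# New observations: more normal points, plus a few far-away ones.
X_normal = 0.3 * rng.randn(20, 2)
X_novel = rng.uniform(low=4.0, high=6.0, size=(5, 2))

# nu bounds the fraction of training points treated as errors.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05)
clf.fit(X_train)

# predict() returns +1 for inliers and -1 for novelties.
pred_normal = clf.predict(X_normal)
pred_novel = clf.predict(X_novel)
print("flagged among normal points:", int((pred_normal == -1).sum()))
print("flagged among far-away points:", int((pred_novel == -1).sum()))
```

Note that fit() never sees an anomalous point here, which is exactly what separates this from outlier detection.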

Inheritance relationships between the classes

SVC and NuSVC inherit from the same class, BaseSVC(six.with_metaclass(ABCMeta, BaseLibSVM, ClassifierMixin))
SVR and NuSVR inherit from the same bases: BaseLibSVM, RegressorMixin
LinearSVC inherits from BaseEstimator, LinearClassifierMixin, _LearntSelectorMixin, SparseCoefMixin
LinearSVR inherits from LinearModel, RegressorMixin
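The hierarchy can also be inspected on whatever version of sklearn is installed. The sketch below only walks each class's MRO; base_names is a helper made up for this example, and the exact list of bases it prints is likely to differ between releases (helper mixins come and go), so treat the output as a snapshot rather than a specification.

```python
from sklearn.base import ClassifierMixin, RegressorMixin
from sklearn.svm import SVC, NuSVC, SVR, NuSVR, LinearSVC, LinearSVR


def base_names(cls):
    """Names of all ancestors of cls, in method-resolution order."""
    return [c.__name__ for c in cls.__mro__[1:]]


for cls in (SVC, NuSVC, SVR, NuSVR, LinearSVC, LinearSVR):
    print(cls.__name__, "->", base_names(cls))

# The four kernel SVMs share BaseLibSVM; the two linear ones do not.
libsvm_based = [c.__name__ for c in (SVC, NuSVC, SVR, NuSVR, LinearSVC, LinearSVR)
                if "BaseLibSVM" in base_names(c)]
print(libsvm_based)
```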

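SVC, NuSVC, SVR and NuSVR

These four are used in much the same way in sklearn. Underneath, all of them are implemented by calling libsvm, another well-known SVM library, and sklearn only adds a thin wrapper around it. They differ mainly in the initialization parameters C, nu, epsilon and impl. SVC is used as the example here, and differences are pointed out where they arise.

PS: The definitions of the four classes are listed here. You can skip them, read the explanation below first, and then come back.

Definition of the SVC class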

class SVC(BaseSVC):
    def __init__(self, C=1.0, kernel='rbf', degree=3, gamma='auto',
                 coef0=0.0, shrinking=True, probability=False,
                 tol=1e-3, cache_size=200, class_weight=None,
                 verbose=False, max_iter=-1, decision_function_shape=None,
                 random_state=None):
        super(SVC, self).__init__(
            impl='c_svc', kernel=kernel, degree=degree, gamma=gamma,
            coef0=coef0, tol=tol, C=C, nu=0., shrinking=shrinking,
            probability=probability, cache_size=cache_size,
            class_weight=class_weight, verbose=verbose, max_iter=max_iter,
            decision_function_shape=decision_function_shape,
            random_state=random_state)

Definition of the NuSVC class

class NuSVC(BaseSVC):
    def __init__(self, nu=0.5, kernel='rbf', degree=3, gamma='auto',
                 coef0=0.0, shrinking=True, probability=False,
                 tol=1e-3, cache_size=200, class_weight=None, verbose=False,
                 max_iter=-1, decision_function_shape=None, random_state=None):
        super(NuSVC, self).__init__(
            impl='nu_svc', kernel=kernel, degree=degree, gamma=gamma,
            coef0=coef0, tol=tol, C=0., nu=nu, shrinking=shrinking,
            probability=probability, cache_size=cache_size,
            class_weight=class_weight, verbose=verbose, max_iter=max_iter,
            decision_function_shape=decision_function_shape,
            random_state=random_state)

Definition of the SVR class

class SVR(BaseLibSVM, RegressorMixin):
    def __init__(self, kernel='rbf', degree=3, gamma='auto', coef0=0.0,
                 tol=1e-3, C=1.0, epsilon=0.1, shrinking=True,
                 cache_size=200, verbose=False, max_iter=-1):
        super(SVR, self).__init__(
            'epsilon_svr', kernel=kernel, degree=degree, gamma=gamma,
            coef0=coef0, tol=tol, C=C, nu=0., epsilon=epsilon, verbose=verbose,
            shrinking=shrinking, probability=False, cache_size=cache_size,
            class_weight=None, max_iter=max_iter, random_state=None)

Definition of the NuSVR class

class NuSVR(BaseLibSVM, RegressorMixin):
    def __init__(self, nu=0.5, C=1.0, kernel='rbf', degree=3,
                 gamma='auto', coef0=0.0, shrinking=True, tol=1e-3,
                 cache_size=200, verbose=False, max_iter=-1):
        super(NuSVR, self).__init__(
            'nu_svr', kernel=kernel, degree=degree, gamma=gamma, coef0=coef0,
            tol=tol, C=C, nu=nu, epsilon=0., shrinking=shrinking,
            probability=False, cache_size=cache_size, class_weight=None,
            verbose=verbose, max_iter=max_iter, random_state=None)

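Intuitively, the four differ in impl, which is obvious. Beyond that, at initialization:

    SVC takes a user-defined C, with nu = epsilon = 0; NuSVC takes a user-defined nu, with C = epsilon = 0
    epsilon is set to 0 in the initializer of BaseSVC
    SVR takes user-defined C and epsilon, with nu = 0; NuSVR takes user-defined C and nu, with epsilon = 0
    For SVR and NuSVR, probability, class_weight and random_state cannot be set; they are fixed to False, None and None

These are simply the parameters of the support vector regression and support vector classification algorithms.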
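A quick way to see which of these parameters each class actually exposes is to ask the estimators themselves. This is only a sketch built on get_params(); the four names checked are the ones discussed above, and other constructor arguments are ignored.

```python
from sklearn.svm import SVC, NuSVC, SVR, NuSVR

# Parameters singled out in the text above.
names = ("C", "nu", "epsilon", "class_weight")

exposed = {}
for cls in (SVC, NuSVC, SVR, NuSVR):
    params = cls().get_params()
    # Keep only those of the four names that this class lets the user set.
    exposed[cls.__name__] = {n for n in names if n in params}
    print(cls.__name__, sorted(exposed[cls.__name__]))
```

The parameters missing from a class are the ones its constructor hard-codes when calling the base class.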

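C: penalty coefficient on the slack variables ζ in soft-margin SVC (for data that is not linearly separable)
nu: float, an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors; takes values in (0, 1] and controls the number of support vectors
epsilon: float, the parameter of the SVR loss function, the epsilon-insensitive loss. It determines how far a prediction must be from the true value before that sample contributes to the loss. Because the loss is an absolute-value function, it has to be split into two parts during optimization and each part solved separately. See the section on kernel ridge regression.
kernel: str or callable, the kernel type.
Possible values are 'linear', 'poly', 'rbf', 'sigmoid' and 'precomputed' (slightly different from KernelRidge)
When a callable is used, it must accept two arrays of samples and return the kernel matrix between them (unlike KernelRidge, where the callable maps a pair of samples to a single float)
gamma: same meaning as in KernelRidge, except that when set to 'auto', gamma = 1/n_features
shrinking: bool, whether to use the shrinking heuristic
decision_function_shape: str, 'ovo' (one vs one) or 'ovr' (one vs rest)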
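Three of these parameters are easy to check directly. The sketch below uses four synthetic, well-separated clusters (the centers are arbitrary) and a hand-written linear kernel named my_linear_kernel, which is just an illustrative stand-in for any user-defined kernel.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Four well-separated 2-D clusters.
centers = [[0, 0], [10, 10], [-10, 10], [10, -10]]
X, y = make_blobs(n_samples=200, centers=centers, cluster_std=1.0,
                  random_state=0)
n_classes = len(centers)


# 1) A callable kernel gets two sample matrices and returns their Gram matrix.
def my_linear_kernel(A, B):
    return np.dot(A, B.T)


acc_callable = SVC(kernel=my_linear_kernel).fit(X, y).score(X, y)

# 2) gamma='auto' should behave exactly like gamma = 1 / n_features.
dec_auto = SVC(kernel="rbf", gamma="auto").fit(X, y).decision_function(X)
dec_manual = SVC(kernel="rbf", gamma=1.0 / X.shape[1]).fit(X, y).decision_function(X)

# 3) 'ovo' yields one column per pair of classes, 'ovr' one column per class.
shape_ovo = SVC(decision_function_shape="ovo").fit(X, y).decision_function(X).shape
shape_ovr = SVC(decision_function_shape="ovr").fit(X, y).decision_function(X).shape

print(acc_callable, np.allclose(dec_auto, dec_manual), shape_ovo, shape_ovr)
```

With k classes, 'ovo' produces k(k-1)/2 columns, so four classes make the two shapes easy to tell apart.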

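None of these four implements its own fit method; they all use fit() from BaseLibSVM. (BaseSVC inherits from BaseLibSVM. Because classification with SVC involves probability computation, there is this extra layer of inheritance, and BaseSVC defines some functions related to prediction.)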
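This claim can be checked without reading the source. A small sketch, relying only on ordinary Python attribute lookup: if "fit" is absent from a class's own namespace, the method must come from a base class.

```python
from sklearn.svm import SVC, NuSVC, SVR, NuSVR, LinearSVC

libsvm_estimators = (SVC, NuSVC, SVR, NuSVR)

# vars(cls) holds only what the class body itself defines, so a missing
# "fit" key means the method is inherited.
defines_own_fit = {cls.__name__: "fit" in vars(cls) for cls in libsvm_estimators}
print(defines_own_fit)

# All four resolve to the very same function object ...
shared_fit = all(cls.fit is SVC.fit for cls in libsvm_estimators)
# ... whereas LinearSVC, which sits in a different hierarchy, defines its own.
linear_has_own_fit = "fit" in vars(LinearSVC)
print(shared_fit, linear_has_own_fit)
```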

The fit() method of the BaseLibSVM class:

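def fit(self, X, y, sample_weight=None):
    # Type checks and validation of parameter combinations. Not every
    # combination of parameters is supported, so this part is fairly involved.
    # If X is sparse and the kernel is not a callable, call _sparse_fit();
    # otherwise call _dense_fit().
    ...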
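The dispatch rule at the end of fit() boils down to a single boolean. The function below is a hypothetical stand-alone restatement of that rule, written for this post; choose_fit_path and my_kernel are made-up names and are not part of sklearn.

```python
def choose_fit_path(x_is_sparse, kernel):
    """Return the name of the fitting routine the rule above would pick.

    Hypothetical helper for illustration only; not part of sklearn.
    """
    use_sparse = x_is_sparse and not callable(kernel)
    return "_sparse_fit" if use_sparse else "_dense_fit"


def my_kernel(A, B):
    """Stand-in for a user-defined kernel; never actually called here."""
    raise NotImplementedError


cases = [
    (False, "rbf"),      # dense X, built-in kernel
    (True, "rbf"),       # sparse X, built-in kernel
    (False, my_kernel),  # dense X, callable kernel
    (True, my_kernel),   # sparse X, callable kernel
]
paths = [choose_fit_path(is_sparse, kernel) for is_sparse, kernel in cases]
print(paths)
```

Only the combination of sparse input and a built-in kernel takes the sparse path; a callable kernel always goes through the dense one.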

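After all of this, the four SVM implementations in sklearn can be understood as calling the same libsvm method (in fact two: libsvm_sparse.libsvm_sparse_train() for sparse sample matrices and libsvm.fit() for dense ones), only with different parameters.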
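From the user's side, this means all four estimators accept the same data either as a dense array or as a scipy sparse matrix. A rough sketch on synthetic data follows; the regression target is an arbitrary linear function of the features, and the two code paths are expected to agree only up to small numerical differences.

```python
import numpy as np
from scipy import sparse
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, NuSVC, SVR, NuSVR

X, y_class = make_blobs(n_samples=100, centers=[[-5, -5], [5, 5]],
                        cluster_std=1.0, random_state=0)
y_reg = X[:, 0] + 0.5 * X[:, 1]   # arbitrary continuous target
X_sparse = sparse.csr_matrix(X)   # same data, sparse container

estimators = [
    (SVC(C=1.0), y_class),
    (NuSVC(nu=0.5), y_class),
    (SVR(C=1.0, epsilon=0.1), y_reg),
    (NuSVR(C=1.0, nu=0.5), y_reg),
]

max_gap = {}
for est, target in estimators:
    name = type(est).__name__
    pred_dense = est.fit(X, target).predict(X)
    pred_sparse = est.fit(X_sparse, target).predict(X_sparse)
    # Largest disagreement between the dense and the sparse code path.
    max_gap[name] = float(np.max(np.abs(pred_dense - pred_sparse)))
    print(name, max_gap[name])
```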

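LinearSVC and LinearSVR

These two are treated separately because their class hierarchy is not part of the same family as the four above, and they are comparatively simple.
LinearSVC can be thought of as SVC with kernel='linear'. The difference is that LinearSVC lets you choose the regularization type (l1, l2) and the loss function (hinge, squared_hinge). The same effect can also be obtained with linear_model.SGDClassifier, mentioned earlier, by adjusting its penalty and loss parameters, but the implementations differ: LinearSVC uses liblinear.train_wrap(), whereas SGDClassifier is trained with stochastic gradient descent.

LinearSVR relates to SVR in the same way.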
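As a rough comparison, the sketch below fits the linear variants on the same synthetic, linearly separable data. The hyperparameters are arbitrary, and dual is set explicitly because not every penalty/loss/dual combination is accepted by LinearSVC. The models are not expected to produce identical coefficients, only similar decisions.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC, LinearSVC

X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

models = {
    # libsvm with a linear kernel
    "SVC(kernel='linear')": SVC(kernel="linear", C=1.0),
    # liblinear, l2 penalty with hinge loss (requires the dual formulation)
    "LinearSVC l2 + hinge": LinearSVC(penalty="l2", loss="hinge", dual=True,
                                      C=1.0, max_iter=10000),
    # liblinear, l1 penalty with squared hinge loss (requires the primal)
    "LinearSVC l1 + squared_hinge": LinearSVC(penalty="l1", loss="squared_hinge",
                                              dual=False, C=1.0, max_iter=10000),
    # stochastic gradient descent on the same hinge objective
    "SGDClassifier l2 + hinge": SGDClassifier(loss="hinge", penalty="l2",
                                              max_iter=1000, tol=1e-3,
                                              random_state=0),
}

scores = {}
for name, model in models.items():
    scores[name] = model.fit(X, y).score(X, y)
    print(name, scores[name])
```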
