Learning sklearn: SVM Example Summary (PCA + Pipeline + CV + GridSearch)
Source: Internet  Editor: 程序博客网  Time: 2024/05/16 10:25
Introduction
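Tuning SVM hyperparameters does not really need to be this complicated. gamma is probably the more important of the two, so you can fix C=1 and tune gamma by hand. On top of that, sklearn's grid search is extremely slow: the code below takes at least half an hour to produce a result, and with some experience you do not need it at all. An experienced practitioner can probably tell which parameter to adjust just by looking at the learning curve. So why do it anyway? Maybe partly to show off, or maybe because it is more transparent: no need to work like an old-school herbalist who prescribes something, watches the effect, and then changes the prescription!
PCA: principal component analysis
Pipeline: the pipeline mechanism (official documentation)
GridSearch: official documentation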
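As a rough illustration of the "fix C=1 and tune gamma by hand" approach, here is a minimal sketch on the digits data that ships with sklearn. The candidate gamma values and the 3-fold cross-validation are assumptions chosen for illustration; they do not come from the original post.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

digits = load_digits()

# Hypothetical candidate values, picked only for this sketch.
gamma_candidates = [1e-5, 1e-4, 1e-3, 1e-2]

manual_scores = {}
for gamma in gamma_candidates:
    clf = SVC(C=1.0, gamma=gamma)  # C stays fixed at 1, only gamma varies
    scores = cross_val_score(clf, digits.data, digits.target, cv=3)
    manual_scores[gamma] = scores.mean()
    print("gamma=%g  mean CV accuracy=%.3f" % (gamma, manual_scores[gamma]))

# Pick the candidate with the highest mean cross-validated accuracy.
best_gamma = max(manual_scores, key=manual_scores.get)
print("best gamma among the candidates:", best_gamma)
```

This evaluates only four models, which is why it is so much cheaper than a full grid over both C and gamma.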
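To make the three pieces above concrete, the following minimal sketch chains PCA and an SVC in a Pipeline and shows the `<step name>__<parameter name>` convention that GridSearchCV relies on. The step names `reduce_dim` and `classify` follow the code later in this post; the specific parameter values are placeholders assumed for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

digits = load_digits()

# Two-step pipeline: dimensionality reduction, then classification.
pca_svc = Pipeline([
    ('reduce_dim', PCA(n_components=16)),
    ('classify', SVC(C=1.0, gamma=0.001)),  # placeholder values
])

# A parameter of a step is addressed as <step name>__<parameter name>.
# The same keys are what a GridSearchCV param_grid would use.
pca_svc.set_params(reduce_dim__n_components=8, classify__C=10)

pca_svc.fit(digits.data, digits.target)
train_accuracy = pca_svc.score(digits.data, digits.target)
print("PCA components:", pca_svc.named_steps['reduce_dim'].n_components)
print("training accuracy: %.3f" % train_accuracy)
```

The printed accuracy is measured on the training data, so it only confirms that the pipeline runs end to end; it says nothing about generalization.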
Method
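The modified code is given below, with comments throughout; take it and tune it at your own pace:
The data ships with sklearn and is small. On competition-sized data this simply would not run; it is far too slow!
Official example: comparing three dimensionality reduction methods, PCA, NMF (non-negative matrix factorization), and KBest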
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
=================================================================
Selecting dimensionality reduction with Pipeline and GridSearchCV
=================================================================

This example constructs a pipeline that does dimensionality
reduction followed by prediction with a support vector
classifier. It demonstrates the use of GridSearchCV and
Pipeline to optimize over different classes of estimators in a
single CV run -- unsupervised PCA and NMF dimensionality
reductions are compared to univariate feature selection during
the grid search.
"""
# Authors: Robert McGibbon, Joel Nothman

from __future__ import print_function, division

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedShuffleSplit  # stratified shuffle-split cross-validation
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2

digits = load_digits()
print(__doc__)

pipe = Pipeline([
    ('reduce_dim', PCA()),
    ('classify', LinearSVC())
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]
reducer_labels = ['PCA', 'NMF', 'KBest(chi2)']

# note: this splitter is defined but not used below; GridSearchCV is called with cv=3
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
grid = GridSearchCV(pipe, cv=3, n_jobs=2, param_grid=param_grid)
grid.fit(digits.data, digits.target)

mean_scores = np.array(grid.cv_results_['mean_test_score'])
# scores are in the order of param_grid iteration, which is alphabetical
mean_scores = mean_scores.reshape(len(C_OPTIONS), -1, len(N_FEATURES_OPTIONS))
# select score for best C
mean_scores = mean_scores.max(axis=0)
bar_offsets = (np.arange(len(N_FEATURES_OPTIONS)) *
               (len(reducer_labels) + 1) + .5)

plt.figure()
COLORS = 'bgrcmyk'
for i, (label, reducer_scores) in enumerate(zip(reducer_labels, mean_scores)):
    plt.bar(bar_offsets + i, reducer_scores, label=label, color=COLORS[i])

plt.title("Comparing feature reduction techniques")
plt.xlabel('Reduced number of features')
plt.xticks(bar_offsets + len(reducer_labels) / 2, N_FEATURES_OPTIONS)
plt.ylabel('Digit classification accuracy')
plt.ylim((0, 1))
plt.legend(loc='upper left')
plt.show()
Finding the optimal hyperparameters:
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 26 22:06:34 2017

@author: qiu
"""
from __future__ import print_function, division

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedShuffleSplit  # stratified shuffle-split cross-validation
from sklearn.svm import SVC
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2

digits = load_digits()

# Grid search visualization: heat map
pipe = Pipeline(steps=[
    ('classify', SVC())
])

# logspace(a, b, N) returns N points evenly spaced on a log scale from 10**a to 10**b
C_range = np.logspace(-2, 1, 4)
gamma_range = np.logspace(-9, -6, 4)
param_grid = [
    {
        'classify__C': C_range,
        'classify__gamma': gamma_range
    },
]

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
grid = GridSearchCV(pipe, param_grid=param_grid, cv=cv)  # grid search based on cross-validation
grid.fit(digits.data, digits.target)

# report the best hyperparameters found
print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))
To be continued...
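For the grid search heat-map visualization, see "Learning sklearn: SVM Example Summary 3 (grid search + cross-validation: finding the optimal hyperparameters)".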
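A heat map of this kind is drawn from a C-by-gamma matrix of mean cross-validation scores. The sketch below only builds and prints that matrix. It is not the code from the referenced post: the bare SVC (no Pipeline) and cv=3 are assumptions made to keep it short and fast, and the C and gamma ranges are copied from the code above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

digits = load_digits()
C_range = np.logspace(-2, 1, 4)
gamma_range = np.logspace(-9, -6, 4)

# cv=3 is an assumption to keep the run short.
grid = GridSearchCV(SVC(), param_grid={'C': C_range, 'gamma': gamma_range}, cv=3)
grid.fit(digits.data, digits.target)

# Fill a matrix whose rows index C and whose columns index gamma.
# Looking each parameter up explicitly avoids relying on result ordering.
C_values = list(C_range)
gamma_values = list(gamma_range)
score_matrix = np.zeros((len(C_values), len(gamma_values)))
for params, score in zip(grid.cv_results_['params'],
                         grid.cv_results_['mean_test_score']):
    row = C_values.index(params['C'])
    col = gamma_values.index(params['gamma'])
    score_matrix[row, col] = score

print(np.round(score_matrix, 3))
```

Passing `score_matrix` to `matplotlib.pyplot.imshow`, with `gamma_range` on the x-axis ticks and `C_range` on the y-axis ticks, would give the heat map itself.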
For plotting the learning curve after obtaining the best parameters, see the Kaggle competition post "Titanic: Machine Learning from Disaster".
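The numbers behind a learning curve can be computed with sklearn's `learning_curve` helper. The following is a minimal sketch, not the code from the referenced post: the values C=1 and gamma=0.001 are placeholders standing in for whatever the grid search reports as best, and cv=3 and the five training sizes are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

digits = load_digits()

# Placeholder hyperparameters; substitute grid.best_params_ in practice.
estimator = SVC(C=1.0, gamma=0.001)

train_sizes, train_scores, test_scores = learning_curve(
    estimator, digits.data, digits.target,
    train_sizes=np.linspace(0.2, 1.0, 5),  # fractions of each training fold
    cv=3)

# Average over the cross-validation folds for each training size.
for n, tr, te in zip(train_sizes,
                     train_scores.mean(axis=1),
                     test_scores.mean(axis=1)):
    print("n_train=%4d  train acc=%.3f  validation acc=%.3f" % (n, tr, te))
```

Plotting the two averaged score columns against `train_sizes` gives the curve. A large, persistent gap between them suggests overfitting, while two low curves that converge suggest underfitting.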