scikit-learn Cross-Validation: Plotting and Principles in Practice  Category: Machine Learning, Sklearn
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 09:03
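Cross-validation produces one score per fold; averaging the folds gives the mean squared error (for regression) or the mean classification accuracy (for classification).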
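As a minimal sketch of that averaging step (the datasets, the estimators and the variable names here are illustrative choices, not part of the original post): `cross_val_score` returns one value per fold, and the mean is taken by the caller. For regressors the default score is R², so a mean squared error has to be requested explicitly via `scoring="neg_mean_squared_error"`, which returns negated values.

```python
from sklearn import datasets, linear_model, svm
from sklearn.model_selection import cross_val_score

# Classification: one accuracy value per fold, averaged by hand.
iris = datasets.load_iris()
clf = svm.SVC(kernel="linear", C=1)
accuracies = cross_val_score(clf, iris.data, iris.target, cv=5)
mean_accuracy = accuracies.mean()

# Regression: request the (negated) mean squared error explicitly,
# because the default score of a regressor is R^2.
diabetes = datasets.load_diabetes()
lr = linear_model.LinearRegression()
neg_mse = cross_val_score(lr, diabetes.data, diabetes.target, cv=5,
                          scoring="neg_mean_squared_error")
mean_mse = -neg_mse.mean()  # flip the sign back to a positive MSE

print(accuracies, mean_accuracy)
print(mean_mse)
```

The sign flip exists because scikit-learn scorers follow a "greater is better" convention.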
    from sklearn import datasets
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # the same functions now live in sklearn.model_selection.
    from sklearn.model_selection import cross_val_predict
    from sklearn import linear_model
    import matplotlib.pyplot as plt

    lr = linear_model.LinearRegression()
    # Note: load_boston was removed in scikit-learn 1.2. On newer versions,
    # substitute another regression dataset, e.g. datasets.load_diabetes().
    boston = datasets.load_boston()
    y = boston.target

    # cross_val_predict returns an array of the same size as `y` where each entry
    # is a prediction obtained by cross-validation
    predicted = cross_val_predict(lr, boston.data, y, cv=10)

    fig, ax = plt.subplots()
    ax.scatter(y, predicted)
    # Dashed black diagonal: points on this line are predicted perfectly
    ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
    ax.set_xlabel("Measured")
    ax.set_ylabel("Predicted")
    plt.show()
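In the ax.plot call above, the format string 'k--' has two parts: k sets the line color to black and -- selects a dashed line style. lw sets the line width.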
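To check how such a format string is interpreted, the resulting line object can be inspected. This is a small sketch only; the `Agg` backend is an assumption made so that it runs without a display, and the data points are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# ax.plot returns a list of Line2D objects; unpack the single line.
(line,) = ax.plot([0, 1], [0, 1], "k--", lw=4)

rgb = mcolors.to_rgb(line.get_color())  # color as an (r, g, b) tuple
style = line.get_linestyle()            # line style string
width = line.get_linewidth()            # line width in points

print(rgb, style, width)
```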
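The code below unpacks the cross-validation procedure behind the call above by reimplementing the splitting and scoring by hand: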
    import numpy as np
    from sklearn import datasets
    # sklearn.cross_validation was removed in scikit-learn 0.20;
    # model_selection provides the same functions.
    from sklearn import model_selection
    from sklearn import svm

    iris = datasets.load_iris()
    print(iris.data.shape, iris.target.shape)

    X_train, X_test, y_train, y_test = model_selection.train_test_split(
        iris.data, iris.target, test_size=0.4, random_state=0)
    print(X_train.shape, y_train.shape)
    print(X_test.shape, y_test.shape)

    '''
    statas_dict = dict()
    for element in iris.target:
        if statas_dict.get(element):
            statas_dict[element] += 1
        else:
            statas_dict[element] = 1
    print(statas_dict)
    '''
    # print(np.bincount(iris.target))
    # print(np.unique(iris.target, return_counts=True))

    # Single train/test split
    clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
    print(clf.score(X_test, y_test))

    # Library cross-validation, for comparison
    clf = svm.SVC(kernel='linear', C=1)
    scores = model_selection.cross_val_score(clf, iris.data, iris.target, cv=5)
    print(scores)

    def generate_split_sample(data, target, size_ratio):
        # test_size may be a float (fraction) or an int (absolute sample count)
        return model_selection.train_test_split(data, target, test_size=size_ratio)

    # cv is the number of folds we need;
    # the per-fold size is passed as an absolute number of samples
    def sample_split(data, target, cv):
        size_num = int(data.shape[0] / cv)
        ori_data = data
        ori_target = target
        ListRequire = []
        for i in range(cv):
            if ori_data.shape[0] > size_num:
                # Split off one fold and keep sampling from the remainder
                X_train, X_test, y_train, y_test = generate_split_sample(ori_data, ori_target, size_num)
                ListRequire.append((X_test, y_test))
                ori_data = X_train
                ori_target = y_train
            else:
                # Whatever is left becomes the last fold
                ListRequire.append((ori_data, ori_target))
        return ListRequire

    # print(sample_split(iris.data, iris.target, 5))

    def return_score(train_data, train_target, test_data, test_target):
        clf = svm.SVC(kernel='linear', C=1).fit(train_data, train_target)
        return clf.score(test_data, test_target)

    def return_scores(data, target, cv):
        ListRequire = []
        splitList = sample_split(data, target, cv)
        for i in range(len(splitList)):
            # Fold i is the test set; all other folds form the training set
            test_data, test_target = splitList[i]
            otherIndexs = set(range(len(splitList)))
            otherIndexs.remove(i)
            train_data = None
            train_target = None
            for j in otherIndexs:
                if type(train_data) == type(None):  # `train_data is None` also works
                    train_data, train_target = splitList[j]
                else:
                    train_data = np.append(train_data, splitList[j][0], axis=0)
                    train_target = np.append(train_target, splitList[j][1], axis=0)
            ListRequire.append(return_score(train_data, train_target, test_data, test_target))
        return ListRequire

    print(return_scores(iris.data, iris.target, 5))
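Here the fold sampling is implemented by hand, and the resulting scores are comparable to those of cross_val_score (the exact values vary from run to run because the hand-made folds are drawn at random).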
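Because the hand-written folds are random, they cannot reproduce the library scores exactly. For a classifier and an integer `cv`, `cross_val_score` uses stratified K-fold splitting without shuffling, so a manual loop over `StratifiedKFold` should give the same numbers. The following is a sketch of that idea, not the library's internal code; `manual_scores` and `library_scores` are illustrative names.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Manual loop: fit on the training indices, score on the held-out fold.
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    clf = svm.SVC(kernel="linear", C=1).fit(X[train_idx], y[train_idx])
    manual_scores.append(clf.score(X[test_idx], y[test_idx]))

# Library call with the same estimator settings and number of folds.
library_scores = cross_val_score(svm.SVC(kernel="linear", C=1), X, y, cv=5)

print(np.array(manual_scores))
print(library_scores)
```

Working with index arrays, as here, also avoids copying and re-appending the fold data on every iteration.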
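random_state: the seed of the pseudo-random number generator; a fixed value makes the split reproducible. test_size: determines the share of all samples that goes into the randomly drawn test set; a float is read as a fraction, an int as an absolute number of samples.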
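A short sketch of both parameters on the iris data (150 samples); the variable names are made up for this illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Float test_size: a fraction of the samples (0.4 * 150 = 60 test samples).
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

# Int test_size: an absolute number of test samples.
_, X_te_abs, _, _ = train_test_split(
    iris.data, iris.target, test_size=30, random_state=0)

# The same random_state yields the same split again.
_, X_te_again, _, _ = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

print(X_tr.shape, X_te.shape, X_te_abs.shape)
print(np.array_equal(X_te, X_te_again))
```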
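svm.SVC: SVC stands for support vector classification. The parameter C is the penalty parameter of the error term.
clf.score(X, y): returns the mean accuracy of the fitted model on the given data (X, y), which may be the training data or a held-out test set.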
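To make the meaning of `score` concrete, the sketch below compares it with the fraction of correct predictions computed by hand. The split parameters mirror the ones used earlier in the post; `manual_accuracy` is an illustrative name.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

clf = svm.SVC(kernel="linear", C=1).fit(X_train, y_train)

# score() on held-out data: fraction of test samples classified correctly.
test_accuracy = clf.score(X_test, y_test)
manual_accuracy = np.mean(clf.predict(X_test) == y_test)

# score() on the training data is usually more optimistic.
train_accuracy = clf.score(X_train, y_train)

print(test_accuracy, manual_accuracy, train_accuracy)
```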
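There are two ways to count how often each class occurs in an ndarray; the second is recommended.
np.bincount(ndarray_object): returns only an ndarray of counts, indexed by value in ascending order (the input must contain non-negative integers).
np.unique(ndarray_object, return_counts = False): when return_counts is set to True, it returns both the distinct classes and their counts.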
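The difference shows up when the labels have gaps. In this small sketch (the `labels` array is invented for illustration), `np.bincount` reports a count for every integer up to the maximum, while `np.unique` pairs each value that occurs with its count.

```python
import numpy as np

labels = np.array([0, 0, 2, 2, 2, 5])

# Position i holds the number of occurrences of the value i.
counts_by_index = np.bincount(labels)
print(counts_by_index)  # [2 0 3 0 0 1]

# Distinct values and their counts, as two aligned arrays.
values, counts = np.unique(labels, return_counts=True)
print(values)  # [0 2 5]
print(counts)  # [2 3 1]
```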
np.append(first_ndarray, second_ndarray, axis = 0): extends the first array vertically, i.e. appends the rows of the second array along axis 0.
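A small sketch of the effect of `axis` (the arrays are arbitrary): with `axis=0` rows are stacked, while omitting `axis` flattens both inputs first.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# axis=0: the rows of b are appended below the rows of a.
rows = np.append(a, b, axis=0)
print(rows.shape)  # (3, 2)

# No axis: both arrays are flattened before appending.
flat = np.append(a, b)
print(flat)  # [1 2 3 4 5 6]
```

Note that `np.append` returns a new array each time, so collecting the pieces in a list and calling `np.concatenate` once is usually cheaper when many blocks are joined.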
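The code uses a type comparison here (type(train_data) == type(None)). The same check works for np.ndarray: it is enough to construct an empty object, i.e. type(np.array([0, 1])) == type(np.ndarray([0])).
Constructing an ndarray directly is also the simplest (and possibly the fastest) way to initialize a multi-dimensional array, although its contents are uninitialized until they are assigned.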
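The following sketch illustrates both remarks. `np.ndarray(shape)` allocates memory without filling it, much like `np.empty`; `isinstance` is shown as a common alternative to comparing types. The variable names are illustrative.

```python
import numpy as np

# The type comparison from the text: both sides are numpy.ndarray.
same_type = type(np.array([0, 1])) == type(np.ndarray([0]))
print(same_type)  # True

# np.ndarray([0]) is an empty array of shape (0,).
print(np.ndarray([0]).shape)  # (0,)

# Direct construction allocates a 2x3 array with arbitrary contents.
buf = np.ndarray((2, 3))
also_uninitialized = np.empty((2, 3))

# When defined starting values are needed, np.zeros is the safer choice.
zeros = np.zeros((2, 3))

# isinstance is a common alternative to comparing types directly.
print(isinstance(buf, np.ndarray))  # True
```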
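According to the documentation, cross_val_predict and cross_val_score share the same interface. The difference is that the former returns, for each sample, the prediction made by the model fitted while that sample was in the test fold. The details are omitted here.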
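A sketch of what that means in practice, assuming the diabetes dataset bundled with scikit-learn and a plain `KFold` splitter (both are illustrative choices): each sample's prediction comes from a model that did not see that sample during fitting.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import KFold, cross_val_predict

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
cv = KFold(n_splits=10)

# Library call: one out-of-fold prediction per sample.
predicted = cross_val_predict(linear_model.LinearRegression(), X, y, cv=cv)

# Manual equivalent: fit on nine folds, predict the held-out fold.
manual = np.empty_like(y, dtype=float)
for train_idx, test_idx in cv.split(X):
    model = linear_model.LinearRegression().fit(X[train_idx], y[train_idx])
    manual[test_idx] = model.predict(X[test_idx])

print(predicted.shape, y.shape)
print(np.allclose(predicted, manual))
```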
For more, see: http://blog.csdn.net/sinat_30665603