Udacity - Machine Learning Fundamentals - Regression

Source: Internet  Published by: 管理数据分析  Editor: 程序博客网  Time: 2024/06/05 04:20

A collection of the sklearn code used:

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(ages_train, net_worths_train)

# The shape of the argument to predict must match the dimensionality of the
# test features. The features in this exercise are two-dimensional, so the
# call has to be reg.predict([[27]]). An array of features to predict on can
# be passed here instead of the literal value.
print "Katie's net worth prediction: ", reg.predict([[27]])
print "slope:", reg.coef_
print "intercept:", reg.intercept_

print "\n ############ stats on test dataset ############\n"
print "r-squared score:", reg.score(ages_test, net_worths_test)

print "\n ############ stats on training dataset ############\n"
print "r-squared score:", reg.score(ages_train, net_worths_train)
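
The note about reg.predict([[27]]) is easier to see on a tiny example. The sketch below is not part of the course code: it is written for Python 3 and a recent scikit-learn, and the ages and net worths are made-up, noise-free numbers (net worth = 6.25 * age) chosen only so the result is easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data made up for illustration:
# net worth = 6.25 * age
ages = np.array([20, 30, 40, 50, 60], dtype=float)
net_worths = 6.25 * ages

# scikit-learn expects features shaped (n_samples, n_features),
# so the 1-D array of ages is reshaped into a single column.
ages_2d = ages.reshape(-1, 1)

reg = LinearRegression()
reg.fit(ages_2d, net_worths)

# A single query point must be 2-D as well: one row, one column.
prediction = reg.predict([[27]])

print("prediction for age 27:", prediction[0])
print("slope:", reg.coef_[0])
print("intercept:", reg.intercept_)
print("r-squared on the training data:", reg.score(ages_2d, net_worths))
```

Because the data lie exactly on a line, the fitted slope should come out at 6.25 with an intercept near zero. Real data such as the course's ages/net worths set will not fit this cleanly.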

Udacity - Machine Learning Fundamentals - Regression (Linear Regression)

Mini-project code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

"""
    Starter code for the regression mini-project.

    Loads up/formats a modified version of the dataset
    (why modified?  we've removed some trouble points
    that you'll find yourself in the outliers mini-project).

    Draws a little scatterplot of the training/testing data

    You fill in the regression code where indicated:
"""

import sys
import pickle
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit
dictionary = pickle.load( open("../final_project/final_project_dataset_modified.pkl", "r") )

### list the features you want to look at--first item in the
### list will be the "target" feature
# relationship between salary and bonus
#features_list = ["bonus", "salary"]
# relationship between long_term_incentive and bonus
features_list = ["bonus", "long_term_incentive"]
data = featureFormat( dictionary, features_list, remove_any_zeroes=True)
target, features = targetFeatureSplit( data )

### training-testing split needed in regression, just like classification
from sklearn.cross_validation import train_test_split
feature_train, feature_test, target_train, target_test = train_test_split(features, target, test_size=0.5, random_state=42)
train_color = "b"
test_color = "r"

### Your regression goes here!
### Please name it reg, so that the plotting code below picks it up and
### plots it correctly. Don't forget to change the test_color above from "b" to
### "r" to differentiate training points from test points.

# Train the model, then read off the slope (stored in the reg.coef_
# attribute) and the intercept.
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit( feature_train, target_train )
print "slope: ", reg.coef_
print "intercept:", reg.intercept_

# Compute the regression score on the training data
print "\n ############ stats on train dataset ############\n"
print "r-squared score:", reg.score(feature_train, target_train)

# Compute the regression score on the test data
print "\n ############ stats on test dataset ############\n"
print "r-squared score:", reg.score(feature_test, target_test)

### draw the scatterplot, with color-coded training and testing points
import matplotlib.pyplot as plt
for feature, target in zip(feature_test, target_test):
    plt.scatter( feature, target, color=test_color )
for feature, target in zip(feature_train, target_train):
    plt.scatter( feature, target, color=train_color )

### labels for the legend
plt.scatter(feature_test[0], target_test[0], color=test_color, label="test")
plt.scatter(feature_test[0], target_test[0], color=train_color, label="train")

### draw the regression line, once it's coded
try:
    plt.plot( feature_test, reg.predict(feature_test) )
except NameError:
    pass
plt.xlabel(features_list[1])
plt.ylabel(features_list[0])
plt.legend()
plt.show()
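
The split/fit/score steps of the mini-project can also be tried without the course dataset. The following is only a sketch under stated assumptions: it uses Python 3, imports train_test_split from sklearn.model_selection (the sklearn.cross_validation module used above was removed in scikit-learn 0.20), and replaces the Enron features with synthetic data generated from a made-up line (slope 3, intercept 5) plus Gaussian noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data (an assumption, not the real
# dataset): target = 3 * feature + 5 plus Gaussian noise.
rng = np.random.RandomState(42)
features = rng.uniform(0, 10, size=(100, 1))
target = 3.0 * features[:, 0] + 5.0 + rng.normal(0.0, 1.0, size=100)

# Same split settings as the mini-project: half train, half test.
feature_train, feature_test, target_train, target_test = train_test_split(
    features, target, test_size=0.5, random_state=42)

reg = LinearRegression()
reg.fit(feature_train, target_train)

# Score on both halves; a large gap between the two would suggest the
# fit does not generalize beyond the training points.
train_score = reg.score(feature_train, target_train)
test_score = reg.score(feature_test, target_test)

print("slope:", reg.coef_[0])
print("intercept:", reg.intercept_)
print("r-squared on train:", train_score)
print("r-squared on test:", test_score)
```

On data this clean the two scores should be close to each other. On the real bonus/long_term_incentive features the test score can be much lower than the training score, which is the comparison the mini-project asks for.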