《Python机器学习及实践-从零开始通往Kaggle竞赛之路》 (Python Machine Learning and Practice: From Scratch to Kaggle Competitions), code for Python 3.6: chapter1.1

Source: Internet. Editor: 程序博客网. Time: 2024/05/29 18:36

The code in this blog is a Python 3.6 implementation of the examples from the book 《Python机器学习及实践-从零开始通往Kaggle竞赛之路》. The required libraries are the latest versions as of 2017/12/8.

chapter1_1

import pandas as pd  # import the pandas library

# Read the data. If the script is not located next to the Datasets directory, adjust the paths.
df_train = pd.read_csv('../Datasets/Breast-Cancer/breast-cancer-train.csv')
df_test = pd.read_csv('../Datasets/Breast-Cancer/breast-cancer-test.csv')

print(df_train.head(5))  # show the first 5 rows of df_train to get a feel for the data
print(df_test.head(5))

# Filter the test set on the 'Type' column, then keep only the two feature columns.
df_test_negative = df_test.loc[df_test['Type'] == 0][['Clump Thickness', 'Cell Size']]
df_test_positive = df_test.loc[df_test['Type'] == 1][['Clump Thickness', 'Cell Size']]
print(df_test_negative.head())
print(df_test_positive.head())

import matplotlib.pyplot as plt

# Plot 1: the test samples alone (negative = green circles, positive = red crosses).
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker='o', s=20, c='green')
plt.scatter(df_test_positive['Clump Thickness'], df_test_positive['Cell Size'], marker='x', s=10, c='red')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

import numpy as np

# Plot 2: a line with randomly initialised parameters.
intercept = np.random.random([1])
coef = np.random.random([2])
lx = np.arange(0, 12)
ly = (-intercept - lx * coef[0]) / coef[1]
plt.plot(lx, ly, c='yellow')
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker='o', s=200, c='red')
plt.scatter(df_test_positive['Clump Thickness'], df_test_positive['Cell Size'], marker='x', s=150, c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

from sklearn.linear_model import LogisticRegression

# Plot 3: logistic regression trained on the first 10 training samples only.
lr = LogisticRegression()
lr.fit(df_train[['Clump Thickness', 'Cell Size']][:10], df_train['Type'][:10])
print('Testing accuracy (10 training samples):', lr.score(df_test[['Clump Thickness', 'Cell Size']], df_test['Type']))
intercept = lr.intercept_
coef = lr.coef_[0, :]
ly = (-intercept - lx * coef[0]) / coef[1]
plt.plot(lx, ly, c='green')
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker='o', s=200, c='red')
plt.scatter(df_test_positive['Clump Thickness'], df_test_positive['Cell Size'], marker='x', s=150, c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

# Plot 4: logistic regression trained on all training samples.
lr = LogisticRegression()
lr.fit(df_train[['Clump Thickness', 'Cell Size']], df_train['Type'])
print('Testing accuracy (all training samples):', lr.score(df_test[['Clump Thickness', 'Cell Size']], df_test['Type']))
intercept = lr.intercept_
coef = lr.coef_[0, :]
ly = (-intercept - lx * coef[0]) / coef[1]
plt.plot(lx, ly, c='blue')
plt.scatter(df_test_negative['Clump Thickness'], df_test_negative['Cell Size'], marker='o', s=200, c='red')
plt.scatter(df_test_positive['Clump Thickness'], df_test_positive['Cell Size'], marker='x', s=150, c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()
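The expression `ly = (-intercept - lx * coef[0]) / coef[1]` in the code above is used three times without explanation. It appears to come from solving the linear decision boundary `intercept + coef[0] * x + coef[1] * y = 0` for `y`. The following is a minimal pure-Python sketch of that rearrangement. The helper names `boundary_y` and `decision_value` and the parameter values are hypothetical, chosen only for illustration. They are not fitted values from the breast cancer data.

```python
def boundary_y(x, intercept, coef):
    """Return y on the line intercept + coef[0]*x + coef[1]*y = 0."""
    return (-intercept - coef[0] * x) / coef[1]


def decision_value(x, y, intercept, coef):
    """Linear score of a point; its sign tells which side of the line the point is on."""
    return intercept + coef[0] * x + coef[1] * y


# Hypothetical parameters for illustration only (not fitted by LogisticRegression).
intercept = -6.0
coef = [0.5, 1.0]

xs = list(range(0, 12))  # same x range as lx = np.arange(0, 12)
ys = [boundary_y(x, intercept, coef) for x in xs]

# Every point on the plotted line has a decision value of zero.
for x, y in zip(xs, ys):
    assert abs(decision_value(x, y, intercept, coef)) < 1e-9

print(ys[0], ys[-1])                            # 6.0 0.5
print(decision_value(4, 8, intercept, coef))    # 4.0  (positive side of the line)
print(decision_value(1, 1, intercept, coef))    # -4.5 (negative side of the line)
```

With a fitted binary logistic regression, points with a positive decision value are predicted as class 1 and points with a negative value as class 0, so the plotted line is where the prediction switches.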

The modified code is published with the author's permission. If you have any questions, feel free to leave me a comment.
