Udacity Machine Learning: Clustering


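The choice of initial centroids in K-means has a strong effect on the final clustering. The clustering in the figure below, for example, is clearly wrong.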

[Figure: a K-means result produced by poorly chosen initial centroids]
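The sensitivity to initialization can be reproduced on a tiny example. The sketch below is a hand-written version of the standard K-means loop (not sklearn's `KMeans`), and the four points and both sets of starting centroids are made up for illustration. Started from two different pairs of centroids, the same data converges to two different clusterings, one of them much worse.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]


def kmeans(points, init_centers, max_iter=100):
    """Plain K-means from the given initial centers.

    Returns (labels, centers, inertia), where inertia is the sum of
    squared distances from each point to its assigned center.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # move the center to the mean of its members
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                # keep an empty cluster's center where it is
                new_centers.append(centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


# Hypothetical data: two tight pairs of points that are far apart horizontally.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# One starting centroid in each pair: converges to the left/right split.
good_labels, good_centers, good_inertia = kmeans(points, [(0, 0), (4, 0)])

# Both starting centroids in the left pair: gets stuck on a top/bottom split.
bad_labels, bad_centers, bad_inertia = kmeans(points, [(0, 0), (0, 1)])

print(good_labels, good_inertia)
print(bad_labels, bad_inertia)
```

Both runs stop because the centroids no longer move, but only the first finds the compact clusters. This is why K-means is usually run several times from different starting points, keeping the run with the lowest inertia.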

Exercise: Clustering Features

  • salary
  • exercised_stock_options


Exercise: Deploying Clustering

    ### cluster here; create predictions of the cluster labels
    ### for the data and store them to a list called pred
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=2).fit(finance_features)
    pred = kmeans.predict(finance_features)


Exercise: Clustering with 3 Features

  • Yes, 4 of the test points changed clusters


    ### the input features we want to use
    ### can be any key in the person-level dictionary (salary, director_fees, etc.)
    feature_1 = "salary"
    feature_2 = "exercised_stock_options"
    feature_3 = "total_payments"
    poi = "poi"
    features_list = [poi, feature_1, feature_2, feature_3]
    data = featureFormat(data_dict, features_list)
    poi, finance_features = targetFeatureSplit(data)

    ### in the "clustering with 3 features" part of the mini-project,
    ### you'll want to change this line to
    ### for f1, f2, _ in finance_features:
    ### (as it's currently written, the line below assumes 2 features)
    for f1, f2, _ in finance_features:
        plt.scatter(f1, f2)
    plt.show()

Exercise: Stock Option Range

  • max:34348384
  • min:3285

    import numpy as np

    # collect every exercised_stock_options value that is not missing
    stocklist = []
    for item in data_dict:
        stock = data_dict[item]['exercised_stock_options']
        if stock != 'NaN':
            stocklist.append(stock)
    stocklist = np.array(stocklist)
    print(np.max(stocklist))
    print(np.min(stocklist))

Exercise: Salary Range

  • max:1111258
  • min:477

    import numpy as np

    # collect every salary value that is not missing
    salarylist = []
    for item in data_dict:
        salary = data_dict[item]['salary']
        if salary != 'NaN':
            salarylist.append(salary)
    salarylist = np.array(salarylist)
    print(np.max(salarylist))
    print(np.min(salarylist))

Exercise: Clustering Changes

[Figure: the clustering after the change]
