Udacity Machine Learning: Clustering

Source: Internet. Editor: 程序博客网. Date: 2024/05/29 03:31

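The choice of initial centroids in K-means has a large effect on the final clustering. The clustering shown in the figure below, for example, is clearly problematic.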

[Figure]
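The sensitivity to initialization can be reproduced on a tiny data set. The following is a minimal pure-Python sketch of the standard assign/update loop, not scikit-learn's implementation. The function names (`assign`, `update`, `kmeans`) and the four sample points are assumptions made for illustration. With the same points and the same number of clusters, one set of starting centroids converges to the natural left/right split, while the other gets stuck in a top/bottom split with a much larger sum of squared errors.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = []
    for x, y in points:
        dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # keep the centroid of an empty cluster
        else:
            mean_x = sum(p[0] for p in members) / len(members)
            mean_y = sum(p[1] for p in members) / len(members)
            new_centroids.append((mean_x, mean_y))
    return new_centroids


def kmeans(points, init_centroids, max_iter=100):
    """Run the assign/update loop until the centroids stop moving."""
    centroids = list(init_centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(
        (x - centroids[label][0]) ** 2 + (y - centroids[label][1]) ** 2
        for (x, y), label in zip(points, labels)
    )
    return labels, centroids, sse


# Hypothetical data: two tight pairs of points, far apart horizontally.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Starting centroids near the two pairs find the left/right split.
good_labels, _, good_sse = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])

# Starting centroids stacked in the middle get stuck in a top/bottom split.
bad_labels, _, bad_sse = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_labels, good_sse)
print(bad_labels, bad_sse)
```

scikit-learn's `KMeans` reduces this risk through its `init` and `n_init` parameters, which control how the starting centroids are chosen and how many initializations are tried.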

Exercise: Clustering Features

salary
exercised_stock_options

[Figure]

Exercise: Deploying Clustering

    ### cluster here; create predictions of the cluster labels
    ### for the data and store them to a list called pred
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=2).fit(finance_features)
    pred = kmeans.predict(finance_features)

[Figure]

Exercise: Clustering with 3 Features

Yes, 4 data points change clusters.

[Figure]

    ### the input features we want to use
    ### can be any key in the person-level dictionary (salary, director_fees, etc.)
    feature_1 = "salary"
    feature_2 = "exercised_stock_options"
    feature_3 = "total_payments"
    poi = "poi"
    features_list = [poi, feature_1, feature_2, feature_3]
    data = featureFormat(data_dict, features_list)
    poi, finance_features = targetFeatureSplit(data)

    ### in the "clustering with 3 features" part of the mini-project,
    ### you'll want to change this line to
    ### for f1, f2, _ in finance_features:
    ### (the line below has already been changed to unpack 3 features)
    for f1, f2, _ in finance_features:
        plt.scatter(f1, f2)
    plt.show()
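One way to check how many points change clusters between the two-feature and three-feature runs is to compare the two lists of predicted labels. The sketch below is an illustration only: `count_changed` is a hypothetical helper, the label lists are made-up toy values, and it assumes exactly two clusters labeled 0 and 1. Because K-means assigns cluster ids arbitrarily, the same grouping can come back with 0 and 1 swapped, so the helper counts differences both ways and keeps the smaller count.

```python
def count_changed(labels_a, labels_b):
    """Count points whose cluster differs between two 2-cluster labelings.

    Cluster ids are arbitrary, so labels_b is also tried with 0 and 1 swapped.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    direct = sum(a != b for a, b in zip(labels_a, labels_b))
    swapped = sum(a != 1 - b for a, b in zip(labels_a, labels_b))
    return min(direct, swapped)


# Hypothetical predictions for the same six points from two separate runs.
pred_2_features = [0, 0, 0, 1, 1, 0]
pred_3_features = [1, 1, 0, 0, 0, 1]

changed = count_changed(pred_2_features, pred_3_features)
print(changed)
```

In the toy lists the ids are swapped between runs, so a naive element-wise comparison would report five changes even though only one point actually moved.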

Exercise: Stock Option Range

max: 34348384
min: 3285

    import numpy as np

    # collect every exercised_stock_options value that is not missing
    stocklist = []
    for item in data_dict:
        stock = data_dict[item]['exercised_stock_options']
        if stock != 'NaN':
            stocklist.append(stock)
    stocklist = np.array(stocklist)
    print(np.max(stocklist))
    print(np.min(stocklist))

Exercise: Salary Range

max: 1111258
min: 477

    import numpy as np

    # collect every salary value that is not missing
    salarylist = []
    for item in data_dict:
        salary = data_dict[item]['salary']
        if salary != 'NaN':
            salarylist.append(salary)
    salarylist = np.array(salarylist)
    print(np.max(salarylist))
    print(np.min(salarylist))
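The two range computations differ only in the feature name, so they could be folded into one helper. The sketch below is one possible way to do that. `feature_range` is a hypothetical name, and `toy_data` is made-up data that only imitates the shape of the course's `data_dict` (a dictionary of per-person dictionaries in which missing values are stored as the string 'NaN').

```python
def feature_range(data_dict, feature):
    """Return (min, max) of a feature across all people, skipping 'NaN'."""
    values = [
        person[feature]
        for person in data_dict.values()
        if person[feature] != 'NaN'
    ]
    return min(values), max(values)


# Hypothetical toy data; the numbers are not taken from the real data set.
toy_data = {
    "PERSON A": {"salary": 100, "exercised_stock_options": "NaN"},
    "PERSON B": {"salary": "NaN", "exercised_stock_options": 2500},
    "PERSON C": {"salary": 250, "exercised_stock_options": 900},
}

salary_min, salary_max = feature_range(toy_data, "salary")
stock_min, stock_max = feature_range(toy_data, "exercised_stock_options")
print(salary_min, salary_max)
print(stock_min, stock_max)
```

With the real `data_dict`, the same call would be made once with "salary" and once with "exercised_stock_options".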

Exercise: Clustering Changes

[Figure]
