[Scripting Language Series] What you need to know about scikits-learn for machine learning in Python

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 23:54

How to use scikits-learn

  • Install scikits-learn (published under the package name scikit-learn) with pip or easy_install
pip install -U scikit-learn
easy_install -U scikit-learn



* A simple computation example

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release.
from sklearn import datasets

boston_prices = datasets.load_boston()
print("Data shape %s" % (boston_prices.data.shape,))
print("Data max = %s min = %s" % (boston_prices.data.max(), boston_prices.data.min()))
print("Target max = %s min = %s" % (boston_prices.target.max(), boston_prices.target.min()))

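* A simple clustering analysis
1. Download the stock data

import datetime
import numpy
# Assumed import (not shown in the original); matplotlib.finance is absent from recent matplotlib releases.
from matplotlib import finance

# symbols is assumed to be a list of ticker strings defined elsewhere.
start = datetime.datetime(2011, 1, 1)
end = datetime.datetime(2012, 1, 1)
# Fetch each symbol in turn (the original passed '^GSPC' for every symbol).
quotes = [finance.quotes_historical_yahoo_ochl(symbol, start, end, asobject=True, adjusted=True)
          for symbol in symbols]
close = numpy.array([q.close for q in quotes]).astype(float)
print(close.shape)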


2. Compute the affinity matrix

logreturns = numpy.diff(numpy.log(close))
print(logreturns.shape)
# S[i, j] = -|r_i - r_j|^2, expanded as -|r_i|^2 - |r_j|^2 + 2 r_i.r_j
logreturns_norms = numpy.sum(logreturns ** 2, axis=1)
S = -logreturns_norms[:, numpy.newaxis] - logreturns_norms[numpy.newaxis, :] + 2 * numpy.dot(logreturns, logreturns.T)


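3. Affinity propagation clustering

import sklearn.cluster

# S is already a similarity matrix, so recent scikit-learn releases need affinity="precomputed".
aff_pro = sklearn.cluster.AffinityPropagation(affinity="precomputed").fit(S)
labels = aff_pro.labels_
for i in range(len(labels)):
    print("%s in Cluster %d" % (symbols[i], labels[i]))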
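The affinity matrix in step 2 is the negative squared Euclidean distance between every pair of log-return rows, written in expanded form. The following is a minimal sketch that checks this identity on made-up numbers; the `returns` array is a hypothetical stand-in, not real quote data.

```python
import numpy as np

# Hypothetical stand-in for the log-return matrix: 4 series, 6 observations each.
rng = np.random.default_rng(0)
returns = rng.normal(size=(4, 6))

# Same expansion as in step 2: -|a|^2 - |b|^2 + 2 a.b
norms = np.sum(returns ** 2, axis=1)
S = -norms[:, np.newaxis] - norms[np.newaxis, :] + 2 * np.dot(returns, returns.T)

# Direct computation of the negative squared Euclidean distance, for comparison.
diff = returns[:, np.newaxis, :] - returns[np.newaxis, :, :]
S_direct = -np.sum(diff ** 2, axis=2)

print(np.allclose(S, S_direct))      # do the two forms agree?
print(np.allclose(np.diag(S), 0.0))  # each series is at distance 0 from itself
```

The expanded form avoids building the intermediate three-dimensional `diff` array, which matters when there are many series.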


What is scikits-learn

The scikits-learn project provides an API for machine learning. It also ships with several datasets and sample images that can be used for experiments.
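As a rough illustration of the bundled datasets, the sketch below loads two of them and prints some basic statistics. The choice of `load_diabetes` and `load_digits` is an assumption made for this example; the original text does not name specific datasets.

```python
from sklearn import datasets

# Both datasets are bundled with scikit-learn, so no download is needed.
# (These particular datasets are chosen for illustration only.)
diabetes = datasets.load_diabetes()
print("Data shape %s" % (diabetes.data.shape,))
print("Target max = %s min = %s" % (diabetes.target.max(), diabetes.target.min()))

# The digits dataset exposes each sample both as a flat vector and as an 8x8 image.
digits = datasets.load_digits()
print("Digit images shape %s" % (digits.images.shape,))
print("Digit data shape %s" % (digits.data.shape,))
```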
Clustering is a class of machine learning algorithms that group the objects under study based on their similarity.
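To make the idea of grouping by similarity concrete, here is a small sketch that clusters synthetic points with affinity propagation. The data is made up, and the `random_state` argument assumes a reasonably recent scikit-learn release (0.23 or later).

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical data: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(10, 2)),
])

# Similarity = negative squared Euclidean distance between every pair of points.
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
S = -np.sum(diff ** 2, axis=2)

# affinity="precomputed" tells the estimator that S is already a similarity matrix.
model = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
for i, label in enumerate(model.labels_):
    print("point %d in cluster %d" % (i, label))
```

Points that are close together have a similarity near zero, while distant points have a strongly negative similarity, which is what drives them into separate clusters.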
