ALS(python pyspark)
Source: Internet. Editor: 程序博客网. Time: 2024/04/26 19:23
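Introduction
ALS stands for alternating least squares. The method is commonly used in recommender systems based on matrix factorization. For example, the matrix of user ratings of items is factorized into two matrices: one holds each user's preference for the latent features of items, and the other holds the latent features that each item contains. In the course of this factorization the missing ratings are filled in, so we can recommend items to users based on the filled-in ratings.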
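As a small illustration of how the product of the two factor matrices fills in ratings, here is a minimal sketch. The toy matrices `X` and `Y` and the helper `recommend` are hypothetical and not part of the original post; in practice both matrices would be learned by ALS rather than written by hand.

```python
import numpy as np

# Hypothetical toy data: 3 users, 4 items, 2 latent features.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # each user's preference for each latent feature
Y = np.array([[5.0, 1.0],
              [4.0, 2.0],
              [1.0, 5.0],
              [2.0, 4.0]])   # how strongly each item expresses each feature

# The product is a dense 3 x 4 matrix: every (user, item) pair gets a score,
# including pairs for which no rating was ever observed.
R_hat = X @ Y.T


def recommend(user, rated_items, top_n=1):
    """Return the top_n item indices the user has not rated, best first."""
    scores = R_hat[user].copy()
    scores[list(rated_items)] = -np.inf  # never recommend already-rated items
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]
```

Calling `recommend(0, {0})` ranks the items user 0 has not rated by their predicted score.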
Formula derivation
- Fix Y, solve for X
- Fix X, solve for Y
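For reference, the two steps can be written out as follows. This is a sketch inferred from the Python implementation in the next section, assuming the usual regularized squared-error objective with `beta` as the regularization weight; the notation ($r_{i\cdot}$ for row $i$ of $R$, $r_{\cdot j}$ for column $j$) is introduced here for illustration.

```latex
% R is M x N, X is M x K, Y is N x K, and R is approximated by X Y^T.
\min_{X,Y}\; \lVert R - X Y^{T} \rVert_F^{2}
  + \beta \left( \lVert X \rVert_F^{2} + \lVert Y \rVert_F^{2} \right)

% Fix Y, solve for X: setting the gradient with respect to row x_i to zero gives
x_i = \left( Y^{T} Y + \beta I \right)^{-1} Y^{T} r_{i\cdot}^{T},
  \qquad i = 1, \dots, M

% Fix X, solve for Y: by symmetry, for row y_j
y_j = \left( X^{T} X + \beta I \right)^{-1} X^{T} r_{\cdot j},
  \qquad j = 1, \dots, N
```

Each subproblem is an ordinary ridge regression with a closed-form solution, which is why alternating between them is cheap.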
Python implementation
import numpy


def mf_als(R, K, steps=15, beta=0.0002):
    """Factorize R (M x N) as X * Y.T with rank K; return iterations and RMSE."""
    M, N = R.shape
    X = numpy.asmatrix(numpy.random.rand(M, K))
    Y = numpy.asmatrix(numpy.random.rand(N, K))
    I = numpy.asmatrix(numpy.eye(K))
    err = []
    it = []
    for step in range(steps):
        X1 = []
        Y1 = []
        # Fix Y, solve for every row of X
        P = (Y.T * Y + beta * I).I * Y.T
        for i in range(M):
            X1.append([j[0, 0] for j in (P * R[i, :].T)])
        X = numpy.asmatrix(X1)
        # Fix X, solve for every row of Y
        Q = (X.T * X + beta * I).I * X.T
        for i in range(N):
            Y1.append([j[0, 0] for j in (Q * R[:, i])])
        Y = numpy.asmatrix(Y1)
        it.append(step)
        # Root-mean-square error over all entries of R
        err.append(numpy.sqrt(numpy.sum(pow(numpy.array(R - X * Y.T), 2)) / (M * N)))
    return it, err


if __name__ == "__main__":
    M = 100
    F = 30
    U = 300
    # Random rank-F matrix to factorize
    R = numpy.matrix(numpy.random.rand(M, F)) * numpy.matrix(numpy.random.rand(U, F)).T
    it, err1 = mf_als(R, F, beta=0.1)
    it, err2 = mf_als(R, F, beta=0.01)
    it, err3 = mf_als(R, F, beta=0.001)
    it, err4 = mf_als(R, F, beta=0.0001)

    # Plot the error curve for each regularization weight
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(it, err1, 'r', label='beta=0.1')
    ax.plot(it, err2, 'b', label='beta=0.01')
    ax.plot(it, err3, 'y', label='beta=0.001')
    ax.plot(it, err4, 'g', label='beta=0.0001')
    ax.legend(loc='upper right')
    plt.xlabel('iterations')
    plt.ylabel('rmse')
    plt.show()
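Experimental results
PySpark implementation
The overall idea is to split the matrix into row vectors and run a separate least-squares parameter estimate for each one.
- Solve for the rows of the item matrix in parallel
- Solve for the rows of the user matrix in parallel
- Run the two steps one after the other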
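The reason the rows can be handled in parallel is that, with the other factor matrix fixed, each row's least-squares problem is independent of all the others. The sketch below checks this with plain NumPy; the helper `solve_row` and the random test data are assumptions made for illustration and do not appear in the original post.

```python
import numpy as np


def solve_row(Y, r, lam):
    """Ridge least squares for one row: solve (Y^T Y + lam*I) x = Y^T r."""
    K = Y.shape[1]
    return np.linalg.solve(Y.T @ Y + lam * np.eye(K), Y.T @ r)


rng = np.random.default_rng(0)
R = rng.random((6, 5))   # 6 rows to fit, each with 5 observed values
Y = rng.random((5, 3))   # fixed factor matrix with 3 latent features
lam = 0.01

# One independent solve per row; this is the unit of work a map() can distribute.
X_rows = np.array([solve_row(Y, R[i, :], lam) for i in range(R.shape[0])])

# The same result from a single batched solve over all rows at once.
X_batch = np.linalg.solve(Y.T @ Y + lam * np.eye(3), Y.T @ R.T).T
```

Because `X_rows` and `X_batch` agree, distributing the per-row solves across workers changes only where the work runs, not the result.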
# coding:utf-8
"""SimpleApp"""
from pyspark import SparkContext, SparkConf
import numpy as np

appName = "qinq"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

M = 100
U = 200
F = 30
ITERATIONS = 50
LAMBDA = 0.01
# M: number of items, U: number of users, F: rank of the factor matrices

# Initialize the rating matrix
R = np.asmatrix(np.random.rand(M, F)) * np.asmatrix(np.random.rand(U, F)).T
ms = np.asmatrix(np.random.rand(M, F))
us = np.asmatrix(np.random.rand(U, F))

# Broadcast the rating matrix, item matrix, and user matrix to all nodes
Rb = sc.broadcast(R)
msb = sc.broadcast(ms)
usb = sc.broadcast(us)


def update(mat, rating):
    # Build the normal equations; the regularization below makes XtX invertible
    XtX = mat.T * mat
    Xty = mat.T * rating
    # Regularization term
    for j in range(F):
        XtX[j, j] += LAMBDA * 1
    return np.linalg.solve(XtX, Xty)


def rmse(R, ms, us):
    return np.sqrt(np.sum(pow(np.array(R - ms * us.T), 2)) / (M * U))


steps = []
errors = []
for i in range(ITERATIONS):
    # Update every item row in parallel with the user matrix fixed
    ms = sc.parallelize(range(M)).map(lambda x: update(usb.value, Rb.value[x, :].T)).collect()
    ms = np.asmatrix(np.array(ms)[:, :, 0])
    msb = sc.broadcast(ms)
    # Update every user row in parallel with the new item matrix fixed
    us = sc.parallelize(range(U)).map(lambda x: update(msb.value, Rb.value[:, x])).collect()
    us = np.asmatrix(np.array(us)[:, :, 0])
    usb = sc.broadcast(us)
    error = rmse(R, ms, us)
    errors.append(error)
    steps.append(i)
    print(i, error)

import matplotlib.pyplot as plt
plt.plot(steps, errors, 'r')
plt.xlabel('iterations')
plt.ylabel('rmse')
plt.show()
Results