Correlation Learning: Implementing the Pearson Correlation Coefficient in Python
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 02:47
Discussion of Similarity Metrics
Pearson Correlation Coefficient
Analysis
Unlike the Euclidean Distance similarity score (which is scaled from 0 to 1), this metric measures how strongly two variables are linearly correlated and ranges from -1 to +1. As with the modified Euclidean Distance, a Pearson Correlation Coefficient of 1 indicates that the data objects are perfectly correlated. In this case, however, a score of -1 means that the data objects are perfectly negatively correlated (one rises exactly as the other falls), and a score of 0 means that there is no linear correlation. In other words, the Pearson Correlation score quantifies how well two data objects fit a straight line.
There are several benefits to using this type of metric. The first is that it tends to give better results than the Euclidean Distance score when the data is not well normalized. As a result, this metric can be used when the magnitudes of the quantities (i.e. scores) vary between objects. Another benefit is that the Pearson Correlation score corrects for any consistent shift or scaling within an attribute while the final score is being computed. Thus, objects that describe the same data but use different value ranges can still be compared. Figure 1 demonstrates how the Pearson Correlation score may appear if graphed.
Figure 1. A chart demonstrating the Pearson Correlation Coefficient. The axes are the scores given by the labeled critics, and the plot shows the similarity of the scores given by both critics with regard to certain items.
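The insensitivity to shifting and scaling can be checked with a small sketch. The `pearson` helper and the ratings below are hypothetical and exist only for illustration: `critic_b` ranks the items exactly like `critic_a` but on a stretched, shifted scale, so the raw distance between the two is large while the correlation stays at its maximum.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (mean-centered form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0

# Hypothetical ratings of the same five items
critic_a = [2.5, 3.5, 3.0, 3.5, 2.5]
# Same preferences, but every score is doubled and shifted up by 1.5
critic_b = [2 * score + 1.5 for score in critic_a]

r = pearson(critic_a, critic_b)
euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(critic_a, critic_b)))

print("Pearson:", round(r, 6))
print("Euclidean distance:", round(euclid, 3))
```

The Euclidean distance treats the two critics as far apart, whereas the Pearson score treats them as agreeing perfectly, which is the behavior described above.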
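In essence, the Pearson Correlation score is the ratio of the covariance of the two objects to the product of their standard deviations. In mathematical form, the score can be described as:
$$ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{N}\right)\left(\sum y^2 - \frac{(\sum y)^2}{N}\right)}} $$
In this equation, (x, y) refers to the data objects and N is the total number of attributes.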
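To connect the covariance-based description with a form built from raw sums, the following sketch computes the score both ways and compares the results. The function names and sample data are assumptions made for this illustration, and population (divide-by-N) statistics are assumed for the covariance and standard deviations.

```python
import math

def pearson_from_definition(xs, ys):
    """r = cov(x, y) / (std(x) * std(y)), using divide-by-N statistics."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

def pearson_from_sums(xs, ys):
    """The same quantity computed from running sums, without the means."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Hypothetical sample data with non-constant values in both sequences
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_def = pearson_from_definition(xs, ys)
r_sum = pearson_from_sums(xs, ys)
print(r_def, r_sum)
```

Both functions agree up to floating-point rounding because the factors of N cancel between the numerator and the denominator. Neither guards against constant input, which would make the denominator zero.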
Python Implementation
    # Input: 2 objects
    # Output: Pearson Correlation Score
    def pearson_correlation(object1, object2):
        values = range(len(object1))

        # Summation over all attributes for both objects
        sum_object1 = sum([float(object1[i]) for i in values])
        sum_object2 = sum([float(object2[i]) for i in values])

        # Sum the squares
        square_sum1 = sum([pow(object1[i], 2) for i in values])
        square_sum2 = sum([pow(object2[i], 2) for i in values])

        # Add up the products
        product = sum([object1[i] * object2[i] for i in values])

        # Calculate Pearson Correlation score
        numerator = product - (sum_object1 * sum_object2 / len(object1))
        denominator = ((square_sum1 - pow(sum_object1, 2) / len(object1)) *
                       (square_sum2 - pow(sum_object2, 2) / len(object1))) ** 0.5

        # Can't have division by 0
        if denominator == 0:
            return 0

        result = numerator / denominator
        return result
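A short usage sketch may help show what the function returns at the extremes. The function is restated compactly here so the snippet runs on its own, and the three input lists are hypothetical examples rather than data from the book.

```python
def pearson_correlation(object1, object2):
    # Compact restatement of the function above so this snippet is self-contained
    n = len(object1)
    sum1, sum2 = float(sum(object1)), float(sum(object2))
    square_sum1 = sum(v ** 2 for v in object1)
    square_sum2 = sum(v ** 2 for v in object2)
    product = sum(a * b for a, b in zip(object1, object2))
    numerator = product - sum1 * sum2 / n
    denominator = ((square_sum1 - sum1 ** 2 / n) *
                   (square_sum2 - sum2 ** 2 / n)) ** 0.5
    # A constant input has zero variance, so the score is reported as 0
    return 0 if denominator == 0 else numerator / denominator

rising = [1, 2, 3, 4, 5]
falling = [10, 8, 6, 4, 2]    # decreases linearly as `rising` increases
constant = [3, 3, 3, 3, 3]    # no variation at all

r_same = pearson_correlation(rising, rising)
r_opposite = pearson_correlation(rising, falling)
r_constant = pearson_correlation(rising, constant)

print(r_same, r_opposite, r_constant)
```

Identical trends score close to +1, exactly opposite trends score close to -1, and the constant list triggers the zero-denominator guard. With non-integer inputs, rounding can leave the denominator slightly above zero instead of exactly zero, so a tolerance-based check may be safer in practice.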
References
The previous content is based on Chapter 2 of the following book:
Segaran, Toby. Programming Collective Intelligence: Building Smart Web 2.0 Applications. Sebastopol, CA: O'Reilly Media, 2007.