Using R to Fix Data Quality: Section 3
Section 3: Correlation Test
Overview
In this section, we look at the relationship between two variables: we make a scatter plot of two columns and then run a correlation test.
Scatter Plots
In the last section, we used several ways to visualize a single variable. Now we will visualize two variables at once with a scatter plot. A scatter plot puts one variable on the x-axis and the other on the y-axis, which makes the relationship between the two variables easy to see.
Read the data from weather.csv again:
> data=read.csv("weather.csv")
Create scatter plot:
> plot(data$Wind, data$Temp)
Correlation Test
One way to test the relationship between two variables is to check their correlation. If one variable tends to increase as the other increases, we say that they are positively correlated. If one variable tends to increase as the other decreases, they are negatively correlated. It is also possible that neither variable tends to move with the other, in which case they are uncorrelated.
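To make these three cases concrete, here is a minimal sketch on made-up vectors rather than the weather data. The names `x`, `up`, `down`, and `noise` are hypothetical and exist only for this illustration; `cor()` is base R's correlation function.

```r
# Made-up data for illustration only (not from weather.csv)
set.seed(1)
x     <- 1:50
up    <- 2 * x + rnorm(50, sd = 5)    # tends to rise as x rises
down  <- -2 * x + rnorm(50, sd = 5)   # tends to fall as x rises
noise <- rnorm(50)                    # generated independently of x

r_up    <- cor(x, up)     # strongly positive
r_down  <- cor(x, down)   # strongly negative
r_noise <- cor(x, noise)  # expected to be much closer to 0 than the other two

print(c(up = r_up, down = r_down, noise = r_noise))
```

Random noise rarely gives a correlation of exactly 0, so in practice "uncorrelated" means a value close to 0.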
There is more than one way to run a correlation test. Because we are not sure whether the data are normally distributed, Spearman’s rho, which is based on ranks and does not assume normality, is a good choice for our correlation test.
Run the Spearman’s rho test:
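As a small illustration of how the choice of method can matter, the sketch below compares the default (Pearson) correlation with Spearman's rho on made-up data that rise steadily but not in a straight line. The vectors `x` and `y` are hypothetical and not part of the weather data.

```r
# Made-up data: y always increases with x, but along a curve, not a line
x <- 1:20
y <- exp(x / 2)

r_pearson  <- cor(x, y)                       # default method is "pearson"
r_spearman <- cor(x, y, method = "spearman")  # computed on ranks

print(r_pearson)
print(r_spearman)
```

Because Spearman's rho only looks at the ordering of the values, it reports a perfect positive relationship here, while Pearson's coefficient is pulled down by the curvature.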
> cor.test(data$Wind, data$Temp, method="spearman")
This code reports a sample estimate of rho along with a p-value. The estimate always lies between -1 and 1. A value of -1 means a perfect negative relationship, and a value of 1 means a perfect positive relationship. A value near 0 means the two variables are uncorrelated.
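If you only need the numbers rather than the full printout, the object returned by `cor.test` is a list whose `estimate` and `p.value` elements can be read directly. The following is a minimal sketch on made-up vectors; `wind` and `temp` are hypothetical stand-ins for `data$Wind` and `data$Temp`, and the values are invented for illustration.

```r
# Made-up stand-ins for data$Wind and data$Temp (illustration only)
wind <- c(7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 6.9)
temp <- c(67, 72, 74, 62, 56, 66, 65, 59, 61, 69)

res <- cor.test(wind, temp, method = "spearman")

rho <- unname(res$estimate)  # sample estimate of rho, between -1 and 1
p   <- res$p.value           # p-value for the null hypothesis that rho is 0

print(rho)
print(p)
```

With these invented numbers the estimate is negative, meaning that higher `wind` values tend to go with lower `temp` values; the real weather data may of course behave differently.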
Practice Questions
1. What is the relationship between Wind and Temp?
2. If two variables are correlated, does that mean that one must cause the other?