Using R to Fix Data Quality: Section 4
Section 4: Rule Checking
Overview
Sometimes, applying common sense to the data is a good way to find and fix data quality problems. R has some convenient features for checking rules. In this section, we look at how to check rules against your data in R.
Logical vector
A logical vector is a useful feature in R. We can create one directly from an expression. Here is a short demo of how it works: we create a vector containing several numbers, then build a logical vector to see how many of those numbers are greater than 3.
Create a numeric vector as our test data:
> v=1:5
> v
[1] 1 2 3 4 5
Create the logical vector:
> grThree = v>3
> grThree
[1] FALSE FALSE FALSE  TRUE  TRUE
Count the TRUE values:
> sum(grThree)
[1] 2
As can be seen, there are 2 numbers greater than 3 in vector v.
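Counting with sum() works because R treats TRUE as 1 and FALSE as 0 when a logical vector is used in arithmetic. The following is a minimal sketch that reuses the vector from the demo; the names nGreater and shareGreater, and the use of mean() to get a proportion, are additions for illustration and not part of the original demo.

```r
# Test data from the demo above
v <- 1:5
grThree <- v > 3

# sum() coerces TRUE to 1 and FALSE to 0, so it counts the TRUE values
nGreater <- sum(grThree)

# mean() of a logical vector gives the share of TRUE values
shareGreater <- mean(grThree)

print(nGreater)
print(shareGreater)
```

The proportion is often more useful than the raw count when checking a rule, because it stays comparable across data sets of different sizes.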
Show the numbers that meet the condition:
> v[grThree]
[1] 4 5
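The same pattern can be used as a simple rule check on data that contains missing values. The sketch below assumes a made-up vector named ozone (hypothetical values, not taken from weather.csv) and the rule "a value must not exceed 150". All variable names here are chosen for illustration only.

```r
# Hypothetical measurements; NA marks a missing value
ozone <- c(41, 168, NA, 12, 155, NA, 97)

# Comparing NA with a number yields NA, so the logical vector contains NA too
tooHigh <- ozone > 150

# na.rm = TRUE drops the NA entries before counting
nTooHigh <- sum(tooHigh, na.rm = TRUE)

# which() returns the positions of the TRUE entries and skips NA
badPositions <- which(tooHigh)

# The values that break the rule
badValues <- ozone[badPositions]

print(nTooHigh)
print(badPositions)
print(badValues)
```

Indexing through which() is a deliberate choice here: ozone[tooHigh] would also return an NA for every missing entry, which makes the list of rule violations harder to read.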
Practice Question
1. Use weather.csv as the data. How many values in Ozone are greater than 150? (Use na.rm to ignore the NA values.)