Using R to Fix Data Quality: Section 4


Section 4: Rule Checking


Overview

Applying common-sense rules to the data is often a good way to find and fix data quality problems. R has features that make checking such rules convenient. This section shows how to check rules against your data in R.


Logical vector

Logical vectors are a useful feature of R. A comparison expression produces a logical vector directly. The demo below shows how this works: we create a vector of numbers, then build a logical vector that tells us which of them are greater than 3.

Create a numeric vector to use as test data:

> v=1:5
> v
[1] 1 2 3 4 5

Create the logical vector:

> grThree = v>3
> grThree
[1] FALSE FALSE FALSE  TRUE  TRUE

Count the TRUE values:

> sum(grThree)
[1] 2

As the output shows, 2 numbers in vector v are greater than 3.
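
The count works because R treats TRUE as 1 and FALSE as 0 in arithmetic. Here is a minimal sketch using the same vector v. The use of mean() to get a proportion is an addition for illustration and is not part of the original demo.

```r
v <- 1:5
grThree <- v > 3

# TRUE is coerced to 1 and FALSE to 0, so sum() counts the TRUE values
count <- sum(grThree)

# mean() of a logical vector gives the proportion of TRUE values
proportion <- mean(grThree)

print(count)
print(proportion)
```

The proportion is often handier than the raw count when comparing columns of different lengths.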

Show the numbers that meet the condition:

> v[grThree]
[1] 4 5
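
The same indexing idea extends to checking a rule on a data frame column. The sketch below is only an illustration: the data frame people, its columns, and the rule "age must be between 0 and 120" are all hypothetical and do not come from the original text.

```r
# Hypothetical data, invented for illustration only
people <- data.frame(
  name = c("A", "B", "C", "D"),
  age  = c(34, -2, 51, 140)
)

# Assumed rule: age must lie between 0 and 120.
# The logical vector marks the rows that break the rule.
violates <- people$age < 0 | people$age > 120

n_bad    <- sum(violates)       # how many rows break the rule
bad_rows <- people[violates, ]  # the offending rows themselves
bad_pos  <- which(violates)     # their row positions

print(n_bad)
print(bad_rows)
print(bad_pos)
```

Keeping the logical vector in its own variable lets the same rule be counted, inspected, and located without repeating the condition.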


Practice Question

1. Using weather.csv as the data, how many values in Ozone are greater than 150? (Use na.rm to ignore NA values.)
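
As a hint on how na.rm behaves, here is a small sketch with a made-up vector. The values in x are hypothetical and are not taken from weather.csv.

```r
# Hypothetical values, not taken from weather.csv
x <- c(120, NA, 180, 95, NA, 160)

# A comparison involving NA yields NA, not TRUE or FALSE
over <- x > 150

# Without na.rm the NAs propagate and the sum is NA
n_raw <- sum(over)

# na.rm = TRUE drops the NAs before summing
n_over <- sum(over, na.rm = TRUE)

print(n_raw)
print(n_over)
```

The same pattern should apply to the Ozone column once weather.csv has been loaded, for example with read.csv().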