Data Processing with stringr

Source: Internet | Editor: 程序博客网 | Time: 2024/05/18 17:44

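Preface:

During data processing, dplyr does most of the work, but as data becomes more varied and complex, string handling matters more and more. Base R's string functions are serviceable but inconsistent and not very convenient to use. The stringr package covers most everyday string-processing needs: it simplifies converting, searching, identifying, locating, matching, replacing, extracting, and splitting strings in R, and it wraps several more complex string operations in ready-made functions.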

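I. String concatenation functions

1. word(): extract words from a sentence. Usage:

word(string, start = 1L, end = start, sep = fixed(" "))
# sep: the separator between words; defaults to a single space

A simple example: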

library(stringr)
data <- 'Using R programming to work for data science'
# Extract the last two words
word(data, -2:-1)
## [1] "data"    "science"
# Starting from the first word, extract the first three words
word(data, start = 1, end = 3)
## [1] "Using R programming"
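word() is vectorized and also accepts a separator other than a space. The following is a minimal sketch, assuming a hypothetical vector of hyphen-separated dates (the `dates` vector is made up for illustration and is not part of the original example):

```r
library(stringr)

# Hypothetical input: dates whose fields are separated by "-"
dates <- c("2016-05-13", "2024-05-18")

# Use "-" as the separator instead of the default space
years  <- word(dates, 1, sep = fixed("-"))
months <- word(dates, 2, sep = fixed("-"))

# Negative positions count from the end, so -1 is the last field
days <- word(dates, -1, sep = fixed("-"))

print(years)   # "2016" "2024"
print(months)  # "05" "05"
print(days)    # "13" "18"
```

Wrapping the separator in fixed() makes it a literal string rather than a regular expression, which matters for separators such as "." or "|".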

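2. str_wrap(): wrap paragraphs. Usage:

str_wrap(string, width = 80, indent = 0, exdent = 0)
# width: the target width of each line
# indent: indentation of the first line of each paragraph; defaults to no indentation
# exdent: indentation of every line after the first in each paragraph; defaults to no indentation

A simple example: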

string <- "New York is 3 hours ahead of California, but it does not make California slow. Someone graduated at age of 22, but waited 5 years before securing a good job!"
str_wrap(string, width = 80, indent = 4)
## [1] "    New York is 3 hours ahead of California, but it does not make California slow.\nSomeone graduated at age of 22, but waited 5 years before securing a good job!"
# \n is the newline character
# cat() prints the string and renders each \n as an actual line break
cat(str_wrap(string, indent = 4), sep = "\n")
##     New York is 3 hours ahead of California, but it does not make California slow.
## Someone graduated at age of 22, but waited 5 years before securing a good job!
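To see how indent and exdent interact, it can help to split the wrapped result back into lines. This is only a sketch: the width of 30 and the indentation values are arbitrary choices for illustration, and the exact break points may differ between stringr versions, so no output is shown.

```r
library(stringr)

text <- "New York is 3 hours ahead of California, but it does not make California slow."

# Narrow target width, 2 spaces before the first line, 4 before the rest
wrapped <- str_wrap(text, width = 30, indent = 2, exdent = 4)

# Split on the inserted newlines to inspect the lines one by one
lines <- str_split(wrapped, "\n")[[1]]
print(lines)

# Render the paragraph as it would appear on screen
cat(wrapped, "\n")
```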

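3. str_trim(): remove whitespace from the ends of a string. Usage:

str_trim(string, side = c("both", "left", "right"))
# side: trim whitespace on both sides, on the left only, or on the right only

A simple example: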

data <- '  Using R to explore the data science   '
str_trim(data, side = "both")
## [1] "Using R to explore the data science"
# Both the leading and the trailing spaces are removed
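str_trim() only touches the ends of the string. The sketch below compares the three common cases on a hypothetical input that also has repeated spaces in the middle, and uses str_squish() (also part of stringr) to collapse those as well:

```r
library(stringr)

# Hypothetical input with extra spaces at both ends and in the middle
data <- "  Using R   to explore  "

left     <- str_trim(data, side = "left")   # "Using R   to explore  "
right    <- str_trim(data, side = "right")  # "  Using R   to explore"
squished <- str_squish(data)                # "Using R to explore"

print(left)
print(right)
print(squished)
```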

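4. str_c(): join strings. Usage:

str_c(..., sep = "", collapse = NULL)
# sep: the string inserted between the arguments; works like paste(), except that the default is "" (as in paste0())
# collapse: if not NULL, the elements of the resulting vector are joined into one string with this separator;
#           when only a single vector is passed, sep has nothing to separate and only collapse has an effect

A simple example: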

str_c('x_', c(1:10), ':')
##  [1] "x_1:"  "x_2:"  "x_3:"  "x_4:"  "x_5:"  "x_6:"  "x_7:"  "x_8:"
##  [9] "x_9:"  "x_10:"
str_c(c(2016, 05, 13), collapse = '-')
## [1] "2016-5-13"
# Joining the elements of a single vector: use collapse instead of sep
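The difference between sep and collapse is easiest to see with two vectors. The sketch below uses made-up inputs and also shows one way str_c() differs from paste0(): a missing value stays missing instead of turning into the text "NA".

```r
library(stringr)

first  <- c("a", "b")
second <- c("x", "y")

# sep goes between the arguments, element by element
pairs <- str_c(first, second, sep = "_")                    # "a_x" "b_y"

# collapse then joins the resulting elements into one string
joined <- str_c(first, second, sep = "_", collapse = "+")   # "a_x+b_y"

# NA propagates in str_c(), whereas paste0() converts it to "NA"
with_str_c  <- str_c("id_", c("1", NA))    # "id_1" NA
with_paste0 <- paste0("id_", c("1", NA))   # "id_1" "id_NA"

print(pairs)
print(joined)
print(with_str_c)
print(with_paste0)
```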

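5. str_pad(): pad a string. Usage:

str_pad(string, width, side = c("left", "right", "both"), pad = " ")
# width: the width of the string after padding
# side: the side on which padding is added; defaults to the left
# pad: the character used for padding; defaults to a space

A simple example: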

data <- 'Michael_Jordan'
str_pad(data, width = 20, side = "both", pad = "*")
## [1] "***Michael_Jordan***"
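A common use of str_pad() is zero-padding identifiers to a fixed width. A minimal sketch, assuming hypothetical numeric IDs that are first converted to character:

```r
library(stringr)

# Hypothetical IDs, converted to character before padding
ids <- as.character(c(1, 23, 456))

# Left-pad with zeros to a width of 5
padded <- str_pad(ids, width = 5, side = "left", pad = "0")
print(padded)    # "00001" "00023" "00456"

# Right-pad with a different character
right_padded <- str_pad("ab", width = 5, side = "right", pad = "-")
print(right_padded)   # "ab---"

# Strings already longer than width are returned unchanged
unchanged <- str_pad("abcdefg", width = 5)
print(unchanged)      # "abcdefg"
```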

6. str_dup(): repeat a string. Usage:

str_dup(string, times)
# times: the number of times to repeat the string

A simple example:

data <- c("A", "B", "C", "D")
str_dup(data, 2)
## [1] "AA" "BB" "CC" "DD"
str_dup(data, 1:4)
## [1] "A"    "BB"   "CCC"  "DDDD"
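One small practical use of str_dup() is drawing an underline that matches the length of a title. The title below is a made-up value for illustration:

```r
library(stringr)

# Hypothetical title
title <- "Report"

# Repeat "=" once per character of the title
underline <- str_dup("=", str_length(title))   # "======"

cat(title, underline, sep = "\n")
```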

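7. str_sub(): extract a substring. Usage:

str_sub(string, start = 1L, end = -1L)
# Similar to word(), but str_sub() works on character positions and can also be used
# on the left side of an assignment to replace the substring, whereas word() extracts whole words.

A simple example: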

data <- "Using R programming to work for data science"
str_sub(data, 1, 4)
## [1] "Usin"
word(data, 1, 4)
## [1] "Using R programming to"
# str_sub() extracts characters, word() extracts words
str_sub(data, 1, 7) <- 'Using Python'
data
## [1] "Using Python programming to work for data science"
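str_sub() also accepts negative positions, which count from the end of the string, and it is vectorized over start and end. A short sketch on a made-up string:

```r
library(stringr)

x <- "data science"

# Negative start: the last 7 characters
tail_part <- str_sub(x, -7)          # "science"

# Several substrings at once: positions 1-4 and 6-12
parts <- str_sub(x, c(1, 6), c(4, 12))   # "data" "science"

# Replacement also works with negative positions
str_sub(x, -7, -1) <- "analysis"
print(tail_part)
print(parts)
print(x)                              # "data analysis"
```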

II. String computation functions

1. str_length(): string length, similar to nchar()

fruit <- c('apple', 'banana', 'pear', NA)
str_length(fruit)
## [1]  5  6  4 NA
nchar(fruit)
## [1]  5  6  4 NA

2. str_count(): count the matches of a pattern in each string

str_count(fruit, pattern = "a")
## [1]  1  3  1 NA
# Count digits with \\d
str_count(fruit, '\\d')
## [1]  0  0  0 NA
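Because the pattern is a regular expression by default, str_count() can count whole character classes, and special characters need fixed() to be taken literally. A minimal sketch with made-up inputs:

```r
library(stringr)

fruit <- c("apple", "banana", "pear")

# Count vowels using a character class
vowels <- str_count(fruit, "[aeiou]")      # 2 3 2

# "." as a regex matches any character; fixed(".") matches a literal dot
any_char <- str_count("a.b.c", ".")        # 5
dots     <- str_count("a.b.c", fixed("."))  # 2

print(vowels)
print(any_char)
print(dots)
```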

3. str_order(), str_sort(): sort a character vector

fruit <- c('banana', 'pear', 'orange', 'apple', 'pinapple')
str_sort(fruit, decreasing = FALSE)  # ascending order
## [1] "apple"    "banana"   "orange"   "pear"     "pinapple"
str_order(fruit)  # returns the indices that put the vector in ascending order
## [1] 4 1 3 2 5
fruit[str_order(fruit)]
## [1] "apple"    "banana"   "orange"   "pear"     "pinapple"
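When strings contain numbers, plain alphabetical order puts "x10" before "x2". The sketch below assumes a reasonably recent stringr version in which str_sort() has a numeric argument; the labels are made up for illustration.

```r
library(stringr)

# Hypothetical labels with embedded numbers
labels <- c("x10", "x2", "x1")

# Default: compared character by character
alphabetical <- str_sort(labels)                  # "x1" "x10" "x2"

# numeric = TRUE: runs of digits are compared as numbers
natural <- str_sort(labels, numeric = TRUE)       # "x1" "x2" "x10"

# Descending order
descending <- str_sort(labels, decreasing = TRUE, numeric = TRUE)  # "x10" "x2" "x1"

print(alphabetical)
print(natural)
print(descending)
```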

III. String matching functions

1. str_split(), str_split_fixed(): split strings

data <- 'myxyznamexyzisxyzkobexyzbryant!'
str_split(data, 'xyz')  # returns a list; the pattern argument is 'xyz'
## [[1]]
## [1] "my"      "name"    "is"      "kobe"    "bryant!"
str_split_fixed(data, 'xyz', 5)  # returns a character matrix
##      [,1] [,2]   [,3] [,4]   [,5]
## [1,] "my" "name" "is" "kobe" "bryant!"
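Two details are worth a quick illustration: the n argument of str_split() limits the number of pieces, and str_split_fixed() pads with empty strings when there are fewer pieces than columns. The inputs below are made up:

```r
library(stringr)

# Split into at most 2 pieces; the remainder stays in the last piece
record <- "name,age,city,notes"
head_and_rest <- str_split(record, ",", n = 2)[[1]]   # "name" "age,city,notes"

# Ask for 3 columns when only 2 pieces exist: the last cell is ""
m <- str_split_fixed("a,b", ",", 3)

print(head_and_rest)
print(m)
```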

2. str_match(), str_match_all(): extract matched strings

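string <- c('139-1234-5678', '133,1267,4589', '134 6543 7890', '178 2345 1111 or 133 7890 1234')
str_match(string, '[1][3-9]{2}[- ,][0-9]{4}[- ,][0-9]{4}')
##      [,1]
## [1,] "139-1234-5678"
## [2,] "133,1267,4589"
## [3,] "134 6543 7890"
## [4,] "178 2345 1111"
# Explanation: [] is a character class listing the characters allowed at that position,
# and {n} gives the number of repetitions. str_match() returns only the first match in each string.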
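The example above returns only the full match. With capture groups, str_match() adds one column per group, and str_match_all() returns every match in a string rather than just the first. The sketch below reuses the same phone-number pattern with parentheses added (the grouping is an addition for illustration):

```r
library(stringr)

# Same pattern as above, with each block of digits wrapped in a capture group
pattern <- "(1[3-9]{2})[- ,]([0-9]{4})[- ,]([0-9]{4})"
text <- "178 2345 1111 or 133 7890 1234"

# First match only: column 1 is the full match, columns 2-4 are the groups
first <- str_match(text, pattern)
print(first)

# All matches: a list with one matrix per input string
all_matches <- str_match_all(text, pattern)[[1]]
print(all_matches)
```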

3. str_detect(): test whether each string contains a match for a pattern

str_detect(fruit, 'an')  # match "an"
## [1]  TRUE FALSE  TRUE FALSE FALSE
str_detect(fruit, '\\d')  # match digits
## [1] FALSE FALSE FALSE FALSE FALSE
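Since str_detect() returns a logical vector, it combines naturally with sum(), mean(), and logical indexing. A minimal sketch using the same fruit vector:

```r
library(stringr)

fruit <- c("banana", "pear", "orange", "apple", "pinapple")

has_an <- str_detect(fruit, "an")

n_matches <- sum(has_an)      # number of elements containing "an": 2
share     <- mean(has_an)     # proportion of matching elements: 0.4
matching  <- fruit[has_an]    # "banana" "orange"

print(n_matches)
print(share)
print(matching)
```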

IV. String transformation functions

1. str_to_upper(), str_to_lower(), str_to_title(): change case

data <- 'a new way to explore the world'
str_to_upper(data)
## [1] "A NEW WAY TO EXPLORE THE WORLD"
str_to_title(data, locale = "")  # capitalize the first letter of each word
## [1] "A New Way To Explore The World"

2. str_subset(): keep the elements that match a regular expression

# Match at the start of the string
str_subset(fruit, '^a')
## [1] "apple"
# Match at the end of the string
str_subset(fruit, 'e$')
## [1] "orange"   "apple"    "pinapple"

3. str_replace(), str_replace_all(): replace matches

string <- c('139-1234-5678', '133,1267,4589', '134 6543 7890', '178 2345 1111 or 133 7890 1234')
# str_match_all() returns a list of matrices; unlist() flattens it into a character vector
string <- unlist(str_match_all(string, '[1][3-9]{2}[- ,][0-9]{4}[- ,][0-9]{4}'))
string <- str_replace_all(string, ',', '-')
str_replace_all(string, ' ', '-')
## [1] "139-1234-5678" "133-1267-4589" "134-6543-7890" "178-2345-1111"
## [5] "133-7890-1234"
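str_replace() changes only the first match in each string, while str_replace_all() changes every match. Replacements can also refer back to capture groups with \\1, \\2, and so on. The sketch below is illustrative only; the masking format is a made-up example:

```r
library(stringr)

phone <- "139-1234-5678"

# Only the first "-" is replaced
first_only <- str_replace(phone, "-", " ")       # "139 1234-5678"

# Every "-" is replaced
every <- str_replace_all(phone, "-", " ")        # "139 1234 5678"

# Keep groups 1 and 3, mask the middle block
masked <- str_replace(phone, "(\\d{3})-(\\d{4})-(\\d{4})", "\\1-****-\\3")  # "139-****-5678"

print(first_only)
print(every)
print(masked)
```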
