Regular Expressions and the stringr Package

Source: Internet. Editor: 程序博客网. Date: 2024/05/18 20:35

The functions in the stringr package cover most everyday text-processing tasks in R.

1. Metacharacters

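In regular expressions, 12 characters have a special meaning:

Character   Meaning
[ ]         Matches any single character listed inside the brackets.
\           Has two roles: (1) it escapes a metacharacter; (2) it introduces special sequences that stand for sets of characters.
^           Matches the start of the string. Placed first inside a character class, it negates the class. Example: [^5] matches any character except '5'.
$           Matches the end of the string. Inside a character class it loses its special meaning. Example: [akm$] matches 'a', 'k', 'm', or '$'.
.           Matches any character except a newline.
|           Alternation ("or").
?           The preceding character (or group) is matched at most once.
*           The preceding character (or group) is matched zero or more times.
+           The preceding character (or group) is matched one or more times.
( )         Defines a group. The pattern inside the parentheses is matched as a single unit.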
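The metacharacters in the table can be tried out directly with stringr. The following is a minimal sketch, assuming stringr is installed; the sample vector `x` and the result names are made up for illustration.

```r
library(stringr)

# Made-up sample strings for illustration.
x <- c("apple", "banana", "5 pears", "a$b")

starts_with_a <- str_detect(x, "^a")      # ^ anchors the match at the start
ends_with_a   <- str_detect(x, "a$")      # $ anchors the match at the end
not_five      <- str_detect(x, "^[^5]")   # [^5]: first character is anything but "5"
has_dollar    <- str_detect(x, "[$]")     # inside [ ], $ is an ordinary character
alt           <- str_extract(x, "an|pp")  # | tries "an" or "pp"
grp           <- str_extract(x, "(an)+")  # ( ) groups "an" so + repeats the pair
opt           <- str_extract(x, "pears?") # ? makes the final "s" optional

starts_with_a  # TRUE FALSE FALSE TRUE
ends_with_a    # FALSE TRUE FALSE FALSE
not_five       # TRUE TRUE FALSE TRUE
has_dollar     # FALSE FALSE FALSE TRUE
alt            # "pp" "an" NA NA
grp            # NA "anan" NA NA
opt            # NA NA "pears" NA
```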

1.1 Repetition
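Besides ?, * and +, repetition is commonly written with counted quantifiers in braces. The sketch below assumes the usual {n}, {n,} and {n,m} forms, which stringr's regex engine supports; the sample vector `x` is made up for illustration.

```r
library(stringr)

# Made-up sample strings for illustration.
x <- c("a", "aa", "aaa", "aaaa")

exactly_two  <- str_detect(x, "^a{2}$")    # {n}: exactly n times
two_or_more  <- str_detect(x, "^a{2,}$")   # {n,}: n or more times
two_to_three <- str_detect(x, "^a{2,3}$")  # {n,m}: between n and m times

# Quantifiers are greedy by default; a trailing ? makes them lazy.
greedy <- str_extract("aaaa", "a{2,3}")    # takes as many as allowed
lazy   <- str_extract("aaaa", "a{2,3}?")   # takes as few as allowed

exactly_two   # FALSE TRUE FALSE FALSE
two_or_more   # FALSE TRUE TRUE TRUE
two_to_three  # FALSE TRUE TRUE FALSE
greedy        # "aaa"
lazy          # "aa"
```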



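1.2 Escaping

To match a metacharacter itself, such as "?" or "*", we have to tell the regex engine in advance to drop the character's special meaning.

This is what the escape character \ is for: write \? and \*. To match a backslash itself, use \\.

Note: in R, a backslash inside a string literal must itself be escaped, so every regex backslash is typed twice. The regex \? is written "\\?" in R code, and matching a literal backslash requires "\\\\".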
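The double-backslash rule is easiest to see in a short example. This is a sketch with a made-up vector `x`; the use of `fixed()` as an alternative to escaping is an addition that is not discussed in the text above.

```r
library(stringr)

# Made-up sample strings; "a\\b" in R source is the three characters a\b.
x <- c("what?", "2*3", "a\\b", "plain")

has_qmark     <- str_detect(x, "\\?")    # regex \? matches a literal ?
has_star      <- str_detect(x, "\\*")    # regex \* matches a literal *
has_backslash <- str_detect(x, "\\\\")   # regex \\ matches a literal backslash

# fixed() treats the whole pattern as plain text, so no escaping is needed.
has_qmark_fixed <- str_detect(x, fixed("?"))

has_qmark      # TRUE FALSE FALSE FALSE
has_star       # FALSE TRUE FALSE FALSE
has_backslash  # FALSE FALSE TRUE FALSE
```

When the whole pattern is literal text, `fixed()` is usually simpler than escaping each metacharacter by hand.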


1.3 Predefined character classes in R
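A few of the POSIX-style predefined classes can be sketched as follows. The selection of classes and the sample string `x` are assumptions chosen for illustration; note that each class name is written inside an extra pair of brackets, as in [[:digit:]].

```r
library(stringr)

# Made-up sample string for illustration.
x <- "Order 42: 3 apples, 10 Pears!"

digits   <- str_extract_all(x, "[[:digit:]]+")[[1]]  # runs of digits
words    <- str_extract_all(x, "[[:alpha:]]+")[[1]]  # runs of letters
uppers   <- str_extract_all(x, "[[:upper:]]")[[1]]   # upper-case letters
n_spaces <- str_count(x, "[[:space:]]")              # whitespace characters
no_punct <- str_replace_all(x, "[[:punct:]]", "")    # drop punctuation

digits    # "42" "3" "10"
words     # "Order" "apples" "Pears"
uppers    # "O" "P"
n_spaces  # 5
no_punct  # "Order 42 3 apples 10 Pears"
```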


1.4 Special sequences that stand for character classes
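The backslash sequences used later in this post (\d and \b) belong to this family. The sketch below assumes the common set \d, \w, \s, \b and the negated form \D; the sample string `x` is made up for illustration.

```r
library(stringr)

# Made-up sample string for illustration.
x <- "milk x2, bag of flour"

digits      <- str_extract_all(x, "\\d")[[1]]           # \d: a digit
word_chars  <- str_extract_all(x, "\\w+")[[1]]          # \w: letters, digits, underscore
underscored <- str_replace_all(x, "\\s", "_")           # \s: whitespace
pure_words  <- str_extract_all(x, "\\b[a-z]+\\b")[[1]]  # \b: word boundary
only_digits <- str_replace_all(x, "\\D", "")            # \D: anything that is not a digit

digits       # "2"
word_chars   # "milk" "x2" "bag" "of" "flour"
underscored  # "milk_x2,_bag_of_flour"
pure_words   # "milk" "bag" "of" "flour"
only_digits  # "2"
```

"x2" is missing from `pure_words` because there is no word boundary between "x" and "2".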


2. Main functions

str_extract()       extracts the first match of the pattern
str_extract_all()   extracts all matches of the pattern

    library(stringr)

    shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
    str_extract(shopping_list, "\\d")
    #> [1] "4" NA  NA  "2"
    str_extract_all(shopping_list, "\\b[a-z]+\\b")
    #> [[1]]
    #> [1] "apples"
    #>
    #> [[2]]
    #> [1] "bag"   "of"    "flour"
    #>
    #> [[3]]
    #> [1] "bag"   "of"    "sugar"
    #>
    #> [[4]]
    #> [1] "milk"
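A typical follow-up is to turn the extracted text into numbers. The sketch below reuses the shopping list; treating a missing quantity as 1 is an assumption made only for this illustration.

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")

# str_extract() returns NA where nothing matches, so as.integer() keeps the NA.
qty <- as.integer(str_extract(shopping_list, "\\d+"))
qty[is.na(qty)] <- 1L   # assumption: no number means a quantity of one

# simplify = TRUE returns a character matrix instead of a list;
# rows without a match are filled with "".
digits <- str_extract_all(shopping_list, "\\d", simplify = TRUE)

qty          # 4 1 1 2
digits[, 1]  # "4" "" "" "2"
```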

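str_locate()        returns the position of the first match of the pattern
str_locate_all()    returns the positions of all matches of the pattern

    fruit <- c("apple", "banana", "pear", "pineapple")
    str_locate(fruit, "a")
    #>      start end
    #> [1,]     1   1
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     5   5
    str_locate_all(fruit, "a")
    #> [[1]]
    #>      start end
    #> [1,]     1   1
    #>
    #> [[2]]
    #>      start end
    #> [1,]     2   2
    #> [2,]     4   4
    #> [3,]     6   6
    #>
    #> [[3]]
    #>      start end
    #> [1,]     3   3
    #>
    #> [[4]]
    #>      start end
    #> [1,]     5   5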
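The start and end columns returned by str_locate() can be fed into str_sub() to pull out the matched text. This is a minimal sketch; the pattern "[aeiou]+" and the string "kiwi" are made up for illustration.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")

# Locate the first run of vowels in each string.
loc <- str_locate(fruit, "[aeiou]+")

# Use the positions to cut the match out again.
matched <- str_sub(fruit, loc[, "start"], loc[, "end"])

# A string without a match yields a row of NA.
no_match <- str_locate("kiwi", "a")

matched   # "a" "a" "ea" "i"
no_match  # start and end are both NA
```

Cutting by position gives the same result as calling `str_extract(fruit, "[aeiou]+")` directly; the positions are mainly useful when the location itself is needed.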

str_replace()       replaces the first match of the pattern
str_replace_all()   replaces all matches of the pattern

    fruits <- c("one apple", "two pears", "three bananas")
    str_replace(fruits, "[aeiou]", "_")
    #> [1] "_ne apple"     "tw_ pears"     "thr_e bananas"
    str_replace_all(fruits, "([aeiou])", "")
    #> [1] "n ppl"    "tw prs"   "thr bnns"
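The parentheses in "([aeiou])" matter once the replacement refers back to the captured group. The sketch below shows a backreference and a named replacement vector; the number-word mapping is made up for illustration.

```r
library(stringr)

fruits <- c("one apple", "two pears", "three bananas")

# \\1 in the replacement inserts whatever the first group captured.
doubled <- str_replace(fruits, "([aeiou])", "\\1\\1")

# A named vector applies several pattern = replacement pairs in turn.
as_digits <- str_replace_all(fruits, c("one" = "1", "two" = "2", "three" = "3"))

doubled    # "oone apple" "twoo pears" "threee bananas"
as_digits  # "1 apple" "2 pears" "3 bananas"
```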

str_split()         splits a string at each match of the pattern

str_split_fixed()   splits a string into a fixed number of pieces and returns a character matrix

    fruits <- c(
      "apples and oranges and pears and bananas",
      "pineapples and mangos and guavas"
    )
    str_split(fruits, " and ")
    #> [[1]]
    #> [1] "apples"  "oranges" "pears"   "bananas"
    #>
    #> [[2]]
    #> [1] "pineapples" "mangos"     "guavas"
    str_split(fruits, " and ", simplify = TRUE)
    #>      [,1]         [,2]      [,3]     [,4]
    #> [1,] "apples"     "oranges" "pears"  "bananas"
    #> [2,] "pineapples" "mangos"  "guavas" ""
    str_split_fixed(fruits, " and ", 2)
    #>      [,1]         [,2]
    #> [1,] "apples"     "oranges and pears and bananas"
    #> [2,] "pineapples" "mangos and guavas"
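Because str_split() returns a list, the pieces usually need a little post-processing. The following sketch limits the number of pieces with `n` and picks the first piece of each string; the helper names are made up for illustration.

```r
library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)

# n = 2 stops after the first split, leaving the rest of the string intact.
first_two <- str_split(fruits, " and ", n = 2)

# Full split, then count the pieces and take the first one per string.
pieces     <- str_split(fruits, " and ")
n_pieces   <- lengths(pieces)
first_item <- vapply(pieces, function(p) p[1], character(1))

first_two[[1]]  # "apples" "oranges and pears and bananas"
n_pieces        # 4 3
first_item      # "apples" "pineapples"
```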

str_detect()        tests whether each string contains a match for the pattern

    fruit <- c("apple", "banana", "pear", "pineapple")
    str_detect(fruit, "a")
    #> [1] TRUE TRUE TRUE TRUE

str_count()         returns the number of matches of the pattern in each string

    fruit <- c("apple", "banana", "pear", "pineapple")
    str_count(fruit, "a")
    #> [1] 1 3 1 1
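The logical vector from str_detect() is mostly used for subsetting, and str_count() accepts any pattern, not only a single letter. A small sketch with the same `fruit` vector; the result names are made up for illustration.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "pineapple")

# Keep only the strings that start with "p".
starts_with_p <- fruit[str_detect(fruit, "^p")]

# Drop the strings that end with "e".
not_ending_e <- fruit[!str_detect(fruit, "e$")]

# Count vowels and the letter "p" in each string.
n_vowels <- str_count(fruit, "[aeiou]")
n_p      <- str_count(fruit, "p")

starts_with_p  # "pear" "pineapple"
not_ending_e   # "banana" "pear"
n_vowels       # 2 3 2 4
n_p            # 2 0 1 3
```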

3. Other important functions

str_sub()           extracts the characters at the given positions

    hw <- "Hadley Wickham"
    str_sub(hw, 1, 6)
    #> [1] "Hadley"
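str_sub() also accepts negative positions, vectors of positions, and assignment. A minimal sketch; the string "BBCDEF" is made up for illustration.

```r
library(stringr)

hw <- "Hadley Wickham"

# Negative positions count from the end of the string.
last_name <- str_sub(hw, -7)

# start and end are vectorised, so several substrings come back at once.
both <- str_sub(hw, c(1, 8), c(6, 14))

# Assigning to str_sub() replaces the selected characters in place.
x <- "BBCDEF"
str_sub(x, 1, 1) <- "A"

last_name  # "Wickham"
both       # "Hadley" "Wickham"
x          # "ABCDEF"
```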

str_dup()           repeats each string a given number of times

    fruit <- c("apple", "pear", "banana")
    str_dup(fruit, 2)
    #> [1] "appleapple"   "pearpear"     "bananabanana"


str_length()        returns the length of each string, in characters

    fruit <- c("apple", "pear", "banana")
    str_length(fruit)
    #> [1] 5 4 6


str_pad()           pads each string to a given width (on the left by default)

    str_pad(c("a", "abc", "abcdef"), 10)
    #> [1] "         a" "       abc" "    abcdef"
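The side and the padding character can both be chosen. The sketch below assumes the `side` and `pad` arguments of str_pad(); the sample values are made up for illustration.

```r
library(stringr)

left  <- str_pad("abc", 7, side = "left")   # the default
right <- str_pad("abc", 7, side = "right")
both  <- str_pad("abc", 7, side = "both")

# Zero-padding identifiers to a common width.
ids <- str_pad(c("7", "42", "123"), 3, pad = "0")

# Strings already longer than the width are returned unchanged.
long <- str_pad("abcdef", 3)

left   # "    abc"
right  # "abc    "
both   # "  abc  "
ids    # "007" "042" "123"
long   # "abcdef"
```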


str_trim()          removes padding, such as whitespace at the start and end of a string

    str_trim("  String with trailing and leading white space\t")
    #> [1] "String with trailing and leading white space"
    str_trim("\n\nString with trailing and leading white space\n\n")
    #> [1] "String with trailing and leading white space"
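Trimming can be limited to one side. The sketch below assumes the `side` argument of str_trim() and, as an addition not covered above, str_squish() from recent stringr versions, which also collapses repeated whitespace inside the string.

```r
library(stringr)

# Made-up sample string with padding on both sides and a double space inside.
s <- "  hello  world  "

left_trimmed  <- str_trim(s, side = "left")
right_trimmed <- str_trim(s, side = "right")
both_trimmed  <- str_trim(s)        # side = "both" is the default
squished      <- str_squish(s)      # also reduces inner runs to one space

left_trimmed   # "hello  world  "
right_trimmed  # "  hello  world"
both_trimmed   # "hello  world"
squished       # "hello world"
```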


str_c()             joins strings

    str_c(letters, collapse = ", ")
    #> [1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
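str_c() has two separators that are easy to confuse: `sep` joins the arguments element by element, while `collapse` joins the resulting vector into one string. A short sketch with made-up inputs:

```r
library(stringr)

# sep: placed between the arguments, element by element.
pairs <- str_c(c("x", "y"), 1:2, sep = "_")

# collapse: joins the resulting vector into a single string.
one_string <- str_c(c("x", "y"), 1:2, sep = "_", collapse = ", ")

# A missing value in any argument makes the result missing.
with_na <- str_c("a", NA)

pairs       # "x_1" "y_2"
one_string  # "x_1, y_2"
with_na     # NA
```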

