Regular Expressions with the stringr Package

Source: Internet · Editor: 程序博客网 · Date: 2024/05/18 20:35

The functions in the stringr package handle text processing with ease.

1. Metacharacters

Twelve characters have a special meaning in regular expressions.

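Character  Meaning
[ ]        Matches any single character listed inside the brackets.
\          Has two roles: (1) escapes a metacharacter; (2) starts special sequences that stand for character classes.
^          Matches the start of the string. As the first character inside a character class it negates the class, e.g. [^5] matches any character except "5".
$          Matches the end of the string. Inside a character class it loses its special meaning, e.g. [akm$] matches "a", "k", "m" or "$".
.          Matches any character except a newline.
|          Alternation ("or").
?          The preceding character (or group) is matched at most once.
*          The preceding character (or group) is matched zero or more times.
+          The preceding character (or group) is matched one or more times.
( )        Defines a group; the pattern inside the parentheses is matched as a unit.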
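The anchors, character classes, alternation and grouping described above can be tried out with a short sketch. The sample vector `x` is made up for illustration and is not from the original article.

```r
library(stringr)

# Hypothetical sample data, chosen to exercise each metacharacter
x <- c("apple", "banana", "kiwi", "5", "cost: $5")

# ^ anchors the match at the start of the string
starts_with_a <- str_detect(x, "^a")
starts_with_a
#> [1]  TRUE FALSE FALSE FALSE FALSE

# [^5] matches any single character other than "5"
has_non_five <- str_detect(x, "[^5]")
has_non_five
#> [1]  TRUE  TRUE  TRUE FALSE  TRUE

# Inside a character class, $ is a literal dollar sign
first_akm <- str_extract(x, "[akm$]")
first_akm
#> [1] "a" "a" "k" NA  "$"

# | means "or"; the parentheses group "an" so that + repeats the whole group
alt <- str_extract(x, "(an)+|kiwi")
alt
#> [1] NA     "anan" "kiwi" NA     NA
```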

1.1 Repetition
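A minimal sketch of how the repetition quantifiers behave. The `?`, `*` and `+` forms come from the metacharacter list; the brace forms `{n}`, `{n,}` and `{n,m}` are standard regex quantifiers added here as an assumption about what this section covers. The test vector `x` is invented for illustration.

```r
library(stringr)

# Hypothetical test strings with 0, 1, 2 and 3 repetitions of "a"
x <- c("ct", "cat", "caat", "caaat")

q_optional <- str_detect(x, "ca?t")      # ?     : zero or one "a"
q_star     <- str_detect(x, "ca*t")      # *     : zero or more
q_plus     <- str_detect(x, "ca+t")      # +     : one or more
q_exact    <- str_detect(x, "ca{2}t")    # {n}   : exactly 2
q_at_least <- str_detect(x, "ca{2,}t")   # {n,}  : 2 or more
q_range    <- str_detect(x, "ca{1,2}t")  # {n,m} : between 1 and 2

q_optional
#> [1]  TRUE  TRUE FALSE FALSE
q_plus
#> [1] FALSE  TRUE  TRUE  TRUE
q_range
#> [1] FALSE  TRUE  TRUE FALSE
```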

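1.2 Escaping

To match a metacharacter itself, such as "?" or "*", we have to tell the regex engine in advance to drop that character's special meaning.

That is the job of the escape character \: write \? and \*. To match a literal \, write \\.

Note: inside an R string literal the backslash is itself an escape character, so every regex backslash must be doubled. The regex \? is written "\\?" in R code, and a literal backslash (regex \\) is written "\\\\".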
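The doubled backslash is easiest to see in a small sketch. The sample strings are invented; `fixed()` is a stringr helper that treats the pattern as literal text and is shown as an alternative to escaping.

```r
library(stringr)

# Hypothetical sample strings; "C:\\temp" is the R literal for C:\temp
x <- c("really?", "a*b", "C:\\temp")

# The regex \? is written "\\?" in an R string literal
has_qmark <- str_detect(x, "\\?")
has_qmark
#> [1]  TRUE FALSE FALSE

has_star <- str_detect(x, "\\*")
has_star
#> [1] FALSE  TRUE FALSE

# A literal backslash: regex \\, which needs four backslashes in R source
has_backslash <- str_detect(x, "\\\\")
has_backslash
#> [1] FALSE FALSE  TRUE

# fixed() matches the pattern as plain text, so no escaping is needed
has_qmark_fixed <- str_detect(x, fixed("?"))

# writeLines() prints the characters the regex engine actually receives
writeLines("\\?")
#> \?
```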

1.3 Predefined character classes in R
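As a rough illustration, the POSIX-style bracket classes such as `[[:digit:]]`, `[[:alpha:]]`, `[[:upper:]]`, `[[:punct:]]` and `[[:space:]]` can be used as follows. Which classes the original section listed is an assumption; the sample string is made up.

```r
library(stringr)

# Hypothetical sample string
x <- "Order #42: 3 apples, 10 pears!"

digits <- str_extract_all(x, "[[:digit:]]+")[[1]]
digits
#> [1] "42" "3"  "10"

words <- str_extract_all(x, "[[:alpha:]]+")[[1]]
words
#> [1] "Order"  "apples" "pears"

uppers <- str_extract_all(x, "[[:upper:]]")[[1]]
uppers
#> [1] "O"

puncts <- str_extract_all(x, "[[:punct:]]")[[1]]
puncts
#> [1] "#" ":" "," "!"

n_spaces <- str_count(x, "[[:space:]]")
n_spaces
#> [1] 5
```

Note that the class name goes inside a second pair of brackets: `[[:digit:]]`, not `[:digit:]`.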

1.4 Special symbols that stand for character classes
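A short sketch of the backslash shorthands, assuming this section covers the usual ones: `\d` (digit), `\D` (non-digit), `\w` (word character), `\s` (whitespace) and `\b` (word boundary). Each backslash is doubled because the patterns are R strings. The sample string is invented.

```r
library(stringr)

# Hypothetical sample string
x <- "milk x2, eggs x12"

nums <- str_extract_all(x, "\\d+")[[1]]          # runs of digits
nums
#> [1] "2"  "12"

tokens <- str_extract_all(x, "\\w+")[[1]]        # runs of word characters
tokens
#> [1] "milk" "x2"   "eggs" "x12"

pieces <- str_split(x, "\\s+")[[1]]              # split on whitespace
pieces
#> [1] "milk" "x2,"  "eggs" "x12"

qty_tags <- str_extract_all(x, "\\bx\\d+\\b")[[1]]  # whole words like x2
qty_tags
#> [1] "x2"  "x12"

digits_only <- str_replace_all(x, "\\D", "")     # drop every non-digit
digits_only
#> [1] "212"
```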

2. Main functions

str_extract() extracts the first match of a pattern

str_extract_all() extracts all matches of a pattern

    library(stringr)
    shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
    str_extract(shopping_list, "\\d")
    #> [1] "4" NA  NA  "2"
    str_extract_all(shopping_list, "\\b[a-z]+\\b")
    #> [[1]]
    #> [1] "apples"
    #>
    #> [[2]]
    #> [1] "bag"   "of"    "flour"
    #>
    #> [[3]]
    #> [1] "bag"   "of"    "sugar"
    #>
    #> [[4]]
    #> [1] "milk"

str_locate() returns the position of the first match of a pattern

str_locate_all() returns the positions of all matches of a pattern

    fruit <- c("apple", "banana", "pear", "pineapple")
    str_locate(fruit, "a")
    #>      start end
    #> [1,]     1   1
    #> [2,]     2   2
    #> [3,]     3   3
    #> [4,]     5   5
    str_locate_all(fruit, "a")
    #> [[1]]
    #>      start end
    #> [1,]     1   1
    #>
    #> [[2]]
    #>      start end
    #> [1,]     2   2
    #> [2,]     4   4
    #> [3,]     6   6
    #>
    #> [[3]]
    #>      start end
    #> [1,]     3   3
    #>
    #> [[4]]
    #>      start end
    #> [1,]     5   5

str_replace() replaces the first match of a pattern

str_replace_all() replaces all matches of a pattern

    fruits <- c("one apple", "two pears", "three bananas")
    str_replace(fruits, "[aeiou]", "_")
    #> [1] "_ne apple"     "tw_ pears"     "thr_e bananas"
    str_replace_all(fruits, "([aeiou])", "")
    #> [1] "n ppl"    "tw prs"   "thr bnns"

str_split() splits a string at each match of a pattern

str_split_fixed() splits a string at a pattern into a fixed number of pieces

    fruits <- c(
      "apples and oranges and pears and bananas",
      "pineapples and mangos and guavas"
    )
    str_split(fruits, " and ")
    #> [[1]]
    #> [1] "apples"  "oranges" "pears"   "bananas"
    #>
    #> [[2]]
    #> [1] "pineapples" "mangos"     "guavas"
    str_split(fruits, " and ", simplify = TRUE)
    #>      [,1]         [,2]      [,3]     [,4]
    #> [1,] "apples"     "oranges" "pears"  "bananas"
    #> [2,] "pineapples" "mangos"  "guavas" ""
    str_split_fixed(fruits, " and ", 2)
    #>      [,1]         [,2]
    #> [1,] "apples"     "oranges and pears and bananas"
    #> [2,] "pineapples" "mangos and guavas"

str_detect() tests whether a string contains a match for a pattern

    fruit <- c("apple", "banana", "pear", "pineapple")
    str_detect(fruit, "a")
    #> [1] TRUE TRUE TRUE TRUE

str_count() returns the number of times a pattern occurs

    fruit <- c("apple", "banana", "pear", "pineapple")
    str_count(fruit, "a")
    #> [1] 1 3 1 1
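The functions in this section are often combined. The sketch below reuses the `shopping_list` vector from the str_extract() example to separate item names from quantities. Treating a missing quantity as 1 is an assumption made for illustration only.

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")

# Which entries end with a quantity tag such as "x4"?
has_qty <- str_detect(shopping_list, "x\\d+$")

# Pull the trailing digits out and convert them to integers (NA if absent)
qty <- as.integer(str_extract(shopping_list, "\\d+$"))
qty[is.na(qty)] <- 1L   # assumption: no tag means a quantity of 1

# Remove the tag, then trim the space it leaves behind
item <- str_trim(str_replace(shopping_list, "x\\d+$", ""))

data.frame(item, qty)
#>           item qty
#> 1       apples   4
#> 2 bag of flour   1
#> 3 bag of sugar   1
#> 4         milk   2
```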

3. Other important functions

str_sub() extracts the characters at the given positions

    hw <- "Hadley Wickham"
    str_sub(hw, 1, 6)
    #> [1] "Hadley"
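str_sub() also accepts negative positions, which count from the end of the string, vectors of positions, and an assignment form. A brief sketch using the same `hw` string:

```r
library(stringr)

hw <- "Hadley Wickham"

# Negative start: the last seven characters
last7 <- str_sub(hw, -7)
last7
#> [1] "Wickham"

# An end of -1 means "through the last character"
from8 <- str_sub(hw, 8, -1)
from8
#> [1] "Wickham"

# Vectors of start and end positions return several substrings at once
both <- str_sub(hw, c(1, 8), c(6, 14))
both
#> [1] "Hadley"  "Wickham"

# The assignment form replaces the selected range in place
str_sub(hw, 1, 6) <- "H."
hw
#> [1] "H. Wickham"
```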

str_dup() repeats a string a given number of times

    fruit <- c("apple", "pear", "banana")
    str_dup(fruit, 2)
    #> [1] "appleapple"   "pearpear"     "bananabanana"

str_length() returns the number of characters in a string

    fruit <- c("apple", "pear", "banana")
    str_length(fruit)
    #> [1] 5 4 6

str_pad() pads a string to a given width

    str_pad(c("a", "abc", "abcdef"), 10)
    #> [1] "         a" "       abc" "    abcdef"
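By default str_pad() pads on the left with spaces. The `side` and `pad` arguments change that, as this small sketch shows; the inputs are made up for illustration.

```r
library(stringr)

# Pad on the right instead of the left
right <- str_pad("abc", 7, side = "right")
right
#> [1] "abc    "

# Pad on both sides
both <- str_pad("abc", 7, side = "both")
both
#> [1] "  abc  "

# Use a different padding character, e.g. zero-padded numbers
zero <- str_pad("7", 3, pad = "0")
zero
#> [1] "007"

# Strings already at least as wide as the target are returned unchanged
unchanged <- str_pad("abcdef", 3)
unchanged
#> [1] "abcdef"
```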

str_trim() removes padding, such as whitespace at the start and end of a string

    str_trim("  String with trailing and leading white space\t")
    #> [1] "String with trailing and leading white space"
    str_trim("\n\nString with trailing and leading white space\n\n")
    #> [1] "String with trailing and leading white space"

str_c() joins strings together

    str_c(letters, collapse = ", ")
    #> [1] "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z"
