Linux Basics: Text Processing Tools (2-6)

Source: Internet | Editor: 程序博客网 | Date: 2024/05/29 16:12

Unit 6: Text Processing Tools

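1. The diff command compares the contents of two files to show how they differ. It can also be used to create patch files, which are used in enterprise environments to apply changes to similar files across many machines.
-c  show the lines of context surrounding each change
-u  use the unified output format (useful for generating patch files)
-r  compare recursively, starting from the specified directories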
[root@localhost Desktop]# vim file
[root@localhost Desktop]# vim file1
[root@localhost Desktop]# cat file
hello westos
[root@localhost Desktop]# cat file1
hello westos
hahaha
[root@localhost Desktop]# diff file file1   ## compare the two files and show the differences
1a2
> hahaha
[root@localhost Desktop]# diff -c file file1 ## -c shows the surrounding context lines
*** file   2017-05-01 05:05:49.099517460 -0400
--- file1   2017-05-01 05:06:05.199517460 -0400
***************
*** 1 ****
--- 1,2 ----
  hello westos
+ hahaha

[root@localhost Desktop]# diff -u file file1 ## -u uses the unified output format
--- file 2017-05-01 05:05:49.099517460 -0400
+++ file1 2017-05-01 05:06:05.199517460 -0400
@@ -1 +1,2 @@
 hello westos
+hahaha
[root@localhost Desktop]# diff -r file file1 ## -r compares directories recursively; given two regular files it behaves like plain diff
1a2
> hahaha
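
Since -r only matters when the arguments are directories, a small sketch of that case may help. It assumes two hypothetical directory trees, old and new, created in a scratch location with mktemp; these names are illustrative and do not come from the session above. It also records diff's exit status, which is 0 for identical inputs, 1 when differences are found, and 2 on error.

```shell
# Hypothetical scratch trees; the names old/new/conf are illustrative only.
work=$(mktemp -d)
mkdir -p "$work/old/conf" "$work/new/conf"
printf 'hello westos\n' > "$work/old/conf/file"
printf 'hello westos\nhahaha\n' > "$work/new/conf/file"

# -r walks both trees, -u emits unified hunks suitable for a patch file.
# diff exits 1 when it finds differences, so capture the status explicitly.
status=0
diff -ru "$work/old" "$work/new" > "$work/dirs.patch" || status=$?

echo "diff exit status: $status"
cat "$work/dirs.patch"
```

The exit status makes diff usable in scripts as a simple "did anything change" check.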

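2. The patch command
patch takes a patch file (patchfile, containing the list of differences produced by diff) and applies those differences to one or more original files, producing the patched version. Normally the patched version replaces the original file, but when the -b option is given a backup is made: the original file is renamed with the .orig suffix.
• patch can apply a simple patch file to a single file using the following syntax:
– [root@host etc]# patch issue patchfile
patching file issue
The following command shows how to use a patch file created with diff -Naur. The user first changes to a directory comparable to the original directory from which the patch file was created, then runs patch:
– [user@host orig-dir]$ patch -b < /tmp/patchfile
patching file hosts
patching file network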
[root@localhost Desktop]# diff -u file file1 > file.path ## create the patch file; once applied, the patched version replaces the original
[root@localhost Desktop]# cat file.path
--- file   2017-05-01 05:05:49.099517460 -0400
+++ file1   2017-05-01 05:06:05.199517460 -0400
@@ -1 +1,2 @@
hello westos
+hahaha
[root@localhost Desktop]# patch file file.path
patching file file
[root@localhost Desktop]# cat file    ## the content of file has been replaced
hello westos
hahaha
[root@localhost Desktop]# ls
file file1 file.path
[root@localhost Desktop]# patch -b file file.path ## with -b a backup is made: the pre-patch file is kept under the .orig suffix
patching file file
Reversed (or previously applied) patch detected! Assume -R? [n] y
[root@localhost Desktop]# ls
file file1 file.orig file.path
[root@localhost Desktop]# cat file.orig
hello westos
hahaha
[root@localhost Desktop]# cat file    ## the patch had already been applied, so answering y reversed it
hello westos
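
The session above hits the "Reversed (or previously applied) patch detected" prompt because the same patch is applied twice. A minimal sketch of a clean round trip follows, assuming GNU patch is installed and using hypothetical scratch files created with mktemp: apply once with -b, then undo explicitly with -R instead of answering the prompt.

```shell
# Scratch copies of the two files from the example; paths are hypothetical.
work=$(mktemp -d)
printf 'hello westos\n' > "$work/file"
printf 'hello westos\nhahaha\n' > "$work/file1"

# diff exits 1 when the files differ, which is expected here.
diff -u "$work/file" "$work/file1" > "$work/file.patch" || true

# Apply the patch; -b keeps the pre-patch content as file.orig.
patch -b "$work/file" "$work/file.patch"

# After patching, file should be identical to file1.
patched=no
cmp -s "$work/file" "$work/file1" && patched=yes
echo "patched: $patched"

# Undo the change explicitly with -R.
patch -R "$work/file" "$work/file.patch"
cat "$work/file"
```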

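3. The grep command
• grep prints the lines of a file that match a pattern. It can also process standard input.
• The pattern may contain regular-expression metacharacters, so it is good practice to always quote it. Basic regular expressions are covered later in this unit.
-i  perform a case-insensitive search
-n  prefix each returned line with its line number
-r  search files recursively, starting from the named directory
-c  print a count of the lines that match the pattern
-v  return the lines that do not contain the pattern
grep -i pattern files: search case-insensitively (the default is case-sensitive),
grep -l pattern files: list only the names of files that contain a match,
grep -L pattern files: list the names of files that contain no match,
grep -w pattern files: match whole words only, not parts of a string (for example 'magic' but not 'magical'),
grep -C number pattern files: show [number] lines of context before and after each match,
grep -E 'pattern1|pattern2' files: show lines that match pattern1 or pattern2,
grep pattern1 files | grep pattern2: show lines that match both pattern1 and pattern2.

grep -n pattern files shows line numbers.

grep -c pattern files shows the number of matching lines.

Some special symbols are also available for searching:
\< and \> mark the start and the end of a word.
For example:
grep man * matches 'Batman', 'manic', 'man', and so on,
grep '\<man' * matches 'manic' and 'man' but not 'Batman',
grep '\<man\>' matches only 'man', not 'Batman', 'manic', or other strings.
'^': the match must be at the start of the line,
'$': the match must be at the end of the line.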

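To make the results easier to see, copy /etc/passwd to /mnt/ and delete some of its lines for the experiment.
[root@localhost mnt]# cat passwd ## a few extra lines were appended at the bottom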
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
test:root:test
root:test:root
root:root:test
TEST:root:TEST
[root@localhost mnt]# grep -i test passwd ## -i performs a case-insensitive search
test:root:test
root:test:root
root:root:test
TEST:root:TEST

[root@localhost mnt]# grep -i test passwd -v ## -v returns the lines that do not contain test
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin

[root@localhost mnt]# grep -i -E "test|root" passwd ## case-insensitively show lines containing test or root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
test:root:test
root:test:root
root:root:test
TEST:root:TEST

[root@localhost mnt]# grep -E "test|root" passwd ## case-sensitive match; TEST:root:TEST still appears because it contains lowercase root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
test:root:test
root:test:root
root:root:test
TEST:root:TEST

[root@localhost mnt]# grep -i -E "^test" passwd ## case-insensitively show lines that start with test
test:root:test
TEST:root:TEST

[root@localhost mnt]# grep -i -E "test$" passwd ## case-insensitively show lines that end with test
test:root:test
root:root:test
TEST:root:TEST

[root@localhost mnt]# grep "test" passwd | grep -E "^test|test$" -v ## show lines that contain test but neither start nor end with it
root:test:root
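
Several of the options listed earlier (-w, -c, -n) are not exercised in the passwd session. The following is a small sketch using a hypothetical four-line word list that mirrors the Batman/manic/man example; the file name words is an assumption made for illustration.

```shell
# Hypothetical sample file mirroring the Batman/manic/man example.
work=$(mktemp -d)
printf 'Batman\nmanic\nman\nMAN page\n' > "$work/words"

whole=$(grep -w 'man' "$work/words")        # whole-word match only
count=$(grep -ci 'man' "$work/words")       # number of matching lines, ignoring case
numbered=$(grep -n '^man' "$work/words")    # line numbers of lines starting with "man"
either=$(grep -E 'Bat|page' "$work/words")  # lines matching either alternative

printf '%s\n' "$whole" "$count" "$numbered" "$either"
```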

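4. The cut command
• cut "cuts" text fields or columns out of a file and prints them to standard output.
-d  specify the delimiter used to separate fields
-f  specify the fields to extract from each line
-c  specify the character columns to extract from each line
[root@localhost mnt]# cut -d : -f 1-3 passwd ## with : as the delimiter, select fields 1-3 of each line in passwd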
root:x:0
bin:x:1
daemon:x:2
adm:x:3
lp:x:4
sync:x:5
shutdown:x:6
halt:x:7
mail:x:8
operator:x:11
test:root:test
root:test:root
root:root:test
TEST:root:TEST

[root@localhost mnt]# cut -c 2-4 passwd ## extract columns 2-4 of each line
oot
in:
aem
dm:
p:x
ync
hut
alt
ail
per
est
oot
oot
EST

[root@localhost mnt]# ifconfig eth0 | grep inet | grep inet6 -v | awk -F " " '{print $2}' ## extract the IP address from the ifconfig output
172.25.254.134
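
As a further sketch of field and column selection, the snippet below runs cut on a single passwd-style line held in a shell variable rather than on a file. The variable names are illustrative assumptions; the options are the same -d, -f, and -c described above.

```shell
# One passwd-style record; kept in a variable so the example needs no files.
line='root:x:0:0:root:/root:/bin/bash'

user_shell=$(printf '%s\n' "$line" | cut -d : -f 1,7)   # fields 1 and 7 only
from_fifth=$(printf '%s\n' "$line" | cut -d : -f 5-)    # field 5 through the last field
first_chars=$(printf '%s\n' "$line" | cut -c 1-4)       # character columns 1-4

printf '%s\n' "$user_shell" "$from_fifth" "$first_chars"
```

A comma in the -f list picks individual fields, while a trailing dash means "through the end of the line".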

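5. The sort command
• sort sorts text data. The data can come from a file or from the output of another command; sort is often used in pipelines.
-n  sort numerically rather than by character
-k  set the sort field
-t  specify a different field separator (the default is whitespace)

[root@localhost mnt]# sort -n file ## -n sorts numerically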
2
3
04
5
8
12
21
51
57
73
82
84

[root@localhost mnt]# sort file ## sort as character strings
04
12
2
21
3
5
51
57
73
8
82
84

[root@localhost mnt]# sort -rn file ## -rn sorts numerically in reverse order
86
84
82
73
57
51
21
12
8
5
04
3

[root@localhost mnt]# sort -rn file | uniq -c ## -rn sorts in reverse numeric order; uniq -c counts occurrences
      2 86
      1 82
      2 51
      1 21
      3 12
      1 8
      1 5
      1 3
      1 2

[root@localhost mnt]# ps ax -o pid --sort -%mem | grep PID -v | head -n 5 ## show only the PIDs of the five processes currently using the most memory
1901
623
1952
1836
508
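
The -k and -t options are listed above but not demonstrated. Here is a minimal sketch that sorts a few colon-separated records by their third field; the four records are a hypothetical excerpt in the style of the passwd file used earlier.

```shell
# Hypothetical records: name:placeholder:uid
data='mail:x:8
root:x:0
operator:x:11
bin:x:1'

# -t : sets the field separator, -k 3,3 restricts the key to field 3,
# and the n modifier compares that key numerically.
by_uid=$(printf '%s\n' "$data" | sort -t : -k 3,3n)
by_uid_desc=$(printf '%s\n' "$data" | sort -t : -k 3,3nr)

printf '%s\n' "$by_uid"
```

Without the n modifier the key would be compared as text, which would place 11 before 8.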

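6. The uniq command
uniq "removes" repeated adjacent lines from a file. To print each distinct line only once (that is, to "remove" every duplicate), the input to uniq must be sorted first. Because uniq can be told which fields or columns to base its decision on, the input must be sorted on those same fields or columns. When used without options, uniq uses the whole record as the decision key and removes duplicate lines from its input.
-u  show only unique lines
-d  show only duplicated lines
-c  show each line once, prefixed with a count
[root@localhost mnt]# uniq -d file ## -d shows duplicated (adjacent) lines
12
[root@localhost mnt]# cat file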
12
51
82
86
21
2
5
8
3
51
12
12

[root@localhost mnt]# uniq -u file ## -u shows only lines that are not repeated on adjacent lines
12
51
82
86
21
2
5
8
3
51

[root@localhost mnt]# uniq -c file ## -c shows each line once, prefixed with a count
      1 12
      1 51
      1 82
      1 86
      1 21
      1 2
      1 5
      1 8
      1 3
      1 51
      2 12

1 86
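
Because uniq only compares adjacent lines, the unsorted file above reports 12 as the sole duplicate even though 51 also appears twice. The sketch below feeds the same twelve values through sort first; holding the data in a shell variable is an assumption made to keep the example self-contained.

```shell
# The same twelve values as the file shown above, one per line.
data=$(printf '%s\n' 12 51 82 86 21 2 5 8 3 51 12 12)

adjacent_dups=$(printf '%s\n' "$data" | uniq -d)            # sees only adjacent repeats
all_dups=$(printf '%s\n' "$data" | sort -n | uniq -d)       # sorting makes every repeat adjacent
top=$(printf '%s\n' "$data" | sort -n | uniq -c | sort -rn | head -n 1)   # most frequent value

printf '%s\n' "$adjacent_dups" "$all_dups" "$top"
```

The sort | uniq -c | sort -rn pipeline is a common way to build a frequency table.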

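7. The tr command
• tr translates characters: given two character ranges, every character found in the first range is replaced by the corresponding character in the second range. It is typically used in shell scripts to convert data where needed.
• tr 'A-Z' 'a-z' < file
[root@localhost mnt]# cat file1
westos
WESTOS
[root@localhost mnt]# tr 'a-z' 'A-Z' < file1 ## convert lowercase to uppercase
WESTOS
WESTOS
[root@localhost mnt]# tr 'A-Z' 'a-z' < file1 ## convert uppercase to lowercase
westos
westos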
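
Beyond range-to-range translation, tr can also delete characters (-d) and squeeze repeats (-s). These two options are not covered in the text above, so treat the following as a supplementary sketch; the input strings are made up for illustration.

```shell
upper=$(printf 'westos\n' | tr 'a-z' 'A-Z')               # range-to-range translation
lower=$(printf 'WESTOS\n' | tr '[:upper:]' '[:lower:]')   # POSIX character classes instead of ranges
no_digits=$(printf 'user42\n' | tr -d '0-9')              # -d deletes the listed characters
squeezed=$(printf 'a   b\n' | tr -s ' ')                  # -s squeezes runs of a character into one

printf '%s\n' "$upper" "$lower" "$no_digits" "$squeezed"
```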

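8. The sed command
• sed is a stream editor, used to edit streams of text data. Given a file name, sed performs the search and replace on every line of the file and sends the modified data to standard output; that is, it does not actually modify the existing file. Like grep, sed is often used in pipelines.
• Because sed commands often contain characters that the shell could interpret as metacharacters, quote them as shown in the examples below. By default, sed operates on every line of the file. A sed command can also be given an address, which restricts the command to only those lines.
s/old/new/  perform a string substitution, replacing old with new
d  delete the matching lines

[root@localhost mnt]# sed 's/sbin/westos/g' passwd ## replace sbin with westos in passwd

[root@localhost mnt]# sed 's/nologin/lee/g' passwd ## replace nologin with lee in passwd

[root@localhost mnt]# sed -e 's/westos/sbin/g' -e 's/lee/nologin/g' -i passwd ## replace westos with sbin and lee with nologin; -i edits passwd in place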
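
The text mentions addresses and the d command, but the session only shows unaddressed substitutions. A minimal sketch of both follows, run against a hypothetical three-line copy of the passwd data in a scratch directory. None of the commands use -i, so the file itself is left unchanged.

```shell
# Hypothetical three-line sample in the style of /mnt/passwd.
work=$(mktemp -d)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' > "$work/passwd"

first_only=$(sed '1s/root/admin/g' "$work/passwd")   # line address: substitute on line 1 only
no_nologin=$(sed '/nologin/d' "$work/passwd")        # pattern address: delete matching lines
range_gone=$(sed '2,3d' "$work/passwd")              # range address: delete lines 2 through 3

printf '%s\n' "$first_only"
```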

1 0