2017-7-12 Text processing tools


1. tee command: saves what flows through a pipe to a file without interrupting the pipe

[root@localhost ~]# cat /etc/passwd | grep roo | tee a.txt |grep ^root

root:x:0:0:root:/root:/bin/bash

[root@localhost ~]# cat a.txt

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin
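
The same idea can be tried without touching real system files. This is a minimal sketch: the scratch directory and the passwd.sample entries are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data standing in for /etc/passwd (hypothetical entries).
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' > passwd.sample

# tee writes everything it reads to a.txt AND passes it on to the next grep.
final=$(grep roo passwd.sample | tee a.txt | grep '^root')

echo "end of the pipeline: $final"
echo "saved by tee:"
cat a.txt
```

a.txt keeps both lines that matched roo, while only the line starting with root reaches the end of the pipeline.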

2. 2>&1: redirects standard error to standard output (on its own, standard error does not go through a pipe)

[root@localhost ~]# ls /etc/passwd /etc/ffffff 2>&1 |cat

ls: cannot access /etc/ffffff: No such file or directory

/etc/passwd
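
A small sketch of the difference, counting how many lines reach the pipe with and without 2>&1. The emit function is a hypothetical helper written for this example.

```shell
# A helper that writes one line to stdout and one line to stderr.
emit() {
  echo 'to stdout'
  echo 'to stderr' >&2
}

# Without 2>&1 only stdout enters the pipe (stderr is discarded here
# so that it does not clutter the terminal).
only_stdout=$(emit 2>/dev/null | wc -l | tr -d ' ')

# With 2>&1 stderr is merged into stdout, so both lines enter the pipe.
both=$(emit 2>&1 | wc -l | tr -d ' ')

echo "lines through the pipe without 2>&1: $only_stdout"
echo "lines through the pipe with 2>&1: $both"
```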

3. (): groups several commands so their standard output is combined into one stream

[root@localhost ~]# (ls -ld /tmp/;cat 1.txt)

drwxrwxrwt. 31 root root 4096 Jul 12 14:59 /tmp/

baaaa

bac

baz

babababababa

baba

bab
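
The grouping matters most when the combined output is redirected or piped. A short sketch, with made-up file names, of what the parentheses change:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'baaaa\nbac\n' > 1.txt

# The parentheses run both commands in one subshell, so a single
# redirection captures the output of both.
(echo 'header'; cat 1.txt) > combined.txt

# Without the parentheses the redirection applies to the last command only;
# 'header' goes to the terminal instead of the file.
echo 'header'; cat 1.txt > last_only.txt

# A subshell is a separate process: variables set inside do not leak out.
x=outer
(x=inner)
echo "x is still: $x"
```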

4. tr command: translates (replaces) characters

[root@localhost ~]# tr 'a' 'A' < 1.txt

bAAAA

bAc

bAz
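
Besides replacing a single character, tr can map ranges and delete or squeeze characters. A brief sketch using inline sample data:

```shell
# tr reads standard input only, so data is fed with a pipe or with <.
upper=$(printf 'baaaa\nbac\n' | tr 'a-z' 'A-Z')   # map one range onto another
no_a=$(printf 'baaaa\nbac\n' | tr -d 'a')          # -d deletes the listed characters
squeezed=$(printf 'baaaa\n' | tr -s 'a')           # -s squeezes repeats into one

printf '%s\n' "$upper" "$no_a" "$squeezed"
```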

5. << (here-document): sends multiple lines to standard input

[root@localhost ~]# cat <<end >1.txt

> 123

> 456

> 789

> end

[root@localhost ~]# cat 1.txt

123

456

789
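
One detail worth knowing: whether the delimiter is quoted decides if the body is expanded. A minimal sketch, where the variable name and file names are chosen only for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
name=world

# Unquoted delimiter: variables and $(...) in the body are expanded.
cat <<end > expanded.txt
hello $name
end

# Quoted delimiter: the body is taken literally.
cat <<'end' > literal.txt
hello $name
end

cat expanded.txt literal.txt
```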

6. < operator: redirects standard input from a file

[root@localhost ~]# cat < 1.txt

123

456

789

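7. head command: shows the first ten lines of a file by default

Options: -n sets the number of lines to show. A positive number N prints the first N lines; a negative number -N prints everything except the last N lines.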

[root@localhost ~]# head -n2 1.txt

123

456

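tail command: shows the last ten lines of a file by default

Options: -n sets the number of lines to show; N and -N mean the same thing (the last N lines), while +N starts printing at line N

-f (--follow) keeps the file open and prints new data as it is appended; commonly used to watch log files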

[root@localhost ~]# tail -n2 1.txt -f

456

789

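tail: 1.txt: file truncated (after the file is written to from another terminal, the new content appears below; tail prints this message when the file shrinks, for example when it is overwritten with >)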

new add
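
The -n variants are easiest to compare on a small numbered file. A sketch with a made-up nums.txt; head -n with a negative number is left out because not every head implementation supports it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 1 2 3 4 5 > nums.txt

first_two=$(head -n 2 nums.txt)        # lines 1-2
last_two=$(tail -n 2 nums.txt)         # lines 4-5
also_last_two=$(tail -n -2 nums.txt)   # same as the line above
from_third=$(tail -n +3 nums.txt)      # line 3 to the end

printf '%s\n' "$first_two" --- "$last_two" --- "$from_third"
```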

Using head and tail together to extract COUNT lines starting at line START:

cat FILE | head -n $((START + COUNT - 1)) | tail -n COUNT

[root@localhost ~]# cat 1.txt | head -n $((5+4-1)) | tail -n 4

2222

7777

999

3333

[root@localhost ~]# cat 1.txt

new add

4444

555

666

2222

7777

999

3333

222
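
The formula can be wrapped in a small function. This is a sketch: lines_from is a hypothetical helper name, and the sample file mirrors the listing above.

```shell
# lines_from START COUNT FILE: print COUNT lines beginning at line START.
# (lines_from is a hypothetical helper, not a standard command.)
lines_from() {
  start=$1 count=$2 file=$3
  head -n "$((start + count - 1))" "$file" | tail -n "$count"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'new add' 4444 555 666 2222 7777 999 3333 222 > 1.txt

lines_from 5 4 1.txt     # lines 5 to 8

# sed can express the same range directly.
sed -n '5,8p' 1.txt
```

If the file has fewer than START + COUNT - 1 lines, the head and tail version returns the last COUNT lines of whatever head produced, which may begin before START; the sed form does not have that problem.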

8. less, more, and cat commands: view content

cat can display several files in one go

[root@localhost ~]# cat 1.txt passwd

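less shows a file or standard output one page at a time

/ searches for a keyword; n and N jump to the next and previous match

Paging with less is also convenient when reading man pages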

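9. grep command: filters lines by keyword

Options: -i ignore case when matching

-n prefix each matching line with its line number

-v invert the match, printing only the lines that do not match

-r (-R) search the contents of files under a directory recursively

-Ax also print the x lines after each match

-Bx also print the x lines before each match

-E use extended regular expressions, the same as egrep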

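To highlight the matched keyword in color:

alias grep='grep --color=always'

With this alias in place, output saved to a file contains the color escape codes, which show up as garbage characters. Use egrep (which bypasses the alias) when saving filtered output, or use --color=auto, which colors only when writing to a terminal.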
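
The options are easier to tell apart on a tiny file. A sketch with made-up sample lines; the --color option is assumed to be available, as it is in GNU grep.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'alpha' 'Root user' 'beta' 'root shell' 'gamma' > sample.txt

grep -n 'root' sample.txt          # 4:root shell
grep -in 'root' sample.txt         # 2:Root user and 4:root shell
grep -vi 'root' sample.txt         # alpha, beta, gamma
grep -A1 'Root' sample.txt         # Root user, then beta
grep -B1 'Root' sample.txt         # alpha, then Root user
grep -E 'alpha|gamma' sample.txt   # alpha, gamma
grep -r 'beta' .                   # ./sample.txt:beta

# --color=auto colors only when writing to a terminal, so the
# redirected output stays free of escape codes.
grep --color=auto 'root' sample.txt > plain.txt
```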

10. cut command: prints selected columns of a file or of standard input

Options: -d set the field delimiter

-f select the fields to print

[root@localhost ~]# ifconfig eth1 | grep 'inet ' | cut -d ':' -f2 | cut -d ' ' -f1

172.24.91.23

-c select by character position

[root@localhost ~]# echo 'afsdf12345dfsf'|cut -c6-10

12345
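
Field lists and ranges work the same way for -f and -c. A short sketch on a single passwd-style line:

```shell
line='root:x:0:0:root:/root:/bin/bash'

user=$(echo "$line" | cut -d ':' -f1)           # first field
user_shell=$(echo "$line" | cut -d ':' -f1,7)   # fields 1 and 7, delimiter kept
ids=$(echo "$line" | cut -d ':' -f3-4)          # a range of fields
prefix=$(echo "$line" | cut -c1-4)              # characters 1 to 4

printf '%s\n' "$user" "$user_shell" "$ids" "$prefix"
```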

11. globbing (wildcards) versus regular expressions

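Glob (unquoted)                              Regular expression (quoted)
ba*   any word starting with ba              'ba*'    b followed by zero or more a
                                             '(ba)*'  an empty line (^$) or any number of ba
ba?   a three-letter word starting with ba   'ba?'    b followed by zero or one a
                                             '(ba)?'  zero or one ba
[ba]  one character, either b or a           '[ba]'   same meaning as the glob
                                             'ba+'    b followed by one or more a
                                             '(ba)+'  one or more ba
                                             'ba{min,[max]}'  b followed by between min and max a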
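
The two columns can be checked side by side. A sketch with made-up file names and words: the shell expands the unquoted globs against file names, while grep -E receives the quoted patterns, with -x requiring the whole line to match.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch ba bab baz babab abc

# Glob: the shell replaces the unquoted pattern with matching file names.
echo ba*      # ba bab babab baz
echo ba?      # bab baz
echo [ba]*    # abc ba bab babab baz

# Regex: the quoted pattern is passed to grep unchanged.
printf '%s\n' b ba baa baba c > words.txt
grep -Ex 'ba*' words.txt       # b, ba, baa
grep -Ex 'ba?' words.txt       # b, ba
grep -Ex '(ba)+' words.txt     # ba, baba
grep -Ex 'ba{2,3}' words.txt   # baa
```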

12. wc command: counts lines, words, and bytes

Options: -l count lines

-w count words

[root@localhost ~]# wc 1.txt    # columns: lines, words, bytes

7 6 34 1.txt

[root@localhost ~]# cat 1.txt

babababa

bababa

baaaaaa

bac

[root@localhost ~]# wc 1.txt -l

7 1.txt

[root@localhost ~]# wc 1.txt -w

6 1.txt
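
In scripts the file name in the output usually gets in the way. A small sketch, with a made-up sample file, of reading from standard input so that wc prints only the number:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\nthree\n' > sample.txt

wc sample.txt                  # lines, words, bytes, then the file name

lines=$(wc -l < sample.txt)    # reading from stdin drops the file name
words=$(wc -w < sample.txt)
bytes=$(wc -c < sample.txt)
echo "lines=$((lines)) words=$((words)) bytes=$((bytes))"
```

The $((...)) around each value strips the padding that some wc implementations put in front of the number.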

13. sort command: sorts text

Options: -r sort in reverse order

-n sort numerically

-f ignore case

-u remove duplicate lines

[root@localhost ~]# sort 1.txt -n -u

444

666

777

8888

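uniq command: removes duplicate lines, but only adjacent duplicates, not duplicates across the whole file

Options: -c prefix each line with the number of times it occurred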

[root@localhost ~]# uniq 1.txt

444

777

8888

444

666

so it should be combined with sort:

[root@localhost ~]# sort 1.txt |uniq -c

1 00

2 3

1 33

2 444

1 66

3 666

1 777

1 8888

1 9
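
A compact sketch of why the sort step matters, plus a common follow-up that ranks values by frequency. The numbers in nums.txt are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 444 777 8888 444 666 666 9 > nums.txt

# uniq alone: the two 444 lines are not adjacent, so only 666 collapses.
uniq nums.txt | wc -l            # 6 lines remain

# Sorting first makes all duplicates adjacent.
sort nums.txt | uniq | wc -l     # 5 distinct values

# Numeric sort with duplicates removed.
sort -n -u nums.txt              # 9 444 666 777 8888

# Count occurrences, then list the most frequent values first.
sort nums.txt | uniq -c | sort -rn
```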

14. A caveat

[root@localhost ~]# cat 1.txt | grep -E 00 >1.txt

[root@localhost ~]# cat 1.txt

[root@localhost ~]#

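Do not redirect the filtered output of a file back into that same file: the file ends up empty. The shell truncates 1.txt to an empty file before cat and grep even start, so cat reads nothing and the final output is empty.

Save the result to a temporary file first, then cat the temporary file back into the original, and finally delete the temporary file (like moving disks in the Tower of Hanoi).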

[root@localhost ~]# cat 1.txt | grep -E 00 >1.tmp;cat 1.tmp>1.txt;rm -rf 1.tmp
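
A slightly more defensive variant of the same workaround, offered as a sketch rather than the one right way: mktemp picks an unused temporary name, and the original is replaced only when grep found something. The file contents and the filter.XXXXXX template are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 00 33 444 > 1.txt

# mktemp creates a fresh file, so an existing 1.tmp is never clobbered.
tmp=$(mktemp "$workdir/filter.XXXXXX")

# Replace the original only if grep succeeded (found at least one match).
if grep -E '00' 1.txt > "$tmp"; then
  mv "$tmp" 1.txt
else
  rm -f "$tmp"
fi

cat 1.txt
```

Note that mv replaces the file itself, so 1.txt takes on the permissions of the temporary file; copying the content back with cat, as in the command above, keeps the original file's permissions.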
