The Linux cut command

Source: Internet. Editor: 程序博客网. Time: 2024/04/27 17:21

cut is a very useful command, mainly used to extract selected characters or fields from each line of its input.

cut -cchars file
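For example:
    -c5      extracts the 5th character
    -c5-     extracts everything from the 5th character to the end of the line
    -c1,5,12 extracts several individual characters, separated by ","
    -c5-14   extracts the 5th through the 14th character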
[service@dsg tmp]$ cat f.txt
service pts/0        Oct 9 20:27 (211.95.114.235)
service pts/1        Oct 9 21:06 (218.80.203.242)
service pts/2        Oct 9 14:35 (218.80.203.242)
service pts/3        Oct 9 21:07 (218.80.213.242)
service pts/4        Oct 9 21:07 (218.80.213.242)
service pts/5        Oct 9 21:45 (58.31.205.19)
[service@dsg tmp]$ cut -c5 f.txt
i
i
i
i
i
i
[service@dsg tmp]$ cut -c5- f.txt
ice pts/0        Oct 9 20:27 (211.95.114.235)
ice pts/1        Oct 9 21:06 (218.80.203.242)
ice pts/2        Oct 9 14:35 (218.80.203.242)
ice pts/3        Oct 9 21:07 (218.80.213.242)
ice pts/4        Oct 9 21:07 (218.80.213.242)
ice pts/5        Oct 9 21:45 (58.31.205.19)
[service@dsg tmp]$ cut -c1,5,13 f.txt
si0
si1
si2
si3
si4
si5
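
The range form -c5-14 from the option list is not exercised in the session above. A minimal sketch, assuming one line of f.txt is held in a shell variable (the names line, range and head7 are only for illustration):

```shell
# One sample line copied from f.txt as shown above (hypothetical variable name).
line='service pts/0        Oct 9 20:27 (211.95.114.235)'

# Characters 5 through 14, inclusive on both ends.
range=$(printf '%s\n' "$line" | cut -c5-14)

# An open start (-c-7) means "from the first character through the 7th".
head7=$(printf '%s\n' "$line" | cut -c-7)

# Brackets make any trailing space in the result visible.
printf '[%s]\n' "$range" "$head7"
```

Both ends of a range are inclusive, so -c5-14 returns ten characters, here including the space that follows pts/0.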

------------------------------
cut -d -f
-d, --delimiter=DELIM
              use DELIM instead of TAB for field delimiter

-f, --fields=LIST
              output only these fields; also print any line that contains no
              delimiter character, unless the -s option is specified

-d and -f are mainly used together to extract fields from data separated by a given delimiter.
For example:
[service@dsg tmp]$ cat f.txt
service1:pts/0:Oct 9 20:27: (211.95.114.235)
service2:pts/1:Oct 9 21:06: (218.80.203.242)
service3:pts/2:Oct 9 14:35: (218.80.203.242)
service4:pts/3:Oct 9 21:07: (218.80.213.242)
service5:pts/4:Oct 9 21:07: (218.80.213.242)
service6:pts/5:Oct 9 21:45: (58.31.205.19)
[service@dsg tmp]$ cut -d: -f1 f.txt
service1
service2
service3
service4
service5
service6
[service@dsg tmp]$ cut -d: -f2 f.txt
pts/0
pts/1
pts/2
pts/3
pts/4
pts/5
[service@dsg tmp]$ cut -d: -f3 f.txt
Oct 9 20
Oct 9 21
Oct 9 14
Oct 9 21
Oct 9 21
Oct 9 21
[service@dsg tmp]$ cut -d: -f5 f.txt
 (211.95.114.235)
 (218.80.203.242)
 (218.80.203.242)
 (218.80.213.242)
 (218.80.213.242)
 (58.31.205.19)
[service@dsg tmp]$ cut -d: -f9 f.txt #no field 9 exists, so cut prints an empty line for each input line

[service@dsg tmp]$ cut -d: -f1,4 f.txt #extract fields 1 and 4
service1:27
service2:06
service3:35
service4:07
service5:07
service6:45
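
The man page excerpt above says that lines without the delimiter are printed as-is unless -s is given, which the session does not show. A small sketch of that behaviour, assuming two lines in the layout of f.txt plus one invented line that contains no colon (the variable names are hypothetical):

```shell
# Two lines in the colon-separated layout of f.txt, plus a made-up
# line that contains no ":" at all.
sample='service1:pts/0:Oct 9 20:27: (211.95.114.235)
service2:pts/1:Oct 9 21:06: (218.80.203.242)
no delimiter here'

# Without -s, the delimiter-free line is passed through unchanged.
with_passthrough=$(printf '%s\n' "$sample" | cut -d: -f2)

# With -s, lines that lack the delimiter are suppressed.
only_delimited=$(printf '%s\n' "$sample" | cut -d: -s -f2)

# A field range: fields 3 and 4 are rejoined with the same delimiter.
time_fields=$(printf '%s\n' "$sample" | cut -d: -s -f3-4)

printf '%s\n' "$with_passthrough" '--' "$only_delimited" '--' "$time_fields"
```

The range -f3-4 is one way to get the full time back, since the ":" inside 20:27 splits it across two fields.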

If the fields in the file are separated by tabs, use -f on its own: the default field delimiter is TAB, so -d is not needed.
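
A quick way to check the TAB default is to build a tab-separated line with printf. This is only a sketch: the field values are invented and the variable names are hypothetical.

```shell
# Three tab-separated fields; the values are made up for illustration.
row=$(printf 'service\tpts/0\t211.95.114.235')

# No -d needed: TAB is the default field delimiter.
second=$(printf '%s\n' "$row" | cut -f2)

# Fields 1 and 3 come back joined by a tab as well.
first_third=$(printf '%s\n' "$row" | cut -f1,3)

printf '%s\n' "$second" "$first_third"
```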
############################################################################################
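Task:
From the log, process every line of the form shown below whose size is 6141: extract the first 8 characters of the file name (121212120_1.jpg in this sample) and list them.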
-rw-r--r--    1 root     root         6141 Sep 8 10:39 /data1/mypic/temp/121212120_1.jpg
(1) Select the matching lines
tail -200000 /tmp/temp_move_video_images.log|grep " root "|grep " 6141 "
(2) Extract the /data1/mypic/temp/121212120_1.jpg part (the 9th whitespace-separated field)
tail -200000 /tmp/temp_move_video_images.log|grep " root "|grep " 6141 "|awk '{print $9}'
(3) Extract the 121212120_1.jpg part (the 5th "/"-separated field; the 1st is the empty string before the leading "/")
tail -200000 /tmp/temp_move_video_images.log|grep " root "|grep " 6141 "|awk '{print $9}'|awk -F/ '{print $5}'
(4) Extract 12121212 (the part before "_", cut down to its first 8 characters)
tail -200000 /tmp/temp_move_video_images.log|grep " root "|grep " 6141 "|awk '{print $9}'|awk -F/ '{print $5}'|awk -F_ '{print $1}' |cut -c1-8
(5) Remove duplicates
tail -200000 /tmp/temp_move_video_images.log|grep " root "|grep " 6141 "|awk '{print $9}'|awk -F/ '{print $5}'|awk -F_ '{print $1}' |cut -c1-8|awk '{count[$1]++}END{for(name in count)print name,count[name] }' |awk '{print $1}'
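
The five steps can be tried end to end on a small stand-in log. This is a sketch under stated assumptions: the temporary file replaces /tmp/temp_move_video_images.log, only the first log line comes from the task above and the other three are invented, and sort -u is swapped in for the counting awk in step (5). That substitution is not the original method; it also sorts the result, which the awk version does not.

```shell
# Hypothetical log: the first line is the sample from the task; the other
# three are invented so that a duplicate and a non-matching size appear.
log=$(mktemp)
cat > "$log" <<'EOF'
-rw-r--r--    1 root     root         6141 Sep 8 10:39 /data1/mypic/temp/121212120_1.jpg
-rw-r--r--    1 root     root         6141 Sep 8 10:40 /data1/mypic/temp/121212120_2.jpg
-rw-r--r--    1 root     root         9999 Sep 8 10:41 /data1/mypic/temp/343434340_1.jpg
-rw-r--r--    1 root     root         6141 Sep 8 10:42 /data1/mypic/temp/565656560_1.jpg
EOF

# Steps (1) to (4) as in the text, then sort -u in place of the counting awk.
ids=$(grep " root " "$log" | grep " 6141 " \
  | awk '{print $9}' | awk -F/ '{print $5}' | awk -F_ '{print $1}' \
  | cut -c1-8 | sort -u)

rm -f "$log"
printf '%s\n' "$ids"
```

With this sample the 9999-byte entry is filtered out in step (1), and the two 121212120 entries collapse into a single 12121212 line.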