Counting rows by column value and removing duplicates in the shell

Source: Internet  Editor: 程序博客网  Date: 2024/05/28 06:05

The test file is test.log in the /shell directory. Its contents are:

abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 12345 asdga
jfaldjf sdfasfs 63542 sdfad
abcddfg higdfmn 12345 fuck!
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
sdfasfd sdafadf 67890 asdga
jfaldjf sdfasfs 67890 sdfad
abcddfg higdfmn 63542 fuck!
afdscff adfgada 67890 fdasg
sdfagfd sdavadf 67890 asdga
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
afdscff adfgada 12345 fdasg
sdfagfd sdavadf 12345 asdga

1: First, look at the log file (sorted, so that identical lines sit next to each other):

[root@master ~]# cat /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga
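None of the lines begins with a number, so `sort -n` here most likely falls back to comparing whole lines, which is why the listing comes out in plain text order. To view the log ordered by the numeric third column instead, a sort key can be given with `-k`. A minimal sketch: `sort_by_col` is a hypothetical helper name, and the sample rows are made up for illustration.

```shell
# sort_by_col: hypothetical helper, not part of the original post.
# Usage: sort_by_col COLUMN FILE
sort_by_col() {
    # -kN,N restricts the key to field N; the trailing n compares it numerically.
    # The second key (-k1,1) orders rows that share the same number by field 1.
    sort -k"$1","$1"n -k1,1 "$2"
}

# Small demo on a throwaway file (illustrative data only).
demo=$(mktemp)
printf '%s\n' 'jf sd 67890 x' 'ab hi 12345 y' 'af ad 63542 z' 'aa bb 12345 w' > "$demo"
sort_by_col 3 "$demo"
rm -f "$demo"
```

On the real file this would be called as `sort_by_col 3 /shell/test.log`.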

2: Count how many rows carry each distinct value of the third column. The result is shown below.

      Use awk: awk '{a[$3]++}END{for(i in a)print i,a[i]}' /shell/test.log

[root@master ~]# awk '{a[$3]++}END{for(i in a)print i,a[i]}' /shell/test.log
63542 4
67890 9
12345 11
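The order in which awk's `for (i in a)` visits the array is unspecified, so the three result lines may come out in a different order on another awk implementation. Piping the result through `sort` makes the order stable. A minimal sketch of that idea: `count_by_col` is a hypothetical helper name, and the demo rows are illustrative rather than taken from test.log.

```shell
# count_by_col: hypothetical helper, not part of the original post.
# Usage: count_by_col COLUMN FILE
count_by_col() {
    # -v passes the column number into awk; $col then selects that field.
    awk -v col="$1" '{ count[$col]++ } END { for (v in count) print v, count[v] }' "$2" |
        sort -k2,2nr -k1,1   # highest count first, ties ordered by value
}

# Small demo on a throwaway file (illustrative data only).
demo=$(mktemp)
printf '%s\n' 'a x 12345 p' 'b y 63542 q' 'c z 12345 r' > "$demo"
count_by_col 3 "$demo"
rm -f "$demo"
```

The demo prints `12345 2` followed by `63542 1`. On the real file the call would be `count_by_col 3 /shell/test.log`.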

3: List the distinct values that appear in a column (here the third), each printed once:

      awk '{if(!a[$3]++) print $3}' /shell/test.log

[root@master ~]# awk '{if(!a[$3]++) print $3}' /shell/test.log
12345
63542
67890
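If only the number of distinct values is wanted rather than the values themselves, the same `!a[...]++` test can drive a counter. A minimal sketch: `distinct_count` and the array name `seen` are hypothetical names chosen for illustration.

```shell
# distinct_count: hypothetical helper, not part of the original post.
# Usage: distinct_count COLUMN FILE
distinct_count() {
    # seen[$col]++ is 0 the first time a value appears, so the pattern is true
    # once per distinct value. n + 0 makes an empty file print 0.
    awk -v col="$1" '!seen[$col]++ { n++ } END { print n + 0 }' "$2"
}

# Small demo on a throwaway file (illustrative data only).
demo=$(mktemp)
printf '%s\n' 'a x 12345 p' 'b y 63542 q' 'c z 12345 r' > "$demo"
distinct_count 3 "$demo"
rm -f "$demo"
```

The demo prints 2. For /shell/test.log the expected answer is 3, matching the three values listed above.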

4: Find the distinct values in a column and print the whole line on which each value first appears:

      awk '{if(!($3 in a)){a[$3];print}}' /shell/test.log

[root@master ~]# awk '{if(!($3 in a)){a[$3];print}}' /shell/test.log
abcdefg higklmn 12345 fuck!
afdsaff adfgaga 63542 fdasg
abcdefg higklmn 67890 fuck!
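The `in` test used here only records that a key has been seen, without keeping a count. The same pattern extends to any column, or to a pair of columns by using awk's multi-subscript form `(x, y) in array`. A minimal sketch under those assumptions: both function names are hypothetical, and the demo rows are illustrative.

```shell
# Hypothetical helpers, not part of the original post.

# first_line_per_key COLUMN FILE: first line seen for each value of COLUMN.
first_line_per_key() {
    awk -v col="$1" '!($col in seen) { seen[$col]; print }' "$2"
}

# first_line_per_pair COL1 COL2 FILE: first line seen for each (COL1, COL2) pair.
first_line_per_pair() {
    awk -v c1="$1" -v c2="$2" '!(($c1, $c2) in seen) { seen[$c1, $c2]; print }' "$3"
}

# Small demo on a throwaway file (illustrative data only).
demo=$(mktemp)
printf '%s\n' 'a x 1 p' 'b y 2 q' 'a z 1 r' 'a w 2 s' > "$demo"
first_line_per_key 3 "$demo"      # lines 1 and 2
first_line_per_pair 1 3 "$demo"   # lines 1, 2 and 4
rm -f "$demo"
```

Lines are printed in input order, so no sorting is needed beforehand.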

5: The uniq command removes duplicate lines, but only when the duplicates are adjacent.

[root@master ~]# uniq /shell/test.log | wc -l
16

[root@master ~]# uniq /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga

      The duplicates are not fully removed: the line "afdsaff adfgaga 63542 fdasg" still appears twice, because its two copies are not adjacent in the file.
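Since uniq only compares neighbouring lines, the usual workaround is to sort first so that identical lines become adjacent. A minimal sketch of that approach: the helper names are hypothetical and the demo data is illustrative. `uniq -d` is used to show which lines occur more than once.

```shell
# Hypothetical helpers, not part of the original post.
dedupe_sorted() { sort "$1" | uniq; }       # typically the same result as: sort -u "$1"
repeated_lines() { sort "$1" | uniq -d; }   # one copy of each line that occurs more than once

# Small demo on a throwaway file (illustrative data only).
demo=$(mktemp)
printf '%s\n' 'b 2' 'a 1' 'b 2' 'b 2' 'a 1' > "$demo"
uniq "$demo"             # 4 lines: only the adjacent "b 2" pair collapses
dedupe_sorted "$demo"    # 2 lines: "a 1" then "b 2"
repeated_lines "$demo"   # "a 1" and "b 2", since both occur more than once
rm -f "$demo"
```

The trade-off is that sorting discards the original line order.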

6: An awk script can remove duplicate lines completely, whether or not they are adjacent:

[root@master ~]# awk '{if(!(a[$0]++)){a[$0];print}}' /shell/test.log | wc -l
15
[root@master ~]# awk '{if(!(a[$0]++)){a[$0];print}}' /shell/test.log | sort -n
abcddfg higdfmn 12345 fuck!
abcddfg higdfmn 63542 fuck!
abcdefg higklmn 12345 fuck!
abcdefg higklmn 67890 fuck!
afdsaff adfgaga 63542 fdasg
afdscff adfgada 12345 fdasg
afdscff adfgada 67890 fdasg
jfaldjf sdfasfs 63542 sdfad
jfaldjf sdfasfs 67890 sdfad
jfalsjf sdf4sfs 12345 sdfad
jfalsjf sdf4sfs 67890 sdfad
sdfagfd sdavadf 12345 asdga
sdfagfd sdavadf 67890 asdga
sdfasfd sdafadf 12345 asdga
sdfasfd sdafadf 67890 asdga
      All duplicates are removed.

      The results show that uniq leaves 16 lines while awk leaves 15: the duplicate pair that survived uniq has been collapsed into a single line here.
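In the awk command above, the extra `a[$0];` is redundant, because `a[$0]++` already creates the array entry. The same filter is commonly written as a bare pattern. A minimal sketch: `dedupe_keep_order` and the array name `seen` are hypothetical names, and the demo data is illustrative.

```shell
# dedupe_keep_order: hypothetical helper, not part of the original post.
# Usage: dedupe_keep_order FILE
dedupe_keep_order() {
    # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
    # is true exactly once per distinct line; a bare pattern prints the line.
    awk '!seen[$0]++' "$1"
}

# Small demo on a throwaway file (illustrative data only).
demo=$(mktemp)
printf '%s\n' 'b 2' 'a 1' 'b 2' 'b 2' 'a 1' > "$demo"
dedupe_keep_order "$demo"   # "b 2" then "a 1": first-seen order is kept
rm -f "$demo"
```

Unlike sorting before uniq, this keeps lines in their original order, at the cost of holding every distinct line in memory.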
