How do I remove duplicate lines?

Source: Internet  Editor: 程序博客网  Date: 2024/05/20 07:13

Suppose we have a file named file and want to remove the duplicate lines in it. What methods are available?

The contents of file are as follows:

my friends, xiaoying
my teacher, xiaoniu
my teacher, xiaoniu
my fuqin, father
my sister, wushiying
my sister, wushiying
my friends, xiaoying
my teacher, xiaoniu
my fuqin, father
my sister, wushiying
my friends, xiaoying
my fuqin, father

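Method 1: awk '{if ($0!=line) print; line=$0}' file

In practice, that means:

cat file | sort | awk '{if ($0!=line) print; line=$0}' [the awk command only drops a line that repeats the one directly before it, so the input has to be sorted first for this method to work]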

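How it works:

awk reads its input one line at a time. On the first line, line is empty [line is an awk variable; as in the shell, it needs no declaration, and it is empty until something is assigned to it], so it is not equal to $0 (which is "my friends, xiaoying" in the sorted input) and the line is printed. line is then set to $0. awk reads the next line; this time $0 equals line (both are "my friends, xiaoying"), so nothing is printed. When "my fuqin, father" is read, $0 differs from line (still "my friends, xiaoying"), so it is printed, and so on for the remaining lines. Because each line is compared only with the one before it, identical lines must be adjacent, which is what the sort guarantees.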
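The effect of the sort step can be checked with a small sketch like the one below. It recreates the sample file in a temporary directory (the directory and the variable names unsorted, sorted and ordered are only illustrative), runs the awk command with and without sorting, and also tries a common awk idiom, !seen[$0]++, which is not part of the original method: it remembers every line seen so far in an array, so it needs no sort and keeps the original order.

```shell
# Recreate the sample file in a temporary directory (the path is arbitrary).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
my friends, xiaoying
my teacher, xiaoniu
my teacher, xiaoniu
my fuqin, father
my sister, wushiying
my sister, wushiying
my friends, xiaoying
my teacher, xiaoniu
my fuqin, father
my sister, wushiying
my friends, xiaoying
my fuqin, father
EOF

# Unsorted input: only back-to-back repeats are dropped, so duplicates remain.
unsorted=$(awk '{ if ($0 != line) print; line = $0 }' "$tmpdir/file")

# Sorted input: identical lines are adjacent, so each one survives exactly once.
sorted=$(sort "$tmpdir/file" | awk '{ if ($0 != line) print; line = $0 }')

# Variant (an assumption, not the article's method): track every line seen so far.
# No sort is needed and the first occurrence of each line keeps its position.
ordered=$(awk '!seen[$0]++' "$tmpdir/file")

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

On the unsorted file the first command still leaves ten lines, because only two of the repeats happen to be adjacent; the sorted pipeline and the seen-array variant both leave four.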

Method 2: [this is the simplest one]

[root@sor-sys zy]# cat file| sort | uniq 
my friends, xiaoying
my fuqin, father
my sister, wushiying
my teacher, xiaoniu
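As a side note, uniq has the same restriction as the awk command in Method 1: it only collapses adjacent identical lines, which is why sort comes first. sort also has a -u option that should give the same result in one step. A minimal sketch, using a shortened inline sample instead of the full file (the variable name result is only illustrative):

```shell
# Four sample lines with two duplicates, fed through sort -u in a single step.
result=$(printf '%s\n' \
  'my teacher, xiaoniu' \
  'my friends, xiaoying' \
  'my teacher, xiaoniu' \
  'my friends, xiaoying' | sort -u)

printf '%s\n' "$result"
```

The sort | uniq form remains useful when uniq's own options are wanted, for example uniq -c to count how often each line occurs.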

Method 3:

The file rmdup.sed contains the following:

#n rmdup.sed - ReMove DUPlicate consecutive lines

# read next line into pattern space (if not the last line)
$!N

# check if pattern space consists of two identical lines
s/^\(.*\)\n\1$/&/
# if yes, goto label RmLn, which will remove the first line in pattern space
t RmLn
# if not, print the first line (and remove it)
P

# garbage handling which simply deletes the first line in the pattern space
: RmLn
D

[root@sor-sys zy]# cat file|sort |sed -f rmdup.sed
my friends, xiaoying
my fuqin, father
my sister, wushiying
my teacher, xiaoniu
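The same logic is often written as a sed one-liner. The sketch below is assumed to be equivalent to rmdup.sed rather than taken from it: the address /^\(.*\)\n\1$/!P prints the first line of the pattern space only when the two lines differ, which replaces the s///, t and label in the script, and D then removes that first line in either case. The input letters and the variable name result are only illustrative.

```shell
# $!N  : append the next input line to the pattern space (except on the last line)
# /../!P : print the first line only if the two lines are NOT identical
# D    : delete the first line and restart the cycle with what is left
result=$(printf '%s\n' a a b c c c | sort | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n' "$result"
```

As with the script file, the input still has to be sorted first, because only consecutive lines are compared.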
