Scattered notes on using shell, gawk, and sed

Source: Internet | Published by: 卷积神经网络算法原理 | Editor: 程序博客网 | Time: 2024/05/19 18:42

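List the 10 largest files under a given directory

find DIR -ls | sort -nrk7 | head

Here -k7 sorts on the 7th column (the size column of find -ls output), -r sorts in descending order, and -n compares numerically.

Extract the name of the largest file: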

find ./ -ls |sort -nrk7 | head -n 1 | gawk '{print $NF}'
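The pipeline above can be tried safely on a throwaway directory. This is only a sketch: the directory and the three file names are made up for illustration, and -type f is added here (it is not in the original command) so that directories are not counted as files.

```shell
# Hypothetical sandbox: three files of 10, 300 and 20 bytes.
dir=$(mktemp -d)
printf '%10s' x > "$dir/small.txt"
printf '%300s' x > "$dir/big.txt"
printf '%20s' x > "$dir/medium.txt"

# Sort on column 7 (size), largest first, keep the top entry,
# then print its last field (the path).
largest=$(find "$dir" -type f -ls | sort -nrk7 | head -n 1 | awk '{ print $NF }')
basename "$largest"

rm -rf "$dir"
```

Because the name is taken from $NF, this approach breaks for paths that contain spaces.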

List the 10 largest directories under a given directory

du -S DIR | sort -nr | head

sed (Stream EDitor)

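Delete empty lines:

sed '/^$/d' -i testfile

Strip the leading whitespace from lines that begin with whitespace:

sed 's#^[[:space:]]\+##' -i testfile

For every line that starts with # followed by at least one whitespace character, delete the leading # and all the whitespace after it:

sed 's@^#[[:space:]]\+@@' testfile

Prepend # to every line: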

sed 's@^@#&@' testfile
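The sed commands above can be combined and tried without touching a real file. A minimal sketch, assuming GNU sed (for the \+ repetition) and a made-up four-line sample; output goes to stdout instead of using -i.

```shell
# Hypothetical sample input: a comment line, an empty line,
# an indented line and a plain line.
sample=$(mktemp)
printf '%s\n' '# comment' '' '   indented' 'plain' > "$sample"

# Delete empty lines, strip a leading "# ", strip leading whitespace.
cleaned=$(sed -e '/^$/d' -e 's@^#[[:space:]]\+@@' -e 's#^[[:space:]]\+##' "$sample")
printf '%s\n' "$cleaned"

# Prefix every remaining line with '#'.
commented=$(printf '%s\n' "$cleaned" | sed 's@^@#@')
printf '%s\n' "$commented"

rm -f "$sample"
```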

AWK (data-driven: searches files for lines that contain certain patterns)

Two ways to run awk programs:

1. For a short command: awk 'program' input-file1 input-file2 …
2. For a long program: awk -f program-file input-file1 input-file2 …

Find the size and name of the largest file in a directory:

ls -l | gawk '{if ($5 >max){max=$5;file=$NF}} END {print max,file}'

Print one specific line of a file (line 3 as an example)

1. sed -n '3p' testfile
2. cat testfile | gawk 'NR == 3'
3. gawk 'NR == 3' testfile

Print a given line and every line after it (line 40 as an example)

1. sed -n '40,$p' testfile
2. sed '40,$!d' testfile
3. gawk 'NR >= 40' testfile
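A quick way to check the line-selection commands is to feed them numbered input. In this sketch seq stands in for testfile, and line 4 is used instead of line 40 to keep the input short. Note that the awk form needs >= (not >) to include the starting line.

```shell
# seq 1 5 prints the numbers 1..5, one per line.
third=$(seq 1 5 | sed -n '3p')          # only line 3
third_awk=$(seq 1 5 | awk 'NR == 3')    # same selection with awk
tail_sed=$(seq 1 5 | sed -n '4,$p')     # line 4 through the last line
tail_awk=$(seq 1 5 | awk 'NR >= 4')     # same selection with awk

echo "$third $third_awk"
echo "$tail_sed"
echo "$tail_awk"
```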

Set the current time: date [-u|--utc|--universal] [MMDDhhmm[[CC]YY][.ss]]

date 111213202017.23 # 2017-11-12 13:20:23

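ARGC: the number of command-line arguments; the awk command itself counts as one.

awk 'END{print ARGC}' testfile : prints the number of arguments (2 here)

ARGV: an array that holds the arguments given on the command line.

awk 'END{print ARGV[0]}' testfile : prints the first argument, which by default is the awk command itself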
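To see how ARGC and ARGV relate, the arguments can be listed from a BEGIN rule. A small sketch: the operands first and second are placeholders, and because the program consists only of a BEGIN rule, awk never tries to open them.

```shell
# Print ARGV[1..ARGC-1], then ARGC itself.
# ARGV[0] is skipped because it depends on which awk binary is installed.
args=$(awk 'BEGIN {
    for (i = 1; i < ARGC; i++)
        printf "%d=%s\n", i, ARGV[i]
    print "ARGC=" ARGC
}' first second)
printf '%s\n' "$args"
```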

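Match lines that begin with a given keyword (printing their first field here):

awk '/^UUID/{print $1}' /etc/fstab

Print lines that end with a given keyword:

awk '/halo$/{print $0}' testfile

Print the entire content of a file on one line: just set the output record separator to the empty string

awk 'BEGIN {ORS=""} //' testfile

To add a trailing newline in the previous case, put it in an END rule: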

awk 'BEGIN {ORS=""} // {print} END {print "\n"}' testfile
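The effect of ORS is easy to observe on a three-line input. A minimal sketch with made-up input; the second command uses a comma as ORS purely for illustration.

```shell
# ORS="" joins all records; the END rule supplies the final newline.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = "" } { print } END { print "\n" }')

# Any string can serve as ORS; it is written after every record,
# including the last one.
csv=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = "," } { print }')

echo "$joined"
echo "$csv"
```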

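The NF variable holds the number of fields in the current record. NF refers to that number, whereas $NF refers to the content of the field itself. So if a record has 100 fields, print NF outputs the integer 100, while print $100 outputs the same result as print $NF: the content of the last field of that record.

The NR variable holds the number of records read so far. It is 1 when the first record is read, 2 when the second record is read, and so on. Use it in an END rule to print the number of lines in the input:

gawk 'END {print NR}' testfile
gawk 'BEGIN {print NR}{print} END {print NR}' testfile

Result of the second command: 0, then the file content, then the record count.

Print the number of fields and the last field of every record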

awk ' { print "Record " NR " has " NF " fields and ends with " $NF}' testfile
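The difference between NF, $NF and NR shows up clearly on a tiny input. A sketch with two made-up records:

```shell
# NF is the field count, $NF the last field, NR the running record number.
report=$(printf 'one two three\nfour five\n' | awk '
    { print NR ": NF=" NF ", last=" $NF }
    END { print "records=" NR }')
printf '%s\n' "$report"
```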

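Filter records with a regular expression

gawk '/shadow/ {print}' testfile

Lines containing two exclamation marks, with any amount of text between them:

gawk '/!.*!/' testfile

Match a pattern within a field (the !~ operator does the opposite)

gawk '$2 ~/ha/' testfile

Combined with a conditional statement:

awk '{ if ($2 ~ /halo/) print }' testfile

Print the full names of all users whose login shell is not bash

gawk 'BEGIN { FS=":"} $7 !~/bash/ {print $5}' /etc/passwd

Filter on multiple conditions joined by Boolean operators:

gawk '/ha/ && /lo/ && /l!o/ {print}' testfile

Print the records in the sample data that do not contain halo or that do contain hi, and change the output field separator to a hyphen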

awk 'BEGIN { OFS="-" } !/halo/ || /hi/ { print $1,$2}' testfile
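Field matching, Boolean operators and OFS can be tried together on passwd-style data. This is a sketch: the three accounts below are invented, and a temporary file is used in place of /etc/passwd.

```shell
# Hypothetical passwd-style sample (name:pw:uid:gid:full name:home:shell).
users=$(mktemp)
printf '%s\n' \
    'alice:x:1000:1000:Alice A:/home/alice:/bin/bash' \
    'svc:x:998:998:Service Account:/var/lib/svc:/sbin/nologin' \
    'bob:x:1001:1001:Bob B:/home/bob:/bin/bash' > "$users"

# Full names of users whose shell field does not match bash.
non_bash=$(awk 'BEGIN { FS = ":" } $7 !~ /bash/ { print $5 }' "$users")

# Records without "bash" OR containing "alice"; OFS joins the printed fields.
pairs=$(awk 'BEGIN { FS = ":"; OFS = "-" } !/bash/ || /alice/ { print $1, $3 }' "$users")

echo "$non_bash"
echo "$pairs"
rm -f "$users"
```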

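Range patterns
Placing a comma between two patterns specifies a range, which matches all text between the two patterns as well as the records matching the patterns themselves. Unlike other searches, a range pattern can span multiple records. It prints every complete record that contains part of the match.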

gawk '/halo/,/hi/' testfile
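A range pattern is easiest to understand on a short input with obvious markers. In this sketch the START and END markers are invented for illustration:

```shell
# Everything from the first record matching START through the
# next record matching END is printed, both ends included.
ranged=$(printf '%s\n' 'skip' 'START here' 'middle' 'END here' 'skip again' |
    awk '/START/,/END/')
printf '%s\n' "$ranged"
```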

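Print whole matching paragraphs
If you use a newline as the field separator and the empty string as the record separator, AWK treats each whole paragraph as a single record. This turns it into a "paragraph grep" that prints every paragraph matching the search. For files that contain a lot of text, this can be very valuable.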

awk 'BEGIN { FS="\n"; RS="" } /first/' testfile
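Paragraph mode can be checked with two short paragraphs separated by a blank line. A sketch with made-up text; it also prints NF to show that each line of the paragraph becomes one field.

```shell
# RS="" makes blank lines the record separator; FS="\n" makes each line a field.
para=$(printf 'first para line 1\nfirst para line 2\n\nsecond para line 1\nsecond para line 2\n' |
    awk 'BEGIN { FS = "\n"; RS = "" } /second/ { print; print "fields=" NF }')
printf '%s\n' "$para"
```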

Print the lines that are longer than 10 characters:

awk 'length($0) > 10' testfile

Print the length of the longest record

gawk '{if (length($0)>max) max=length($0)} END {print max}' testfile

Print every line that has at least one field:

awk 'NF > 0' data

Sum the sizes (in bytes) of the files in the current directory:

ls -l ./ | awk '{ x += $5 }  END { print "total bytes: " x }'

Print a sorted list of the login names of all users:

awk -F: '{ print $1 }' /etc/passwd | sort

Print the even-numbered lines of a file:

awk 'NR % 2 == 0' testfile

Sum the sizes of the files last modified in a given month

ls -l | awk '$6 == "Aug" { sum += $5 } END { print sum }'

Apply different actions to different patterns:

gawk '/the/ {print $3}; /halo/ {print $2}' testfile

Process multiple input sources (the - operand stands for standard input):

cat anaconda-ks.cfg | gawk '{print $2}' testfile - gawk.awk
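Mixing a file operand with standard input can be demonstrated with one temporary file and a pipe. A minimal sketch; the file and its content are made up.

```shell
# One input comes from a file, the other from the pipe via "-".
f=$(mktemp)
printf 'file line\n' > "$f"

# Inputs are read in operand order: the file first, then stdin.
combined=$(printf 'stdin line\n' | awk '{ print $1 }' "$f" -)
printf '%s\n' "$combined"

rm -f "$f"
```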

Including one script in another (gawk)

@include "scriptx"

Replace a run of identical characters with a single string (sub() replaces the first instance of any text matched by the first argument with the string provided as the second argument):

echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
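Since sub() only touches the first match, it helps to compare it with gsub(), the standard awk function that replaces every match. A sketch on a made-up string with two runs of the letter a:

```shell
# sub() replaces only the first run of a's.
first_only=$(echo 'aaaabcdaa' | awk '{ sub(/a+/, "<A>"); print }')

# gsub() replaces every run.
every=$(echo 'aaaabcdaa' | awk '{ gsub(/a+/, "<A>"); print }')

echo "$first_only"
echo "$every"
```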

Two ways to change the record separator:

1. awk 'BEGIN { RS = "u" } { print $0 }' testfile
2. awk '{ print $0 }' RS="u" testfile
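Both ways of setting RS should split the input identically. A sketch that checks this on a made-up one-line file in which the letter u separates three records:

```shell
# The sample has no trailing newline, so the last record is exactly "cc".
data=$(mktemp)
printf 'aaubbucc' > "$data"

via_begin=$(awk 'BEGIN { RS = "u" } { print NR ": " $0 }' "$data")
via_arg=$(awk '{ print NR ": " $0 }' RS="u" "$data")

printf '%s\n' "$via_begin"
rm -f "$data"
```

The command-line assignment form takes effect when awk reaches that operand, so it has to appear before the file it should apply to.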

Remove text between /* and */, inclusive

{
    if ((i = index($0, "/*")) != 0) {
        out = substr($0, 1, i - 1)  # leading part of the string
        rest = substr($0, i + 2)    # ... */ ...
        j = index(rest, "*/")       # is */ in trailing part?
        if (j > 0) {
            rest = substr(rest, j + 2)  # remove comment
        } else {
            while (j == 0) {
                # get more text
                if (getline <= 0) {
                    print("unexpected EOF or error:", ERRNO) > "/dev/stderr"
                    exit
                }
                # build up the line using string concatenation
                rest = rest $0
                j = index(rest, "*/")   # is */ in trailing part?
                if (j != 0) {
                    rest = substr(rest, j + 2)
                    break
                }
            }
        }
        # build up the output line using string concatenation
        $0 = out rest
    }
    print $0
}

Use getline to read the next input line into a variable (as written, this prints every second line):

gawk 'BEGIN{}{if ((getline tmp) > 0){print tmp}}' testfile
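What getline var consumes becomes visible when the current record and the variable are printed side by side. A sketch on numbered input from seq:

```shell
# For each record, getline reads the FOLLOWING line into tmp,
# so the main loop only ever sees lines 1, 3, 5.
pairs=$(seq 1 6 | awk '{ if ((getline tmp) > 0) print $0 "->" tmp }')
printf '%s\n' "$pairs"
```

With an odd number of input lines the last line would have no partner, getline would return 0, and nothing would be printed for it.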

system(command)
Execute the operating system command command and then return to the awk program.
Send mail to root after the gawk program has finished:

END {
    system("date | mail -s 'gawk run done' root")
}

Print system user information, separated by tabs

awk -F: -v OFS="\t" '{if($3<=999)printf "Sys user:\t%-15s ID is :%d\n", $1,$3;else{printf "Common user:\t%-15s ID is :%d\n",$1,$3}}' /etc/passwd

Sys user: root ID is :0
Sys user: bin ID is :1
Sys user: daemon ID is :2
Sys user: adm ID is :3
Sys user: lp ID is :4

next
Purpose: stop processing the current line early and continue with the next line

awk -F: '{if($3%2==0) next;print $1,$3}' /etc/passwd
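The effect of next can be checked on a few passwd-like records. A sketch with an invented three-field sample instead of /etc/passwd:

```shell
# Records with an even third field are skipped by next;
# only the odd ones reach the print statement.
odd_ids=$(printf '%s\n' 'root:x:0' 'bin:x:1' 'daemon:x:2' 'adm:x:3' |
    awk -F: '{ if ($3 % 2 == 0) next; print $1, $3 }')
printf '%s\n' "$odd_ids"
```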

To iterate over every element of an array, use a for loop

for (var in array) { statement1; ... }

Syntax for defining an array

awk 'BEGIN{weekdays["mon"]="Monday";weekdays["tue"]="Tuesday";print weekdays["mon"]}'

Count the number of requests per IP address in a web server access log

awk '{ip[$1]++} END {for (i in ip) {print i,ip[i]}}' /var/log/httpd/access_log

Count how many times each field value occurs in a file:

awk '{for(i=1;i<=NF;i++){count[$i]++}}END{for(i in count) {print i,count[i]}}' testfile
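The counting idiom can be verified on a two-line input. A sketch with made-up words; the output is piped through sort because the order in which for (key in array) visits keys is not specified.

```shell
# Count every field value across all records, then sort for stable output.
counts=$(printf 'a b a\nc a b\n' |
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' |
    sort)
printf '%s\n' "$counts"
```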
