Common shell scripts for log statistics

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 22:08

egrep "2017:15:" access.log | awk '{print $6}'| sort | uniq -c | sort -rn | head
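sort -u deduplicates the input, so every line is unique
uniq removes only adjacent duplicate lines
sort + uniq -c is a golden pair
sort -n sorts numerically; this is very important!
See http://man.linuxde.net/uniq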

See http://man.linuxde.net/sort
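
A minimal sketch of why the notes above matter. The sample values are invented for illustration and do not come from a real log. It shows that uniq on unsorted input leaves non-adjacent duplicates in place, and that sort without -n orders numbers as strings.

```shell
# Invented sample data: one IP per line, deliberately unsorted.
sample='10.0.0.2
10.0.0.1
10.0.0.2
10.0.0.2'

# uniq alone merges only adjacent duplicates, so 10.0.0.2 survives twice.
uniq_only=$(printf '%s\n' "$sample" | uniq | wc -l | tr -d ' ')

# sort first, count with uniq -c, then order by count, highest first.
top=$(printf '%s\n' "$sample" | sort | uniq -c | sort -rn | head -1 | awk '{print $1, $2}')

# The same numbers ordered as strings and as integers.
lexical=$(printf '%s\n' 9 10 100 | sort | paste -sd' ' -)
numeric=$(printf '%s\n' 9 10 100 | sort -n | paste -sd' ' -)

echo "lines after uniq only: $uniq_only"
echo "most frequent: $top"
echo "sort:    $lexical"
echo "sort -n: $numeric"
```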

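Total traffic per domain over a time window, or a rough traffic-drop analysis. For a precise traffic-drop analysis, see the other article that uses a database.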

egrep "2017:14:" access-9011.log | awk '{print $7, $11}' | awk '{a[$1]+=$2;} END{for(i in a)print i,a[i];}'


egrep "2017:15:" access.log | awk '($6 == "112.64.68.252") {print $6, $11}' | awk '{a[$1] += $2;} END{ for(i in a) print i,a[i];}' | sort -k2nr | head -20
a[] is a map-like container (an awk associative array)
sort -k2 sorts by the second column
See https://www.cnblogs.com/51linux/archive/2012/05/23/2515299.html
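
The per-key summation can be tried on a few invented lines before running it against a real log. In this sketch the domain names and byte counts are made up, and the column positions are simplified to $1 and $2.

```shell
# Invented sample: column 1 is a domain, column 2 is a byte count.
traffic=$(printf '%s\n' \
  'a.example.com 100' \
  'b.example.com 700' \
  'a.example.com 250' |
  awk '{ bytes[$1] += $2 }                        # accumulate per domain
       END { for (d in bytes) print d, bytes[d] } # iteration order is unspecified
      ' |
  sort -k2,2nr)                                   # so sort by the total, descending

echo "$traffic"
```

Because `for (d in bytes)` visits keys in no particular order, the final sort is what makes the output stable.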

Count the requests to a domain within a time window

awk '$1 >= 1445429880 && $1 <= 1445430000 && $7 ~ /\/\/dup\.baidustatic\.com/' access.log | wc -l

~ means regular-expression match, e.g. awk '$0 ~ /.*/ {print}' test.txt
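
One thing worth checking with time-window filters: a comparison only filters lines when it is used as an awk pattern (or inside an if). Written as a bare statement inside the action block, it is evaluated and its result is thrown away. A small sketch with invented timestamps and hostnames:

```shell
# Invented sample lines: epoch-like timestamp in $1, URL in $2.
log='100 http://dup.example.com/a.js
200 http://dup.example.com/b.js
300 http://other.example.com/c.js
900 http://dup.example.com/d.js'

# Comparison as a statement: it has no effect, so only the regex filters.
no_window=$(printf '%s\n' "$log" |
  awk '{ $1 >= 150 && $1 <= 400; if ($2 ~ /\/\/dup\.example\.com/) print }' |
  wc -l | tr -d ' ')

# Comparison as part of the pattern: both the window and the regex apply.
with_window=$(printf '%s\n' "$log" |
  awk '$1 >= 150 && $1 <= 400 && $2 ~ /\/\/dup\.example\.com/' |
  wc -l | tr -d ' ')

echo "statement form: $no_window lines"
echo "pattern form:   $with_window lines"
```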

Watch for error status codes in the log

tail -f access.log | awk '{if($3 ~/(4|5)../) print $0}'
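
If the status field holds nothing but the numeric code (an assumption about the log format), an anchored pattern is a little stricter than an unanchored one. A sketch with invented field values:

```shell
# Invented field values; 1450 is not a status code but contains "450".
codes='200
404
502
1450'

# Unanchored: matches any value containing 4 or 5 followed by two characters.
loose=$(printf '%s\n' "$codes" | awk '$1 ~ /(4|5)../' | paste -sd' ' -)

# Anchored: matches only three-digit values starting with 4 or 5.
strict=$(printf '%s\n' "$codes" | awk '$1 ~ /^[45][0-9][0-9]$/' | paste -sd' ' -)

echo "unanchored: $loose"
echo "anchored:   $strict"
```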


Request times for a specific domain

cat access.log | awk '{if($7 ~/\/\/img.baidu.com/) print $2}'|sort | uniq -c | sort -nr

Status code counts for a specific domain

cat access.log | awk '$7~/img.baidu.com/ {a[$3]++} END{for(i in a) printf("%s %d\n", i, a[i])}' | sort

Filtering packet-capture results for analysis

egrep -v "(ali|dl|download|cname|taobao|tmall|ssl|https|api|login|denglu|logout|push|upload|https|ntp|timezone|pass|xunlei|pay|\:|update|akadns.net|money|ptlogin|(2[0-4][0-9]|25[0-5]|1[0-9][0-9]|[1-9]?[0-9])(\.(2[0-4][0-9]|25[0-5]|1[0-9][0-9]|[1-9]?[0-9])){3}|register|account|weibo|log|search|weather|reg|conf)" top20.txt | egrep "(img|static|pic|image)"> aaaaaaaaaa

Count requests per domain

cat access.log | awk '$1>= 1511107202 && $1<1511181423 && $11~"TCP_MISS" {print $7}' | awk -F "/" '{print $3}' | sort | uniq -c | sort -k1nr | head -10

awk '{$2=""; print $0}' hb_ip.txt  # delete the second column (the field is emptied)

cat xx.txt | sed -e '/^$/d'  # remove empty lines


Count the total number of lines. The second form is the safer one, since it initializes the counter in BEGIN.

cat ip_file | awk '{line_num++;} END{printf("the sum of line num = %d\n",line_num);}'
cat ip_file | awk 'BEGIN {line_num=0;} {line_num++;} END{printf("the sum of line num = %d\n",line_num);}'

Compute a total size with an awk conditional

 ls -l | awk 'BEGIN{sum_size=0;} {if($5!=4096) sum_size+=$5;} END{printf("sum of = %dM\n",sum_size/1024/1024);}'

awk array operations. In this example everything must sit inside a single {} block.

awk 'BEGIN{info="it is a test";lens=split(info,tA," ");print length(tA),lens;}'
awk 'BEGIN{str="it is a test"; lens=split(str,tA," "); print tA[3]}'

Traffic surge

cat access.log | awk '{if ($8 == "GET" && $1 >= 1511107202 && $1<= 1511169624) print $9, $11 }' | awk '{split($1, s, "/")} {a[s[5]]+=$2;} END{for(i in a)print i, a[i];}'

Find which resource type is requested most, e.g. 1.mp4?wd=linux&length=1024

awk '{split($7, arr_uri, "?"); num = split(arr_uri[1],suffix,"."); print suffix[num];}' icr_access.log | sort | uniq -c | sort -nr | head -20
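
The two split() calls can be checked on a handful of invented URIs. In this sketch the URI is in $1 rather than $7, and the paths are made up. The first split drops the query string, and the second takes the last dot-separated piece as the extension.

```shell
# Invented sample URIs, one per line.
uris='/v/1.mp4?wd=linux&length=1024
/img/a.jpg
/v/2.mp4
/img/b.jpg?x=1
/v/3.mp4'

top_type=$(printf '%s\n' "$uris" |
  awk '{
         split($1, parts, "?")          # parts[1] is the path without the query
         n = split(parts[1], seg, ".")  # split() returns the number of pieces
         print seg[n]                   # the last piece is the extension
       }' |
  sort | uniq -c | sort -nr | head -1 | awk '{print $2, $1}')

echo "$top_type"
```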


icr analysis methods:

Number of interceptions for an IP within a time window

cd /var/log/icrskice/
zcat icr_access.log.gz | grep '\[20171129.1437' | grep -cF '203.187.160.131'
zcat icr_access.log.gz | grep '\[20171129.1450' | grep -F '203.187.160.131' | grep -c iqiyi


Total traffic for an IP range

egrep "10\.17\." access-9011.log | awk '{print $7, $11}' | awk '{a[$1]+=$2;} END{for(i in a)print i,a[i], sum_size+=a[i];}'
zcat access-9011.log.gz | awk '$1>=1511830800&&$1<=1511859600{print}' | awk '$6 ~ /^10\.17\./' | awk '{sum+=$11} END {print"Sum = ",sum}'
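
When matching an IP prefix with a regex, an unescaped dot matches any character and an unanchored pattern can match in the middle of an address. A sketch with invented addresses and byte counts (the IP is in $1 and the size in $2 here):

```shell
# Invented sample: client IP and bytes transferred.
log='10.17.3.4 500
210.175.0.1 300
10.17.9.9 200'

# Unanchored, unescaped: also matches 210.175.0.1, because "10.17" occurs inside it.
loose=$(printf '%s\n' "$log" | awk '$1 ~ /10.17/ { s += $2 } END { print s + 0 }')

# Anchored and escaped: only addresses that start with 10.17.
strict=$(printf '%s\n' "$log" | awk '$1 ~ /^10\.17\./ { s += $2 } END { print s + 0 }')

echo "loose sum:  $loose"
echo "strict sum: $strict"
```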






