Linux Shell: Extracting a Specified Field from Logs with a Regular Structure


Today I was handed a task: from logs that follow a fairly regular format, extract the a field and deduplicate its values. The job mainly came down to awk, so I am recording it here.


"112.65.201.58" - "-" - "[28/Feb/2017:00:08:21 +0800]" - "GET /track_proxy?tid=dc-811&cid=148820998091312764&dr=https%3A%2F%2Funitradeprod.alipay.com%2Facq%2FcashierReturn.htm%3Fsign%3DK1iSL1gljThca54X9aqL9TtzAbX82IDE0IXFEUvH7LSmdw06OpwU9sKt74VQ8Q%25253D%25253D%26outTradeNo%3DOBS000036105%26pid%3D2088121814027143%26type%3D1&sr=1920*1080&vp=1730*863&de=UTF-8&sd=24-bit&ul=zh-cn&je=0&fl=24.0%20r0&t=pulse&ni=1&dl=https%3A%2F%2Fwww.ikea-sh.cn%2Fcheckout%2Fmultipage%2Fsuccess%2F&dt=%E7%BB%93%E7%AE%97%E6%88%90%E5%8A%9F&ub=0-0-0-0-0-0-0-0&z=621680108 HTTP/1.1" - "-" - "200" - "43" - "zh-CN,zh;q=0.8" - "https://www.ikea-sh.cn/checkout/multipage/success/" - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36" - "-" -"a=cNuYc0APJ027; tsc=3_5891d9c8_5891d9c8_0_1"


"180.161.162.72" - "-" - "[28/Feb/2017:00:08:24 +0800]" - "GET /track_proxy?tid=dc-811&cid=148801148530168382&dr=https%3A%2F%2Funitradeprod.alipay.com%2Facq%2FcashierReturn.htm%3Fsign%3DK1iSL1gljThca54X9aqL9TtzAbX82IDE0IXFEuBQR0W2GmKy97vlJebyYape0w%25253D%25253D%26outTradeNo%3DOBS000036106%26pid%3D2088121814027143%26type%3D1&sr=1440*900&vp=1307*760&de=UTF-8&sd=24-bit&ul=zh-cn&je=1&fl=24.0%20r0&t=pageview&ni=0&dl=https%3A%2F%2Fwww.ikea-sh.cn%2Fcheckout%2Fmultipage%2Fsuccess%2F&dt=%E7%BB%93%E7%AE%97%E6%88%90%E5%8A%9F&ub=0-0-0-0-0-0-0-0&z=1103637723 HTTP/1.1" - "-" - "200" - "43" - "zh-cn" - "https://www.ikea-sh.cn/ch
eckout/multipage/success/" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8" - "-" - "__ipdx=180.161.176.176; exptime=1488949463; geocode=1156310000;a=7huFc0QM6Ox5;sm=ts:1488210369,dm:www.ikea.com,ca:2037034,sp:74iZL; tsc=3_584cc619_58b449c1_28_31; syn=1_aa5c6481_58b29351_58b29351_1"



Below are the commands I wrote; the main tools are awk together with sort fileName | uniq.


Extract the value that follows the a field, batch-processing multiple files:

#!/bin/bash
for (( i=1; i<=28; i++ )); do
    if (( $i < 10 )); then
        zcat collect.cn.ms.com_2017020$i.log.gz | grep "id=dc-811" | grep "https://www.ikea-sh.cn/checkout/multipage/success/" >> /data/mission/site_811_2017020$i
        echo "done grep 2017020$i"
    else
        zcat collect.cn.ms.com_201702$i.log.gz | grep "id=dc-811" | grep "https://www.ikea-sh.cn/checkout/multipage/success/" >> /data/mission/site_811_201702$i
        echo "done grep 201702$i"
    fi
done
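A shorter variant of the same loop (a sketch, not the original script): printf's %02d zero-pads the day number, so the separate under-10 and 10-and-over branches collapse into a single command. The file names and the /data/mission target directory are taken from the script above; the existence check is an addition so missing days are skipped rather than erroring.

```shell
#!/bin/bash
# Zero-pad the day with printf so one code path covers days 1-28.
log_date() {
    # e.g. log_date 5 -> 20170205
    printf '201702%02d' "$1"
}

for (( i = 1; i <= 28; i++ )); do
    day=$(log_date "$i")
    src="collect.cn.ms.com_${day}.log.gz"
    [ -f "$src" ] || continue   # skip days whose log file is absent
    zcat "$src" | grep "id=dc-811" \
        | grep "https://www.ikea-sh.cn/checkout/multipage/success/" \
        >> "/data/mission/site_811_${day}"
    echo "done grep ${day}"
done
```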




Merge the resulting files and deduplicate:

#!/bin/bash
sourceFile=(`ls site*`)
for (( i=0; i<${#sourceFile[@]}; i++ )); do
    cat ${sourceFile[$i]} | awk -F '" - "' '{print $12}' | awk -F " a=" '{print $2}' | awk -F "; " '{print $1}' | awk -F '"' '{print $1}' >> all
done
sort all | uniq >> uniq_id
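A compact alternative (a sketch, not the original pipeline): the quoting around the cookie field is not perfectly uniform in the sample lines above (note the -"a= with no space in the first one), so instead of splitting on '" - "' this matches the a= cookie directly with a regex and deduplicates in-process, with no intermediate "all" file.

```shell
#!/bin/bash
# Extract each line's a= cookie value with one awk call and print
# every distinct value exactly once, in order of first appearance.
extract_a_ids() {
    awk '
        match($0, /[;"]a=[^;" ]+/) {
            id = substr($0, RSTART + 3, RLENGTH - 3)      # strip the [;"]a= prefix
            if (!(id in seen)) { seen[id] = 1; print id } # first occurrence only
        }'
}

# Hypothetical usage over the per-day files produced by the grep step:
#   cat /data/mission/site_811_* | extract_a_ids > uniq_id
```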





