Collecting Nginx Logs with ELK Using grok Regular Expressions (Part 2)


1. Nginx Log Example

Nginx's default log format:

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';

Sample Nginx log entries:

127.0.0.1 - - [22/Aug/2016:18:04:28 +0800] "GET / HTTP/1.1" 200 161831 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
127.0.0.1 - - [23/Aug/2016:14:51:48 +0800] "GET / HTTP/1.1" 200 161831 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
127.0.0.1 - - [23/Aug/2016:14:52:17 +0800] "GET / HTTP/1.1" 200 161824 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
127.0.0.1 - - [23/Aug/2016:14:52:31 +0800] "GET / HTTP/1.1" 200 161824 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
127.0.0.1 - - [23/Aug/2016:14:52:53 +0800] "GET / HTTP/1.1" 200 161824 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
127.0.0.1 - - [23/Aug/2016:14:52:54 +0800] "GET / HTTP/1.1" 200 161831 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
127.0.0.1 - - [23/Aug/2016:18:04:13 +0800] "GET / HTTP/1.1" 200 161824 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"

2. Matching with a grok Regular Expression

Let's start by writing a regular expression that matches the following single log entry.
Note: grok expressions can be tested online with the Grok Debugger at http://grokdebug.herokuapp.com/
127.0.0.1 - - [22/Aug/2016:18:04:28 +0800] "GET / HTTP/1.1" 200 161831 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
The regular expression (note that remote_user is matched as a run of non-whitespace characters or a dash):
^(?<remote_addr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (?<remote_user>(\S+)|-) \[(?<time_local>.*)\] "(?<request>[^"]*)" (?<status>\d+) (?<body_bytes_sent>\d+) "(?<http_referer>[^"]*)" "(?<http_user_agent>[^"]*)"
The match result:
{  "remote_addr": [    "127.0.0.1"  ],  "remote_user": [    "-"  ],  "time_local": [    "22/Aug/2016:18:04:28 +0800"  ],  "request": [    "GET / HTTP/1.1"  ],  "status": [    "200"  ],  "body_bytes_sent": [    "161831"  ],  "http_referer": [    "-"  ],  "http_user_agent": [    "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"  ]}

3. Extracting Only the Fields We Need

The match result above contains more than we actually want, so we need to narrow it down to the useful information:


Field               Description
remote_addr         source IP of the request
time_local          request time
request_method      HTTP request method
request_url         request URL
status              HTTP response code

We revise the regular expression accordingly:
^(?<remote_addr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - ((\S+)|-) \[(?<time_local>.*)\] "(?<request_method>POST|GET) (?<request_url>[^ ]+) ([^"]*)" (?<status>\d+)
The match result:
{  "remote_addr": [    "127.0.0.1"  ],  "time_local": [    "22/Aug/2016:18:04:28 +0800"  ],  "request_method": [    "GET"  ],  "request_url": [    "/"  ],  "status": [    "200"  ]}

4. Further Refinement

With the revised regular expression we now get the data we want, but the types are not quite right: time_local should be converted to a date, and status should be converted to a number.
This is where the date and mutate filter plugins come in:
input {
    file {
        path => "/usr/local/nginx_1.10.0/logs/access.log"
        type => "nginx_log"
    }
}
filter {
    grok {
        match => { "message" => '^(?<remote_addr>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - ((\S+)|-) \[(?<time_local>.*)\] "(?<request_method>POST|GET) (?<request_url>[^ ]+) ([^"]*)" (?<status>\d+)' }
    }
    date {
        # the month abbreviations ("Aug") are English, so parse with the English locale
        match => ["time_local", "dd/MMM/yyyy:HH:mm:ss Z"]
        locale => "en"
    }
    mutate {
        convert => ["status", "integer"]
    }
}
output {
    elasticsearch {
        hosts => "localhost"
    }
}
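While developing the pipeline it can help to also print events to the console, so the effect of the date and mutate filters is visible before the data reaches Elasticsearch. One possible tweak to the output section (not part of the original configuration) is:

output {
    elasticsearch {
        hosts => "localhost"
    }
    # temporary console output for debugging; remove once the pipeline looks right
    stdout { codec => rubydebug }
}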
Finally, use Kibana to query the data:


5. Closing Notes: Performance

Regular expressions can be costly, especially when applied to a large volume of Nginx access log records. An alternative is to make the Nginx log format accommodate Logstash instead: for example, have Nginx emit each log entry directly as JSON and parse it with the json filter, or separate the fields with a special delimiter and use split to break them apart. A sketch of the JSON approach follows.
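For illustration, assuming Nginx has been configured with a log_format that writes one JSON object per line (the Nginx side is not covered in this article), the grok block in the filter section could be replaced with Logstash's json filter; the date and mutate steps from section 4 would stay the same:

filter {
    # parse the whole line as JSON instead of running a grok regex;
    # assumes the nginx log_format already produces one JSON object per line
    json {
        source => "message"
    }
}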
