[运维]ELK实现日志监控告警

来源:互联网 发布:华道数据招聘 编辑:程序博客网 时间:2024/05/23 19:10

ELK(Elasticsearch+LogStash+Kibana),最近使用ELK处理了一些平台日志,下面以「mysql连接数监控」记录部署流程。

背景

平台缺失针对mysql连接数的告警,一旦mysql连接数打满,将直接影响平台的使用。另外,对于日志信息既没有可视化界面进行操作,也没有一套有效的实时监控策略。

收益
1. 当异常触发时能够及时通过短信、邮件等方式通知相关负责人员
2. 建立日志可视化界面,使得日志分析更加便捷

1. 软件版本

软件 版本 Logstash v2.3.4 Filebeat v1.3.1 ElasticSearch v2.3.3 Kibana v4.5.1 ElastAlert v0.1.4

2. 解决方案

2.1. 监控架构图

mysql_connection_monitor

2.2. mysql连接数查询

mysql的连接通常是一个请求占用一个连接,如果该请求(insert,delete,update,select)长时间没有执行完毕,则会造成连接的堆积,迅速地消耗完数据库的连接数,目前ph平台线上数据库的最大连接数是1000个。

这里使用一个shell脚本来持续监控mysql连接数情况,每分钟查询一次mysql的连接数,并写入到日志文件

日志样例参考:mysql连接数日志样例
shell脚本: mysql连接数查询脚本
轮询机制: crontab任务,每分钟轮询一次

# query mysql connection* * * * * /bin/sh /home/disk5/query_mysql_connection_log.sh > /dev/null

mysql连接数日志样例

2017-01-20 00:01:01 machine_0001=42017-01-20 00:01:01 machine_0002=562017-01-20 00:01:01 machine_0003=132017-01-20 00:01:01 machine_0004=872017-01-20 00:01:01 total_connection_number=160==========

2.3. FileBeat配置

FileBeat配置文件请参见:附录-filebeat配置文件

FileBeat负责监控mysql连接数查询产生的log(参考mysql连接数日志样例),并将不以===开头的内容上报到LogStash

配置信息

配置项 配置值 是否合并多行 No 轮询时间间隔 120s 文档类型 mysql_connection_log 监控路径 /home/disk5/logs/mysql_connection_* 筛选规则 不以===开头的log

2.4. LogStash配置

LogStash配置文件请参见:附录-logstash配置文件

正则匹配使用grok debug工具进行调试(grok debug)

描述

收集FileBeat发送过来的log信息,获取日志时间和错误信息

输入

2017-01-20 10:18:01 machine_0001=62

正则匹配

%{TIMESTAMP_ISO8601:time}\s+%{USER:machine}=%{NUMBER:connection_num}其中:TIMESTAMP_ISO8601、USER、NUMBER是LogStash的grok pattern变量

将得到:
time字段:日志时间
machine字段:机器host
connection_num字段:机器持有mysql的连接数

输出

time = 2017-01-20 10:18:01machine = machine_0001connection_num = 62

3.5. Elasticsearch配置

Elasticsearch模板请参见:附录-elasticsearch模板

模板名称:template_mysql_connection_log
ES索引(index): mysql-connection-log-%{+YYYY.MM.dd}
ES类型(type): mysql_connection_log

字段信息

字段名 字段类型 备注 message string 原始log信息 tags string @timestamp date log产生时间 host string count long source string input_type string type string offset long @version string machine string 机器host connection_num long 机器持有mysql的连接数

3.6. Kibana查看各机器连接数趋势

趋势图
ph_mysql_connection_line_chart_sample

3.7. ElastAlert配置

ElastAlert配置文件请参见:附录-elastalert配置文件

每10秒轮询Elasticsearch的mysql-connection-log-*索引,若在10分钟内mysql总连接数超过750个的次数超过2次,则向相关人员发送告警短信

附录

mysql连接数查询脚本

#!/bin/bashsource /etc/profile# Title:    Online Query Mysql Connection# Author:   ouyangyewei## Create:   ouyangyewei, 2017/01/18# Update:   ouyangyewei, 2017/01/19, add total_connection_numberFID=`readlink -f $0 | md5sum | awk '{print $1}'`LOG_FILE=/home/disk5/logs/mysql_connection_$(date +"%Y-%m-%d").log# ----------------------------------------------function get_process_list() {  mysql -uroot \  -pxxx \  -hxxx \  -P3306 \  -e 'show processlist' \  --silent \  --skip-column-names | awk '    {      if ($3=="user" && $4!="NULL") {        split($4, machine, ":");        print machine[1];      }      if ($3!="user" && $4!="user"){        split($3, machine, ":");        print machine[1];      }    }' | sort | uniq -c > /tmp/$FID}function run() {  # get current mysql connection status  get_process_list;  TIMESTAMP=`date +"%F %T"`  if [[ -f /tmp/$FID ]]; then    sum=0    while read line    do      machine=`echo $line | awk '{print $2}'`      connect_number=`echo $line | awk '{print $1}'`      sum=$(($sum+$connect_number))      echo "$TIMESTAMP $machine=$connect_number" >> $LOG_FILE    done < /tmp/$FID    echo "$TIMESTAMP total_connection_number=$sum" >> $LOG_FILE    echo "---------------------------------------" >> $LOG_FILE    # remove tmp file    rm -rf /tmp/$FID  fi}# ----------------------------------------------# staruprun

FileBeat配置文件

filebeat:  prospectors:    -      paths:        - /home/disk5/logs/mysql_connection_*      input_type: log      document_type: mysql_connection_log      ignore_older: 84h      scan_frequency: 120s      exclude_lines: ["^==="]output:  logstash:    hosts: ["xxx:8044"]logging:  level: debug  to_files: true  to_syslog: false  files:    path: /var/log/mybeat    name: mybeat.log    keepfiles: 7

LogStash配置文件

input {  beats {    port => 8044  }}filter {  if [type] == "mysql_connection_log" {    grok {      patterns_dir => ["/conf/patterns"]      match => {        "message" => "%{TIMESTAMP_ISO8601:time}\s+%{USER:machine}=%{NUMBER:connection_num}"      }      remove_field => ["beat"]    }    date {      match => ["time", "yy-MM-dd HH:mm:ss"]      remove_field => ["time"]    }  }}output {  if [type]=="mysql_connection_log" {    elasticsearch {      hosts => ["xxx:8096","xxx:8096"]      index => "mysql-connection-log-%{+YYYY.MM.dd}"    }  }}

Elasticsearch模板

curl -XPUT 'localhost:9200/_template/template_mysql_connection_log?pretty' -d'{  "order": 0,  "template": "mysql-connection-log-*",  "settings": {},  "mappings": {    "palo-log": {      "properties": {        "tags": {          "index": "not_analyzed",          "type": "string"        },        "message": {          "index": "not_analyzed",          "type": "string"        },        "@version": {          "type": "string"        },        "@timestamp": {          "format": "strict_date_optional_time||epoch_millis",          "type": "date"        },        "source": {          "index": "not_analyzed",          "type": "string"        },        "offset": {          "type": "long"        },        "type": {          "index": "not_analyzed",          "type": "string"        },        "input_type": {          "index": "not_analyzed",          "type": "string"        },        "count": {          "type": "long"        },        "host": {          "index": "not_analyzed",          "type": "string"        },        "machine": {          "index": "not_analyzed",          "type": "string"        },        "connection_num": {          "index": "not_analyzed",          "type": "long"        }      }    }  },  "aliases": {}}'

ElastAlert配置文件

# Alert when the rate of events exceeds a threshold# (Optional)# Elasticsearch hostes_host: xx.xx.xx.xx# (Optional)# Elasticsearch portes_port: 8096# (OptionaL) Connect with SSL to Elasticsearch#use_ssl: True# (Optional) basic-auth username and password for Elasticsearch#es_username: someusername#es_password: somepassword# (Required)# Rule name, must be uniquename: MysqlConnectionRule# (Required)# Type of alert.# the frequency rule type alerts when num_events events occur with timeframe timetype: frequency# type: any# (Required)# Index to search, wildcard supportedindex: mysql-connection-log-*# (Required, frequency specific)# Alert when this many documents matching the query occur within a timeframenum_events: 3# (Required, frequency specific)# num_events must occur within this amount of time to trigger an alerttimeframe:    minutes: 10    # hours: 1# (Required)# A list of Elasticsearch filters used for find events# These filters are joined with AND and nested in a filtered query# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.htmlfilter:- range:    connection_num:      from: 750# (Required)# The alert is use when a match is foundalert:- commandcommand: [  "curl",  "-X POST",  "-d",  '{"appId":"xxx", "token":"xxx", "alertList":[{"channel":"sms", "description":"Mysql连接数告警:当前总连接数为%(connection_num)s!", "receiver":"ouyangyew"}]}',  "http://xxx.baidu.com/alert/push"]# (required, email specific)# a list of email addresses to send alerts to# email:# - "elastalert@example.com"
0 0
原创粉丝点击