Zabbix Setup Notes [6]: Log Monitoring, Calculated Items, and Aggregate Checks



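The Zabbix agent can collect and monitor logs on a virtual machine. When a specific string appears in a log, the agent detects it so that it can be acted on.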

 

Log monitoring requires the Zabbix agent to run in active mode, and the zabbix user must have read permission on the log file being monitored.

 

As a simple example, we analyze the secure log on the CentOS 7.3 client. On a server we want to trace which users log in. On CentOS, a user login leaves a record like the following in /var/log/secure:

Aug 8 06:08:06 DanCentOS7 sshd[10623]: Accepted password for daniel from 106.120.78.190 port 54065 ssh2

 

We therefore use the keyword "Accepted password for" as the string to detect, and configure the following item for the CentOS 7.3 host (note that the type selected is Zabbix agent (active)):

[Screenshot: Configuration > Hosts > Items for host CentOS73. Name: loginTrace; Type: Zabbix agent (active); Key: truncated in the capture, ending in "password for]"; Type of information: Log; Applications: DemoApplication]
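To make the matching step concrete, here is a minimal Python sketch of what a keyword-based log item does conceptually: it keeps only the lines that match the pattern. It is not how the agent is implemented. The item key is truncated in the screenshot, so treating it as something like log[/var/log/secure,"Accepted password for"] is an assumption, and the second sample line is a made-up failed login added for contrast.

```python
import re

# Keyword from the article, used here as a regular expression.
PATTERN = re.compile(r"Accepted password for")


def matching_lines(lines, pattern=PATTERN):
    """Return only the lines that contain a match for the pattern."""
    return [line for line in lines if pattern.search(line)]


sample = [
    # Record quoted in the article.
    "Aug 8 06:08:06 DanCentOS7 sshd[10623]: Accepted password for daniel "
    "from 106.120.78.190 port 54065 ssh2",
    # Hypothetical non-matching record, for illustration only.
    "Aug 8 06:09:10 DanCentOS7 sshd[10630]: Failed password for root "
    "from 203.0.113.7 port 40000 ssh2",
]

hits = matching_lines(sample)
print(len(hits))
```

Only the successful login survives the filter, which is the behaviour we want from the loginTrace item.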

 

After that, on the monitored host, give the zabbix user access to /var/log/secure with the command below (for convenience you could also grant access to the whole /var/log directory):

chown zabbix:root /var/log/secure

 

The data collected by the item then shows up under Monitoring > Latest data:

[Screenshot: Monitoring > Latest data for the hosts in group LinuxHosts. Under DemoApplication, CentOS73 lists loginTrace, memAvailable, and memUsed; a second Linux host lists memAvailable and memUsed. Last check: 2017-08-08.]

 

Click History on the right to see the values collected over a period of time:

[Screenshot: History for CentOS73: loginTrace, 2017-08-08 14:19 to 14:20, showing two sshd records of the form "Accepted password for daniel from 106.120.78.190 port ... ssh2"]

 

For Windows virtual machines we can monitor the Windows Event Log. As an example, we configure one item that monitors logon events and one that monitors machine startup:

Logon monitoring:

[Screenshot: item configuration on host Dan2012R2. Name: userLogin; Type: Zabbix agent (active); Type of information: Log; Applications: WinEventDemo]

Key: eventlog[Security,,"Success Audit",,^4624$,,skip]

 

Startup monitoring:

[Screenshot: item configuration on host Dan2012R2. Name: winStart; Type: Zabbix agent (active); Type of information: Log]

Key: eventlog[System,,"Information","Kernel-General",^12$]

One caveat: when the machine reboots, the Zabbix agent restarts as well, so the last parameter of the key above must not be set to skip. With skip, the agent ignores events that were logged before it started, which is likely to include the startup event we are looking for.

 

The result in Latest data:

[Screenshot: Monitoring > Latest data for host Dan2012R2 (group WindowsHosts). WinEventDemo (2 items): userLogin, last value beginning "An account was s..."; winStart, last value beginning "The operating syste...". Last check: 2017-08-08.]

 

With a Calculated item we can compute a more complex metric from existing items. The formula of a Calculated item has the form:

func(<key>|<hostname:key>,<parameter1>,<parameter2>,…)

func can be last, min, max, avg, count, and so on.

 

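Note that every referenced item must already exist and be collecting data; otherwise the calculated item reports an error saying the referenced item cannot be found.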
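The idea behind these functions can be sketched in a few lines of Python. This only imitates the concept (apply a function to the stored values of an item, and fail if the item has no data); it is not Zabbix's implementation. The history dictionary, its sample values, and the evaluate helper are hypothetical, chosen to mirror a disk read/write sum.

```python
from statistics import mean

# Item functions named in the text, applied to a list of collected
# values ordered oldest first.
FUNCS = {
    "last": lambda values: values[-1],
    "min": min,
    "max": max,
    "avg": mean,
    "count": len,
}


def evaluate(func, history, key):
    """Apply func to the stored values of key; fail if there is no data."""
    values = history.get(key)
    if not values:
        raise KeyError(f"item {key!r} not found or has no data")
    return FUNCS[func](values)


# Hypothetical sample values for two existing items.
history = {
    "vfs.dev.read[sda,ops]": [0.5, 1.5, 2.0],
    "vfs.dev.write[sda,ops]": [10.0, 12.0, 8.0],
}

# Equivalent in spirit to a formula that adds the last value of each item.
total = (evaluate("last", history, "vfs.dev.read[sda,ops]")
         + evaluate("last", history, "vfs.dev.write[sda,ops]"))
print(total)
```

Asking for a key that is not in history raises an error, which corresponds to the "item not found" failure described above.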

 

As a simple example, on the CentOS 6.9 virtual machine we compute the total of reads and writes on disk sda.

First create two items, one for reads and one for writes:

[Screenshot: item list with two Zabbix agent items in DemoApplication, both Enabled, Trends 365d. One key is legible as vfs.dev.read[sda,ops]; the other is the corresponding write item.]

Then create a Calculated item that sums these two items:

[Screenshot: item configuration. Name: sdaIO; Type: Calculated; Key: sdaIO; Formula: garbled in the capture, beginning with last(; Type of information: Numeric (float)]

We copy a file on the CentOS 6.9 machine and look at the graph:

[Graph: CentOS69: sdaIO (30m); last 0.7167, max 77.27]

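With this configuration, each point in the graph is the one-minute average of read and write operations per second (for the ops type, the default mode of vfs.dev.read and vfs.dev.write is avg1, the one-minute average). The Zabbix update interval we configured is 30 seconds, so adjacent points are 30 seconds apart.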

 

We can also build our own metrics from the values reported by the iostat command. For example, iostat -dxk shows the read/write activity of each disk:

[CentOS69] # iostat -dxk 1 100

Linux 2.6.32-696.3.2.el6.x86_64 (DanCentOS69)        08/09/2017        _x86_64_        (1 CPU)

 

Device:        rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await svctm %util

sda              0.17   19.14   0.03   0.98    1.19   80.50  161.94    0.35 348.03  75.49 356.05 10.01  1.01

sdb              0.00    1.15   0.00   0.01    0.01  269.62 33679.71    0.00  10.29   0.75  12.67  0.65  0.00

……

(This prints 100 reports. The first report shows averages since boot, so ignore it.)

 

Use the following commands to get the instantaneous read and write values for disk sda:

[CentOS69] # iostat -dxk 1 2 | grep sda | tail -1 | awk '{print $4}'

0.00

[CentOS69] # iostat -dxk 1 2 | grep sda | tail -1 | awk '{print $5}'

3.00
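For readers who want to see what the grep | tail -1 | awk pipeline is doing, the same extraction can be sketched in Python. The first report in SAMPLE reuses the sda row shown earlier; the second report is a hypothetical one, with numbers chosen only so that the result matches the 0.00 and 3.00 outputs above. The function name last_report_field is made up for this illustration.

```python
SAMPLE = """\
Device:  rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda        0.17  19.14 0.03 0.98  1.19 80.50   161.94     0.35 348.03  75.49  356.05 10.01  1.01

Device:  rrqm/s wrqm/s  r/s  w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda        0.00   0.00 0.00 3.00  0.00 12.00     8.00     0.00   0.33    0.00    0.33  0.33  0.10
"""


def last_report_field(output, device, column):
    """Return a whitespace-separated field from the last line mentioning device.

    column is 1-based, like awk's $4 or $5. Matching is by substring,
    like grep, so "sda" would also match a line for "sda1".
    """
    rows = [line.split() for line in output.splitlines() if device in line]
    if not rows:
        raise ValueError(f"device {device!r} not found")
    return float(rows[-1][column - 1])


reads_per_s = last_report_field(SAMPLE, "sda", 4)   # r/s column
writes_per_s = last_report_field(SAMPLE, "sda", 5)  # w/s column
print(reads_per_s, writes_per_s)
```

Taking the last matching line is what skips the since-boot averages in the first report.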

 

Edit the agent's custom parameters and add the following two lines (remember to restart the agent afterwards):

UserParameter=iostat.dev.read[*],iostat -dxk 1 2 | grep $1 | tail -1 | awk '{print $$4}'

UserParameter=iostat.dev.write[*],iostat -dxk 1 2 | grep $1 | tail -1 | awk '{print $$5}'

 

On the Zabbix server, use zabbix_get to check that the custom parameters work:

[ZabbixServer]# zabbix_get -s 172.16.0.6 -k iostat.dev.read[sda]

0.00

[ZabbixServer]# zabbix_get -s 172.16.0.6 -k iostat.dev.write[sda]

3.00

 

Create items based on these two custom parameters:

[Screenshot: item list with two Zabbix agent items in DemoApplication, both Enabled, Trends 365d: iostat.dev.read[sda] and iostat.dev.write[sda]]

Then add another Calculated item:

[Screenshot: item configuration on host CentOS69. Type: Calculated; Key: iostat.dev.rw[sda]; Formula: garbled in the capture, referencing iostat.dev.read[sda] and iostat.dev.write[sda]; Type of information: Numeric (float)]

 

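Note that Timeout in the agent configuration file needs to be raised to 30 (the default is 3 seconds). Because iostat takes two samples one second apart, the check can exceed the default timeout, and the item then reports the following error: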

[ZabbixServer] # zabbix_get -s 172.16.0.6 -k iostat.dev.write[sda]

ZBX_NOTSUPPORTED: Timeout while executing a shell script.
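The effect of a too-short timeout can be reproduced outside Zabbix. The sketch below is only an analogy: it uses Python's subprocess timeout rather than the agent's Timeout setting, and slow_check is a stand-in command that sleeps for about one second, standing in for iostat -dxk 1 2. The helper name run_with_timeout and the 0.2 second limit are made up for the demonstration.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run a command; return its stdout, or None if it exceeds timeout_s."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return done.stdout


# Stand-in for a check that needs about one second before it prints a value.
slow_check = [sys.executable, "-c",
              "import time; time.sleep(1); print('3.00')"]

too_short = run_with_timeout(slow_check, 0.2)   # limit shorter than the check
long_enough = run_with_timeout(slow_check, 30)  # generous limit
print(too_short, long_enough)
```

With the short limit no value is returned at all, which is the situation the error message above reports.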

 

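With that in place, here is the curve of the instantaneous values:

[Graph: CentOS69: iostat.dev.rw[sda] (15m); last 97.89, max 184.15]

As you can see, the points are still 30 seconds apart, because the update interval of the read and write items is 30 seconds.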

 

We can also use aggregate checks to monitor values that are already stored in the database. The item key syntax is:

groupfunc["Host group","Item Key",itemfunc,timeperiod]

groupfunc[["Host group A","Host group B","Host group C",…],"Item Key",itemfunc,timeperiod]

 

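Here groupfunc can be one of: grpavg, grpmax, grpmin, grpsum.

itemfunc can be one of: avg, count, last, max, min, sum.

If itemfunc is last, timeperiod is ignored. timeperiod accepts unit suffixes, for example 10m for 10 minutes or 1h for one hour; without a suffix the value is in seconds.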
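The suffix rule is easy to check with a small Python sketch. The text only mentions m and h; the s, d, and w entries below are an assumption about the other common suffixes, and period_to_seconds is a hypothetical helper, not part of Zabbix.

```python
# Suffix multipliers in seconds. Only these single-letter suffixes are handled.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def period_to_seconds(period):
    """Convert '10m' or '1h' to seconds; a bare number is taken as seconds."""
    text = str(period).strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


for example in ("10m", "1h", "300"):
    print(example, period_to_seconds(example))
```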

 

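Note that, as with Calculated items, an aggregate check requires every item being aggregated to exist; otherwise it also reports an error saying the item cannot be found.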

 

Because an item has to belong to a host or a template, we add a host to hold the aggregate items:

[Screenshot: Configuration > Hosts, new host. Host name: AggregateCollector; Visible name: AggregateCollector; In groups: Aggregate]

 

Next we add an aggregate item that computes the average CPU load across the LinuxHosts group. Before doing so, the two hosts in the LinuxHosts group must each have a system.cpu.load[all,avg1] item configured:

[Screenshot: item list for host CentOS69: iostat.dev.read[sda], iostat.dev.rw[sda], iostat.dev.write[sda], CustomTemplateTest1: memAvailable (vm.memory.size[available]), CustomTemplateTest1: memUsed (vm.memory.size[used]), sdaIO, and the CPU load item]

 

[Screenshot: item list for host CentOS73, interval 30s: loginTrace, CustomTemplateTest1: memAvailable (vm.memory.size[available]), CustomTemplateTest1: memUsed (vm.memory.size[used])]

 

Then add the aggregate item:

[Screenshot: item configuration on host AggregateCollector. Name: LinuxHost.cpu.avg; Type: Zabbix aggregate; Type of information: Numeric (float)]

Key: grpavg["LinuxHosts","system.cpu.load[all,avg1]",last,0]
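What this key computes can be illustrated with a short Python sketch: apply itemfunc to each host's values, then apply groupfunc across the hosts. This is a conceptual model only. The host names match the article, but the load values are invented, the aggregate helper is hypothetical, and the time period is left out because the sample lists already stand for the values inside the window. Raising an error when a host lacks the item follows the article's description of the behaviour.

```python
from statistics import mean

GROUP_FUNCS = {"grpavg": mean, "grpmax": max, "grpmin": min, "grpsum": sum}
ITEM_FUNCS = {
    "avg": mean,
    "count": len,
    "last": lambda values: values[-1],
    "max": max,
    "min": min,
    "sum": sum,
}


def aggregate(groupfunc, hosts, item_key, itemfunc):
    """Reduce each host's values with itemfunc, then combine with groupfunc."""
    per_host = []
    for host, items in hosts.items():
        values = items.get(item_key)
        if not values:
            raise KeyError(f"{host}: item {item_key!r} not found")
        per_host.append(ITEM_FUNCS[itemfunc](values))
    return GROUP_FUNCS[groupfunc](per_host)


# Invented load values for the two hosts in the LinuxHosts group.
linux_hosts = {
    "CentOS69": {"system.cpu.load[all,avg1]": [0.10, 0.50, 1.50]},
    "CentOS73": {"system.cpu.load[all,avg1]": [0.20, 0.40, 0.50]},
}

value = aggregate("grpavg", linux_hosts, "system.cpu.load[all,avg1]", "last")
print(value)
```

With these sample values the last readings are 1.50 and 0.50, so the group average is their mean.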

 

We generate some load on both machines and look at the graph of the aggregated CPU value:

[Graph: AggregateCollector: LinuxHost.load.avg1 (6m 53s), 2017-08-10 14:21 to 14:28; max 2.83]

 

Tip: in a graph you can drag with the mouse to select a time slice of the current graph and zoom in on it:

[Graph: AggregateCollector: LinuxHost.load.avg1 (6m 53s) with a time range selected by dragging; max 2.83]