SNMP Informant How-To

Source: Internet | Editor: 程序博客网 | Date: 2024/05/29 16:43

Reposted from http://www.opennms.org/index.php/SNMP_Informant_How-To
From OpenNMS

People using OpenNMS often wonder how to get SNMP information, such as traps and OIDs for data collection, into OpenNMS.

Recently, I did a fairly thorough examination of the SNMP-Informant standard MIB for a client, so I thought I would share the process on this wiki page.

The first step in adding MIB information to OpenNMS is to find the MIB (grin). For SNMP-Informant, there is a MIBs directory in the folder that comes with the distribution. It contains both version 1 and version 2 MIBs; it really doesn't matter which one we use.

There are two MIBs for the standard SNMP Informant agent: INFORMANT-STD.MIB and WTCS.MIB.

The second step is to determine exactly what you want to get out of the MIB. There are two distinctly different things: traps to convert into events, and OIDs to collect for performance data.

A quick search for TRAP-TYPE and NOTIFICATION-TYPE in these two MIBs shows that neither contains traps, so we can ignore them here. Should you want to get trap information into OpenNMS, you need to use the mib2opennms tool, discussed elsewhere.
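If you have more than a couple of MIBs to check, that search can be scripted. This is a minimal Python sketch (my own helper, not part of OpenNMS); it assumes the MIB files are plain text and that traps are declared in the common `name TRAP-TYPE` / `name NOTIFICATION-TYPE` layout, so treat a zero count as a strong hint rather than proof.

```python
import re
import sys
from pathlib import Path

# A trap definition is a lowercase descriptor followed by the macro name, e.g.
# "diskFull NOTIFICATION-TYPE" (SMIv2) or "diskFull TRAP-TYPE" (SMIv1).
# Requiring the descriptor skips IMPORTS clauses that merely mention the macros.
TRAP_DEF = re.compile(
    r"^\s*([a-z][A-Za-z0-9-]*)\s+(TRAP-TYPE|NOTIFICATION-TYPE)\b", re.MULTILINE
)


def strip_comments(mib_text: str) -> str:
    """Drop ASN.1 '--' comments up to end of line (good enough for a quick scan)."""
    return "\n".join(line.split("--", 1)[0] for line in mib_text.splitlines())


def find_traps(mib_text: str) -> list:
    """Return (descriptor, macro) pairs for every trap defined in the MIB text."""
    return TRAP_DEF.findall(strip_comments(mib_text))


if __name__ == "__main__":
    # e.g. python find_traps.py INFORMANT-STD.MIB WTCS.MIB
    for name in sys.argv[1:]:
        traps = find_traps(Path(name).read_text(errors="replace"))
        print(f"{name}: {len(traps)} trap definition(s)")
        for descriptor, macro in traps:
            print(f"    {descriptor} ({macro})")
```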

For data collection, there is another tool called the mibparser that will convert the OIDs in a MIB to a format that can be placed in the datacollection-config.xml file. There is even a convenient wrapper script to run it:

(Note: This wrapper works with Java 1.4.x ONLY)

$OPENNMS_HOME/contrib/mibparser/dist/parseMib.sh INFORMANT-STD.MIB

This gives me the error:

 ERROR: can't find parent 'informant' for textOid 'standard'
 Find which MIB the parent is defined in and add that to the command line

Since "informant" is defined in the WTCS.MIB file, I need to add that to my command:

$OPENNMS_HOME/contrib/mibparser/dist/parseMib.sh WTCS.MIB INFORMANT-STD.MIB
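When the parser complains about a missing parent, you have to find which file defines it. A minimal Python sketch of that lookup (a hypothetical helper, not shipped with mibparser); the definition patterns it recognises are an assumption about how the MIBs are laid out.

```python
import re
import sys
from pathlib import Path


def defining_mibs(symbol: str, mib_texts: dict) -> list:
    """Return the names of the MIB files whose text assigns an OID to `symbol`.

    Recognises "symbol OBJECT IDENTIFIER ::= ...", "symbol MODULE-IDENTITY",
    "symbol OBJECT-IDENTITY" and "symbol OBJECT-TYPE" at the start of a line.
    """
    definition = re.compile(
        rf"^\s*{re.escape(symbol)}\s+"
        r"(OBJECT\s+IDENTIFIER\s*::=|MODULE-IDENTITY|OBJECT-IDENTITY|OBJECT-TYPE)",
        re.MULTILINE,
    )
    return [name for name, text in mib_texts.items() if definition.search(text)]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # e.g. python find_parent.py informant *.MIB
    symbol, *files = sys.argv[1:]
    texts = {f: Path(f).read_text(errors="replace") for f in files}
    found = defining_mibs(symbol, texts)
    print(", ".join(found) if found else f"'{symbol}' is not defined in these files")
```

Whatever files it reports go on the parseMib.sh command line ahead of the MIB you are parsing.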

This returns a lot of output in a format that can be used in the datacollection-config.xml file.

Rather than post it all here (I'm going to end up posting most of it anyway), I'll break it out section by section later in the document.

Once I have successfully produced output from a MIB, I examine it to see how easy it will be to add to OpenNMS. The things to look for are whether or not the data is in a table, and whether or not the data type is numeric.

This MIB provides five main areas of information: disk, memory, network, processes/threads, and CPU.

Since the memory and process information is not stored in a table, it's really easy to configure, and it is already included in the stock datacollection-config.xml file.

For example, the mibparser output for memory looks like this:

<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.1" instance="0" alias="memoryAvailableBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.2" instance="0" alias="memoryAvailableKBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.3" instance="0" alias="memoryAvailableMBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.4" instance="0" alias="memoryCommittedBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.5" instance="0" alias="memoryCacheBytes" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.6" instance="0" alias="memoryCacheBytesPeakTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.7" instance="0" alias="memoryPageFaultsPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.8" instance="0" alias="memoryPagesInputPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.9" instance="0" alias="memoryPagesOutputPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.10" instance="0" alias="memoryPagesPerSec" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.11" instance="0" alias="memoryPoolNonpagedBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.12" instance="0" alias="memoryPoolPagedBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.13" instance="0" alias="memoryPoolPagedResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.14" instance="0" alias="memorySystemCacheResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.15" instance="0" alias="memorySystemCodeResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.16" instance="0" alias="memorySystemCodeTotalBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.17" instance="0" alias="memorySystemDriverResidentBytesTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.2.18" instance="0" alias="memorySystemDriverTotalBytesTOOLONG" type="Gauge32" />

Note that the instance is numeric ("0"), which means the data is not in a table. Since RRDTool/JRobin can only store numeric data, it also helps that the data type of all of these values is "Gauge32".
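Those two checks (numeric instance, numeric type) are easy to run over the whole parser output at once. A minimal Python sketch, assuming the attribute order mibparser prints; the set of "numeric" SNMP types is my own assumption and may need extending.

```python
import re
import sys
from pathlib import Path

# One <mibObj .../> element, with attributes in the order mibparser prints them.
MIBOBJ = re.compile(
    r'<mibObj\s+oid="([^"]+)"\s+instance="([^"]+)"\s+alias="([^"]+)"\s+type="([^"]+)"\s*/>'
)

# SNMP types that hold a single number and can go straight into an RRD file
# (an assumption; extend as needed).
NUMERIC_TYPES = {"Gauge32", "Counter32", "Counter64", "Integer32", "INTEGER", "Unsigned32"}


def classify(mibparser_output: str) -> list:
    """Return one dict per mibObj saying whether it is a scalar and numeric."""
    rows = []
    for oid, instance, alias, snmp_type in MIBOBJ.findall(mibparser_output):
        rows.append({
            "oid": oid,
            "alias": alias,
            "scalar": instance.isdigit(),           # e.g. "0" => not in a table
            "numeric": snmp_type in NUMERIC_TYPES,  # RRDTool/JRobin store numbers only
        })
    return rows


if __name__ == "__main__":
    # e.g. python classify.py parser-output.txt
    for path in sys.argv[1:]:
        for row in classify(Path(path).read_text()):
            easy = row["scalar"] and row["numeric"]
            print(f'{row["alias"]:40} {"easy" if easy else "needs work"}')
```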

You'll notice that the alias for most of these OIDs contains the letters "TOOLONG". RRDTool limits data source names to 19 characters, and this is the parser's way of indicating that the alias needs to be shortened. I also like to indicate in the alias which device/MIB the data comes from, so this ends up in datacollection-config.xml as:

<group  name="snmpinformant-memory" ifType="ignore">
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.3" instance="0" alias="sinfMemAvailMB" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.4" instance="0" alias="sinfMemComBytes" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.5" instance="0" alias="sinfMemCacheBytes" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.6" instance="0" alias="sinfMemCacheBytesPk" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.7" instance="0" alias="sinfMemPageFaultsPS" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.8" instance="0" alias="sinfMemPagesInputPS" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.9" instance="0" alias="sinfMemPagesOutPS" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.10" instance="0" alias="sinfMemPagesPerSec" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.11" instance="0" alias="sinfMemPNonpagedByt" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.12" instance="0" alias="sinfMemPPagedBytes" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.13" instance="0" alias="sinfMemPPagedResByt" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.14" instance="0" alias="sinfMemSysCacheResB" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.15" instance="0" alias="sinfMemSysCodeResB" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.16" instance="0" alias="sinfMemSysCodeTotB" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.17" instance="0" alias="sinfMemSysDrvResB" type="Gauge" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.18" instance="0" alias="sinfMemSysDrvTotB" type="Gauge" />
</group>

Note that each alias is 19 characters or fewer, and that "sinf" (for SNMP-Informant) has been prefixed to each one.
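Shortening aliases by hand makes it easy to slip past the limit, so I'd check them mechanically. RRDTool's rrdcreate documentation requires a data source name to be 1 to 19 characters from [A-Za-z0-9_]; the prefix and duplicate checks in this minimal Python sketch are my own additions, and the function name is hypothetical.

```python
import re

# RRDTool data-source names: 1 to 19 characters from [A-Za-z0-9_].
DS_NAME = re.compile(r"[A-Za-z0-9_]{1,19}")


def alias_problems(aliases, prefix="sinf"):
    """Return (alias, problem) pairs for aliases that would cause trouble."""
    problems = []
    seen = set()
    for alias in aliases:
        if alias in seen:
            problems.append((alias, "duplicate alias"))
            continue
        seen.add(alias)
        if not DS_NAME.fullmatch(alias):
            problems.append((alias, "not 1-19 characters of [A-Za-z0-9_]"))
        if not alias.startswith(prefix):
            problems.append((alias, f"missing '{prefix}' prefix"))
    return problems
```

Running it over the memory group above returns an empty list; feeding it a raw parser alias such as `memoryAvailableBytesTOOLONG` reports both the length and the missing prefix.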

The other three groups in this MIB, which reside in tables, are not so easy. The problem lies with how SNMP-Informant uses instances. For example, this is the available information for disks (output from the mibparser):

<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.1" instance="lDiskInstance" alias="lDiskInstance" type="InstanceName" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.2" instance="lDiskInstance" alias="lDiskPercentDiskReadTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.3" instance="lDiskInstance" alias="lDiskPercentDiskTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.4" instance="lDiskInstance" alias="lDiskPercentDiskWriteTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5" instance="lDiskInstance" alias="lDiskPercentFreeSpaceTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.6" instance="lDiskInstance" alias="lDiskPercentIdleTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.7" instance="lDiskInstance" alias="lDiskAvgDiskQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.8" instance="lDiskInstance" alias="lDiskAvgDiskReadQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.9" instance="lDiskInstance" alias="lDiskAvgDiskWriteQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.10" instance="lDiskInstance" alias="lDiskAvgDiskSecPerReadTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.11" instance="lDiskInstance" alias="lDiskAvgDiskSecPerTransferTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.12" instance="lDiskInstance" alias="lDiskAvgDiskSecPerWriteTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.13" instance="lDiskInstance" alias="lDiskCurrentDiskQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.14" instance="lDiskInstance" alias="lDiskDiskBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.15" instance="lDiskInstance" alias="lDiskDiskReadBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.16" instance="lDiskInstance" alias="lDiskDiskReadsPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.17" instance="lDiskInstance" alias="lDiskDiskTransfersPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.18" instance="lDiskInstance" alias="lDiskDiskWriteBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.19" instance="lDiskInstance" alias="lDiskDiskWritesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.20" instance="lDiskInstance" alias="lDiskFreeMegabytes" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.21" instance="lDiskInstance" alias="lDiskSplitIOPerSec" type="Gauge32" />

You'll see that "lDiskInstance" is the index into the table. This is where things get really weird.

First, you'll need to run "diskperf -y" as an Administrator on the command line of each target Windows box, and then reboot, before you get any information about disks at all. On my lone Windows box, I have two disk drives, C: and D:. If I run:

 $ snmpwalk -v 1 -c public butters.opennms.com .1.3.6.1.4.1.9600.1.1.1.1.1
 SNMPv2-SMI::enterprises.9600.1.1.1.1.1.2.67.58 = STRING: "C:"
 SNMPv2-SMI::enterprises.9600.1.1.1.1.1.2.68.58 = STRING: "D:"
 SNMPv2-SMI::enterprises.9600.1.1.1.1.1.6.95.84.111.116.97.108 = STRING: "_Total"

You'll see that there are three instances listed: C:, D: and _Total.

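Here's the weird part. Note that the instance for the first one is "2.67.58". The leading 2 is not part of the name: SNMP encodes a variable-length string index as its length followed by one sub-identifier per character, so in ASCII 67.58 is "C:" and 68.58 is "D:" (and _Total becomes 6.95.84.111.116.97.108). Thus it is pretty easy to work out which instance you'll need to collect, but it could get weird for oddly named drives.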
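Working these indexes out by hand gets tedious. Below is a minimal Python sketch with hypothetical helper names. It assumes the standard SMIv2 rule for a variable-length string index (RFC 2578, section 7.7): the first sub-identifier is the string length and each following one is a character code, which is why "C:" is 2.67.58 and "_Total" starts with 6.

```python
def decode_string_index(index: str) -> str:
    """Turn an index such as "2.67.58" into the string it encodes ("C:")."""
    length, *codes = (int(part) for part in index.strip(".").split("."))
    if len(codes) != length:
        raise ValueError(f"expected {length} character codes in {index!r}, got {len(codes)}")
    return bytes(codes).decode("ascii")


def encode_string_index(name: str) -> str:
    """Inverse of decode_string_index: "D:" becomes "2.68.58"."""
    data = name.encode("ascii")
    return ".".join(str(n) for n in (len(data), *data))


if __name__ == "__main__":
    for index in ["2.67.58", "2.68.58", "6.95.84.111.116.97.108", "1.48"]:
        print(f"{index:24} -> {decode_string_index(index)}")
```

The same decoding applies to the CPU table later on, where instance "1.48" is the string "0".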

Now it becomes an exercise in cut and paste. Rather than paste the whole disk group for SNMP-Informant, let's take a look at one OID.

I looked at "lDiskPercentFreeSpace" and figured that would be a good place to start, since many people want to know when their disks are full.

 $ snmpwalk -v 1 -c public butters.opennms.com .1.3.6.1.4.1.9600.1.1.1.1.5
 SNMPv2-SMI::enterprises.9600.1.1.1.1.5.2.67.58 = Gauge32: 3
 SNMPv2-SMI::enterprises.9600.1.1.1.1.5.2.68.58 = Gauge32: 98
 SNMPv2-SMI::enterprises.9600.1.1.1.1.5.6.95.84.111.116.97.108 = Gauge32: 75

It is pretty much dead on: my C: drive is full (3% free) while my D: drive is nearly empty (98% free). Note that a total free-space percentage is also available (although I am not sure how it is calculated).
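To line the values up with drive names without decoding indexes by eye, the walk output can be turned into a name-to-value map. A minimal Python sketch, assuming snmpwalk's default `OID = TYPE: value` output shown above and a numeric column; the function names are hypothetical.

```python
import re


def decode_string_index(index: str) -> str:
    """Turn an index such as "2.67.58" into "C:" (length, then character codes)."""
    length, *codes = (int(part) for part in index.split("."))
    if len(codes) != length:
        raise ValueError(f"bad string index {index!r}")
    return bytes(codes).decode("ascii")


def walk_by_instance(walk_output: str, column: str) -> dict:
    """Map decoded instance names to integer values for one numeric table column.

    `column` is the column OID as snmpwalk prints it, e.g.
    "enterprises.9600.1.1.1.1.5" for lDiskPercentFreeSpace.
    """
    line = re.compile(rf"{re.escape(column)}\.([\d.]+) = \w+: (-?\d+)")
    return {decode_string_index(idx): int(val) for idx, val in line.findall(walk_output)}


if __name__ == "__main__":
    # The lDiskPercentFreeSpace walk from above.
    walk = """\
SNMPv2-SMI::enterprises.9600.1.1.1.1.5.2.67.58 = Gauge32: 3
SNMPv2-SMI::enterprises.9600.1.1.1.1.5.2.68.58 = Gauge32: 98
SNMPv2-SMI::enterprises.9600.1.1.1.1.5.6.95.84.111.116.97.108 = Gauge32: 75
"""
    print(walk_by_instance(walk, "enterprises.9600.1.1.1.1.5"))
    # {'C:': 3, 'D:': 98, '_Total': 75}
```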

If I wanted to collect this information, I would need to edit datacollection-config.xml and add something like:

<group  name="snmpinformant-disk" ifType="ignore">
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.2.67" instance="58" alias="sinfDskPtFreeSpcC" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.2.68" instance="58" alias="sinfDskPtFreeSpcD" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.2.69" instance="58" alias="sinfDskPtFreeSpcE" type="Gauge32" />
    <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.6.95.84.111.116.97" instance="108" alias="sinfDskPtFreeSpcTl" type="Gauge32" />
</group>

Then add the "snmpinformant-disk" entry to the system definitions at the bottom of the file. Note that I changed the alias names to reflect SNMP-Informant and to fit within 19 characters.
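Cutting and pasting these entries for every drive (or CPU) is where typos creep in. A minimal Python sketch that builds them from a column OID and an instance name, using the same split as the group above: every index sub-identifier except the last is appended to the oid, and the last one becomes the instance. The helper name and the drive-to-suffix mapping are my own, hypothetical choices.

```python
def mibobj(column_oid: str, instance_name: str, alias: str, snmp_type: str = "Gauge32") -> str:
    """Build one <mibObj/> line for a string-indexed SNMP-Informant row.

    The index is the string length followed by its character codes; the last
    sub-identifier goes into "instance" and the rest is appended to the oid.
    """
    data = instance_name.encode("ascii")
    subids = [len(data), *data]
    oid = column_oid + "." + ".".join(str(n) for n in subids[:-1])
    return (f'<mibObj oid="{oid}" instance="{subids[-1]}" '
            f'alias="{alias}" type="{snmp_type}" />')


if __name__ == "__main__":
    free_space = ".1.3.6.1.4.1.9600.1.1.1.1.5"  # lDiskPercentFreeSpace column
    drives = {"C:": "C", "D:": "D", "E:": "E", "_Total": "Tl"}
    print('<group  name="snmpinformant-disk" ifType="ignore">')
    for name, suffix in drives.items():
        print("    " + mibobj(free_space, name, "sinfDskPtFreeSpc" + suffix))
    print("</group>")
```

The same helper covers the CPU table: `mibobj(".1.3.6.1.4.1.9600.1.1.5.1.5", "0", "sinfCpuPtProcTime0")` yields oid `.1.3.6.1.4.1.9600.1.1.5.1.5.1` with instance `48`.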

Adding this to datacollection-config.xml and restarting OpenNMS should then start data collection.

The next step will be to add reports for these variables. Editing the snmp-graph.properties file and finding the SNMP Informant section, I added the following report:

report.sinf.diskfreeC.name=Available Disk Space (Drive C) (SNMP-Inf)
report.sinf.diskfreeC.columns=sinfDskPtFreeSpcC
report.sinf.diskfreeC.type=node
report.sinf.diskfreeC.command=--title="Windows Available Space Disk Drive C (SNMP-Informant)" \
 DEF:availspace={rrd1}:sinfDskPtFreeSpcC:AVERAGE \
 LINE2:availspace#ff0000:"% Avail." \
 GPRINT:availspace:AVERAGE:"Avg \\: %10.2lf %s" \
 GPRINT:availspace:MIN:"Min \\: %10.2lf %s" \
 GPRINT:availspace:MAX:"Max \\: %10.2lf %s\\n"

This will need to be repeated for each of the other disks, and each new report must also be added to the reports= line at the top of the file.
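Since the stanza only differs by drive letter, it can be generated. A minimal Python sketch (hypothetical function name) that emits the report for any drive letter, with the backslash line continuations and `\\:` escapes used in snmp-graph.properties, and prints the names to append to the comma-separated reports= line. Check the output against your own snmp-graph.properties before pasting it in.

```python
def disk_report(letter: str) -> str:
    """Return the snmp-graph.properties stanza for one drive's free-space graph."""
    key = f"report.sinf.diskfree{letter}"
    ds = f"sinfDskPtFreeSpc{letter}"
    return "\n".join([
        f"{key}.name=Available Disk Space (Drive {letter}) (SNMP-Inf)",
        f"{key}.columns={ds}",
        f"{key}.type=node",
        f'{key}.command=--title="Windows Available Space Disk Drive {letter} (SNMP-Informant)" \\',
        f" DEF:availspace={{rrd1}}:{ds}:AVERAGE \\",
        ' LINE2:availspace#ff0000:"% Avail." \\',
        ' GPRINT:availspace:AVERAGE:"Avg \\\\: %10.2lf %s" \\',
        ' GPRINT:availspace:MIN:"Min \\\\: %10.2lf %s" \\',
        ' GPRINT:availspace:MAX:"Max \\\\: %10.2lf %s\\\\n"',
    ])


if __name__ == "__main__":
    letters = ["C", "D", "E"]
    for letter in letters:
        print(disk_report(letter), end="\n\n")
    # Names to append to the existing reports= line at the top of the file.
    print(", ".join(f"sinf.diskfree{letter}" for letter in letters))
```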

Finally, we want to know when available disk space drops to, say, 5%, so edit the thresholds.xml file and add:

<!-- SNMP Informant thresholds -->
<threshold type="low" ds-name="sinfDskPtFreeSpcC"  ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcD"  ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcE"  ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcTl"  ds-type="node" value="5" rearm="10" trigger="1"/>
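These elements are also one-per-drive, so generating them avoids copy-and-paste slips in the ds-name. A minimal sketch using Python's standard xml.etree.ElementTree (hypothetical function name); it relies on Python 3.8 or later, which keeps attributes in insertion order when serializing.

```python
import xml.etree.ElementTree as ET


def threshold(ds_name: str, kind: str, value: int, rearm: int, trigger: int) -> str:
    """Serialize one <threshold/> element with its attributes in a fixed order."""
    element = ET.Element("threshold", {
        "type": kind,
        "ds-name": ds_name,
        "ds-type": "node",
        "value": str(value),
        "rearm": str(rearm),
        "trigger": str(trigger),
    })
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print("<!-- SNMP Informant thresholds -->")
    for suffix in ["C", "D", "E", "Tl"]:
        print(threshold(f"sinfDskPtFreeSpc{suffix}", "low", value=5, rearm=10, trigger=1))
```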

The next thing to look at is the CPU statistics:

<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.1" instance="cpuInstance" alias="cpuInstance" type="InstanceName" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.2" instance="cpuInstance" alias="cpuPercentDPCTime" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.3" instance="cpuInstance" alias="cpuPercentInterruptTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.4" instance="cpuInstance" alias="cpuPercentPrivilegedTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5" instance="cpuInstance" alias="cpuPercentProcessorTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.6" instance="cpuInstance" alias="cpuPercentUserTime" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.7" instance="cpuInstance" alias="cpuAPCBypassesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.8" instance="cpuInstance" alias="cpuDPCBypassesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.9" instance="cpuInstance" alias="cpuDPCRate" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.10" instance="cpuInstance" alias="cpuDPCsQueuedPerSec" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.11" instance="cpuInstance" alias="cpuInterruptsPerSec" type="Gauge32" />

I only have one CPU on my machine, but I get:

 $ snmpwalk -v 1 -c public butters.opennms.com .1.3.6.1.4.1.9600.1.1.5.1.1
 SNMPv2-SMI::enterprises.9600.1.1.5.1.1.1.48 = STRING: "0"
 SNMPv2-SMI::enterprises.9600.1.1.5.1.1.6.95.84.111.116.97.108 = STRING: "_Total"

and as you can see, we get both a single CPU and _Total.

The statistic most people are interested in is how busy the CPU is. From the SNMP Informant MIB:

 cpuPercentProcessorTime OBJECT-TYPE
    SYNTAX     Gauge32
    MAX-ACCESS read-only
    STATUS     current
    DESCRIPTION
            "% Processor Time is the percentage of time
            that the processor is executing a non-Idle
            thread.  This counter was designed as a primary
            indicator of processor activity.  It is
            calculated by measuring the time that the
            processor spends executing the thread of the
            Idle process in each sample interval, and
            subtracting that value from 100%.  (Each
            processor has an Idle thread which consumes
            cycles when no other threads are ready to run).
            It can be viewed as the percentage of the
            sample interval spent doing useful work.  This
            counter displays the average percentage of busy
            time observed during the sample interval.  It
            is calculated by monitoring the time the
            service was inactive, and then subtracting that
            value from 100%."
    ::= { processorEntry 5 }

I especially liked "This counter was designed as a primary indicator of processor activity" since that is what we are looking for. So a value of 100% would be bad if sustained.

So off to modify datacollection-config.xml again:

<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="48" alias="sinfCpuPtProcTime0" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="49" alias="sinfCpuPtProcTime1" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="50" alias="sinfCpuPtProcTime2" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="51" alias="sinfCpuPtProcTime3" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.6.95.84.111.116.97" instance="108" alias="sinfCpuPtProcTimeTl" type="Gauge32" />

This will collect the values we want.

And now for a sample graph to place in snmp-graph.properties:

report.sinf.cpu0percent.name=Windows CPU 0 Percent Processor Time (SNMP-Inf)
report.sinf.cpu0percent.columns=sinfCpuPtProcTime0
report.sinf.cpu0percent.type=node
report.sinf.cpu0percent.command=--title="Windows CPU 0 Utilization (SNMP-Informant)" \
 DEF:utilization={rrd1}:sinfCpuPtProcTime0:AVERAGE \
 LINE2:utilization#ff0000:"% util." \
 GPRINT:utilization:AVERAGE:"Avg \\: %10.2lf %s" \
 GPRINT:utilization:MIN:"Min \\: %10.2lf %s" \
 GPRINT:utilization:MAX:"Max \\: %10.2lf %s\\n"

Remember to add it to the "reports=" line at the top of the file.

For thresholds, it's similar to above:

<threshold type="high" ds-name="sinfCpuPtProcTime0"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime1"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime2"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime3"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTimeTl"  ds-type="node" value="100" rearm="90" trigger="3"/>

This will require three consecutive polls where the CPU is at 100% before the alarm will be raised.  
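To see how value, rearm and trigger interact, here is a minimal Python model of the behaviour described above. It is a sketch, not OpenNMS code: I assume a "high" threshold counts consecutive polls at or above `value` (with a strict `>` a 100% threshold could never fire), fires once `trigger` of them have been seen, and stays quiet until the value falls to `rearm` or below; "low" mirrors this. The "exceeded"/"rearmed" strings are placeholders, not OpenNMS event names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Threshold:
    kind: str      # "high" or "low", as in thresholds.xml
    value: float   # level that counts as a violation
    rearm: float   # level the data must return to before the threshold re-arms
    trigger: int   # consecutive violating polls needed to fire
    _count: int = field(default=0, init=False)
    _armed: bool = field(default=True, init=False)

    def _violates(self, sample: float) -> bool:
        return sample >= self.value if self.kind == "high" else sample <= self.value

    def _recovered(self, sample: float) -> bool:
        return sample <= self.rearm if self.kind == "high" else sample >= self.rearm

    def evaluate(self, sample: float) -> Optional[str]:
        """Feed one polled value; return "exceeded", "rearmed" or None."""
        if not self._armed:
            if self._recovered(sample):
                self._armed = True
                return "rearmed"
            return None
        if self._violates(sample):
            self._count += 1
            if self._count >= self.trigger:
                self._armed, self._count = False, 0
                return "exceeded"
        else:
            self._count = 0  # a single good poll resets the consecutive count
        return None


if __name__ == "__main__":
    cpu = Threshold("high", value=100, rearm=90, trigger=3)
    for sample in [100, 100, 95, 100, 100, 100, 100, 85]:
        print(sample, cpu.evaluate(sample))
```

In that run the 95 breaks the first streak, so the event only fires on the sixth poll, and nothing more happens until the CPU drops to 85.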

----------------------------------------------------------------------------------

Comments:

1. "There are two, distinctly different things: traps to convert to events and OIDs to collection for performance data."

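Collecting SNMP data and receiving traps are the two things OpenNMS can do over the SNMP protocol. Two tools are involved, mib2opennms and mibparser: mib2opennms converts the SNMP traps in a MIB into a data format OpenNMS understands, while mibparser converts a MIB into OpenNMS's data collection format to help administrators configure the datacollection-config.xml file. This article focuses on using OpenNMS's mibparser.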
