openfalcon-mongodb监控插件
来源:互联网 发布:java解析json数组数据 编辑:程序博客网 时间:2024/05/29 02:40
下载地址
https://github.com/ZhuoRoger/mongomon
修改版本下载
- http://pan.baidu.com/s/1pLgwOob
- 修改uptime为COUNTER,方便监控
- 修改replca-set bug,可以正常采集mongodb集群
mongodb自己连接测试
./mongo -u admin2 -p “123” –authenticationDatabase admin
配置 conf/mongomon.conf
items:- {port: 27017, user: "admin2",password: "123"}
程序 bin/mongodb_monitor.py调整
- hostname调整,一般我喜欢调整成ip地址
mongodb monitor cron
// root下面用这个 * * * * * user (cd /opt/abc/agent/falcon-plugins/mongomon/bin/ && python mongodb_monitor.py > /dev/null)* * * * * cd /opt/abc/agent/falcon-plugins/mongomon/bin/ && python mongodb_monitor.py >> /dev/null
mongodb创建admin账户,要有root权限,否则不会被授权
use admindb.createUser( { user: "admin2", pwd: "123", roles: [ { role: "root", db: "admin" } ] });exit;
一个坑
- 领导要求监控uptime,如果这个300分钟内没有变化就说明mongodb不工作了,需要报警,zabbix里面有这个插件,那么falcon是否也可以呢?
调整之前
调整之后
经过调查发现,falcon里面对于uptime这个指标,是GAUGE类型,就是说,mongodb运行一分钟,我会有数值60,一个小时就是3600,运行个几天,数据大的不行,而且关键这个东西没法做报警,不是一个相对值,所以我稍微修改了下作者的代码,把uptime加到counter列表里面,这样,每次我们就会得到一个稳定的1,因为假设mongodb一分钟内一直在运行,两次踩点之间的差是60,然后除以60,最后得到1,也就是说,我们最终得到的值只可能是1左右,否则就是不工作。
又一个坑
主要是,作者本来插件已经支持单机和replca-set的监控,但是在replica-set的监控中,由于引入了mongodb的 Timestamp,这个是bson里面的,必须安装python的一个插件,以及修改json处理中少了 [“ts”], 两个地方会导致最终采集错误,修改后的整体代码如下,只需要覆盖 mongodb_server.py即可。
小伙伴们可以继续测试下,有问题请留言。
- 不要忘了pip install mongo-connector 哦。 参考 http://nullege.com/codes/search/bson.timestamp.Timestamp
- 总结下拍错思路,根据报错信息,然后加上type(something)去看到底是个啥鬼。
#! /bin/env python#-*- coding:utf8 -*-# Filename: mongodb_server.pyimport sysimport osfrom bson.timestamp import Timestampimport pymongofrom pymongo import MongoClientclass mongodbMonitor(object): def mongodb_connect(self,host=None, port=None, user=None, password=None):# try: conn = MongoClient(host, port, serverSelectionTimeoutMS=1000) # conntion timeout 1 sec. if user and password: db_admin = conn["admin"] if not db_admin.authenticate(user,password): pass; conn.server_info() print conn# except :# e = sys.exc_info()[0]# return e, None return 0,conn #data node(1): standalone, replset primary, replset secondary. mongos(2), mongoConfigSrv(3) def get_mongo_role(self, conn): mongo_role = 1 conn.server_info() if (conn.is_mongos): mongo_role = 2 elif ("chunks" in conn.get_database("config").collection_names()): # Role is a config servers? not mongos and has config.chunks collections. it's a config server. mongo_role = 3 return mongo_role def get_mongo_monitor_data(self, conn): mongo_monitor_dict ={} mongo_monitor_dict["mongo_local_alive"] = 1 # mongo local alive metric for all nodes. mongo_role = self.get_mongo_role(conn) if(mongo_role == 1): mongodb_role,serverStatus_dict = self.serverStatus(conn) mongo_monitor_dict.update(serverStatus_dict) repl_status_dict = {} if (mongodb_role == "master" or mongodb_role == "secondary"): repl_status_dict = self.repl_status(conn) mongo_monitor_dict.update(repl_status_dict) else: print "this is standalone node" elif(mongo_role == 2): # mongos shards_dict = self.shard_status(conn) mongo_monitor_dict.update(shards_dict) return mongo_monitor_dict def serverStatus(self,connection): serverStatus = connection.admin.command(pymongo.son_manipulator.SON([('serverStatus', 1)])) mongodb_server_dict = {} # mongodb server status metric for upload to falcon mongo_version = serverStatus["version"] #uptime metric mongodb_server_dict["uptime"] = int(serverStatus["uptime"]) #asserts section metrics mongo_asserts = serverStatus["asserts"] for asserts_key in mongo_asserts.keys(): asserts_key_name = "asserts_" + asserts_key mongodb_server_dict[asserts_key_name] = mongo_asserts[asserts_key] ### "extra_info" section metrics: page_faults. falcon counter type. if serverStatus.has_key("extra_info"): mongodb_server_dict["page_faults"] = serverStatus["extra_info"]["page_faults"] ### "connections" section metrics current_conn = serverStatus["connections"]["current"] available_conn = serverStatus["connections"]["available"] mongodb_server_dict["connections_current"] = current_conn mongodb_server_dict["connections_available"] = available_conn # mongodb connection used percent mongodb_server_dict["connections_used_percent"] = int((current_conn/(current_conn + available_conn)*100)) # total created from mongodb started. COUNTER metric mongodb_server_dict["connections_totalCreated"] = serverStatus["connections"]["totalCreated"] # "globalLock" currentQueue mongodb_server_dict["globalLock_currentQueue_total"] = serverStatus["globalLock"]["currentQueue"]["total"] mongodb_server_dict["globalLock_currentQueue_readers"] = serverStatus["globalLock"]["currentQueue"]["readers"] mongodb_server_dict["globalLock_currentQueue_writers"] = serverStatus["globalLock"]["currentQueue"]["writers"] # "locks" section, Changed in version 3.0 if serverStatus.has_key("locks") and mongo_version >"3.0": locks_dict_keys = serverStatus["locks"].keys() for lock_scope in locks_dict_keys: # Global, Database,Collection,Oplog for lock_metric in serverStatus["locks"][lock_scope]: for lock_type in serverStatus["locks"][lock_scope][lock_metric]: if lock_type == "R": lock_name = "Slock" elif lock_type == "W": lock_name = "Xlock" elif lock_type == "r": lock_name = "ISlock" elif lock_type == "w": lock_name = "IXlock" lock_metric_key = "locks_" + lock_scope + "_" + lock_metric + "_" + lock_name mongodb_server_dict[lock_metric_key] = serverStatus["locks"][lock_scope][lock_metric][lock_type] # "network" section metrics: bytesIn, bytesOut, numRequests; counter type if serverStatus.has_key("network"): for network_metric in serverStatus["network"].keys(): network_metric_key = "network_" + network_metric # network metric key for upload mongodb_server_dict[network_metric_key] = serverStatus["network"][network_metric] ### "opcounters" section metrics: insert, query, update, delete, getmore, command. couter type if serverStatus.has_key("opcounters"): for opcounters_metric in serverStatus["opcounters"].keys(): opcounters_metric_key = "opcounters_" + opcounters_metric mongodb_server_dict[opcounters_metric_key] = serverStatus["opcounters"][opcounters_metric] ### "opcountersRepl" section metrics: insert, query, update, delete, getmore, command. couter type if serverStatus.has_key("opcountersRepl"): for opcountersRepl_metric in serverStatus["opcountersRepl"].keys(): opcountersRepl_metric_key = "opcountersRepl_" + opcountersRepl_metric mongodb_server_dict[opcountersRepl_metric_key] = serverStatus["opcounters"][opcountersRepl_metric] ### "mem" section metrics: if serverStatus.has_key("mem"): for mem_metric in serverStatus["mem"].keys(): mem_metric_key = "mem_" + mem_metric if( mem_metric in ["bits","supported"] ): mongodb_server_dict[mem_metric_key] = serverStatus["mem"][mem_metric] else: mongodb_server_dict[mem_metric_key] = serverStatus["mem"][mem_metric]*1024*1024 ### "dur" section metrics: if serverStatus.has_key("dur"): mongodb_server_dict["dur_journaledBytes"] = serverStatus["dur"]["journaledMB"]*1024*1024 mongodb_server_dict["dur_writeToDataFilesBytes"] = serverStatus["dur"]["writeToDataFilesMB"]*1024*1024 mongodb_server_dict["dur_commitsInWriteLock"] = serverStatus["dur"]["commitsInWriteLock"] ### "repl" section mongodb_role = "" if (serverStatus.has_key("repl") and serverStatus["repl"].has_key("secondary")): if serverStatus["repl"]["ismaster"]: mongodb_role = "master" if serverStatus["repl"]["secondary"]: mongodb_role = "secondary" else: # not Replica sets mode mongodb_role = "standalone" ### "backgroundFlushing" section metrics, only for MMAPv1 if serverStatus.has_key("backgroundFlushing"): for bgFlush_metric in serverStatus["backgroundFlushing"].keys(): if bgFlush_metric != "last_finished": # discard last_finished metric bgFlush_metric_key = "backgroundFlushing_" + bgFlush_metric mongodb_server_dict[bgFlush_metric_key] = serverStatus["backgroundFlushing"][bgFlush_metric] ### cursor from "metrics" section if serverStatus.has_key("metrics") and serverStatus["metrics"].has_key("cursor"): cursor_status = serverStatus["metrics"]["cursor"] mongodb_server_dict["cursor_timedOut"] = cursor_status["timedOut"] mongodb_server_dict["cursor_open_noTimeout"] = cursor_status["open"]["noTimeout"] mongodb_server_dict["cursor_open_pinned"] = cursor_status["open"]["pinned"] mongodb_server_dict["cursor_open_total"] = cursor_status["open"]["total"] ### "wiredTiger" section if serverStatus.has_key("wiredTiger"): serverStatus_wt = serverStatus["wiredTiger"] #cache wt_cache = serverStatus_wt["cache"] mongodb_server_dict["wt_cache_used_total_bytes"] = wt_cache["bytes currently in the cache"] mongodb_server_dict["wt_cache_dirty_bytes"] = wt_cache["tracked dirty bytes in the cache"] mongodb_server_dict["wt_cache_readinto_bytes"] = wt_cache["bytes read into cache"] mongodb_server_dict["wt_cache_writtenfrom_bytes"] = wt_cache["bytes written from cache"] #concurrentTransactions wt_concurrentTransactions = serverStatus_wt["concurrentTransactions"] mongodb_server_dict["wt_concurrentTransactions_write"] = wt_concurrentTransactions["write"]["available"] mongodb_server_dict["wt_concurrentTransactions_read"] = wt_concurrentTransactions["read"]["available"] #"block-manager" section wt_block_manager = serverStatus_wt["block-manager"] mongodb_server_dict["wt_bm_bytes_read"] = wt_block_manager["bytes read"] mongodb_server_dict["wt_bm_bytes_written"] = wt_block_manager["bytes written"] mongodb_server_dict["wt_bm_blocks_read"] = wt_block_manager["blocks read" ] mongodb_server_dict["wt_bm_blocks_written"] = wt_block_manager["blocks written"] ### "rocksdb" engine if serverStatus.has_key("rocksdb"): serverStatus_rocksdb = serverStatus["rocksdb"] mongodb_server_dict["rocksdb_num_immutable_mem_table"] = serverStatus_rocksdb["num-immutable-mem-table"] mongodb_server_dict["rocksdb_mem_table_flush_pending"] = serverStatus_rocksdb["mem-table-flush-pending"] mongodb_server_dict["rocksdb_compaction_pending"] = serverStatus_rocksdb["compaction-pending"] mongodb_server_dict["rocksdb_background_errors"] = serverStatus_rocksdb["background-errors"] mongodb_server_dict["rocksdb_num_entries_active_mem_table"] = serverStatus_rocksdb["num-entries-active-mem-table"] mongodb_server_dict["rocksdb_num_entries_imm_mem_tables"] = serverStatus_rocksdb["num-entries-imm-mem-tables"] mongodb_server_dict["rocksdb_num_snapshots"] = serverStatus_rocksdb["num-snapshots"] mongodb_server_dict["rocksdb_oldest_snapshot_time"] = serverStatus_rocksdb["oldest-snapshot-time"] mongodb_server_dict["rocksdb_num_live_versions"] = serverStatus_rocksdb["num-live-versions"] mongodb_server_dict["rocksdb_total_live_recovery_units"] = serverStatus_rocksdb["total-live-recovery-units"] ### "PerconaFT" engine if serverStatus.has_key("PerconaFT"): serverStatus_PerconaFT = serverStatus["PerconaFT"] mongodb_server_dict["PerconaFT_log_count"] = serverStatus_PerconaFT["log"]["count"] mongodb_server_dict["PerconaFT_log_time"] = serverStatus_PerconaFT["log"]["time"] mongodb_server_dict["PerconaFT_log_bytes"] = serverStatus_PerconaFT["log"]["bytes"] mongodb_server_dict["PerconaFT_fsync_count"] = serverStatus_PerconaFT["fsync"]["count"] mongodb_server_dict["PerconaFT_fsync_time"] = serverStatus_PerconaFT["fsync"]["time"] ### cachetable PerconaFT_cachetable = serverStatus_PerconaFT["cachetable"] mongodb_server_dict["PerconaFT_cachetable_size_current"] = PerconaFT_cachetable["size"]["current"] mongodb_server_dict["PerconaFT_cachetable_size_writing"] = PerconaFT_cachetable["size"]["writing"] mongodb_server_dict["PerconaFT_cachetable_size_limit"] = PerconaFT_cachetable["size"]["limit"] ### PerconaFT checkpoint PerconaFT_checkpoint = serverStatus_PerconaFT["checkpoint"] mongodb_server_dict["PerconaFT_checkpoint_count"] = PerconaFT_checkpoint["count"] mongodb_server_dict["PerconaFT_checkpoint_time"] = PerconaFT_checkpoint["time"] mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_count"] = PerconaFT_checkpoint["write"]["nonleaf"]["count"] mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_time"] = PerconaFT_checkpoint["write"]["nonleaf"]["time"] mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_bytes_compressed"] = PerconaFT_checkpoint["write"]["nonleaf"]["bytes"]["compressed"] mongodb_server_dict["PerconaFT_checkpoint_write_nonleaf_bytes_uncompressed"] = PerconaFT_checkpoint["write"]["nonleaf"]["bytes"]["uncompressed"] mongodb_server_dict["PerconaFT_checkpoint_write_leaf_count"] = PerconaFT_checkpoint["write"]["leaf"]["count"] mongodb_server_dict["PerconaFT_checkpoint_write_leaf_time"] = PerconaFT_checkpoint["write"]["leaf"]["time"] mongodb_server_dict["PerconaFT_checkpoint_write_leaf_bytes_compressed"] = PerconaFT_checkpoint["write"]["leaf"]["bytes"]["compressed"] mongodb_server_dict["PerconaFT_checkpoint_write_leaf_bytes_uncompressed"] = PerconaFT_checkpoint["write"]["leaf"]["bytes"]["uncompressed"] ### serializeTime for serializeTime_item in serverStatus_PerconaFT["serializeTime"]: prefix = "PerconaFT_serializeTime_" + serializeTime_item for serializeTime_key in serverStatus_PerconaFT["serializeTime"][serializeTime_item]: key_name = prefix + "_" + serializeTime_key mongodb_server_dict[key_name] = serverStatus_PerconaFT["serializeTime"][serializeTime_item][serializeTime_key] ### PerconaFT compressionRatio for compressionRatio_item in serverStatus_PerconaFT["compressionRatio"]: key_name = "PerconaFT_compressionRatio_" + compressionRatio_item mongodb_server_dict[key_name] = serverStatus_PerconaFT["compressionRatio"][compressionRatio_item] return (mongodb_role, mongodb_server_dict) def repl_status(self,connection): replStatus = connection.admin.command("replSetGetStatus") print replStatus repl_status_dict = {} # repl set metric dict # myState "1" for PRIMARY , "2" for SECONDARY, "3": repl_status_dict["repl_myState"] = replStatus["myState"] repl_status_members = replStatus["members"] master_optime = 0 # Master oplog ops time myself_optime = 0 # SECONDARY oplog ops time print "开始打印repl_status_members" print repl_status_members print "结束打印repl_status_members" for repl_member in repl_status_members: if repl_member.has_key("self") and repl_member["self"]: repl_status_dict["repl_health"] = repl_member["health"] #repl_status_dict["repl_optime"] = repl_member["optime"].time #print "value of optime ts is:" #print type(repl_member["optime"]) #print type(repl_member["optime"]["ts"]) print repl_member["optime"]["ts"].time repl_status_dict["repl_optime"] = repl_member["optime"]["ts"].time if repl_member.has_key("repl_electionTime"): repl_status_dict["repl_electionTime"] = repl_member["electionTime"].time if repl_member.has_key("repl_configVersion"): repl_status_dict["repl_configVersion"] = repl_member["configVersion"] #myself_optime = repl_member["optime"].time myself_optime = repl_member["optime"]["ts"].time if (replStatus["myState"] == 2 and repl_member["state"] == 1 ): # CONDARY ,get repl lag master_optime = repl_member["optime"]["ts"].time if replStatus["myState"] == 2 : repl_status_dict["repl_lag"] = master_optime - myself_optime ### oplog window hours oplog_collection = connection["local"]["oplog.rs"] oplog_tFirst = oplog_collection.find({},{"ts":1}).sort('$natural',pymongo.ASCENDING).limit(1).next() oplog_tLast = oplog_collection.find({},{"ts":1}).sort('$natural',pymongo.DESCENDING).limit(1).next() oplogrs_collstats = connection["local"].command("collstats", "oplog.rs") window_multiple = 1 ##oplog.rs collections is not full if oplogrs_collstats.has_key("maxSize"): window_multiple = oplogrs_collstats["maxSize"]/(oplogrs_collstats["count"] * oplogrs_collstats["avgObjSize"]) else: window_multiple = oplogrs_collstats["storageSize"]/(oplogrs_collstats["count"] * oplogrs_collstats["avgObjSize"]) #oplog_window .xx hours oplog_window = round((oplog_tLast["ts"].time - oplog_tFirst["ts"].time)/3600.0,2) * window_multiple # full repl_status_dict["repl_oplog_window"] = oplog_window return repl_status_dict # only for mongos node def shard_status(self, conn): config_db = conn["config"] settings_col = config_db["settings"] balancer_doc = settings_col.find_one({'_id':'balancer'}) shards_dict = {} if balancer_doc is None: shards_dict["shards_BalancerState"] = 1 elif balancer_doc["stopped"]: shards_dict["shards_BalancerState"] = 0 else: shards_dict["shards_BalancerState"] = 1 # shards_activeWindow metric,0: without setting, 1:setting # shards_activeWindow_start metric, { "start" : "23:30", "stop" : "6:00" } : 23.30 for 23:30 # shards_activeWindow_stop metric if balancer_doc is None: shards_dict["shards_activeWindow"] = 0 elif balancer_doc.has_key("activeWindow"): shards_dict["shards_activeWindow"] = 1 if balancer_doc["activeWindow"].has_key("start"): window_start = balancer_doc["activeWindow"]["start"] shards_dict["shards_activeWindow_start"] = window_start.replace(":",".") if balancer_doc["activeWindow"].has_key("stop"): window_stop = balancer_doc["activeWindow"]["stop"] shards_dict["shards_activeWindow_stop"] = window_stop.replace(":",".") # shards_chunkSize metric chunksize_doc = settings_col.find_one({"_id" : "chunksize"}) if chunksize_doc is not None: shards_dict["shards_chunkSize"] = chunksize_doc["value"] # shards_isBalancerRunning metric locks_col = config_db["locks"] balancer_lock_doc = locks_col.find_one({'_id':'balancer'}) if balancer_lock_doc is None: print "config.locks collection empty or missing. be sure you are connected to a mongos" shards_dict["shards_isBalancerRunning"] = 0 elif balancer_lock_doc["state"] > 0: shards_dict["shards_isBalancerRunning"] = 1 else: shards_dict["shards_isBalancerRunning"] = 0 # shards_size metric shards_col = config_db["shards"] shards_dict["shards_size"] = shards_col.count() # shards_mongosSize metric mongos_col = config_db["mongos"] shards_dict["shards_mongosSize"] = mongos_col.count() return shards_dict
有用的指标
uptime
uptime/mongo=27017
opcounters
opcounters_insert/mongo=27017
opcounters_query/mongo=27017
opcounters_update/mongo=27017
opcounters_delete/mongo=27017
opcounters_getmore/mongo=27017
opcounters_command/mongo=27017
connections
connections_available/mongo=27017
connections_current/mongo=27017
connections_totalCreated/mongo=27017
connections_used_percent/mongo=27017
附录
官网关于openfalcon数据类型的介绍。
https://github.com/open-falcon/book/blob/master/zh/usage/data-push.mdcounterType: 只能是COUNTER或者GAUGE二选一,前者表示该数据采集项为计时器类型,后者表示其为原值 (注意大小写)
- GAUGE:即用户上传什么样的值,就原封不动的存储
- COUNTER:指标在存储和展现的时候,会被计算为speed,即(当前值 - 上次值)/ 时间间隔
- mongodb 指标意义 http://blog.itpub.net/15498/viewspace-1978198/
- openfalcon-mongodb监控插件
- openfalcon插件 监控http时候的编码问题
- openfalcon-zookeeper监控 "zkmon"
- openfalcon监控端口大全
- openfalcon 监控端口配置
- MongoDB nagios check_mongodb插件监控
- nagios check_mongodb插件监控mongodb
- nagios check_mongodb插件监控mongodb
- openfalcon
- openfalcon
- openfalcon
- openfalcon
- openfalcon的 日志文件监控 logdog
- openfalcon 监控fastdfs是否可下载
- 开源一个监控数据采集Agent:OpenFalcon-SuitAgent
- openfalcon 添加监控远程tcp端口异常情况
- openfalcon 使用 swcollector 监控交换机等网络设备 配合着
- mongodb监控
- SharpNL Stem&Tokenize
- 发布一个软件,需要多少人?
- SqlServer 2008R2 无法远程连接的解决办法
- 图片路径获取问题
- Java中线程同步
- openfalcon-mongodb监控插件
- linux比较文件命令-diff
- Web渗透测试学习路线图
- 如何确定函数参数
- 【高数】导数
- 设计模式~工厂模式
- 描述一下JVM加载class文件的原理机制?
- Plan
- Moving Tables