Introduction to tcollector, the OpenTSDB Data Collector, and How to Run It

Source: Internet | Published by: 大数据分析报告撰写 | Editor: 程序博客网 | Time: 2024/05/16 08:03

Excerpted from the documentation on opentsdb.net:

tcollector

tcollector is a client-side process that gathers data from local collectors and pushes the data to OpenTSDB. You run it on all your hosts, and it does the work of sending each host's data to the TSD.

OpenTSDB is designed to make it easy to collect and write data to it. It has a simple protocol, simple enough for even a shell script to start sending data. However, to do so reliably and consistently is a bit harder. What do you do when your TSD server is down? How do you make sure your collectors stay running? This is where tcollector comes in.
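To make the "simple protocol" concrete, here is a minimal Python sketch of pushing one datapoint over the TSD's line-based text interface. It assumes the telnet-style `put <metric> <timestamp> <value> <tag=value ...>` command and a TSD listening on port 4242; the helper names `format_put` and `send_datapoint`, the host tag, and the sample value are illustrative and not part of tcollector.

```python
import socket
import time


def format_put(metric, timestamp, value, tags):
    """Build one protocol line: put <metric> <timestamp> <value> <tags>."""
    if not tags:
        raise ValueError("each datapoint needs at least one tag")
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_str}\n"


def send_datapoint(host, port, line, timeout=5.0):
    """Open a TCP connection, write a single put line, and close it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical example values; nothing is sent unless send_datapoint is called.
line = format_put("df.bytes.used", time.time(), 1024, {"host": "web01"})
print(line, end="")
# send_datapoint("localhost", 4242, line)  # needs a running TSD
```

This sketch does nothing about a TSD that is down or a collector that dies, which is exactly the gap tcollector is meant to fill.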

Tcollector does several things for you:

Runs all of your data collectors and gathers their data

Does all of the connection management work of sending data to the TSD

You don't have to embed all of this code in every collector you write

Does de-duplication of repeated values

Handles all of the wire protocol work for you, as well as future enhancements

Deduplication

Typically you want to gather data about everything in your system. This generates a lot of datapoints, the majority of which don't change very often over time (if ever). However, you want fine-grained resolution when they do change. Tcollector remembers the last value and timestamp that was sent for all of the time series for all of the collectors it manages. If the value doesn't change between sample intervals, it suppresses sending that datapoint. Once the value does change (or 10 minutes have passed), it sends the last suppressed value and timestamp, plus the current value and timestamp. In this way all of your graphs and such are correct. Deduplication typically reduces the number of datapoints TSD needs to collect by a large fraction. This reduces network load and storage in the backend. A future OpenTSDB release, however, will improve on the storage format by using RLE (among other things), making it essentially free to store repeated values.
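The suppression rule described above can be sketched in a few lines of Python. This is an illustration of the described behavior, not tcollector's actual code: the class name `Deduplicator`, the `process` method, and the per-series state layout are assumptions made for the example, and the 600-second limit mirrors the 10 minutes mentioned in the text.

```python
class Deduplicator:
    """Suppress repeated values per time series, flushing on change or timeout."""

    def __init__(self, max_silence=600):
        self.max_silence = max_silence  # seconds; 10 minutes as in the text
        self._state = {}  # series key -> last sent value/timestamp and skipped timestamp

    def process(self, key, timestamp, value):
        """Return the list of (timestamp, value) points to send for this sample."""
        st = self._state.get(key)
        if st is None:
            # First sample of a series is always sent.
            self._state[key] = {"value": value, "sent_ts": timestamp, "skipped_ts": None}
            return [(timestamp, value)]

        unchanged = value == st["value"]
        if unchanged and timestamp - st["sent_ts"] < self.max_silence:
            st["skipped_ts"] = timestamp  # remember the most recent suppressed point
            return []

        out = []
        if st["skipped_ts"] is not None:
            # Send the last suppressed point first so graphs keep the flat segment.
            out.append((st["skipped_ts"], st["value"]))
        out.append((timestamp, value))
        self._state[key] = {"value": value, "sent_ts": timestamp, "skipped_ts": None}
        return out
```

With samples every 15 seconds, feeding the values 5, 5, 5, 7 for one series sends the first 5, suppresses the next two, and then sends the last suppressed 5 together with the 7.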

Collecting lots of metrics with tcollector

Collectors in tcollector can be written in any language. They just need to be executable and output the data to stdout. Tcollector will handle the rest. The collectors are placed in the collectors directory. Tcollector iterates over every directory named with a number in that directory and runs all the collectors in each directory. If you name the directory 60, then tcollector will try to run every collector in that directory every 60 seconds. Use the directory 0 for any collectors that are long-lived and run continuously. Tcollector will read their output and respawn them if they die. Generally you want to write long-lived collectors since that has less overhead. OpenTSDB is designed to have lots of datapoints for each metric (for most metrics we send datapoints every 15 seconds).
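As an illustration, a long-lived collector for the 0 directory might look like the following Python sketch. The excerpt only says that collectors write to stdout; the line layout `metric timestamp value tag=value` used here is an assumption based on how the bundled collectors print their data, and the metric name `example.loadavg.1min` and the function names are hypothetical. It relies on `os.getloadavg`, so it is Unix-only.

```python
import os
import sys
import time

INTERVAL = 15  # seconds between samples, matching the cadence mentioned above


def format_line(metric, timestamp, value, tags=None):
    """Format one output line: metric timestamp value [tag=value ...]."""
    tag_str = "".join(f" {k}={v}" for k, v in sorted((tags or {}).items()))
    return f"{metric} {int(timestamp)} {value}{tag_str}"


def emit_stdout(line):
    """Print a line and flush so the parent process sees it immediately."""
    print(line)
    sys.stdout.flush()


def run(max_samples=None, interval=INTERVAL, emit=emit_stdout):
    """Sample the 1-minute load average forever (or max_samples times)."""
    sent = 0
    while max_samples is None or sent < max_samples:
        load1 = os.getloadavg()[0]
        emit(format_line("example.loadavg.1min", time.time(), load1))
        sent += 1
        time.sleep(interval)


# A real long-lived collector would end by calling run() with no arguments.
```

Flushing after every line matters for a long-lived process: with a buffered pipe, the parent might otherwise see nothing for minutes.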

If there are any non-numeric named directories in the collectors directory, they are ignored. We've included a lib and etc directory for library and config data used by all collectors.
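The directory convention can be expressed as a short Python sketch: numeric directory names become intervals in seconds, and everything else (such as lib and etc) is skipped. The function names `discover_collectors` and `is_executable` are made up for this example, and this is not how tcollector itself is implemented.

```python
import os


def is_executable(path):
    """True for regular files with at least one execute permission bit set."""
    return os.path.isfile(path) and bool(os.stat(path).st_mode & 0o111)


def discover_collectors(collectors_dir):
    """Map each interval in seconds to the executable files in its directory."""
    found = {}
    for name in sorted(os.listdir(collectors_dir)):
        path = os.path.join(collectors_dir, name)
        if not name.isdigit() or not os.path.isdir(path):
            continue  # lib, etc and other non-numeric names are ignored
        found[int(name)] = [
            os.path.join(path, entry)
            for entry in sorted(os.listdir(path))
            if is_executable(os.path.join(path, entry))
        ]
    return found
```

An interval of 0 in the returned mapping stands for the long-lived collectors that are started once and kept running.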

Installation of tcollector

You need to clone tcollector from GitHub:

git clone git://github.com/OpenTSDB/tcollector.git

and edit the 'tcollector/startstop' script to set the following variables:

TSD_HOST=dns.name.of.tsd

TCOLLECTOR_PATH=path/to/tcollector

To avoid having to run mkmetric for every metric that tcollector tracks, you can start TSD with the --auto-metric flag. This is useful to get started quickly, but it's not recommended to keep this flag in the long term, to avoid accidental metric creation.

How to run:

Download tcollector

git clone git://github.com/OpenTSDB/tcollector.git

Configure tcollector

Edit tcollector/startstop

TSD_HOST=localhost

TCOLLECTOR_PATH=/usr/hadoop/tcollector

Run tcollector

Start HBase

Start the TSD

./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir=/tmp/tsdtmp --zkquorum=localhost

Register the metrics

./build/tsdb mkmetric df.bytes.total df.bytes.used df.bytes.free df.inodes.total df.inodes.used df.inodes.free
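The names passed to mkmetric have to match what the collector prints, so a typo in either place leaves a metric unregistered. One way to avoid typing them by hand is to derive the list from a collector's output. The sketch below assumes each output line starts with the metric name followed by a timestamp and a value; the helper names and the sample lines are invented for illustration.

```python
def metric_names(lines):
    """Return the distinct metric names from collector output, in first-seen order."""
    names = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # not a "metric timestamp value ..." line
        if parts[0] not in names:
            names.append(parts[0])
    return names


def mkmetric_command(names, tsdb="./build/tsdb"):
    """Build the argument list for a single mkmetric invocation."""
    return [tsdb, "mkmetric", *names]


# Made-up sample output, in the assumed line format.
sample = [
    "df.bytes.total 1400000000 1000 mount=/",
    "df.bytes.used 1400000000 400 mount=/",
    "df.bytes.total 1400000000 2000 mount=/home",
]
print(" ".join(mkmetric_command(metric_names(sample))))
```

The resulting list could be handed to `subprocess.run` instead of being printed, once the names have been checked by eye.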

Start tcollector

cd /usr/hadoop/tcollector

./startstop start

cp collectors/0/dfstat.py dfstat.py

./dfstat.py

Inspect the HBase table records

scan 'tsdb-uid'

scan 'tsdb'