OpenTSDB Setup and Installation


Getting Started

This page will walk you through the setup process to get OpenTSDB running. It assumes you've read and understood the overview. With no prior experience, it should take about 15 minutes to get OpenTSDB running, including the time needed to set up HBase on a single node.


Setting up OpenTSDB


OpenTSDB comes pre-packaged with all the necessary dependencies except the JDK and Gnuplot.
The runtime dependencies for OpenTSDB are:


  • JDK 1.6

  • asynchbase 1.3.0 (BSD)

  • Guava 12.0 (ASLv2)

  • logback 1.0 (LGPLv2.1 / EPL)

  • Netty 3.4 (ASLv2)

  • SLF4J 1.6 (MIT) with Log4J and JCL adapters

  • suasync 1.2 (BSD)

  • ZooKeeper 3.3 (ASLv2)

Additional compile-time dependencies:


  • GWT 2.4 (ASLv2)

Additional unit test dependencies:


  • Javassist 3.15 (MPL / LGPL)

  • JUnit 4.10 (CPL)

  • Mockito 1.9 (MIT)

  • PowerMock 1.4 (ASLv2)

You need to have Gnuplot (custom open-source license) installed in your PATH, version 4.2 minimum, 4.4 recommended.

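To verify which version is on your PATH (gnuplot prints its version and patchlevel with the --version flag):

gnuplot --version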

Before getting started, you need an instance of HBase 0.92 (ASLv2) up and running. If you don't already have one, you can get started quickly with a single-node HBase instance.


Almost all the following instructions can be copy-pasted directly into a terminal on a Linux or Mac OS X (or otherwise POSIXy) machine. You will need to edit the placeholders, which are typeset like-this. A Bourne shell (such as bash or zsh) is assumed. No special privileges are required.


Checkout, compile & start OpenTSDB


OpenTSDB uses the usual build process that consists in running ./bootstrap (only once, when you first check out the code), followed by ./configure and make. There is a handy shell script named build.sh that will take care of all of that for you, and build OpenTSDB in a new subdirectory named build:


git clone git://github.com/OpenTSDB/opentsdb.git

cd opentsdb

./build.sh
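
If you'd rather run the steps yourself instead of using build.sh, the sequence described above looks roughly like this (a sketch; the separate build subdirectory that build.sh creates is assumed here):

./bootstrap                  # only once, right after checking out the code
mkdir -p build && cd build
../configure
make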

From there on, you can use the command-line tool by invoking ./build/tsdb or you can run make install to install OpenTSDB on your system. Should you ever change your mind, there is also make uninstall, so there are no strings attached.

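For example (a minimal sketch, assuming you built with build.sh so the generated Makefile lives in the build subdirectory, and the default install prefix):

cd build
make install      # installs the tsdb command and its jars system-wide
# ...and later, if you change your mind:
make uninstall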

If it's the first time you run OpenTSDB with your HBase instance, you first need to create the necessary HBase tables:


env COMPRESSION=none HBASE_HOME=path/to/hbase-0.92.X ./src/create_table.sh


This will create two tables: tsdb and tsdb-uid. If you're just evaluating OpenTSDB, don't worry about compression for now. In production / at scale, make sure you use COMPRESSION=lzo and have LZO enabled.

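For a production cluster, the same script would be run with LZO compression instead, along these lines (assuming LZO support is already installed and enabled in your HBase):

env COMPRESSION=lzo HBASE_HOME=path/to/hbase-0.92.X ./src/create_table.sh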

Now start a TSD (Time Series Daemon):


tsdtmp=${TMPDIR-'/tmp'}/tsd    # For best performance, make sure
mkdir -p "$tsdtmp"             # your temporary directory uses tmpfs
./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir="$tsdtmp"

If you're using a real HBase cluster, you will also need to pass the --zkquorum flag to specify the comma-separated list of hosts serving your ZooKeeper quorum. The --cachedir can be purged periodically, e.g. by a cron job.

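For example, with a three-host ZooKeeper quorum (the zkhost* names are placeholders, just like host.name.of.tsd elsewhere on this page):

./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir="$tsdtmp" \
    --zkquorum=zkhost1,zkhost2,zkhost3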

At this point you can access the TSD's web interface through 127.0.0.1:4242 (if it's running on your local machine).

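You can also check from a terminal that the TSD is listening, using the same telnet-style channel as the stats command later on this page (the version command here is assumed to behave like stats):

echo version | nc -w 1 127.0.0.1 4242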

Using OpenTSDB

Create your first metrics


Metrics need to be registered before you can start storing data points for them.

./tsdb mkmetric mysql.bytes_received mysql.bytes_sent

This will create 2 metrics: mysql.bytes_received and mysql.bytes_sent.


New tags, on the other hand, are automatically registered whenever they're used for the first time. Right now OpenTSDB only allows you to have up to 2^24 = 16777216 different metrics, 16777216 different tag names and 16777216 different tag values. This is because each one of those is assigned a UID on 3 bytes. Metric names, tag names and tag values have their own UID spaces, which is why you can have 16777216 of each kind. The size of each space is configurable but there is no knob that exposes this configuration parameter right now. So bear in mind that using user ID or event ID as a tag value will not work right now if you have a large site.


Start collecting data


So now that we have our 2 metrics, we can start sending data to the TSD. Let's write a little shell script to collect some data off of MySQL and send it to the TSD (note: this is just an example; in practice you can use tcollector's MySQL collector):


cat >mysql-collector.sh <<\EOF
#!/bin/bash
set -e
while true; do
  mysql -u USER -pPASS --batch -N --execute "SHOW STATUS LIKE 'bytes%'" \
  | awk -F"\t" -v now=`date +%s` -v host=`hostname` \
    '{ print "put mysql." tolower($1) " " now " " $2 " host=" host }'
  sleep 15
done | nc -w 30 host.name.of.tsd PORT
EOF
chmod +x mysql-collector.sh
nohup ./mysql-collector.sh &

Every 15 seconds, the script will collect 2 data points from MySQL and send them to the TSD. You can use a smaller sleep interval for more real-time monitoring, but remember you can't have sub-second precision, so you must sleep at least 1 second before producing another data point.

15秒钟,脚本会从MySQL收集2个数据点,然后发送至TSD,你可以用一个更小的休眠时间间隔以获得更多的实时监控,但记住你不能使用低于秒级的精度,所以在产生另一个数据点之前必须间隔至少1秒钟。

What does the script do? If you're not a big fan of shell and awk scripting, it may not be obvious how this works. But it's simple. The set -e command simply instructs bash to exit with an error if any of the commands fail. This simplifies error handling. The script then enters an infinite loop. In this loop, we query MySQL to retrieve 2 of its status variables:


$ mysql -u USER -pPASS --execute "SHOW STATUS LIKE 'bytes%'"
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Bytes_received | 133   |
| Bytes_sent     | 190   |
+----------------+-------+

The --batch -N flags ask the mysql command to remove the human friendly fluff so we don't have to filter it out ourselves. Then the output is piped to awk, which is told to split fields on tabs (-F"\t") because with the --batch flag that's what mysql will use. We also create a couple of variables, one named now and initialize it to the current timestamp, the other named host and set to the hostname of the local machine. Then, for every line, we print put mysql., followed by the lower-case form of the first word, then by a space, then by the current timestamp, then by the second word (the value), another space, and finally host= and the current hostname. Rinse and repeat every 15 seconds. The -w 30 parameter given to nc simply sets a timeout on the connection to the TSD.

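Given the sample values in the table above and a machine named foo, the script would therefore send lines like these to the TSD (illustrative output, reusing this page's example timestamp):

put mysql.bytes_received 1288946927 133 host=foo
put mysql.bytes_sent 1288946927 190 host=foo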

Bear in mind, this is just an example; in practice you can use tcollector's MySQL collector.

If you don't have a MySQL server to monitor, you can try this instead to collect basic load metrics from your Linux servers.


cat >loadavg-collector.sh <<\EOF
#!/bin/bash
set -e
while true; do
  awk -v now=`date +%s` -v host=`hostname` \
    '{ print "put proc.loadavg.1m " now " " $1 " host=" host;
       print "put proc.loadavg.5m " now " " $2 " host=" host }' /proc/loadavg
  sleep 15
done | nc -w 30 host.name.of.tsd PORT
EOF
chmod +x loadavg-collector.sh
nohup ./loadavg-collector.sh &

This will store a reading of the 1-minute and 5-minute load average of your server in OpenTSDB by sending simple "telnet-style commands" to the TSD:



put proc.loadavg.1m 1288946927 0.36 host=foo
put proc.loadavg.5m 1288946927 0.62 host=foo
put proc.loadavg.1m 1288946942 0.43 host=foo
put proc.loadavg.5m 1288946942 0.62 host=foo
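
The same put command can be sent by hand, which is a quick way to test your setup (same syntax and nc invocation as in the collector scripts above):

echo "put proc.loadavg.1m `date +%s` 0.36 host=foo" | nc -w 30 host.name.of.tsd PORT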

Batch imports


Let's imagine that you have a cron job that crunches gigabytes of application logs every day or every hour to extract profiling data. For instance, you could be logging the time taken to process a request and your cron job would compute an average for every 30 second window. Maybe you're particularly interested in 2 types of requests handled by your application, so you'll compute separate averages for those requests, and another average for every other request type. So your cron job may produce an output file that looks like this:


1288900000 42 foo
1288900000 51 bar
1288900000 69 other
1288900030 40 foo
1288900030 59 bar
1288900030 80 other

The first column is a timestamp, the second the average latency for that 30 second window, and the third the type of request we're talking about. If you run your cron job on a day's worth of logs, you'll end up with 8640 such lines (2880 thirty-second windows times 3 request types). In order to import those into OpenTSDB, you need to adjust your cron job slightly to produce its output in the following format:


myservice.latency.avg 1288900000 42 reqtype=foo

myservice.latency.avg 1288900000 51 reqtype=bar

myservice.latency.avg 1288900000 69 reqtype=other

myservice.latency.avg 1288900030 40 reqtype=foo

myservice.latency.avg 1288900030 59 reqtype=bar

myservice.latency.avg 1288900030 80 reqtype=other
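
If you'd rather not touch the cron job itself, the raw three-column output can be rewritten into this format after the fact; a sketch, where cron-output.txt and your-file are hypothetical file names:

awk '{ print "myservice.latency.avg " $1 " " $2 " reqtype=" $3 }' cron-output.txt > your-file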

Notice we're simply associating each data point with the name of a metric (myservice.latency.avg) and naming the tag that represents the request type. If each server has its own logs and you process them separately, you may want to add another tag to each line like the host=foo tag we saw in the previous section. This way you'll be able to plot the latency of each server individually, in addition to your average latency across the board and/or per request type.


In order to import a data file in the format above (metric timestamp value tags), simply run the following command:


./tsdb import your-file

If your data file is large, consider gzip'ing it first. This can be as simple as piping the output of your cron job to gzip -9 > output.gz instead of writing directly to a file. The import command is able to read gzip'ed files and it greatly helps performance for large batch imports.

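Putting those two pieces together (my-cron-job is a stand-in for whatever produces the lines above):

my-cron-job | gzip -9 > output.gz
./tsdb import output.gz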

Self monitoring


Each TSD exports some stats about itself through the simple stats command. You can collect those stats and feed them back to the TSD every few seconds. First, create the necessary metrics:


echo stats | nc -w 1 localhost 4242 \
  | awk '{ print $1 }' | sort -u \
  | xargs ./tsdb mkmetric

This requests the stats from the TSD (assuming it's running on the local host and listening to port 4242), extracts the names of the metrics from the stats, and assigns them UIDs.


Then you can use this simple script to collect stats and store them in OpenTSDB:


#!/bin/bash
INTERVAL=15
while :; do
  echo stats || exit
  sleep $INTERVAL
done | nc -w 30 localhost $1 \
  | sed 's/^/put /' \
  | nc -w 30 localhost $1
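
Note that the script reads the TSD's port from its first argument ($1). Saved under a hypothetical name such as tsd-stats-collector.sh, it would be started like the other collectors on this page:

chmod +x tsd-stats-collector.sh
nohup ./tsd-stats-collector.sh 4242 &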

This way you will collect and store stats from the TSD every 15 seconds.
