Chukwa Installation and Configuration

2010-09-03

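Prerequisite: Hadoop is already installed.

Environment:

1. cloudera-training-0.3.4 VMware image: everything is already set up for you.

2. Download MySQL 5.1?

Getting started:

A Chukwa deployment is distributed, so the configuration falls into two parts: one part on the client side and one part on the server side.

Here we use three virtual machines in total: two as agents and one as the collector.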

1. Installing Chukwa

1.1 First Steps

Obtain a copy of Chukwa. You can find the latest release on the Chukwa release page. I used version 0.4.0. Note that the links on that release page are stale; on the Renren (人人网) mirror, Chukwa sits under the /hadoop directory, so download it from there yourself.
Un-tar the release, via tar xzf.
Make sure a copy of Chukwa is available on each node being monitored, and on each node that will run a collector.
We refer to the directory containing Chukwa as CHUKWA_HOME. It may be helpful to set CHUKWA_HOME explicitly in your environment, but Chukwa does not require that you do so.

[How to set it]:

$ export CHUKWA_HOME="/home/training/chukwa/chukwa-0.4.0"

Or edit /etc/profile (vi /etc/profile) and add:

export CHUKWA_HOME=/home/training/chukwa/chukwa-0.4.0

Check that it was set successfully:

$ echo $CHUKWA_HOME

1.2 General Configuration

Agents and collectors are configured differently, but part of the process is common to both.

Make sure that JAVA_HOME is set correctly and points to a Java 1.6 JRE. It's generally best to set this in conf/chukwa-env.sh.

You can check the Java setting with echo $JAVA_HOME; it must be Java 6.
Then run sudo vi $CHUKWA_HOME/conf/chukwa-env.sh, find the line "export JAVA_HOME=...", and change it to the actual path you just looked up.

In conf/chukwa-env.sh, set CHUKWA_LOG_DIR and CHUKWA_PID_DIR to the directories where Chukwa should store its console logs and pid files. The pid directory must not be shared between different Chukwa instances: it should be local, not NFS-mounted.

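Still in conf/chukwa-env.sh, set the variables CHUKWA_LOG_DIR and CHUKWA_PID_DIR. They hold the console logs and the pid files respectively. A pid file records the process ID of a running Chukwa daemon. Note that the pid files must not live on a network file system; keep them local to each Chukwa instance.

Their defaults are CHUKWA_LOG_DIR = /tmp/chukwa/log and CHUKWA_PID_DIR = /tmp/chukwa/pidDir; I simply created those directories.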
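A minimal sketch of creating the two directories before starting Chukwa. It assumes the default paths quoted above and lets values already present in the environment take precedence.

```shell
#!/bin/sh
# Fall back to the defaults quoted above unless the variables are already set.
CHUKWA_LOG_DIR="${CHUKWA_LOG_DIR:-/tmp/chukwa/log}"
CHUKWA_PID_DIR="${CHUKWA_PID_DIR:-/tmp/chukwa/pidDir}"

# -p creates missing parent directories and does nothing if they already exist.
mkdir -p "$CHUKWA_LOG_DIR" "$CHUKWA_PID_DIR"

echo "log dir: $CHUKWA_LOG_DIR"
echo "pid dir: $CHUKWA_PID_DIR"
```

Keep in mind that /tmp is usually cleared on reboot, so the directories may need to be recreated after a restart.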

Optionally, set CHUKWA_IDENT_STRING. This string is used to name Chukwa's own console log files.

2. Configuring Agents on source nodes

Agents are the Chukwa processes that actually produce data. This section describes how to configure and run them. More details are available in the Agent configuration guide.

2.1 Configuration

This section describes how to set up the agent process on the source nodes.

The one mandatory configuration step is to set up $CHUKWA_HOME/conf/collectors. This file should contain a list of hosts that will run Chukwa collectors. Agents will pick a random collector from this list to try sending to, and will fail-over to another listed collector on error. The file should look something like:

http://<collector1HostName>:<collector1Port>/
http://<collector2HostName>:<collector2Port>/
http://<collector3HostName>:<collector3Port>/

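PS: the Chukwa collector listens on port 8080 by default.

The agent listens on port 9093 by default.

If there is no agents file in the conf directory, calling stop-agents.sh or stop-all.sh prints "No such file or directory", so we need to create this file ourselves and write each agent's IP and port into it (in the same format as the collectors file).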
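As a rough illustration, the collectors file can be written with a here-document. The host names below are hypothetical placeholders, and the sketch writes into a scratch directory by default so that it cannot overwrite a real configuration.

```shell
#!/bin/sh
# Scratch directory by default; set CONF_DIR="$CHUKWA_HOME/conf" to write
# the real file instead.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Hypothetical collector hosts, one URL per line.
# 8080 is the default collector port mentioned above.
cat > "$CONF_DIR/collectors" <<'EOF'
http://collector1.example.com:8080/
http://collector2.example.com:8080/
EOF

# Show what the agents will read.
cat "$CONF_DIR/collectors"
```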

Edit the CHUKWA_HOME/conf/initial_adaptors configuration file. This is where you tell Chukwa what log files to monitor. See the adaptor configuration guide for a list of available adaptors.

There are a number of optional settings in $CHUKWA_HOME/conf/chukwa-agent-conf.xml:

The most important of these is the cluster/group name that identifies the monitored source nodes. This value is stored in each Chunk of collected data; you can therefore use it to distinguish data coming from different groups of machines.

<property>
  <name>chukwaAgent.tags</name>
  <value>cluster="demo"</value>
  <description>The cluster's name for this agent</description>
</property>

Another important option is chukwaAgent.checkpoint.dir. This is the directory Chukwa will use for its periodic checkpoints of running adaptors. It must not be a shared directory; use a local, not NFS-mount, directory.
Setting the option chukwaAgent.control.remote will disallow remote connections to the agent control socket.

2.2 Starting, stopping, and monitoring

To run an agent process on a single node, use bin/chukwa agent.

In version 0.4.0, start an agent locally with ./chukwa agent (run from the bin directory).

The other options are as follows:

Usage: chukwa [--config confdir] COMMAND
where COMMAND is one of:
  agent         run a Chukwa Agent
  archive       run the Archive Manager
  collector     run a Chukwa Collector
  demux         run the Demux Manager
  dp            run the Post Demux data processors
  hicc          run a HICC Webserver
  droll         run a daily rolling job (deprecated)
  hroll         run a hourly rolling job (deprecated)
  # Daily rolling and hourly rolling will be deprecated by retention processor
  retention     run the Retention Processor
  version       print the version

  Utilities:
  backfill      run a back fill data loader utility
  dumpArchive   view an archive file
  dumpRecord    view a record file
  tail          start tailing a file
Most command print help when invoked w/o parameters.

Typically, agents run as daemons. The script bin/start-agents.sh will ssh to each machine listed in conf/agents and start an agent, running in the background. The script bin/stop-agents.sh does the reverse.

You can, of course, use any other daemon-management system you like. For instance, tools/init.d includes init scripts for running Chukwa agents.

To check if an agent is working properly, you can telnet to the control port (9093 by default) and hit "enter". You will get a status message if the agent is running normally.

The message I received was:
training-vm: Chukwa Agent running, version 0.4.0-dev, with 0 adaptors
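The telnet check can also be scripted. The sketch below is one possible approach, not part of Chukwa: agent_status is a hypothetical helper name, and it relies on bash's /dev/tcp redirection, so it needs bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
# agent_status HOST [PORT]: send an empty line to the agent control port and
# print the first line of the reply. Returns non-zero if the connection
# cannot be opened or no reply arrives within 5 seconds.
# The body runs in a subshell, so file descriptor 3 never leaks to the caller.
agent_status() (
  host="$1"
  port="${2:-9093}"   # 9093 is the default control port

  # Open a TCP connection on file descriptor 3 (bash-specific /dev/tcp).
  { exec 3<>"/dev/tcp/$host/$port"; } 2>/dev/null || exit 1

  printf '\n' >&3                          # same as hitting "enter" in telnet
  IFS= read -r -t 5 reply <&3 || exit 1    # wait for the status line
  printf '%s\n' "$reply"
)
```

For example, agent_status localhost would print a line like the status message above when an agent is listening, and fail otherwise.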

The scripts in the bin directory are explained below (from the official documentation; it appears to describe an older version, but is still useful as a reference):

start-all.sh - runs start-collectors.sh, start-agents.sh, start-probes.sh, start-data-processors.sh
start-collectors.sh - start the chukwa collector daemon (jettyCollector.sh) on hosts listed in conf/collectors
stop-collectors.sh - stop the chukwa collector daemon (jettyCollector.sh) on hosts listed in conf/collectors
jettyCollector.sh - start the chukwa collector daemon on the current host
start-agents.sh - start chukwa agent daemon (agent.sh) on all hosts listed in conf/agents
stop-agents.sh - stop chukwa agent daemon (agent.sh) on all hosts listed in conf/agents
agent.sh - start the chukwa agent on the current host
start-probes.sh - runs, in this order, systemDataLoader.sh, torqueDataLoader.sh, nodeActivityDataLoader.sh
slaves.sh <command command_args ...> - run arbitrary commands on all hosts in conf/slaves
jettycollector.sh - start a jetty based version of the Chukwa collector
agent.sh - start the chukwa agent on the local machine

2.3 Configuring Hadoop for monitoring

One of the key goals for Chukwa is to collect logs from Hadoop clusters. This section describes how to configure Hadoop to send its logs to Chukwa. Note that these directions require Hadoop 0.20.0+. Earlier versions of Hadoop do not have the hooks that Chukwa requires in order to grab MapReduce job logs.

The Hadoop configuration files are located in HADOOP_HOME/conf. To setup Chukwa to collect logs from Hadoop, you need to change some of the Hadoop configuration files.

Copy $CHUKWA_HOME/conf/hadoop-log4j.properties file to $HADOOP_HOME/conf/log4j.properties
Copy $CHUKWA_HOME/conf/hadoop-metrics.properties file to $HADOOP_HOME/conf/hadoop-metrics.properties
Edit $HADOOP_HOME/conf/hadoop-metrics.properties file and change @CHUKWA_LOG_DIR@ to your actual CHUKWA log directory (e.g., CHUKWA_HOME/var/log)
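The three steps above can be sketched as a small shell function. The function name configure_hadoop_logging and its argument order are illustrative choices, not part of Chukwa or Hadoop.

```shell
#!/bin/sh
# configure_hadoop_logging CHUKWA_HOME HADOOP_HOME LOG_DIR
# Copies the two Chukwa-provided files into Hadoop's conf directory and
# replaces the @CHUKWA_LOG_DIR@ placeholder with LOG_DIR.
configure_hadoop_logging() {
  chukwa_home="$1"
  hadoop_home="$2"
  log_dir="$3"

  cp "$chukwa_home/conf/hadoop-log4j.properties" \
     "$hadoop_home/conf/log4j.properties" || return 1
  cp "$chukwa_home/conf/hadoop-metrics.properties" \
     "$hadoop_home/conf/hadoop-metrics.properties" || return 1

  # '|' is the sed delimiter because the path contains slashes.
  # A temporary file is used instead of the non-portable "sed -i".
  metrics="$hadoop_home/conf/hadoop-metrics.properties"
  sed "s|@CHUKWA_LOG_DIR@|$log_dir|g" "$metrics" > "$metrics.tmp" &&
    mv "$metrics.tmp" "$metrics"
}
```

A typical call would be configure_hadoop_logging "$CHUKWA_HOME" "$HADOOP_HOME" "$CHUKWA_HOME/var/log". Since the function overwrites Hadoop's existing log4j.properties and hadoop-metrics.properties, it is worth backing those files up first.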

3. Configuring Collectors

This section describes how to set up the Chukwa collectors. For more details, see the collector configuration guide.

3.1 Configuration

First, edit $CHUKWA_HOME/conf/chukwa-env.sh. In addition to the general directions given above, you should set HADOOP_HOME. This should be the Hadoop deployment Chukwa will use to store collected data. You will get a version mismatch error if this is configured incorrectly.

Next, edit $CHUKWA_HOME/conf/chukwa-collector-conf.xml. The one mandatory configuration parameter is writer.hdfs.filesystem. This should be set to the HDFS root URL on which Chukwa will store data. Various optional configuration options are described in the collector configuration guide and in the collector configuration file itself.

3.2 Starting, stopping, and monitoring

To run a collector process on a single node, use bin/chukwa collector.

Typically, collectors run as daemons. The script bin/start-collectors.sh will ssh to each collector listed in conf/collectors and start a collector, running in the background. The script bin/stop-collectors.sh does the reverse.

You can, of course, use any other daemon-management system you like. For instance, tools/init.d includes init scripts for running Chukwa collectors.

To check if a collector is working properly, you can simply access http://collectorhost:collectorport/chukwa?ping=true with a web browser. If the collector is running, you should see a status page with a handful of statistics.

For example: http://10.224.172.100:8080/chukwa?ping=true

It returns:

Date:1283759102858
Now:1283759107788
numberHTTPConnection in time window:0
numberchunks in time window:0
lifetimechunks:0
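For scripted monitoring, a single field can be pulled out of the ping reply. This is a sketch under the assumption that the reply arrives one field per line in name:value form; ping_field is a hypothetical helper name, and the sample text is the reply quoted above.

```shell
#!/bin/sh
# ping_field NAME: read a collector ping reply on stdin and print the value
# of the field whose name (the text before the first ':') equals NAME.
ping_field() {
  awk -F: -v name="$1" '$1 == name { print $2; exit }'
}

# Sample reply taken from the output above, assuming one field per line.
sample='Date:1283759102858
Now:1283759107788
numberHTTPConnection in time window:0
numberchunks in time window:0
lifetimechunks:0'

printf '%s\n' "$sample" | ping_field lifetimechunks   # prints 0
```

Against a live collector, the same function could be fed from an HTTP client, for example curl -s "http://collectorhost:8080/chukwa?ping=true" | ping_field lifetimechunks, assuming curl is installed.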

4. Demux and HICC

Start the Chukwa Processes

The Chukwa startup scripts are located in the CHUKWA_HOME/tools/init.d directory.

Start the Chukwa data processors script (execute this command only on the data processor node):

CHUKWA_HOME/tools/init.d/chukwa-data-processors start

Version 0.4.0 has moved this script into bin/, where it is called start-data-processors.sh / stop-data-processors.sh; in any case, I could not find the tools/init.d directory.

Create down sampling daily cron job:

CHUKWA_HOME/bin/downSampling.sh --config <path to chukwa conf> -n add

Set Up the Database

Set up and configure the MySQL database.

Install MySQL

Download MySQL 5.1 from the MySQL site.

tar fxvz mysql-*.tar.gz -C $CHUKWA_HOME/opt
cd $CHUKWA_HOME/opt/mysql-*

Configure and then copy the my.cnf file to the CHUKWA_HOME/opt/mysql-* directory:

./scripts/mysql_install_db
./bin/mysqld_safe&
./bin/mysqladmin -u root create <clustername>
./bin/mysql -u root <clustername> < $CHUKWA_HOME/conf/database_create_table

Edit the CHUKWA_HOME/conf/jdbc.conf configuration file.

Set the clustername to the MYSQL root URL:

<clustername>=jdbc:mysql://localhost:3306/<clustername>?user=root
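To avoid typos when substituting the cluster name twice, the jdbc.conf line can be generated. This is only a sketch: jdbc_line is a hypothetical helper, and "demo" is the cluster name from the chukwaAgent.tags example earlier.

```shell
#!/bin/sh
# jdbc_line CLUSTER [HOST] [PORT]: print a jdbc.conf entry in the
# <clustername>=jdbc:mysql://host:port/<clustername>?user=root format.
jdbc_line() {
  cluster="$1"
  host="${2:-localhost}"
  port="${3:-3306}"
  printf '%s=jdbc:mysql://%s:%s/%s?user=root\n' \
    "$cluster" "$host" "$port" "$cluster"
}

jdbc_line demo   # prints: demo=jdbc:mysql://localhost:3306/demo?user=root
```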

Download the MySQL Connector/J 5.1 from the MySQL site, and place the jar file in $CHUKWA_HOME/lib.

Set Up MySQL for Replication

Start the MySQL shell:

mysql -u root -p
Enter password: [the password should be empty here]

From the MySQL shell, enter these commands (replace <username> and <password> with actual values):

GRANT REPLICATION SLAVE ON *.* TO '<username>'@'%' IDENTIFIED BY '<password>';
FLUSH PRIVILEGES;

Set Up HICC

The Hadoop Infrastructure Care Center (HICC) is the Chukwa web user interface. To set up HICC, do the following:

Download apache-tomcat 6.0.18+ from Apache Tomcat and decompress the tarball to CHUKWA_HOME/opt.
Copy CHUKWA_HOME/hicc.war to apache-tomcat-6.0.18/webapps.
Start up HICC by running:

$CHUKWA_HOME/bin/hicc.sh start

Point your favorite browser to: http://<server>:8080/hicc

[Problems encountered]

NoClassDefFoundError: org/apache/hadoop/metrics/Updater

training@training-vm:~/chukwa/chukwa-0.4.0/bin$ ./chukwa agent
training@training-vm:~/chukwa/chukwa-0.4.0/bin$ Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/metrics/Updater
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent.<clinit>(ChukwaAgent.java:63)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.metrics.Updater
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        ... 13 more
Could not find the main class: org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent. Program will exit.

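If you hit this problem, the environment variables or parameters are most likely misconfigured; reconfiguring them resolves it. The missing class belongs to Hadoop's metrics package, so a likely culprit is that the Hadoop jar is not on Chukwa's classpath (check the HADOOP_HOME setting, for instance).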

Troubleshooting Tips

UNIX Processes For Chukwa Agents

The Chukwa agent process name is identified by:

org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent

Command line to use to search for the process name:

ps ax | grep org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent

UNIX Processes For Chukwa Collectors

Chukwa Collector name is identified by:

org.apache.hadoop.chukwa.datacollection.collector.CollectorStub

UNIX Processes For Chukwa Data Processes

Chukwa Data Processors are identified by:

org.apache.hadoop.chukwa.extraction.demux.Demux
org.apache.hadoop.chukwa.extraction.database.DatabaseLoader
org.apache.hadoop.chukwa.extraction.archive.ChukwaArchiveBuilder

These processes run on a schedule, so they are not always visible in the process list.
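The process checks in this section can be sketched as a small filter over ps output. A plain ps ax | grep also matches the grep command itself, so the hypothetical helper chukwa_pids below skips grep lines. The sample lines and PIDs are made up for illustration, and the filter assumes the five-column layout that ps ax prints on Linux.

```shell
#!/bin/sh
# chukwa_pids CLASS: read "ps ax"-style lines on stdin and print the PID
# (first column) of every process whose command line mentions CLASS.
# Lines whose command (fifth column) is grep itself are skipped.
# CLASS is passed through the environment so that it does not appear on
# awk's own command line, where ps would otherwise see it.
chukwa_pids() {
  CLS="$1" awk 'index($0, ENVIRON["CLS"]) && $5 != "grep" { print $1 }'
}

# Made-up sample of what "ps ax" might print.
sample=' 1234 ?      Sl   0:05 java -cp x org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent
 1300 pts/0  S+   0:00 grep org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent'

printf '%s\n' "$sample" |
  chukwa_pids org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent   # prints 1234
```

On a real node the input would come from ps directly: ps ax | chukwa_pids org.apache.hadoop.chukwa.datacollection.collector.CollectorStub.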

Checks for MySQL Replication

On the slave server, at the MySQL prompt, run:

show slave status\G

Make sure that Slave_IO_Running and Slave_SQL_Running are both "Yes".
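This check can be automated by parsing the vertical status output. The sketch below assumes the usual "Name: value" layout of that output; replication_ok is a hypothetical helper name.

```shell
#!/bin/sh
# replication_ok: read the vertical slave-status output on stdin and succeed
# only if both Slave_IO_Running and Slave_SQL_Running report "Yes".
replication_ok() {
  awk -F': *' '
    { sub(/^ +/, "", $1) }                 # drop the leading indentation
    $1 == "Slave_IO_Running"  { io  = $2 }
    $1 == "Slave_SQL_Running" { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}

# Made-up fragment of status output for illustration.
printf '%s\n' \
  '             Slave_IO_Running: Yes' \
  '            Slave_SQL_Running: Yes' |
  replication_ok && echo "replication healthy"
```

With a running slave, the input could come from the mysql client, for example mysql -u root -e 'show slave status\G' | replication_ok.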

Things to check if MySQL replication fails:

Make sure grant permission has been enabled on master MySQL server.
Check disk space availability.
Check Error status in slave status.

To reset MySQL replication, run these commands on MySQL:

STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST='hostname',
  MASTER_USER='username',
  MASTER_PASSWORD='password',
  MASTER_PORT=3306,
  MASTER_LOG_FILE='master2-bin.001',
  MASTER_LOG_POS=4,
  MASTER_CONNECT_RETRY=10;
START SLAVE;

Checks For Disk Full

If anything is wrong, run /etc/init.d/chukwa-agent stop and CHUKWA_HOME/tools/init.d/chukwa-system-metrics stop to shut down Chukwa. Then look at the agent.log and collector.log files to determine the problem.

The most common problem is log files growing without bound. Set up a cron job to remove old log files:

 0 12 * * * CHUKWA_HOME/tools/expiration.sh 10 CHUKWA_HOME/var/log nowait

This will set up the log file expiration for CHUKWA_HOME/var/log for log files older than 10 days.
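If expiration.sh is not available in your release, a similar effect can be approximated with find. This is a substitute technique, not the Chukwa script; expire_logs is a hypothetical helper name.

```shell
#!/bin/sh
# expire_logs DIR DAYS: delete regular files under DIR whose last
# modification is more than DAYS whole days in the past.
expire_logs() {
  find "$1" -type f -mtime +"$2" -exec rm -f {} +
}

# Self-contained demonstration in a scratch directory.
demo_dir=$(mktemp -d)
touch -t 200001010000 "$demo_dir/old.log"   # dated 1 January 2000
touch "$demo_dir/new.log"                   # modified just now
expire_logs "$demo_dir" 10
ls "$demo_dir"                              # only new.log remains
```

In a crontab entry, the function body would be used directly, with the real log directory in place of the scratch directory.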

Emergency Shutdown Procedure

If the system is not functioning properly and you cannot find an answer in the Administration Guide, execute the kill command. The current state of the java process will be written to the log files. You can analyze these files to determine the cause of the problem.

kill -3 <pid>