Analyzing the Hadoop startup scripts: helpful for troubleshooting, and a chance to learn some shell

Source: Internet | Posted by: 市场投放数据分析 | Editor: 程序博客网 | Time: 2024/05/21 21:39

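When you first configure hadoop, mistakes are hard to avoid, and reading the startup scripts helps a lot when analyzing the errors. It is also a chance to pick up some shell along the way.

I'm fairly comfortable with shell commands and can read most shell scripts, but I have never dug into the details. So experts, please don't laugh at the shell points I mention below.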

Hadoop 0.20.203

hadoop                        The main hub for hadoop commands
hadoop-config.sh
hadoop-daemon.sh              Runs a Hadoop command as a daemon.
hadoop-daemons.sh             Run a Hadoop command on all slave hosts.
jsvc                          Application to launch java daemon
rcc                           The Hadoop record compiler (I don't understand this one)
slaves.sh                     Run a shell command on all slave hosts.
start-all.sh                  start-all = start-dfs + start-mapred. Start all hadoop daemons. Run this on master node.
start-balancer.sh
start-dfs.sh
start-jobhistoryserver.sh
start-mapred.sh
stop-all.sh
stop-balancer.sh
stop-dfs.sh
stop-jobhistoryserver.sh
stop-mapred.sh

As for a bash reference, my first choice is this one: http://www.gnu.org/software/bash/manual/bashref.html

Let's start with the most commonly used command.

start-all.sh

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

. "$bin"/hadoop-config.sh

# start dfs daemons
"$bin"/start-dfs.sh --config $HADOOP_CONF_DIR

# start mapred daemons
"$bin"/start-mapred.sh --config $HADOOP_CONF_DIR

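The first part works out the bin directory so the other scripts can be called. You cannot know which directory the script is launched from, so paths relative to the current directory cannot be relied on.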
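Here is a minimal sketch of that idiom, to show that the result does not depend on where the script is launched from. The scratch layout and the script name where.sh are made up for this illustration and are not part of hadoop.

```shell
# Hypothetical layout: a tiny script that reports the directory it lives in.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/bin"
cat > "$demo_root/bin/where.sh" <<'EOF'
#!/bin/sh
bin=`dirname "$0"`      # may be relative, e.g. "./bin" or "."
bin=`cd "$bin"; pwd`    # cd there and ask for the absolute path
echo "$bin"
EOF
chmod +x "$demo_root/bin/where.sh"

# Launch it from two different working directories.
from_root=$(cd "$demo_root" && ./bin/where.sh)
from_bin=$(cd "$demo_root/bin" && ./where.sh)

echo "launched from the root: $from_root"
echo "launched from bin:      $from_bin"
```

Both lines should show the same absolute path ending in /bin, even though $0 was ./bin/where.sh in one case and ./where.sh in the other.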

Note the dot in front of hadoop-config.sh.

From the reference:

. (a period)
    . filename [arguments]
    Read and execute commands from the filename argument in the current shell context.
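A small sketch of why the dot matters: a variable set by a sourced file stays visible in the calling script, while the same file run as a separate process leaves nothing behind. The file name config.sh and the variable DEMO_CONF_DIR are invented for this example.

```shell
tmpdir=$(mktemp -d)
# Hypothetical config file that only sets one variable.
echo 'DEMO_CONF_DIR=/etc/demo' > "$tmpdir/config.sh"

unset DEMO_CONF_DIR
sh "$tmpdir/config.sh"                  # runs in a child process
after_exec=${DEMO_CONF_DIR-unset}

. "$tmpdir/config.sh"                   # runs in the current shell
after_source=${DEMO_CONF_DIR-unset}

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
```

This is presumably why start-all.sh sources hadoop-config.sh instead of running it: the variables it sets up, such as HADOOP_CONF_DIR, are needed on the lines that follow.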

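This tells us something useful: all = dfs + mapred.

Until now I had always lumped namenode and jobtracker together, and datanode and tasktracker together. In fact the dfs and mapred parts are relatively independent. The latest hadoop splits the two apart.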

start-dfs.sh / start-mapred.sh

Note that one daemon script name ends in s and the other does not. The one with the s starts things on the slaves. I did not read carefully at first, and it cost me a good while.

"$bin"/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode $nameStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode

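This clears up something I used to wonder about: how the masters and slaves files should be configured on the master and on the slaves.

On the master: the masters file is only used to start the secondarynamenode, and the slaves file is only used to start the daemons on the slaves.

On the slaves: the scripts under bin do not use these two files, though other parts of hadoop might.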

hadoop-daemon.sh

Log rotation. Can you see which way the logs move? Newest 1->2->3->4->5 oldest

hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
        num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
        while [ $num -gt 1 ]; do
            prev=`expr $num - 1`
            [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
            num=$prev
        done
        mv "$log" "$log.$num";
    fi
}
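To watch the rotation happen, the function can be run against a scratch log a few times. This is only a sketch: the function body is copied from above so the snippet is self-contained, while the file name demo.log and the limit of 3 copies are my own choices.

```shell
# Copied from hadoop-daemon.sh so the demo is self-contained.
hadoop_rotate_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
        num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
        while [ $num -gt 1 ]; do
            prev=`expr $num - 1`
            [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
            num=$prev
        done
        mv "$log" "$log.$num";
    fi
}

logdir=$(mktemp -d)
demo_log="$logdir/demo.log"

# Simulate three daemon starts; each start rotates first, then writes a new log.
for run in first second third; do
    hadoop_rotate_log "$demo_log" 3
    echo "$run" > "$demo_log"
done

current=$(cat "$demo_log")
older=$(cat "$demo_log.1")
oldest=$(cat "$demo_log.2")
echo "demo.log=$current demo.log.1=$older demo.log.2=$oldest"
```

The newest output stays in demo.log, and each older run moves one suffix higher, which matches the 1->2->3 direction described above.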

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then

$EUID is a variable that bash sets itself (the effective user ID of the current user). The hadoop script also uses the IFS variable.

nohup nice -n $HADOOP_NICENESS "$HADOOP_HOME"/bin/hadoop --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
echo $! > $pid

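nohup is a handy thing. When you are logged into a host over ssh and the connection drops, the command you were running dies as well (it receives SIGHUP); nohup makes the command ignore that signal. Google it if you are curious.

nice adjusts the scheduling priority. I have not used it myself.

In the end, the program is still run through the hadoop script.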
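The same launch pattern can be tried with a harmless command. In this sketch sleep stands in for the real daemon, and the file names demo.out and demo.pid are made up. The kill -0 check is just one common way to test whether the recorded PID is still alive.

```shell
workdir=$(mktemp -d)
demo_log="$workdir/demo.out"
demo_pid="$workdir/demo.pid"
demo_niceness=0

# Detach from the terminal: ignore SIGHUP, send output to a log,
# read stdin from /dev/null, and run in the background.
nohup nice -n $demo_niceness sleep 30 > "$demo_log" 2>&1 < /dev/null &
echo $! > "$demo_pid"       # $! is the PID of the last background job

# Later, the pid file is all that is needed to find the process again.
if kill -0 "$(cat "$demo_pid")" 2>/dev/null; then
    status=running
else
    status=stopped
fi
echo "daemon is $status"

kill "$(cat "$demo_pid")"   # clean up the stand-in daemon
```

Writing $! into a pid file right after the launch is what lets a later stop action locate the process without searching the process list.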

hadoop-daemons.sh

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

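cd "$HADOOP_HOME" \; does not seem to do much... so far I have not found any configuration that depends on the working directory.

What actually gets called is slaves.sh, and "$bin/hadoop-daemon.sh" is passed along to it.

This clears up something I used to wonder about: how the master knows where hadoop is installed on the slaves.

The "$bin/hadoop-daemon.sh" path is, annoyingly, the master's own path. In other words, the hadoop directory has to be the same on every machine. hadoop-daemon.sh can rsync from $HADOOP_MASTER.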

slaves.sh

if [ "$HOSTLIST" = "" ]; then
  if [ "$HADOOP_SLAVES" = "" ]; then
    export HOSTLIST="${HADOOP_CONF_DIR}/slaves"
  else
    export HOSTLIST="${HADOOP_SLAVES}"
  fi
fi

for slave in `cat "$HOSTLIST"|sed  "s/#.*$//;/^$/d"`; do
 ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
   2>&1 | sed "s/^/$slave: /" &
 if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
   sleep $HADOOP_SLAVE_SLEEP
 fi
done

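Normally we list the slave machines in the slaves file.

sed  "s/#.*$//;/^$/d"

Going by what I know of regular expressions, the first expression strips everything from a # to the end of the line, and the second deletes the lines that are left empty. So comment lines and blank lines are dropped.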
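A quick way to check that reading is to run the same sed expression over a sample host list. The host names here are invented, and the loop only prints what it would do instead of calling ssh.

```shell
hostlist=$(mktemp)
cat > "$hostlist" <<'EOF'
# worker nodes
node1

node2   # rack 2
#node3
EOF

# Same filter as in slaves.sh.
hosts=`cat "$hostlist" | sed "s/#.*$//;/^$/d"`

for slave in $hosts; do
    echo "would ssh to: $slave"
done
```

Only node1 and node2 survive. The trailing spaces left behind on the node2 line are harmless because the unquoted expansion in the for loop splits on whitespace.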

$"${@// /\\ }"

$@ = $1 $2 ...

${@// /\\ } puts a backslash in front of every space in the arguments, so arguments that contain spaces are escaped.

$"..." Localization? I'm not sure what the point is here. Isn't everything in English anyway?

Reference:

${parameter/pattern/string}    The pattern is expanded to produce a pattern just as in filename expansion. Parameter is expanded and the longest match of pattern against its value is replaced with string. If pattern begins with ‘/’, all matches of pattern are replaced with string. Normally only the first match is replaced. If pattern begins with ‘#’, it must match at the beginning of the expanded value of parameter. If pattern begins with ‘%’, it must match at the end of the expanded value of parameter. If string is null, matches of pattern are deleted and the / following pattern may be omitted. If parameter is ‘@’ or ‘*’, the substitution operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with ‘@’ or ‘*’, the substitution operation is applied to each member of the array in turn, and the expansion is the resultant list.

Bash supports the $"..." quoting syntax to do locale-specific translation of the characters between the double quotes.
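Why escape the spaces at all? ssh joins its command arguments with spaces and hands the resulting string to a shell on the remote side, which splits it again. The sketch below imitates that with a local function, fake_ssh, which is my own stand-in and not a real tool. It uses plain "..." rather than $"...", and it needs bash because of the ${@//...} expansion.

```shell
workdir=$(mktemp -d)

# Stand-in for ssh: join the arguments with spaces and let a second
# shell parse the resulting string, as the remote side would.
fake_ssh() { (cd "$workdir" && bash -c "$*"); }

set -- touch "my file"

fake_ssh "$@"                 # second shell parses: touch my file
fake_ssh "${@// /\\ }"        # second shell parses: touch my\ file

ls -1 "$workdir"
```

The unescaped call creates two files, my and file. The escaped call creates a single file named "my file", which is what the caller meant.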

This clears up something I used to wonder about: how the master starts the slaves.

It turns out it is done with the ssh command.

hadoop-config.sh

# resolve links - $0 may be a softlink
this="$0"
while [ -h "$this" ]; do
  ls=`ls -ld "$this"`
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '.*/.*' > /dev/null; then
    this="$link"
  else
    this=`dirname "$this"`/"$link"
  fi
done

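Even getting an absolute path turns out to be this painful. The loop pulls out the part after -> in the ls output, which is the link target. If the target contains a /, it is treated as an absolute path? That can't be right??

Otherwise it is treated as a relative link and has to be joined onto the directory that contains the link.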
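The suspicious branch can be tried on a scratch symlink whose target is relative but contains a slash. The directory names real and links are made up for this sketch, and I have not checked whether this is exactly the case any bug report describes.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/real" "$workdir/links"
touch "$workdir/real/hadoop-config.sh"
# A relative link target that contains a slash.
ln -s ../real/hadoop-config.sh "$workdir/links/hadoop-config.sh"

this="$workdir/links/hadoop-config.sh"
ls=`ls -ld "$this"`
link=`expr "$ls" : '.*-> \(.*\)$'`      # the text after "-> "
if expr "$link" : '.*/.*' > /dev/null; then
    resolved="$link"                     # taken as-is because it has a slash
else
    resolved=`dirname "$this"`/"$link"
fi

echo "link target: $link"
echo "resolved to: $resolved"
```

The result is still ../real/hadoop-config.sh, a path relative to whatever the current directory happens to be, not to the links directory. It only points at the right file when the script is launched from inside links.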

Sure enough, it is a bug: https://issues.apache.org/jira/browse/HADOOP-7089. I am on version 0.20.203. Here is what the fixed version looks like:

# Resolve links ($0 may be a softlink) and convert a relative path
# to an absolute path.  NB: The -P option requires bash built-ins
# or POSIX:2001 compliant cd and pwd.
this="${BASH_SOURCE-$0}"
common_bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
script="$(basename -- "$this")"
this="$common_bin/$script"

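Oddly, I had trouble finding the ${BASH_SOURCE-$0} form under Shell Parameter Expansion in the GNU manual. It is covered there, but only by a single sentence saying that omitting the colon tests only for a parameter that is unset. The POSIX spec spells it out more clearly: http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06_02 What a trap.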
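The difference between the form with the colon and the form without it only shows up when a variable is set but empty. A minimal sketch, with variable names made up for the example:

```shell
unset demo_unset
demo_empty=""

a=${demo_unset-fallback}     # unset: the fallback is used
b=${demo_unset:-fallback}    # unset: the fallback is used
c=${demo_empty-fallback}     # set but empty: the empty value is kept
d=${demo_empty:-fallback}    # set but empty: the fallback is used

echo "a=[$a] b=[$b] c=[$c] d=[$d]"
```

So ${BASH_SOURCE-$0} reads as: use BASH_SOURCE if it is set at all, otherwise fall back to $0, which is what a shell other than bash would end up with.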

-- marks the end of the options: http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap01.html#tag_01_11

In other words, take for example

grep -- -v file

Here -v is the pattern to search for, no longer an option to grep.
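This is easy to confirm on a scratch file. The file contents are invented for the example.

```shell
sample=$(mktemp)
printf '%s\n' 'use -v to invert' 'plain line' > "$sample"

# After --, grep treats -v as the pattern, not as the invert-match option.
matches=$(grep -- -v "$sample")
echo "$matches"
```

Only the line that literally contains -v is printed. The fixed hadoop-config.sh passes -- to cd, dirname and basename for the same reason: a path that happens to start with a dash must not be read as an option.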

Run man pwd and you will find:

-P, --physical
avoid all symlinks
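A short sketch of what -P changes, using a scratch directory reached through a symlink. The names real and alias are made up for the example.

```shell
# Physical path of a scratch directory, so the comparison below is exact.
workdir=$(cd -P -- "$(mktemp -d)" && pwd -P)
mkdir "$workdir/real"
ln -s real "$workdir/alias"

# Enter the directory through the symlink and ask for both views.
logical=$(cd "$workdir/alias" && pwd -L)
physical=$(cd "$workdir/alias" && pwd -P)

echo "logical:  $logical"
echo "physical: $physical"
```

The logical path still ends in alias, while the physical one ends in real. That is why the fixed script combines cd -P with pwd -P: it gets the directory where the script actually lives, even when it was invoked through a symlinked directory.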

hadoop

exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"

In the end, this is what launches the java class.
