Integrating Hue with HDP 2.5


Introduction

Hue is an open-source Apache Hadoop UI system. It evolved from Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data: browsing data on HDFS, running MapReduce jobs, issuing Hive queries, and so on.

CDH ships with Hue integrated by default, but HDP does not. Logging into a server every time you need to run a small job or a query is inconvenient, so it is worth simply installing Hue yourself.

Environment

  • CentOS 7
  • HDP 2.5
  • Hue 3.12.0

Installing Hue

Download version 3.12.0 from the official site:
http://gethue.com/hue-3-12-the-improved-editor-for-sql-developers-and-analysts-is-out/
Downloads can be slow from inside China, so I have uploaded a copy to Baidu Cloud:
https://pan.baidu.com/s/1cCifuu

Put the tarball in the target directory on the server and extract it:

root@dell:/data/hue# tar -zxvf hue-3.12.0.tgz

Install the build dependencies:

root@dell:~# yum install ant gcc gcc-c++ mysql-devel openssl-devel cyrus-sasl-devel cyrus-sasl cyrus-sasl-gssapi sqlite-devel openldap-devel libacl-devel libxml2-devel libxslt-devel mvn krb5-devel python-devel python-simplejson python-setuptools

Build and install Hue:

root@dell:/data/hue/hue-3.12.0# PREFIX=/usr/share make install

PREFIX sets the install path; point it at a partition with plenty of free space.

The build itself is straightforward; the main caveat is to have Maven set up in advance, since the build pulls down a large number of JARs through Maven.
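If Maven is not installed yet (the mvn entry in the yum list above may not resolve against a stock CentOS 7 repository), a minimal sketch of setting it up from a pre-downloaded binary tarball follows; the version in the file name is only an example:

# Extract a pre-downloaded Apache Maven 3.x binary tarball (file name is illustrative)
root@dell:~# tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /opt
# Put mvn on the PATH of the shell that runs the Hue build
root@dell:~# export PATH=/opt/apache-maven-3.3.9/bin:$PATH
# Verify the build can find Maven
root@dell:~# mvn -version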

Configuring Hue

Edit the desktop/conf/hue.ini file under the Hue install path.

Configuring the database

By default Hue uses a SQLite database; we can switch it to MySQL.
Open hue.ini and find the following section:

[[database]]
  # Database engine is typically one of:
  # postgresql_psycopg2, mysql, sqlite3 or oracle.
  #
  # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
  # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
  # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
  # Note for MariaDB use the 'mysql' engine.
  ## engine=sqlite3           # change to mysql
  ## host=                    # hostname or IP of the MySQL server
  ## port=                    # 3306
  ## user=                    # database user (creating a dedicated hue user is recommended)
  ## password=                # that user's password
  # Execute this script to produce the database password. This will be used when 'password' is not set.
  ## password_script=/path/script
  ## name=desktop/desktop.db  # change to the database name, e.g. hue
  ## options={}
  # Database schema, to be used only when public schema is revoked in postgres
  ## schema=

After the changes, it looks like this:

[[database]]
  # Database engine is typically one of:
  # postgresql_psycopg2, mysql, sqlite3 or oracle.
  #
  # Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name
  # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
  # Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
  # Note for MariaDB use the 'mysql' engine.
  engine=mysql              # changed to mysql
  host=192.168.1.2          # hostname or IP of the MySQL server
  port=3306
  user=hue                  # database user (a dedicated hue user is recommended)
  password=lu123456         # that user's password
  # Execute this script to produce the database password. This will be used when 'password' is not set.
  ## password_script=/path/script
  name=hue                  # the database name, e.g. hue
  ## options={}
  # Database schema, to be used only when public schema is revoked in postgres
  ## schema=
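The config above assumes the hue database and user already exist on the MySQL server. If not, a minimal sketch of creating them (names and password taken from the values above):

root@dell:~# mysql -u root -p
mysql> CREATE DATABASE hue DEFAULT CHARACTER SET utf8;
mysql> CREATE USER 'hue'@'%' IDENTIFIED BY 'lu123456';
mysql> GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
mysql> FLUSH PRIVILEGES;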

After configuring the database, we still need to sync and migrate Hue's data into the MySQL database we pointed it at:

root@dell:/data/hue/hue-3.12.0# build/env/bin/hue syncdb
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue migrate

During the sync you may be prompted to create an admin user; it will be needed to log in once Hue starts.

Hadoop configuration

Edit hdfs-site.xml and add the following property:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

HDP should have this enabled by default: HDFS → Configs → Advanced → General → WebHDFS enabled.

Add the hue proxy user:
HDFS → Configs → Advanced → Custom core-site
Add the properties:
hadoop.proxyuser.hue.groups=*
hadoop.proxyuser.hue.hosts=*
Without these two properties, jobs cannot be submitted through Hue.
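For reference, if you maintain core-site.xml by hand rather than through Ambari, the same two properties look like this:

<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>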

On the Hue side, edit hue.ini and find the following:

[hadoop]
  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://e5:8020             # NameNode address
      # fs_defaultfs=hdfs://localhost:8020
      # NameNode logical name.
      ## logical_name=
      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://e5:50070/webhdfs/v1  # WebHDFS address
      # Change this if your HDFS cluster is Kerberos-secured
      security_enabled=false
      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True
      # Directory of the Hadoop configuration
      ## hadoop_conf_dir=$HADOOP_CONF_DIR when set or '/etc/hadoop/conf'
      hadoop_conf_dir=/etc/hadoop/conf        # Hadoop configuration path

  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=e5
      # See YARN → Configs → Advanced → Advanced yarn-site,
      # property yarn.resourcemanager.address
      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8050
      # Whether to submit jobs to this cluster
      submit_to=True
      # Resource Manager logical name (required for HA)
      ## logical_name=
      # Change this if your YARN cluster is Kerberos-secured
      security_enabled=false
      # URL of the ResourceManager API
      resourcemanager_api_url=http://e5:8088
      # URL of the ProxyServer API
      proxy_api_url=http://e5:8088
      # URL of the HistoryServer API
      ## history_server_api_url=http://localhost:19888
      # URL of the Spark History Server
      ## spark_history_server_url=http://localhost:18088
      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

    # HA support by specifying multiple clusters.
    # Redefine different properties there.
    # e.g.
    # [[[ha]]]                                # HA / high-availability configuration
      # Resource Manager logical name (required for HA)
      ## logical_name=my-rm-name
      # Un-comment to enable
      ## submit_to=True
      # URL of the ResourceManager API
      ## resourcemanager_api_url=http://localhost:8088
      # ...

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]
    [[[default]]]
      # Enter the host on which you are running the Hadoop JobTracker
      jobtracker_host=e5
      # The port where the JobTracker IPC listens on
      jobtracker_port=8050
      # JobTracker logical name for HA
      ## logical_name=
      # Thrift plug-in port for the JobTracker
      ## thrift_port=9290
      # Whether to submit jobs to this cluster
      submit_to=False
      # Change this if your MapReduce cluster is Kerberos-secured
      security_enabled=false
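Before starting Hue, you can sanity-check the WebHDFS address from the config above with a plain REST call (e5 is the NameNode host used here; LISTSTATUS is a standard WebHDFS operation):

root@dell:~# curl "http://e5:50070/webhdfs/v1/?op=LISTSTATUS"
# A JSON FileStatuses listing of the HDFS root means WebHDFS is reachable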

Only the Hadoop part of the configuration is shown here; other services such as Oozie, Sqoop, etc. are just as simple, but each needs its own section configured if you intend to use it.

Starting the Hue service

Start Hue in development/debug mode with:

# 0.0.0.0 means any host may connect; without it the server only listens on 127.0.0.1
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue runserver 0.0.0.0:8888
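Note that runserver is Django's development server. For longer-running use, Hue ships a supervisor daemon, which the init script below is built around; as a sketch, it can be launched directly with the same -p/-l/-d options the script passes via DAEMON_OPTS:

# Create the pid and log directories the daemon expects
root@dell:/data/hue/hue-3.12.0# mkdir -p /var/run/hue /var/log/hue
# -p pidfile, -l log directory, -d daemonize
root@dell:/data/hue/hue-3.12.0# build/env/bin/supervisor -p /var/run/hue/supervisor.pid -l /var/log/hue -d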

After the first successful start, open http://server-ip:8888 in a browser and you will be taken to the Hue login page. If no initial account was set up, the default is admin/admin; if you created an admin user during the database sync above, log in with that username and password.
[Figure: Hue login page]
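If the syncdb step never prompted for an account and admin/admin does not work, you can create a superuser through the Django management commands that Hue exposes, e.g.:

root@dell:/data/hue/hue-3.12.0# build/env/bin/hue createsuperuser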

With login working, we can add Hue to the system services so it can be controlled through systemd, which makes starting, stopping, and enabling at boot easy.

My script is based on the hue init script that ships with CDH, with a few small changes.

/etc/init.d/hue

#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
#       /etc/rc.d/init.d/hue
#
#       Hue web server
#
# chkconfig: 2345 90 10
# description: Hue web server
# pidfile: /var/run/hue/supervisor.pid

. /etc/init.d/functions

LOCKFILE=/var/lock/subsys/hue
#DAEMON=/usr/lib/hue/build/env/bin/supervisor # Introduce the server's location here
DAEMON=/data/hue/hue-3.12.0/build/env/bin/supervisor # Introduce the server's location here
LOGDIR=/var/log/hue  # Log directory to use
PIDFILE=/var/run/hue/supervisor.pid
USER=hue
#EXEC=/usr/lib/hue/build/env/bin/python
EXEC=/data/hue/hue-3.12.0/build/env/bin/python
DAEMON_OPTS="-p $PIDFILE -l $LOGDIR -d"
HUE_SHUTDOWN_TIMEOUT=15

hue_start() {
        export PYTHON_EGG_CACHE='/tmp/.hue-python-eggs'
        #RE_REGISTER=/usr/lib/hue/.re_register
        RE_REGISTER=/data/hue/hue-3.12.0/app.reg
        if [ -e $RE_REGISTER ]; then
            # Do app_reg on upgraded apps. This is a workaround for DISTRO-11.
            # We can probably take it out after another release.
            DO="/sbin/runuser -s /bin/bash $USER -c"
            #APP_REG="/usr/lib/hue/tools/app_reg/app_reg.py"
            APP_REG="/data/hue/hue-3.12.0/tools/app_reg/app_reg.py"
            # Upgraded apps write their paths in the re_register file.
            RE_REG_LOG=/var/log/hue/hue_re_register.log
            # Make cwd somewhere that $USER can chdir into
            pushd / > /dev/null
            $DO "DESKTOP_LOG_DIR=$LOGDIR $EXEC $APP_REG --install $(cat $RE_REGISTER | xargs echo -n)  >> $RE_REG_LOG 2>&1"
            ok=$?
            popd > /dev/null
            if [ $ok -eq 0 ] ; then
                rm -f $RE_REGISTER
            else
                echo "Failed to register some apps: Details in $RE_REG_LOG"
            fi
        fi
        echo -n "Starting hue: "
        for dir in $(dirname $PIDFILE) $LOGDIR ${PYTHON_EGG_CACHE}
        do
            mkdir -p $dir
            chown -R $USER $dir
        done
        # Check if already running
        if [ -e $PIDFILE ] && checkpid $(cat $PIDFILE) ; then
            echo "already running"
            return 0
        fi
        # the supervisor itself will setuid down to $USER
        su -s /bin/bash $USER -c "$DAEMON $DAEMON_OPTS"
        ret=$?
        base=$(basename $0)
        if [ $ret -eq 0 ]; then
            sleep 5
            test -e $PIDFILE && checkpid $(cat $PIDFILE)
            ret=$?
        fi
        if [ $ret -eq 0 ]; then
            touch $LOCKFILE
            success $"$base startup"
        else
            failure $"$base startup"
        fi
        echo
        return $ret
}

hue_stop() {
        if [ ! -e $PIDFILE ]; then
            success "Hue is not running"
            return 0
        fi
        echo -n "Shutting down hue: "
        HUE_PID=`cat $PIDFILE 2>/dev/null`
        if [ -n "$HUE_PID" ]; then
          kill -TERM ${HUE_PID} &>/dev/null
          for i in `seq 1 ${HUE_SHUTDOWN_TIMEOUT}` ; do
            kill -0 ${HUE_PID} &>/dev/null || break
            sleep 1
          done
          kill -KILL ${HUE_PID} &>/dev/null
        fi
        echo
        rm -f $LOCKFILE $PIDFILE
        return 0
}

hue_restart() {
  hue_stop
  hue_start
}

case "$1" in
    start)
        hue_start
        ;;
    stop)
        hue_stop
        ;;
    status)
        status -p $PIDFILE supervisor
        ;;
    restart|reload)
        hue_restart
        ;;
    condrestart)
        [ -f $LOCKFILE ] && hue_restart || :
        ;;
    *)
        echo "Usage: hue {start|stop|status|reload|restart|condrestart}"
        exit 1
        ;;
esac
exit $?

Only the following variables need to be changed:

  • DAEMON
  • EXEC
  • RE_REGISTER
  • APP_REG

Note that the script above must be placed under the /etc/init.d/ directory.
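A sketch of registering it: make the script executable and let systemd's SysV generator pick it up (the chkconfig registration keeps the old runlevel tooling consistent):

root@dell:~# chmod +x /etc/init.d/hue
root@dell:~# chkconfig --add hue
root@dell:~# systemctl daemon-reload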
After that we can control the service with the systemctl command, for example:

# Start the service
# systemctl start hue
# Stop the service
# systemctl stop hue
# Enable the service at boot
# systemctl enable hue

Summary

Installing and configuring Hue is fairly easy, but on a first attempt you will inevitably hit assorted problems: the Python version, the database configuration, the configuration of the big-data services, and so on. Don't panic when something breaks: read the error message first, then the logs, then search Baidu/Google or consult the official documentation; there is always a way through. Finally, if your company's servers are well provisioned, I would still recommend CDH: the services are more complete, and with commercial support you can escalate to Cloudera if you hit a problem you cannot solve.