Integrating Hue with HDP 2.5
Introduction
Hue is an open-source web UI for Apache Hadoop. It grew out of Cloudera Desktop, was contributed to the open-source community by Cloudera, and is built on the Python web framework Django. With Hue you can work with a Hadoop cluster from a browser-based console to analyze and process data: browse and manipulate files on HDFS, run MapReduce jobs, issue Hive queries, and so on.
CDH ships with Hue out of the box, but HDP does not. Logging into a server just to run a small job or a quick query gets tedious, so it is worth installing Hue yourself.
Environment
- CentOS 7
- HDP 2.5
- Hue 3.12.0
Installing Hue
Download version 3.12.0 from the official site:
http://gethue.com/hue-3-12-the-improved-editor-for-sql-developers-and-analysts-is-out/
The download can be slow from inside China, so I have also uploaded a copy to Baidu Cloud:
https://pan.baidu.com/s/1cCifuu
Copy the tarball to a directory on the server and extract it:

```shell
root@dell:/data/hue# tar -zxvf hue-3.12.0.tgz
```
Install the build dependencies:

```shell
root@dell:~# yum install ant gcc gcc-c++ mysql-devel openssl-devel cyrus-sasl-devel \
    cyrus-sasl cyrus-sasl-gssapi sqlite-devel openldap-devel libacl-devel \
    libxml2-devel libxslt-devel mvn krb5-devel python-devel python-simplejson \
    python-setuptools
```
Build and install Hue from inside the extracted directory:

```shell
root@dell:/data/hue/hue-3.12.0# PREFIX=/usr/share make install
```

PREFIX sets the install path; pick a partition with plenty of free space.
The build itself is straightforward. The main prerequisite is a working Maven setup, because the build downloads a large number of JARs through Maven.
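Before kicking off the build, it can save time to confirm the key build tools are actually on the PATH, since a missing tool tends to fail the build late rather than early. A small sketch (the command names are the usual ones; adjust if, say, your Maven binary is installed under a different name):

```shell
# Report which of the build tools Hue's Makefile relies on are available.
for cmd in gcc ant mvn python; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  else
    echo "missing: $cmd"
  fi
done
```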
Configuring Hue
Edit desktop/conf/hue.ini under the Hue install directory.
Configuring the database
Hue uses SQLite by default; here we switch it to MySQL.
Open hue.ini and locate the following section:

```ini
[[database]]
  # Database engine is typically one of:
  # postgresql_psycopg2, mysql, sqlite3 or oracle.
  #
  # Note that for sqlite3, 'name', below is a path to the filename.
  # For other backends, it is the database name.
  # Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
  # Note for Oracle, you can use the Oracle Service Name by setting "host="
  # and "port=" and then "name=<host>:<port>/<service_name>".
  # Note for MariaDB use the 'mysql' engine.
  ## engine=sqlite3            # change to mysql
  ## host=                     # MySQL server hostname or IP
  ## port=                     # 3306
  ## user=                     # database user (a dedicated hue user is recommended)
  ## password=                 # that user's password
  # Execute this script to produce the database password.
  # This will be used when 'password' is not set.
  ## password_script=/path/script
  ## name=desktop/desktop.db   # change to the database name, e.g. hue
  ## options={}
  # Database schema, to be used only when public schema is revoked in postgres
  ## schema=
```
After the edits, the section looks like this:

```ini
[[database]]
  engine=mysql            # switched from sqlite3
  host=192.168.1.2        # MySQL server hostname or IP
  port=3306
  user=hue                # database user (a dedicated hue user is recommended)
  password=lu123456       # that user's password
  ## password_script=/path/script
  name=hue                # the database Hue will use
  ## options={}
  ## schema=
```
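If you would rather script this change than edit by hand (for example when provisioning several hosts), a `sed` substitution can uncomment and rewrite the keys. A minimal sketch, demonstrated on a throwaway file shaped like hue.ini's `[[database]]` block (the path and values are illustrative):

```shell
# Create a scratch file mimicking the relevant hue.ini lines.
cat > /tmp/hue-db-demo.ini <<'EOF'
[[database]]
    ## engine=sqlite3
    ## name=desktop/desktop.db
EOF

# Uncomment the keys and set new values, preserving indentation.
sed -i 's|^\( *\)## engine=sqlite3|\1engine=mysql|' /tmp/hue-db-demo.ini
sed -i 's|^\( *\)## name=desktop/desktop.db|\1name=hue|' /tmp/hue-db-demo.ini
cat /tmp/hue-db-demo.ini
```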
With the connection configured, the schema still has to be synced and migrated into the MySQL database using Hue's syncdb and migrate commands.
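Note that syncdb assumes the database and user named in hue.ini already exist; it does not create them for you. A sketch of the MySQL statements (the database name, user, and password here match the example hue.ini above; adjust to taste):

```sql
-- Run as a MySQL administrator before `hue syncdb`.
CREATE DATABASE hue DEFAULT CHARACTER SET utf8;
CREATE USER 'hue'@'%' IDENTIFIED BY 'lu123456';
GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
FLUSH PRIVILEGES;
```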
```shell
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue syncdb
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue migrate
```

During syncdb you may be prompted to create an admin user; you will need it later to log in to Hue.
Hadoop configuration
Edit hdfs-site.xml and add the following property (note the property name is dfs.webhdfs.enabled):

```xml
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
```
In HDP this should already be on by default; check HDFS → Configs → Advanced → General → "WebHDFS enabled".
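A quick way to confirm WebHDFS really is answering before pointing Hue at it is to hit the REST API directly; a hypothetical smoke test (replace `e5` with your NameNode host, and note that `user.name=hue` assumes the proxy user set up below):

```shell
# Build the WebHDFS URL and list the HDFS root directory.
NAMENODE=e5   # assumption: your NameNode host
WEBHDFS_URL="http://${NAMENODE}:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hue"
echo "checking ${WEBHDFS_URL}"
curl -s "$WEBHDFS_URL"
```

A healthy cluster returns a JSON FileStatuses listing; an HTML error page or connection refusal means WebHDFS is off or the port is wrong.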
Add a proxy-user entry for hue. In Ambari, go to HDFS → Configs → Advanced → Custom core-site and add these properties:
hadoop.proxyuser.hue.groups=*
hadoop.proxyuser.hue.hosts=*
Without these entries, jobs cannot be submitted through Hue, because Hue impersonates the logged-in user when talking to Hadoop.
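For reference, if you maintain core-site.xml by hand rather than through Ambari, the same two properties look like this:

```xml
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
```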
Back in hue.ini, locate the [hadoop] section and point it at your cluster:

```ini
[hadoop]
  # Configuration for HDFS NameNode
  # ------------------------------------------------------------------------
  [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://e5:8020              # NameNode address
      # NameNode logical name.
      ## logical_name=
      # Use WebHdfs/HttpFs as the communication mechanism.
      # Domain should be the NameNode or HttpFs host.
      # Default port is 14000 for HttpFs.
      webhdfs_url=http://e5:50070/webhdfs/v1   # WebHDFS address
      # Change this if your HDFS cluster is Kerberos-secured
      security_enabled=false
      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True
      # Directory of the Hadoop configuration
      hadoop_conf_dir=/etc/hadoop/conf         # Hadoop configuration path

  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager.
      # See the yarn.resourcemanager.address property under
      # YARN → Configs → Advanced → Advanced yarn-site.
      resourcemanager_host=e5
      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8050
      # Whether to submit jobs to this cluster
      submit_to=True
      # Resource Manager logical name (required for HA)
      ## logical_name=
      # Change this if your YARN cluster is Kerberos-secured
      security_enabled=false
      # URL of the ResourceManager API
      resourcemanager_api_url=http://e5:8088
      # URL of the ProxyServer API
      proxy_api_url=http://e5:8088
      # URL of the HistoryServer API
      ## history_server_api_url=http://localhost:19888
      # URL of the Spark History Server
      ## spark_history_server_url=http://localhost:18088
      # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
      # have to be verified against certificate authority
      ## ssl_cert_ca_verify=True

    # HA support by specifying multiple clusters.
    # Redefine different properties there, e.g.:
    # [[[ha]]]
    #   # Resource Manager logical name (required for HA)
    #   ## logical_name=my-rm-name
    #   # Un-comment to enable
    #   ## submit_to=True
    #   # URL of the ResourceManager API
    #   ## resourcemanager_api_url=http://localhost:8088

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]
    [[[default]]]
      # Enter the host on which you are running the Hadoop JobTracker
      jobtracker_host=e5
      # The port where the JobTracker IPC listens on
      jobtracker_port=8050
      # JobTracker logical name for HA
      ## logical_name=
      # Thrift plug-in port for the JobTracker
      ## thrift_port=9290
      # Whether to submit jobs to this cluster
      submit_to=False
      # Change this if your MapReduce cluster is Kerberos-secured
      security_enabled=false
```
Only the Hadoop settings are shown here. Other services such as Oozie and Sqoop are configured the same way and are just as simple; set up their sections as needed if you plan to use them.
Starting the Hue service
Start Hue in development/debug mode with:

```shell
# 0.0.0.0 accepts connections from any host; without it the server only
# listens on 127.0.0.1 and is reachable from the local machine alone.
root@dell:/data/hue/hue-3.12.0# build/env/bin/hue runserver 0.0.0.0:8888
```
Once it is up, open http://server-ip:8888 in a browser. You will be redirected to the Hue login page. If you created an admin user during the database sync above, log in with that account; otherwise the default is admin/admin.
With login working, the next step is to register Hue as a system service so that starting, stopping, and enabling it at boot can all be done through systemd. The script below is adapted, with minor path changes, from the Hue init script that CDH provides.
Save the following as /etc/init.d/hue:

```shell
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# /etc/rc.d/init.d/hue
#
# Hue web server
#
# chkconfig: 2345 90 10
# description: Hue web server
# pidfile: /var/run/hue/supervisor.pid

. /etc/init.d/functions

LOCKFILE=/var/lock/subsys/hue
#DAEMON=/usr/lib/hue/build/env/bin/supervisor # Introduce the server's location here
DAEMON=/data/hue/hue-3.12.0/build/env/bin/supervisor # Introduce the server's location here
LOGDIR=/var/log/hue # Log directory to use
PIDFILE=/var/run/hue/supervisor.pid
USER=hue
#EXEC=/usr/lib/hue/build/env/bin/python
EXEC=/data/hue/hue-3.12.0/build/env/bin/python
DAEMON_OPTS="-p $PIDFILE -l $LOGDIR -d"
HUE_SHUTDOWN_TIMEOUT=15

hue_start() {
  export PYTHON_EGG_CACHE='/tmp/.hue-python-eggs'
  #RE_REGISTER=/usr/lib/hue/.re_register
  RE_REGISTER=/data/hue/hue-3.12.0/app.reg
  if [ -e $RE_REGISTER ]; then
    # Do app_reg on upgraded apps. This is a workaround for DISTRO-11.
    # We can probably take it out after another release.
    DO="/sbin/runuser -s /bin/bash $USER -c"
    #APP_REG="/usr/lib/hue/tools/app_reg/app_reg.py"
    APP_REG="/data/hue/hue-3.12.0/tools/app_reg/app_reg.py"
    # Upgraded apps write their paths in the re_register file.
    RE_REG_LOG=/var/log/hue/hue_re_register.log
    # Make cwd somewhere that $USER can chdir into
    pushd / > /dev/null
    $DO "DESKTOP_LOG_DIR=$LOGDIR $EXEC $APP_REG --install $(cat $RE_REGISTER | xargs echo -n) >> $RE_REG_LOG 2>&1"
    ok=$?
    popd > /dev/null
    if [ $ok -eq 0 ] ; then
      rm -f $RE_REGISTER
    else
      echo "Failed to register some apps: Details in $RE_REG_LOG"
    fi
  fi
  echo -n "Starting hue: "
  for dir in $(dirname $PIDFILE) $LOGDIR ${PYTHON_EGG_CACHE}
  do
    mkdir -p $dir
    chown -R $USER $dir
  done
  # Check if already running
  if [ -e $PIDFILE ] && checkpid $(cat $PIDFILE) ; then
    echo "already running"
    return 0
  fi
  # the supervisor itself will setuid down to $USER
  su -s /bin/bash $USER -c "$DAEMON $DAEMON_OPTS"
  ret=$?
  base=$(basename $0)
  if [ $ret -eq 0 ]; then
    sleep 5
    test -e $PIDFILE && checkpid $(cat $PIDFILE)
    ret=$?
  fi
  if [ $ret -eq 0 ]; then
    touch $LOCKFILE
    success $"$base startup"
  else
    failure $"$base startup"
  fi
  echo
  return $ret
}

hue_stop() {
  if [ ! -e $PIDFILE ]; then
    success "Hue is not running"
    return 0
  fi
  echo -n "Shutting down hue: "
  HUE_PID=`cat $PIDFILE 2>/dev/null`
  if [ -n "$HUE_PID" ]; then
    kill -TERM ${HUE_PID} &>/dev/null
    for i in `seq 1 ${HUE_SHUTDOWN_TIMEOUT}` ; do
      kill -0 ${HUE_PID} &>/dev/null || break
      sleep 1
    done
    kill -KILL ${HUE_PID} &>/dev/null
  fi
  echo
  rm -f $LOCKFILE $PIDFILE
  return 0
}

hue_restart() {
  hue_stop
  hue_start
}

case "$1" in
  start)
    hue_start
    ;;
  stop)
    hue_stop
    ;;
  status)
    status -p $PIDFILE supervisor
    ;;
  restart|reload)
    hue_restart
    ;;
  condrestart)
    [ -f $LOCKFILE ] && hue_restart || :
    ;;
  *)
    echo "Usage: hue {start|stop|status|reload|restart|condrestart}"
    exit 1
    ;;
esac
exit $?
```
Only the following variables need to be changed for your install:
- DAEMON
- EXEC
- RE_REGISTER
- APP_REG
Note that the script must be placed under /etc/init.d/ and made executable (chmod +x /etc/init.d/hue).
After that the service can be controlled with systemctl (on CentOS 7, systemd picks up /etc/init.d scripts through its SysV compatibility layer):

```shell
# Start the service
systemctl start hue
# Stop the service
systemctl stop hue
# Start the service automatically at boot
systemctl enable hue
```
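Alternatively, since CentOS 7 is systemd-native, a small unit file avoids the init script entirely. A sketch under the assumptions of this post (the paths match the install location used above; without the `-d` flag the supervisor stays in the foreground, which is what systemd expects):

```ini
# /etc/systemd/system/hue.service  (hypothetical unit; adjust paths)
[Unit]
Description=Hue web server
After=network.target

[Service]
User=hue
ExecStart=/data/hue/hue-3.12.0/build/env/bin/supervisor
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

After saving the unit, run systemctl daemon-reload and then systemctl enable hue as usual.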
Summary
Installing and configuring Hue is fairly easy, but on a first attempt all sorts of problems tend to come up: Python versions, database configuration, the settings of the various big-data services, and so on. Don't panic when something breaks: read the error message first, then the error logs, then search the web or consult the official documentation; there is always a way through. Finally, if your company's servers are well provisioned, consider running CDH instead: its services are more complete, and with commercial backing you can escalate truly hard problems to Cloudera.