Displaying Hive Table Data in Kibana

Source: Internet | Published by: 模拟电路图软件 | Editor: 程序博客网 | Time: 2024/06/06 02:34

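To display data stored in Hive, you can use Kibana as the visualization tool. Before that, the Hive table data has to be imported into Elasticsearch (ES), which is what the ES-Hadoop plugin is for.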

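Plugin installation:

Download: https://github.com/elasticsearch/elasticsearch-hadoop#readme

Add the downloaded jar to Hive:

hive -e "add jar elasticsearch-hadoop-2.1.1.jar;"

Note that ADD JAR only applies to the Hive session that runs it, so the jar must also be visible to the Hive sessions started by the script below (for example through hive.aux.jars.path).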

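Suppose we want to display the data in the table dms.visit_path. The steps are as follows:

I. Run the script dms_visit_path_es.sh to import the Hive table data into ES

The script is as follows: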

#!/bin/sh
#load data into ES
#example:sh load_data_.sh stage.CMS_NEWS_BY_DAY wordcloud-news/hexuntong myrowkey [20151013] [20151015]
ES_NODES='10.130.2.46';ES_PORT='9200';
nowday=$(date +%Y%m%d);
nowhour=$(date +%H);
HIVE_TABLE=$1;
ES_INDEX_TYPE=$2;
myrowkey=$3
firstday=$4;
endday=$5;
#default the start day to today and the end day to the start day
if [ ! -n "$firstday" ]; then
    firstday=$nowday;
fi
if [ ! -n "$endday" ]; then
    endday=$firstday;
fi

#--------------get the columns of table $HIVE_TABLE-----------------------------------
#columnname=$(hive -e "show create table ${HIVE_TABLE};" | tr "\n" " " | awk -F '(' '{print $2}' |awk -F ')' '{print $1}')
tablename=$HIVE_TABLE
filepath1="iowejlkjlsdjgoisj.txt"
filepath2="woperkpkgwefsater.txt"
hive -e "desc $tablename;" >$filepath1
#drop comment lines, Hive warnings and blank lines from the desc output
sed -i '/^#/d' $filepath1
sed -i '/^WARN:/d' $filepath1
sed -i '/^\s*$/d' $filepath1
#uniq -u keeps only lines that are not repeated, which drops a partition column that desc lists twice
uniq -u $filepath1 >$filepath2
#uniq -u $filepath2 | awk -F '\t' '{print $1}'
#the loop runs in a pipeline subshell, so the result is handed back through $filepath1
uniq -u $filepath2 | while read line
do
    var1=`echo $line |awk -F ' ' '{print $1}'`
    var2=`echo $line |awk -F ' ' '{print $2}'`
    sqlcolmn="${sqlcolmn}$var1 ${var2},"
    echo ${sqlcolmn%,*} >$filepath1
done
result=`cat $filepath1`
rm -fr $filepath1
rm -fr $filepath2
# echo $result
columnname='`id` string',$result,'`partion_day` int'

#------------------------create mapping table for es----------------------------------
tablename=`echo $HIVE_TABLE | awk -F '.' '{print$2}'`;
echo ${tablename}
create_sql="drop table default.${tablename}_ES_TEMP;CREATE EXTERNAL TABLE IF NOT EXISTS default.${tablename}_ES_TEMP ($columnname) PARTITIONED BY(day String) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = '${ES_INDEX_TYPE}','es.index.auto.create' = 'true','es.mapping.id' = 'id','es.nodes'='${ES_NODES}','es.port'='${ES_PORT}');"
echo $create_sql
hive -e "${create_sql}"
#"'es.mapping.timestamp' = 'partion_day'",
#sources=`echo $ES_INDEX_TYPE | awk -F '/' '{print$2}'`;
#search_column="date,step,substring(path,1,32000),start_url,end_url,path_pv,path_uv,updatetime,day"

#---------------------exec load data into es-----------------------------------------------------------
#import one day per iteration, from firstday through endday inclusive
while [ "$firstday" -le "$endday" ]
do
    exec_sql="INSERT OVERWRITE TABLE default.${tablename}_ES_TEMP partition(day='${firstday}') SELECT $myrowkey, * FROM ${HIVE_TABLE} where day='${firstday}';"
    hive -e "${exec_sql}"
    firstday=`date -d "${firstday} +1 day " +%Y%m%d`
done
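The column-extraction step of the script can be tried without a Hive installation. The following is only a sketch under assumptions: build_columns is a hypothetical helper that is not part of the original script, and it expects text shaped like the output of desc on a partitioned table (tab-separated columns, followed by a "# Partition Information" section that repeats the partition column).

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): turn desc-style output
# read from stdin into "name type,name type,..." for a CREATE TABLE statement.
build_columns() {
    # 1. drop comment lines, Hive warnings and blank (whitespace-only) lines
    # 2. keep only the column name and type
    # 3. uniq -u drops adjacent repeated lines, i.e. the partition column
    # 4. join the remaining lines with commas
    grep -v -e '^#' -e '^WARN:' -e '^[[:space:]]*$' |
        awk '{ print $1 " " $2 }' |
        uniq -u |
        paste -s -d ',' -
}
```

Against a real table it might be used as hive -e "desc dms.visit_path;" | build_columns. Because the whole pipeline writes to stdout, no temporary files are needed.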

Invoke it as follows:

sh /diske/autoshell/liuzhaoming/userTag/dms_visit_path_es.sh dms.visit_path dms-visit_path/user "udf_md5(concat(date,path))"

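Notes: 1. The script above automatically imports Hive data into ES. For example, the call above imports the dms.visit_path data into ES, where the index is named dms-visit_path and the type is user.

2. Parameters: @1: the source table name, e.g. dms.visit_path

@2: the ES index name and type name, e.g. dms-visit_path/user

@3: a composite primary key used for deduplication. I take the MD5 of several concatenated columns, which gives each row a stable unique id, so re-importing the same data overwrites the existing documents instead of duplicating them. Example: udf_md5(concat(date,path))

@4: optional; defaults to today's date.

@5: optional; defaults to the value of @4. These two parameters exist so that several days of data can be imported in one loop. If @4 is later than @5, no import statement is executed.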
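To see why the key in @3 makes re-imports idempotent, here is a minimal sketch that mimics the idea outside Hive. It assumes md5sum from GNU coreutils as a stand-in for the author's udf_md5 (a custom UDF whose exact behavior is not shown in the post), and row_key is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical stand-in for udf_md5(concat(date,path)):
# concatenate the two fields with no separator, then hash the result.
row_key() {
    printf '%s%s' "$1" "$2" | md5sum | awk '{ print $1 }'
}

# The same (date, path) pair always yields the same 32-character id,
# so importing a row twice writes to the same ES document.
row_key 20160101 /index/list
row_key 20160101 /index/list
```

One design caveat: concatenating without a separator means that different column values can produce the same string (for example "ab" + "c" and "a" + "bc"), so adding a delimiter between the columns may be safer when the values are free-form.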

An invocation with dates looks like this:

sh /diske/autoshell/liuzhaoming/userTag/dms_visit_path_es.sh dms.visit_path dms-visit_path/user "udf_md5(concat(date,path))" 20160101 20160105
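The way @4 and @5 drive the import loop can be sketched in isolation. This assumes GNU date (for the -d option, which the script itself relies on); each_day is a hypothetical helper that only prints the days that would be imported instead of calling Hive.

```shell
#!/bin/sh
# Hypothetical helper: print every day from $1 through $2 (inclusive) as YYYYMMDD.
# When $2 is omitted it defaults to $1, mirroring the script's handling of @5.
each_day() {
    day=$1
    last=${2:-$1}
    # YYYYMMDD values compare correctly as plain integers
    while [ "$day" -le "$last" ]; do
        echo "$day"
        day=$(date -d "$day +1 day" +%Y%m%d)
    done
}
```

Calling each_day 20160101 20160105 lists the five days from 20160101 to 20160105, one per line, and calling it with a start day later than the end day prints nothing, which matches the case where no import statement runs.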

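PS: Keep the data types in the Hive table as accurate as possible: use a time type for times and int for integers rather than declaring everything as string, otherwise you will run into trouble later when building the Kibana visualizations.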

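II. Displaying the data in Kibana

First, a note on installing Kibana:

Download: https://www.elastic.co/thank-you?url=https://download.elastic.co/kibana/kibana/kibana-4.4.1-linux-x64.tar.gz

(I am on a 64-bit system; also note that the Kibana version must match the ES version.)

Unpack the tarball, then edit the Elasticsearch address setting in $HOME/config/kibana.yml (the original post highlighted the setting in a screenshot that is not reproduced here).

Change the IP in that setting to the IP of your ES cluster. Kibana is then reachable at http://ip:5601.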
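Since the screenshot is missing, here is a hedged sketch of that edit done from the shell. It assumes the setting in question is the elasticsearch.url key that Kibana 4.4 ships commented out in kibana.yml, and that GNU sed is available; set_es_url is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: point Kibana at an ES node by setting elasticsearch.url
# (assumed key name) in the given kibana.yml.
set_es_url() {
    conf_file=$1
    es_url=$2
    if grep -q '^#\{0,1\} *elasticsearch\.url:' "$conf_file"; then
        # replace the existing entry, whether or not it is commented out
        sed -i "s|^#\{0,1\} *elasticsearch\.url:.*|elasticsearch.url: \"$es_url\"|" "$conf_file"
    else
        # no entry yet: append one
        printf 'elasticsearch.url: "%s"\n' "$es_url" >>"$conf_file"
    fi
}
```

For the cluster used in this post the call might look like set_es_url config/kibana.yml http://10.130.2.46:9200, run from the unpacked Kibana directory before starting Kibana.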

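1. Open the Kibana page and click the menu [Settings] - [Indices].

Note: the index name can be entered as a wildcard pattern, so that one pattern covers several indices (typically data that is indexed by day).

Click Create.

2. Click the [Discover] menu and select the index pattern you just created under Settings.

Note: then click Save in the upper-right corner and enter a name.

Note: this saved search is the data source for the charts built later. You can also search your data here; it is best to wrap string values in double quotes.

3. Click [Visualize] to build charts.

Choose the chart type you want. For example, to build a bar chart of daily totals, click the last entry in the list.

Note: the field used for ordering must be of type date or int, which is why the data types were emphasized during the import.

4. Finally, click the [Dashboard] menu to build a dashboard, which can combine the searches saved in Discover and the charts saved in Visualize.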

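That completes the path from Hive data to a Kibana display. A few important reminders:

1. Data types matter most. If every column in the table is a string, the fields are also strings in ES, and charts cannot be ordered by them (the ordering is not wrong; rather, only date or int fields can be chosen for ordering). Review the table's column types before importing; if they are in order, the steps above will go smoothly.

2. My advice: everything here revolves around ES, so it helps to understand ES and what a mapping does. For Hive data imported into ES, define the index and type in ES ahead of time, that is, set a mapping for each field. Fields that are used heavily can be declared as multi_field, and ordinary fields can simply be set to not analyzed, which makes searching and aggregation easier. Here is a template: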

curl -XPUT 'http://localhost:9200/dms-visit_path' -d '{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
        "index.refresh_interval": "30s",
        "index.translog.flush_threshold_ops": "100000"
    }
}'

curl -XPUT 'http://10.130.2.46:9200/dms-visit_path/user/_mapping' -d '{
    "_source": {"compress": true},
    "properties": {
        "id": {
            "type": "string",
            "index" : "not_analyzed"
        },
        "start_url": {
            "type": "string",
            "index" : "not_analyzed"
        },
        "updatetime": {
            "format": "dateOptionalTime",
            "type": "date"
        },
        "partion_day": {
            "type": "long"
        },
        "path": {
            "type" : "multi_field",
            "fields" : {
                "path" : {"type" : "string", "index" : "analyzed"},
                "path.raw" : {"type" : "string", "index" : "not_analyzed"}
            }
        },
        "end_url": {
            "type": "string",
            "index" : "not_analyzed"
        },
        "date": {
            "type": "string",
            "index" : "not_analyzed"
        },
        "path_uv": {
            "type": "long"
        },
        "step": {
            "type": "string",
            "index" : "not_analyzed"
        },
        "path_pv": {
            "type": "long"
        }
    }
}'
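Writing such a mapping by hand for every table gets tedious, so a first draft could be generated from the Hive column list. The sketch below is an illustration under assumptions, not the author's method: es_field_json and build_properties are hypothetical helpers, and the Hive-to-ES type choices simply follow the template above (integers to long, timestamps to date, everything else to a not_analyzed string).

```shell
#!/bin/sh
# Hypothetical helper: print the mapping entry for one field.
# The type choices are assumptions modeled on the template in this post.
es_field_json() {
    name=$1
    hive_type=$2
    case "$hive_type" in
        tinyint|smallint|int|bigint)
            printf '"%s": {"type": "long"}' "$name" ;;
        float|double)
            printf '"%s": {"type": "double"}' "$name" ;;
        timestamp|date)
            printf '"%s": {"type": "date", "format": "dateOptionalTime"}' "$name" ;;
        *)
            printf '"%s": {"type": "string", "index": "not_analyzed"}' "$name" ;;
    esac
}

# Hypothetical helper: read "name type" pairs from stdin, one per line,
# and print a {"properties": {...}} body for the _mapping call.
build_properties() {
    first=1
    printf '{"properties": {'
    while read -r name hive_type; do
        [ -n "$name" ] || continue
        [ "$first" -eq 1 ] || printf ', '
        es_field_json "$name" "$hive_type"
        first=0
    done
    printf '}}\n'
}
```

The generated body is only a starting point: fields that need full-text search, such as path in the template, would still have to be changed to multi_field by hand before sending the mapping to ES.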

1 0