Installing and Verifying a Druid Cluster


Contents
1. Environment and architecture
2. Druid installation
3. Druid configuration
4. Overlord: indexing a JSON file
5. Overlord: indexing a CSV file

1. Druid environment and architecture
Environment:
CentOS 6.5
5 nodes, 32 GB RAM / 8 cores each
ZooKeeper 3.4.5
Druid 0.9.2
Hadoop 2.6.5
JDK 1.7
Architecture (roles per node):
10.20.23.42  Broker, Real-time, DataNode, NodeManager, QuorumPeerMain
10.20.23.29  MiddleManager, DataNode, NodeManager
10.20.23.38  Overlord, DataNode, NodeManager, QuorumPeerMain
10.20.23.82  Coordinator, NameNode, ResourceManager
10.20.23.41  Historical, DataNode, NodeManager, QuorumPeerMain

2. Druid installation
Hadoop installation is not covered here; we initially tried Hadoop 2.3.0 without success, so we switched to 2.6.5.
The procedure is the same as for a single-node setup.

1. Unpack the distribution

2. Copy files
Copy the four Hadoop configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) into ${DRUID_HOME}/conf/druid/_common.

3. Create a directory and copy the jars
Under ${DRUID_HOME}/hadoop-dependencies/hadoop-client, create a directory named 2.6.5 (use your Hadoop version number as the name) and copy the Hadoop jars into it. Steps 1 through 3 are sketched below.
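A minimal sketch of steps 1 through 3, assuming the standard druid-0.9.2-bin.tar.gz tarball and that $HADOOP_HOME points at the Hadoop installation (exact jar locations vary by Hadoop layout):

# Step 1: unpack the Druid distribution
tar -xzf druid-0.9.2-bin.tar.gz
export DRUID_HOME=$PWD/druid-0.9.2

# Step 2: Hadoop config files into _common
cp $HADOOP_HOME/etc/hadoop/core-site.xml \
   $HADOOP_HOME/etc/hadoop/hdfs-site.xml \
   $HADOOP_HOME/etc/hadoop/mapred-site.xml \
   $HADOOP_HOME/etc/hadoop/yarn-site.xml \
   $DRUID_HOME/conf/druid/_common/

# Step 3: version-named directory for the Hadoop client jars
mkdir -p $DRUID_HOME/hadoop-dependencies/hadoop-client/2.6.5
# ...then copy your Hadoop client jars into that directory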

4. Edit the configuration files

Note: the configuration is fiddly; a single misconfigured property is enough to stop tasks from running.

The common configuration, conf/druid/_common/common.runtime.properties:

# Extensions: use druid-hdfs-storage and mysql-metadata-storage
druid.extensions.loadList=["druid-hdfs-storage","mysql-metadata-storage"]

# Zookeeper
druid.zk.service.host=10.20.23.82:2181
druid.zk.paths.base=/druid/cluster

# MySQL metadata store
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://10.20.23.42:3306/druid
druid.metadata.storage.connector.user=root
druid.metadata.storage.connector.password=123456

# Deep storage
# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the cp):
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments

# Indexing service logs
# For HDFS (make sure to include the HDFS extension and that your Hadoop config files are on the cp):
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
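The properties above assume the druid database already exists in MySQL. A minimal sketch of creating it, assuming root access to the MySQL instance on 10.20.23.42 (utf8 is the character set the Druid docs recommend):

mysql -h 10.20.23.42 -u root -p -e \
  "CREATE DATABASE IF NOT EXISTS druid DEFAULT CHARACTER SET utf8;"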

Broker configuration

For the broker, size the memory allocation to your hardware. Add a druid.host property (the address of the node the broker actually runs on) and change the -Duser.timezone value: Druid's default timezone is Z (UTC), so we append +0800 for local time.

[hadoop@SZB-L0038784 broker]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=4096m
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 broker]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/broker
druid.port=8082

# HTTP server threads
druid.broker.http.numConnections=5
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7

# Query cache
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=2000000000
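A sanity check on the sizing: per the 0.9.x docs the broker needs direct memory of at least druid.processing.buffer.sizeBytes × (druid.processing.numThreads + 1), which here is 536870912 × (7 + 1) = 4294967296 bytes = 4096 MB, exactly the -XX:MaxDirectMemorySize=4096m above.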

Coordinator configuration

Same pattern as the broker: size the memory to the machine, add druid.host, and set -Duser.timezone=UTC+0800.

[hadoop@SZB-L0038784 coordinator]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=var/druid/derby.log
[hadoop@SZB-L0038784 coordinator]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/coordinator
druid.port=18091
druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
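druid.coordinator.startDelay and druid.coordinator.period are ISO-8601 durations, so PT30S means 30 seconds; the start delay gives ZooKeeper state time to propagate before the coordinator begins managing segments.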

Historical configuration

Same pattern: size the memory, add druid.host, and set -Duser.timezone=UTC+0800.

[hadoop@SZB-L0038784 historical]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-XX:MaxDirectMemorySize=4960m
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 historical]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/historical
druid.port=8083

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=7

# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize"\:130000000000}]
druid.server.maxSize=130000000000
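Note that druid.server.maxSize (the total segment bytes this node announces it can serve) matches the sum of the maxSize values in druid.segmentCache.locations, 130 GB here; keep the two in step if you change one.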

MiddleManager configuration

Same pattern: size the memory, add druid.host, and set -Duser.timezone=UTC+0800.
The 2.6.5 in hadoop-client:2.6.5 must match the name of the directory created in step 3.

[hadoop@SZB-L0038784 middleManager]$ cat jvm.config
-server
-Xms64m
-Xmx64m
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 middleManager]$ cat runtime.properties
druid.service=druid/middleManager
druid.port=8091

# Number of tasks per middleManager
druid.worker.capacity=3

# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC+0800 -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task

# HTTP server threads
druid.server.http.numThreads=25

# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numThreads=2

# Hadoop indexing
druid.host=10.20.23.82
druid.indexer.task.hadoopWorkingPath=/druid/hadoop-tmp
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.6.5"]
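The middleManager JVM itself is tiny (-Xmx64m) because the real work happens in the peons it forks: with druid.worker.capacity=3 and -Xmx2g in druid.indexer.runner.javaOpts, plan on at least 3 × 2 GB = 6 GB of heap for concurrently running tasks on this node.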

Overlord configuration

Same pattern: size the memory, add druid.host, and set -Duser.timezone=UTC+0800.

[hadoop@SZB-L0038784 overlord]$ cat jvm.config
-server
-Xms1g
-Xmx1g
-Duser.timezone=UTC+0800
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[hadoop@SZB-L0038784 overlord]$ cat runtime.properties
druid.host=10.20.23.82
druid.service=druid/overlord
druid.port=8090
druid.indexer.queue.startDelay=PT30S
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata

5. Distribute the installation to the other machines with scp (see the sketch below)
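A sketch of the distribution step, assuming the configs were edited on 10.20.23.82 under the hadoop user's home directory:

for host in 10.20.23.29 10.20.23.38 10.20.23.41 10.20.23.42; do
  scp -r ~/druid-0.9.2 hadoop@$host:~/
done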
6. Start each process on its designated machine:

java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" io.druid.cli.Main server historical
java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" io.druid.cli.Main server broker
java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" io.druid.cli.Main server overlord
java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
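These commands run in the foreground; for anything longer than a quick test, background each one on its host, e.g.:

nohup java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" io.druid.cli.Main server broker > broker.log 2>&1 &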

The web consoles are reachable at the following two URLs:
http://10.20.23.82:18091/#/ (coordinator console)
http://10.20.23.38:8090/console.html (overlord console)
7. Create the corresponding HDFS paths (see the command below):
/druid/indexing-logs
/druid/segments
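hadoop fs -mkdir -p creates both paths (and any missing parents) in one go:

hadoop fs -mkdir -p /druid/indexing-logs /druid/segments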

4. Overlord: indexing a JSON file

[hadoop@SZB-L0038784 hadoop-client]$ hadoop fs -ls /druid
drwxr-xr-x   - hadoop supergroup           0 2017-05-30 16:02 /druid/hadoop-tmp
drwxr-xr-x   - hadoop supergroup           0 2017-05-30 16:00 /druid/indexing-logs
drwxr-xr-x   - hadoop supergroup           0 2017-05-30 15:39 /druid/segments
-rw-r--r--   3 hadoop supergroup         153 2017-05-29 16:58 /druid/wikipedia_data.csv
-rw-r--r--   3 hadoop supergroup    17106256 2017-05-29 10:54 /druid/wikiticker-2015-09-12-sampled.json

Submit the indexing task to the overlord:

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json 10.20.23.38:8090/druid/indexer/v1/task

When the monitoring page shows SUCCESS, the indexing task has completed successfully.
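The submit call returns the task ID as JSON ({"task":"..."}). Besides the console, the status can also be polled from the overlord API; substitute the returned ID:

curl http://10.20.23.38:8090/druid/indexer/v1/task/<taskId>/status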
Query:

[hadoop@SZB-L0038787 druid-0.9.2]$ curl -L -H 'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://10.20.23.42:8082/druid/v2/?pretty
[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [
    { "page" : "Wikipedia:Vandalismusmeldung", "edits" : 20 },
    { "page" : "Jeremy Corbyn", "edits" : 18 },
    { "page" : "User talk:Dudeperson176123", "edits" : 17 },
    { "page" : "Utente:Giulio Mainardi/Sandbox", "edits" : 16 },
    { "page" : "User:Cyde/List of candidates for speedy deletion/Subpage", "edits" : 15 },
    { "page" : "Wikipédia:Le Bistro/12 septembre 2015", "edits" : 14 },
    { "page" : "Wikipedia:Administrators' noticeboard/Incidents", "edits" : 12 },
    { "page" : "Kim Davis (county clerk)", "edits" : 11 },
    { "page" : "The Naked Brothers Band (TV series)", "edits" : 10 },
    { "page" : "Гомосексуальный образ жизни", "edits" : 10 },
    { "page" : "Wikipedia:Administrator intervention against vandalism", "edits" : 9 },
    { "page" : "Wikipedia:De kroeg", "edits" : 9 },
    { "page" : "Wikipedia:Files for deletion/2015 September 12", "edits" : 9 },
    { "page" : "التهاب السحايا", "edits" : 9 },
    { "page" : "Chess World Cup 2015", "edits" : 8 },
    { "page" : "The Book of Souls", "edits" : 8 },
    { "page" : "Wikipedia:Requests for page protection", "edits" : 8 },
    { "page" : "328-я стрелковая дивизия (2-го формирования)", "edits" : 7 },
    { "page" : "Campanya dels Balcans (1914-1918)", "edits" : 7 },
    { "page" : "Homo naledi", "edits" : 7 },
    { "page" : "List of shipwrecks in August 1944", "edits" : 7 },
    { "page" : "User:Tokyogirl79/sandbox4", "edits" : 7 },
    { "page" : "Via Lliure", "edits" : 7 },
    { "page" : "Vorlage:Revert-Statistik", "edits" : 7 },
    { "page" : "Wikipedia:Löschkandidaten/12. September 2015", "edits" : 7 }
  ]
} ]

Pay special attention to the jobProperties block in the JSON spec: without it the task fails (the mapreduce.job.classloader settings isolate the job's classpath, avoiding dependency conflicts between Druid and the jars Hadoop ships).

The indexing spec:

[hadoop@SZB-L0038787 druid-0.9.2]$ cat quickstart/wikiticker-index.json
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/druid/wikiticker-2015-09-12-sampled.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"]
      },
      "parser" : {
        "type" : "hadoopyString",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel", "cityName", "comment", "countryIsoCode", "countryName",
              "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled",
              "metroCode", "namespace", "page", "regionIsoCode", "regionName", "user"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        { "name" : "count", "type" : "count" },
        { "name" : "added", "type" : "longSum", "fieldName" : "added" },
        { "name" : "deleted", "type" : "longSum", "fieldName" : "deleted" },
        { "name" : "delta", "type" : "longSum", "fieldName" : "delta" },
        { "name" : "user_unique", "type" : "hyperUnique", "fieldName" : "user" }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {
        "mapreduce.job.classloader": "true",
        "mapreduce.job.classloader.system.classes": "-javax.validation.,java.,javax.,org.apache.commons.logging.,org.apache.log4j.,org.apache.hadoop."
      }
    }
  }
}

The query JSON file:

[hadoop@SZB-L0038787 druid-0.9.2]$ cat quickstart/wikiticker-top-pages.json
{
  "queryType" : "topN",
  "dataSource" : "wikiticker",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "edits",
  "threshold" : 25,
  "aggregations" : [
    { "type" : "longSum", "name" : "edits", "fieldName" : "count" }
  ]
}

5. Overlord: indexing a CSV file
First, prepare some CSV data:

[hadoop@SZB-L0038787 data]$ cat test
2017-08-01T01:02:33Z,10202111900173056925,30202111900037998891,2020211,20202000434,2,1,B18,3,4,J,B,2020003088,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-0910:56:03+08:00,
2017-07-16T01:02:33Z,10202111900164385197,30202111900034745280,2020211,20202000434,2,1,B18,3,4,J,B,2020003454,,,,,,01,,00000655,,,,,-2000.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-04-1510:42:26+08:00,
2017-05-15T01:02:33Z,13024011900164473005,33024011900035728305,2302401,2302401,2,1,A01,2,1,G,H,2300000212,,,,30240061,,01,309,,,,,,59.25,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-04-1517:23:31+08:00,
2017-08-01T01:02:33Z,10202111900173999588,30202111900038540746,2020211,20202000434,2,1,B18,3,4,J,B,2020003155,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-1515:41:34+08:00,
2017-08-01T01:02:33Z,10202111900174309914,30202111900038542126,2020211,20202000434,2,1,B18,3,4,J,B,2020003155,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-1710:36:16+08:00,
2017-08-01T01:02:33Z,10202111900176540667,30202111900038893351,2020211,20202000434,2,1,B18,3,4,J,B,2020003155,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-2913:54:09+08:00,
2017-06-18T01:02:33Z,12078001900174397522,32078001900038476523,22078,22078002835,2,1,A56,2,2,C,A,2200041441,,,,20760002,,01,999,,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-06-1717:36:41+08:00,
2017-12-24T01:02:33Z,11414021900149429403,31414021900036312816,2141402,21414020238,2,1,A01,2,2,8,9,2141400018,,,,14140018,,01,402,,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2014-12-2612:15:31+08:00,
2017-06-01T01:02:33Z,10202111900165839017,30202111900035354013,2020211,20202000434,2,1,B18,3,4,J,B,2020003088,,,,,,01,,00000655,,,,,0.00,OLAPMAN,2017-01-0421:16:08+08:00,OLAPMAN,2017-01-0421:16:08+08:00,2015-04-2314:32:53+08:00,
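The file must also be uploaded to HDFS at the path referenced by the spec below:

hadoop fs -put test /druid/test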

Prepare the indexing spec for the CSV data:

[hadoop@SZB-L0038787 quickstart]$ cat test-index.json
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "test",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format" : "csv",
          "timestampSpec" : {
            "column" : "stat_date"
          },
          "columns" : [
            "stat_date", "policy_no", "endorse_no", "department_code",
            "sale_group_code", "business_type", "business_mode", "plan_code",
            "business_source_code", "business_source_detail_code",
            "channel_source_code", "channel_source_detail_code",
            "sale_agent_code", "primary_introducer_code", "renewal_type",
            "purchase_year", "agent_code", "partner_id", "currency_code",
            "parent_company_code", "broker_code", "dealer_code",
            "auto_series_id", "usage_attribute_code", "new_channel_ground_mark",
            "ply_prem_day", "created_by", "date_created", "updated_by",
            "date_updated", "underwrite_time", "partner_worknet_code"
          ],
          "dimensionsSpec" : {
            "dimensions" : [
              "department_code", "sale_group_code", "business_type",
              "business_mode", "plan_code", "business_source_code",
              "business_source_detail_code", "channel_source_code",
              "channel_source_detail_code", "sale_agent_code"
            ]
          }
        }
      },
      "metricsSpec": [
        { "type": "count", "name": "count" },
        { "type": "doubleSum", "name": "ply_prem_day", "fieldName": "ply_prem_day" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2017-05-15/2017-12-25"]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/druid/test"
      }
    },
    "tuningConfig" : {
      "type": "hadoop",
      "jobProperties" : {
        "mapreduce.job.classloader": "true",
        "mapreduce.job.classloader.system.classes": "-javax.validation.,java.,javax.,org.apache.commons.logging.,org.apache.log4j.,org.apache.hadoop."
      }
    }
  }
}
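The CSV task is submitted to the overlord exactly like the JSON one:

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/test-index.json 10.20.23.38:8090/druid/indexer/v1/task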

Prepare the query JSON for the CSV data:

[hadoop@SZB-L0038787 quickstart]$ cat test-top-pages.json
{
  "queryType" : "topN",
  "dataSource" : "test",
  "intervals" : ["2017-05-15/2017-12-25"],
  "granularity" : "all",
  "dimension" : "department_code",
  "metric" : "edits",
  "threshold" : 25,
  "aggregations" : [
    { "type" : "longSum", "name" : "edits", "fieldName" : "count" }
  ]
}
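And it is queried through the broker the same way; note that edits here is just a row count per department_code (a longSum over the count metric), the metric name being left over from the wikiticker example:

curl -L -H 'Content-Type: application/json' -XPOST --data-binary @quickstart/test-top-pages.json http://10.20.23.42:8082/druid/v2/?pretty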