Big Data Learning [09]: Presto 0.184 Cluster | Multiple Data Sources | Problems



Abstract: Download and install Presto 0.184, configure it in cluster mode, and test connectors to several data sources, including Hive and MySQL. Problems encountered during installation and configuration, and their fixes, are recorded at the end.

Prerequisites

For Hadoop and Hive setup, see:
Big Data Learning [04]: Hive installation and configuration:
http://blog.csdn.net/ld326/article/details/78023101
Big Data Learning [02]: Hadoop installation and configuration: http://blog.csdn.net/ld326/article/details/78004402

Official site

https://prestodb.io/

Download

https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.184/presto-server-0.184.tar.gz

Extract

tar zxvf presto-server-0.184.tar.gz

Create a data directory

In the same parent directory as presto-server-0.184, create a separate directory for data and logs, such as presto-data (this post later uses /home/hadoop/data/presto). Keeping it outside the installation directory means it does not need to be touched when Presto is upgraded later.

Configuration

The final layout of the configuration files:

etc
├── catalog
│ └── hive.properties
├── config.properties
├── jvm.config
├── log.properties
└── node.properties

Create an etc directory

Create an etc directory under the Presto home directory; it holds the following configuration files:
● Node Properties: environmental configuration specific to each node
● JVM Config: command line options for the Java Virtual Machine
● Config Properties: configuration for the Presto server
● Catalog Properties: configuration for Connectors (data sources)
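
For example (a minimal sketch; the paths assume the install location used throughout this post):

cd /home/hadoop/presto-server-0.184
mkdir -p etc/catalog    # catalog/ holds one .properties file per data source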

Node properties

Start with the node configuration by creating etc/node.properties. On any given machine, a node is a single installed instance of Presto. A simple etc/node.properties:
node.environment=pcluster
node.id=111
node.data-dir=/home/hadoop/data/presto
Remember to create the /home/hadoop/data/presto directory. Once Presto has run, it will have this structure:

data
└── presto
    ├── etc -> /home/hadoop/presto-server-0.184/etc
    ├── plugin -> /home/hadoop/presto-server-0.184/plugin
    └── var
        ├── log
        │   ├── http-request.log
        │   ├── launcher.log
        │   └── server.log
        └── run
            └── launcher.pid

● node.environment: The name of the environment. All Presto nodes in a cluster must have the same environment name.
● node.id: The unique identifier for this installation of Presto. This must be unique for every node. This identifier should remain consistent across reboots or upgrades of Presto. If running multiple installations of Presto on a single machine (i.e. multiple nodes on the same machine), each installation must have a unique identifier.
● node.data-dir: The location (filesystem path) of the data directory. Presto will store logs and other data here.
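
Since node.id must be unique on every node while the other two lines are identical, scripting the file helps. A minimal sketch, assuming passwordless SSH and that the two workers are reachable as hadoop02 and hadoop03 (hypothetical hostnames for 192.168.137.102/103):

# Write node.properties on each worker with its own node.id
for host in hadoop02 hadoop03; do
  ssh "$host" 'mkdir -p /home/hadoop/data/presto && {
      echo "node.environment=pcluster"
      echo "node.id=$(uuidgen)"   # fresh UUID per host; re-running this script would change it
      echo "node.data-dir=/home/hadoop/data/presto"
    } > /home/hadoop/presto-server-0.184/etc/node.properties'
done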

JVM config

File: etc/jvm.config
-server
-Xmx1G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
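
One compatibility note: -XX:+ExitOnOutOfMemoryError only exists in Java 8u92 and later, which also matches the minimum JVM Presto's documentation calls for, so it is worth checking before going on:

java -version    # should report a 64-bit 1.8.0_92 or later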

Configure the Presto server

File: etc/config.properties
A Presto server takes one of two roles: coordinator or worker.
Every Presto server can function as both a coordinator and a worker, but dedicating a single machine to only perform coordination work provides the best performance on larger clusters.
coordinator[192.168.137.101]:

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=1GB
query.max-memory-per-node=512MB
discovery-server.enabled=true
discovery.uri=http://hadoop01:8080

workers[192.168.137.102,192.168.137.103]:

coordinator=false
http-server.http.port=8080
query.max-memory=1GB
discovery.uri=http://hadoop01:8080

Or, to have one machine play both roles:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=1GB
query.max-memory-per-node=512MB
discovery-server.enabled=true
discovery.uri=http://hadoop01:8080

The only change here is node-scheduler.include-coordinator=true, so the coordinator machine also acts as a worker node.
Note: hadoop01 here is a hostname. Configuring an IP address instead causes errors (see Error 03 below).
Property notes:
● coordinator: Allow this Presto instance to function as a coordinator (accept queries from clients and manage query execution).
● node-scheduler.include-coordinator: Allow scheduling work on the coordinator. For larger clusters, processing work on the coordinator can impact query performance because the machine’s resources are not available for the critical task of scheduling, managing and monitoring query execution.
● http-server.http.port: Specifies the port for the HTTP server. Presto uses HTTP for all communication, internal and external.
● query.max-memory: The maximum amount of distributed memory that a query may use.
● query.max-memory-per-node: The maximum amount of memory that a query may use on any one machine.
● discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster. Every Presto instance will register itself with the Discovery service on startup. In order to simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded version of the Discovery service. It shares the HTTP server with Presto and thus uses the same port.
● discovery.uri: The URI to the Discovery server. Because we have enabled the embedded version of Discovery in the Presto coordinator, this should be the URI of the Presto coordinator. Replace example.net:8080 to match the host and port of the Presto coordinator. This URI must not end in a slash.

Configure logging

File: etc/log.properties
Contents:

com.facebook.presto=INFO
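
The levels are DEBUG, INFO, WARN and ERROR, and logger names are hierarchical class/package names, so verbosity can be raised for a single component. A sketch (the second line is an optional example, not required by this setup):

com.facebook.presto=INFO
com.facebook.presto.hive=DEBUG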

Configure data source connections (Catalog Properties)

Connectors
● 5.1. Accumulo Connector
● 5.2. Black Hole Connector
● 5.3. Cassandra Connector
● 5.4. Hive Connector
● 5.5. Hive Security Configuration
● 5.6. JMX Connector
● 5.7. Kafka Connector
● 5.8. Kafka Connector Tutorial
● 5.9. Local File Connector
● 5.10. Memory Connector
● 5.11. MongoDB Connector
● 5.12. MySQL Connector
● 5.13. PostgreSQL Connector
● 5.14. Redis Connector
● 5.15. SQL Server Connector
● 5.16. System Connector
● 5.17. Thrift Connector
● 5.18. TPCDS Connector
● 5.19. TPCH Connector

Configure the Hive catalog

vim etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop01:9083
hive.config.resources=/home/hadoop/hadoop/etc/hadoop/core-site.xml, /home/hadoop/apache-hive-1.2.1-bin/conf/hdfs-site.xml
hive.allow-drop-table=true

Note that hive-hadoop2 is not an arbitrary name: Presto only recognizes the connector names that ship with it, and anything else fails (see Error 01 below).
[Screenshot in the original post: table of Presto's Hive connector configuration properties]
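
Before starting the server, it can save a restart cycle to confirm that the metastore Thrift service configured above is actually reachable. A quick check (assumes nc is available and the hadoop01:9083 address from hive.properties):

nc -zv hadoop01 9083    # succeeds only if the metastore Thrift port is listening
# If it is not running, start it on the Hive side first, e.g.:
# nohup hive --service metastore &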

Start Presto

There are two ways to start Presto:
As a daemon: bin/launcher start
In the foreground: bin/launcher run
If Presto is run under a supervisor process, use the foreground mode (see the launcher reference below).
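
For reference, the launcher script offers a few more subcommands that are handy while iterating on the configuration; run them from the Presto home directory:

bin/launcher start      # run as a background daemon
bin/launcher status     # report whether Presto is running, and its PID
bin/launcher restart    # stop (if running), then start again
bin/launcher stop       # graceful shutdown
tail -f /home/hadoop/data/presto/var/log/server.log    # watch startup progress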

Presto command line interface (Presto CLI)

Download:
https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.184/presto-cli-0.184-executable.jar
Rename it to presto, make it executable with chmod +x, and run:

./presto --server localhost:8080 --catalog hive --schema default

Tip: Run the CLI with the --help option to see the available options.
Starting the CLI:

[hadoop@hadoop01 ~]$ ./presto.jar --server 192.168.137.101:8080 --catalog hive --schema default

--server must match the host and port configured in config.properties above.

The server log then shows Bootstrap dumping every Hive connector property, its default value, and the runtime value picked up from hive.properties (note that hive.config.resources, hive.metastore.uri and hive.allow-drop-table reflect the catalog file):

PROPERTY                                           DEFAULT     RUNTIME     DESCRIPTION
hive.allow-corrupt-writes-for-testing              false       false       Allow Hive connector to write data even when data will likely be corrupt
hive.assume-canonical-partition-keys               false       false
hive.bucket-execution                              true        true        Enable bucket-aware execution: only use a single worker per bucket
hive.bucket-writing                                true        true        Enable writing to bucketed tables
hive.dfs.connect.max-retries                       5           5
hive.dfs.connect.timeout                           500.00ms    500.00ms
hive.dfs-timeout                                   60.00s      60.00s
hive.domain-compaction-threshold                   100         100         Maximum ranges to allow in a tuple domain without compacting it
hive.dfs.domain-socket-path                        null        null
hive.fs.cache.max-size                             1000        1000        Hadoop FileSystem cache size
hive.force-local-scheduling                        false       false
hive.hdfs.authentication.type                      NONE        NONE        HDFS authentication type
hive.hdfs.impersonation.enabled                    false       false       Should Presto user be impersonated when communicating with HDFS
hive.compression-codec                             GZIP        GZIP
hive.metastore.authentication.type                 NONE        NONE        Hive Metastore authentication type
hive.storage-format                                RCBINARY    RCBINARY
hive.immutable-partitions                          false       false       Can new data be inserted into existing partitions or existing unpartitioned tables
hive.dfs.ipc-ping-interval                         10.00s      10.00s
hive.max-concurrent-file-renames                   20          20
hive.max-initial-split-size                        32MB        32MB
hive.max-initial-splits                            200         200
hive.metastore-refresh-max-threads                 100         100
hive.max-outstanding-splits                        1000        1000
hive.metastore.partition-batch-size.max            100         100
hive.max-partitions-per-scan                       100000      100000      Maximum allowed partitions for a single table scan
hive.max-partitions-per-writers                    100         100         Maximum number of partitions per writer
hive.max-split-iterator-threads                    1000        1000
hive.max-split-size                                64MB        64MB
hive.metastore-cache-maximum-size                  10000       10000
hive.metastore-cache-ttl                           0.00s       0.00s
hive.metastore-refresh-interval                    0.00s       0.00s
hive.metastore.thrift.client.socks-proxy           null        null
hive.metastore-timeout                             10.00s      10.00s
hive.metastore.partition-batch-size.min            10          10
hive.orc.bloom-filters.enabled                     false       false
hive.orc.default-bloom-filter-fpp                  0.05        0.05        ORC Bloom filter false positive probability
hive.orc.max-buffer-size                           8MB         8MB
hive.orc.max-merge-distance                        1MB         1MB
hive.orc.max-read-block-size                       16MB        16MB
hive.orc.optimized-writer.enabled                  false       false
hive.orc.stream-buffer-size                        8MB         8MB
hive.parquet-optimized-reader.enabled              false       false
hive.parquet-predicate-pushdown.enabled            false       false
hive.per-transaction-metastore-cache-maximum-size  1000        1000
hive.rcfile-optimized-writer.enabled               true        true
hive.rcfile.writer.validate                        false       false       Validate RCFile after write by re-reading the whole file
hive.recursive-directories                         false       false
hive.config.resources                              null        [/home/hadoop/hadoop/etc/hadoop/core-site.xml, /home/hadoop/apache-hive-1.2.1-bin/conf/hdfs-site.xml]
hive.respect-table-format                          true        true        Should new partitions be written using the existing table format or the default Presto format
hive.skip-deletion-for-alter                       false       false       Skip deletion of old partition data when a partition is deleted and then inserted in the same transaction
hive.table-statistics-enabled                      true        true        Enable use of table statistics
hive.time-zone                                     PRC         PRC
hive.orc.use-column-names                          false       false       Access ORC columns using names from the file
hive.parquet.use-column-names                      false       false       Access Parquet columns using names from the file
hive.dfs.verify-checksum                           true        true
hive.write-validation-threads                      16          16          Number of threads used for verifying data after a write
hive.non-managed-table-writes-enabled              false       false       Enable writes to non-managed (external) tables
hive.s3.pin-client-to-current-region               false       false       Should the S3 client be pinned to the current EC2 region
hive.s3.aws-access-key                             null        null
hive.s3.aws-secret-key                             [REDACTED]  [REDACTED]
hive.s3.connect-timeout                            5.00s       5.00s
hive.s3.encryption-materials-provider              null        null        Use a custom encryption materials provider for S3 data encryption
hive.s3.endpoint                                   null        null
hive.s3.kms-key-id                                 null        null        Use an AWS KMS key for S3 data encryption
hive.s3.max-backoff-time                           10.00m      10.00m
hive.s3.max-client-retries                         5           5
hive.s3.max-connections                            500         500
hive.s3.max-error-retries                          10          10
hive.s3.max-retry-time                             10.00m      10.00m
hive.s3.multipart.min-file-size                    16MB        16MB        Minimum file size for an S3 multipart upload
hive.s3.multipart.min-part-size                    5MB         5MB         Minimum part size for an S3 multipart upload
hive.s3.signer-type                                null        null
hive.s3.socket-timeout                             5.00s       5.00s
hive.s3.sse.enabled                                false       false       Enable S3 server side encryption
hive.s3.sse.kms-key-id                             null        null        KMS Key ID to use for S3 server-side encryption with KMS-managed key
hive.s3.sse.type                                   S3          S3          Key management type for S3 server-side encryption (S3 or KMS)
hive.s3.ssl.enabled                                true        true
hive.s3.staging-directory                          /tmp        /tmp        Temporary directory for staging files before uploading to S3
hive.s3.use-instance-credentials                   true        true
hive.s3.user-agent-prefix                                                  The user agent prefix to use for S3 calls
hive.metastore.uri                                 null        [thrift://hadoop01:9083]    Hive metastore URIs (comma separated)
hive.metastore                                     thrift      thrift
hive.allow-add-column                              false       false       Allow Hive connector to add column
hive.allow-drop-column                             false       false       Allow Hive connector to drop column
hive.allow-drop-table                              false       true        Allow Hive connector to drop table
hive.allow-rename-column                           false       false       Allow Hive connector to rename column
hive.allow-rename-table                            false       false       Allow Hive connector to rename table
hive.security                                      legacy      legacy
Queries:

presto:default> show tables;
   Table
-----------
 dual
 student
 student01
 student02
 student03
 student04
 student05
(7 rows)
Query 20170924_141223_00007_bg85h, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:01 [7 rows, 175B] [11 rows/s, 290B/s]

presto:default> select * from student05;
 student_id | student_name |    born    | sex
------------+--------------+------------+-----
 dlmu_01    | happyprince  | 2015-07-08 |   0
(1 row)
Query 20170924_141229_00008_bg85h, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:01 [1 rows, 358B] [1 rows/s, 448B/s]

presto:default> show tables;
   Table
-----------
 dual
 student
 student01
 student02
 student03
 student04
 student05
(7 rows)
Query 20170924_145732_00004_uaaik, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:01 [7 rows, 175B] [9 rows/s, 238B/s]

presto:default> select * from student;
 student_id | student_name
------------+--------------
 dlmu_01    | happyprince
(1 row)
Query 20170924_145746_00005_uaaik, FINISHED, 2 nodes
Splits: 17 total, 17 done (100.00%)
0:05 [1 rows, 20B] [0 rows/s, 3B/s]
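
Besides the interactive prompt, the CLI can run a single statement and exit with --execute, which is convenient for scripted checks; for example, confirming that every node has registered with the coordinator (system.runtime.nodes is built in; ./presto is the renamed CLI jar from above):

./presto --server hadoop01:8080 --execute 'SELECT node_id, http_uri, coordinator FROM system.runtime.nodes;'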

Configure other connectors

Query data through the System connector

(This connector is built in; no configuration is needed.)
presto> show schemas from system;

       Schema
--------------------
 information_schema
 jdbc
 metadata
 runtime
(4 rows)
Query 20170924_154001_00002_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:02 [4 rows, 57B] [2 rows/s, 35B/s]
presto> show tables from system.information_schema;
          Table
-------------------------
 __internal_partitions__
 columns
 schemata
 table_privileges
 tables
 views
(6 rows)
Query 20170924_154021_00003_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:02 [6 rows, 233B] [3 rows/s, 136B/s]
presto> select * from system.information_schema.columns;
 table_catalog | table_schema | table_name | column_name | ordinal_position | column_default | is_nullable | data_type | comment | extra_info
---------------+--------------+------------+-------------+------------------+----------------+-------------+-----------+---------+------------
 system        | runtime      | queries    | node_id     |                1 | NULL           | YES         | varchar   | NULL    | NULL
 system        | runtime      | queries    | query_id    |                2 | NULL           | YES         | varchar   | NULL    | NULL
 system        | runtime      | queries    | state       |                3 | NULL           | YES         | varchar   | NULL    | NULL
 system        | runtime      | queries    | user        |                4 | NULL           | YES         | varchar   | NULL    | NULL
 system        | runtime      | queries    | source      |                5 | NULL           | YES         | varchar   | NULL    | NULL
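
The runtime schema is the most useful one day to day; for instance, cluster membership and query state can be checked with statements like these (a sketch; the queries columns appear in the information_schema listing above):

presto> SELECT node_id, state FROM system.runtime.nodes;
presto> SELECT query_id, state, user FROM system.runtime.queries;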

Configure JMX

[hadoop@hadoop01 catalog]$ vim jmx.properties

connector.name=jmx
jmx.dump-tables=java.lang:type=Runtime,com.facebook.presto.execution.scheduler:name=NodeScheduler
jmx.dump-period=10s
jmx.max-entries=86400
presto> SHOW TABLES FROM jmx.current;
                                                                     Table
--------------------------------------------------------------------------------------------------------------------------------------------------
 com.facebook.presto.execution.executor:name=multilevelsplitqueue
 com.facebook.presto.execution.executor:name=taskexecutor
 com.facebook.presto.execution.resourcegroups:name=internalresourcegroupmanager
 com.facebook.presto.execution.scheduler:name=nodescheduler
 com.facebook.presto.execution.scheduler:name=splitschedulerstats
 com.facebook.presto.execution:name=queryexecution
 com.facebook.presto.execution:name=querymanager
 com.facebook.presto.execution:name=remotetaskfactory
 com.facebook.presto.execution:name=taskmanager
 com.facebook.presto.execution:type=queryqueue,name=global,expansion=global
 com.facebook.presto.failuredetector:name=heartbeatfailuredetector
 com.facebook.presto.hive.metastore:type=cachinghivemetastore,name=hive
 com.facebook.presto.hive.metastore:type=thrifthivemetastore,name=hive
 com.facebook.presto.hive:type=fileformatdatasourcestats,name=hive
 com.facebook.presto.hive:type=namenodestats,name=hive
 com.facebook.presto.hive:type=prestos3filesystem,name=hive
 com.facebook.presto.memory:name=clustermemorymanager
 com.facebook.presto.memory:type=clustermemorypool,name=general
 com.facebook.presto.memory:type=clustermemorypool,name=reserved
 com.facebook.presto.memory:type=clustermemorypool,name=system
 com.facebook.presto.memory:type=memorypool,name=general
 com.facebook.presto.memory:type=memorypool,name=reserved
 com.facebook.presto.memory:type=memorypool,name=system
 com.facebook.presto.metadata:name=discoverynodemanager
 com.facebook.presto.operator.index:name=indexjoinlookupstats
 com.facebook.presto.security:name=accesscontrolmanager
 com.facebook.presto.server.remotetask:name=remotetaskstats
 com.facebook.presto.server:name=asynchttpexecutionmbean
 com.facebook.presto.server:name=exchangeexecutionmbean
 com.facebook.presto.server:name=taskresource
 com.facebook.presto.spiller:name=spillerfactory
 com.facebook.presto.sql.gen:name=expressioncompiler
 com.facebook.presto.sql.gen:name=joincompiler
 com.facebook.presto.sql.gen:name=joinfilterfunctioncompiler
 com.facebook.presto.sql.gen:name=joinprobecompiler
 com.facebook.presto.sql.gen:name=orderingcompiler
 com.facebook.presto.sql.gen:name=pagefunctioncompiler
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=addintermediateaggregations
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=aggregationexpressionrewrite
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=applyexpressionrewrite
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=createpartialtopn
 com.facebook.presto.sql.planner.iterative:name=iterativeoptimizer,rule=eliminatecrossjoins
Query 20170924_154511_00014_hnbrr, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:03 [177 rows, 15.7KB] [67 rows/s, 5.98KB/s]
presto> SELECT node, vmname, vmversion
     -> FROM jmx.current."java.lang:type=runtime";
 node  |              vmname               | vmversion
-------+-----------------------------------+------------
 12345 | Java HotSpot(TM) 64-Bit Server VM | 25.144-b01
 111   | Java HotSpot(TM) 64-Bit Server VM | 25.144-b01
 222   | Java HotSpot(TM) 64-Bit Server VM | 25.144-b01
(3 rows)
Query 20170924_154519_00015_hnbrr, FINISHED, 3 nodes
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 140B] [11 rows/s, 523B/s]
presto> SELECT openfiledescriptorcount, maxfiledescriptorcount
     -> FROM jmx.current."java.lang:type=operatingsystem";
 openfiledescriptorcount | maxfiledescriptorcount
-------------------------+------------------------
                     978 |                   4096
                     919 |                   4096
                     925 |                   4096
(3 rows)
Query 20170924_154527_00016_hnbrr, FINISHED, 3 nodes
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 48B] [8 rows/s, 130B/s]

Configure the Local File Connector

The local file connector allows querying data stored on the local file system of each worker.
Create the file etc/catalog/localfile.properties:

connector.name=localfile

Configuration properties

Property Name                          Description
presto-logs.http-request-log.location  Directory or file where HTTP request logs are written
presto-logs.http-request-log.pattern   If the log location is a directory this glob is used to match file names in the directory

Local File Connector Schemas and Tables

The local file connector provides a single schema named logs. You can see all the available tables by running SHOW TABLES:

SHOW TABLES FROM localfile.logs;

http_request_log: this table contains the HTTP request logs from each node on the cluster.
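
The catalog file created above leaves the log location unset. A sketch that points the connector at Presto's own HTTP request log, reusing the data directory from earlier (the property name comes from the table above; the path is specific to this cluster):

connector.name=localfile
presto-logs.http-request-log.location=/home/hadoop/data/presto/var/log/http-request.log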

Test results:

presto> show schemas from localfile;
       Schema
--------------------
 information_schema
 logs
(2 rows)
Query 20170924_154211_00006_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [2 rows, 32B] [4 rows/s, 79B/s]
presto> show tables from localfile.logs;
      Table
------------------
 http_request_log
(1 row)
Query 20170924_154222_00007_hnbrr, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 30B] [2 rows/s, 89B/s]
presto> show tables from localfile.information_schema;
          Table
-------------------------
 __internal_partitions__
 columns
 schemata
 table_privileges
 tables
 views
(6 rows)
Query 20170924_154232_00008_hnbrr, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [7 rows, 263B] [19 rows/s, 728B/s]
presto> select * from localfile.information_schema.tables;
 table_catalog |    table_schema    |       table_name        | table_type
---------------+--------------------+-------------------------+------------
 localfile     | information_schema | columns                 | BASE TABLE
 localfile     | information_schema | tables                  | BASE TABLE
 localfile     | information_schema | views                   | BASE TABLE
 localfile     | information_schema | schemata                | BASE TABLE
 localfile     | information_schema | __internal_partitions__ | BASE TABLE
 localfile     | information_schema | table_privileges        | BASE TABLE
 localfile     | logs               | http_request_log        | BASE TABLE
(7 rows)
Query 20170924_154251_00009_hnbrr, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [7 rows, 466B] [32 rows/s, 2.11KB/s]

Configure and test a MySQL catalog

[hadoop@hadoop01 catalog]$ vim mysql.properties

Contents:

connector.name=mysql
connection-url=jdbc:mysql://hadoop01:3306
connection-user=root
connection-password=AAAaaa111

Test output:

presto> SHOW SCHEMAS FROM mysql;
       Schema
--------------------
 hive
 information_schema
 test
(3 rows)
Query 20170924_152408_00007_r5qdw, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [3 rows, 41B] [10 rows/s, 142B/s]
presto> SHOW tables FROM mysql.hive;
           Table
---------------------------
 bucketing_cols
 cds
 columns_v2
 database_params
 dbs
 func_ru
 funcs
 global_privs
 part_col_stats
 partition_key_vals
 partition_keys
 partition_params
 partitions
 roles
 sd_params
 sds
 sequence_table
 serde_params
 serdes
 skewed_col_names
 skewed_col_value_loc_map
 skewed_string_list
 skewed_string_list_values
 skewed_values
 sort_cols
 tab_col_stats
 table_params
 tbls
 version
(29 rows)
Query 20170924_152420_00008_r5qdw, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [29 rows, 737B] [66 rows/s, 1.66KB/s]
presto> desc mysql.hive.version;
     Column      |     Type     | Extra | Comment
-----------------+--------------+-------+---------
 ver_id          | bigint       |       |
 schema_version  | varchar(127) |       |
 version_comment | varchar(255) |       |
(3 rows)
Query 20170924_152526_00009_r5qdw, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:01 [3 rows, 215B] [2 rows/s, 173B/s]
presto> select * from mysql.hive.version;
 ver_id | schema_version |             version_comment
--------+----------------+-----------------------------------------
      1 | 1.2.0          | Set by MetaStore hadoop@192.168.137.101
(1 row)
Query 20170924_152559_00010_r5qdw, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:02 [1 rows, 0B] [0 rows/s, 0B/s]
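
With both catalogs registered, a single statement can now combine Hive and MySQL data, which is the point of the exercise. A sketch joining two tables shown above (the cross join is purely illustrative):

./presto --server hadoop01:8080 --execute "
SELECT s.student_id, s.student_name, v.schema_version
FROM hive.default.student AS s
CROSS JOIN mysql.hive.version AS v;"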

Limitations of the MySQL connector
[Screenshot in the original post: limitations of using the MySQL connector]

Problems

Error 01

java.lang.IllegalArgumentException: No factory for connector hive
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
    at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:170)
    at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:99)
    at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:77)
    at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:120)
    at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:67)

This is caused by writing the connector name as hive. Remember: it must be hive-hadoop2; Presto does not recognize other names.
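
The names Presto does accept correspond to the plugin directories shipped with the server, so a quick listing shows the valid choices (directory names largely match connector.name values):

ls /home/hadoop/presto-server-0.184/plugin/    # hive-hadoop2, mysql, jmx, localfile, ...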

Error 02

2017-09-24T22:25:57.586+0800    ERROR   main    com.facebook.presto.server.PrestoServer Unable to create injector, see the following errors:

1) Error: Invalid configuration property node.environment: is malformed (for class io.airlift.node.NodeConfig.environment)

2) Configuration property 'ordinator' was not used
      at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:235)

2 errors

com.google.inject.CreationException: Unable to create injector, see the following errors:

1) Error: Invalid configuration property node.environment: is malformed (for class io.airlift.node.NodeConfig.environment)

2) Configuration property 'ordinator' was not used
      at io.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:235)

2 errors
    at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:466)
    at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:155)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:107)
    at com.google.inject.Guice.createInjector(Guice.java:96)
    at io.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:242)
    at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:116)
    at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:67)

Fix the node properties: the environment name should be a simple string of letters and the node ID a number (or at least unique per node); alternatively, a property name is misspelled. Check the configuration files for mistyped or missing letters in property names. In this case, the 'ordinator' was not used message came from dropping letters from coordinator while copying the configuration.

Error 03

2017-09-24T22:40:08.513+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Cannot connect to discovery server for announce: Announcement failed for http://192.169.137.101:8080
2017-09-24T22:40:08.514+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 1.16s. Next request will happen within 0.00s
2017-09-24T22:40:09.278+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 744.71ms. Next request will happen within 1.00ms
2017-09-24T22:40:09.539+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 258.22ms. Next request will happen within 2.00ms
2017-09-24T22:40:10.282+0800    ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 739.25ms. Next request will happen within 4.00ms
2017-09-24T22:40:10.515+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 223.25ms. Next request will happen within 8.00ms
2017-09-24T22:40:10.562+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 30.86ms. Next request will happen within 16.00ms
2017-09-24T22:40:11.293+0800    ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 698.12ms. Next request will happen within 32.00ms
2017-09-24T22:40:11.534+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 175.65ms. Next request will happen within 64.00ms
2017-09-24T22:40:12.362+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 699.87ms. Next request will happen within 128.00ms
2017-09-24T22:40:12.665+0800    ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 45.73ms. Next request will happen within 256.00ms
2017-09-24T22:40:13.600+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 422.26ms. Next request will happen within 512.00ms
2017-09-24T22:40:14.667+0800    ERROR   Announcer-1 io.airlift.discovery.client.Announcer   Service announcement failed after 66.66ms. Next request will happen within 1000.00ms
2017-09-24T22:40:16.182+0800    ERROR   Announcer-2 io.airlift.discovery.client.Announcer   Service announcement failed after 514.05ms. Next request will happen within 1000.00ms
2017-09-24T22:40:17.604+0800    ERROR   Announcer-0 io.airlift.discovery.client.Announcer   Service announcement failed after 421.29ms. Next request will happen within 1000.00ms

Changing the IP address in discovery.uri to the hostname fixed this; a raw IP is not accepted here. (Note also that the URI in the log reads 192.169.137.101, apparently a typo for 192.168.137.101, which by itself would make the announcement fail.)
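
To rule this class of problem out in advance, verify that the discovery host resolves the same way on every node:

getent hosts hadoop01      # should print the coordinator's address on each node
grep hadoop01 /etc/hosts   # confirm the static mapping if DNS is not used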

Also, when the coordinator node is restarted, the other nodes first report that it cannot be found, and then recover (screenshot of the log in the original post).
Once this Presto engine is configured, it can query the data no matter what form the underlying source takes. For a Windows GUI in the style of Oracle's PL/SQL tools, consider SQuirreL SQL Client; see:
Connecting to Presto: SQuirreL SQL Client installation and configuration:
http://blog.csdn.net/ld326/article/details/77987507

[Author: happyprince, http://blog.csdn.net/ld326/article/details/78088863]
