Apache Phoenix Basic Operations (1)

Source: Internet | Published by: HSBC Software Development Center | Editor: 程序博客网 | Date: 2024/05/16 12:07

In the previous post (http://blog.csdn.net/jiangshouzhuang/article/details/52370765) we deployed Phoenix and confirmed that it passed all our tests.

This post introduces some basic Phoenix operations.

1. How to output Hello World with Phoenix

1.1 Using the sqlline terminal

sqlline.py SZB-L0023780:2181:/hbase114

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> create table test (mykey integer not null primary key, mycolumn varchar);

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> upsert into test values(1,'Hello');

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> upsert into test values(2,'World!');

0:jdbc:phoenix:SZB-L0023780:2181:/hbase114> select * from test;

+-------+-----------+
| MYKEY | MYCOLUMN  |
+-------+-----------+
| 1     | Hello     |
| 2     | World!    |
+-------+-----------+

1.2 Accessing Phoenix from Java

Create a file named test2.java (the file name must match the public class name test2) with the following content:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class test2 {

    public static void main(String[] args) throws SQLException {
        // JDBC URL format: jdbc:phoenix:<ZooKeeper quorum>:<port>:<HBase root znode>
        String url = "jdbc:phoenix:SZB-L0023780:2181:/hbase114";
        // try-with-resources closes the connection and statements even if an exception is thrown
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement()) {
            // "if not exists" lets the program be run more than once
            stmt.executeUpdate("create table if not exists test2 (mykey integer not null primary key, mycolumn varchar)");
            stmt.executeUpdate("upsert into test2 values (1,'Hello')");
            stmt.executeUpdate("upsert into test2 values (2,'World!')");
            // Phoenix connections do not auto-commit by default, so commit the upserts explicitly
            con.commit();

            try (PreparedStatement statement = con.prepareStatement("select * from test2");
                 ResultSet rset = statement.executeQuery()) {
                while (rset.next()) {
                    System.out.println(rset.getString("mycolumn"));
                }
            }
        }
    }
}

Compile:

javac test2.java

Run the compiled program:

java -cp "../phoenix-4.8.0-HBase-1.1-client.jar:." test2

Output:

Hello

World!
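The program above hard-codes the values inside the SQL strings. A common JDBC alternative is a parameterized UPSERT through PreparedStatement, sketched below. It assumes the test2 table and connection URL from the example above; the class and method names (UpsertSketch, upsertSql, upsertRows) are hypothetical. Because a Phoenix connection does not auto-commit by default, the upserts are buffered on the client and sent when commit() is called.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class UpsertSketch {

    // Builds "UPSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
    static String upsertSql(String table, int columnCount) {
        StringBuilder sb = new StringBuilder("UPSERT INTO ").append(table).append(" VALUES (");
        for (int i = 0; i < columnCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(')').toString();
    }

    // Upserts each row with bound parameters, then commits once at the end.
    static void upsertRows(Connection con, String table, Object[][] rows) throws SQLException {
        if (rows.length == 0) {
            return;
        }
        try (PreparedStatement ps = con.prepareStatement(upsertSql(table, rows[0].length))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes start at 1
                }
                ps.executeUpdate();
            }
        }
        con.commit(); // auto-commit is off by default, so the buffered upserts are sent here
    }

    public static void main(String[] args) throws SQLException {
        // Assumes the test2 table from the example above already exists.
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:SZB-L0023780:2181:/hbase114")) {
            upsertRows(con, "test2", new Object[][] {{3, "Foo"}, {4, "Bar"}});
        }
    }
}
```

Binding values as parameters avoids building SQL by string concatenation, so values containing quotes cannot break the statement.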

2. How to bulk load data with Phoenix

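Phoenix provides two ways to load CSV data into a Phoenix table: the psql command, which loads the data single-threaded, and MapReduce-based bulk loading.

The psql approach suits data volumes of a few tens of MB, while the MapReduce-based approach is better for larger volumes.

Below we demonstrate loading CSV data into a Phoenix table with each approach.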

(1) Sample data: data.csv

12345,John,Doe

67890,Mary,Poppins

(2) SQL to create the table

CREATE TABLE example (
    my_pk bigint not null,
    m.first_name varchar(50),
    m.last_name varchar(50)
    CONSTRAINT pk PRIMARY KEY (my_pk)
);

(3) Loading with psql

bin/psql.py -t EXAMPLE SZB-L0023780:2181:/hbase114 data.csv
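psql.py loads the file through a single client. As a rough illustration of that single-threaded model (not psql.py's actual implementation), the sketch below reads data.csv line by line and upserts each row into EXAMPLE over JDBC. It assumes the plain three-column layout of the sample file (no quoted fields); the class name, method name and the commit interval COMMIT_EVERY are hypothetical choices for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class SimpleCsvLoader {

    // Arbitrary value for this sketch: commit every N rows so the
    // client-side mutation buffer does not grow without bound.
    static final int COMMIT_EVERY = 1000;

    // Splits one line of the sample file into {my_pk, first_name, last_name}.
    // Assumes plain comma-separated fields with no quoting or escaping.
    static Object[] parseLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Object[] {Long.parseLong(f[0].trim()), f[1], f[2]};
    }

    public static void main(String[] args) throws IOException, SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:SZB-L0023780:2181:/hbase114");
             // Column order of EXAMPLE: my_pk, m.first_name, m.last_name
             PreparedStatement ps = con.prepareStatement("UPSERT INTO EXAMPLE VALUES (?, ?, ?)");
             BufferedReader in = Files.newBufferedReader(Paths.get("data.csv"), StandardCharsets.UTF_8)) {
            String line;
            int pending = 0;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                Object[] row = parseLine(line);
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.executeUpdate(); // buffered on the client until commit()
                if (++pending == COMMIT_EVERY) {
                    con.commit();
                    pending = 0;
                }
            }
            con.commit(); // flush the remaining rows
        }
    }
}
```

Everything runs on one thread over one connection, which is why this style of loading tops out at modest data volumes.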

Usage examples for psql.py:

Examples:
psql my_ddl.sql
psql localhost my_ddl.sql
psql localhost my_ddl.sql my_table.csv
psql -t MY_TABLE my_cluster:1825 my_table2012-Q3.csv
psql -t MY_TABLE -h COL1,COL2,COL3 my_cluster:1825 my_table2012-Q3.csv
psql -t MY_TABLE -h COL1,COL2,COL3 -d : my_cluster:1825 my_table2012-Q3.csv

The main parameters are described below:

Parameter   Description
-t          Name of the table to load the data into. Defaults to the CSV file name; case sensitive.
-h          Overrides the column names to which the CSV data maps; case sensitive. The special value in-line indicates that the first line of the CSV file determines the columns to which the data maps.
-s          Run in strict mode, throwing an error on CSV parsing errors.
-d          Supply a custom delimiter or delimiters for CSV parsing.
-q          Supply a custom phrase delimiter (quote character); defaults to the double quote character.
-e          Supply a custom escape character; defaults to a backslash.
-a          Supply an array delimiter.
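To make -d, -q and -e concrete: the delimiter separates fields, the quote character lets a field contain the delimiter, and the escape character makes the next character literal. The sketch below is a minimal single-line parser written only to illustrate those three roles; it is not the parser psql.py uses, the names CsvLineParser and parse are hypothetical, and it does not handle multi-line quoted fields or array delimiters.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineParser {

    // Splits one CSV line into fields.
    // delimiter: field separator (cf. -d), quote: quote character (cf. -q),
    // escape: escape character (cf. -e). Multi-line quoted fields are not supported.
    static List<String> parse(String line, char delimiter, char quote, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // take the next character literally
            } else if (c == quote) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // doubled quote inside quotes becomes one literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString()); // end of the current field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        // Default-style settings: comma delimiter, double quote, backslash escape
        System.out.println(parse("12345,John,Doe", ',', '"', '\\'));
        // Custom delimiter ':' as with "-d :"; the quoted field and the escaped ':' stay intact
        System.out.println(parse("1:\"a:b\":c\\:d", ':', '"', '\\'));
    }
}
```

The first call prints [12345, John, Doe] and the second prints [1, a:b, c:d].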

(4) Loading with MapReduce

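For higher-throughput loading on a distributed cluster, the MapReduce loader is recommended. It first writes the data into HFiles and, once the HFiles have been created, loads them into the HBase table.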

The MapReduce loader is launched with the hadoop command and runs from the Phoenix client jar, as follows:

hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /data/example.csv

Note that the input file must be on HDFS, not on the local file system.

For example, in my environment I ran:

hadoop jar phoenix-4.8.0-HBase-1.1-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table EXAMPLE --input /okok/data.csv -z SZB-L0023780:2181:/hbase114

Part of the execution log:

mapreduce.AbstractBulkLoadTool: Loading HFiles from /tmp/94b60a06-86d8-49d7-a8d1-df5428971a33
mapreduce.AbstractBulkLoadTool: Loading HFiles for EXAMPLE from /tmp/94b60a06-86d8-49d7-a8d1-df5428971a33/EXAMPLE
mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://SZB-L0023776:8020/tmp/94b60a06-86d8-49d7-a8d1-df5428971a33/EXAMPLE/M/b456b2a2a5834b32aa8fb3463d3bfd76 first=\x80\x00\x00\x00\x00\x0009 last=\x80\x00\x00\x00\x00\x01\x092
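The first= and last= values in the last log line are the smallest and largest row keys in the HFile, printed in HBase's escaped binary form. They correspond to the primary keys 12345 and 67890 from data.csv: Phoenix stores a BIGINT as 8 big-endian bytes with the sign bit flipped, so that byte order matches numeric order. The sketch below reproduces both strings with the JDK only; encodeBigint and toStringBinary are hypothetical re-implementations written for this illustration (modeled on Phoenix's BIGINT encoding and HBase's Bytes.toStringBinary), not calls into either library.

```java
import java.nio.ByteBuffer;

class RowKeyDemo {

    // Encodes a long the way Phoenix serializes a BIGINT key (per the assumption above):
    // flip the sign bit, then write 8 bytes in big-endian order.
    static byte[] encodeBigint(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Formats bytes like HBase's Bytes.toStringBinary: letters, digits and common
    // punctuation are printed as-is, every other byte as \xHH.
    static String toStringBinary(byte[] bytes) {
        String printable = " `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?";
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            if ((ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
                    || printable.indexOf(ch) >= 0) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("first=" + toStringBinary(encodeBigint(12345L)));
        System.out.println("last=" + toStringBinary(encodeBigint(67890L)));
    }
}
```

12345 is 0x3039, so its last two bytes 0x30 and 0x39 are the printable characters 0 and 9; 67890 is 0x010932, where 0x09 (a tab) is escaped and 0x32 prints as 2. The M directory in the HFile path is the column family m from the CREATE TABLE statement, upper-cased by Phoenix.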

Commonly used parameters of the MapReduce loader:

Parameter                 Description
-i, --input               Input CSV path (mandatory)
-t, --table               Phoenix table name (mandatory)
-a, --array-delimiter     Array element delimiter (optional)
-c, --import-columns      Comma-separated list of columns to be imported
-d, --delimiter           Input delimiter, defaults to comma
-g, --ignore-errors       Ignore input errors
-o, --output              Output path for temporary HFiles (optional)
-s, --schema              Phoenix schema name (optional)
-z, --zookeeper           ZooKeeper quorum to connect to (optional)
-it, --index-table        Index table name to load (optional)
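When the bulk load is triggered from a Java program instead of a shell, the same hadoop command can be started with ProcessBuilder. In the sketch below the jar name, HDFS path and ZooKeeper quorum come from the example run above; the helper bulkLoadCommand is hypothetical, only options from the table above are used, and it assumes the hadoop command is on the PATH of the machine running it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class BulkLoadLauncher {

    // Assembles: hadoop jar <clientJar> org.apache.phoenix.mapreduce.CsvBulkLoadTool
    //            --table <table> --input <hdfsInput> [--zookeeper <quorum>]
    static List<String> bulkLoadCommand(String clientJar, String table, String hdfsInput, String zookeeper) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(clientJar);
        cmd.add("org.apache.phoenix.mapreduce.CsvBulkLoadTool");
        cmd.add("--table");
        cmd.add(table);
        cmd.add("--input");
        cmd.add(hdfsInput); // must be an HDFS path, not a local file
        if (zookeeper != null) {
            cmd.add("--zookeeper"); // optional ZooKeeper quorum
            cmd.add(zookeeper);
        }
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> cmd = bulkLoadCommand("phoenix-4.8.0-HBase-1.1-client.jar", "EXAMPLE",
                "/okok/data.csv", "SZB-L0023780:2181:/hbase114");
        // inheritIO() streams the MapReduce job's log to this process's console
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        System.out.println("CsvBulkLoadTool exited with code " + exit);
    }
}
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths or quorum strings.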

Notes:

With psql.py, typical upsert throughput is 20k to 50k rows per second, depending on row size.
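As a back-of-the-envelope check of that rate: at 20k to 50k rows per second, loading N rows with psql.py takes roughly N/50,000 to N/20,000 seconds. The hypothetical helper below does this arithmetic; for 10 million rows it gives 200 to 500 seconds (about 3.3 to 8.3 minutes), a quick way to judge when the MapReduce loader becomes worthwhile.

```java
class PsqlLoadEstimate {

    // Typical psql.py upsert rates quoted above, in rows per second.
    static final double MIN_ROWS_PER_SEC = 20_000;
    static final double MAX_ROWS_PER_SEC = 50_000;

    // Returns {fastest, slowest} estimated load time in seconds for the given row count.
    static double[] estimateSeconds(long rows) {
        return new double[] {rows / MAX_ROWS_PER_SEC, rows / MIN_ROWS_PER_SEC};
    }

    public static void main(String[] args) {
        double[] s = estimateSeconds(10_000_000L);
        System.out.printf("10,000,000 rows: %.0f to %.0f seconds%n", s[0], s[1]);
    }
}
```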

Usage:

Create a table with psql:

psql.py [zookeeper] ../examples/web_stat.sql

Bulk upsert CSV data with psql:

psql.py [zookeeper] ../examples/web_stat.csv

0 0