GreenPlum 可读写外部表 实战
来源:互联网 发布:oppo手机的mac地址 编辑:程序博客网 时间:2024/06/03 06:04
Greenplum的gpload工具通过可读外部表和并行化文件服务器gpfdist(或gpfdists)来加载数据。gpload处理并行化的基于文件的外部表设置,以及允许我们使用单个YAML文件来配置数据格式,外部表定义,以及gpfdist或gpfdists。
要使用gpload工具有几个前提条件必须满足:
1. 使用gpload的服务器必须安装Python 2.6.2或者以上版本,pygresql工具和pyyaml工具(数据库服务端已经安装了python和需要的python库文件)
2. 必须装gpfdist程序,并把它设置到环境变量PATH中(可以从数据库服务器端的安装目录的子目录bin中拷贝该工具)
3. 确保使用gpload工具的ETL服务器与Greenplum所有服务器的联通性,以及机器名解析正确
gpload通过它的控制文件(YAML格式控制文件)指定装载的细节信息。所以编写控制文件是用好gpload的关键所在。gpload的控制文件采用YAML1.0.0.1文档格式编写,因此它必须得是有效的YAML格式。
查看gpload参数:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[gpadmin@db_m2_slave1 ~]$ gpload -?
gpload [options] -f configuration file
Options:
-h hostname: host to connect to
-p port: port to connect to
-U username: user to connect as
-d database: database to connect to
-W: force password authentication
-q: quiet mode
-D: do not actually load data
-v: verbose
-V: very verbose
-l logfile: log output to logfile
--no_auto_trans: do not wrap gpload in transaction
--gpfdist_timeout timeout: gpfdist timeout value
--version: print version number and exit
-?: help
[gpadmin@db_m2_slave1 ~]$
创建序列
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
create SEQUENCE gpload_audit_seq INCREMENT BY 1 MINVALUE 1 NO MAXVALUE START WITH 1;
select * from gpload_audit_seq ;
创建审计表,记录每一次gpload执行的开始和结束事件信息
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
create table gpload_audit(
id bigint,
state text,
mode text,
tablename text,
updatetime timestamp
) distributed by (id);
创建gpload加载的表
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE expenses
(
name text,
amount numeric,
category text,
des text,
update_date date
);
建立外部表:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
create writable external table t02 (
id integer,
name varchar(128)
)
location (
'gpfdist://101.254.31.72:8090/gpextdata/t21.txt',
'gpfdist://101.254.31.72:8090/gpextdata/t22.txt'
)
Format 'TEXT' (delimiter as E'|' null as '' escape 'OFF')
;
创建gpload.yml配置文件
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[gpadmin@db_m2_slave1 ~]$ more gpload.yml
---
VERSION: 1.0.0.1
DATABASE: yueworld_db
USER: mch
HOST: 10.254.2.111
PORT: 5432
GPLOAD:
INPUT:
- SOURCE:
LOCAL_HOSTNAME:
- 10.254.3.72
PORT: 8090
FILE:
- /home/gpadmin/gpdextdata/t01.txt
- /home/gpadmin/gpdextdata/t02.txt
- COLUMNS:
- id: int
- name: text
- FORMAT: text
- DELIMITER: ','
- ERROR_LIMIT: 25
OUTPUT:
- TABLE: t01
- MODE: INSERT
SQL:
- BEFORE: "INSERT INTO gpload_audit VALUES(nextval('gpload_audit_seq'), 'start', 'insert', 'zhangyun_schema.expenses', current_timestamp)"
- AFTER: "INSERT INTO gpload_audit VALUES(nextval('gpload_audit_seq'), 'end', 'insert', 'zhangyun_schema.expenses', current_timestamp)"
[gpadmin@db_m2_slave1 ~]$
在greenplum db库上赋予账号权限:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
yueworld_db=# grant select,insert on gpload_audit to mch;
GRANT
yueworld_db=# grant select,insert,update,delete on t01 to mch;
GRANT
yueworld_db=#
yueworld_db=# grant select,usage,update on gpload_audit_seq to mch;
GRANT
yueworld_db=# grant all on table t02 to mch;
GRANT
yueworld_db=#
去gpload服务器上,执行gpload:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[gpadmin@db_m2_slave1 ~]$ gpload -f g2.yml
2017-05-15 17:52:30|INFO|gpload session started 2017-05-15 17:52:30
2017-05-15 17:52:30|INFO|setting schema 'public' for table 't02'
2017-05-15 17:52:30|INFO|started gpfdist -p 8090 -P 8091 -f "/data/greenplum/gpextdata/t21.txt /data/greenplum/gpextdata/t22.txt" -t 30
2017-05-15 17:52:30|INFO|running time: 0.29 seconds
2017-05-15 17:52:30|INFO|rows Inserted = 4
2017-05-15 17:52:30|INFO|rows Updated = 0
2017-05-15 17:52:30|INFO|data formatting errors = 0
2017-05-15 17:52:30|INFO|gpload succeeded
[gpadmin@db_m2_slave1 ~]$
在greenplum服务器上查询
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
yueworld_db=# select * from t02;
ERROR: it is not possible to read from a WRITABLE external table.
HINT: Create the table as READABLE instead
yueworld_db=#
奇怪的是,当你再执行一次gpload -f y2.yml的时候,会持续往外部表里面写数据,而且是将原有的数据重新写入到t02里面去,而且执行第三次的时候,这个rows Inserted会变成16,如下看执行过程:
[gpadmin@db_m2_slave1 ~]$ gpload -f g2.yml
2017-05-15 17:52:30|INFO|gpload session started 2017-05-15 17:52:30
2017-05-15 17:52:30|INFO|setting schema 'public' for table 't02'
2017-05-15 17:52:30|INFO|started gpfdist -p 8090 -P 8091 -f "/data/greenplum/gpextdata/t21.txt /data/greenplum/gpextdata/t22.txt" -t 30
2017-05-15 17:52:30|INFO|running time: 0.29 seconds
2017-05-15 17:52:30|INFO|rows Inserted = 8
2017-05-15 17:52:30|INFO|rows Updated = 0
2017-05-15 17:52:30|INFO|data formatting errors = 0
2017-05-15 17:52:30|INFO|gpload succeeded
[gpadmin@db_m2_slave1 ~]$
- GreenPlum 可读写外部表 实战
- GREENPLUM在HADOOP上的可写外部表和可读外部表
- GreenPlum 外部表external table 实战
- greenplum创建外部表
- Greenplum外部表问题解决一例
- greenplum创建外部表笔记-readable篇
- 在Greenplum外部表中使用HDFS
- postgresq通过l外部表连接greenplum
- greenplum通过hdfs访问外部表
- Greenplum中通过外部表进行数据导入导出
- Greenplum中创建外部表进行数据导出
- Greenplum中通过外部表进行数据导入导出 .
- 转:greenplum使用gpfdist与外部表高效导入数据
- 可读写文件系统jffs2
- socket可读/可写
- SquashFS 可读写文件系统
- Greenplum优化实战
- GreenPlum 集群 gpfdist 实战
- ambari架构
- Spring task基本使用
- js获取屏幕大小
- 多态的理解
- OpenGL蓝宝书源码学习(准备工作)
- GreenPlum 可读写外部表 实战
- UVALive
- c++第6次实验【项目4:字符删除】
- 公钥,私钥和数字签名这样最好理解
- 练习3.6 for (auto &c;line) getline(cin,line ) 下标 有个程序有错误没改正
- A3C代码详解
- linux 下载python psutil
- Elasticsearch java api(五) Bulk批量索引
- 随笔