Building and Deploying Open-Source Greenplum
Source: Internet  Editor: 程序博客网  Time: 2024/06/05 12:45
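Greenplum was open-sourced in October of this year. I heard the news a little late and immediately set out to try it. Although Hadoop/Spark is the mainstream in today's big-data landscape, MPP still deserves a place. Specifically, I work in a traditional industry: the data volume is around the TB level, the data is mostly structured, and its growth is predictable, so MPP still fits the scenario. The key point is avoiding a string of development and maintenance headaches; one fewer pitfall is better than one more, and it is now open source anyway. As a user, I can accept this kind of product.
Below, drawing on some material I looked up, I did an actual build and deployment, and compared the version in our current production environment with the version built from source.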
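I. Build
1. Install the dependency packages in advance
The following packages were the ones missing in my environment; check one by one what is missing when you install.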
yum -y install readline readline-devel
yum -y install curl curl-devel
yum -y install bzip2-devel
yum -y install gcc gcc-c++
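Since the missing packages differ from machine to machine, a quick pre-check can narrow down what to install. The following is only a minimal sketch: `missing_tools` is a hypothetical helper, and it checks for commands on PATH, so it will not detect missing `-devel` header packages.

```shell
# Print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed list of tools the build is likely to need; adjust as required.
missing_tools gcc g++ make curl
```

Anything it prints still has to be mapped to the matching yum package by hand.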
2. Download the source package from GitHub
Extract it to a local directory
3. Build with make
./configure
make
make install
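Default installation directory: /usr/local/gpdb
4. Fixing the Python problem
From my actual deployment, the open-source package still leaves one pitfall, concerning the Python libraries. My workaround:
- Copy the Python library from another official ZIP package over /usr/local/gpdb/lib/python
- Edit /usr/local/gpdb/greenplum_path.sh and comment out the PYTHONHOME and LD_LIBRARY_PATH settings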
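The second step can be scripted instead of edited by hand. This is a rough sketch, assuming the two variables are set on plain `NAME=...` or `export NAME` lines; `comment_out_vars` is a hypothetical helper that writes the edited script to standard output and leaves the original file alone.

```shell
# Prefix '#' to every line that assigns or exports PYTHONHOME or
# LD_LIBRARY_PATH; all other lines pass through unchanged.
comment_out_vars() {
    sed -E 's/^([[:space:]]*(export[[:space:]]+)?(PYTHONHOME|LD_LIBRARY_PATH)([=[:space:]]|$))/#\1/' "$1"
}

# Example (not run here): review the result before replacing the real file.
# comment_out_vars /usr/local/gpdb/greenplum_path.sh > /tmp/greenplum_path.sh.new
```

Writing to a separate file first makes it easy to diff against the original before overwriting it.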
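II. Installation
The following is a single-host deployment, i.e. master + segments on a single node
1. Tune the operating system kernel parameters
Add the following to /etc/sysctl.conf (if an entry already exists, keep only one copy)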
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 64000 100 512
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni=2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.conf.default.arp_filter = 1
net.ipv4.ip_local_port_range=1025 65535
net.core.netdev_max_backlog=10000
vm.overcommit_memory=2
Then run sysctl -p to apply the settings.
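Because entries that already exist should be kept only once, it can help to list keys that appear more than once after editing. A minimal sketch, where `duplicate_sysctl_keys` is a hypothetical helper and not part of Greenplum:

```shell
# Print every key that occurs more than once in a sysctl.conf-style file.
duplicate_sysctl_keys() {
    awk -F= '
        /^[ \t]*[#;]/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        {
            key = $1
            gsub(/[ \t]/, "", key)             # strip spaces around the key
            if (seen[key]++ == 1) print key    # report on the second sighting
        }
    ' "$1"
}

# Example (not run here):
# duplicate_sysctl_keys /etc/sysctl.conf
```

Each reported key then needs one of its lines removed by hand.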
Add the following to /etc/security/limits.conf (on RHEL 6 and later the corresponding file is /etc/security/limits.d/90-nproc.conf):
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
2. Create the operating system user
Create gpadmin, the user that installs and administers Greenplum
useradd gpadmin
passwd gpadmin
Afterwards, add Greenplum's bundled environment script to the user's shell startup file
vi ~/.bashrc
and add
source /usr/local/gpdb/greenplum_path.sh
3. Create the host files
Two files are needed here, seg_hosts and all_hosts. Create them directly in a text editor and save them as:
/home/gpadmin/seg_hosts
/home/gpadmin/all_hosts
Their content is the current host name
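For a single-node setup the two files can also be generated rather than typed. A small sketch, assuming `uname -n` returns the same host name that SSH will resolve; `write_host_files` is a hypothetical helper:

```shell
# Write seg_hosts and all_hosts, each containing this machine's host name,
# into the given directory.
write_host_files() {
    dir=$1
    host=$(uname -n)
    printf '%s\n' "$host" > "$dir/seg_hosts"
    printf '%s\n' "$host" > "$dir/all_hosts"
}

# Example (not run here):
# write_host_files /home/gpadmin
```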
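4. Use the host file to exchange SSH keys for the root user
This makes it convenient to SSH to the other nodes and run commands remotely (similar to the passwordless login setup in Hadoop)
$GPHOME/bin/gpssh-exkeys -f /home/gpadmin/all_hosts
Once that is done, you can run a remote SSH test
$GPHOME/bin/gpssh -f /home/gpadmin/seg_hosts 'useradd gpadmin'
# Run a command over SSH
gpssh -f /home/gpadmin/seg_hosts 'useradd gpadmin'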
5. Create directories and grant ownership to gpadmin
# Installation directory
chown -R gpadmin:gpadmin /usr/local/gpdb
# Create the master and segment directories
mkdir /data/gpmaster
mkdir /data/gpsegment1
mkdir /data/gpsegment2
Here a single node will host two segments
chown gpadmin:gpadmin /data/gpmaster/
chown gpadmin:gpadmin /data/gpsegment1/
chown gpadmin:gpadmin /data/gpsegment2/
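The mkdir and chown commands above can be folded into one loop. This is only a sketch: `make_data_dirs` is a hypothetical helper, and the base directory and owner are passed in as arguments.

```shell
# Create gpmaster, gpsegment1 and gpsegment2 under a base directory
# and hand each one to the given owner (user or user:group).
make_data_dirs() {
    base=$1
    owner=$2
    for name in gpmaster gpsegment1 gpsegment2; do
        mkdir -p "$base/$name" && chown "$owner" "$base/$name" || return 1
    done
}

# Example (not run here; needs root):
# make_data_dirs /data gpadmin:gpadmin
```

Using `mkdir -p` means the loop can be rerun safely if some of the directories already exist.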
6. Use the host file to exchange SSH keys for the gpadmin user
Switch to the gpadmin user
gpssh-exkeys -f /home/gpadmin/all_hosts
Check the time on each host
gpssh -f seg_hosts -v date
7. Synchronize the system clocks
gpssh -f seg_hosts -v date
8. System check
gpcheckos -f /home/gpadmin/all_hosts -m bd-131 -s bd-131
checking: postgres.md5 = 122d211d83d93a10e584a0cc63ed08db
checking: postgres.version = postgres (Greenplum Database) 8.3devel
checking: sync.time between (2015-12-08 21:18:07.759845, 2015-12-08 21:18:27.759845)
checking: platform.hostname
checking: platform.memory
checking: platform.memory = 33667022848
checking: platform.system = linux or sunos
checking: platform.system = linux
checking: platform.release = 2.6.32-504.el6.x86_64
checking: sysctl.kernel.shmall = 4000000000
ERROR: on bd-131 - "sysctl.kernel.shmall = 4000000000" failed (current value is 4294967296)
checking: sysctl.net.ipv4.tcp_max_syn_backlog = 4096
ERROR: on bd-131 - "sysctl.net.ipv4.tcp_max_syn_backlog = 4096" failed (current value is 2048)
checking: sysctl.vm.overcommit_memory = 2
ERROR: on bd-131 - "sysctl.vm.overcommit_memory = 2" failed (current value is 0)
checking: sysctl.net.core.netdev_max_backlog = 10000
ERROR: on bd-131 - "sysctl.net.core.netdev_max_backlog = 10000" failed (current value is 1000)
checking: ulimit.nofile >= 65536
ERROR: on bd-131 - "ulimit.nofile >= 65536" failed (current value is 1024)
checking: env.GPHOME = /usr/local/gpdb
checking: sysctl.kernel.sem = 250 64000 100 512
ERROR: on bd-131 - "sysctl.kernel.sem = 250 64000 100 512" failed (current value is 250 32000 32 128)
checking: sysctl.kernel.shmmax >= 500000000
checking: ulimit.nproc >= 131072
ERROR: on bd-131 - "ulimit.nproc >= 131072" failed (current value is 1024)
checking: sysctl.kernel.shmmni >= 4096
checking: sysctl.net.ipv4.ip_local_port_range = 1025 65535
ERROR: on bd-131 - "sysctl.net.ipv4.ip_local_port_range = 1025 65535" failed (current value is 32768 61000)
checking: sysctl.net.ipv4.tcp_tw_recycle = 1
ERROR: on bd-131 - "sysctl.net.ipv4.tcp_tw_recycle = 1" failed (current value is 0)
checking: python.version >= 2.5.0
[FIX bd-131] please add/modify the following line(s) in /etc/sysctl.conf
kernel.shmall = 4000000000
net.ipv4.tcp_max_syn_backlog = 4096
vm.overcommit_memory = 2
net.core.netdev_max_backlog = 10000
kernel.sem = 250 64000 100 512
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.tcp_tw_recycle = 1
[FIX bd-131] please add/modify the following line(s) in /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
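At first I had not changed the kernel parameters or anything of the sort, so a pile of errors shows up here; fix as many as you can until the checks pass.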
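When the output is long, the failing checks can be pulled out into a short summary. A minimal sketch, assuming the ERROR lines have exactly the shape shown in the output above; `failed_checks` is a hypothetical helper that reads the gpcheckos output on standard input:

```shell
# Turn each ERROR line into "host: want <expected setting>, have <current value>".
failed_checks() {
    sed -n 's/^ERROR: on \([^ ]*\) - "\(.*\)" failed (current value is \(.*\))$/\1: want \2, have \3/p'
}

# Example (not run here):
# gpcheckos -f /home/gpadmin/all_hosts -m bd-131 -s bd-131 | failed_checks
```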
Disk I/O test (takes quite a while; optional)
gpcheckperf -f /home/gpadmin/seg_hosts -rds -D -d /data/gpsegment1 -d /data/gpsegment2
Network test
...
9. Initialize the database
Copy a database configuration file over and modify it
Sample: $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config
Save to: /home/gpadmin/gpconfigs/gpinitsystem_config
declare -a DATA_DIRECTORY=(/data/gpsegment1 /data/gpsegment2)
MASTER_DIRECTORY=/data/gpmaster
MACHINE_LIST_FILE=/home/gpadmin/seg_hosts
...
Run the creation command: gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config
Answer Y when prompted along the way
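Before running gpinitsystem it may be worth confirming that every directory listed in DATA_DIRECTORY actually exists. The sketch below assumes the setting sits on a single `declare -a DATA_DIRECTORY=(...)` line, as in the excerpt above, and that the paths contain no spaces; `missing_data_dirs` is a hypothetical helper.

```shell
# Print each DATA_DIRECTORY entry from a gpinitsystem config file
# that does not exist as a directory.
missing_data_dirs() {
    sed -n 's/^declare -a DATA_DIRECTORY=(\(.*\)).*$/\1/p' "$1" |
        tr ' ' '\n' |
        while read -r dir; do
            [ -n "$dir" ] || continue
            [ -d "$dir" ] || printf '%s\n' "$dir"
        done
}

# Example (not run here):
# missing_data_dirs /home/gpadmin/gpconfigs/gpinitsystem_config
```

No output means all listed segment directories are present.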
10. Database status
gpstate -s
After completing the steps above, one more error was reported; adding a variable to .bashrc resolves it:
export MASTER_DATA_DIRECTORY=/data/gpmaster/gpseg-1
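If the setup is scripted, the line can be appended to .bashrc only when it is not already there, so that rerunning the script does not duplicate it. A small sketch; `ensure_line` is a hypothetical helper:

```shell
# Append a line to a file unless an identical line is already present.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (not run here):
# ensure_line ~/.bashrc 'export MASTER_DATA_DIRECTORY=/data/gpmaster/gpseg-1'
```

The same helper would work for the `source /usr/local/gpdb/greenplum_path.sh` line added earlier.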
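11. Database connection test
Version information
Greenplum initsystem version = 4.3.99.00 build dev
Postgres version = 8.3devel
Our current production environment is: 4.1.1.3
Below are some warnings, which I have not looked into yet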
20151208:22:11:45:000547 gpinitsystem:bd-131:gpadmin-[INFO]:-Starting the Master in admin mode
/usr/local/gpdb/lib/python/subprocess32.py:472: RuntimeWarning: The _posixsubprocess module is not being used. Child process reliability may suffer if your program uses threads.
"program uses threads.", RuntimeWarning)
/usr/local/gpdb/lib/python/subprocess32.py:472: RuntimeWarning: The _posixsubprocess module is not being used. Child process reliability may suffer if your program uses threads.
"program uses threads.", RuntimeWarning)
/usr/local/gpdb/lib/python/subprocess32.py:472: RuntimeWarning: The _posixsubprocess module is not being used. Child process reliability may suffer if your program uses threads.
"program uses threads.", RuntimeWarning)