Install Greenplum Hadoop on VirtualBox 4.2 + CentOS 6.2


0 Reference

http://visit.gopivotal.com/index.php/email/emailWebview?mkt_tok=3RkMMJWWfF9wsRonuqTMZKXonjHpfsX%2B7ugtXqag38431UFwdcjKPmjr1YIGRMR0aPyQAgobGp5I5FEOQrTYSrNpt6QEXw%3D%3D

http://visit.gopivotal.com/rs/emc/images/GPHD_1_2_IUG.pdf?mkt_tok=3RkMMJWWfF9wsRonuqTMZKXonjHpfsX%2B7ugtXqag38431UFwdcjKPmjr1YIGRMR0aPyQAgobGp5I5FEOQrTYSrNpt6QEXw%3D%3D

1 Install VirtualBox 4.2 on Ubuntu 12.04

Download VirtualBox from https://www.virtualbox.org/wiki/Linux_Downloads

sudo dpkg -i virtualbox-4.2_4.2.12-84980~Ubuntu~precise_amd64.deb

Alternatively, follow one of these guides:

http://thedaneshproject.com/posts/how-to-install-virtualbox-on-ubuntu-12-04-lts-precise-pangolin/
http://www.itworld.com/virtualization/308757/install-virtualbox-424-ubuntu-1204-or-1210

2 Install a CentOS 6.2 guest on VirtualBox

Configure the bridge network on Ubuntu: http://os.51cto.com/art/200908/144564.htm

Download the system ISO from http://mirror.nsc.liu.se/centos-store/6.2/isos/x86_64/

Install CentOS 6.2 on VirtualBox: http://wenku.baidu.com/view/5a495a7131b765ce05081465.html

3 Install SSH on the guest

3.0 Reference

http://www.cyberciti.biz/faq/centos-ssh/
http://www.cnblogs.com/eastson/archive/2012/06/29/2570163.html
http://abdussamad.com/archives/365-CentOS-Linux:-Secure-password-less-SSH-access.html
http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/

3.1 OpenSSH Installations under CentOS Linux

# yum -y install openssh-server openssh-clients
# chkconfig sshd on
# service sshd start
# netstat -tulpn | grep :22

OpenSSH Server Configuration

# vi /etc/ssh/sshd_config
PasswordAuthentication yes
# service sshd restart

4 Clone nodes

Reconfigure the network card for each new node:

vim /etc/udev/rules.d/70-persistent-net.rules

4.0 Reference
http://www.2cto.com/os/201210/159434.html

Delete the entry for eth0.
Rename the eth1 entry to eth0.
Remember its hardware (MAC) address, "ha_p".

vim /etc/sysconfig/network-scripts/ifcfg-eth0

Change the hardware address to "ha_p" so it stays consistent with /etc/udev/rules.d/70-persistent-net.rules.
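The two edits above can also be scripted. A minimal sketch, assuming the standard udev rule format; `sync_mac` is a helper name invented here, and the paths are parameters so it can be rehearsed on copies of the real files first:

```shell
# Sketch: copy the MAC address ("ha_p") that the udev rules file records for
# eth0 into the HWADDR line of ifcfg-eth0, so the two files stay consistent.
# sync_mac is a helper invented for this note, not a system tool.
sync_mac() {
  rules=$1 ifcfg=$2
  # pull the MAC out of the rule that names the interface eth0
  mac=$(sed -n 's/.*ATTR{address}=="\([0-9a-f:]*\)".*NAME="eth0".*/\1/p' "$rules")
  # rewrite the HWADDR line to match
  sed -i "s/^HWADDR=.*/HWADDR=$mac/" "$ifcfg"
}
# sync_mac /etc/udev/rules.d/70-persistent-net.rules /etc/sysconfig/network-scripts/ifcfg-eth0
```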

5 Set up passwordless SSH

5.0 Reference

http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/

5.1 Configure on cluster node

5.1.1 Disable the selinux and iptables

vi /etc/selinux/config

SELINUX=disabled

reboot

chkconfig iptables off

5.1.2 Configure ssh

vim /etc/ssh/sshd_config

PermitRootLogin yes
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
PermitEmptyPasswords yes
PasswordAuthentication no
service sshd restart

5.2 Manual steps on the admin node

Login as root

Follow the tutorial (change the username and IP):

http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/

[root@admin_node ~]# ssh-keygen -t rsa    # press <enter> three times to accept the defaults
[root@admin_node ~]# ssh root@192.168.3.160 mkdir -p .ssh
[root@admin_node ~]# cat .ssh/id_rsa.pub | ssh root@192.168.3.160 'cat >> .ssh/authorized_keys'
[root@admin_node ~]# ssh root@192.168.3.160 "chmod 700 .ssh; chmod 640 .ssh/authorized_keys"
[root@admin_node ~]# ssh root@<your ip>
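With more than one segment node, the per-host commands above can be wrapped in a loop. A sketch under assumptions: `distribute_key` is a name invented here, the IPs are the example nodes from this guide, and it copies the key via scp to a temporary `admin.pub` file (a variation on the cat-over-ssh pipe above). Setting DRYRUN=echo prints the commands instead of running them, which is handy before touching real hosts:

```shell
# Sketch: distribute the admin node's public key to every segment node in one
# loop instead of repeating the commands per host. distribute_key and the
# admin.pub filename are invented for this note.
distribute_key() {
  run=${DRYRUN:-}   # set DRYRUN=echo for a dry run
  for host in "$@"; do
    $run ssh "root@$host" "mkdir -p .ssh"
    $run scp "$HOME/.ssh/id_rsa.pub" "root@$host:.ssh/admin.pub"
    $run ssh "root@$host" "cat .ssh/admin.pub >> .ssh/authorized_keys; chmod 700 .ssh; chmod 640 .ssh/authorized_keys"
  done
}
# distribute_key 192.168.3.159 192.168.3.160
```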

5.3 Reconfigure ssh on the cluster nodes

PermitEmptyPasswords no
PasswordAuthentication yes

Now a cluster node supports key-based (passwordless) login if it has the authorized_keys file; otherwise it falls back to password login.

6 Install Java

Reference: http://www.if-not-true-then-false.com/2010/install-sun-oracle-java-jdk-jre-6-on-fedora-centos-red-hat-rhel/

7 Configure the repository

su -c 'rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6.8.noarch.rpm'

Create a new puppet.repo file:

vim /etc/yum.repos.d/puppet.repo

Add

[puppetlabs]
name=Puppet Labs Packages
baseurl=http://yum.puppetlabs.com/el/6/products/x86_64/
enabled=1
gpgcheck=0
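The same file can be written with a heredoc instead of an interactive editor. A minimal sketch; `write_puppet_repo` is a name invented here, and the target path is a parameter so it can be tried on a temporary file first:

```shell
# Sketch: write the puppet.repo file from section 7 with a heredoc.
# write_puppet_repo is a helper invented for this note.
write_puppet_repo() {
  cat > "$1" <<'EOF'
[puppetlabs]
name=Puppet Labs Packages
baseurl=http://yum.puppetlabs.com/el/6/products/x86_64/
enabled=1
gpgcheck=0
EOF
}
# write_puppet_repo /etc/yum.repos.d/puppet.repo
```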

8 Install GPHD Manager on the Admin Node

8.1 Run the pre-installation

1 Log on to the admin node

2 Download GPHD_1_2_0_0_GA.all.tgz from http://www.greenplum.com/ or from the EMC download center at Subscribenet.

3 Run GPHD_1_2_0_0_GA.all/icm/script/preinstall.py

Running gphdmgr preinstall script
Finding available gphd rpms... [OK]
Creating local repository... [OK]
Pre install complete

4 cd /usr/lib/gphd/rpms

5 rpm -i gphdmgr-webservices-1.0.0-1.noarch.rpm

error: Failed dependencies:
        facter >= 1.6.3 is needed by gphdmgr-webservices-1.0.0-1.noarch
        puppet = 2.7.19 is needed by gphdmgr-webservices-1.0.0-1.noarch
        puppet-server = 2.7.19 is needed by gphdmgr-webservices-1.0.0-1.noarch
        ruby = 1.8.7.352-7.el6_2 is needed by gphdmgr-webservices-1.0.0-1.noarch
        ruby-augeas >= 0.3.0 is needed by gphdmgr-webservices-1.0.0-1.noarch
        ruby-devel <= 1.8.7.352-7.el6_2 is needed by gphdmgr-webservices-1.0.0-1.noarch
        ruby-libs <= 1.8.7.352-7.el6_2 is needed by gphdmgr-webservices-1.0.0-1.noarch
        ruby-shadow >= 1.4.1 is needed by gphdmgr-webservices-1.0.0-1.noarch
        rubygem-mongrel is needed by gphdmgr-webservices-1.0.0-1.noarch

Then use yum and rpm to install the dependent packages.

Note: If a package exists in /usr/lib/gphd/rpms, install it with rpm. If a lower version is required, download that exact version and install it with rpm, because yum sometimes installs a higher version automatically.
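As a small aid, the bare package names can be scraped out of rpm's "Failed dependencies" output and handed to yum. A sketch; `dep_names` is a helper invented here, and the final pipeline assumes the repositories configured above actually provide these packages:

```shell
# Sketch: extract the bare package names from rpm's dependency-error lines so
# they can be passed to yum, per the note above. dep_names is invented here.
dep_names() {
  sed -n 's/^ *\([a-z][a-z-]*\)[ <>=].*is needed by.*/\1/p' | sort -u
}
# rpm -i gphdmgr-webservices-1.0.0-1.noarch.rpm 2>&1 | dep_names | xargs yum -y install
```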

6 rpm -i gphdmgr-webservices-1.0.0-1.noarch.rpm

8.2 Install Greenplum Database(optional)

8.2.0 Reference 

http://media.gpadmin.me/wp-content/uploads/2011/02/GPInstallGuide-4.0.4.0.pdf

8.2.1 Install an xfs filesystem on /dev/sda2 mounted at /gpdata1 (optional)

yum install xfsprogs
mkfs.xfs -f -i size=256 -l size=10m,lazy-count=1 -d agcount=4 /dev/sda2
mkdir /gpdata1
vim /etc/fstab

#
# /etc/fstab
# Created by anaconda on Fri Jun 21 09:10:11 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=5a3added-a3d6-4a1d-a7b2-2c7f5ab916ab /            ext4    defaults        1 1
#UUID=aa9ddcfb-c0a9-4090-b5e2-c11da46f5c15 /free_space   ext4    defaults        1 2
UUID=1b8055d7-f2f6-4f56-964e-e809826934af swap          swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
#xiaoziliang
/dev/sda2 /gpdata1 xfs rw,noatime,nodiratime,noikeep,nobarrier,inode64,allocsize=16m 1 1
Note: UUID=aa9ddcfb-c0a9-4090-b5e2-c11da46f5c15 corresponds to /dev/sda2 on my host

mount -a

8.2.2 Install greenplum database

Reference:

http://media.gpadmin.me/wp-content/uploads/2011/02/GPInstallGuide-4.0.4.0.pdf

http://blog.csdn.net/rgb_rgb/article/details/9103289

Note: 

N1: Run "source /usr/local/greenplum-db/greenplum_path.sh" before using the gpssh-exkeys command as a new user

N2: Edit /etc/hosts

vim /etc/hosts

127.0.0.1        admin.localdomain admin   # your admin hostname is admin; any name is ok
::1              localhost6.localdomain6 localhost6
192.168.3.217    master         # admin node
192.168.3.159    clusters_one   # segment node
192.168.3.160    clusters_two   # segment node
With these entries the hostnames (master, clusters_one, clusters_two) resolve to the hosts; otherwise you must connect by IP address directly.
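Before relying on name-based connections, it is worth confirming every cluster hostname is actually mapped. A sketch; `hosts_has` is a helper invented here, and the file path is a parameter so it can be checked against a test copy first:

```shell
# Sketch: verify that a hosts-format file contains an entry for each cluster
# hostname from N2. hosts_has is a helper invented for this note.
hosts_has() {
  file=$1; shift
  for name in "$@"; do
    grep -qw "$name" "$file" || { echo "unmapped: $name"; return 1; }
  done
}
# hosts_has /etc/hosts master clusters_one clusters_two
```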

N3: Run the gpcheck utility using the host file you just created. For example:
$ gpcheck -f host_file -m mdw -s smdw

host_file

master
clusters_one
clusters_two
Result:
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(admin): on device (sda) IO scheduler 'cfq' does not match expected value 'deadline'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(admin): /etc/sysctl.conf value for key 'kernel.sem' has value '250 64000 100 512' and expects '250 512000 100 2048'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(admin): variable not detected in /etc/sysctl.conf: 'kernel.msgmni'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(admin): variable not detected in /etc/sysctl.conf: 'net.ipv4.ip_local_port_range'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_one): on device (sda) IO scheduler 'cfq' does not match expected value 'deadline'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_one): /etc/sysctl.conf value for key 'kernel.sem' has value '250 64000 100 512' and expects '250 512000 100 2048'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_one): variable not detected in /etc/sysctl.conf: 'kernel.msgmni'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_one): variable not detected in /etc/sysctl.conf: 'net.ipv4.ip_local_port_range'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_two): on device (sda) IO scheduler 'cfq' does not match expected value 'deadline'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_two): /etc/sysctl.conf value for key 'kernel.sem' has value '250 64000 100 512' and expects '250 512000 100 2048'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_two): variable not detected in /etc/sysctl.conf: 'kernel.msgmni'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_two): variable not detected in /etc/sysctl.conf: 'net.ipv4.ip_local_port_range'
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_one): uname -r output different among hosts: admin : 2.6.32-358.11.1.el6.x86_64 != clusters_one : 2.6.32-220.el6.x86_64
20130625:09:42:57:002253 gpcheck:admin:root-[ERROR]:-GPCHECK_ERROR host(clusters_two): uname -r output different among hosts: admin : 2.6.32-358.11.1.el6.x86_64 != clusters_two : 2.6.32-220.el6.x86_64
Address the problems:

Set read-ahead

vim /etc/rc.local and add:

blockdev --setra 16384 /dev/sda
blockdev --setra 16384 /dev/sda1
blockdev --setra 16384 /dev/sda2
blockdev --setra 16384 /dev/sda3
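The four lines can also be generated from a device list rather than typed out one by one. A sketch; `ra_lines` is a helper name invented here, and 16384 is the read-ahead value used above:

```shell
# Sketch: emit the blockdev read-ahead lines for /etc/rc.local from a device
# list. ra_lines is a helper invented for this note.
ra_lines() {
  for dev in "$@"; do
    echo "blockdev --setra 16384 /dev/$dev"
  done
}
# ra_lines sda sda1 sda2 sda3 >> /etc/rc.local
```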
reboot

Configure the IO scheduler

vim /boot/grub/menu.lst

Add "elevator=deadline" to the kernel line:

default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-220.el6.x86_64)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.32-220.el6.x86_64 ro root=UUID=5a3added-a3d6-4a1d-a7b2-2c7f5ab916ab rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD quiet SYSFONT=latarcyrheb-sun16 rhgb crashkernel=128M KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM elevator=deadline
        initrd /boot/initramfs-2.6.32-220.el6.x86_64.img
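After rebooting, the active scheduler can be confirmed: the kernel reports it in brackets in /sys/block/<dev>/queue/scheduler (e.g. "noop [deadline] cfq"). A sketch; `current_sched` is a helper name invented here:

```shell
# Sketch: extract the bracketed (active) scheduler from the kernel's
# scheduler listing. current_sched is a helper invented for this note.
current_sched() {
  sed -n 's/.*\[\(.*\)\].*/\1/p'
}
# cat /sys/block/sda/queue/scheduler | current_sched   # expect: deadline
```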
Configure /etc/sysctl.conf

Add the items

kernel.sem = 250 512000 100 2048
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
# new items
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
# new items
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
vm.overcommit_memory = 2
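Since gpcheck complained about keys missing from /etc/sysctl.conf, a quick pre-check that every required key is present saves a rerun. A sketch; `has_keys` is a helper invented here, and the file path is a parameter so a copy can be tested:

```shell
# Sketch: verify that a sysctl.conf-style file defines each required key
# before rerunning gpcheck. has_keys is a helper invented for this note.
has_keys() {
  conf=$1; shift
  for key in "$@"; do
    grep -q "^$key[ ]*=" "$conf" || { echo "missing: $key"; return 1; }
  done
}
# has_keys /etc/sysctl.conf kernel.msgmni net.ipv4.ip_local_port_range && sysctl -p
```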

Configure /etc/security/limits.conf add the items

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
All nodes should run with the same kernel version

8.2.3 Initializing Greenplum Database

Note: 

Disable iptables and selinux on all nodes:

chkconfig iptables off

vim /etc/selinux/config
SELINUX=disabled
vim /home/gpadmin/gpinitsystem_config
ARRAY_NAME="EMC Greenplum DW"
MACHINE_LIST_FILE=/home/gpadmin/all_segment
SEG_PREFIX=gpseg
PORT_BASE=40000
declare -a DATA_DIRECTORY=(/gpdata1/primary /gpdata1/primary /gpdata1/primary)
MASTER_HOSTNAME=admin
MASTER_DIRECTORY=/masterdata
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
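The config references MACHINE_LIST_FILE=/home/gpadmin/all_segment, so that file has to exist before initialization. A sketch, run as gpadmin: `write_machine_list` is a helper invented here, the hostnames are the segment nodes from the /etc/hosts example in N2, and the gpinitsystem call is left commented since it builds the whole cluster:

```shell
# Sketch: create the machine-list file that MACHINE_LIST_FILE points at,
# then initialize with gpinitsystem. write_machine_list is invented here.
write_machine_list() {
  printf '%s\n' clusters_one clusters_two > "$1"
}
# write_machine_list /home/gpadmin/all_segment
# gpinitsystem -c /home/gpadmin/gpinitsystem_config
```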


gpstate -d /masterdata/gpseg-1
Result:
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-Greenplum Database instance successfully created
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-------------------------------------------------------
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-To complete the environment configuration, please
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-2. Add "export MASTER_DATA_DIRECTORY=/masterdata/gpseg-1"
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-   or, use -d /masterdata/gpseg-1 option for the Greenplum scripts
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-   Example gpstate -d /masterdata/gpseg-1
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-Script log file = /home/gpadmin/gpAdminLogs/gpinitsystem_20130625.log
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-To initialize a Standby Master Segment for this Greenplum instance
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-Review options for gpinitstandby
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-------------------------------------------------------
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-The Master /masterdata/gpseg-1/pg_hba.conf post gpinitsystem
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20130625:16:24:07:002389 gpinitsystem:admin:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20130625:16:24:08:002389 gpinitsystem:admin:gpadmin-[INFO]:-new array must be explicitly added to this file
20130625:16:24:08:002389 gpinitsystem:admin:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20130625:16:24:08:002389 gpinitsystem:admin:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory
20130625:16:24:08:002389 gpinitsystem:admin:gpadmin-[INFO]:-------------------------------------------------------

10 Follow the next steps from 

http://visit.gopivotal.com/rs/emc/images/GPHD_1_2_IUG.pdf?mkt_tok=3RkMMJWWfF9wsRonuqTMZKXonjHpfsX%2B7ugtXqag38431UFwdcjKPmjr1YIGRMR0aPyQAgobGp5I5FEOQrTYSrNpt6QEXw%3D%3D
