The Greenplum segment recovery process
Source: Internet  Editor: 程序博客网  Time: 2024/04/29 06:22
# Two segments are already known to be down, so add -R to the start command to bring the database up in restricted mode
[gpadmin1@hadoop1 ~]$ gpstart -R
20101027:14:11:55:gpstart:hadoop1:gpadmin1-[INFO]:-Starting gpstart with args: -R
20101027:14:11:55:gpstart:hadoop1:gpadmin1-[INFO]:-Gathering information and validating the environment...
20101027:14:11:55:gpstart:hadoop1:gpadmin1-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.0.1.0 build 1'
20101027:14:11:55:gpstart:hadoop1:gpadmin1-[INFO]:-Greenplum Catalog Version: '201005134'
20101027:14:11:55:gpstart:hadoop1:gpadmin1-[INFO]:-Starting Master instance in admin mode
20101027:14:11:56:gpstart:hadoop1:gpadmin1-[INFO]:-Obtaining Greenplum Master catalog information
20101027:14:11:56:gpstart:hadoop1:gpadmin1-[INFO]:-Obtaining Segment details from master...
20101027:14:11:56:gpstart:hadoop1:gpadmin1-[INFO]:-Master Started...
20101027:14:11:56:gpstart:hadoop1:gpadmin1-[INFO]:-Shutting down master
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[WARNING]:-Skipping startup of segment marked down in configuration: on hadoop1 directory /home/gpadmin1/gp4datap1/aligp0 <<<<<
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[WARNING]:-Skipping startup of segment marked down in configuration: on hadoop1 directory /home/gpadmin1/gp4datap2/aligp1 <<<<<
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:---------------------------
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:-Master instance parameters
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:---------------------------
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:-Database = template1
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:-Master Port = 2345
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:-Master directory = /home/gpadmin1/gp4master/aligp-1
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:-Timeout = 60 seconds
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:-Master standby start = On
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:---------------------------------------
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:-Segment instances that will be started
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:---------------------------------------
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- Host Datadir Port Role
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop2 /home/gpadmin1/gp4datam1/aligp0 40000 Primary
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop2 /home/gpadmin1/gp4datam2/aligp1 40001 Primary
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop2 /home/gpadmin1/gp4datap1/aligp2 30000 Primary
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop3 /home/gpadmin1/gp4datam1/aligp2 40000 Mirror
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop2 /home/gpadmin1/gp4datap2/aligp3 30001 Primary
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop3 /home/gpadmin1/gp4datam2/aligp3 40001 Mirror
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop3 /home/gpadmin1/gp4datap1/aligp4 30000 Primary
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop1 /home/gpadmin1/gp4datam1/aligp4 40000 Mirror
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop3 /home/gpadmin1/gp4datap2/aligp5 30001 Primary
20101027:14:11:57:gpstart:hadoop1:gpadmin1-[INFO]:- hadoop1 /home/gpadmin1/gp4datam2/aligp5 40001 Mirror
Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20101027:14:11:58:gpstart:hadoop1:gpadmin1-[INFO]:-Starting standby master
20101027:14:11:58:gpstart:hadoop1:gpadmin1-[INFO]:-Checking if standby master is running on host: hadoop2 in directory: /home/gpadmin1/gp4master/aligp-1
20101027:14:11:58:gpstart:hadoop1:gpadmin1-[INFO]:-No db instance process, entering recovery startup mode
20101027:14:11:59:gpstart:hadoop1:gpadmin1-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
.....
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:-Process results...
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:-----------------------------------------------------
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:- Successful segment starts = 10
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:- Failed segment starts = 0
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 2 <<<<<<<<
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:-----------------------------------------------------
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:-
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:-Successfully started 10 of 10 segment instances, skipped 2 other segments
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:-----------------------------------------------------
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[WARNING]:-****************************************************************************
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[WARNING]:-There are 2 segment(s) marked down in the database
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[WARNING]:-management utility which will recover failed segment instance databases.
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[WARNING]:-****************************************************************************
20101027:14:12:04:gpstart:hadoop1:gpadmin1-[INFO]:-Starting Master instance hadoop1 directory /home/gpadmin1/gp4master/aligp-1 in RESTRICTED mode
20101027:14:12:05:gpstart:hadoop1:gpadmin1-[INFO]:-Command pg_ctl reports Master hadoop1 instance active
NOTICE: Master mirroring synchronizing
20101027:14:12:08:gpstart:hadoop1:gpadmin1-[WARNING]:-Database started but warnings generated <<<<<
20101027:14:12:08:gpstart:hadoop1:gpadmin1-[INFO]:-Check status of database with gpstate utility
[gpadmin1@hadoop1 ~]$ psql -c 'select * from gp_segment_configuration;'
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
4 | 2 | p | p | s | u | 30000 | hadoop2 | hadoop2 | 10000 |
6 | 4 | p | p | s | u | 30000 | hadoop3 | hadoop3 | 10000 |
10 | 2 | m | m | s | u | 40000 | hadoop3 | hadoop3 | 20000 |
12 | 4 | m | m | s | u | 40000 | hadoop1 | hadoop1 | 20000 |
11 | 3 | m | m | s | u | 40001 | hadoop3 | hadoop3 | 20001 |
5 | 3 | p | p | s | u | 30001 | hadoop2 | hadoop2 | 10001 |
7 | 5 | p | p | s | u | 30001 | hadoop3 | hadoop3 | 10001 |
13 | 5 | m | m | s | u | 40001 | hadoop1 | hadoop1 | 20001 |
1 | -1 | p | p | s | u | 2345 | hadoop1 | hadoop1 | |
14 | -1 | m | m | s | u | 2345 | hadoop2 | hadoop2 | |
2 | 0 | m | p | s | d | 30000 | hadoop1 | hadoop1 | 10000 |
8 | 0 | p | m | c | u | 40000 | hadoop2 | hadoop2 | 20000 |
3 | 1 | m | p | s | d | 30001 | hadoop1 | hadoop1 | 10001 |
9 | 1 | p | m | c | u | 40001 | hadoop2 | hadoop2 | 20001 |
(14 rows)
[gpadmin1@hadoop1 ~]$ gprecoverseg
20101027:14:12:36:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Starting gprecoverseg with args:
20101027:14:12:36:gprecoverseg:hadoop1:gpadmin1-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.0.1.0 build 1'
20101027:14:12:36:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Obtaining Segment details from master...
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Greenplum instance recovery parameters
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:----------------------------------------------------------
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Recovery type = Standard
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:----------------------------------------------------------
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Recovery 1 of 2
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:----------------------------------------------------------
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Synchronization mode = Incremental
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance host = hadoop1
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance address = hadoop1
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance directory = /home/gpadmin1/gp4datap1/aligp0
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance port = 30000
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance replication port = 10000
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance host = hadoop2
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance address = hadoop2
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance directory = /home/gpadmin1/gp4datam1/aligp0
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance port = 40000
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance replication port = 20000
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Target = in-place
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:----------------------------------------------------------
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Recovery 2 of 2
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:----------------------------------------------------------
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Synchronization mode = Incremental
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance host = hadoop1
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance address = hadoop1
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance directory = /home/gpadmin1/gp4datap2/aligp1
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance port = 30001
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Failed instance replication port = 10001
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance host = hadoop2
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance address = hadoop2
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance directory = /home/gpadmin1/gp4datam2/aligp1
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance port = 40001
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Source instance replication port = 20001
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:- Recovery Target = in-place
20101027:14:12:37:gprecoverseg:hadoop1:gpadmin1-[INFO]:----------------------------------------------------------
Continue with segment recovery procedure Yy|Nn (default=N):
> y
20101027:14:12:38:gprecoverseg:hadoop1:gpadmin1-[INFO]:-2 segment(s) to recover
20101027:14:12:38:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Ensuring 2 failed segment(s) are stopped
.
20101027:14:12:39:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Updating configuration with new mirrors
20101027:14:12:39:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Updating mirrors
.
20101027:14:12:40:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Starting mirrors
20101027:14:12:40:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
..
20101027:14:12:42:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Process results...
20101027:14:12:42:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Pausing prober
20101027:14:12:42:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Updating configuration to mark mirrors up
20101027:14:12:43:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Unpausing prober
20101027:14:12:43:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Updating primaries
20101027:14:12:43:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
...
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Process results...
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Done updating primaries
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-******************************************************************
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Updating segments for resynchronization is completed.
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-Use gpstate -s to check the resynchronization progress.
20101027:14:12:46:gprecoverseg:hadoop1:gpadmin1-[INFO]:-******************************************************************
[gpadmin1@hadoop1 ~]$ psql -c 'select * from gp_segment_configuration;'
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
4 | 2 | p | p | s | u | 30000 | hadoop2 | hadoop2 | 10000 |
6 | 4 | p | p | s | u | 30000 | hadoop3 | hadoop3 | 10000 |
10 | 2 | m | m | s | u | 40000 | hadoop3 | hadoop3 | 20000 |
12 | 4 | m | m | s | u | 40000 | hadoop1 | hadoop1 | 20000 |
11 | 3 | m | m | s | u | 40001 | hadoop3 | hadoop3 | 20001 |
5 | 3 | p | p | s | u | 30001 | hadoop2 | hadoop2 | 10001 |
7 | 5 | p | p | s | u | 30001 | hadoop3 | hadoop3 | 10001 |
13 | 5 | m | m | s | u | 40001 | hadoop1 | hadoop1 | 20001 |
1 | -1 | p | p | s | u | 2345 | hadoop1 | hadoop1 | |
14 | -1 | m | m | s | u | 2345 | hadoop2 | hadoop2 | |
8 | 0 | p | m | r | u | 40000 | hadoop2 | hadoop2 | 20000 |
2 | 0 | m | p | r | u | 30000 | hadoop1 | hadoop1 | 10000 |
9 | 1 | p | m | r | u | 40001 | hadoop2 | hadoop2 | 20001 |
3 | 1 | m | p | r | u | 30001 | hadoop1 | hadoop1 | 10001 |
(14 rows)
After a few seconds...
[gpadmin1@hadoop1 ~]$ psql -c 'select * from gp_segment_configuration;'
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
4 | 2 | p | p | s | u | 30000 | hadoop2 | hadoop2 | 10000 |
6 | 4 | p | p | s | u | 30000 | hadoop3 | hadoop3 | 10000 |
10 | 2 | m | m | s | u | 40000 | hadoop3 | hadoop3 | 20000 |
12 | 4 | m | m | s | u | 40000 | hadoop1 | hadoop1 | 20000 |
11 | 3 | m | m | s | u | 40001 | hadoop3 | hadoop3 | 20001 |
5 | 3 | p | p | s | u | 30001 | hadoop2 | hadoop2 | 10001 |
7 | 5 | p | p | s | u | 30001 | hadoop3 | hadoop3 | 10001 |
13 | 5 | m | m | s | u | 40001 | hadoop1 | hadoop1 | 20001 |
1 | -1 | p | p | s | u | 2345 | hadoop1 | hadoop1 | |
14 | -1 | m | m | s | u | 2345 | hadoop2 | hadoop2 | |
8 | 0 | p | m | s | u | 40000 | hadoop2 | hadoop2 | 20000 |
2 | 0 | m | p | s | u | 30000 | hadoop1 | hadoop1 | 10000 |
9 | 1 | p | m | s | u | 40001 | hadoop2 | hadoop2 | 20001 |
3 | 1 | m | p | s | u | 30001 | hadoop1 | hadoop1 | 10001 |
(14 rows)
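Note the value of the mode column in gp_segment_configuration at each stage.
1. The two primary instances on hadoop1 (ports 30000 and 30001) went down, so the two corresponding mirror instances on hadoop2 (ports 40000 and 40001) took over the primary role and their mode changed to c, meaning change logging. In this mode they record the changes made while the original primary instances are down.
2. After gprecoverseg is run, the mode of all four instances changes to r (resyncing). At this point the system is applying the logged changes to bring each primary and mirror pair back in sync.
3. Checking the instance states again shows that synchronization has finished: the mode column is now s (synchronized) for every instance.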
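The three stages above can be picked out of the query output mechanically. The following is only a minimal sketch, not part of any Greenplum tooling: the function name list_unsynced is hypothetical, and it assumes the default aligned psql table layout shown above, with mode as the 5th column and status as the 6th. It reads the table on stdin and prints one line for each instance that is not both synchronized and up.

```shell
#!/bin/sh
# Hypothetical helper: reads the text output of
#   psql -c 'select * from gp_segment_configuration;'
# on stdin and lists every instance whose mode is not "s" or whose status is not "u".
# Assumes the column order shown in this article (mode = column 5, status = column 6).
list_unsynced() {
    awk -F'|' '
        # Strip leading and trailing blanks from one table cell.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Data rows only: enough columns and a numeric dbid in the first cell.
        NF >= 10 && trim($1) ~ /^[0-9]+$/ {
            mode = trim($5); status = trim($6)
            if (mode == "s") name = "synchronized"
            else if (mode == "c") name = "change logging"
            else if (mode == "r") name = "resyncing"
            else name = "unknown"
            if (mode != "s" || status != "u")
                printf "dbid=%s content=%s role=%s mode=%s (%s) status=%s\n",
                       trim($1), trim($2), trim($3), mode, name, status
        }'
}
```

To use it, pipe the same psql command used above into list_unsynced. No output means every instance is reported as synchronized and up.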
The following passage is from the GPADMIN 4.0 documentation:
In the event of a segment failure, the file replication process is stopped and the mirror segment is automatically brought up as the active segment instance. All database operations then continue using the mirror. While the mirror is active, it is also logging all transactional changes made to the database. This system state is known as Change Tracking mode. When the failed segment is ready to be brought back online, administrators initiate a recovery process to bring it back into operation. The recovery process synchronizes with the mirror and only copies over the changes that were missed while the segment was down. This system state is known as Resynchronizing mode. Once all mirrors and their primaries are synchronized again, the system state becomes Synchronized.
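The passage names three system states. As a rough sketch, the per-instance mode values can be reduced to one of those names. The function name cluster_state and the precedence rule (any c wins over any r, which wins over s) are assumptions made for illustration, not something the documentation specifies. The input is again the psql table text on stdin.

```shell
#!/bin/sh
# Hypothetical helper: reduces the mode column of gp_segment_configuration
# (psql table text on stdin, mode = column 5) to a single state name.
# The precedence c > r > s is an assumption made for this sketch.
cluster_state() {
    awk -F'|' '
        # Strip leading and trailing blanks from one table cell.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Data rows only: enough columns and a numeric dbid in the first cell.
        NF >= 10 && trim($1) ~ /^[0-9]+$/ {
            rows++
            mode = trim($5)
            if (mode == "c") tracking++
            else if (mode == "r") resyncing++
        }
        END {
            if (rows == 0) print "Unknown"            # no data rows were read
            else if (tracking > 0) print "Change Tracking"
            else if (resyncing > 0) print "Resynchronizing"
            else print "Synchronized"
        }'
}
```

Applied to the three query results above, this would report Change Tracking before gprecoverseg, Resynchronizing right after it, and Synchronized a few seconds later. It looks only at mode, so an instance that is marked down but still shows mode s is not reflected in the result.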