Greenplum startup failure: "Failed to start Master instance in admin mode"


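A developer colleague told me the Greenplum instance in the test environment had suddenly become unreachable. I logged into the server and found that no Greenplum processes were running. I asked the developers whether they had changed anything in Greenplum, and they said they hadn't touched it. Strange. So what happened?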



I tried starting it manually with gpstart, and it failed with an error:

[gpadmin@00_mdw ~]$ gpstart
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args: 
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:10:53:59:017586 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Failed to start Master instance in admin mode
20170517:10:54:01:017586 gpstart:00_mdw:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 1 Command was: 'env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /home/gpadmin/gpdata/gpmaster/gpseg-1 -l /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log/startup.log -w -t 600 -o " -p 5432 -b 1 -z 0 --silent-mode=true -i -M master -C -1 -x 0 -c gp_role=utility " start'rc=1, stdout='waiting for server to start...... stopped waiting', stderr='pg_ctl: PID file "/home/gpadmin/gpdata/gpmaster/gpseg-1/postmaster.pid" does not exist
pg_ctl: could not start server
Examine the log output.'
[gpadmin@00_mdw ~]$



The log output was sparse and I couldn't see anything useful in it. What now?

2017-05-16 11:18:20.666964 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 11:18:20.692596 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 11:18:20.693209 CST,,,p16542,th251283232,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-16 13:27:17.059691 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-16 13:27:17.062897 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-16 13:27:17.063528 CST,,,p16630,th930637600,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,
2017-05-17 10:53:59.610428 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","removing all temporary files",,,,,,,,"RemovePgTempFiles","fd.c",1873,
2017-05-17 10:53:59.643630 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","temporary files using default filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2569,
2017-05-17 10:53:59.644220 CST,,,p17597,th695740192,,,,0,,,seg-1,,,,,"LOG","00000","transaction files using default pg_system filespace",,,,,,,,"primaryMirrorPopulateFilespaceInfo","primary_mirror_mode.c",2629,



So I went to the log directory to go through all the log files, and saw that the newest one was a .csv file, gpdb-2017-05-17_112454.csv:

Original post: http://blog.csdn.net/mchdba/article/details/72383684, by mchdba (Huang Shan, 黄杉). Reposting is not permitted.

[gpadmin@00_mdw pg_log]$ ll -t
total 740
-rw-------. 1 gpadmin gpadmin    386 May 17 11:24 gpdb-2017-05-17_112454.csv
-rw-------. 1 gpadmin gpadmin   3951 May 17 11:24 startup.log
-rw-------. 1 gpadmin gpadmin    384 May 17 10:53 gpdb-2017-05-17_105359.csv
-rw-------. 1 gpadmin gpadmin    384 May 16 13:27 gpdb-2017-05-16_132717.csv
-rw-------. 1 gpadmin gpadmin    384 May 16 11:18 gpdb-2017-05-16_111820.csv
-rw-------. 1 gpadmin gpadmin  30004 May 16 11:17 gpdb-2017-05-16_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 15 00:00 gpdb-2017-05-15_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 14 00:00 gpdb-2017-05-14_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 13 00:00 gpdb-2017-05-13_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 12 00:00 gpdb-2017-05-12_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 11 00:00 gpdb-2017-05-11_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May 10 00:00 gpdb-2017-05-10_000000.csv
-rw-------. 1 gpadmin gpadmin  13073 May  9 21:14 gpdb-2017-05-09_000000.csv
-rw-------. 1 gpadmin gpadmin  18458 May  8 11:38 gpdb-2017-05-08_000000.csv
-rw-------. 1 gpadmin gpadmin      0 May  7 00:00 gpdb-2017-05-07_000000.csv
[gpadmin@00_mdw pg_log]$ more gpdb-2017-05-17_112454.csv
2017-05-17 11:24:54.936656 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"LOG","F0000","invalid authentication method ""127.0.0.1/28""",,,,,"line 87 of configuration file ""/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf""",,0,,"hba.c",1095,
2017-05-17 11:24:54.936871 CST,,,p17681,th-400611552,,,,0,,,seg-1,,,,,"FATAL","XX000","could not load pg_hba.conf",,,,,,,0,,"postmaster.c",1529,
[gpadmin@00_mdw pg_log]$ 

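The gpdb-2017-05-17_112454.csv file states the problem clearly: the pg_hba.conf configuration file contains an error. So I opened /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf and commented out the line the error points to (line 87 of configuration file), the entry containing "127.0.0.1/28":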

#local all all 127.0.0.1/28 trust
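The manual edit above amounts to prefixing line 87 with #. Below is a minimal Python sketch of that same step. The helper names comment_out_line and comment_out_line_in_file, and the .bak backup convention, are hypothetical choices for illustration, not part of Greenplum's tooling.

```python
import shutil
from pathlib import Path


def comment_out_line(text: str, lineno: int) -> str:
    """Return `text` with the 1-based line `lineno` prefixed by '#'.

    A line that is already a comment is returned unchanged.
    """
    lines = text.splitlines(keepends=True)
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"line {lineno} out of range (text has {len(lines)} lines)")
    target = lines[lineno - 1]
    if not target.lstrip().startswith("#"):
        lines[lineno - 1] = "#" + target
    return "".join(lines)


def comment_out_line_in_file(path: str, lineno: int) -> Path:
    """Hypothetical helper: copy `path` to `path.bak`, then comment out `lineno`.

    shutil.copy2 also copies the original file's permission bits to the backup.
    Returns the backup path.
    """
    src = Path(path)
    backup = src.with_name(src.name + ".bak")
    shutil.copy2(src, backup)
    src.write_text(comment_out_line(backup.read_text(), lineno))
    return backup
```

For this incident the call would be comment_out_line_in_file("/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf", 87), run as gpadmin. Keeping a backup makes it easy to restore the entry once its intended form is known.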



Then I started the Greenplum cluster again, and this time it came up:

[gpadmin@00_mdw pg_log]$ gpstart
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting gpstart with args: 
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Gathering information and validating the environment...
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.10.0 build commit: f413ff3b006655f14b6b9aa217495ec94da5c96c'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20170517:11:28:20:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance in admin mode
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Obtaining Segment details from master...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Setting new master era
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Started...
20170517:11:28:21:017745 gpstart:00_mdw:gpadmin-[INFO]:-Shutting down master
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg0 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 02_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg1 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam1/gpseg4 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on 01_sdw directory /home/gpadmin/gpdata/gpdatam2/gpseg5 <<<<<
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master instance parameters
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Database                 = template1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master Port              = 5432
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master directory         = /home/gpadmin/gpdata/gpmaster/gpseg-1
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Timeout                  = 600 seconds
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Master standby           = Off 
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-Segment instances that will be started
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:---------------------------------------
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Host      Datadir                                Port    Role
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   01_sdw    /home/gpadmin/gpdata/gpdatap1/gpseg0   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   01_sdw    /home/gpadmin/gpdata/gpdatap2/gpseg1   40001   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   02_sdw    /home/gpadmin/gpdata/gpdatap1/gpseg2   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatam1/gpseg2   50000   Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   02_sdw    /home/gpadmin/gpdata/gpdatap2/gpseg3   40001   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatam2/gpseg3   50001   Mirror
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatap1/gpseg4   40000   Primary
20170517:11:28:23:017745 gpstart:00_mdw:gpadmin-[INFO]:-   03_sdwm   /home/gpadmin/gpdata/gpdatap2/gpseg5   40001   Primary
Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20170517:11:28:25:017745 gpstart:00_mdw:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait......
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Process results...
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Successful segment starts                                            = 8
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration)   = 4   <<<<<<<<
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Successfully started 8 of 8 segment instances, skipped 4 other segments 
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-----------------------------------------------------
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-There are 4 segment(s) marked down in the database
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[WARNING]:-****************************************************************************
20170517:11:28:28:017745 gpstart:00_mdw:gpadmin-[INFO]:-Starting Master instance 00_mdw directory /home/gpadmin/gpdata/gpmaster/gpseg-1 
20170517:11:28:29:017745 gpstart:00_mdw:gpadmin-[INFO]:-Command pg_ctl reports Master 00_mdw instance active
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-No standby master configured.  skipping...
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[WARNING]:-Number of segments not attempted to start: 4
20170517:11:28:30:017745 gpstart:00_mdw:gpadmin-[INFO]:-Check status of database with gpstate utility
[gpadmin@00_mdw pg_log]$
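The cluster is up, but gpstart skipped 4 segments marked down and points to gprecoverseg. Before recovering them, it helps to list exactly which host and data directory pairs were skipped; in the output above they are all gpdatam1/gpdatam2 directories on 01_sdw and 02_sdw. The sketch below assumes the warning wording matches the gpstart 4.3.10.0 output shown here (other releases may phrase it differently); skipped_segments and by_host are hypothetical helper names.

```python
import re
from typing import Dict, List, Tuple

# Matches the WARNING lines printed by gpstart in the output above
# (assumption: other Greenplum releases may word this message differently).
SKIPPED_RE = re.compile(
    r"Skipping startup of segment marked down in configuration: "
    r"on (?P<host>\S+) directory (?P<datadir>\S+)"
)


def skipped_segments(gpstart_output: str) -> List[Tuple[str, str]]:
    """Return (host, data directory) pairs that gpstart skipped as marked down."""
    return [(m["host"], m["datadir"]) for m in SKIPPED_RE.finditer(gpstart_output)]


def by_host(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group skipped data directories by host, preserving output order."""
    grouped: Dict[str, List[str]] = {}
    for host, datadir in pairs:
        grouped.setdefault(host, []).append(datadir)
    return grouped
```

Matching on the fixed message text rather than on column positions keeps the parser indifferent to the timestamp and prefix format of each gpstart line.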


BTW, interestingly, Greenplum's key error message was not in the .log file at all but in a .csv file in the same directory, which really surprised me, haha.
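Since the decisive message lives in the CSV log, a short script can pick the newest gpdb-*.csv file in pg_log and surface the rows that matter. This is a sketch under assumptions: the 0-based column positions (severity 16, SQLSTATE 17, message 18, context 23) follow the Greenplum CSV log layout and match the sample rows above. The filter also keeps any row whose SQLSTATE is not 00000, because here the actual cause was logged at LOG level with SQLSTATE F0000, while the FATAL row only says "could not load pg_hba.conf". The function names are hypothetical.

```python
import csv
import io
from pathlib import Path
from typing import List, Tuple

# Assumed 0-based column positions in a Greenplum CSV log row; they match the
# sample rows above (severity is the 17th field, the message the 19th).
SEVERITY, SQLSTATE, MESSAGE, CONTEXT = 16, 17, 18, 23
SERIOUS = {"ERROR", "FATAL", "PANIC"}


def newest_csv_log(pg_log_dir: str) -> Path:
    """Return the most recently modified gpdb-*.csv file in pg_log_dir."""
    logs = list(Path(pg_log_dir).glob("gpdb-*.csv"))
    if not logs:
        raise FileNotFoundError(f"no gpdb-*.csv files in {pg_log_dir}")
    return max(logs, key=lambda p: p.stat().st_mtime)


def interesting_entries(csv_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (severity, sqlstate, message, context) for rows worth reading.

    A row qualifies if its severity is ERROR/FATAL/PANIC or its SQLSTATE is
    not 00000 (successful completion); the second rule catches LOG-level rows
    such as 'invalid authentication method' that name the real cause.
    """
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= CONTEXT:
            continue  # blank or truncated line
        if row[SEVERITY] in SERIOUS or row[SQLSTATE] != "00000":
            rows.append((row[SEVERITY], row[SQLSTATE], row[MESSAGE], row[CONTEXT]))
    return rows
```

For this incident, interesting_entries(newest_csv_log("/home/gpadmin/gpdata/gpmaster/gpseg-1/pg_log").read_text()) would return the F0000 row pointing at line 87 of pg_hba.conf and the FATAL "could not load pg_hba.conf" row, while skipping the routine 00000 LOG rows.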



Finally, the root cause: why would this 127 entry stop Greenplum from starting? Looking through pg_hba.conf, I suspect one of the following:

(1) There was already another 127.0.0.1/28 entry, so the two conflicted with each other:

[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep 127
host     all         gpadmin         127.0.0.1/28    trust
#local    all         all             127.0.0.1/28      trust
[gpadmin@00_mdw ~]$ 

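(2) A local record can only take an authentication method such as ident after the database and user; it cannot take an address-plus-method setting like 127…..trust. The error message supports this guess: pg_hba.conf defines local records as local DATABASE USER METHOD [OPTIONS], with no address column, so the parser read 127.0.0.1/28 as the authentication method and reported invalid authentication method "127.0.0.1/28".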

[gpadmin@00_mdw ~]$ more /home/gpadmin/gpdata/gpmaster/gpseg-1/pg_hba.conf |grep local |grep -v "#"
local    all         gpadmin         ident
local    replication gpadmin         ident
#local    all         all             127.0.0.1/28      trust
[gpadmin@00_mdw ~]$ 
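To catch this kind of mistake before a restart, the record layout from guess (2) can be checked mechanically. The sketch below is a hypothetical pre-flight checker, not a Greenplum tool. It assumes the PostgreSQL pg_hba.conf layouts (local DATABASE USER METHOD, and host DATABASE USER ADDRESS [MASK] METHOD), and KNOWN_METHODS is an assumed common subset of method names that should be adjusted for your Greenplum/PostgreSQL version.

```python
import ipaddress
from typing import List, Tuple

# Assumption: a common subset of PostgreSQL authentication method names.
# The exact set accepted depends on the Greenplum/PostgreSQL version.
KNOWN_METHODS = {
    "trust", "reject", "md5", "password", "ident", "peer", "pam",
    "ldap", "gss", "sspi", "krb5", "radius", "cert",
}
HOST_TYPES = {"host", "hostssl", "hostnossl"}


def looks_like_address(token: str) -> bool:
    """True if token parses as an IP address or CIDR block, e.g. 127.0.0.1/28."""
    try:
        ipaddress.ip_network(token, strict=False)
        return True
    except ValueError:
        return False


def check_hba(text: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for suspicious pg_hba.conf records."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue  # blank line or comment
        kind = fields[0]
        if kind == "local":
            method_idx = 3  # local DATABASE USER METHOD [OPTIONS]: no address column
        elif kind in HOST_TYPES:
            # host DATABASE USER ADDRESS [MASK] METHOD [OPTIONS]
            has_mask = (len(fields) > 5 and "/" not in fields[3]
                        and looks_like_address(fields[4]))
            method_idx = 5 if has_mask else 4
        else:
            problems.append((lineno, f"unknown record type {kind!r}"))
            continue
        if len(fields) <= method_idx:
            problems.append((lineno, "missing authentication method"))
            continue
        method = fields[method_idx]
        if method not in KNOWN_METHODS:
            hint = ""
            if kind == "local" and looks_like_address(method):
                hint = " (an address in a local record?)"
            problems.append((lineno, f"invalid authentication method {method!r}{hint}"))
    return problems
```

Run against the master's pg_hba.conf before the edit, a checker like this should flag line 87 with the same complaint the CSV log made, since a local record puts 127.0.0.1/28 where the method belongs.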