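During the Spring Festival holiday I got a call from a customer: after the hosts were rebooted, not a single RAC instance would come up (a 4-node RAC: two fully configured 570s plus two half-configured 570s). One host was booting very, very slowly, another was throwing errors, and 2 of the 4 nodes were reporting hardware errors! Luckily I was spending this Spring Festival in Shanghai, so after getting a quick picture of the situation I rushed to the site. On the way I tried to get hold of a host engineer. Damn it, there is only one engineer in Shanghai and he has moved over to sales, so he probably wasn't coming. I phoned the company to have a host engineer assigned and nobody picked up. I called the company's 800 number and, damn it, still nobody picked up. So much for "7×24" and the 800 line, it is all hot air. Before they win the contract they promise the moon; when something goes wrong you can't find anyone, and when you finally do, they send a newbie. I might as well handle it myself, amateur that I am....... If I weren't on good terms with the customer, they would have blown up long ago..... OK, rant over, on to the problem.....

I don't know the hardware well, so I started by checking why RAC wouldn't come up. Checking the crsd log:
2011-02-07 15:03:03.869: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..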
2011-02-07 15:03:05.254: [ COMMCRS][351]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))
2011-02-07 15:03:05.254: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2011-02-07 15:03:05.256: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2011-02-07 15:03:06.590: [ COMMCRS][353]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))
2011-02-07 15:03:06.590: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9
2011-02-07 15:03:06.590: [ CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2011-02-07 15:03:07.973: [ COMMCRS][355]clsc_connect: (1103b91d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_secu_crs))

So cssd had not come up. I went on to check the cssd log and found the following:
[ CSSD]2011-02-07 15:13:08.415 >node3: Copyright 2011, Oracle version 10.2.0.4.0
[ CSSD]2011-02-07 15:13:08.415 >node3: CSS daemon log for node node1, number 1, in cluster crs
[ CSSD]2011-02-07 15:13:08.421 [1] >TRACE: clssscmain: local-only set to false
[ CSSD]2011-02-07 15:13:08.427 [1] >TRACE: clssnmReadNodeInfo: added node 1 (node1) to cluster
[ CSSD]2011-02-07 15:13:08.431 [1] >TRACE: clssnmReadNodeInfo: added node 2 (node2) to cluster
[ CSSD]2011-02-07 15:13:08.436 [1] >TRACE: clssnmReadNodeInfo: added node 3 (node3) to cluster
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=node1DBG_CSSD))
[ CSSD]2011-02-07 15:13:08.441 [1] >TRACE: clssnmReadNodeInfo: added node 4 (node4) to cluster
[ CSSD]2011-02-07 15:13:08.444 [1] >TRACE: clssgmInitCMInfo: Wait for remote node termination set to 805306368 seconds
[ CSSD]2011-02-07 15:13:08.446 [1029] >TRACE: clssnm_skgxninit: Compatible vendor clusterware not in use
[ CSSD]2011-02-07 15:13:08.446 [1029] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2011-02-07 15:13:08.447 [1] >TRACE: clssnmNMInitialize: misscount set to (30)
[ CSSD]2011-02-07 15:13:08.448 [1] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 15000 ms, reconfig start (misscount) 30000 ms
[ CSSD]2011-02-07 15:13:08.451 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/voting1)
[ CSSD]2011-02-07 15:13:08.452 [1030] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/voting1)
[ CSSD]2011-02-07 15:13:08.453 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (1//dev/voting2)
[ CSSD]2011-02-07 15:13:08.453 [1287] >TRACE: clssnmvDPT: spawned for disk 1 (/dev/voting2)
[ CSSD]2011-02-07 15:13:08.455 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (2//dev/voting3)
[ CSSD]2011-02-07 15:13:08.455 [1544] >TRACE: clssnmvDPT: spawned for disk 2 (/dev/voting3)
[ CSSD]2011-02-07 15:13:10.464 [1030] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/voting1)
[ CSSD]2011-02-07 15:13:10.464 [1801] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/voting1) initial sleep interval (1000)ms
[ CSSD]2011-02-07 15:13:10.464 [1030] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(13) wrtcnt(604) LATS(4844712) Disk lastSeqNo(604)
[ CSSD]2011-02-07 15:13:10.464 [1030] >TRACE: clssnmReadDskHeartbeat: node(3) is down. rcfg(11) wrtcnt(604) LATS(4844712) Disk lastSeqNo(604)
[ CSSD]2011-02-07 15:13:10.464 [1030] >TRACE: clssnmReadDskHeartbeat: node(4) is down. rcfg(14) wrtcnt(3085) LATS(4844712) Disk lastSeqNo(3085)
[ CSSD]2011-02-07 15:13:10.481 [1544] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (2//dev/voting3)
[ CSSD]2011-02-07 15:13:10.481 [2058] >TRACE: clssnmvKillBlockThread: spawned for disk 2 (/dev/voting3) initial sleep interval (1000)ms
[ CSSD]2011-02-07 15:13:10.481 [1544] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(13) wrtcnt(604) LATS(4844729) Disk lastSeqNo(604)
[ CSSD]2011-02-07 15:13:10.481 [1544] >TRACE: clssnmReadDskHeartbeat: node(3) is down. rcfg(11) wrtcnt(605) LATS(4844729) Disk lastSeqNo(605)
[ CSSD]2011-02-07 15:13:10.487 [1287] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (1//dev/voting2)
[ CSSD]2011-02-07 15:13:10.487 [2315] >TRACE: clssnmvKillBlockThread: spawned for disk 1 (/dev/voting2) initial sleep interval (1000)ms
[ CSSD]2011-02-07 15:13:10.488 [1] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2011-02-07 15:13:10.500 [2829] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=node1-priv)(PORT=49895))
[ CSSD]2011-02-07 15:13:10.500 [2829] >TRACE: clssnmClusterListener: Probing node node2 (2), probcon(1113fa5d0)
[ CSSD]2011-02-07 15:13:10.500 [2829] >TRACE: clssnmClusterListener: Probing node node3 (3), probcon(11156db50)
[ CSSD]2011-02-07 15:13:10.501 [2829] >TRACE: clssnmClusterListener: Probing node node4 (4), probcon(111570730)
[ CSSD]2011-02-07 15:13:10.501 [2829] >TRACE: clssnmDiscHelper: node2, node(2) connection failed, con (1113fa5d0), probe(1113fa5d0)

The only error I spotted was "clssnm_skgxnmon: skgxn init failed". I searched Metalink for it and found nothing useful to go on. In fact I had overlooked an important piece of information in this log:
[ CSSD]2011-02-07 17:29:53.412 [1] >TRACE: clssgmInitCMInfo: Wait for remote node termination set to 805306368 seconds
Missing it cost me a lot of time checking logs and rebooting hosts, and when I ran ps -ef|grep d.bin I also overlooked the parameter values of the oprocd process.
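
For what it's worth, a quick sanity check over the cssd log would probably have surfaced that value much sooner. Below is a rough Python sketch, not anything Oracle ships: the regex is based only on the trace lines quoted above, and the function name suspicious_timeouts and the one-day limit MAX_PLAUSIBLE_SECONDS are my own assumptions, not documented thresholds.

```python
import re

# Matches CSSD trace lines shaped like "<function>: <setting> set to <N> seconds".
# The pattern is inferred from the log excerpt above, not from Oracle documentation.
SECONDS_SETTING = re.compile(
    r">TRACE:\s+(\w+):\s+(.*?) set to \(?(\d+)\)? seconds"
)

# Hypothetical sanity limit (one day); an assumption chosen for illustration.
MAX_PLAUSIBLE_SECONDS = 24 * 60 * 60


def suspicious_timeouts(lines, limit=MAX_PLAUSIBLE_SECONDS):
    """Return (function, setting, seconds) for each timeout larger than limit."""
    hits = []
    for line in lines:
        match = SECONDS_SETTING.search(line)
        if match and int(match.group(3)) > limit:
            hits.append((match.group(1), match.group(2), int(match.group(3))))
    return hits


# Two lines copied from the cssd log quoted in this post.
sample = [
    "[ CSSD]2011-02-07 15:13:08.444 [1] >TRACE: clssgmInitCMInfo: "
    "Wait for remote node termination set to 805306368 seconds",
    "[ CSSD]2011-02-07 15:13:08.447 [1] >TRACE: clssnmNMInitialize: "
    "misscount set to (30)",
]

for func, setting, seconds in suspicious_timeouts(sample):
    years = seconds / (365 * 24 * 3600)
    print(f"{func}: {setting} = {seconds} s (about {years:.1f} years)")
```

On a real system you would feed it the lines of the actual ocssd log instead of the sample list. The point is simply that 805306368 seconds works out to roughly 25 years, which is clearly not a sane wait time and deserved a second look.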