Oracle 11g RAC: ASM not recognized at startup

xingyuncaojun 2016-10-06

Environment: RHEL 5.5 + Oracle 11g RAC

The customer reported that after shutting the cluster down and starting it again, CRS would not start, reporting "Cannot communicate with Cluster Ready Services".

Log in to the host and check the stack:

[root@rac-2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
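
The manual check above can be scripted. This is a minimal sketch that assumes the output format shown here (one "CRS-xxxx: <component> is online" line per healthy component); the function name crs_stack_healthy is hypothetical. It reads crsctl check crs output on stdin, prints any line that does not report "is online", and returns non-zero in that case.

```shell
#!/bin/bash
# crs_stack_healthy (hypothetical helper): reads "crsctl check crs" output on
# stdin, prints every non-empty line that does not end in "is online"
# (e.g. CRS-4535: Cannot communicate with ...), and returns 1 if any exist.
crs_stack_healthy() {
    local bad
    # "|| true" keeps grep's "no lines selected" status from leaking out
    bad=$(grep -v -e 'is online$' -e '^[[:space:]]*$' || true)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}

# Usage (as root on a cluster node):
#   crsctl check crs | crs_stack_healthy || echo "CRS stack is not fully up"
```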

Check the RAC alert log:

[grid@rac-2 rac-2]$ tail -100 alertrac-2.log | more

2016-09-23 03:16:17.396
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:18.697
[crsd(22676)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:18.704
[crsd(22676)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
]. Details at (:CRSD00111:) in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:19.433
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:20.737
[crsd(22685)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:20.747
[crsd(22685)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
]. Details at (:CRSD00111:) in /u02/11.2.0/grid/log/rac-2/crsd/crsd.log.
2016-09-23 03:16:21.473
[ohasd(3899)]CRS-2765:Resource 'ora.crsd' has failed on server 'rac-2'.
2016-09-23 03:16:21.473
[ohasd(3899)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
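
When the alert log is long, counting the error codes seen above gives a quick picture of what keeps failing. A minimal sketch; the function name is hypothetical and the code list simply mirrors the errors in this excerpt (CRS-1013, CRS-0804, CRS-2765, CRS-2771, ORA-15077).

```shell
#!/bin/bash
# summarize_crs_errors (hypothetical helper): count how often each error code
# seen in this incident appears in the clusterware alert log passed as $1.
summarize_crs_errors() {
    local log=$1
    # -o prints only the matched code; sort | uniq -c counts each one
    grep -oE 'CRS-(1013|0804|2765|2771)|ORA-15077' "$log" | sort | uniq -c
}

# Usage (as grid, in the node's log directory):
#   summarize_crs_errors alertrac-2.log
```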

Check crsd.log:

2016-09-23 03:16:20.461: [ CRSMAIN][1106286912] Policy Engine is not initialized yet!
2016-09-23 03:16:20.463: [ CRSMAIN][3556262304] Initializing OCR
[  CLWAL][3556262304]clsw_Initialize: OLR initlevel [70000]
2016-09-23 03:16:20.735: [  OCRASM][3556262304]proprasmo: Error in open/create file in dg [ORC_VOTE]
[  OCRASM][3556262304]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge 

2016-09-23 03:16:20.735: [  OCRASM][3556262304]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup 

2016-09-23 03:16:20.737: [  OCRASM][3556262304]proprasmo: kgfoCheckMount returned [7]
2016-09-23 03:16:20.737: [  OCRASM][3556262304]proprasmo: The ASM instance is down
2016-09-23 03:16:20.738: [  OCRRAW][3556262304]proprioo: Failed to open [+ORC_VOTE]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2016-09-23 03:16:20.738: [  OCRRAW][3556262304]proprioo: No OCR/OLR devices are usable
2016-09-23 03:16:20.738: [  OCRASM][3556262304]proprasmcl: asmhandle is NULL
2016-09-23 03:16:20.738: [    GIPC][3556262304] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
2016-09-23 03:16:20.740: [ default][3556262304]clsvactversion:4: Retrieving Active Version from local storage.
2016-09-23 03:16:20.742: [  OCRRAW][3556262304]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.
2016-09-23 03:16:20.745: [  OCRRAW][3556262304]proprinit: Could not open raw device
2016-09-23 03:16:20.745: [  OCRASM][3556262304]proprasmcl: asmhandle is NULL
2016-09-23 03:16:20.746: [  OCRAPI][3556262304]a_init:16!: Backend init unsuccessful : [26]
2016-09-23 03:16:20.747: [  CRSOCR][3556262304] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup 

2016-09-23 03:16:20.748: [ CRSMAIN][3556262304] Created alert : (:CRSD00111:) :  Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup 

2016-09-23 03:16:20.748: [    CRSD][3556262304][PANIC] CRSD exiting: Could not init OCR, code: 26
2016-09-23 03:16:20.748: [    CRSD][3556262304] Done.

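The errors point to a problem with ASM (crsd cannot open the OCR in disk group ORC_VOTE because no ASM instance is serving it), so check the ASM disks: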

[root@rac-2 ~]# /etc/init.d/oracleasm listdisks
ASMDATA01
ASMDATA02
ASMDATA03
OCR_VOTE

The disks are all present.
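
Disks being visible to ASMLib does not mean an ASM instance is serving them; crsd.log explicitly says "The ASM instance is down". A quick complementary check is whether the ASM instance's PMON background process exists. This is a minimal sketch assuming the usual asm_pmon_<SID> process naming (for example asm_pmon_+ASM2 on this node; the SID is an assumption); the helper name is hypothetical, and it reads ps -ef output on stdin so it can be tried offline.

```shell
#!/bin/bash
# asm_pmon_running (hypothetical helper): reads "ps -ef" output on stdin and
# prints the ASM SID(s) whose PMON process is present; returns 1 if none.
asm_pmon_running() {
    local sids
    # ASM background processes are named asm_<proc>_<SID>, e.g. asm_pmon_+ASM2
    sids=$(grep -oE 'asm_pmon_\+ASM[0-9]*' | sed 's/^asm_pmon_//')
    [ -n "$sids" ] || return 1
    printf '%s\n' "$sids"
}

# Usage:
#   ps -ef | asm_pmon_running || echo "no ASM instance running on this node"
```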

After stopping CRS, check for remaining CRS-related processes:

[root@rac-2 ~]# ps -ef | grep d.bin
root      3899    1  0 Jan13 ?        00:18:59 /u02/11.2.0/grid/bin/ohasd.bin reboot
grid      4267    1  0 Jan13 ?        00:34:32 /u02/11.2.0/grid/bin/oraagent.bin
grid      4280    1  0 Jan13 ?        00:00:16 /u02/11.2.0/grid/bin/mdnsd.bin
grid      4293    1  0 Jan13 ?        00:06:10 /u02/11.2.0/grid/bin/gpnpd.bin
root      4304    1  0 Jan13 ?        01:31:25 /u02/11.2.0/grid/bin/orarootagent.bin
grid      4307    1  0 Jan13 ?        00:27:27 /u02/11.2.0/grid/bin/gipcd.bin
root      4322    1  0 Jan13 ?        00:45:33 /u02/11.2.0/grid/bin/osysmond.bin
root      4332    1  0 Jan13 ?        00:01:24 /u02/11.2.0/grid/bin/cssdmonitor
root      4350    1  0 Jan13 ?        00:02:39 /u02/11.2.0/grid/bin/cssdagent
grid      4362    1  0 Jan13 ?        01:45:38 /u02/11.2.0/grid/bin/ocssd.bin
root      4437    1  0 Jan13 ?        00:28:42 /u02/11.2.0/grid/bin/octssd.bin reboot
grid      4461    1  0 Jan13 ?        00:00:22 /u02/11.2.0/grid/bin/evmd.bin
grid      4843  4461  0 Jan13 ?        00:00:00 /u02/11.2.0/grid/bin/evmlogger.bin -o /u02/11.2.0/grid/evm/log/evmlogger.info -l /u02/11.2.0/grid/evm/log/evmlogger.log
root      4941    1  0 Jan13 ?        00:21:18 /u02/11.2.0/grid/bin/ologgerd -m rac-1 -r -d /u02/11.2.0/grid/crf/db/rac-2
root    23122 22979  0 03:54 pts/3    00:00:00 grep d.bin

CRS has been stopped, yet many of its processes are still running. Kill them manually:

[root@rac-2 ~]# ps -ef | grep d.bin | awk '{print $2}' | xargs kill -9
kill 23131: No such process
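
The "kill 23131: No such process" message is most likely harmless: grep d.bin also matches the grep process of the same pipeline, which has already exited by the time kill runs. Note too that the unescaped "." in d.bin matches any character, so the pattern matches every process whose path contains "grid/bin", including cssdmonitor, cssdagent and ologgerd in the listing above. A minimal sketch of a more explicit selection, assuming the Grid home is /u02/11.2.0/grid as in this environment; the helper name is hypothetical, and it only prints PIDs so they can be reviewed before killing.

```shell
#!/bin/bash
# grid_home_pids (hypothetical helper): read "ps -ef" output on stdin and print
# the PIDs of processes whose command starts with the Grid home's bin directory.
# GRID_HOME defaults to the path used in this environment (an assumption).
grid_home_pids() {
    local bindir="${GRID_HOME:-/u02/11.2.0/grid}/bin/"
    # In "ps -ef" output, $2 is the PID and $8 is the first word of CMD.
    # index() is a literal prefix test, so this pipeline's own awk never matches.
    awk -v dir="$bindir" 'index($8, dir) == 1 { print $2 }'
}

# Review first, then kill (as root; xargs -r is a GNU extension):
#   ps -ef | grid_home_pids
#   ps -ef | grid_home_pids | xargs -r kill -9
```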

Restart CRS, and the problem is resolved.
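
The restart is done as root with crsctl start crs, and the stack can take a few minutes to come up fully. A minimal polling sketch; the helper name, the 10-second interval and the 30-attempt limit are assumptions, and the check command is passed as arguments so the loop can be exercised without a cluster.

```shell
#!/bin/bash
# wait_for_crs (hypothetical helper): run the given check command (normally
# "crsctl check crs") until its output contains "Cluster Ready Services is
# online", trying up to TRIES times with INTERVAL seconds between attempts.
wait_for_crs() {
    local tries=${TRIES:-30} interval=${INTERVAL:-10} i
    for ((i = 1; i <= tries; i++)); do
        if "$@" 2>/dev/null | grep -q 'Cluster Ready Services is online'; then
            echo "CRS online after $i check(s)"
            return 0
        fi
        # Only wait if another attempt follows
        if ((i < tries)); then sleep "$interval"; fi
    done
    echo "CRS still not online after $tries checks" >&2
    return 1
}

# Usage (as root):
#   crsctl start crs
#   wait_for_crs crsctl check crs
```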
