CRS-2674: Start of 'ora.cssd' on 'rac2' failed: RAC cluster services fail to start


Background: a customer reported that a node in their Oracle RAC cluster had gone down.

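1. First, find out why the node went down. The archive log destination had filled up, which caused the services to restart. The archive destination was USE_DB_RECOVERY_FILE_DEST (the default); it was never changed at installation time, although a dedicated archive directory should have been configured. Clear out the archive logs first, then change the archive destination.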
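A minimal sketch of the "relocate the archive destination" part of this step. The function only prints the statements so they can be reviewed before anything touches the database. The directory `/u02/archivelog` and the function name `archive_dest_sql` are assumptions for illustration and do not come from the original incident; on RAC the destination would normally be storage that every instance can reach.

```shell
#!/usr/bin/env bash
# Print (do not execute) SQL*Plus statements that show the current archive
# destination, point log_archive_dest_1 at a dedicated directory, and show
# the result. ASSUMPTION: the directory passed in is hypothetical.
archive_dest_sql() {
    local dir="$1"
    cat <<EOF
ARCHIVE LOG LIST;
ALTER SYSTEM SET log_archive_dest_1='LOCATION=${dir}' SCOPE=BOTH SID='*';
ARCHIVE LOG LIST;
EOF
}

archive_dest_sql /u02/archivelog

# After reviewing the output, it could be piped to SQL*Plus as the oracle user:
#   archive_dest_sql /u02/archivelog | sqlplus -s "/ as sysdba"
```

Old archive logs are usually cleared through RMAN rather than with `rm`, so that the control file stays consistent with what is on disk.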

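2. Node 1 started normally, but node 2 would not come up: it had no cluster services.
Check the cluster services.
Checking the cluster status on node rac2 returns an error: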

[grid@rac2 ~]# /u01/app/11.2.0/grid/bin/crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.


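The error above shows that something is wrong with CRS.
Trying to start it also fails. Note that this must be done as root.

Try to start the CRS services: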

 

[root@ora102 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

On a healthy node the output looks like this:

[root@rac2 bin]# /u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

Checking the CRS services shows the problem:

[grid@rac2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
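The check output can be condensed into a per-daemon summary, which makes it obvious that only the High Availability Services layer is up while CRS, CSS and EVM are all unreachable. The sketch below matches only the four message codes that appear in this session; the function name and the summary wording are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Summarize `crsctl check crs` output read from stdin, one line per daemon.
# Only the message codes seen in the session above are recognized.
summarize_crs_check() {
    local line
    while IFS= read -r line; do
        case "$line" in
            CRS-4638*) echo "OHAS: online" ;;
            CRS-4535*) echo "CRS: unreachable" ;;
            CRS-4530*) echo "CSS: unreachable" ;;
            CRS-4534*) echo "EVM: unreachable" ;;
        esac
    done
}

# Sample input copied from the failing node.
summarize_crs_check <<'EOF'
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
EOF
```

On a live node the same function could be fed directly: `crsctl check crs | summarize_crs_check`.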

 

 

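Next, check the IP addresses on node rac2. The VIP and the SCAN IP are both gone, which shows that rac2 has dropped out of the cluster.
Inspect the node with ifconfig -a.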
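A small sketch of that check: test whether a given address shows up in `ifconfig -a` output. The addresses and the sample interface text below are hypothetical, since the original post does not list the VIP or SCAN addresses.

```shell
#!/usr/bin/env bash
# Succeed if the address given as $1 appears in the text read from stdin.
addr_present() {
    # -F: literal match; -w: whole token, so 10.0.0.12 does not match 10.0.0.120
    grep -qwF -- "$1"
}

# Hypothetical sample; on a real node use: ifconfig -a | addr_present <ip>
sample='eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.102  netmask 255.255.255.0  broadcast 192.168.1.255'

# 192.168.1.102 stands in for the public IP, 192.168.1.112 for the VIP.
for ip in 192.168.1.102 192.168.1.112; do
    if printf '%s\n' "$sample" | addr_present "$ip"; then
        echo "$ip present"
    else
        echo "$ip missing"
    fi
done
```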


3. Try to re-register node 2 with the cluster

 

[root@rac2 ~]# sh /u01/app/11.2.0/grid/root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

 

 

4. The problem persists. Clean up node 2's configuration, then run root.sh again

 

[root@rac2 trace]$ /u01/app/11.2.0/grid/crs/install/rootcrs.pl -verbose -deconfig -force
[root@rac2 ~]# /u01/app/11.2.0/grid/crs/install/roothas.pl -verbose -deconfig -force
[root@rac2 bin]# /u01/app/11.2.0/grid/root.sh

Error:
[root@rac2 install]#  /u01/app/11.2.0/grid/crs/install/roothas.pl -verbose -deconfig -force
Can't locate Env.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 . /u01/app/11.2.0/grid/crs/install) at crsconfig_lib.pm line 703.
BEGIN failed--compilation aborted at crsconfig_lib.pm line 703.
Compilation failed in require at /u01/app/11.2.0/grid/crs/install/roothas.pl line 166.
BEGIN failed--compilation aborted at /u01/app/11.2.0/grid/crs/install/roothas.pl line 166.

A dependency package is missing. Install it with: yum install perl-Env

Installed:
  perl-Env.noarch 0:1.04-2.el7
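Before rerunning the deconfig script, it can help to confirm that the Perl on root's PATH can now load the module. A minimal sketch follows; the function name `perl_has_module` is an assumption, and it checks whichever `perl` comes first on PATH, which matches the system Perl paths listed in the @INC error above.

```shell
#!/usr/bin/env bash
# Return 0 if the perl found on PATH can load the named module.
perl_has_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if perl_has_module Env; then
    echo "Env.pm found"
else
    echo "Env.pm missing; on EL7 install it with: yum install perl-Env"
fi
```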

 

 

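5. Clean up node 2's configuration. Note that roothas.pl is the script for Oracle Restart (standalone) homes; on a cluster node the deconfig is normally done with rootcrs.pl -deconfig -force.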

 

[root@rac2 install]#  /u01/app/11.2.0/grid/crs/install/roothas.pl -verbose -deconfig -force
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Delete failed, or completed with errors.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle Restart stack

 


6. Re-register with the cluster

 

[root@rac2 install]# /u01/app/11.2.0/grid/root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to inittab
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Start of resource "ora.cssd" failed
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded
CRS-2674: Start of 'ora.cssd' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rac2'
CRS-2681: Clean of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Grid Infrastructure stack
Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 1278.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

It still fails.
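In a long root.sh transcript like this one, the line that matters is the single CRS-2674 among dozens of successful start and stop messages. A minimal sketch for pulling out only the failed starts; the function name is an assumption, and the pattern covers just the CRS-2674 message format shown above.

```shell
#!/usr/bin/env bash
# Print "<resource> <node>" for every CRS-2674 (start failed) line on stdin.
failed_starts() {
    sed -n "s/.*CRS-2674: Start of '\([^']*\)' on '\([^']*\)' failed.*/\1 \2/p"
}

# Sample lines copied from the root.sh output above.
failed_starts <<'EOF'
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded
CRS-2674: Start of 'ora.cssd' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rac2'
EOF
```

Here the only failed resource is ora.cssd on rac2, so the CSS daemon's own log is the next place to look.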

 

 


7. CSSD did not start on the second node. Find the cssd log file under the $grid_home/log/rac2 subdirectory and read through it.

 

/u01/app/11.2.0/grid/log/rac2/cssd
2019-10-12 15:41:19.013: [    CSSD][3199571712]clssgmDiscEndpcl: gipcDestroy 0x8a28
2019-10-12 15:41:19.064: [    CSSD][3181754112]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2019-10-12 15:41:19.844: [    CSSD][3186484992]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 464729747, wrtcnt, 8055111, LATS 336904, lastSeqNo 8055110, uniqueness 1569234927, timestamp 1570866136/3845241248
2019-10-12 15:41:20.064: [    CSSD][3181754112]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2019-10-12 15:41:20.845: [    CSSD][3186484992]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 464729747, wrtcnt, 8055112, LATS 337904, lastSeqNo 8055111, uniqueness 1569234927, timestamp 1570866137/3845242248
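The key message is "has a disk HB, but no network HB": CSSD on rac2 can see rac1's heartbeat on the voting disk but receives nothing from it over the private interconnect, which points at the network rather than at storage or configuration. A minimal sketch for listing the affected peers from the log; the function name is an assumption, and the pattern matches only the message format shown above.

```shell
#!/usr/bin/env bash
# Print "<node number> <node name>" for each peer that has a disk heartbeat
# but no network heartbeat, de-duplicated.
nodes_without_network_hb() {
    sed -n 's/.*clssnmvDHBValidateNcopy: node \([0-9]*\), \([^,]*\), has a disk HB, but no network HB.*/\1 \2/p' | sort -u
}

# Sample lines shortened from the log above.
nodes_without_network_hb <<'EOF'
2019-10-12 15:41:19.064: [    CSSD][3181754112]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2019-10-12 15:41:19.844: [    CSSD][3186484992]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 464729747
2019-10-12 15:41:20.845: [    CSSD][3186484992]clssnmvDHBValidateNcopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 464729747
EOF
```

Against the real file this would be run as `nodes_without_network_hb < /u01/app/11.2.0/grid/log/rac2/cssd/ocssd.log`.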

 

 

8. Check the heartbeat from node 2

[grid@rac2 /]$ ping 20.20.20.201  # node 1's private (priv) IP
PING 20.20.20.201 (20.20.20.201) 56(84) bytes of data.
From 20.20.20.202 icmp_seq=1 Destination Host Unreachable
From 20.20.20.202 icmp_seq=2 Destination Host Unreachable
From 20.20.20.202 icmp_seq=3 Destination Host Unreachable
From 20.20.20.202 icmp_seq=4 Destination Host Unreachable
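Since the interconnect turned out to be the root cause, this check is worth running early in any similar investigation. A minimal sketch that turns ping output into a one-word verdict; the function name and the verdict wording are assumptions, and it relies only on the "bytes from" text that ping prints for each reply.

```shell
#!/usr/bin/env bash
# Read ping output on stdin; print "reachable" if at least one reply arrived.
ping_verdict() {
    if grep -q 'bytes from'; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}

# Sample copied from the session above.
ping_verdict <<'EOF'
PING 20.20.20.201 (20.20.20.201) 56(84) bytes of data.
From 20.20.20.202 icmp_seq=1 Destination Host Unreachable
From 20.20.20.202 icmp_seq=2 Destination Host Unreachable
EOF
```

On a live node: `ping -c 4 20.20.20.201 | ping_verdict`.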


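The heartbeat link is down. Exhausting. According to the customer, node 1's heartbeat link has failed several times before, so the network card is probably faulty.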

 

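With the customer's consent, we first restarted node 1's network card and then restarted the services. The services on both node 1 and node 2 came up normally.
We advised the customer to replace the network card afterwards to remove the underlying risk.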
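The post does not record the exact commands used for the restart, so the following is only a guess at what the sequence could look like, written as a print-only plan. The interface name `eth1`, the use of `ifdown`/`ifup`, and the function name are all assumptions; nothing here is executed.

```shell
#!/usr/bin/env bash
# Print (do not execute) a possible recovery sequence.
# ASSUMPTIONS: the interface name and the exact commands are hypothetical;
# the original post only says the NIC and then the services were restarted.
recovery_plan() {
    local nic="$1"
    cat <<EOF
# on rac1, as root: bounce the private interconnect interface
ifdown ${nic} && ifup ${nic}
# as root: restart the clusterware stack
/u01/app/11.2.0/grid/bin/crsctl stop crs -f
/u01/app/11.2.0/grid/bin/crsctl start crs
# verify the stack on all nodes
/u01/app/11.2.0/grid/bin/crsctl check cluster -all
EOF
}

recovery_plan eth1
```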

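9. After a long detour, the cause turned out to be the heartbeat link. Troubleshooting works best with bold hypotheses and careful verification: rule out the possible causes one by one and follow the trail until you reach the root of the problem.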

 

Focused on technology and the humanities, sharing practical know-how and growing together!

For more, follow "数据与人".
