Problem: HBase RegionServer Crashes Frequently

2024-02-24 22:20

This article walks through a case of an HBase RegionServer crashing (aborting) frequently, in the hope that it provides a useful reference for developers troubleshooting the same problem.

Error log
2019-09-21 20:42:17,264 INFO org.apache.hadoop.hbase.ScheduledChore: Chore: CompactionChecker missed its start time
2019-09-21 20:42:17,273 WARN org.apache.hadoop.hbase.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 156013ms
GC pool 'ParNew' had collection(s): count=1 time=156080ms
2019-09-21 20:42:17,264 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 158843ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2019-09-21 20:42:17,281 WARN org.apache.hadoop.hbase.ipc.RpcServer: (responseTooSlow): {"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)","starttimems":1569069581136,"responsesize":2051,"method":"Scan","processingtimems":156145,"client":"10.97.202.19:58322","queuetimems":0,"class":"HRegionServer"}
2019-09-21 20:42:17,300 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hdh19,60020,1568940808648: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:426)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:331)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:345)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:426)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:331)
    at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerReport(MasterRpcServices.java:345)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8617)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:327)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1158)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:966)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.YouAreDeadException): org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hdh19,60020,1568940808648 as dead server
......
2019-09-21 20:42:17,621 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x86cf6a57553f9a7 has expired, closing socket connection
2019-09-21 20:42:17,621 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hdh19,60020,1568940808648: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase regionserver:60020-0x86cf6a57553f9a7 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:700)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:611)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2019-09-21 20:42:42,269 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getChildren failed after 4 attempts
2019-09-21 20:42:42,269 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase Unable to list children of znode /hbase/replication/rs/hdh19,60020,1568940808648
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/hdh19,60020,1568940808648
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:456)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:484)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1476)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1398)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1280)
    at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:187)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:310)
    at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:180)
    at org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:172)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.stopServiceThreads(HRegionServer.java:2162)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1088)
    at java.lang.Thread.run(Thread.java:748)
2019-09-21 20:42:42,270 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020-0x86cf6a57553f9a7, quorum=hdh12:2181,hdh53:2181,hdh1-07.p.xyidc:2181,hdh52:2181,hdh1-10.p.xyidc:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/hdh19,60020,1568940808648
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:456)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchThem(ZKUtil.java:484)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenBFSAndWatchThem(ZKUtil.java:1476)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursivelyMultiOrSequential(ZKUtil.java:1398)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNodeRecursively(ZKUtil.java:1280)
    at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeAllQueues(ReplicationQueuesZKImpl.java:187)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:310)
    at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:180)

The production logs show that the RegionServer went through a very long GC pause (roughly 156 seconds of ParNew collection), so its ZooKeeper session expired and ZooKeeper closed the connection; by the time the server reported in again, the HMaster was already processing it as a dead server. In this situation HBase deliberately aborts the RegionServer that can no longer hold its ZooKeeper session, because requests destined for the timed-out node may already have been redirected to other nodes.
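Besides the JvmPauseMonitor warnings above, the pauses can be confirmed by turning on GC logging for the RegionServer JVM. A minimal sketch, assuming a JDK 8 JVM and that the options are appended via HBASE_REGIONSERVER_OPTS in hbase-env.sh (the log path is only an example):

    # hbase-env.sh: log every GC with timestamps and total stopped time,
    # so a pause like the 156s ParNew collection above can be matched
    # against the RegionServer log (JDK 8 flags; path is illustrative)
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
      -XX:+PrintGCApplicationStoppedTime \
      -Xloggc:/var/log/hbase/gc-regionserver.log"

With this in place it is easy to tell whether a stall really was GC or something else on the host (for example swapping), which the JvmPauseMonitor message alone cannot distinguish.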

Solution
Increase the ZooKeeper session timeout used by HBase.

Set HBase's ZooKeeper session timeout to 5 minutes.
Setting only the HBase-side timeout is not enough: ZooKeeper negotiates the session timeout with each client and clamps it to its own configured maximum, so ZooKeeper's maximum session timeout has to be raised as well.
Set ZooKeeper's maximum session timeout to 5 minutes.
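A minimal sketch of the two settings, assuming an HBase 1.x-style deployment; the 300000 ms value is simply the 5 minutes recommended above, so adjust it for your own cluster and restart the affected services after changing it.

In hbase-site.xml on the RegionServers:

    <property>
      <name>zookeeper.session.timeout</name>
      <!-- 5 minutes, in milliseconds -->
      <value>300000</value>
    </property>

In zoo.cfg on every server in the ZooKeeper quorum:

    # ZooKeeper clamps negotiated client session timeouts to maxSessionTimeout
    # (by default 20 * tickTime, i.e. 40 s with tickTime=2000), so it must be
    # raised to at least the value HBase asks for.
    maxSessionTimeout=300000

Note the trade-off: with a longer session timeout, a RegionServer that genuinely dies will also take up to 5 minutes to be detected and have its regions reassigned, so the long GC pauses themselves are still worth tuning.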

That concludes this article on the HBase RegionServer crashing frequently; we hope it is helpful to other developers.



http://www.chinasem.cn/article/743547
