Resolving instance crashes in KingbaseES V8R3 caused by excessive shared_buffers usage


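Problem description

The standby node went down unexpectedly. The cluster log showed only that a primary/standby switchover had occurred and that the standby kept trying to recover without success. The database log contained the following error: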

terminating connection because of crash of another server process
DETAIL: The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

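Based on the error message, we suspected that concurrency was too high at the time, or that a heavy workload left shared_buffers insufficient, which in turn brought the database down. Because V8R3 cannot collect KWR reports, this hypothesis is hard to confirm directly.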

Analysis

The following experiment simulates the situation.

Test environment:

shared_buffers set to 16MB
max_wal_size   set to 32MB

TEST=# create table test01(id integer, val char(1024));
CREATE TABLE
TEST=# insert into test01 values(generate_series(1,2888600),repeat( chr(int4(random()*26)+65),1024));
(waiting...)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The ps command listed every process; among them, process 13674 was using a large amount of memory.
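The same check can be scripted. The following is a minimal sketch, assuming a Linux host where `ps -eo pid,rss,comm` is available. The helper names `parse_ps` and `top_memory` and the sample RSS figures are made up for illustration; the source does not give the actual numbers.

```python
import subprocess

def parse_ps(output: str):
    """Parse `ps -eo pid,rss,comm` output into (pid, rss_kib, command) tuples,
    sorted by resident memory, largest first."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        # Skip the header line and anything that is not "<pid> <rss> <command>".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        rows.append((int(parts[0]), int(parts[1]), parts[2]))
    return sorted(rows, key=lambda row: row[1], reverse=True)

def top_memory(n: int = 5):
    """Return the n processes with the largest RSS on this host."""
    out = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)[:n]

# Hypothetical sample: only PID 13674 comes from the article, the rest is invented.
SAMPLE_PS = """\
  PID   RSS COMMAND
13001 12000 kingbase
13674 2900000 kingbase
"""

if __name__ == "__main__":
    for pid, rss_kib, command in parse_ps(SAMPLE_PS):
        print(f"{pid:>7} {rss_kib / 1024:>10.1f} MiB  {command}")
```

Running `top_memory()` periodically during the insert would show whether a single backend keeps growing, which is what the ps output suggested here.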

The database log warned of possibly corrupted shared memory. The instance crashed and restarted.

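Before the crash, a large number of checkpoints were triggered. This is expected, because max_wal_size had been set very small: pages must be written out continuously to keep enough shared_buffers available for the insert.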
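A rough calculation shows why the checkpoints come so quickly. This is only a sketch: it counts the char(1024) payload alone and ignores row headers, the integer column, and WAL record overhead, so the real volumes are larger.

```python
# Rough sizing of the test insert against the configured limits.
# Assumption: only the char(1024) payload is counted.

ROWS = 2_888_600        # generate_series(1, 2888600) in the test insert
PAYLOAD_BYTES = 1024    # val char(1024)
MIB = 1024 * 1024

def payload_mib(rows: int, bytes_per_row: int) -> float:
    """Return the raw payload volume in MiB."""
    return rows * bytes_per_row / MIB

total = payload_mib(ROWS, PAYLOAD_BYTES)
print(f"payload: {total:.0f} MiB")                         # payload: 2821 MiB
print(f"multiples of the 16MB buffer pool: {total / 16:.0f}")   # 176
print(f"multiples of the 32MB max_wal_size: {total / 32:.0f}")  # 88
```

A single statement therefore writes on the order of 88 times the configured max_wal_size, which is consistent with checkpoints being requested every second or less.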

The database also gave a reasonable hint: increase the "max_wal_size" parameter.

2022-05-25 15:38:04 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:05 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:05 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:05 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:05 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:06 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:06 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:07 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:07 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:07 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:07 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:07 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:07 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:09 CST LOG:  checkpoints are occurring too frequently (2 seconds apart)
2022-05-25 15:38:09 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:09 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:09 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:10 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:10 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:12 CST LOG:  checkpoints are occurring too frequently (2 seconds apart)
2022-05-25 15:38:12 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:12 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:12 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:12 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:12 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:13 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:13 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:14 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:14 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:15 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:15 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:15 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:15 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:15 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:15 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:16 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:16 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:17 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:17 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:17 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:17 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:18 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:18 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:18 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:19 CST LOG:  server process (PID 13674) was terminated by signal 9: Killed
2022-05-25 15:38:19 CST DETAIL:  Failed process was running: insert into test01 values(generate_series(1,2888600),repeat( chr(int4(random()*26)+65),1024));
2022-05-25 15:38:19 CST LOG:  terminating any other active server processes
2022-05-25 15:38:19 CST WARNING:  terminating connection because of crash of another server process
2022-05-25 15:38:19 CST DETAIL:  The kingbase has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-05-25 15:38:19 CST HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2022-05-25 15:38:19 CST LOG:  all server processes terminated; reinitializing
2022-05-25 15:38:19 CST LOG:  database system was interrupted; last known up at 2022-05-25 15:38:19 CST
2022-05-25 15:38:19 CST LOG:  database system was not properly shut down; automatic recovery in progress
2022-05-25 15:38:19 CST LOG:  redo starts at 0/8F050338
2022-05-25 15:38:19 CST LOG:  redo wal segment count 1
2022-05-25 15:38:19 CST LOG:  invalid record length at 0/8FA6C178: wanted 24, got 0
2022-05-25 15:38:19 CST LOG:  complete: 1/1
2022-05-25 15:38:19 CST LOG:  redo done at 0/8FA6C108
2022-05-25 15:38:19 CST LOG:  MultiXact member wraparound protections are now enabled
2022-05-25 15:38:19 CST LOG:  redo done at 0/8FA6C108
2022-05-25 15:38:19 CST LOG:  MultiXact member wraparound protections are now enabled
2022-05-25 15:38:19 CST LOG:  database system is ready to accept connections
2022-05-25 15:38:19 CST LOG:  autovacuum launcher started
2022-05-25 15:38:19 CST LOG:  starting syslogical supervisor
2022-05-25 15:38:19 CST LOG:  starting syslogical database manager for database TEST
2022-05-25 15:38:19 CST LOG:  manager worker [13929] at slot 0 generation 1 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database TEMPLATE1
2022-05-25 15:38:20 CST LOG:  manager worker [13930] at slot 0 generation 2 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database TEMPLATE2
2022-05-25 15:38:20 CST LOG:  manager worker [13932] at slot 0 generation 3 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database SAMPLES
2022-05-25 15:38:20 CST LOG:  manager worker [13935] at slot 0 generation 4 detaching cleanly
2022-05-25 15:38:20 CST LOG:  starting syslogical database manager for database SECURITY
2022-05-25 15:38:20 CST LOG:  manager worker [13940] at slot 0 generation 5 detaching cleanly
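A log like the one above is easier to read when summarized. The following is a small sketch that counts the checkpoint warnings and extracts any backend terminated by a signal. The function name `summarize` is hypothetical, and the regular expressions assume the exact message wording shown in this log.

```python
import re

CHECKPOINT_RE = re.compile(
    r"checkpoints are occurring too frequently \((\d+) seconds? apart\)"
)
SIGNAL_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by signal (\d+)"
)

def summarize(lines):
    """Collect checkpoint-warning intervals and signal-terminated backends."""
    intervals = []
    killed = []
    for line in lines:
        match = CHECKPOINT_RE.search(line)
        if match:
            intervals.append(int(match.group(1)))
            continue
        match = SIGNAL_RE.search(line)
        if match:
            killed.append((int(match.group(1)), int(match.group(2))))
    return {
        "checkpoint_warnings": len(intervals),
        "intervals": intervals,
        "killed": killed,
    }

# A few lines copied from the log in this article.
SAMPLE_LOG = """\
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (1 second apart)
2022-05-25 15:38:18 CST HINT:  Consider increasing the configuration parameter "max_wal_size".
2022-05-25 15:38:18 CST LOG:  checkpoints are occurring too frequently (0 seconds apart)
2022-05-25 15:38:19 CST LOG:  server process (PID 13674) was terminated by signal 9: Killed
"""

summary = summarize(SAMPLE_LOG.splitlines())
print(summary)
```

Signal 9 is SIGKILL, which is sent from outside the process. On Linux one common sender is the kernel's out-of-memory killer, but the log itself does not say who sent it; checking the system log for the same timestamp would confirm or rule that out.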

Checking again, the process that had been using so much memory was gone; it had been killed.

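Note that in the same environment I could not reproduce the problem on Kingbase V8R6, and no crash occurred. The insert was slow, but the process's memory usage was nowhere near as high.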

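Summary:

Sudden bursts of high concurrency that push a database to its resource limits are common. Work with the business side to keep the workload stable. Before a new workload goes live, evaluate its memory, CPU, and I/O usage, and check whether there is spare memory that can be added. Otherwise the database can easily crash as in the example above.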

Upgrade to a newer version where possible to avoid this problem, or cap resource consumption at the operating-system level.
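One way to cap resources at the OS level is a per-process address-space limit. The following is only a sketch of that idea on Linux, using Python's standard `resource` module to start a child command under a limit. The helper names `clamp_to_hard` and `run_with_memory_cap` are hypothetical, and the article does not say which mechanism the author had in mind; cgroups or `ulimit` in the service start script are alternatives.

```python
import resource
import subprocess

def clamp_to_hard(requested: int, hard: int) -> int:
    """An unprivileged process cannot set a soft limit above the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)

def run_with_memory_cap(cmd, max_bytes: int):
    """Run cmd with its soft address-space limit (RLIMIT_AS) set to max_bytes."""
    def apply_limit():
        # Runs in the child just before exec; the hard limit is left unchanged.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = clamp_to_hard(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return subprocess.run(cmd, preexec_fn=apply_limit)

# Example call (not executed here): cap a command at 1 GiB of address space.
# run_with_memory_cap(["some_command", "--flag"], 1 << 30)
```

With such a limit, an allocation beyond the cap fails inside the process instead of the process being killed from outside. Whether the database turns that failure into an ordinary error rather than a crash should be verified on a test instance before relying on it.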




http://www.chinasem.cn/article/318102
