golang中的内存缓存如何避免被GC扫描,BigCache实现原理

2024-06-21 18:52

本文主要是介绍golang中的内存缓存如何避免被GC扫描,BigCache实现原理,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

GC到底清理的是什么?

Golang是函数式编程语言,如果是函数内定义的临时变量,在函数退出时会被自动清理掉不需要GC参与;如果使用了指针,那么即使函数退出了也不会将其清理,这个时候就需要全局的GC来清扫。

对于cache组件来说,存储的对象比较多,本质上就是一个大的哈希表,如果GC要去扫描这些对象的话可能会造成大量的延迟,因此我们不需要GC来扫描它们。

利用 Go 1.5+ 的特性:当 map 中的 key 和 value 都是基础类型时,GC 就不会扫到 map 里的 key 和 value

对于key:golang中的字符串本质上也是指针,于是将它进行hash操作,将字符串转换成整型,信息经过hash操作后有可能会丢失部分信息,为了避免hash冲突时分不清KEY值,所以会将key放到value中一起存储。

对于value:构建一个超大的byte数组buf,将原来的key和value信息经过序列化变成二进制数据,将其存放在buf中,并记录下它的offset,然后将offset值存到map的value位置,即 map[key-hash]offset。为了节省内存消耗,会将buf设计为环形队列的结构。

BigCache:https://github.com/allegro/bigcache

https://syslog.ravelin.com/further-dangers-of-large-heaps-in-go-7a267b57d487

to keep the amount of GC work down you essentially have two choices as follows.1、Make sure the memory you allocate contains no pointers. That means no slices, no strings, no time.Time, and definitely no pointers to other allocations. If an allocation has no pointers it gets marked as such and the GC does not scan it.2、Allocate the memory off-heap by directly calling the mmap syscall yourself. Then the GC knows nothing about the memory. This has upsides and downsides. The downside is that this memory can’t really be used to reference objects allocated normally, as the GC may think they are no longer in-use and free them.

How it works

BigCache relies on optimization presented in 1.5 version of Go (issue-9477). This optimization states that if map without pointers in keys and values is used then GC will omit its content. Therefore BigCache uses map[uint64]uint32 where keys are hashed and values are offsets of entries.Entries are kept in byte slices, to omit GC again. Byte slices size can grow to gigabytes without impact on performance because GC will only see single pointer to it.

Bigcache vs Freecache

Both caches provide the same core features but they reduce GC overhead in different ways. Bigcache relies on map[uint64]uint32, freecache implements its own mapping built on slices to reduce number of pointers.

bigcache 开发团队的博客的测试数据:

With an empty cache, this endpoint had maximum responsiveness latency of 10ms for 10k rps. When the cache was filled, it had more than a second latency for 99th percentile. Metrics indicated that there were over 40 mln objects in the heap and GC mark and scan phase took over four seconds.

最终他们采用了 map[uint64]uint64 作为 cacheShard 中的关键存储。key 是 sharding 时得到的 uint64 hashed key,value 则只存 offset ,整体使用 FIFO 的 bytes queue,也符合按照时序淘汰的需求。

经过优化,bigcache 在 2000w 条记录下 GC 的表现:

go version 
go version go1.13 linux/arm64go run caches_gc_overhead_comparison.go Number of entries:  20000000
GC pause for bigcache:  22.382827ms
GC pause for freecache:  41.264651ms
GC pause for map:  72.236853ms

初始化操作

// BigCache is fast, concurrent, evicting cache created to keep big number of entries without impact on performance.
// It keeps entries on heap but omits GC for them. To achieve that, operations take place on byte arrays,
// therefore entries (de)serialization in front of the cache will be needed in most use cases.
type BigCache struct {shards     []*cacheShard // 缓存分片lifeWindow uint64 // 过期时间clock      clockhash       Hasherconfig     ConfigshardMask  uint64close      chan struct{}
}type Config struct {// Number of cache shards, value must be a power of twoShards int// Time after which entry can be evictedLifeWindow time.Duration// Interval between removing expired entries (clean up).// If set to <= 0 then no action is performed. Setting to < 1 second is counterproductive — bigcache has a one second resolution.CleanWindow time.Duration// Max number of entries in life window. Used only to calculate initial size for cache shards.// When proper value is set then additional memory allocation does not occur.MaxEntriesInWindow int// Max size of entry in bytes. Used only to calculate initial size for cache shards.MaxEntrySize int// StatsEnabled if true calculate the number of times a cached resource was requested.StatsEnabled bool// Verbose mode prints information about new memory allocationVerbose bool// Hasher used to map between string keys and unsigned 64bit integers, by default fnv64 hashing is used.Hasher Hasher// HardMaxCacheSize is a limit for BytesQueue size in MB.// It can protect application from consuming all available memory on machine, therefore from running OOM Killer.// Default value is 0 which means unlimited size. When the limit is higher than 0 and reached then// the oldest entries are overridden for the new ones. The max memory consumption will be bigger than// HardMaxCacheSize due to Shards' s additional memory. Every Shard consumes additional memory for map of keys// and statistics (map[uint64]uint32) the size of this map is equal to number of entries in// cache ~ 2×(64+32)×n bits + overhead or map itself.HardMaxCacheSize int// OnRemove is a callback fired when the oldest entry is removed because of its expiration time or no space left// for the new entry, or because delete was called.// Default value is nil which means no callback and it prevents from unwrapping the oldest entry.// ignored if OnRemoveWithMetadata is specified.OnRemove func(key string, entry []byte)// OnRemoveWithMetadata is a callback fired when the oldest entry is removed because of its expiration time or no space left// for the new entry, or because delete was called. A structure representing details about that specific entry.// Default value is nil which means no callback and it prevents from unwrapping the oldest entry.OnRemoveWithMetadata func(key string, entry []byte, keyMetadata Metadata)// OnRemoveWithReason is a callback fired when the oldest entry is removed because of its expiration time or no space left// for the new entry, or because delete was called. A constant representing the reason will be passed through.// Default value is nil which means no callback and it prevents from unwrapping the oldest entry.// Ignored if OnRemove is specified.OnRemoveWithReason func(key string, entry []byte, reason RemoveReason)onRemoveFilter int// Logger is a logging interface and used in combination with `Verbose`// Defaults to `DefaultLogger()`Logger Logger
}// DefaultConfig initializes config with default values.
// When load for BigCache can be predicted in advance then it is better to use custom config.
func DefaultConfig(eviction time.Duration) Config {return Config{Shards:             1024,LifeWindow:         eviction,CleanWindow:        1 * time.Second,MaxEntriesInWindow: 1000 * 10 * 60,MaxEntrySize:       500,StatsEnabled:       false,Verbose:            true,Hasher:             newDefaultHasher(),HardMaxCacheSize:   0,Logger:             DefaultLogger(),}
}

其中分片的个数 Shards必须为 2 的倍数。

func (c *BigCache) Set(key string, entry []byte) errorentry的类型只能是[]byte,如果你要存结构体,我们还需要使用 json.Marshal 这样的工具来序列化。

对key进行hash

func (f fnv64a) Sum64(key string) uint64 {var hash uint64 = offset64for i := 0; i < len(key); i++ {hash ^= uint64(key[i])hash *= prime64}return hash
}

根据hash找到shard

shardMask:  uint64(config.Shards - 1),func (c *BigCache) getShard(hashedKey uint64) (shard *cacheShard) {return c.shards[hashedKey&c.shardMask]
}默认值 shards = 1024 (10000000000),shardMask = 1023 (1111111111),然后进行按位与运算,它要比取模运算效率高。

然后向shard 中写入

type cacheShard struct {hashmap     map[uint64]uint64 // 索引列表,hashedkey-indexOfEntryentries     queue.BytesQueue  // 实际数据存储lock        sync.RWMutexentryBuffer []byteonRemove    onRemoveCallbackisVerbose    boolstatsEnabled boollogger       Loggerclock        clocklifeWindow   uint64hashmapStats map[uint64]uint32stats        StatscleanEnabled bool
}func (s *cacheShard) set(key string, hashedKey uint64, entry []byte) error {currentTimestamp := uint64(s.clock.Epoch())s.lock.Lock()// 冲突检查,将之前的key对应value置为空,粗暴解决哈希冲突问题if previousIndex := s.hashmap[hashedKey]; previousIndex != 0 {if previousEntry, err := s.entries.Get(int(previousIndex)); err == nil {resetHashFromEntry(previousEntry)//remove hashkeydelete(s.hashmap, hashedKey)}}if !s.cleanEnabled {// 每次插入新数据时,bigCache 都会获取 BytesQueue 头部数据if oldestEntry, err := s.entries.Peek(); err == nil {// 然后判断数据是否过期,如果过期则删除s.onEvict(oldestEntry, currentTimestamp, s.removeOldestEntry)}}// 包装数据,得到 []bytew := wrapEntry(currentTimestamp, hashedKey, key, entry, &s.entryBuffer)for {// 找到插入位置,记录在 hashmapif index, err := s.entries.Push(w); err == nil {s.hashmap[hashedKey] = uint64(index)s.lock.Unlock()return nil}// 没有找到插入位置时因为没有空间了,所以要删除。if s.removeOldestEntry(NoSpace) != nil {s.lock.Unlock()return errors.New("entry is bigger than max shard size")}}
}func wrapEntry(timestamp uint64, hash uint64, key string, entry []byte, buffer *[]byte) []byte {keyLength := len(key)blobLength := len(entry) + headersSizeInBytes + keyLengthif blobLength > len(*buffer) {*buffer = make([]byte, blobLength)}blob := *bufferbinary.LittleEndian.PutUint64(blob, timestamp)binary.LittleEndian.PutUint64(blob[timestampSizeInBytes:], hash)binary.LittleEndian.PutUint16(blob[timestampSizeInBytes+hashSizeInBytes:], uint16(keyLength))copy(blob[headersSizeInBytes:], key)copy(blob[headersSizeInBytes+keyLength:], entry)return blob[:blobLength]
}// BytesQueue is a non-thread safe queue type of fifo based on bytes array.
// For every push operation index of entry is returned. It can be used to read the entry later
type BytesQueue struct {full         boolarray        []byte // 实际存储数据的地方capacity     intmaxCapacity  inthead         inttail         intcount        intrightMargin  intheaderBuffer []byteverbose      bool
}
cacheShard.entryBuffer
它是一个容器,用来包装数据,可以重复利用,它的值为
entryBuffer:  make([]byte, config.MaxEntrySize+headersSizeInBytes)它表示的是一个entry占用的最大存储空间。如果某个entry占用空间比entryBuffer还大,那么entryBuffer就会被替换掉。

在bigCache中,bigCache将数据存储在BytesQueue中,BytesQueue的底层结构是[]byte ,这样只给GC增加了一个额外对象,由于字节切片除了自身对象并不包含其他指针数据,所以GC对于整个对象的标记时间是O(1)的。

关于哈希冲突

func (s *cacheShard) get(key string, hashedKey uint64) ([]byte, error) {s.lock.RLock()wrappedEntry, err := s.getWrappedEntry(hashedKey)if err != nil {s.lock.RUnlock()return nil, err}if entryKey := readKeyFromEntry(wrappedEntry); key != entryKey {s.lock.RUnlock()s.collision()if s.isVerbose {s.logger.Printf("Collision detected. Both %q and %q have the same hash %x", key, entryKey, hashedKey)}return nil, ErrEntryNotFound}entry := readEntry(wrappedEntry)s.lock.RUnlock()s.hit(hashedKey)return entry, nil
}

结合Set方法我们知道,在写入的时候是直接覆盖的,在读取的时候会直接报不存在。

BigCache does not handle collisions. When new item is inserted and it's hash collides with previously stored item, new item overwrites previously stored value.

删除
对应的Key的删除其实就是把 entries 中 arrary 对应的 itemIndex 上置为0。其实这种做法并没有正在删除数据,只是置为0,实际的内存并没有归还,但是把 s.hashmap 中的key对应的 index 删除了。这就做了假删除。用户已经查询不到这个数据了。

缓存过期

bigCache 可以为插入的数据设置过期时间,但是缺点是所有数据的过期时间都是一样的。bigCache 中自动删除数据有两种场景:

  • 在插入数据时删除过期数据
  • 通过设置 CleanWindow,启动 goroutine 后台定时批量删除过期数据

这篇关于golang中的内存缓存如何避免被GC扫描,BigCache实现原理的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1082049

相关文章

SpringBoot集成redisson实现延时队列教程

《SpringBoot集成redisson实现延时队列教程》文章介绍了使用Redisson实现延迟队列的完整步骤,包括依赖导入、Redis配置、工具类封装、业务枚举定义、执行器实现、Bean创建、消费... 目录1、先给项目导入Redisson依赖2、配置redis3、创建 RedissonConfig 配

Python的Darts库实现时间序列预测

《Python的Darts库实现时间序列预测》Darts一个集统计、机器学习与深度学习模型于一体的Python时间序列预测库,本文主要介绍了Python的Darts库实现时间序列预测,感兴趣的可以了解... 目录目录一、什么是 Darts?二、安装与基本配置安装 Darts导入基础模块三、时间序列数据结构与

Python使用FastAPI实现大文件分片上传与断点续传功能

《Python使用FastAPI实现大文件分片上传与断点续传功能》大文件直传常遇到超时、网络抖动失败、失败后只能重传的问题,分片上传+断点续传可以把大文件拆成若干小块逐个上传,并在中断后从已完成分片继... 目录一、接口设计二、服务端实现(FastAPI)2.1 运行环境2.2 目录结构建议2.3 serv

C#实现千万数据秒级导入的代码

《C#实现千万数据秒级导入的代码》在实际开发中excel导入很常见,现代社会中很容易遇到大数据处理业务,所以本文我就给大家分享一下千万数据秒级导入怎么实现,文中有详细的代码示例供大家参考,需要的朋友可... 目录前言一、数据存储二、处理逻辑优化前代码处理逻辑优化后的代码总结前言在实际开发中excel导入很

SpringBoot+RustFS 实现文件切片极速上传的实例代码

《SpringBoot+RustFS实现文件切片极速上传的实例代码》本文介绍利用SpringBoot和RustFS构建高性能文件切片上传系统,实现大文件秒传、断点续传和分片上传等功能,具有一定的参考... 目录一、为什么选择 RustFS + SpringBoot?二、环境准备与部署2.1 安装 RustF

Nginx部署HTTP/3的实现步骤

《Nginx部署HTTP/3的实现步骤》本文介绍了在Nginx中部署HTTP/3的详细步骤,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需要的朋友们下面随着小编来一起学... 目录前提条件第一步:安装必要的依赖库第二步:获取并构建 BoringSSL第三步:获取 Nginx

MyBatis Plus实现时间字段自动填充的完整方案

《MyBatisPlus实现时间字段自动填充的完整方案》在日常开发中,我们经常需要记录数据的创建时间和更新时间,传统的做法是在每次插入或更新操作时手动设置这些时间字段,这种方式不仅繁琐,还容易遗漏,... 目录前言解决目标技术栈实现步骤1. 实体类注解配置2. 创建元数据处理器3. 服务层代码优化填充机制详

Python实现Excel批量样式修改器(附完整代码)

《Python实现Excel批量样式修改器(附完整代码)》这篇文章主要为大家详细介绍了如何使用Python实现一个Excel批量样式修改器,文中的示例代码讲解详细,感兴趣的小伙伴可以跟随小编一起学习一... 目录前言功能特性核心功能界面特性系统要求安装说明使用指南基本操作流程高级功能技术实现核心技术栈关键函

Java实现字节字符转bcd编码

《Java实现字节字符转bcd编码》BCD是一种将十进制数字编码为二进制的表示方式,常用于数字显示和存储,本文将介绍如何在Java中实现字节字符转BCD码的过程,需要的小伙伴可以了解下... 目录前言BCD码是什么Java实现字节转bcd编码方法补充总结前言BCD码(Binary-Coded Decima

SpringBoot全局域名替换的实现

《SpringBoot全局域名替换的实现》本文主要介绍了SpringBoot全局域名替换的实现,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需要的朋友们下面随着小编来一... 目录 项目结构⚙️ 配置文件application.yml️ 配置类AppProperties.Ja