Opentsdb官方优化文档 - 翻译

2024-01-11 07:20

本文主要是介绍Opentsdb官方优化文档 - 翻译,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

文档地址 :

Tuning — OpenTSDB 2.4 documentation

Tuning

As with any database there are many tuning parameters for OpenTSDB that can be used to improve write and read performance. Some of these options are specific to certain backends, others are global.

翻译:

与任何数据库一样,OpenTSDB有许多调优参数可用于提高写和读性能。其中一些选项是特定于某些后端的,另一些则是全局的。

TSDB Memory

As with any server process, tuning the TSD’s memory footprint is important for a successful install. There are a number of caches in OpenTSDB that can be tuned.

翻译:

与任何服务器进程一样,调整TSD的内存占用对于成功安装非常重要。OpenTSDB中有许多缓存可以进行调优

UID Caches

OpenTSDB saves space in storage by converting strings to binary UIDs. However for writing and querying, these UIDs have to be cached to improve performance (otherwise every operation would amplify queries against storage). In early versions of OpenTSDB, the cache would maintain an unbounded map of strings to UIDs and UIDs to strings regardless of modes of operation. In 2.4 you can choose how much data to cache and optionally use an LRU. The pertinent settings include:

  • ‘tsd.uid.use_mode=true’ - When this is set to true, the tsd.mode value is examined to determine if either or both of the forward (string to UID) and reverse (UID to string) maps are populated. When empty or set to rw, both maps are populated. When set to ro, only the reverse map is populated (as query results need caching). When in w or write-only mode, only the forward map is populated as metrics need to find their UIDs when writing.

  • tsd.uid.lru.enable=true - Switch to the LRU version of the cache that will expire entries that haven’t been read in a while.

  • tsd.uid.lru.id.size=<integer> - For TSDs focused on reads, set this size larger than tsd.uid.lru.name.size.

  • tsd.uid.lru.name.size=<integer> - For TSDs focused on writes, set this size larger than the tsd.uid.lru.id.size.

翻译:

OpenTSDB通过将字符串转换为二进制UID来节省存储空间。然而,对于写入和查询,必须缓存这些UID以提高性能(否则,每个操作都会放大对存储的查询)。在早期版本的OpenTSDB中,无论操作模式如何,缓存都会维护字符串到UID和UID到字符串的无边界映射。在2.4中,您可以选择缓存多少数据,也可以选择使用LRU。相关设置包括:

“tsd.uid.use_mode=true”-如果设置为true,将检查tsd.mode值,以确定是否填充正向(字符串到uid)和反向(uid到字符串)映射中的一个或两个。当为空或设置为rw时,将填充两个映射。当设置为ro时,只填充反向映射(因为查询结果需要缓存)。当处于w或仅写模式时,由于写入时度量需要找到其UID,因此仅填充前向映射。

tsd.uid.lru.enable=true-切换到缓存的lru版本,该版本将使一段时间内未读取的条目过期。

tsd.uid.lru.id.size=<integer>-对于专注于读取的tsd,请将此大小设置为大于tsd.uid.lou.name.size。

tsd.uid.lru.name.size=<integer>-对于专注于写入的tsd,请将此大小设置为大于tsd.uid.loi.id.size。

HBase Storage

These are parameters to look at for using OpenTSDB with Apache HBase.

Date Tierd Compaction

HBase is an LSM store meaning data is written to blocks in memory, then flushed in blocks to disk, then periodically those blocks on disk are merged into fewer blocks to reduce file count and redundancies. The process of reading and merging blocks from disk can consume a fair amount of CPU and memory so if a system is heavily loaded with time series write traffic, background compactions can begin to impact write performance. HBase offers various compaction strategies including Date Tiered Compaction that looks at the the edit timestamps of columns to figure out if stale file that haven’t been modified should be ignored and arranges data so that time ranged queries are much more efficient.

OpenTSDB can leverage this compaction strategy (as of 2.4) to improve HBase performance, particularly for queries. To setup Date Tiered Compaction in HBase, see HBase Book. Parameters that must be set in the OpenTSDB config include:

  • tsd.storage.use_otsdb_timestamp=true - By default the edit timestamps of every data point is now. This changes the timestamp to be that of the actual data point. Note that rollups do not use this strategy yet.

  • tsd.storage.get_date_tiered_compaction_start=<Unix Epoch MS timestamp> - A timestamp in milliseconds when the date tiered compactor was enabled on a table. If you are creating a brand new table for OpenTSDB you can leave this at the default of 0.

HBase是一种LSM存储,意味着数据被写入内存中的块,然后在块中刷新到磁盘,然后定期将磁盘上的这些块合并为更少的块,以减少文件数量和冗余。从磁盘读取和合并块的过程可能会消耗相当多的CPU和内存,因此,如果系统负载了大量时间序列写入流量,则后台压缩可能会开始影响写入性能。HBase提供了各种压缩策略,包括日期分层压缩,它查看列的编辑时间戳,以确定是否应该忽略尚未修改的陈旧文件,并排列数据,从而使时间范围的查询更加高效。

OpenTSDB可以利用这种压缩策略(从2.4开始)来提高HBase的性能,特别是对于查询。要在HBase中设置日期分层压缩,请参阅HBase Book。必须在OpenTSDB配置中设置的参数包括:

HBase Read/Write Queues

HBase has the ability to split the queues that handle RPCs for writes (mutations) and reads. This is a huge help for OpenTSDB as massive queries will avoid impacting writes and vice-versa. See the HBase Book for various configuration options. Note that different HBase versions have different names for these properties.

Possible starting values are 50% or 60%. E.g. hbase.ipc.server.callqueue.read.share=0.60 or hbase.ipc.server.callqueue.read.ratio=0.60 depending on HBase version.

Also, tuning the call queue size and threads is important. With a large queue, the region server can possibly OOM (if there are a large number of writes backing up) and RPCs will be more likely to time out on the client side if they sit in the queue too long. The queue size is a factor of the number of handlers. Reducing the call queue size also helps to cause clients to throttle, particularly the AsyncHBase client. For HBase 1.3 a good setting may be hbase.ipc.server.max.callqueue.length=100 and hbase.ipc.server.read.threadpool.size=2. If you need to increase the threads, reduce the queue length as well.

翻译:

HBase能够拆分处理写(突变)和读的RPC的队列。这对OpenTSDB来说是一个巨大的帮助,因为大量查询将避免影响写入,反之亦然。有关各种配置选项,请参阅HBase Book。请注意,不同的HBase版本对这些属性有不同的名称。

可能的起始值为50%或60%。例如,hbase.ipc.server.callqueue.read.share=0.60或hbase.iipc.server.callequeue.read.tratio=0.60,具体取决于hbase版本。

此外,调整调用队列大小和线程也很重要。对于大队列,区域服务器可能会OOM(如果有大量写入备份),如果RPC在队列中停留的时间过长,则更有可能在客户端超时。队列大小是处理程序数量的一个因素。减少调用队列大小也有助于导致客户端节流,尤其是AsyncHBase客户端。对于HBase 1.3,一个好的设置可能是HBase.ipc.server.max.callqueue.length=100和HBase.iipc.server.read.threadpool.size=2。如果需要增加线程,也可以减少队列长度。

HBase Cache

Tuning the HBase cache is also important for queries as you want to avoid reading data off disk as much as possible (particularly if that disk is S3). Try using the offheap cache via hbase.bucketcache.combinedcache.enabled=true and hbase.bucketcache.ioengine=offheap. Give the offheap cache a good amount of RAM, e.g. hbase.bucketcache.size=4000 for 4GB of RAM. Since the most recent data is usually queried when reading time series, it’s a good idea to populate the block cache on writes and use most of that space for the latest data. Try the following settings:

hbase.rs.cacheblocksonwrite=true
hbase.rs.evictblocksonclose=false
hfile.block.bloom.cacheonwrite=true
hfile.block.index.cacheonwrite=true
hbase.block.data.cachecompressed=true
hbase.bucketcache.blockcache.single.percentage=.99
hbase.bucketcache.blockcache.multi.percentage=0
hbase.bucketcache.blockcache.memory.percentage=.01
hfile.block.cache.size=.054 #ignored but needs a value.

This will allocate the majority of the black cache for writes and cache it in memory.

For the on-heap cache, you can try an allocation of:

hbase.lru.blockcache.single.percentage=.50
hbase.lru.blockcache.multi.percentage=.49
hbase.lru.blockcache.memory.percentage=.01

翻译:

调整HBase缓存对于查询也很重要,因为您希望尽可能避免从磁盘上读取数据(尤其是当磁盘是S3时)。尝试通过hbase.backetcache.cocombinedcache.enabled=true和hbase.blocketcache.ioengine=offheap使用堆外缓存。为堆外缓存提供充足的RAM,例如,对于4GB的RAM,hbase.bocketcache.size=4000。由于读取时间序列时通常会查询最新的数据,因此最好在写入时填充块缓存,并将大部分空间用于最新数据。请尝试以下设置:

HBase Compaction

Compaction is the process of merging multiple stores on disk for a region into fewer files to reduce space and query times. You can tune the number of threads and thresholds for compaction to avoid using too many resources when the focus should be on writes.

hbase.hstore.compaction.ratio=1.2
hbase.regionserver.thread.compaction.large=2
hbase.regionserver.thread.compaction.small=6
hbase.regionserver.thread.compaction.throttle=524288000

压缩是将一个区域的磁盘上的多个存储合并为更少的文件以减少空间和查询时间的过程。您可以调整线程数量和压缩阈值,以避免在应该将重点放在写入上时使用过多资源。

HBase Regions

For OpenTSDB we’ve observed that 10G regions are a good size for a large cluster hbase.hregion.max.filesize=10737418240.

翻译:

对于OpenTSDB,我们已经观察到10G区域对于大型集群hbase.hregion.max.filesize=1037418240来说是一个不错的大小。

HBase Memstore

Try flushing the mem store to disk (and cache) more often, particularly for heavy write loads. We’ve seen good behavior with 16MBs hbase.hregion.memstore.flush.size=16777216. Also try reducing the memstore size limit via hbase.regionserver.global.memstore.lowerLimit=.20 and hbase.regionserver.global.memstore.upperLimit=.30.

翻译: 

请尝试更频繁地将内存存储刷新到磁盘(和缓存),尤其是在写负载很大的情况下。我们已经看到了16MB hbase.hregion.memstore.flash.size=16777216的良好性能。也可以尝试通过hbase.regiserver.global.membestore.lowerLimit=.20和hbase.reregionserver.global-memstore.upperLimit=.30来减少内存存储大小限制。

这篇关于Opentsdb官方优化文档 - 翻译的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/593575

相关文章

C#实现一键批量合并PDF文档

《C#实现一键批量合并PDF文档》这篇文章主要为大家详细介绍了如何使用C#实现一键批量合并PDF文档功能,文中的示例代码简洁易懂,感兴趣的小伙伴可以跟随小编一起学习一下... 目录前言效果展示功能实现1、添加文件2、文件分组(书签)3、定义页码范围4、自定义显示5、定义页面尺寸6、PDF批量合并7、其他方法

Java实现在Word文档中添加文本水印和图片水印的操作指南

《Java实现在Word文档中添加文本水印和图片水印的操作指南》在当今数字时代,文档的自动化处理与安全防护变得尤为重要,无论是为了保护版权、推广品牌,还是为了在文档中加入特定的标识,为Word文档添加... 目录引言Spire.Doc for Java:高效Word文档处理的利器代码实战:使用Java为Wo

使用Python实现Word文档的自动化对比方案

《使用Python实现Word文档的自动化对比方案》我们经常需要比较两个Word文档的版本差异,无论是合同修订、论文修改还是代码文档更新,人工比对不仅效率低下,还容易遗漏关键改动,下面通过一个实际案例... 目录引言一、使用python-docx库解析文档结构二、使用difflib进行差异比对三、高级对比方

从原理到实战解析Java Stream 的并行流性能优化

《从原理到实战解析JavaStream的并行流性能优化》本文给大家介绍JavaStream的并行流性能优化:从原理到实战的全攻略,本文通过实例代码给大家介绍的非常详细,对大家的学习或工作具有一定的... 目录一、并行流的核心原理与适用场景二、性能优化的核心策略1. 合理设置并行度:打破默认阈值2. 避免装箱

Python自动化处理PDF文档的操作完整指南

《Python自动化处理PDF文档的操作完整指南》在办公自动化中,PDF文档处理是一项常见需求,本文将介绍如何使用Python实现PDF文档的自动化处理,感兴趣的小伙伴可以跟随小编一起学习一下... 目录使用pymupdf读写PDF文件基本概念安装pymupdf提取文本内容提取图像添加水印使用pdfplum

Python从Word文档中提取图片并生成PPT的操作代码

《Python从Word文档中提取图片并生成PPT的操作代码》在日常办公场景中,我们经常需要从Word文档中提取图片,并将这些图片整理到PowerPoint幻灯片中,手动完成这一任务既耗时又容易出错,... 目录引言背景与需求解决方案概述代码解析代码核心逻辑说明总结引言在日常办公场景中,我们经常需要从 W

Python实战之SEO优化自动化工具开发指南

《Python实战之SEO优化自动化工具开发指南》在数字化营销时代,搜索引擎优化(SEO)已成为网站获取流量的重要手段,本文将带您使用Python开发一套完整的SEO自动化工具,需要的可以了解下... 目录前言项目概述技术栈选择核心模块实现1. 关键词研究模块2. 网站技术seo检测模块3. 内容优化分析模

Java实现复杂查询优化的7个技巧小结

《Java实现复杂查询优化的7个技巧小结》在Java项目中,复杂查询是开发者面临的“硬骨头”,本文将通过7个实战技巧,结合代码示例和性能对比,手把手教你如何让复杂查询变得优雅,大家可以根据需求进行选择... 目录一、复杂查询的痛点:为何你的代码“又臭又长”1.1冗余变量与中间状态1.2重复查询与性能陷阱1.

Python内存优化的实战技巧分享

《Python内存优化的实战技巧分享》Python作为一门解释型语言,虽然在开发效率上有着显著优势,但在执行效率方面往往被诟病,然而,通过合理的内存优化策略,我们可以让Python程序的运行速度提升3... 目录前言python内存管理机制引用计数机制垃圾回收机制内存泄漏的常见原因1. 循环引用2. 全局变

C#高效实现Word文档内容查找与替换的6种方法

《C#高效实现Word文档内容查找与替换的6种方法》在日常文档处理工作中,尤其是面对大型Word文档时,手动查找、替换文本往往既耗时又容易出错,本文整理了C#查找与替换Word内容的6种方法,大家可以... 目录环境准备方法一:查找文本并替换为新文本方法二:使用正则表达式查找并替换文本方法三:将文本替换为图