CLR Threadpool 的内部机制

2024-06-12 18:32
文章标签 机制 内部 clr threadpool

本文主要是介绍CLR Threadpool 的内部机制,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

转自:New and Improved CLR 4 Thread Pool Engine

A way to think of the CLR thread pool and how it gets used, e.g. when you call ThreadPool.QueueUserWorkItem, is to picture a global queue where work items (essentially delegates) get queued on a global queue and multiple threads pick them out in a First In First Out order. The FIFO order is not something that is documented or guaranteed, but my personal guess is that too many applications rely on it, so I don’t see it changing any time soon.

The image on the left shows the main program thread as it is creating a work item; the second image shows the global thread pool queue after code has queued 3 work items; the third image shows 2 threads from the thread pool that have grabbed 2 work items and executing them. If in the context of those work items (i.e. from the executing code in the delegates) more work items get created for the CLR thread pool they end up on the global queue (see image on the right) and life goes on.

CLR Thread Pool v4 from a System.Threading.Tasks perspective
In CLR 4, the thread pool engine has had some improvements made to it (it has been having positive tweaks in every release of the CLR) and part of these improvements are some performance gains that are achievable when using the new System.Threading.Tasks.Task type. I'll show a code example in another post, but you can think of creating and starting a Task (passing it a delegate) as the equivalent of calling QueueUserWorkItem on the ThreadPool. A way to visualize the CLR thread pool when used via the Task-based API is that, in addition to the single global queue, each thread in the thread pool has its own local queue:

Just as with normal thread pool usage, the main program thread may create Tasks that will get queued on the global queue (e.g. Task1 and Task2) and threads will grab those Tasks typically in a FIFO manner. Where things diverge is that any new Tasks (e.g. Task3) created in the context of the executing Task (e.g. Task2) end up on a local queue for that thread pool thread.

Why Local Queues
With the era of manycore machines upon us and devs taking advantage of parallelism, the number of threads in the thread pool is likely to increase: at a minimum equal to the number of cores for compute bound operations, and likely more due to injection of additional threads as a result of IO bound operations or blocking calls stalling the CPU. Bottom line: more cores = more threads.

With more threads competing for work items, it is not optimal to put up with the contention issues of a single queue that all of them are trying to access safely. This would be amplified by the goal of fine grained parallelism where each work item finishes fairly quickly, so the trips to the global queue would be frequent.

It is for this reason we introduce a local queue per thread, where additional tasks get queued (with no contention) and will then be retrieved by that same thread (again with no contention).

LIFO
So, picking up from the picture further up, let's assume that Task2 additionally creates two more Tasks e.g. Task4 and Task5.

The tasks end up on the local queue as expected, but which Task does the thread pick to execute when it completes its current task (i.e. Task2)? The initially surprising answer is that it could be Task5, which is the last one that was queued – in other words a LIFO algorithm can be used for the local queues.

The reason that this is a good thing is locality. In most scenarios the data required by the last created Task in the queue is still hot in the cache, so it makes sense to pull that down and execute it. Obviously, this means there are no promises on ordering, but some level of fairness is relinquished in the sake of better performance.

Work Stealing
If the only enhancements were the introduction of LIFO local queues then performance is greatly increased, but you should be concerned. You should be concerned about what happens when another thread in the thread pool (likely executing on another core) finishes its work. Luckily, you don't have to be:

The other worker thread completes Task1 and then goes to its local queue and finds it empty; it then goes to the global queue and finds it empty. We don't want it sitting there idle so a beautiful thing happens: work stealing. The thread goes to a local queue of another thread and "steals" a Task and executes it! That way we keep all our cores busy and this contributes to our fine grained parallelism load balancing goal. In the image above notice that "stealing" happens in a FIFO manner, which again for locality reasons is good (its data would be cold in the cache). Furthermore, in many divide and conquer scenarios, tasks generated earlier on are likely to generate more work (e.g. Task6) themselves, which would end up now on this other thread's queue and hence reduce frequent stealing.
Nitpicking note: further up, I mentioned "no contention"; clearly work stealing is the exception to that rule.

What's Next
In addition to the links sprinkled above, you can find a simple implementation of a work stealing threadpool on Joe's blog post and some of the concepts above are touched on in this MSDN mag article by Eric and Erika. If you want to see the pictures above in a fully animated slide, get my PDC deck (slide 8) from this page. Do note that the implementation I am talking about above did not ship with the September CTP of Visual Studio 2010, but is slated for the next drop. Also note that all internal implementation details are subject to change and just shared here for the geek factor ;-)

So you can see how using the Task-based API will yield cool perf gains under the covers. An additional reason to use the new Task-based API is because of its richness. I'll touch on that in a next blog post.

这篇关于CLR Threadpool 的内部机制的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1055003

相关文章

JAVA线程的周期及调度机制详解

《JAVA线程的周期及调度机制详解》Java线程的生命周期包括NEW、RUNNABLE、BLOCKED、WAITING、TIMED_WAITING和TERMINATED,线程调度依赖操作系统,采用抢占... 目录Java线程的生命周期线程状态转换示例代码JAVA线程调度机制优先级设置示例注意事项JAVA线程

Java中自旋锁与CAS机制的深层关系与区别

《Java中自旋锁与CAS机制的深层关系与区别》CAS算法即比较并替换,是一种实现并发编程时常用到的算法,Java并发包中的很多类都使用了CAS算法,:本文主要介绍Java中自旋锁与CAS机制深层... 目录1. 引言2. 比较并交换 (Compare-and-Swap, CAS) 核心原理2.1 CAS

Spring Boot 集成 mybatis核心机制

《SpringBoot集成mybatis核心机制》这篇文章给大家介绍SpringBoot集成mybatis核心机制,本文结合实例代码给大家介绍的非常详细,对大家的学习或工作具有一定的参考借鉴价值... 目录Spring Boot浅析1.依赖管理(Starter POMs)2.自动配置(AutoConfigu

Redis的安全机制详细介绍及配置方法

《Redis的安全机制详细介绍及配置方法》本文介绍Redis安全机制的配置方法,包括绑定IP地址、设置密码、保护模式、禁用危险命令、防火墙限制、TLS加密、客户端连接限制、最大内存使用和日志审计等,通... 目录1. 绑定 IP 地址2. 设置密码3. 保护模式4. 禁用危险命令5. 通过防火墙限制访问6.

JAVA实现Token自动续期机制的示例代码

《JAVA实现Token自动续期机制的示例代码》本文主要介绍了JAVA实现Token自动续期机制的示例代码,通过动态调整会话生命周期平衡安全性与用户体验,解决固定有效期Token带来的风险与不便,感兴... 目录1. 固定有效期Token的内在局限性2. 自动续期机制:兼顾安全与体验的解决方案3. 总结PS

详解Spring中REQUIRED事务的回滚机制详解

《详解Spring中REQUIRED事务的回滚机制详解》在Spring的事务管理中,REQUIRED是最常用也是默认的事务传播属性,本文就来详细的介绍一下Spring中REQUIRED事务的回滚机制,... 目录1. REQUIRED 的定义2. REQUIRED 下的回滚机制2.1 异常触发回滚2.2 回

基于Redis自动过期的流处理暂停机制

《基于Redis自动过期的流处理暂停机制》基于Redis自动过期的流处理暂停机制是一种高效、可靠且易于实现的解决方案,防止延时过大的数据影响实时处理自动恢复处理,以避免积压的数据影响实时性,下面就来详... 目录核心思路代码实现1. 初始化Redis连接和键前缀2. 接收数据时检查暂停状态3. 检测到延时过

Redis中哨兵机制和集群的区别及说明

《Redis中哨兵机制和集群的区别及说明》Redis哨兵通过主从复制实现高可用,适用于中小规模数据;集群采用分布式分片,支持动态扩展,适合大规模数据,哨兵管理简单但扩展性弱,集群性能更强但架构复杂,根... 目录一、架构设计与节点角色1. 哨兵机制(Sentinel)2. 集群(Cluster)二、数据分片

深入理解go中interface机制

《深入理解go中interface机制》本文主要介绍了深入理解go中interface机制,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需要的朋友们下面随着小编来一起学... 目录前言interface使用类型判断总结前言go的interface是一组method的集合,不

C# async await 异步编程实现机制详解

《C#asyncawait异步编程实现机制详解》async/await是C#5.0引入的语法糖,它基于**状态机(StateMachine)**模式实现,将异步方法转换为编译器生成的状态机类,本... 目录一、async/await 异步编程实现机制1.1 核心概念1.2 编译器转换过程1.3 关键组件解析