How and when to align to cache line size?
Problem description
In Dmitry Vyukov's excellent bounded mpmc queue written in C++ (see: http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue), he adds some padding variables. I presume this is to make the structure align to a cache line, for performance.
I have some questions:

- Why is it done in this way?
- Is it a portable method that will always work?
- In what cases would it be better to use __attribute__ ((aligned (64))) instead?
- Why would padding before the buffer pointer help with performance? Isn't only the pointer loaded into the cache, so it's really only the size of a pointer?
static size_t const     cacheline_size = 64;
typedef char            cacheline_pad_t [cacheline_size];

cacheline_pad_t         pad0_;          // keeps the fields below off the line used by whatever precedes the queue
cell_t* const           buffer_;        // read by both producers and consumers
size_t const            buffer_mask_;
cacheline_pad_t         pad1_;
std::atomic<size_t>     enqueue_pos_;   // written by producers
cacheline_pad_t         pad2_;
std::atomic<size_t>     dequeue_pos_;   // written by consumers
cacheline_pad_t         pad3_;
Would this concept work under gcc for C code?
Answer
It's done this way so that different cores modifying different fields won't have to bounce the cache line containing both of them between their caches. In general, for a processor to access some data in memory, the entire cache line containing it must be in that processor's local cache. If it's modifying that data, that cache entry usually must be the only copy in any cache in the system (Exclusive mode in the MESI/MOESI-style cache coherence protocols). When separate cores try to modify different data that happens to live on the same cache line, and thus waste time moving that whole line back and forth, that's known as false sharing.
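As a rough illustration of false sharing (a sketch of my own, not part of the original answer), the toy benchmark below has two threads hammering two adjacent atomic counters; the 64-byte line size and the use of alignas are assumptions about a typical x86-64 target:

#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>

struct Unpadded {                              // both counters almost certainly share one cache line
    std::atomic<std::size_t> a{0};
    std::atomic<std::size_t> b{0};
};

struct Padded {                                // each counter gets its own (assumed) 64-byte line
    alignas(64) std::atomic<std::size_t> a{0};
    alignas(64) std::atomic<std::size_t> b{0};
};

template <class Counters>
double run() {                                 // seconds for two threads to do 50M increments each
    Counters c;
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { for (int i = 0; i < 50'000'000; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (int i = 0; i < 50'000'000; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

int main() {
    std::printf("unpadded: %.3f s\n", run<Unpadded>());
    std::printf("padded:   %.3f s\n", run<Padded>());
}

On a multi-core machine the padded version is typically several times faster, even though the two threads never touch each other's counter; the slowdown in the unpadded case comes purely from the line bouncing between the two caches.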
In the particular example you give, one core can be enqueueing an entry (reading (shared) buffer_ and writing (exclusive) only enqueue_pos_) while another dequeues (shared buffer_ and exclusive dequeue_pos_), without either core stalling on a cache line owned by the other.
The padding at the beginning means that buffer_ and buffer_mask_ end up on the same cache line, rather than being split across two lines and thus requiring double the memory traffic to access.
I'm unsure whether the technique is entirely portable. The assumption is that each cacheline_pad_t will itself be aligned to a 64-byte (its size) cache-line boundary, and hence whatever follows it will be on the next cache line (but see the comments). So far as I know, the C and C++ language standards only require this of whole structures, so that they can live in arrays nicely without violating the alignment requirements of any of their members.
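That alignment is in fact not something a plain char array guarantees (its alignment requirement is 1); what keeps the hot fields apart in the snippet above is the full cache line of bytes sitting between them. A compile-time sketch of my own (not from the original answer; cell_t is replaced by void* so it builds standalone) can check what a particular compiler actually does, on a machine with 64-byte lines:

#include <atomic>
#include <cstddef>

std::size_t const cacheline_size = 64;
typedef char cacheline_pad_t [cacheline_size];

struct layout_check {                          // stand-in for the queue's member layout
    cacheline_pad_t          pad0_;
    void* const              buffer_;          // cell_t* in the real queue
    std::size_t const        buffer_mask_;
    cacheline_pad_t          pad1_;
    std::atomic<std::size_t> enqueue_pos_;
    cacheline_pad_t          pad2_;
    std::atomic<std::size_t> dequeue_pos_;
    cacheline_pad_t          pad3_;
};

// A full cache line of bytes between the ranges of two fields means they can
// never land on the same 64-byte-aligned line, wherever the struct starts --
// even though the pads themselves are not guaranteed to sit on a boundary.
static_assert(offsetof(layout_check, enqueue_pos_)
                  - (offsetof(layout_check, buffer_mask_) + sizeof(std::size_t)) >= cacheline_size,
              "enqueue_pos_ could share a line with buffer_mask_");
static_assert(offsetof(layout_check, dequeue_pos_)
                  - (offsetof(layout_check, enqueue_pos_) + sizeof(std::atomic<std::size_t>)) >= cacheline_size,
              "dequeue_pos_ could share a line with enqueue_pos_");

Whether those offsets come out as expected is exactly the part the standards don't promise, which is why the explicit-padding trick is worth pairing with a check like this (or with the alignment attribute discussed next).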
The __attribute__ approach would be more compiler-specific, but might cut the size of this structure in half, since the padding would be limited to rounding each element up to a full cache line. That could be quite beneficial if one had a lot of these.
The same concept applies in C as well as C++.