Visual Studio 2017: _mm_load_ps often compiled to movups

Question

I am looking at the generated assembly for my code (using Visual Studio 2017) and noticed that _mm_load_ps is often (always?) compiled to movups.

The data I'm using _mm_load_ps on is defined like this:

struct alignas(16) Vector {
    float v[4];
};

// often embedded in other structs like this
struct AABB {
    Vector min;
    Vector max;
    bool intersection(/* parameters */) const;
};

Now when I'm using this construct, the following will happen:

// this code
__m128 bb_min = _mm_load_ps(min.v);

// generates this
movups  xmm4, XMMWORD PTR [r8]

I'm expecting movaps because of alignas(16). Do I need something else to convince the compiler to use movaps in this case?

My question is different from this question because I'm not getting any crashes. The struct is explicitly aligned and I'm also using aligned allocation. Rather, I'm curious why the compiler is switching _mm_load_ps (the intrinsic for aligned memory) to movups. If I know the struct was allocated at an aligned address and I'm accessing it through this, it would be safe to use movaps, right?

Recommended answer

On recent versions of Visual Studio and the Intel Compiler (recent as post-2013?), the compiler rarely ever generates aligned SIMD load/stores anymore.

When compiling for AVX or higher:

  • The Microsoft compiler (>VS2013?) doesn't generate aligned loads. But it still generates aligned stores.
  • The Intel compiler (> Parallel Studio 2012?) doesn't do it at all anymore. But you'll still see them in ICC-compiled binaries inside their hand-optimized libraries like memset().
  • As of GCC 6.1, it still generates aligned load/stores when you use the aligned intrinsics.

The compiler is allowed to do this because it's not a loss of functionality when the code is written correctly. All processors starting from Nehalem have no penalty for unaligned load/stores when the address is aligned.
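
As a rough illustration of the "no loss of functionality" point, the sketch below loads the same 16-byte-aligned buffer with both intrinsics and compares the lanes. The helper names lanes_equal and demo_loads are made up for this example, and it assumes an x86 or x86-64 target with SSE available.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; x86/x86-64 only

// Hypothetical helper: true if all four float lanes of a and b compare equal.
inline bool lanes_equal(__m128 a, __m128 b) {
    // _mm_cmpeq_ps sets a lane to all ones on equality;
    // _mm_movemask_ps collects the four sign bits into the low bits of an int.
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}

// Hypothetical demo: on aligned data both load intrinsics return the same value.
inline void demo_loads() {
    alignas(16) float data[8] = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f};

    // data is 16-byte aligned, so either intrinsic is valid here.
    assert(lanes_equal(_mm_load_ps(data), _mm_loadu_ps(data)));

    // data + 1 is only 4-byte aligned, so only the unaligned load is valid.
    __m128 shifted = _mm_loadu_ps(data + 1);
    assert(lanes_equal(shifted, _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)));
}
```

Whichever instruction the compiler picks for the aligned case, the loaded value is the same; only the behavior on a misaligned address differs.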

Microsoft's stance on this issue is that it "helps the programmer by not crashing". Unfortunately, I can't find the original source for this statement from Microsoft anymore. In my opinion, this achieves the exact opposite of that because it hides misalignment penalties. From the correctness standpoint, it also hides incorrect code.
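
Since a misaligned pointer may no longer fault on its own, one option is to check alignment explicitly in debug builds. This is only a sketch: is_aligned and checked_ptr are hypothetical helpers, not part of any compiler or library API, and Vector repeats the struct from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

struct alignas(16) Vector {
    float v[4];
};

// Compile-time check that the type itself carries the expected alignment.
static_assert(alignof(Vector) == 16, "Vector must be 16-byte aligned");

// Hypothetical helper: returns the pointer you would pass to _mm_load_ps,
// asserting in debug builds that it really is 16-byte aligned.
inline const float* checked_ptr(const Vector& vec) {
    assert(is_aligned(vec.v, 16) && "misaligned Vector");
    return vec.v;
}
```

The assert disappears when NDEBUG is defined, so release builds pay nothing for the check.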

Whatever the case is, unconditionally using unaligned load/stores does simplify the compiler a bit.

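Update:

  • Starting with Parallel Studio 2018, the Intel compiler no longer generates aligned moves at all, even for pre-Nehalem targets.
  • Starting with Visual Studio 2017, the Microsoft compiler also no longer generates aligned moves at all, even when targeting pre-AVX hardware.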

Both cases result in inevitable performance degradation on older processors. But it seems that this is intentional as both Intel and Microsoft no longer care about old processors.

The only load/store intrinsics that are immune to this are the non-temporal load/stores. There is no unaligned equivalent of them, so the compiler has no choice.

So if you just want to test your code for correctness, you can swap the regular load/store intrinsics for the non-temporal ones. But be careful not to let something like this slip into production code, since NT load/stores (NT stores in particular) are a double-edged sword that can hurt you if you don't know what you're doing.
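
A minimal sketch of that substitution for stores, assuming an x86 or x86-64 target. The CHECK_ALIGNMENT macro and the store4 wrapper are hypothetical names for this example; only _mm_store_ps and _mm_stream_ps are real intrinsics.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: _mm_store_ps, _mm_stream_ps

// Hypothetical wrapper: define CHECK_ALIGNMENT only in a dedicated test build.
inline void store4(float* dst, __m128 value) {
    assert(dst != nullptr);
#ifdef CHECK_ALIGNMENT
    // The non-temporal store (movntps) has no unaligned form,
    // so a misaligned dst faults here instead of passing silently.
    _mm_stream_ps(dst, value);
#else
    // Regular aligned-store intrinsic; the compiler may emit an unaligned move.
    _mm_store_ps(dst, value);
#endif
}
```

Keeping the switch behind a single macro makes it harder for the non-temporal variant to leak into a production build.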
