Function crashes when using _mm_load_pd
Problem description
I have the following function:
template <typename T>
void SSE_vectormult(T * A, T * B, int size)
{
    __m128d a;
    __m128d b;
    __m128d c;
    double A2[2], B2[2], C[2];
    const double * A2ptr, * B2ptr;
    A2ptr = &A2[0];
    B2ptr = &B2[0];
    a = _mm_load_pd(A);
    for(int i = 0; i < size; i+=2)
    {
        std::cout << "In SSE_vectormult: i is: " << i << '\n';
        A2[0] = A[i];
        B2[0] = B[i];
        A2[1] = A[i+1];
        B2[1] = B[i+1];
        std::cout << "Values from A and B written to A2 and B2\n";
        a = _mm_load_pd(A2ptr);
        b = _mm_load_pd(B2ptr);
        std::cout << "Values converted to a and b\n";
        c = _mm_mul_pd(a,b);
        _mm_store_pd(C, c);
        A[i] = C[0];
        A[i+1] = C[1];
    };
    // const int mask = 0xf1;
    // __m128d res = _mm_dp_pd(a,b,mask);
    // r1 = _mm_mul_pd(a, b);
    // r2 = _mm_hadd_pd(r1, r1);
    // c = _mm_hadd_pd(r2, r2);
    // c = _mm_scale_pd(a, b);
    // _mm_store_pd(A, c);
}
When I call it on Linux, everything is fine, but when I call it on Windows, my program crashes with "program is not working anymore". What am I doing wrong, and how can I track down the error?
Recommended answer
Your data is not guaranteed to be 16-byte aligned, which aligned SSE loads such as _mm_load_pd require. One option is to use the unaligned load _mm_loadu_pd instead:
a = _mm_loadu_pd(A);
...
a = _mm_loadu_pd(A2ptr);
b = _mm_loadu_pd(B2ptr);
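As a rough illustration of this first option, the whole loop can work directly on the caller's arrays with unaligned loads and stores, which also removes the need for the temporary A2, B2 and C buffers. This is only a sketch under assumptions: the name sse_vectormult_unaligned is hypothetical, the element type is fixed to double (the intrinsics take double pointers, so the original template only makes sense for T = double), and a scalar tail is added for odd sizes, which the original loop does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2: __m128d, _mm_loadu_pd, _mm_mul_pd, _mm_storeu_pd

// Hypothetical rewrite of the function from the question (not the asker's code).
// Computes A[i] = A[i] * B[i] for i in [0, size).
// Only unaligned loads/stores are used, so A and B may have any alignment.
void sse_vectormult_unaligned(double* A, const double* B, std::size_t size)
{
    assert(A != nullptr && B != nullptr);

    std::size_t i = 0;
    // Process two doubles per iteration while a full pair remains.
    for (; i + 2 <= size; i += 2) {
        __m128d a = _mm_loadu_pd(A + i);
        __m128d b = _mm_loadu_pd(B + i);
        _mm_storeu_pd(A + i, _mm_mul_pd(a, b));
    }
    // Scalar tail: handles the last element when size is odd.
    for (; i < size; ++i) {
        A[i] *= B[i];
    }
}
```

Calling sse_vectormult_unaligned(a, b, n) on two plain double arrays should then behave the same way regardless of how the arrays were allocated.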
Alternatively, make sure that your data is correctly aligned where possible, e.g. for statics or locals:
alignas(16) double A2[2], B2[2], C[2]; // C++11, or C11 with <stdalign.h>
Without C++11, you can use compiler-specific language extensions:
__attribute__ ((aligned(16))) double A2[2], B2[2], C[2]; // gcc/clang/ICC/et al
__declspec (align(16)) double A2[2], B2[2], C[2]; // MSVC
You could use #ifdef to #define an ALIGN(x) macro that works on the target compiler.
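One possible shape for such a macro is sketched below, together with a small runtime check that can help confirm whether a given pointer really is 16-byte aligned before an aligned load is issued. The macro name ALIGN comes from the answer, but its definition here is an assumption, and is_aligned16 and mul_pair are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_load_pd, _mm_mul_pd, _mm_store_pd

// Assumed definition of the ALIGN(x) macro: pick the alignment extension
// that the current compiler understands.
#if defined(_MSC_VER)
#  define ALIGN(x) __declspec(align(x))
#elif defined(__GNUC__) || defined(__clang__)
#  define ALIGN(x) __attribute__((aligned(x)))
#else
#  error "No alignment extension known for this compiler"
#endif

// Hypothetical helper: returns true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: multiplies two pairs of doubles using aligned loads
// on locals that are explicitly aligned with ALIGN(16).
inline void mul_pair(const double* x, const double* y, double* out)
{
    ALIGN(16) double A2[2] = { x[0], x[1] };
    ALIGN(16) double B2[2] = { y[0], y[1] };
    ALIGN(16) double C[2];

    // With the macro applied, the aligned intrinsics are safe to use here.
    assert(is_aligned16(A2) && is_aligned16(B2) && is_aligned16(C));

    _mm_store_pd(C, _mm_mul_pd(_mm_load_pd(A2), _mm_load_pd(B2)));
    out[0] = C[0];
    out[1] = C[1];
}
```

Placing a check like assert(is_aligned16(A)) in front of the first _mm_load_pd(A) in the original function is one way to see whether misalignment is what triggers the crash on a given platform.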