Parallel for loop in OpenMP
Question
I'm trying to parallelize a very simple for-loop, but this is my first attempt at using OpenMP in a long time, and the run times baffle me. Here is my code:
#include <vector>
#include <algorithm>
#include <cmath>     // cos, sin, sqrt
#include <iostream>  // cout, endl
using namespace std;

int main ()
{
    int n=400000, m=1000;
    double x=0,y=0;
    double s=0;
    vector< double > shifts(n,0);

    #pragma omp parallel for
    for (int j=0; j<n; j++) {
        double r=0.0;
        for (int i=0; i < m; i++){
            double rand_g1 = cos(i/double(m));
            double rand_g2 = sin(i/double(m));
            x += rand_g1;
            y += rand_g2;
            r += sqrt(rand_g1*rand_g1 + rand_g2*rand_g2);
        }
        shifts[j] = r / m;
    }

    cout << *std::max_element( shifts.begin(), shifts.end() ) << endl;
}
I compile with
g++ -O3 testMP.cc -o testMP -I /opt/boost_1_48_0/include
that is, without "-fopenmp", and I get these timings:
real 0m18.417s
user 0m18.357s
sys 0m0.004s
When I do use "-fopenmp",
g++ -O3 -fopenmp testMP.cc -o testMP -I /opt/boost_1_48_0/include
I get these numbers:
real 0m6.853s
user 0m52.007s
sys 0m0.008s
which doesn't make sense to me. How can using eight cores give only a 3-fold increase in performance? Am I coding the loop correctly?
Recommended answer
You should use the OpenMP reduction clause for x and y:
#pragma omp parallel for reduction(+:x,y)
for (int j=0; j<n; j++) {
    double r=0.0;
    for (int i=0; i < m; i++){
        double rand_g1 = cos(i/double(m));
        double rand_g2 = sin(i/double(m));
        x += rand_g1;
        y += rand_g2;
        r += sqrt(rand_g1*rand_g1 + rand_g2*rand_g2);
    }
    shifts[j] = r / m;
}
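Without the clause, x and y are shared by all threads, so every thread updates the same two variables without synchronization. That is a data race, and the contention on those variables is the likely reason the speed-up is so poor. With reduction, each thread accumulates its own partial sum in a private copy of x and y, and at the end all partial values are added together to obtain the final values.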
Serial version:
25.05s user 0.01s system 99% cpu 25.059 total
OpenMP version w/ OMP_NUM_THREADS=16:
24.76s user 0.02s system 1590% cpu 1.559 total
See - superlinear speed-up :)