2D char array to CUDA kernel
Question
I need help transferring a char[][] array to a CUDA kernel. This is my code:
__global__
void kernel(char** BiExponent){
for(int i=0; i<500; i++)
printf("%c",BiExponent[1][i]); // I want print line 1
}
int main(){
char (*Bi2dChar)[500] = new char [5000][500];
char **dev_Bi2dChar;
...//HERE I INPUT DATA TO Bi2dChar
size_t host_orig_pitch = 500 * sizeof(char);
size_t pitch;
cudaMallocPitch((void**)&dev_Bi2dChar, &pitch, 500 * sizeof(char), 5000);
cudaMemcpy2D(dev_Bi2dChar, pitch, Bi2dChar, host_orig_pitch, 500 * sizeof(char), 5000, cudaMemcpyHostToDevice);
kernel <<< 1, 512 >>> (dev_Bi2dChar);
free(Bi2dChar); cudaFree(dev_Bi2dChar);
}
I use: nvcc.exe" -gencode=arch=compute_20,code="sm_20,compute_20" --use-local-env --cl-version 2012 -ccbin
Thanks for your help.
Recommended answer
cudaMemcpy2D doesn't actually handle 2-dimensional (i.e. double-pointer, **) arrays in C. Note that the documentation indicates it expects single pointers, not double pointers.
Generally speaking, moving an arbitrary double-pointer C array between the host and the device is more complicated than moving a single-pointer array.
If you really want to handle the double-pointer array, search for "CUDA 2D Array" in the upper right-hand corner of this page, and you'll find various examples of how to do it (for example, the answer given by @talonmies).
Often, an easier approach is simply to "flatten" the array so it can be referenced by a single pointer, i.e. char[] instead of char[][], and then use index arithmetic to simulate 2-dimensional access.
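The index arithmetic mentioned above can be sketched in host-only C++, with no CUDA calls involved. The helper names flat_index and make_flat are hypothetical and not part of the original post. The sketch assumes a row-major layout, where element (row, col) lives at offset row * cols + col.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): maps a 2D (row, col)
// coordinate to an offset in a row-major flat buffer of rows x cols elements.
inline std::size_t flat_index(std::size_t row, std::size_t col,
                              std::size_t rows, std::size_t cols) {
    assert(row < rows && col < cols);  // catch out-of-range access in debug builds
    return row * cols + col;           // skip `row` full rows, then step `col` elements
}

// Hypothetical helper: builds a flat rows x cols buffer in which every
// element of row r holds the letter 'a' + (r % 26).
std::vector<char> make_flat(std::size_t rows, std::size_t cols) {
    std::vector<char> buf(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            buf[flat_index(r, c, rows, cols)] = static_cast<char>('a' + r % 26);
    return buf;
}
```

With the dimensions from the question (5000 rows of 500 characters), row 1 starts at offset 500, so the multiplier in the index is the row width, not the number of rows. The same expression works unchanged inside a kernel that receives the buffer as a single char*.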
Your flattened code would look something like this (the code you provided is an uncompilable, incomplete snippet, so mine is too):
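#include <cstdio>

#define XDIM 5000  // number of rows
#define YDIM 500   // characters per row
__global__
void kernel(char* BiExponent){
  for(int i=0; i<YDIM; i++)
    printf("%c",BiExponent[(1*YDIM)+i]); // print row 1: row index times row width, plus column
}
int main(){
  char (*Bi2dChar)[YDIM] = new char [XDIM][YDIM];
  char *dev_Bi2dChar;
  ...//HERE I INPUT DATA TO Bi2dChar
  cudaMalloc((void**)&dev_Bi2dChar,XDIM*YDIM * sizeof(char));
  // the host array is contiguous, so a plain 1D cudaMemcpy suffices (it takes no pitch argument)
  cudaMemcpy(dev_Bi2dChar, &(Bi2dChar[0][0]), XDIM*YDIM * sizeof(char), cudaMemcpyHostToDevice);
  kernel <<< 1, 1 >>> (dev_Bi2dChar); // a single thread, so the row is printed only once
  cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
  delete[] Bi2dChar; cudaFree(dev_Bi2dChar); // memory from new[] is released with delete[], not free()
}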
If you want a pitched array, you can create it similarly (with cudaMallocPitch and cudaMemcpy2D), but you will still do so using single-pointer arrays, not double-pointer arrays.
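To illustrate what "pitched" addressing means, here is a host-only C++ sketch that imitates the layout without calling CUDA. The helper names pitched_row and copy_rows_to_pitched are hypothetical. In real code the pitch value comes from cudaMallocPitch and the copy is done by cudaMemcpy2D. The row-address arithmetic follows the formula given in the CUDA documentation for cudaMallocPitch: (T*)((char*)BaseAddress + Row * pitch) + Column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the start of row `row` in a pitched buffer.
// `pitch` is the row stride in bytes and may be larger than the row width.
inline char* pitched_row(char* base, std::size_t pitch, std::size_t row) {
    return base + row * pitch;
}

// Host-side stand-in for a 2D copy (an assumption for illustration only):
// copies `height` rows of `width` bytes each, where source and destination
// rows are src_pitch and dst_pitch bytes apart respectively.
void copy_rows_to_pitched(char* dst, std::size_t dst_pitch,
                          const char* src, std::size_t src_pitch,
                          std::size_t width, std::size_t height) {
    assert(dst_pitch >= width && src_pitch >= width);
    for (std::size_t r = 0; r < height; ++r)
        std::memcpy(dst + r * dst_pitch, src + r * src_pitch, width);
}
```

In both cases the buffer is still a single char*, and only the stride between rows changes from the row width to the pitch. A kernel would therefore take the base pointer and the pitch as two separate arguments and compute each row start the same way pitched_row does.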