Custom Kernel GpuMat with float

Question

I'm trying to write a custom kernel that takes GpuMat data and computes the arc cosine of each pixel of an image. With CV_8UC1 data on the GPU I can upload, download, and modify values without problems, but chars cannot be used to calculate arc cosines. However, when I convert the image to CV_32FC1 (floats) before uploading it, I get an illegal memory access error during the download step. Here is my code:

//.cu code 
#include <cuda_runtime.h>
#include <stdlib.h>
#include <iostream>
#include <stdio.h>
__global__ void funcKernel(const float* srcptr, float* dstptr, size_t srcstep, const     size_t dststep, int cols, int rows){
    int rowInd = blockIdx.y*blockDim.y+threadIdx.y;
    int colInd = blockIdx.x*blockDim.x+threadIdx.x;
    if(rowInd >= rows || colInd >= cols)
            return;
    const float* rowsrcptr=srcptr+rowInd*srcstep;
    float* rowdstPtr=  dstptr+rowInd*dststep;
    float val = rowsrcptr[colInd];
    if((int) val % 90 == 0)
            rowdstPtr[colInd] = -1 ;
    else{
            float acos_val = acos(val);
            rowdstPtr[colInd] = acos_val;
    }
}

int divUp(int a, int b){
    return (a+b-1)/b;
}

extern "C"
{
void func(const float* srcptr, float* dstptr, size_t srcstep, const size_t dststep, int cols, int rows){
    dim3 blDim(32,8);
    dim3 grDim(divUp(cols, blDim.x), divUp(rows,blDim.y));
    std::cout << "calling kernel from func\n";
    funcKernel<<<grDim,blDim>>>(srcptr,dstptr,srcstep,dststep,cols,rows);
    std::cout << "done with kernel call\n";
     cudaDeviceSynchronize();
}
} // extern "C"

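//.cpp code
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
using namespace cv;
using namespace cv::gpu;

// Implemented in the .cu file
extern "C" void func(const float* srcptr, float* dstptr, size_t srcstep, const size_t dststep, int cols, int rows);
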
void callKernel(const GpuMat &src, GpuMat &dst){
    float* p = (float*)src.data;
    float* p2 =(float*) dst.data;
    func(p,p2,src.step,dst.step,src.cols,src.rows);
}

int main(){
    Mat input = imread("cat.jpg",0);
    Mat float_input;
    input.convertTo(float_input,CV_32FC1);
    GpuMat d_frame,d_output;
    Size size = float_input.size();
    d_frame.upload(float_input);
    d_output.create(size,CV_32FC1);
    callKernel(d_frame,d_output);
    Mat output(d_output);
    return 0;
}

When I run the program, it aborts at runtime with this error:

OpenCV Error: Gpu API call (an illegal memory access was encountered) in copy, file /home/mobile/opencv-2.4.9/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp, line 882
terminate called after throwing an instance of 'cv::Exception'
  what(): /home/mobile/opencv-2.4.9/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp:882: error: (-217) an illegal memory access was encountered in function copy

Recommended answer

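You are treating the image step as if it were an offset counted in float elements. It is a byte offset from one row to the next. Because pointer arithmetic on a float* already scales by sizeof(float), adding rowInd*srcstep to a float pointer advances four times too far, and most rows end up outside the allocation.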
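To make the overrun concrete, here is a small host-side C++ sketch of the arithmetic. The function names are hypothetical, and the 480-row image with a 2560-byte step is an illustrative assumption, not a value taken from the question (real GpuMat steps are often padded beyond cols * sizeof(float)). It also assumes sizeof(float) is 4.

```cpp
#include <cassert>
#include <cstddef>

// Total bytes in a pitched allocation of `rows` rows, `stepBytes` apart.
inline std::size_t allocationBytes(std::size_t rows, std::size_t stepBytes) {
    return rows * stepBytes;
}

// Byte offset of row `row` when the step is correctly treated as bytes.
inline std::size_t rowOffsetBytes(std::size_t row, std::size_t stepBytes) {
    return row * stepBytes;
}

// Byte offset actually reached by `floatPtr + row * step`:
// arithmetic on float* multiplies the offset by sizeof(float).
inline std::size_t rowOffsetIfStepUsedAsElements(std::size_t row, std::size_t stepBytes) {
    return row * stepBytes * sizeof(float);
}

// First row whose start lies outside the allocation under the mistaken
// arithmetic; returns `rows` if every row start stays in bounds.
inline std::size_t firstOutOfBoundsRow(std::size_t rows, std::size_t stepBytes) {
    assert(stepBytes > 0);
    const std::size_t total = allocationBytes(rows, stepBytes);
    for (std::size_t r = 0; r < rows; ++r) {
        if (rowOffsetIfStepUsedAsElements(r, stepBytes) >= total) {
            return r;
        }
    }
    return rows;
}
```

Under these assumed numbers, firstOutOfBoundsRow(480, 2560) is 120, so roughly the last three quarters of the rows are read and written outside the buffer, which is consistent with an illegal memory access being reported later.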

Try something like this:

const float* rowsrcptr= (const float *)(((char *)srcptr)+rowInd*srcstep);
float* rowdstPtr=  (float *) (((char *)dstptr)+rowInd*dststep);
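The same byte-based row lookup can be wrapped in a small helper. The sketch below is host-side C++ so it can be checked without a GPU; `rowPtr` and `roundTrip` are hypothetical names, and the padded std::vector only mimics a pitched image under the assumption that the step is a multiple of sizeof(float).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer to the first element of row `row`; `stepBytes` is the distance
// between consecutive rows in bytes, which is what GpuMat::step holds.
inline const float* rowPtr(const float* base, std::size_t row, std::size_t stepBytes) {
    return reinterpret_cast<const float*>(
        reinterpret_cast<const unsigned char*>(base) + row * stepBytes);
}

inline float* rowPtr(float* base, std::size_t row, std::size_t stepBytes) {
    return reinterpret_cast<float*>(
        reinterpret_cast<unsigned char*>(base) + row * stepBytes);
}

// Fills a padded host buffer through rowPtr with the value y*100 + x at (y, x),
// then reads element (r, c) back through the const overload.
inline float roundTrip(int rows, int cols, std::size_t stepBytes, int r, int c) {
    assert(stepBytes >= static_cast<std::size_t>(cols) * sizeof(float));
    assert(stepBytes % sizeof(float) == 0);
    std::vector<float> buf(static_cast<std::size_t>(rows) * stepBytes / sizeof(float), -1.0f);
    for (int y = 0; y < rows; ++y) {
        float* row = rowPtr(buf.data(), static_cast<std::size_t>(y), stepBytes);
        for (int x = 0; x < cols; ++x) {
            row[x] = static_cast<float>(y * 100 + x);
        }
    }
    const float* readOnly = buf.data();
    return rowPtr(readOnly, static_cast<std::size_t>(r), stepBytes)[c];
}
```

For example, roundTrip(3, 5, 32, 2, 4) uses 5 floats per row (20 bytes) with a 32-byte step and reads back the value written at row 2, column 4. Inside a kernel, helpers like these would additionally need the `__device__` qualifier.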

From the documentation:

step – Number of bytes each matrix row occupies.

It's also a good idea to add proper CUDA error checking to your code (e.g. to func). You can also run your code under cuda-memcheck to see the actual kernel failure that generates the invalid reads/writes, instead of only seeing the error surface later in the download.
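As a rough illustration of the error-checking pattern, here is a C++ sketch that wraps each status-returning call in a macro recording the expression, file, and line. So that it builds without the CUDA toolkit, it uses a stand-in status type: `StubStatus`, `stubStatusString`, `checkStatus`, `CHECK_STATUS`, and `runChecked` are all hypothetical names. With CUDA available, the stand-ins would be replaced by cudaError_t, cudaSuccess, and cudaGetErrorString.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for cudaError_t (hypothetical, only so the sketch compiles anywhere).
enum StubStatus { kStubSuccess = 0, kStubIllegalAddress = 1 };

// Stand-in for cudaGetErrorString.
inline const char* stubStatusString(StubStatus s) {
    return s == kStubSuccess ? "no error"
                             : "an illegal memory access was encountered";
}

// Throws with the failing expression and its source location
// when `status` is anything other than success.
inline void checkStatus(StubStatus status, const char* expr, const char* file, int line) {
    assert(expr != nullptr && file != nullptr);
    if (status == kStubSuccess) {
        return;
    }
    throw std::runtime_error(std::string(file) + ":" + std::to_string(line) + ": " +
                             expr + " failed: " + stubStatusString(status));
}

#define CHECK_STATUS(expr) checkStatus((expr), #expr, __FILE__, __LINE__)

// Returns the error message for `status`, or an empty string on success.
inline std::string runChecked(StubStatus status) {
    try {
        CHECK_STATUS(status);
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return std::string();
}
```

In func, such a check would typically be applied to cudaGetLastError() right after the kernel launch and to the return value of cudaDeviceSynchronize(). Kernel launches are asynchronous, so execution errors such as an illegal memory access are usually reported at the synchronization point rather than at the launch itself.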
