How to use printf inside a CUDA kernel?

Hi,
I wonder why I cannot use printf in CUDA kernels. The code inside my file test.cu (adapted from the MathWorks help):
#include <stdio.h>

#include <stdlib.h>
#include <string.h>

#include <iostream>

__global__ void add2(double *v1, const double *v2)
{
    int idx = threadIdx.x;
    v1[idx] += v2[idx];

    // Print the thread index; "\n" ends the line.
    printf("identity: %d\n", idx);
}
compiles without errors using
mexcuda -ptx test.cu
but running it from the MATLAB command line as
k = parallel.gpu.CUDAKernel("test.ptx","test.cu");

N = 8;
k.ThreadBlockSize = N;
in1 = ones(N,1,"gpuArray");
in2 = ones(N,1,"gpuArray");
result = feval(k,in1,in2);
gather(result);
does not print anything to the screen.
This link suggests some changes to the headers, such as adding #undef printf to avoid conflicts with mex.h… but it didn't work for me.
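
For comparison, here is a minimal standalone sketch that runs the same kernel outside MATLAB, to check whether device-side printf works on its own. It assumes nvcc and a CUDA-capable device are available; the file name test_standalone.cu and the host code in main are hypothetical and not part of the original test.cu. Device printf output is buffered and only flushed at certain points, such as a call to cudaDeviceSynchronize, so the sketch synchronizes explicitly after the launch.

```cuda
// test_standalone.cu (hypothetical file name)
// Build (assuming nvcc is on the path): nvcc test_standalone.cu -o test_standalone
#include <cstdio>
#include <cuda_runtime.h>

// Same kernel as in test.cu.
__global__ void add2(double *v1, const double *v2)
{
    int idx = threadIdx.x;
    v1[idx] += v2[idx];
    printf("identity: %d\n", idx);
}

int main()
{
    const int N = 8;
    double h1[N], h2[N];
    for (int i = 0; i < N; ++i) {
        h1[i] = 1.0;
        h2[i] = 1.0;
    }

    // Allocate device buffers and copy the inputs over.
    double *d1 = nullptr, *d2 = nullptr;
    cudaMalloc(&d1, N * sizeof(double));
    cudaMalloc(&d2, N * sizeof(double));
    cudaMemcpy(d1, h1, N * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2, N * sizeof(double), cudaMemcpyHostToDevice);

    // One block of N threads, matching k.ThreadBlockSize = N in the MATLAB code.
    add2<<<1, N>>>(d1, d2);

    // Wait for the kernel to finish; this also flushes the device printf buffer.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the result back and show one element as a sanity check.
    cudaMemcpy(h1, d1, N * sizeof(double), cudaMemcpyDeviceToHost);
    printf("v1[0] = %f\n", h1[0]);

    cudaFree(d1);
    cudaFree(d2);
    return 0;
}
```

If the "identity" lines show up here but not in MATLAB, the kernel itself is printing, and the output is probably going to the standard output of the MATLAB process rather than to the Command Window. That last part is an assumption I have not verified.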