CUDA: How to check for the right compute capability?

April 15, 2011

CUDA code compiled for a higher compute capability can execute for a long time on a device with a lower compute capability before silently failing one day in some kernel. I spent half a day chasing an elusive bug, only to realize that the build rule had sm_21 while the device (a Tesla C2050) was a 2.0.

Is there any CUDA API code I can add so that the program can self-check whether it is running on a device with a compatible compute capability? I need to compile for, and work with, devices of many compute capabilities. Is there any other action I can take to ensure such errors do not occur?

In the runtime API, cudaGetDeviceProperties returns two fields, major and minor, which give the compute capability of any given enumerated CUDA device. You can use them to check the compute capability of a GPU before establishing a context on it, to make sure it is the right architecture for what your code does. nvcc can also generate an object file containing multiple architectures from a single invocation using the -gencode option, for example:

nvcc -c -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_13,code=sm_13 source.cu

would produce an output object file with an embedded fatbinary object containing cubin files for GT200 and GF100 cards. The runtime API will automagically handle architecture detection and try loading suitable device code from the fatbinary object, without any extra host code.
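The compatibility rule behind such a self-check can be sketched as a pair of small host-side functions. This is a minimal sketch, not the answer's own code: the struct and function names are hypothetical, and the block is plain C++ so it compiles without the CUDA toolkit. In a real program the device values would be read from the major and minor fields that cudaGetDeviceProperties fills in, and the "built for" values would have to match whatever the build rule passes to nvcc.

```cpp
// Hypothetical helper names; only the two compatibility rules in the
// comments below are taken from NVIDIA's documentation.
struct ComputeCapability {
    int cc_major;  // e.g. 2 for a Tesla C2050
    int cc_minor;  // e.g. 0 for a Tesla C2050
};

// A cubin built for sm_XY runs only on devices with the same major
// revision X and a minor revision that is >= Y.
constexpr bool cubin_runs_on(ComputeCapability built_for,
                             ComputeCapability device) {
    return built_for.cc_major == device.cc_major &&
           built_for.cc_minor <= device.cc_minor;
}

// PTX built for compute_XY can be JIT-compiled for any device whose
// compute capability is >= X.Y.
constexpr bool ptx_runs_on(ComputeCapability built_for,
                           ComputeCapability device) {
    return device.cc_major > built_for.cc_major ||
           (device.cc_major == built_for.cc_major &&
            device.cc_minor >= built_for.cc_minor);
}

// The situation from the question: sm_21 code on a 2.0 device is rejected,
// while sm_20 code on a 2.1 device is accepted.
static_assert(!cubin_runs_on(ComputeCapability{2, 1}, ComputeCapability{2, 0}),
              "sm_21 cubin must be rejected on a 2.0 device");
static_assert(cubin_runs_on(ComputeCapability{2, 0}, ComputeCapability{2, 1}),
              "sm_20 cubin is accepted on a 2.1 device");
```

Calling a check like this at startup, and aborting with a clear message when it returns false, turns the silent failure described in the question into an immediate, visible one.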
