CUDA: How to check for the right compute capability?
CUDA code compiled for a higher compute capability can execute for a long time on a device with a lower compute capability before silently failing one day in some kernel. I spent half a day chasing an elusive bug, only to realize that the build rule had sm_21 while the device (a Tesla C2050) was compute capability 2.0.
Is there any CUDA API code I can add so that the program can self-check whether it is running on a device with a compatible compute capability? I need to compile for, and work with, devices of many compute capabilities. Is there any other action I can take to ensure that such errors do not occur?
In the runtime API, cudaGetDeviceProperties returns two fields, major and minor, which give the compute capability of any enumerated CUDA device. You can use them to check the compute capability of a GPU before establishing a context on it, to make sure it is the right architecture for what your code does.
nvcc can also generate an object file containing code for multiple architectures from a single invocation using the -gencode option, for example:
    nvcc -c -gencode arch=compute_20,code=sm_20 \
            -gencode arch=compute_13,code=sm_13 source.cu
This would produce an output object file with an embedded fatbinary object containing cubin files for GT200 and GF100 cards. The runtime API will automagically handle architecture detection and try to load suitable device code from the fatbinary object, without any extra host code.
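As a rough illustration of the self-check described above, here is a minimal host-side sketch. The helper names cubinCanRunOn and deviceMatchesBuild are made up for this example and are not part of the CUDA API; only cudaGetDeviceProperties, cudaDeviceProp and cudaGetErrorString come from the runtime. It assumes the binary contains only cubin code (no embedded PTX), in which case a cubin built for X.y runs only on devices of capability X.z with z >= y. The CUDA-dependent part is guarded so the comparison itself can also be compiled and tested on a machine without the toolkit.

```cpp
// Host-side sketch: detect a device that cannot execute the cubin we built.
#include <cstdio>

#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define HAVE_CUDA_RUNTIME 1
#  endif
#endif

// Binary-compatibility rule for cubin code (assumes no PTX is embedded):
// code built for X.y runs only on devices X.z with z >= y.
bool cubinCanRunOn(int devMajor, int devMinor, int builtMajor, int builtMinor) {
    return devMajor == builtMajor && devMinor >= builtMinor;
}

#ifdef HAVE_CUDA_RUNTIME
// Hypothetical helper: query `device` and report whether code built for
// builtMajor.builtMinor can run on it. Returns false on any CUDA error.
bool deviceMatchesBuild(int device, int builtMajor, int builtMinor) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    std::printf("device %d (%s): compute capability %d.%d\n",
                device, prop.name, prop.major, prop.minor);
    return cubinCanRunOn(prop.major, prop.minor, builtMajor, builtMinor);
}
#endif
```

At startup one might call deviceMatchesBuild(0, 2, 1) for an sm_21 build and abort before launching any kernel if it returns false. For the case in the question, cubinCanRunOn(2, 0, 2, 1) is false, so the mismatch would have been reported immediately. If the build also embeds PTX, the driver can JIT-compile it for newer devices, and this check is then stricter than necessary.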