CUDA compute capability 1.3 (and higher) adds features that you might want to use, but it also adds double-precision support. This can be troublesome in performance-sensitive applications, because double arithmetic is much slower than float on these GPUs. Here are some tips for making sure your code uses only floats. Please comment with any other tips you have!
The solution is simply to label your float constants explicitly. The problem is that, following C rules, an unsuffixed literal such as 0.1 is interpreted by the compiler as a double. That double can propagate through the surrounding expressions, and in an inner loop it can seriously hurt performance. Whenever you write a floating-point constant and don’t want a double, add an ‘f’ to the end, thus: ‘0.1f’.
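As a minimal sketch of how this plays out (the kernel and variable names below are purely illustrative, not from any real project), an unsuffixed literal drags the whole expression into double arithmetic, while the suffixed version stays in single precision:

```
// Hypothetical kernel for illustration only.
__global__ void scale(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 0.1 is a double literal: in[i] is promoted to double,
        // the multiply runs in double precision, and the result is
        // converted back to float -- slow on compute 1.3 hardware.
        out[i] = in[i] * 0.1;

        // 0.1f keeps the whole expression in single precision:
        // out[i] = in[i] * 0.1f;
    }
}
```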
If you’ve already written a massive amount of code without this, here are some tips for finding doubles that have crept in.
- Search with the regex “[0-9]+\.[0-9]+” and match whole words only (add word-boundary tags such as \b, or tick ‘match whole word’ in your editor). This will identify all unsuffixed literals like ‘0.1’, while the word boundary makes it ignore suffixed ones like ‘0.1f’ (this tip works for non-CUDA code too). A grep version is sketched after this list.
- Add -keep to the nvcc compile options so the intermediate files are retained. You can then open the generated PTX in a text editor and search for f64: any occurrence (in instructions such as cvt.f64.f32 or add.f64) means your code is using doubles at some point. If you successfully did the step above, there should be none; see the sketch after this list.
- Watch the compiler output for warnings about float and double conversion.
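For concreteness, here is a rough command-line version of the first two checks. This is only a sketch: the file name kernel.cu is an assumption, \b in the grep pattern relies on GNU grep, and -keep will also leave behind other intermediate files you may want to clean up afterwards.

```
# Find unsuffixed floating-point literals in the source (whole-word match).
grep -nE '\b[0-9]+\.[0-9]+\b' kernel.cu

# Keep the intermediate files (including the PTX) when compiling.
nvcc -keep -arch=sm_13 -c kernel.cu

# Any hit here means double-precision instructions made it into the PTX.
grep -n 'f64' kernel.ptx
```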