CUDA compute capability 1.3 (and higher) adds features that you might want to use, but it also adds double-precision support. This can be troublesome in performance-sensitive applications, as double arithmetic is much slower than float. Here are some tips for making your code use only floats. Please comment with any other tips you have!
The fix is simply to label your float literals explicitly. For an example of the problem, the compiler will interpret 0.1 as a double. This can propagate through your code, and in an inner loop it can seriously hurt your performance. Whenever you write a floating-point constant and don’t want a double, add an ‘f’ to the end, thus: ‘0.1f’.
If you’ve already written a massive amount of code without this, here are some tips for finding doubles that have crept in.
- Regex for “0\.[0-9]+” (the dot needs escaping, or it matches any character), and match the whole word (add word-boundary anchors, or tick ‘match whole word’ in your editor). This identifies all ‘0.0’s and ignores all ‘0.0f’s (this tip works for non-CUDA code too)
- Add -keep to the NVCC compile options. You can then open the PTX in a text editor and search for ‘f64’. Any occurrence of ‘f64’ means your code is using doubles at some point. If you successfully did the above step, there should be none.
- Watch the compiler output for warnings about float-to-double conversion.
This error message misled me for a while. It was telling me that my templated CUDA kernel was “not a function or static data member” – this after I added a struct pointer as a parameter. If I changed the struct to an int parameter, it worked fine (except being semantically stupid). I didn’t see anything wrong with my declaration of the struct, but on closer inspection, I had mis-capitalised one letter. Not the error message I’d expect for this problem, so it took longer than expected to find.
Hope this helps if you have the same problem…
This is the formula for calculating peak FLOPS on your CUDA-enabled GPU (as used by NVIDIA). It’s useful as a comparison point when evaluating how many FLOPS your CUDA kernel is achieving, and how much you can hope to gain through optimization. You can get the numbers needed from the NVIDIA control panel’s system information option.
If your card is pre-Fermi:
Processor clock × CUDA cores × 3
If your card is Fermi-based (or higher?):
Processor clock × CUDA cores × 2
Original Source here.
This week I am in… California! You would hardly have guessed from the title of the post eh? I’m here for work, at the NVIDIA GTC 2010 conference. This is the second day here, and while yesterday was just a tutorial day, I’ve seen some amazing stuff today.
– NVIDIA named the next two GPUs in their roadmap, Kepler and Maxwell, as well as claiming an 8× performance-per-watt improvement by 2013.
– Adobe showed off an amazing piece of digital photography tech allowing you to refocus after taking the shot. It used the high megapixel counts of current sensors and an array of lenses to capture lots of small images, then stitch them together in software. Magic.
– Stacks of stereo vision, autostereoscopic and surround displays, some really impressive, others blah. Autostereo left me feeling a bit nauseous, I thought – a bad omen for the 3DS.
– A nice multi-touch screen extension supporting 32 simultaneous touch points, and it did indeed seem pretty robust.
Other than that, I curse my biological rhythm: my body is sure it should be asleep despite not having been awake long enough yet!