CUDA C++ Programming
See also:
//TODO: study & summarize Map, Gather, Scatter etc.
//TODO: see also the notes for the campus students' assignments.
Coursera Course Notes: _GPU notes.docx
Code: github, or desktop search “gpu assignments”.
Compared with “knn-r-project.Rproj”, “knn_cuda.vcxproj” is at least 6,000 times faster for almost exactly the same job (differences: data cleaned; windowSize added). One reason is that the R project was using data.frame instead of data.table. See here for more R performance info.
Multi-dimension Index #
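If the index format is [d1_index][d2_index][d3_index][d4_index], the flattened (row-major) index value will be ((d1_index*d2_nr + d2_index)*d3_nr + d3_index)*d4_nr + d4_index, where dK_nr is the number of elements along dimension K. The size of the first dimension (d1_nr) does not appear in the formula.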
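The formula above can be sketched as a pair of small helpers. This is a minimal illustration in plain C++, not code from the assignments: the names `flatten4`, `unflatten4`, `Index4` and `check_round_trip` are hypothetical, and the parameters `n2`, `n3`, `n4` stand in for d2_nr, d3_nr, d4_nr. To call the helpers from a kernel, one would presumably mark them `__host__ __device__`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: row-major offset of element [i1][i2][i3][i4].
// n2, n3, n4 are the extents of dimensions 2..4 (d2_nr, d3_nr, d4_nr);
// the extent of dimension 1 is not needed.
constexpr std::size_t flatten4(std::size_t i1, std::size_t i2,
                               std::size_t i3, std::size_t i4,
                               std::size_t n2, std::size_t n3,
                               std::size_t n4) {
    return ((i1 * n2 + i2) * n3 + i3) * n4 + i4;
}

// Hypothetical container for the four recovered indices.
struct Index4 {
    std::size_t i1, i2, i3, i4;
};

// Inverse of flatten4: peel off the fastest-varying dimension first.
inline Index4 unflatten4(std::size_t flat, std::size_t n2,
                         std::size_t n3, std::size_t n4) {
    Index4 r{};
    r.i4 = flat % n4; flat /= n4;
    r.i3 = flat % n3; flat /= n3;
    r.i2 = flat % n2; flat /= n2;
    r.i1 = flat;
    return r;
}

// For a 2 x 3 x 4 x 5 array, the last element [1][2][3][4] sits at offset 119.
static_assert(flatten4(1, 2, 3, 4, 3, 4, 5) == 119, "last element of 2x3x4x5");

// Sanity check: flattening and unflattening should round-trip.
inline void check_round_trip(std::size_t flat, std::size_t n2,
                             std::size_t n3, std::size_t n4) {
    const Index4 r = unflatten4(flat, n2, n3, n4);
    assert(flatten4(r.i1, r.i2, r.i3, r.i4, n2, n3, n4) == flat);
}
```

Inside a kernel, the same expression is typically used in the other direction as well: a thread computes one flat id from its block and thread indices, then recovers the per-dimension indices with the modulo/divide steps shown in `unflatten4`.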
Some Functions #
thrust::sort_by_key #
see official simple example, further: doc & example
Debug Using Parallel-Nsight #
// Print might be more convenient.
ref
single-gpu mode #
multi-gpu mode #
Visual Studio Red Underline #
Visual Studio always flags kernel calls as invalid C++ syntax, though the code compiles anyway. One way to get rid of these false errors is to use the Driver API instead of the Runtime API, so the <<< >>> launch syntax does not appear in the source at all.