CUDA Visual Profiler
산과 나무
2016. 12. 28. 13:53
:: CUDA Visual Profiler
- Introduction
- Running
1. console
c:\>nvprof <executable>
Example:
f:\>~~~~hapter5\ZeroCopy\x64\Release>nvprof ZeroCopy.exe
==10480== NVPROF is profiling process 10480, command: ZeroCopy.exe
==10480== Warning: Unified Memory Profiling is not supported on the current configuration because a pair of devices without peer-to-peer support is detected on this multi-GPU setup. When peer mappings are not available, system falls back to using zero-copy memory. It can cause kernels, which access unified memory, to run slower. More details can be found at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-managed-memory
==10480== Profiling application: ZeroCopy.exe
==10480== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
100.00%  371.14us         1  371.14us  371.14us  371.14us  vectorAdd(int*, int*, int*)

==10480== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 96.65%  134.60ms         3  44.866ms  667.64us  133.19ms  cudaHostAlloc
  1.83%  2.5486ms        91  28.006us       0ns  1.1908ms  cuDeviceGetAttribute
  0.74%  1.0312ms         3  343.74us  338.93us  353.05us  cudaFreeHost
  0.42%  582.31us         1  582.31us  582.31us  582.31us  cuDeviceGetName
  0.31%  437.78us         1  437.78us  437.78us  437.78us  cudaThreadSynchronize
  0.02%  32.751us         1  32.751us  32.751us  32.751us  cudaLaunch
  0.01%  20.732us         1  20.732us  20.732us  20.732us  cuDeviceTotalMem
  0.01%  9.9150us         3  3.3050us     601ns  8.4130us  cudaHostGetDevicePointer
  0.00%  2.7030us         3     901ns     300ns  2.1030us  cuDeviceGetCount
  0.00%  2.4040us         1  2.4040us  2.4040us  2.4040us  cudaConfigureCall
  0.00%  1.5030us         3     501ns     301ns     601ns  cudaSetupArgument
  0.00%     601ns         3     200ns       0ns     301ns  cuDeviceGet
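The source of ZeroCopy.exe is not included in this post. As a rough sketch, a program along the following lines would probably produce a similar call pattern: three cudaHostAlloc calls, three cudaHostGetDevicePointer calls, one vectorAdd launch, one synchronize, and three cudaFreeHost calls. Everything except the kernel name and its three int* parameters is an assumption: the element count N, the block size THREADS, the helper runZeroCopyVectorAdd, and the CHECK macro are all hypothetical. The sketch also uses cudaDeviceSynchronize in place of the deprecated cudaThreadSynchronize seen in the output, and it assumes a 64-bit system with unified addressing, where mapped host memory needs no extra device flag.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 1 << 20;     // element count (assumed; not given in the post)
constexpr int THREADS = 256;   // threads per block; N must be a multiple of this

// Return -1 from the enclosing function if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err));                     \
            return -1;                                            \
        }                                                         \
    } while (0)

// Same name and parameter list as the kernel reported by nvprof.
// There is no length parameter, so the launch must cover exactly N threads.
__global__ void vectorAdd(int* a, int* b, int* c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i];
}

// Hypothetical helper: returns the number of wrong results, or -1 on a CUDA error.
int runZeroCopyVectorAdd()
{
    int *hA = nullptr, *hB = nullptr, *hC = nullptr;   // host pointers
    int *dA = nullptr, *dB = nullptr, *dC = nullptr;   // device aliases of the same memory
    const size_t bytes = N * sizeof(int);

    // Page-locked host memory that is mapped into the device address space.
    // On systems without unified addressing, cudaSetDeviceFlags(cudaDeviceMapHost)
    // may have to be called before any other CUDA call.
    CHECK(cudaHostAlloc((void**)&hA, bytes, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void**)&hB, bytes, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void**)&hC, bytes, cudaHostAllocMapped));

    for (int i = 0; i < N; ++i) {
        hA[i] = i;
        hB[i] = 2 * i;
    }

    // No cudaMemcpy: the kernel accesses host memory through these pointers.
    CHECK(cudaHostGetDevicePointer((void**)&dA, hA, 0));
    CHECK(cudaHostGetDevicePointer((void**)&dB, hB, 0));
    CHECK(cudaHostGetDevicePointer((void**)&dC, hC, 0));

    vectorAdd<<<N / THREADS, THREADS>>>(dA, dB, dC);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());   // results are visible to the host after this

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (hC[i] != hA[i] + hB[i]) ++mismatches;
    }

    CHECK(cudaFreeHost(hA));
    CHECK(cudaFreeHost(hB));
    CHECK(cudaFreeHost(hC));
    return mismatches;
}

int main()
{
    int mismatches = runZeroCopyVectorAdd();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Built with nvcc and run under nvprof as above, the API list will not match the post exactly. The synchronize entry will be named cudaDeviceSynchronize, and newer toolkits report a single cudaLaunchKernel instead of cudaConfigureCall, cudaSetupArgument, and cudaLaunch.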