:: CUDA Visual Profiler
- Introduction
- Running
1. console
c:\>nvprof <executable>
Example:
f:\>~~~~hapter5\ZeroCopy\x64\Release>nvprof ZeroCopy.exe
==10480== NVPROF is profiling process 10480, command: ZeroCopy.exe
==10480== Warning: Unified Memory Profiling is not supported on the current configuration because a pair of devices without peer-to-peer support is detected on this multi-GPU setup. When peer mappings are not available, system falls back to using zero-copy memory. It can cause kernels, which access unified memory, to run slower. More details can be found at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-managed-memory
==10480== Profiling application: ZeroCopy.exe
==10480== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
100.00%  371.14us      1  371.14us  371.14us  371.14us  vectorAdd(int*, int*, int*)

==10480== API calls:
Time(%)      Time  Calls       Avg       Min       Max  Name
 96.65%  134.60ms      3  44.866ms  667.64us  133.19ms  cudaHostAlloc
  1.83%  2.5486ms     91  28.006us       0ns  1.1908ms  cuDeviceGetAttribute
  0.74%  1.0312ms      3  343.74us  338.93us  353.05us  cudaFreeHost
  0.42%  582.31us      1  582.31us  582.31us  582.31us  cuDeviceGetName
  0.31%  437.78us      1  437.78us  437.78us  437.78us  cudaThreadSynchronize
  0.02%  32.751us      1  32.751us  32.751us  32.751us  cudaLaunch
  0.01%  20.732us      1  20.732us  20.732us  20.732us  cuDeviceTotalMem
  0.01%  9.9150us      3  3.3050us     601ns  8.4130us  cudaHostGetDevicePointer
  0.00%  2.7030us      3     901ns     300ns  2.1030us  cuDeviceGetCount
  0.00%  2.4040us      1  2.4040us  2.4040us  2.4040us  cudaConfigureCall
  0.00%  1.5030us      3     501ns     301ns     601ns  cudaSetupArgument
  0.00%     601ns      3     200ns       0ns     301ns  cuDeviceGet
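The summary rows mix units (ns, us, ms), so comparing entries by eye is error-prone. The following is a minimal Python sketch for reading one summary row into comparable numbers. It assumes each row has the seven columns shown in the output above (Time(%), Time, Calls, Avg, Min, Max, Name). The helper names `to_microseconds` and `parse_summary_row` are hypothetical and are not part of nvprof.

```python
import re

# Unit suffixes seen in the nvprof summary columns, mapped to microseconds.
_UNIT_TO_US = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}

# A number followed directly by a unit suffix, e.g. "371.14us" or "0ns".
_TIME_RE = re.compile(r"^([0-9]*\.?[0-9]+)(ns|us|ms|s)$")


def to_microseconds(text):
    """Convert a time string such as '134.60ms' to microseconds (float)."""
    match = _TIME_RE.match(text)
    if match is None:
        raise ValueError("unrecognized time value: %r" % text)
    return float(match.group(1)) * _UNIT_TO_US[match.group(2)]


def parse_summary_row(line):
    """Parse one summary row (hypothetical helper, assumed 7-column layout).

    The Name column may contain spaces, as in
    'vectorAdd(int*, int*, int*)', so only the first six fields are split.
    """
    parts = line.strip().split(None, 6)
    if len(parts) != 7:
        raise ValueError("expected 7 columns, got %d" % len(parts))
    pct, total, calls, avg, lo, hi, name = parts
    return {
        "percent": float(pct.rstrip("%")),
        "time_us": to_microseconds(total),
        "calls": int(calls),
        "avg_us": to_microseconds(avg),
        "min_us": to_microseconds(lo),
        "max_us": to_microseconds(hi),
        "name": name,
    }


if __name__ == "__main__":
    row = parse_summary_row(
        "96.65% 134.60ms 3 44.866ms 667.64us 133.19ms cudaHostAlloc"
    )
    print(row["name"], row["calls"], row["time_us"])
```

With every time in the same unit, it is easier to see that in this run the three `cudaHostAlloc` calls account for most of the API time, while the `vectorAdd` kernel itself takes well under a millisecond.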