:: CUDA Visual Profiler 

  - Introduction

  - Running

    1. console

       c:\>nvprof <executable>
       Example:

f:\>~~~~hapter5\ZeroCopy\x64\Release>nvprof ZeroCopy.exe

==10480== NVPROF is profiling process 10480, command: ZeroCopy.exe
==10480== Warning: Unified Memory Profiling is not supported on the current configuration because a pair of devices without peer-to-peer support is detected on this multi-GPU setup. When peer mappings are not available, system falls back to using zero-copy memory. It can cause kernels, which access unified memory, to run slower. More details can be found at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-managed-memory
==10480== Profiling application: ZeroCopy.exe
==10480== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
100.00%  371.14us         1  371.14us  371.14us  371.14us  vectorAdd(int*, int*, int*)

==10480== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 96.65%  134.60ms         3  44.866ms  667.64us  133.19ms  cudaHostAlloc
  1.83%  2.5486ms        91  28.006us       0ns  1.1908ms  cuDeviceGetAttribute
  0.74%  1.0312ms         3  343.74us  338.93us  353.05us  cudaFreeHost
  0.42%  582.31us         1  582.31us  582.31us  582.31us  cuDeviceGetName
  0.31%  437.78us         1  437.78us  437.78us  437.78us  cudaThreadSynchronize
  0.02%  32.751us         1  32.751us  32.751us  32.751us  cudaLaunch
  0.01%  20.732us         1  20.732us  20.732us  20.732us  cuDeviceTotalMem
  0.01%  9.9150us         3  3.3050us     601ns  8.4130us  cudaHostGetDevicePointer
  0.00%  2.7030us         3     901ns     300ns  2.1030us  cuDeviceGetCount
  0.00%  2.4040us         1  2.4040us  2.4040us  2.4040us  cudaConfigureCall
  0.00%  1.5030us         3     501ns     301ns     601ns  cudaSetupArgument
  0.00%     601ns         3     200ns       0ns     301ns  cuDeviceGet      
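
A minimal sketch for reading the summary tables above, assuming every row has the seven whitespace-separated columns shown (Time(%), Time, Calls, Avg, Min, Max, Name). The helper names to_seconds and parse_row are hypothetical and not part of nvprof; the sample rows are copied from the output above. It checks that the Avg column is simply Time divided by Calls, up to rounding in the printed values.

```python
import re

# Unit suffixes seen in the nvprof summary columns, converted to seconds.
UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# Rows copied from the profiling output above (two API rows, one kernel row).
SAMPLE = """\
 96.65%  134.60ms         3  44.866ms  667.64us  133.19ms  cudaHostAlloc
  1.83%  2.5486ms        91  28.006us       0ns  1.1908ms  cuDeviceGetAttribute
100.00%  371.14us         1  371.14us  371.14us  371.14us  vectorAdd(int*, int*, int*)
"""


def to_seconds(field):
    """Convert a duration such as '371.14us' or '0ns' to seconds."""
    m = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", field)
    if m is None:
        raise ValueError(f"unrecognized duration: {field!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def parse_row(line):
    """Split one summary row; the Name column may itself contain spaces."""
    pct, total, calls, avg, lo, hi, name = line.split(None, 6)
    return {
        "percent": float(pct.rstrip("%")),
        "total": to_seconds(total),
        "calls": int(calls),
        "avg": to_seconds(avg),
        "min": to_seconds(lo),
        "max": to_seconds(hi),
        "name": name.strip(),
    }


if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        if not line.strip():
            continue
        row = parse_row(line)
        recomputed = row["total"] / row["calls"]
        # Printed values are rounded, so compare the two columns by eye.
        print(f'{row["name"]}: avg={row["avg"]:.6e}s, time/calls={recomputed:.6e}s')
```

Read this way, the API table shows that cudaHostAlloc dominates the host-side time (134.60ms over 3 calls, with a Max of 133.19ms for a single call), while the vectorAdd kernel itself takes 371.14us.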



