CUDA Tutorials I Profiling and Debugging Applications - NVIDIA Developer
(2024-01-20)
Source video: GPU L16: Support: cuda-gdb - YouTube - HPC Education (Rupesh Nasre 2021)
- cuda-gdb is an extension of gdb that debugs on real hardware (not a simulator). Unlike Nsight, which has a GUI, cuda-gdb is a command-line tool. Regrettably, cuda-gdb doesn’t have a TUI.
Capture Last Error
Build: `nvcc test_cuda-gdb.cu`. Run: `./a.out`
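- Nothing is printed out, although 0 is supposed to show. No error is reported either: kernel launches are asynchronous, and a fault on the GPU (e.g., an illegal memory access, the GPU counterpart of a segfault) is not raised on the CPU unless the host checks the status returned by the CUDA runtime.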
- To identify whether the error occurred on the GPU, call `cudaGetLastError()`:

  ```
  yi@yi-Alien:~/Downloads/CUDA_Study/Debug_CUDA$ ./a.out
  err=700, cudaErrorIllegalAddress, an illegal memory access was encountered
  ```

- `x` requires GPU memory to be allocated:

  ```
  int main() {
      int* x;
      cudaMalloc( (void**)&x, 1*sizeof(int) );   // allocate device memory for x
      kernel<<<2,2>>>(x);
      cudaDeviceSynchronize();                   // wait for the kernel so its errors surface
      cudaFree(x);
      cudaError_t err = cudaGetLastError();      // read the last error recorded by the runtime
      printf("err=%d, %s, %s\n", err, cudaGetErrorName(err), cudaGetErrorString(err) );
      return 0;
  }
  ```

  Output:

  ```
  yi@yi-Alien:~/Downloads/CUDA_Study/Debug_CUDA$ ./a.out
  0
  0
  0
  0
  err=0, cudaSuccess, no error
  ```
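The check in `main` above can be generalized so that every runtime call and every kernel launch is verified. The following is a minimal sketch, not the code from the video: the `CUDA_CHECK` macro and the kernel body are hypothetical, introduced here only for illustration. Because kernel launches are asynchronous, the sketch checks `cudaGetLastError()` right after the launch (launch-time errors) and the return value of `cudaDeviceSynchronize()` (errors raised while the kernel runs).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the video): abort with a readable message
// whenever a CUDA runtime call returns anything other than cudaSuccess.
#define CUDA_CHECK(call) \
    do { \
        cudaError_t err_ = (call); \
        if (err_ != cudaSuccess) { \
            fprintf(stderr, "%s:%d: err=%d, %s, %s\n", \
                    __FILE__, __LINE__, (int)err_, \
                    cudaGetErrorName(err_), cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE); \
        } \
    } while (0)

// Assumed kernel body: every thread writes 0 to *x and prints it back.
__global__ void kernel(int *x) {
    *x = 0;
    printf("%d\n", *x);
}

int main() {
    int *x = nullptr;
    CUDA_CHECK(cudaMalloc((void **)&x, 1 * sizeof(int)));

    kernel<<<2, 2>>>(x);
    CUDA_CHECK(cudaGetLastError());       // launch-time errors (e.g., bad configuration)
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaFree(x));
    return 0;
}
```

If the `cudaMalloc` line is removed, the kernel dereferences an invalid pointer, and the `cudaDeviceSynchronize()` check is the one expected to fire.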
cudaError
Homework: Write programs to invoke these errors.
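As one possible starting point for the homework, the sketch below provokes a launch-time error rather than an illegal memory access. It asks for one more thread per block than the device allows, with the limit queried at run time. The `empty` kernel and the variable names are made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that does nothing; only the launch matters here.
__global__ void empty() {}

int main() {
    // Query the per-block thread limit of device 0.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Request one thread more than the device supports per block.
    int tooMany = prop.maxThreadsPerBlock + 1;
    empty<<<1, tooMany>>>();

    // The launch is rejected before any thread runs, so the error
    // is already available without synchronizing.
    cudaError_t err = cudaGetLastError();
    printf("err=%d, %s, %s\n", (int)err,
           cudaGetErrorName(err), cudaGetErrorString(err));
    return 0;
}
```

This is expected to report `cudaErrorInvalidConfiguration`. Unlike the illegal-address case, the error is detected at launch, so no `cudaDeviceSynchronize()` is needed to observe it.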
CUDA-GDB CLI
Set flags to include symbol information (variable names, function names) in the binary file:
- Variable and function names exist only for the programmer; the machine executes by memory address. By default, the compiler therefore discards symbols after compilation to keep the binary efficient.
- `-g` is for `__host__` functions, compiled by gcc.
- `-G` is for `__device__` functions, compiled by nvcc.
- Disable optimizations (which would otherwise remove unused code) so that the program can be debugged line by line.
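To make the split between the two flags concrete, here is a small sketch with one host function and one device function. The file name `flags_demo.cu` and all function names are assumptions for illustration; the build line in the header comment uses the `-g` and `-G` options of nvcc.

```cuda
// Build with debug info for host and device code (assumed file name):
//   nvcc -g -G flags_demo.cu -o flags_demo
// Debug:
//   cuda-gdb ./flags_demo
#include <cstdio>
#include <cuda_runtime.h>

// Device code: its debug information comes from -G.
__device__ int square(int v) { return v * v; }

__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = square(i);
}

// Host code: its debug information comes from -g.
__host__ void report(const int *h, int n) {
    for (int i = 0; i < n; ++i) printf("%d\n", h[i]);
}

int main() {
    const int n = 4;
    int h[n];
    int *d = nullptr;
    cudaMalloc((void **)&d, n * sizeof(int));
    fill<<<2, 2>>>(d, n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    report(h, n);
    return 0;
}
```

With a binary built this way, breakpoints can be set by name inside cuda-gdb, for example `break fill` for the kernel or `break report` for the host function.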
Debugging with cuda-gdb:
Given the erroneous code:
Build: `nvcc test_cuda-gdb.cu`. Debug: `cuda-gdb a.out`.
run
LWP: Light weight process
Switching focus to a specific thread
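A session that switches focus might look like the sketch below. The program is a hypothetical stand-in with a deliberate bug, and the cuda-gdb commands are listed as comments; the block and thread indices are arbitrary choices for a `<<<2,2>>>` launch.

```cuda
// Build:  nvcc -g -G focus_demo.cu -o focus_demo   (assumed file name)
// Debug:  cuda-gdb ./focus_demo
//
// Possible commands once inside cuda-gdb:
//   (cuda-gdb) run                    start the program; it stops at the fault
//   (cuda-gdb) info cuda threads      list the device threads of the kernel
//   (cuda-gdb) cuda block 1 thread 1  switch focus to that device thread
//   (cuda-gdb) print threadIdx        inspect a built-in variable in the focus
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel(int *x) {
    *x = threadIdx.x;  // faults: x does not point to device memory
}

int main() {
    int *x = nullptr;  // deliberately never allocated with cudaMalloc
    kernel<<<2, 2>>>(x);
    cudaDeviceSynchronize();
    return 0;
}
```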