memo: CUDA | Debug with CUDA-GDB


CUDA Tutorials I Profiling and Debugging Applications - NVIDIA Developer


(2024-01-20)

Source video: GPU L16: Support: cuda-gdb - YouTube - HPC Education (Rupesh Nasre 2021)

  • cuda-gdb is an extension of gdb that debugs code running on real GPU hardware (not a simulator). Unlike Nsight, which has a GUI, cuda-gdb is a command-line tool. Regrettably, cuda-gdb doesn’t have a TUI.

Capture Last Error

// Filename: test_cuda-gdb.cu
#include <cuda_runtime.h>   // to synchronize
#include <cstdio>

__global__ void kernel(int* x) {
    *x = 0;
    printf("%d\n", *x);
}

int main() {
    int* x;
    kernel<<<2, 10>>>(x);
    cudaDeviceSynchronize();

    // Capture error
    cudaError_t err = cudaGetLastError();
    printf("err=%d, %s, %s\n", err, cudaGetErrorName(err),
                               cudaGetErrorString(err) );
    return 0;
}

Build: nvcc test_cuda-gdb.cu. Execution: ./a.out

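  • Without the error check, nothing is printed, although 0 is expected.

    No error is reported either: a fault on the GPU (e.g., an illegal memory access) does not raise a signal such as a segmentation fault on the host, so the CPU side stays unaware of it unless it queries the CUDA runtime.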

  • To find out whether an error occurred on the GPU, call cudaGetLastError() after synchronizing:

    
    yi@yi-Alien:~/Downloads/CUDA_Study/Debug_CUDA$ ./a.out 
    err=700, cudaErrorIllegalAddress, an illegal memory access was encountered
    
  • The fix: x must point to memory allocated on the GPU:

    
    int main() {
        int* x;
        cudaMalloc( (void**)&x, 1*sizeof(int) );
        kernel<<<2,2>>>(x);
        cudaDeviceSynchronize();
        cudaFree(x);
        cudaError_t err = cudaGetLastError();
        printf("err=%d, %s, %s\n", err, cudaGetErrorName(err),
                                   cudaGetErrorString(err) );
        return 0;
    }
    
    Output
    
    yi@yi-Alien:~/Downloads/CUDA_Study/Debug_CUDA$ ./a.out 
    0
    0
    0
    0
    err=0, cudaSuccess, no error
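
The manual check above can be wrapped in a small helper so that every runtime call is checked. This is a minimal sketch, not something from the source video: CUDA_CHECK is a hypothetical macro name (it is not part of the CUDA runtime), and exiting on the first error is just one possible policy.

```cuda
// Sketch only: CUDA_CHECK is a hypothetical helper, not a CUDA runtime API.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Evaluate a runtime call once; on failure, print where it failed and exit.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s (%s)\n", __FILE__, __LINE__,    \
                    cudaGetErrorName(err_), cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

__global__ void kernel(int* x) {
    *x = 0;
    printf("%d\n", *x);
}

int main() {
    int* x = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&x, sizeof(int)));
    kernel<<<2, 2>>>(x);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs
    CUDA_CHECK(cudaFree(x));
    return 0;
}
```

Checking cudaGetLastError() right after the launch and the return value of cudaDeviceSynchronize() separately helps tell a rejected launch apart from a fault inside the kernel.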
    

cudaError

Homework: Write programs that trigger the different cudaError_t error codes.
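
As one possible starting point for this exercise, the sketch below provokes a launch-time error instead of a memory fault. It assumes a GPU whose per-block limit is 1024 threads (true of current hardware, but worth verifying with cudaGetDeviceProperties); the kernel name noop is made up for illustration.

```cuda
// Sketch only: provoke a launch error by requesting too many threads per block.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void noop() {}  // hypothetical empty kernel

int main() {
    // 2048 threads per block exceeds the assumed 1024-thread limit,
    // so the launch is rejected and the kernel never runs.
    noop<<<1, 2048>>>();

    // A rejected launch is visible immediately; no synchronization is needed.
    cudaError_t err = cudaGetLastError();
    printf("err=%d, %s, %s\n", err, cudaGetErrorName(err),
                               cudaGetErrorString(err));
    return 0;
}
```

On such a device the reported error name should be cudaErrorInvalidConfiguration.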



CUDA-GDB CLI

Set flags to include the symbol information (variable name, function name) into the binary file:

  • Variable and function names matter only to the programmer; the machine code works with memory addresses. By default, the compiler therefore leaves debug symbols out of the binary to keep it small.
nvcc -g -G main.cu
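  • -g generates debug information for host (__host__) code, which nvcc passes to the host compiler (e.g., gcc).

  • -G generates debug information for device (__device__ and __global__) code, which nvcc compiles itself.

  • -G also disables most device-code optimizations (so unused code is not removed, for example), which makes line-by-line debugging possible.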

Debugging with cuda-gdb:

cuda-gdb a.out

Given the erroneous code:

#include <cuda_runtime.h>   // cudaDeviceSynchronize
#include <cstdio>           // printf (host and device)
__global__ void kernel(int* x) {
    *x = 0;
    printf("%d\n", *x);
}

int main() {
    int* x;
    kernel<<<2, 2>>>(x);
    cudaDeviceSynchronize();
    return 0;
}

Build: nvcc test_cuda-gdb.cu (add -g -G to get source-level information in the debugger). Debug: cuda-gdb a.out.

run

(cuda-gdb) run
Starting program: /home/yi/Downloads/CUDA_Study/Debug_CUDA/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5d9b000 (LWP 2434197)]
[New Thread 0x7ffff4ab1000 (LWP 2434198)]
[Detaching after fork from child process 2434199]
[New Thread 0x7fffeef3d000 (LWP 2434215)]
[New Thread 0x7fffed533000 (LWP 2434216)]

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x100002ede48

Thread 1 "a.out" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x00000100002ede78 in kernel(int*)<<<(2,1,1),(2,1,1)>>> ()
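  • LWP: lightweight process, the Linux term for a kernel-level thread; the [New Thread ...] lines list host threads.
  • Switching focus: cuda-gdb moves the debugger focus to the device thread that raised the exception (here block (0,0,0), thread (0,0,0)).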

info cuda kernels


Intro to GPU: 06 Debugging on GPU - YouTube - NERSC
