memo: CUDA | Debug with CUDA-GDB

CUDA Tutorials I Profiling and Debugging Applications - NVIDIA Developer

(2024-01-20)

Source video: GPU L16: Support: cuda-gdb - YouTube - HPC Education (Rupesh Nasre 2021)

CUDA-GDB is an extension of gdb that debugs code running on real GPU hardware (not a simulator). Unlike Nsight, which has a GUI, CUDA-GDB is a command-line tool. Regrettably, cuda-gdb doesn’t have a TUI.

Capture Last Error

// Filename: test_cuda-gdb.cu
#include <cuda_runtime.h>   // to synchronize
#include <cstdio>

__global__ void kernel(int* x) {
    *x = 0;
    printf("%d\n", *x);
}

int main() {
    int* x;   // deliberately never allocated with cudaMalloc (this is the bug)
    kernel<<<2, 10>>>(x);
    cudaDeviceSynchronize();

    // Capture error
    cudaError_t err = cudaGetLastError();
    printf("err=%d, %s, %s\n", err, cudaGetErrorName(err),
                               cudaGetErrorString(err) );
    return 0;
}

Build: nvcc test_cuda-gdb.cu. Execution: ./a.out

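Without the error check, nothing is printed, although 0 is expected.

No error is reported either: kernel launches are asynchronous, and a fault on the GPU (e.g., an illegal memory access, the GPU counterpart of a segfault) is not raised on the CPU. The host only learns about it by asking.

To find out whether an error occurred on the GPU, call cudaGetLastError() after synchronizing: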

yi@yi-Alien:~/Downloads/CUDA_Study/Debug_CUDA$ ./a.out 
err=700, cudaErrorIllegalAddress, an illegal memory access was encountered

The fix: x must point to allocated GPU memory:

int main() {
    int* x;
    cudaMalloc( (void**)&x, 1*sizeof(int) );
    kernel<<<2,2>>>(x);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaError_t err = cudaGetLastError();
    printf("err=%d, %s, %s\n", err, cudaGetErrorName(err),
                               cudaGetErrorString(err) );
    return 0;
}

Output

yi@yi-Alien:~/Downloads/CUDA_Study/Debug_CUDA$ ./a.out 
0
0
0
0
err=0, cudaSuccess, no error
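
The manual check above can be wrapped in a helper so that every runtime call is verified. The following is a minimal sketch, not the method from the video: CUDA_CHECK is a hypothetical macro name, not part of the CUDA runtime. It also separates the two places an error can surface: cudaGetLastError() right after the launch reports launch problems, while the return value of cudaDeviceSynchronize() reports faults raised while the kernel was running.

```cuda
// Sketch only: CUDA_CHECK is a hypothetical helper, not a CUDA runtime API.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Evaluate a runtime call; on failure print where it happened and exit.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s failed: %s (%s)\n", __FILE__,     \
                    __LINE__, #call, cudaGetErrorName(err_),             \
                    cudaGetErrorString(err_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

__global__ void kernel(int* x) {
    *x = 0;
    printf("%d\n", *x);
}

int main() {
    int* x = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&x, sizeof(int)));

    kernel<<<2, 2>>>(x);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(x));
    return 0;
}
```

Removing the cudaMalloc line should make the cudaDeviceSynchronize() check fail with the illegal-address error shown earlier, with the file and line number attached.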

cudaError

Homework: Write programs that trigger the different cudaError_t errors.
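
As one possible starting point for the homework, here is a sketch that provokes a launch error rather than a memory fault. It asks for one more thread per block than the device allows, which the runtime should reject with cudaErrorInvalidConfiguration. The kernel name noop and the use of device 0 are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Empty kernel: the launch is expected to fail before any thread runs.
__global__ void noop() {}

int main() {
    // Query the per-block thread limit of device 0 (assumed to be the GPU in use).
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int tooMany = prop.maxThreadsPerBlock + 1;

    // Launch with an invalid block size.
    noop<<<1, tooMany>>>();

    // Launch errors are reported immediately; no synchronization is needed.
    cudaError_t err = cudaGetLastError();
    printf("err=%d, %s, %s\n", err, cudaGetErrorName(err),
                               cudaGetErrorString(err));
    return 0;
}
```

Unlike the illegal-address case, this error does not corrupt the CUDA context, so the program could continue with a valid launch afterwards.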

CUDA-GDB CLI

Names of variables and functions matter only to the programmer; the machine executes by memory address. By default, this symbol information is therefore left out of the compiled binary.

To keep the symbol information (variable names, function names) in the binary, pass the debug flags:

nvcc -g -G main.cu

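-g emits debug information for host code, which nvcc hands to the host compiler (e.g., gcc).
-G emits debug information for device code (__global__ and __device__ functions), which nvcc compiles itself.
-G also turns off device-code optimizations (so nothing is removed or reordered), which makes line-by-line debugging possible.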
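
To try the flags on something that actually runs to completion, a small kernel with a visible intermediate value is convenient. This is a sketch under assumptions: the file name step_demo.cu, the kernel name scale, and the variable names are all made up for illustration.

```cuda
// Hypothetical file: step_demo.cu. Build with: nvcc -g -G step_demo.cu
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(int* data, int n, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        int tmp = data[i] * factor;  // intermediate an optimized build may fold away
        data[i] = tmp;
    }
}

int main() {
    const int n = 8;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    // Copy the input to the GPU, scale it there, and copy it back.
    int* dev = nullptr;
    cudaMalloc((void**)&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    scale<<<2, 4>>>(dev, n, 3);
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i) printf("%d ", host[i]);  // prints 0 3 6 9 12 15 18 21
    printf("\n");
    return 0;
}
```

With the -g -G build loaded in cuda-gdb, the usual gdb commands (break scale, run, next, print tmp) should let you stop inside the kernel and inspect tmp for the thread in focus.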

Debugging with cuda-gdb:

cuda-gdb a.out

Given the erroneous code:

#include <cuda_runtime.h>   // runtime API: cudaDeviceSynchronize
#include <cstdio>           // printf
__global__ void kernel(int* x) {
    *x = 0;
    printf("%d\n", *x);
}

int main() {
    int* x;
    kernel<<<2, 2>>>(x);
    cudaDeviceSynchronize();
    return 0;
}

Build: nvcc test_cuda-gdb.cu (add -g -G to see source lines in the debugger). Debug: cuda-gdb a.out.

run

(cuda-gdb) run
Starting program: /home/yi/Downloads/CUDA_Study/Debug_CUDA/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5d9b000 (LWP 2434197)]
[New Thread 0x7ffff4ab1000 (LWP 2434198)]
[Detaching after fork from child process 2434199]
[New Thread 0x7fffeef3d000 (LWP 2434215)]
[New Thread 0x7fffed533000 (LWP 2434216)]

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x100002ede48

Thread 1 "a.out" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x00000100002ede78 in kernel(int*)<<<(2,1,1),(2,1,1)>>> ()

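LWP: lightweight process, the Linux kernel's term for a host thread.
"Switching focus": after the exception, cuda-gdb moves its focus to a specific GPU thread, here block (0,0,0), thread (0,0,0).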

info cuda kernels

Intro to GPU: 06 Debugging on GPU - YouTube - NERSC
