memo: CUDA | Debugging

Docs of Nsight: Getting Started with the CUDA Debugger

Debug Demo

(2023-11-03)

Environment: Ubuntu 20.04, cuda-11.6 (in /usr/local/cuda-11.6), GPU 1050Ti.

nvcc

(base) yi@yi-Alienware:~/Downloads/CUDA_Study/Debug_CUDA$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

Prerequisite:

Install 2 extensions: Nsight and C/C++
Create 2 debugging configuration files: launch.json and tasks.json under “.vscode/”
- Select debugger: CUDA C++ (CUDA-GDB)

Example with nvcc

Testing repo

“test.cu”:

#include <iostream>

int main(int argc, char **argv)
{
    std::cout << "Number of input arguments: " << argc << std::endl;

    for (int i = 0; i < argc; i++) {
        std::cout << argv[i] << "\n";
    }
    return 0;
}

“launch.json” for debug configurations:

Set program to the output binary that is to be debugged:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${fileDirname}/test.bin", // binary file
            "preLaunchTask": "mynvcc"
        },

        // no need to change this:
        {
            "name": "CUDA C++: Attach",
            "type": "cuda-gdb",
            "request": "attach"
        }
    ]
}

“tasks.json” for building configurations:

{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "mynvcc",
            "type": "shell",
            "command": "nvcc",
            "args": [
                 "${file}",
                 "-g","-G",
                 "-o","${fileDirname}/test.bin",
            ]
        }
    ]
}

Error: /usr/bin/bash: nvcc: command not found

The command needs to be on PATH before the startup scripts run. (tasks - VSCode - Docs)

Adding export PATH="$PATH:/usr/local/cuda-12.3/bin" to “/etc/environment”, “/etc/profile”, “/etc/xprofile”, or “/etc/bash.bashrc” did not work in any of these files. (How to permanently set $PATH on Linux/Unix - SO)

Solution: make the integrated terminal start bash as an interactive shell (-i) in the user settings.json, which makes bash read ~/.bashrc (my VSCode version: 1.83.1, 2023-11-03)

"terminal.integrated.profiles.linux": {
    "bash": {
        "path": "bash",
        "args": [
            "-i"
        ]
    }
},
"terminal.integrated.defaultProfile.linux": "bash",

Ref: VSCode tasks error: /bin/bash: npm: command not found (Found by DDG with searching “vscode debug tasks.json /usr/bin/bash: nvcc: command not found”)
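
If the fix works because an interactive bash reads ~/.bashrc, then the PATH entry itself can live there. A minimal sketch of such a snippet, assuming the toolkit's bin directory is /usr/local/cuda-11.6/bin (taken from the Environment line above); prepend_path is a hypothetical helper name, not something VSCode or CUDA provides.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there,
# so sourcing ~/.bashrc repeatedly does not grow PATH.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumed install location (from the Environment line in these notes).
prepend_path /usr/local/cuda-11.6/bin
export PATH

# Report whether nvcc can be resolved from this shell.
if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at: $(command -v nvcc)"
else
    echo "nvcc still not on PATH"
fi
```

Running the same check from a VSCode task is a quick way to confirm that the task shell sees the same PATH as the terminal.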

Then, with “test.cu” open in the editor, click the start button to begin debugging.

(2024-01-26) I still don’t know how to include the libtorch headers on the nvcc command line, so I didn’t manage to compile the 3DGS project with nvcc as above.

Potentially useful:
- An example: Include path problems for GPU library - SO
- Docs of nvcc: NVIDIA CUDA Compiler Driver NVCC

(2024-01-27)

Based on “Troubles while compiling C++ program with PyTorch, HElib and OpenCV” (SO), and prompted by Perplexity: with GCC, -Wl,<option> passes an option through to the linker, and -rpath is the linker option that sets the runtime library search path. The nvcc counterpart for passing an option to the linker is -Xlinker.

# Compile CUDA source files into object files with nvcc
nvcc --compile -g -G -std=c++17 \
-I/usr/local/libtorch/include \
-I/usr/local/libtorch/include/torch/csrc/api/include \
-I/usr/local/libtorch/include/torch \
main_copy.cu \
-o main.o

# Link object files into an executable with g++
g++ main.o \
-L/usr/local/libtorch/lib \
-L/usr/local/cuda/lib64 \
-Wl,-rpath,/usr/local/libtorch/lib \
-ltorch -ltorch_cpu -lc10 -lcudart \
-o my_executable

Compiling succeeds, but linking reports an error:

/usr/bin/ld: main.o: in function `main':
/home/yi/Downloads/debug_diff_rast/main_copy.cu:31: undefined reference to `RasterizeGaussiansCUDA(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, float, at::Tensor const&, at::Tensor const&, at::Tensor const&, float, float, int, int, at::Tensor const&, int, at::Tensor const&, bool, bool)'
collect2: error: ld returned 1 exit status

Not successful yet. The library that defines RasterizeGaussiansCUDA also needs to be compiled with nvcc and linked in.

Example with Makefile

Nvidia Tutorial Clip: Debugging CUDA kernels with VS Code
Microsoft VS CUDA Support in Visual Studio Code with Julia Reid

(2024-01-26)

Use cmake to generate the Makefile first; otherwise make fails with: make: *** No targets specified and no makefile found. Stop.
Then edit launch.json and tasks.json following this article: Getting Started with the CUDA Debugger :: NVIDIA Nsight VSCE Documentation

Refer to my repo for debugging 3DGS.

Example with CMake

Debugging CUDA kernels with VS Code

Ref

Source article: CUDA Extras | CUDA environment for Visual Studio Code - article by Master KangKang - Zhihu
Adapted demo: Remote debugging of Linux CUDA programs with VSCode - oushaojun2 - CSDN

Debug Cuda Samples

(2023-11-02)

Download sample project: NVIDIA/cuda-samples for 12.3

git clone https://github.com/NVIDIA/cuda-samples.git

Building the 12.3 samples failed with CUDA 11.6:

cd ./cuda-samples
make dbg=1

Error: /usr/bin/ld: cannot find -lglut

Need: sudo apt-get install freeglut3 freeglut3-dev

Error:

/usr/bin/ld: simpleCUFFT_callback.o: in function `main':
/home/yi/Downloads/cuda-samples/Samples/4_CUDA_Libraries/simpleCUFFT_callback/simpleCUFFT_callback.cu:103: undefined reference to `cudaGetDeviceProperties_v2'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:373: simpleCUFFT_callback] Error 1
make[1]: Leaving directory '/home/yi/Downloads/cuda-samples/Samples/4_CUDA_Libraries/simpleCUFFT_callback'
make: *** [Makefile:45: Samples/4_CUDA_Libraries/simpleCUFFT_callback/Makefile.ph_build] Error 2

cudaGetDeviceProperties_v2 does not exist in CUDA 11.x, but it appears in CUDA 12.2. (SO)

cuda-sample-11.6

Download zip: Release pkg 11.6; Git tag-11.6
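
The failures above came from building samples of one release against a toolkit of another. One way to avoid the mismatch is to read the release number from nvcc -V and pick the samples with the same number. A rough sketch: cuda_release is a hypothetical helper, and the v<release> tag naming in the comment is an assumption that should be checked against the repository's tag list.

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print the release, e.g. "11.6".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Example using the version line recorded earlier in these notes.
ver=$(echo "Cuda compilation tools, release 11.6, V11.6.124" | cuda_release)
echo "toolkit release: $ver"

# With a real toolkit and a cloned repo (the tag name v$ver is an assumption):
#   ver=$(nvcc -V | cuda_release)
#   git -C cuda-samples checkout "v$ver"
```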

make

(base) yi@yi-Alienware-Aurora-R8:~/Downloads/cuda-samples-11.6$ make dbg=1
make[1]: Entering directory '/home/yi/Downloads/cuda-samples-11.6/Samples/3_CUDA_Features/ptxjit'
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../../Common  -m64 -g -G    --threads 0 --std=c++11 -gencode arch=compute_35,code=compute_35 -o ptxjit.o -c ptxjit.cpp
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make[1]: *** [Makefile:396: ptxjit.o] Error 1
make[1]: Leaving directory '/home/yi/Downloads/cuda-samples-11.6/Samples/3_CUDA_Features/ptxjit'
make: *** [Makefile:45: Samples/3_CUDA_Features/ptxjit/Makefile.ph_build] Error 2

Devices with compute capability (cc) 3.x have been dropped by CUDA 12.x. The solution is to remove the compute_35 requests from the Makefiles. (Forum Nv)
A CUDA Toolkit also works with devices whose cc is lower than the highest one it supports; CUDA 11.x supports a maximum cc of 8.x. (CSDN)
(2023-11-02) Remove cc 35 and 37 (SO):
1. Replace every occurrence of SMS ?= 35 37 with SMS ?= using VSCode's search and replace.
2. Replace every occurrence of compute_35 with compute_61.
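
The two replacements above can also be scripted instead of done through the editor. A minimal sketch, assuming GNU sed (as shipped with Ubuntu) and that every affected file is named Makefile; patch_sm35 is a hypothetical helper name.

```shell
# Hypothetical helper: apply both replacements to every Makefile below a directory.
patch_sm35() {
    find "$1" -type f -name Makefile -exec sed -i \
        -e 's/SMS ?= 35 37/SMS ?=/' \
        -e 's/compute_35/compute_61/g' {} +
}

# Usage (directory taken from the notes above):
#   patch_sm35 ~/Downloads/cuda-samples-11.6
```

In sed's basic regular expressions the ? is a literal character, so the pattern matches the text exactly as it appears in the Makefiles.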

Make failed:

/usr/local/cuda/bin/nvcc -ccbin g++ -I../../../Common  -m64 -g -G    --std=c++11 --threads 0 
-gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 
-gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 
-gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 
-o reduction_kernel.o -c reduction_kernel.cu
reduction_kernel.cu(558): error: name followed by "::" must be a class or namespace name
    __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                                ^

reduction_kernel.cu(558): error: expected an identifier
    __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                                                                ^

reduction_kernel.cu(558): warning #1835-D: attribute "__shared__" does not apply here
    __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                   ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

reduction_kernel.cu(558): error: expected a ";"
    __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                                                                ^

reduction_kernel.cu(561): error: name followed by "::" must be a class or namespace name
    auto cta = cg::experimental::this_thread_block(scratch);
                   ^

reduction_kernel.cu(561): error: identifier "scratch" is undefined
    auto cta = cg::experimental::this_thread_block(scratch);
                                                   ^

reduction_kernel.cu(563): error: name followed by "::" must be a class or namespace name
    auto multiWarpTile = cg::experimental::tiled_partition<MultiWarpGroupSize>(cta);
                             ^

6 errors detected in the compilation of "reduction_kernel.cu".
make[1]: *** [Makefile:358: reduction_kernel.o] Error 255
make[1]: Leaving directory '/home/yi/Downloads/cuda-samples-11.6/Samples/2_Concepts_and_Techniques/reduction'
make: *** [Makefile:45: Samples/2_Concepts_and_Techniques/reduction/Makefile.ph_build] Error 2

VSCode underlines the include in red because it cannot find the header:

#include errors detected. Please update your includePath. Squiggles are disabled for this translation unit 
(/home/yi/Downloads/cuda-samples-11.6/Samples/2_Concepts_and_Techniques/reduction/reduction_kernel.cu).C/C++(1696)
cannot open source file "cooperative_groups/reduce.h"C/C++(1696)

Edit “c_cpp_properties.json” as:

"includePath": [
    "${workspaceFolder}/**",
    "/usr/local/cuda-11.6/include"
],

This didn’t solve it, although the header is there and can be found.

(2023-11-03) Intellisense errors for CUDA syntax.

Make sure the language mode is set to “CUDA C++” rather than “C++”. (VSCode Nsight Intellisense not detecting functions and datatypes for *cu; *cpp works - NV forum)

The red underlines disappeared after the C/C++ extension was disabled and only the “Nsight” extension was left. But every tutorial installs both extensions.
(2023-11-03)~~No error in a folder that contains only .cu files. So Python code can’t co-exist with CUDA code?~~
(2023-11-08) ✅ Referring to the “.vscode/c_cpp_properties.json” in the CUDA sample “Samples/0_Introduction/matrixMul”, the compiler should be nvcc, not “/usr/bin/gcc”:

"compilerPath": "/usr/local/cuda/bin/nvcc",
Getting Started with the CUDA Debugger :: NVIDIA Nsight VSCE Documentation

The same reduction error is reported in issue#201, but that user was on CUDA 11.8.
I re-installed CUDA Toolkit 11.8 and tested the 11.8 samples.
- Note: the final line of the install script provided on the official site should be: sudo apt-get -y install cuda-11-8 instead of sudo apt-get -y install cuda
- The error persists at reduction_kernel.cu.

Build with CMake

(2023-11-17)

The library “diff-gaussian-rasterization-comments/cuda_rasterizer” is built according to CXX standard.

However, I cannot step into the CUDA kernels when debugging.

Maybe building with nvcc would enable debugging.

Create a CMakeLists.txt using nvcc?
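
One possible shape for such a file, written out from the shell so it can be tried in a scratch directory. This is a sketch under assumptions, not the project's actual build: the project name debug_demo and the source file main.cu are hypothetical, CMake 3.18 or newer is assumed (for CMAKE_CUDA_ARCHITECTURES), and 61 is the compute capability of the GTX 1050 Ti. Declaring CUDA as a project language makes CMake compile .cu files with nvcc, and -G is added only for CUDA sources in Debug builds.

```shell
# Write a minimal CMakeLists.txt into a scratch directory
# (set PROJ_DIR to use an existing directory instead).
proj_dir="${PROJ_DIR:-$(mktemp -d)}"
cat > "$proj_dir/CMakeLists.txt" <<'EOF'
cmake_minimum_required(VERSION 3.18)
# Hypothetical project name; CUDA as a language makes CMake drive nvcc for .cu files.
project(debug_demo LANGUAGES CXX CUDA)

# Compute capability 6.1 (GTX 1050 Ti).
set(CMAKE_CUDA_ARCHITECTURES 61)

# Hypothetical source file.
add_executable(debug_demo main.cu)

# -G emits device debug info; apply it only to CUDA sources in Debug builds.
target_compile_options(debug_demo PRIVATE
    $<$<AND:$<CONFIG:Debug>,$<COMPILE_LANGUAGE:CUDA>>:-G>)
EOF
echo "wrote $proj_dir/CMakeLists.txt"

# Then, on a machine with CMake and the CUDA toolkit installed:
#   cmake -S "$proj_dir" -B "$proj_dir/build" -DCMAKE_BUILD_TYPE=Debug
#   cmake --build "$proj_dir/build"
```

Whether this is enough to step into the rasterizer kernels is untested here; it only shows where the device debug flag would go.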