memo: CUDA | Debugging

Nsight docs: Getting Started with the CUDA Debugger


Debug Demo

(2023-11-03)

Environment: Ubuntu 20.04, CUDA 11.6 (in /usr/local/cuda-11.6), GPU: GTX 1050 Ti.

  • nvcc

    (base) yi@yi-Alienware:~/Downloads/CUDA_Study/Debug_CUDA$ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Tue_Mar__8_18:18:20_PST_2022
    Cuda compilation tools, release 11.6, V11.6.124
    Build cuda_11.6.r11.6/compiler.31057947_0
    

Prerequisites:

  1. Install two VS Code extensions: Nsight Visual Studio Code Edition and C/C++

  2. Create two configuration files under “.vscode/”: launch.json (debugging) and tasks.json (building)

    • When asked to select a debugger, choose CUDA C++ (CUDA-GDB)

Example with nvcc

Testing repo

  1. “test.cu”:

    #include <iostream>
    
    int main(int argc, char **argv)
    {
        std::cout << "Number of input arguments: " << argc << std::endl;
    
        for (int i = 0; i < argc; i++) {
            std::cout << argv[i] << "\n";
        }
        return 0;
    }
    
  2. “launch.json” for debug configurations:

    Set "program" to the binary to be debugged, and set "preLaunchTask" to the label of the build task in tasks.json:

    {
        "version": "0.2.0",
        "configurations": [
            {
                "name": "CUDA C++: Launch",
                "type": "cuda-gdb",
                "request": "launch",
                "program": "${fileDirname}/test.bin", // binary file
                "preLaunchTask": "mynvcc"
            },
    
            // no need to change this:
            {
                "name": "CUDA C++: Attach",
                "type": "cuda-gdb",
                "request": "attach"
            }
        ]
    }
    
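  3. “tasks.json” for build configurations. The label must match preLaunchTask in launch.json. -g emits host debug info and -G emits device (kernel) debug info: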

    {
        "version": "2.0.0",
        "tasks": [
            {
                "label": "mynvcc",
                "type": "shell",
                "command": "nvcc",
                "args": [
                     "${file}",
                     "-g","-G",
                     "-o","${fileDirname}/test.bin"
                ]
            }
        ]
    }
    
    • Error: /usr/bin/bash: nvcc: command not found

      Suggested in tasks - VSCode - Docs: add the command to PATH before the startup scripts run.

      Adding export PATH="$PATH:/usr/local/cuda-12.3/bin" to “/etc/environment”, “/etc/profile”, “/etc/xprofile” and “/etc/bash.bashrc” (following How to permanently set $PATH on Linux/Unix - SO) did not help.

    • Solution: in the user settings.json, make the integrated terminal start bash as an interactive shell (-i). An interactive bash reads its bashrc startup files, so a PATH exported there is picked up (VS Code 1.83.1, 2023-11-03):

      "terminal.integrated.profiles.linux": {
          "bash": {
              "path": "bash",
              "args": [
                  "-i"
              ]
          }
      },
      "terminal.integrated.defaultProfile.linux": "bash",
      

      Ref: VSCode tasks error: /bin/bash: npm: command not found (found via DuckDuckGo by searching “vscode debug tasks.json /usr/bin/bash: nvcc: command not found”)

  4. Then, with “test.cu” open in the editor, click the Start Debugging button (or press F5) to launch the debugger.


  • (2024-01-26) I still don’t know how to pass the libtorch headers to nvcc on the command line, so I didn’t manage to compile the 3DGS project with nvcc as above.

    Potentially useful:

  • (2024-01-27)

    Based on Troubles while compiling C++ program with PyTorch, HElib and OpenCV - SO, and a reminder from Perplexity: in GCC, -Wl passes options through to the linker, and -rpath is the linker option that sets the runtime library search path. nvcc's counterpart to -Wl is -Xlinker (--linker-options), which forwards options to the host linker.

    # Compile CUDA source files into object files with nvcc
    nvcc --compile -g -G -std=c++17 \
    -I/usr/local/libtorch/include \
    -I/usr/local/libtorch/include/torch/csrc/api/include \
    -I/usr/local/libtorch/include/torch \
    main_copy.cu \
    -o main.o
    
    # Link object files into an executable with g++
    g++ main.o \
    -L/usr/local/libtorch/lib \
    -L/usr/local/cuda/lib64 \
    -Wl,-rpath,/usr/local/libtorch/lib \
    -ltorch -ltorch_cpu -lc10 -lcudart \
    -o my_executable
    
    • Compilation succeeds, but linking fails:

      /usr/bin/ld: main.o: in function `main':
      /home/yi/Downloads/debug_diff_rast/main_copy.cu:31: undefined reference to `RasterizeGaussiansCUDA(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, float, at::Tensor const&, at::Tensor const&, at::Tensor const&, float, float, int, int, at::Tensor const&, int, at::Tensor const&, bool, bool)'
      collect2: error: ld returned 1 exit status
      
    • Not solved yet. main.o only references RasterizeGaussiansCUDA; the library source that defines it also has to be compiled with nvcc and linked in.


Example with Makefile

(2024-01-26)

  1. Run cmake first to generate the Makefile; otherwise make fails with: make: *** No targets specified and no makefile found. Stop.

  2. Then edit launch.json and tasks.json following this article: Getting Started with the CUDA Debugger :: NVIDIA Nsight VSCE Documentation

    Refer to my repo for debugging 3DGS.


Example with CMake

Debugging CUDA kernels with VS Code

Ref

  1. Source article: CUDA Extras | CUDA environment in Visual Studio Code - article by Master KangKang - Zhihu
  2. Adapted demo: Remote debugging Linux CUDA programs with VS Code - oushaojun2 - CSDN

Debug Cuda Samples

(2023-11-02)

  1. Download sample project: NVIDIA/cuda-samples for 12.3

    git clone https://github.com/NVIDIA/cuda-samples.git
    
  2. Building the 12.3 samples with CUDA 11.6 failed:

    cd ./cuda-samples
    make dbg=1
    
    • Error: /usr/bin/ld: cannot find -lglut

      Fix: sudo apt-get install freeglut3 freeglut3-dev

    • Error:

      /usr/bin/ld: simpleCUFFT_callback.o: in function `main':
      /home/yi/Downloads/cuda-samples/Samples/4_CUDA_Libraries/simpleCUFFT_callback/simpleCUFFT_callback.cu:103: undefined reference to `cudaGetDeviceProperties_v2'
      collect2: error: ld returned 1 exit status
      make[1]: *** [Makefile:373: simpleCUFFT_callback] Error 1
      make[1]: Leaving directory '/home/yi/Downloads/cuda-samples/Samples/4_CUDA_Libraries/simpleCUFFT_callback'
      make: *** [Makefile:45: Samples/4_CUDA_Libraries/simpleCUFFT_callback/Makefile.ph_build] Error 2
      

      cudaGetDeviceProperties_v2 does not exist in CUDA 11.x; it appears in CUDA 12.2. SO


cuda-sample-11.6

  1. Download zip: Release pkg 11.6; Git tag-11.6

  2. make

    (base) yi@yi-Alienware-Aurora-R8:~/Downloads/cuda-samples-11.6$ make dbg=1
    make[1]: Entering directory '/home/yi/Downloads/cuda-samples-11.6/Samples/3_CUDA_Features/ptxjit'
    /usr/local/cuda/bin/nvcc -ccbin g++ -I../../../Common  -m64 -g -G    --threads 0 --std=c++11 -gencode arch=compute_35,code=compute_35 -o ptxjit.o -c ptxjit.cpp
    nvcc fatal   : Unsupported gpu architecture 'compute_35'
    make[1]: *** [Makefile:396: ptxjit.o] Error 1
    make[1]: Leaving directory '/home/yi/Downloads/cuda-samples-11.6/Samples/3_CUDA_Features/ptxjit'
    make: *** [Makefile:45: Samples/3_CUDA_Features/ptxjit/Makefile.ph_build] Error 2
    
    • Devices with compute capability (cc) 3.x are no longer supported by CUDA 12.x. The fix is to remove the compute_35 requests from the Makefiles. Forum Nv

    • A CUDA Toolkit works with devices up to the highest cc it supports; CUDA 11.x supports up to cc 8.x. CSDN

    • (2023-11-02) Remove cc 3.5 and 3.7 (SO):

      1. Using VS Code's search and replace, change every SMS ?= 35 37 to SMS ?=
      2. Replace every compute_35 with compute_61 (the GTX 1050 Ti is cc 6.1)

After these Makefile edits, make still failed:

    /usr/local/cuda/bin/nvcc -ccbin g++ -I../../../Common  -m64 -g -G    --std=c++11 --threads 0
    -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60
    -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75
    -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86
    -o reduction_kernel.o -c reduction_kernel.cu
    reduction_kernel.cu(558): error: name followed by "::" must be a class or namespace name
        __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                                        ^

    reduction_kernel.cu(558): error: expected an identifier
        __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                                                                        ^

    reduction_kernel.cu(558): warning #1835-D: attribute "__shared__" does not apply here
        __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                           ^

    Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

    reduction_kernel.cu(558): error: expected a ";"
        __attribute__((shared)) cg::experimental::block_tile_memory<sizeof(T), BlockSize> scratch;
                                                                        ^

    reduction_kernel.cu(561): error: name followed by "::" must be a class or namespace name
        auto cta = cg::experimental::this_thread_block(scratch);
                           ^

    reduction_kernel.cu(561): error: identifier "scratch" is undefined
        auto cta = cg::experimental::this_thread_block(scratch);
                                                       ^

    reduction_kernel.cu(563): error: name followed by "::" must be a class or namespace name
        auto multiWarpTile = cg::experimental::tiled_partition<MultiWarpGroupSize>(cta);
                                     ^

    6 errors detected in the compilation of "reduction_kernel.cu".
    make[1]: *** [Makefile:358: reduction_kernel.o] Error 255
    make[1]: Leaving directory '/home/yi/Downloads/cuda-samples-11.6/Samples/2_Concepts_and_Techniques/reduction'
    make: *** [Makefile:45: Samples/2_Concepts_and_Techniques/reduction/Makefile.ph_build] Error 2

  • VS Code couldn’t find the header (red squiggles):

    #include errors detected. Please update your includePath. Squiggles are disabled for this translation unit 
    (/home/yi/Downloads/cuda-samples-11.6/Samples/2_Concepts_and_Techniques/reduction/reduction_kernel.cu).C/C++(1696)
    cannot open source file "cooperative_groups/reduce.h"C/C++(1696)
    

    Tried editing “c_cpp_properties.json” as:

    "includePath": [
        "${workspaceFolder}/**",
        "/usr/local/cuda-11.6/include"
    ],
    

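    This didn’t help, even though the header exists at that path and can be found. (c_cpp_properties.json only configures IntelliSense in the C/C++ extension; it has no effect on the flags make passes to nvcc.)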

  • The same reduction error is reported in issue #201, though that user had CUDA 11.8.

  • Reinstalled CUDA Toolkit 11.8 and built the 11.8 samples.

    • Note: the last line of the install script on the official site should be sudo apt-get -y install cuda-11-8 instead of sudo apt-get -y install cuda (the cuda meta-package installs the latest toolkit).

    • The error in reduction_kernel.cu persists.


Build with CMake

(2023-11-17)

The library “diff-gaussian-rasterization-comments/cuda_rasterizer” is built according to CXX standard.

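However, I cannot step into the CUDA kernels when debugging.

Stepping into kernels requires device debug info, which nvcc emits with -G (as in the tasks.json above). Building the CUDA sources with nvcc and -G may therefore enable kernel debugging.

Create a CMakeLists.txt that builds the CUDA sources with nvcc (and -G)?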