memo: LibTorch | Tensor APIs and Examples

Basic APIs about tensors in LibTorch.


Create Tensor

shape

(2024-01-24)

  1. .sizes() returns a “vector-like” object of class c10::IntArrayRef. It can be compared with a braced list, e.g., {5,2}, or with a std::vector<int64_t>, e.g., std::vector<int64_t>{1,2,4}.

    
    #include <cassert>
    std::vector<int64_t> myVec = {5,2};
    assert(torch::ones({5,2}).sizes() == myVec); // pass
    std::cout << "Equal" << std::endl;
    
    c10::IntArrayRef myArrRef = {5,2};
    assert(myVec == myArrRef);    // pass
    
  2. Use tensor.size(i) (better than tensor.sizes()[i]) to access one of the dimensions. Docs

    
    torch::Tensor myTensor = torch::ones({5,2});
    std::cout << torch::ones(5).sizes() << std::endl;    // [5]
    std::cout << myTensor.sizes() << std::endl;          // [5, 2]
    std::cout << myTensor.size(1) << std::endl;          // 2
    
  3. Use a Lambda function to reshape a tensor and return the updated shape:

    
    std::function<c10::IntArrayRef(c10::IntArrayRef)> getResizedShape(torch::Tensor& t) {
        auto lambda = [&t](c10::IntArrayRef newSize){
            t.resize_(newSize);
            return t.sizes();
        };
        return lambda;
    }
    
    int main() {
        torch::Tensor myTensor = torch::rand({1,2,3});
        std::cout << myTensor << std::endl;
        auto getNewSize = getResizedShape(myTensor);
        std::cout << getNewSize({3,2}) << std::endl;
    }
    
    Output:
    
    (1,.,.) = 
      0.9838  0.7854  0.6991
      0.8325  0.1196  0.3780
    [ CPUFloatType{1,2,3} ]
    [3, 2]
    

Create from factory func

General schema:

torch::<factory-func-name>(<func-specific-args>, <sizes>, <tensor-opt>)
  1. Create a tensor from the factory function torch::randint():

    
    #include <torch/torch.h>  // unzipped to /usr/local/libtorch
    
    int main(){
        const torch::Tensor a = torch::randint(1, 9, {1,2,3});
        std::cout << a << std::endl;
        std::cout <<"size:" << a.sizes() << std::endl;
    }
    
  2. CMakeLists.txt

    
    cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
    project(MyLibTorchApp)    # name
    
    find_package(Torch REQUIRED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
    
    add_executable(${PROJECT_NAME} main.cpp)
    target_link_libraries(${PROJECT_NAME} "${TORCH_LIBRARIES}")
    set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 17)
    
  3. Build:

    
    mkdir -p build
    cd build
    cmake -DCMAKE_PREFIX_PATH=/usr/local/libtorch ..
    cmake --build . --config Release
    

    Or, in a modern way (run from the workspace root; no need to cd into ./build):

    
    cmake -B build -DCMAKE_PREFIX_PATH=/usr/local/libtorch -GNinja
    cmake --build build  # build in ./build
    

    Execute it: ./MyLibTorchApp

    Output:
    
    (1,.,.) = 
      8  7  6
      2  7  2
    [ CPUFloatType{1,2,3} ]
    size: [1, 2, 3]
    

Create with 4 Properties

  1. Pass an instance TensorOptions to the factory function:

    
    torch::TensorOptions options = torch::TensorOptions().dtype(torch::kFloat32)
                                         .layout(torch::kStrided)
                                         .device(torch::kCUDA, 0)
                                         .requires_grad(true);
    torch::Tensor a = torch::full({3,4}, 123, options);
    std::cout << a << std::endl;
    std::cout << a.device() << std::endl;
    std::cout << a.requires_grad() << std::endl;
    
    • Only floating-point and complex dtypes can set .requires_grad.
    • full(...) is not implemented for the sparse layout.
    Output:
    
     123  123  123  123
     123  123  123  123
     123  123  123  123
    [ CUDAFloatType{3,4} ]
    cuda:0
    1
    
  2. torch::TensorOptions() can be omitted: calling any of the four property functions directly from the torch:: namespace returns a pre-configured TensorOptions object.

    
    torch::Tensor a = torch::arange(1,9, torch::dtype(torch::kInt32).device(torch::kCUDA, 0));
    
  3. If only one property needs to be specified, even its property name (torch::dtype()) can be omitted.

    
    torch::Tensor a = torch::arange(8, torch::kInt32);
    

Convert tensor by .to

Use TensorOptions and .to() to create a new tensor on new memory based on a source tensor.

  1. Convert dtype:

    
    torch::Tensor src_tensor = torch::randn({3,2});
    torch::Tensor a = src_tensor.to(torch::kInt32);
    
    // combined: convert dtype and device at once
    torch::Tensor b = src_tensor.to(torch::dtype(torch::kInt32).device(torch::kCUDA,0));
    
    auto opts = b.options();
    std::cout << opts << std::endl;
    
  • What does “new” mean?

Options Alteration

torch::Tensor a = torch::randn(3);

// change the property of dtype in the TensorOptions object
auto int_opts = a.options().dtype(torch::kInt32);

auto float_opts = a.options().dtype(torch::kFloat32);

size of a tensor

(2024-01-24)

LibTorch sizeof tensor - SO

torch::Tensor myTensor = torch::rand({1,2,3}, torch::kFloat32);
int sizeOfFloat = torch::elementSize(torch::typeMetaToScalarType(myTensor.dtype()));

std::cout << "size of the kFloat32 type: " << sizeOfFloat << std::endl;
std::cout << "Number of elements in the tensor: " << myTensor.numel() << std::endl;
std::cout << "Bytes occupied by the tensor: " << myTensor.numel() * sizeOfFloat << std::endl;
Output:
size of the kFloat32 type: 4
Number of elements in the tensor: 6
Bytes occupied by the tensor: 24

Manipulate Tensor

ATen means “A Tensor Library”. The Tensor class in its at:: namespace forms the basis of all tensor operations. ezyang’s blog


Resize

(2023-11-12)

API: Class Tensor in Namespace ATen - Docs

  1. Reshape a tensor in place:

    
    torch::Tensor t = torch::arange(6).resize_({1,2,3});
    std::cout << t << std::endl;
    t.resize_({6});
    std::cout << t << std::endl;
    
    Output:
    
    (1,.,.) = 
      0  1  2
      3  4  5
    [ CPULongType{1,2,3} ]
     0
     1
     2
     3
     4
     5
    [ CPULongType{6} ]
    
  2. A tensor can be resized to hold more elements than it currently stores; the extra elements come from uninitialized memory (zeros here):

    
    torch::Tensor t = torch::arange(6).resize_({1,2,3});
    t.resize_({10});
    std::cout << "Allocated bytes:" << t.numel() * torch::elementSize(torch::typeMetaToScalarType(t.dtype())) << std::endl;
    for (int64_t i = 0; i < t.numel(); ++i) {
        std::cout << t[i] << " ";
    }
    std::cout << std::endl;
    
    Output
    
    Allocated bytes:80
    0 [ CPULongType{} ]
    1 [ CPULongType{} ]
    2 [ CPULongType{} ]
    3 [ CPULongType{} ]
    4 [ CPULongType{} ]
    5 [ CPULongType{} ]
    0 [ CPULongType{} ]
    0 [ CPULongType{} ]
    0 [ CPULongType{} ]
    0 [ CPULongType{} ]
    

Flatten

Reshape a tensor to 1D and return a pointer to its data. Code from 3DGS

  1. Use a lambda function to resize the tensor and return the data pointer.

  2. .data_ptr() points to the tensor x’s underlying data; the tensor object x itself doesn’t point to the data directly.

  3. reinterpret_cast<char*> converts the pointer returned by .data_ptr() to a char-type pointer (pTensor below), so the memory can be read byte by byte.

#include <torch/torch.h>
#include <functional>
#include <iostream>
#include <cstdio>

std::function<char*(size_t N)> resizeFunctional(torch::Tensor& t){
    std::cout << "size of the reference of the input tensor: " << sizeof(t) << std::endl;

    auto lambda = [&t](size_t N){    // Number of elements
        t.resize_({ (long long) N}); // shape: {N}
        std::cout << "N is: " << N << std::endl;
        std::cout << "size of t: " << sizeof(t) << std::endl;
        std::cout << "dtype of t: " << t.dtype() << std::endl;
        return reinterpret_cast<char*>(t.contiguous().data_ptr());  // read memory byte by byte
    };
    return lambda;
}

int main(){
    torch::Tensor a = torch::arange(33,40, torch::kByte).resize_({1,2,3});
    std::cout << "Test tensor: " << a << std::endl;
    std::cout << "Tensor is a ptr, so its size is: " << sizeof(torch::Tensor) << std::endl;

    auto resizer = resizeFunctional(a); // lambda expression
    char* pTensor = resizer(a.numel()); // pointer to tensor's data

    // Memory address
    printf("char*: %p \n", pTensor);
    std::cout << "size of pointer of a char: " << sizeof(pTensor) << std::endl;

    // The return address is the data_ptr()
    printf("data_ptr(): %p \n", a.data_ptr());

    // Print out the data stored in the returned address
    // Since a data is only 1 byte, the 1st byte is the 1st data.
    char data = *pTensor;   // the first byte. 

    // Note: ASCII 0-31 are invisible control characters, so I test chars 33-40
    printf("The first byte: %c \n", data);
    std::cout << data << std::endl;

   // Convert value (char) to integer
    printf("Decimal: %d \n", data);  // 33
    std::cout << "Convert 1st byte to int: " << static_cast<int>(*pTensor) << std::endl;

    // Indexing elements like an array:
    std::cout << "Use [0]: " << pTensor[0] << std::endl;  // !

    for (size_t i = 0; i < 6; ++i) {
        std::cout << static_cast<char>(pTensor[i]) << " ";
    }
    std::cout << std::endl;

    return 0;
}
  • Output
    
    (base) yi@yi:~/Downloads/LibTorch_Study$ ./build/MyLibTorchApp 
    Test tensor: (1,.,.) = 
      33  34  35
      36  37  38
    [ CPUByteType{1,2,3} ]
    Tensor is a ptr, so its size is: 8
    size of the reference of the input tensor: 8
    N is: 6
    size of t: 8
    dtype of t: unsigned char
    char*: 0x55f202301640 
    size of pointer of a char: 8
    data_ptr(): 0x55f202301640 
    The first byte: !
    !
    Decimal: 33 
    Convert 1st byte to int: 33
    Use [0]: !
    ! " # $ % & 
    
  • N is the total number of elements in the tensor t.

  • resize_ requires the shape argument to be of type c10::IntArrayRef, an array of int64_t, i.e., signed 8-byte integers.

    Therefore, converting N from the unsigned size_t to the signed int64_t is a narrowing conversion.

    long is at least 32-bit (8 bytes on my machine), and long long is at least 64-bit. Because the signedness modifier is omitted, both long and long long are signed. Thus, the cast (long long) N is equivalent to (int64_t) N.

  • int64_t is exactly 8 bytes on every compiler, unlike long, which is 4 bytes on some platforms. Definition of int64_t - SO

    
    std::cout << sizeof(size_t) <<  std::endl;    // 8
    std::cout << sizeof(signed long) <<  std::endl;   // 8
    std::cout << sizeof(unsigned long) <<  std::endl;   // 8
    std::cout << sizeof(long) <<  std::endl;   // 8
    std::cout << sizeof(long long) <<  std::endl;   // 8
    std::cout << sizeof(int64_t) <<  std::endl;   // 8
    

  1. Attributes of tensor x:

    ---
    title: tensor x
    ---
    classDiagram
        direction RL
        class T["at::TensorBase"]{
            + c10::intrusive_ptr impl_
        }
        class P["c10::intrusive_ptr"]{
            + c10::TensorImpl* target_
        }
        note for P "0x555557729610"
        P --> T
  2. View the memory via GDB command -exec:

    
    -exec x/64xb 0x555557729610
    0x555557729610:	0x60	0x35	0xfb	0xf7	0xff	0x7f	0x00	0x00
    0x555557729618:	0x01	0x00	0x00	0x00	0x00	0x00	0x00	0x00
    
    • 0x7FFFF7FB3560 is not the address storing x’s data.

    Char pointer pResizedX = 0x555557729500 points to the memory storing x’s data:

    
    -exec x/64b 0x555557729500
    0x555557729500:	3	0	0	0	0	0	0	0
    0x555557729508:	3	0	0	0	0	0	0	0
    0x555557729510:	3	0	0	0	0	0	0	0
    0x555557729518:	3	0	0	0	0	0	0	0
    0x555557729520:	0	0	0	0	0	0	0	0
    0x555557729528:	81	0	0	0	0	0	0	0
    0x555557729530:	0	0	0	0	0	0	0	0
    0x555557729538:	16	80	87	85	85	85	0	0
    
    • There are four 3s; each value occupies 8 bytes, so each element is stored as an 8-byte integer.
  3. DEBUG CONSOLE panel:

    
    x.data_ptr
    {void *(const at::TensorBase * const)} 0x55555555b392 <at::TensorBase::data_ptr() const>
    
    • Not sure what that address points to.

Get value

(2023-11-12)

  1. Return the pointer to data: Tensor.data<T>(), which is deprecated and internally forwards to Tensor.data_ptr<T>(). Source code

    
    int main(){
        torch::Tensor x = torch::full({1,3}, 2, torch::dtype(torch::kFloat));
        std::cout << x << std::endl;
        std::cout << x.contiguous().data<float>() << std::endl;
        std::cout << x.contiguous().data_ptr<float>() << std::endl;
    }
    
    Output:
    
    ~/l/build$ ./MyLibTorchApp
     2  2  2
    [ CPUFloatType{1,3} ]
    0x557d9beab500
    0x557d9beab500
    
  2. .item<dtype>() extracts a scalar value, not a vector. Torch C++: Getting the value of a int tensor by using *.data() - SO

    
    int main(){
        torch::Tensor x = torch::randn({1,3});
        std::cout << x << std::endl;
        std::cout << x[0][0].item<int>() << std::endl;
        std::cout << x[0][0].item<float>() << std::endl;
    }
    
    Output:
    
    ~/l/build$ ./MyLibTorchApp
    -0.6926 -0.2304  1.2920
    [ CPUFloatType{1,3} ]
    0
    -0.692582
    
  3. Use a vector to hold result tensor after inference: Part-2 Garry’s Blog

    
    // Extract size of output (of the first and only batch)
    // and preallocate a vector with that size 
    auto output_size = output.sizes()[1]; 
    auto output_vector = std::vector<float>(output_size);  
    
    // Fill result vector with tensor items using `Tensor::item`
    for (int i = 0; i < output_size; i++) {
        output_vector[i] = output[0][i].item<float>();
    }
    
  4. Copy cv::Mat to a tensor: Part-3 Garry’s Blog

    
    torch::Tensor tensor = torch::empty({mat.rows, mat.cols, mat.channels()}, 
            torch::TensorOptions().dtype(torch::kByte).device(torch::kCPU));
    
    std::memcpy(tensor.data_ptr(), reinterpret_cast<void*>(mat.data), tensor.numel() * tensor.element_size());
    

Examples of commonly used libtorch API functions (libtorch 常用api函数示例, “the most complete and detailed”) - 博客园 (cnblogs)

AllentDan/LibtorchTutorials

From PyTorch to Libtorch: tips and tricks - Marc Lalonde - Medium

Announcing a series of blogs on PyTorch C++ API - Kushashwa Ravi Shrimali


empty tensor

(2024-01-28)

In 3DGS, the project diff-gaussian-rasterization is built as a C++ extension via setup.py and called from the Python program, whereas the CMakeLists.txt builds the project as a shared library (.so) to be linked into a C++ executable application.

Originally, I wanted to debug diff-gaussian-rasterization as a shared library, so I needed to construct input tensors that mimic those passed from Python, where some tensors are assigned None, such as cov3D_precomp.

However, I don’t know how to create a “None” tensor in the C++ program (Perplexity said: “You can’t directly set a tensor to NULL as you would do in Python by setting a variable to None.”).

I tried torch::empty({0}), but its data_ptr() was not the desired nullptr. Consequently, an if statement later would not take the branch that would be taken when the extension is called from Python.

  • (2024-01-31) It turns out I forgot to re-build the application, so CUDA-GDB was still stepping through the old binary.

  • The .data_ptr() of both torch::empty({0}) and torch::full({0}, 0) is nullptr.


(2024-01-30)

I just found that the None Python tensors in 3DGS are reassigned to torch.Tensor([]):

if cov3D_precomp is None:
    cov3D_precomp = torch.Tensor([])

The torch.Tensor([]) will be passed into the C++ package function _C.rasterize_gaussians() (i.e., the forward method RasterizeGaussiansCUDA).

A demo where Python calls a C++ package, based on AIkui’s CUDA extension tutorial:

Expand codes
  • The code can be evaluated by commands: chmod +x test.sh and ./test.sh
(Diagram: Python’s torch.Tensor([]) is passed to LibTorch, where it appears as torch::empty({0}), and is returned to Python as torch.Tensor([]).)