memo: LibTorch | Tensor APIs and Examples

Basic APIs about tensors in LibTorch.


Create Tensor

shape

(2024-01-24)

  1. .sizes() returns a “vector-like” object of class c10::IntArrayRef. It can be compared with a braced list, e.g., {5,2}, or with a std::vector<int64_t>, e.g., std::vector<int64_t>{1,2,4}.

    
    #include <cassert>
    std::vector<int64_t> myVec = {5,2};
    assert(torch::ones({5,2}).sizes() == myVec); // pass
    std::cout << "Equal" << std::endl;
    
    c10::IntArrayRef myArrRef = {5,2};
    assert(myVec == myArrRef);    // pass
    
  2. Use tensor.size(i) (better than tensor.sizes()[i]) to access one of the dimensions. Docs

    
    torch::Tensor myTensor = torch::ones({5,2});
    std::cout << torch::ones(5).sizes() << std::endl;    // [5]
    std::cout << myTensor.sizes() << std::endl;          // [5, 2]
    std::cout << myTensor.size(1) << std::endl;          // 2
    
  3. Use a Lambda function to reshape a tensor and return the updated shape:

    
    std::function<c10::IntArrayRef(c10::IntArrayRef)> getResizedShape(torch::Tensor& t) {
        auto lambda = [&t](c10::IntArrayRef newSize){
            t.resize_(newSize);
            return t.sizes();
        };
        return lambda;
    }
    
    int main() {
        torch::Tensor myTensor = torch::rand({1,2,3});
        std::cout << myTensor << std::endl;
        auto getNewSize = getResizedShape(myTensor);
        std::cout << getNewSize({3,2}) << std::endl;
    }
    
    Output:
    
    (1,.,.) = 
      0.9838  0.7854  0.6991
      0.8325  0.1196  0.3780
    [ CPUFloatType{1,2,3} ]
    [3, 2]
    

Create from factory func

General schema:

torch::<factory-func-name>(<func-specific-args>, <sizes>, <tensor-opt>)
  1. Create a tensor from the factory function torch::randint():

    
    #include <torch/torch.h>  // unzipped to /usr/local/libtorch
    
    int main(){
        const torch::Tensor a = torch::randint(1, 9, {1,2,3});
        std::cout << a << std::endl;
        std::cout <<"size:" << a.sizes() << std::endl;
    }
    
  2. CMakeLists.txt

    
    cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
    project(MyLibTorchApp)    # name
    
    find_package(Torch REQUIRED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
    
    add_executable(${PROJECT_NAME} main.cpp)
    target_link_libraries(${PROJECT_NAME} "${TORCH_LIBRARIES}")
    set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 17)
    
  3. Build:

    
    mkdir -p build
    cd build
    cmake -DCMAKE_PREFIX_PATH=/usr/local/libtorch ..
    cmake --build . --config Release
    

    Or, in a modern way (run from the workspace root; no need to cd into ./build):

    
    cmake -B build -DCMAKE_PREFIX_PATH=/usr/local/libtorch -GNinja
    cmake --build build  # build in ./build
    

    Execute it: ./MyLibTorchApp

    Output:
    
    (1,.,.) = 
      8  7  6
      2  7  2
    [ CPUFloatType{1,2,3} ]
    size: [1, 2, 3]
    

Create with 4 Properties

  1. Pass an instance TensorOptions to the factory function:

    
    torch::TensorOptions options = torch::TensorOptions().dtype(torch::kFloat32)
                                         .layout(torch::kStrided)
                                         .device(torch::kCUDA, 0)
                                         .requires_grad(true);
    torch::Tensor a = torch::full({3,4}, 123, options);
    std::cout << a << std::endl;
    std::cout << a.device() << std::endl;
    std::cout << a.requires_grad() << std::endl;
    
    • Only floating-point and complex dtypes can set .requires_grad.
    • full(...) is not implemented for the sparse layout.
    Output:
    
     123  123  123  123
     123  123  123  123
     123  123  123  123
    [ CUDAFloatType{3,4} ]
    cuda:0
    1
    
  2. torch::TensorOptions() can be omitted: calling any of the four property functions directly from the torch:: namespace returns a pre-configured TensorOptions object.

    
    torch::Tensor a = torch::arange(1,9, torch::dtype(torch::kInt32).device(torch::kCUDA, 0));
    
  3. If only one property needs to be specified, even its property name (torch::dtype()) can be omitted.

    
    torch::Tensor a = torch::arange(8, torch::kInt32);
    

Convert tensor by .to

Use TensorOptions and .to() to create a new tensor on new memory based on a source tensor.

  1. Convert dtype:

    
    torch::Tensor src_tensor = torch::randn({3,2});
    torch::Tensor a = src_tensor.to(torch::kInt32);
    
    // combined: convert dtype and device at once
    torch::Tensor b = src_tensor.to(torch::dtype(torch::kInt32).device(torch::kCUDA,0));
    
    auto opts = b.options();
    std::cout << opts << std::endl;
    
  • What does “new” mean?

Options Alteration

torch::Tensor a = torch::randn(3);

// change the property of dtype in the TensorOptions object
auto int_opts = a.options().dtype(torch::kInt32);

auto float_opts = a.options().dtype(torch::kFloat32);

size of a tensor

(2024-01-24)

LibTorch sizeof tensor - SO

torch::Tensor myTensor = torch::rand({1,2,3}, torch::kFloat32);
int sizeOfFloat = torch::elementSize(torch::typeMetaToScalarType(myTensor.dtype()));

std::cout << "size of the kFloat32 type: " << sizeOfFloat << std::endl;
std::cout << "Number of elements in the tensor: " << myTensor.numel() << std::endl;
std::cout << "Bytes occupied by the tensor: " << myTensor.numel() * sizeOfFloat << std::endl;
Output:
size of the kFloat32 type: 4
Number of elements in the tensor: 6
Bytes occupied by the tensor: 24

Manipulate Tensor

ATen means “A Tensor Library”. The Tensor class in its at:: namespace forms the basis of all tensor operations. ezyang’s blog


Resize

(2023-11-12)

API: Class Tensor in Namespace ATen - Docs

  1. Reshape a tensor in place:

    
    torch::Tensor t = torch::arange(6).resize_({1,2,3});
    std::cout << t << std::endl;
    t.resize_({6});
    std::cout << t << std::endl;
    
    Output:
    
    (1,.,.) = 
      0  1  2
      3  4  5
    [ CPULongType{1,2,3} ]
     0
     1
     2
     3
     4
     5
    [ CPULongType{6} ]
    
  2. A tensor can be resized to hold more elements than it currently stores; the extra elements come from uninitialized memory (zeros here):

    
    torch::Tensor t = torch::arange(6).resize_({1,2,3});
    t.resize_({10});
    std::cout << "Allocated bytes:" << t.numel() * torch::elementSize(torch::typeMetaToScalarType(t.dtype())) << std::endl;
    for (int64_t i = 0; i < t.numel(); ++i) {
        std::cout << t[i] << " ";
    }
    std::cout << std::endl;
    
    Output
    
    Allocated bytes:80
    0 [ CPULongType{} ]
    1 [ CPULongType{} ]
    2 [ CPULongType{} ]
    3 [ CPULongType{} ]
    4 [ CPULongType{} ]
    5 [ CPULongType{} ]
    0 [ CPULongType{} ]
    0 [ CPULongType{} ]
    0 [ CPULongType{} ]
    0 [ CPULongType{} ]
    

Flatten

Reshape a tensor to 1D and return a pointer to its data. Code from 3DGS

  1. Use a lambda function to resize the tensor and return the data pointer.

  2. .data_ptr() points to the tensor x’s underlying data; the tensor object x itself doesn’t point to the data directly.

  3. reinterpret_cast<char*> converts the pointer returned by .data_ptr() to a char-type pointer (pTensor below), so the memory can be read byte by byte.

#include <torch/torch.h>
#include <functional>
#include <iostream>
#include <cstdio>

std::function<char*(size_t N)> resizeFunctional(torch::Tensor& t){
    std::cout << "size of the reference of the input tensor: " << sizeof(t) << std::endl;

    auto lambda = [&t](size_t N){    // Number of elements
        t.resize_({ (long long) N}); // shape: {N}
        std::cout << "N is: " << N << std::endl;
        std::cout << "size of t: " << sizeof(t) << std::endl;
        std::cout << "dtype of t: " << t.dtype() << std::endl;
        return reinterpret_cast<char*>(t.contiguous().data_ptr());  // read memory byte by byte
    };
    return lambda;
}

int main(){
    torch::Tensor a = torch::arange(33,40, torch::kByte).resize_({1,2,3});
    std::cout << "Test tensor: " << a << std::endl;
    std::cout << "Tensor is a ptr, so its size is: " << sizeof(torch::Tensor) << std::endl;

    auto resizer = resizeFunctional(a); // lambda expression
    char* pTensor = resizer(a.numel()); // pointer to tensor's data

    // Memory address
    printf("char*: %p \n", pTensor);
    std::cout << "size of pointer of a char: " << sizeof(pTensor) << std::endl;

    // The return address is the data_ptr()
    printf("data_ptr(): %p \n", a.data_ptr());

    // Print out the data stored in the returned address
    // Since a data is only 1 byte, the 1st byte is the 1st data.
    char data = *pTensor;   // the first byte. 

    // Note: ASCII 0-31 are invisible control characters, so I test chars 33-40
    printf("The first byte: %c \n", data);
    std::cout << data << std::endl;

   // Convert value (char) to integer
    printf("Decimal: %d \n", data);  // 33
    std::cout << "Convert 1st byte to int: " << static_cast<int>(*pTensor) << std::endl;

    // Indexing elements like an array:
    std::cout << "Use [0]: " << pTensor[0] << std::endl;  // !

    for (size_t i = 0; i < 6; ++i) {
        std::cout << static_cast<char>(pTensor[i]) << " ";
    }
    std::cout << std::endl;

    return 0;
}
  • Output
    
    (base) yi@yi:~/Downloads/LibTorch_Study$ ./build/MyLibTorchApp 
    Test tensor: (1,.,.) = 
      33  34  35
      36  37  38
    [ CPUByteType{1,2,3} ]
    Tensor is a ptr, so its size is: 8
    size of the reference of the input tensor: 8
    N is: 6
    size of t: 8
    dtype of t: unsigned char
    char*: 0x55f202301640 
    size of pointer of a char: 8
    data_ptr(): 0x55f202301640 
    The first byte: !
    !
    Decimal: 33 
    Convert 1st byte to int: 33
    Use [0]: !
    ! " # $ % & 
    
  • N is the total number of elements in the tensor t.

  • resize_ requires the shape argument to be of type c10::IntArrayRef, an array of int64_t, i.e., signed 8-byte integers.

    Therefore, converting N from the unsigned size_t to the signed int64_t is a narrowing conversion.

    long is at least 32-bit (8 bytes on my machine), and long long is at least 64-bit. Because the signedness modifier is omitted, both long and long long are signed. Thus, the cast (long long) N is equivalent to (int64_t) N.

  • int64_t is exactly 8 bytes on every compiler, unlike long, which is 4 bytes on some platforms. Definition of int64_t - SO

    
    std::cout << sizeof(size_t) <<  std::endl;    // 8
    std::cout << sizeof(signed long) <<  std::endl;   // 8
    std::cout << sizeof(unsigned long) <<  std::endl;   // 8
    std::cout << sizeof(long) <<  std::endl;   // 8
    std::cout << sizeof(long long) <<  std::endl;   // 8
    std::cout << sizeof(int64_t) <<  std::endl;   // 8
    

  1. Attributes of tensor x:

    ---
    title: tensor x
    ---
    classDiagram
        direction RL
        class T["at::TensorBase"]{
            + c10::intrusive_ptr impl_
        }
        class P["c10::intrusive_ptr"]{
            + c10::TensorImpl* target_
        }
        note for P "0x555557729610"
        P --> T
  2. View the memory via GDB command -exec:

    
    -exec x/64xb 0x555557729610
    0x555557729610:	0x60	0x35	0xfb	0xf7	0xff	0x7f	0x00	0x00
    0x555557729618:	0x01	0x00	0x00	0x00	0x00	0x00	0x00	0x00
    
    • 0x7FFFF7FB3560 is not the address storing x’s data.

    Char pointer pResizedX = 0x555557729500 points to the memory storing x’s data:

    
    -exec x/64b 0x555557729500
    0x555557729500:	3	0	0	0	0	0	0	0
    0x555557729508:	3	0	0	0	0	0	0	0
    0x555557729510:	3	0	0	0	0	0	0	0
    0x555557729518:	3	0	0	0	0	0	0	0
    0x555557729520:	0	0	0	0	0	0	0	0
    0x555557729528:	81	0	0	0	0	0	0	0
    0x555557729530:	0	0	0	0	0	0	0	0
    0x555557729538:	16	80	87	85	85	85	0	0
    
    • There are four 3s; each value occupies 8 bytes, so each element is stored as an 8-byte integer.
  3. DEBUG CONSOLE panel:

    
    x.data_ptr
    {void *(const at::TensorBase * const)} 0x55555555b392 <at::TensorBase::data_ptr() const>
    
    • Not sure what that address points to.

Get value

(2023-11-12)

  1. Return the pointer to data: Tensor.data<T>(), which is deprecated and internally forwards to Tensor.data_ptr<T>(). Source code

    
    int main(){
        torch::Tensor x = torch::full({1,3}, 2, torch::dtype(torch::kFloat));
        std::cout << x << std::endl;
        std::cout << x.contiguous().data<float>() << std::endl;
        std::cout << x.contiguous().data_ptr<float>() << std::endl;
    }
    
    Output:
    
    ~/l/build$ ./MyLibTorchApp
     2  2  2
    [ CPUFloatType{1,3} ]
    0x557d9beab500
    0x557d9beab500
    
  2. .item<dtype>() extracts a scalar value, not a vector. Torch C++: Getting the value of a int tensor by using *.data() - SO

    
    int main(){
        torch::Tensor x = torch::randn({1,3});
        std::cout << x << std::endl;
        std::cout << x[0][0].item<int>() << std::endl;
        std::cout << x[0][0].item<float>() << std::endl;
    }
    
    Output:
    
    ~/l/build$ ./MyLibTorchApp
    -0.6926 -0.2304  1.2920
    [ CPUFloatType{1,3} ]
    0
    -0.692582
    
  3. Use a vector to hold result tensor after inference: Part-2 Garry’s Blog

    
    // Extract size of output (of the first and only batch)
    // and preallocate a vector with that size 
    auto output_size = output.sizes()[1]; 
    auto output_vector = std::vector<float>(output_size);  
    
    // Fill result vector with tensor items using `Tensor::item`
    for (int i = 0; i < output_size; i++) {
        output_vector[i] = output[0][i].item<float>();
    }
    
  4. Copy cv::Mat to a tensor: Part-3 Garry’s Blog

    
    torch::Tensor tensor = torch::empty({mat.rows, mat.cols, mat.channels()}, 
            torch::TensorOptions().dtype(torch::kByte).device(torch::kCPU));
    
    std::memcpy(tensor.data_ptr(), reinterpret_cast<void*>(mat.data), tensor.numel() * tensor.element_size());
    

Examples of commonly used libtorch API functions (libtorch 常用api函数示例, “the most complete and detailed”) - 博客园 (cnblogs)

AllentDan/LibtorchTutorials

From PyTorch to Libtorch: tips and tricks - Marc Lalonde - Medium

Announcing a series of blogs on PyTorch C++ API - Kushashwa Ravi Shrimali


empty tensor

(2024-01-28)

In 3DGS, the project diff-gaussian-rasterization is built as a C++ extension via setup.py and called from the Python program, whereas the CMakeLists.txt builds the project as a shared library (.so) to be linked into a C++ executable application.

Originally, I wanted to debug diff-gaussian-rasterization as a shared library, so I needed to construct input tensors that mimic those passed from Python, where some tensors are assigned None, such as cov3D_precomp.

However, I don’t know how to create a “None” tensor in the C++ program (Perplexity said: “You can’t directly set a tensor to NULL as you would do in Python by setting a variable to None.”).

I tried torch::empty({0}), but its data_ptr() was not the desired nullptr. Consequently, an if statement later would not take the branch that would be taken when the extension is called from Python.

  • (2024-01-31) It turns out I forgot to re-build the application, so CUDA-GDB was still stepping through the old binary.

  • The .data_ptr() of both torch::empty({0}) and torch::full({0}, 0) is nullptr.


(2024-01-30)

I just found that the None Python tensors in 3DGS are reassigned to torch.Tensor([]):

if cov3D_precomp is None:
    cov3D_precomp = torch.Tensor([])

The torch.Tensor([]) will be passed into the C++ package function _C.rasterize_gaussians() (i.e., the forward method RasterizeGaussiansCUDA).

A demo where Python calls a C++ package, based on AIkui’s CUDA extension tutorial:

Expand codes
  • The code can be evaluated by commands: chmod +x test.sh and ./test.sh
(Diagram: Python’s torch.Tensor([]) is passed to LibTorch, where it appears as torch::empty({0}), and is returned to Python as torch.Tensor([]).)