memo: PyTorch | Misc

Define multiple layers with a for loop

# In __init__: register each layer as an attribute
for i in range(n_layers):
    setattr(self, f"layer{i}", nn.Sequential(nn.Linear(2, n_hidden_units), nn.ReLU(True)))

# Retrieve the layers (e.g., in forward):
for i in range(n_layers):
    layer = getattr(self, f"layer{i}")
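
A self-contained sketch of this pattern (the module name and layer sizes below are made up for illustration):

import torch
import torch.nn as nn

class StackedNet(nn.Module):
    def __init__(self, n_layers=3, n_hidden_units=16):
        super().__init__()
        self.n_layers = n_layers
        for i in range(n_layers):
            # setattr registers each nn.Sequential as a submodule of StackedNet
            setattr(self, f"layer{i}",
                    nn.Sequential(nn.Linear(n_hidden_units, n_hidden_units), nn.ReLU(True)))

    def forward(self, x):
        for i in range(self.n_layers):
            x = getattr(self, f"layer{i}")(x)
        return x

net = StackedNet()
print(net(torch.randn(4, 16)).shape)  # torch.Size([4, 16])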

Access all weights of a model

for name, param in model.named_parameters():
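
For example, with a toy model (a made-up nn.Sequential):

import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)
# 0.weight (8, 4) True
# 0.bias (8,) True
# 2.weight (2, 8) True
# 2.bias (2,) True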

Initialize weights of nn.Linear

model.apply(fn) applies the function fn to every child submodule (and the module itself). So let fn be init_weights():

@torch.no_grad()  # don't track gradients / build a graph inside this function
def init_weights(m):
    print(m)
    if type(m) == nn.Linear:
        torch.nn.init.ones_(m.weight)
        m.bias.data.fill_(0.01)
        print(m.weight)

model = Net()  # Net is the user-defined model
model.apply(init_weights)


ModuleList usage

'''
ModuleList behaves like an iterable (and also supports indexing)
'''
class LinearNet(nn.Module):
    def __init__(self, in_features, out_features, num_layers, layer_size):
        super(LinearNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(in_features, layer_size)])  # 1st layer
        self.linears.extend([nn.Linear(layer_size, layer_size) for i in range(1, num_layers - 1)])
        self.linears.append(nn.Linear(layer_size, out_features))

    def forward(self, x):
        for idx, layer in enumerate(self.linears):  # pass x through each layer
            x = layer(x)
            # x = self.linears[idx // 2](x) + layer(x)  # ModuleList also supports indexing
        return x
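
A quick usage check of the class above (the sizes are arbitrary):

import torch

net = LinearNet(in_features=8, out_features=2, num_layers=4, layer_size=16)
print(net(torch.randn(5, 8)).shape)  # torch.Size([5, 2])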


torch.Tensor

  • A multi-dimensional matrix containing elements of a single data type.
  • There are 10 tensor types; torch.Tensor is an alias for the default tensor type, torch.FloatTensor.

Note: tensor-mutating methods carry a trailing underscore; they modify the original tensor in place instead of returning a new one. For example, torch.FloatTensor.abs_() computes the absolute value in place and modifies the original tensor, while torch.FloatTensor.abs() returns the result in a new tensor.


Get value of a tensor

a = torch.tensor([3])

  • a.data ⮕ tensor([3])
  • a.item() ⮕ 3 (“Convert one-element tensor to python scalar.”)
  • a.numpy() ⮕ array([3])

b = (a/2).requires_grad_(True)

  • b.detach().cpu().numpy() ⮕ array([1.5], dtype=float32)

(2023-07-27)

Memory allocated for a tensor

a = torch.arange(4)  # dtype torch.int64
num_bytes = a.element_size() * a.nelement()  # 8 * 4 = 32 bytes

How to know the memory allocated for a tensor on GPU?
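
A rough way to check this (a sketch with assumptions: it needs a CUDA device and reads the caching allocator's counter, not the raw driver usage) is to compare torch.cuda.memory_allocated() before and after creating the tensor:

import torch

before = torch.cuda.memory_allocated()
t = torch.randn(1024, 1024, device="cuda")  # float32: 1024 * 1024 * 4 bytes
after = torch.cuda.memory_allocated()
print(after - before)  # ~4 MiB, rounded up to the allocator's block size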

Move specified axis

Move the specified axis to the second axis

Question for ChatGPT:

If I am given a variable channel_dim, and it can be, say, 3, meaning the tensor is (B, H, W, channel_dim), how can I transform the tensor to make the channel_dim the 2nd axis?

I mean channel_dim is a variable; it can be any axis.

Answer:

Apologies for the confusion. If the channel_dim is a variable and can be any axis, you can use the following code snippet to move the specified axis to the second axis of the tensor:

import torch

# Assume a tensor whose channel axis can sit at any position; channel_dim is its index
B, H, W, C = 2, 32, 32, 3          # example sizes
tensor = torch.randn(B, H, W, C)   # here the channel axis is the last one
channel_dim = 3                    # replace with the axis you want to move to the second position

# Permute the tensor to move the specified axis (channel_dim) to the second axis
# Permutation order: (0, channel_dim, then all remaining axes in their original order)
perm_order = list(range(tensor.dim()))
perm_order.remove(channel_dim)
perm_order = [0, channel_dim] + perm_order[1:]

tensor_permuted = tensor.permute(*perm_order)

# For a (B, H, W, C) tensor with channel_dim = 3, the result is (B, C, H, W)
print(tensor_permuted.size())  # torch.Size([2, 3, 32, 32])
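
As a side note not from the original answer: torch.movedim expresses the same operation in one call (equivalent to the permute above for any channel_dim >= 1):

tensor_permuted = torch.movedim(tensor, channel_dim, 1)  # move channel_dim to axis 1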

(2023-08-01)

Dynamically change Conv2d layer

Question to ChatGPT:

I’m writing PyTorch code. I want to use nn.Conv2d, but the in_channels is only known from the result computed in the forward method; how could I make the parameters of Conv2d optimized during training?

  1. Make the in_channels an instance variable, like self.in_chnls = in_channels.

  2. Then in the __init__(self, in_channels) method, a Conv2d layer can be constructed: self.conv = nn.Conv2d(self.in_chnls, out_chnls, ...)

  3. Then update it in forward before calling the self.conv layer: self.in_chnls = x.shape[1]. Note that reassigning self.in_chnls alone does not rebuild the already-constructed layer (see the nn.LazyConv2d sketch at the end of this part).


(DDG search: “use nn.Conv2d with dynamically determined in_channels during the forward pass”)

Similar question: Dynamically set Conv2d based on input channels - PyTorch Forum


How to create a custom layer that accepts the input during forward pass - PyTorch forum

  • Make another class (nn.Module) in which the preceding model is called.
  • But in_channels is required when initializing Conv2d(), so that its parameters can be defined.

“Can we not define the filter size at runtime?” Why does nn.Conv2d require in_channels? - PyTorch Forum
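
Not covered in the notes above, but PyTorch's built-in answer to this is nn.LazyConv2d, which infers in_channels from the first forward pass; a minimal sketch (the module name and sizes are made up):

import torch
import torch.nn as nn

class DynamicConvNet(nn.Module):
    def __init__(self, out_channels=16):
        super().__init__()
        # in_channels is left unspecified and is materialized on the first forward
        self.conv = nn.LazyConv2d(out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

net = DynamicConvNet()
_ = net(torch.randn(2, 7, 8, 8))  # dry run: in_channels becomes 7 here
# Build the optimizer only after the dry run, so the materialized weights are registered
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)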


Get device of a module

dev = next(model.parameters()).device

How to get the device type of a pytorch module conveniently?


Module.apply(fn)

Applies the function fn to every submodule as well as the module itself. This can be used to initialize weights, as in the code from AIM:

class ViT_CLIP(nn.Module):
  def __init__(self,):
    super().__init__()

  def init_weights(self,):
    # A method that defines a local helper and applies it to every submodule
    def _init_weights(m):  # m is a submodule
      if isinstance(m, nn.Linear):
        trunc_normal_(m.weight, std=.02)  # trunc_normal_: truncated-normal init (e.g., from timm)
        if isinstance(m, nn.Linear) and m.bias is not None:
          nn.init.constant_(m.bias, 0)
      elif isinstance(m, nn.LayerNorm):
        nn.init.constant_(m.bias, 0)
        nn.init.constant_(m.weight, 1.0)

    # Apply _init_weights to self and all submodules
    self.apply(_init_weights)

Customize Autograd Op

(2024-01-23)

  1. A custom operation that subclasses torch.autograd.Function is invoked via its .apply() method, and its forward and backward must be static methods (a minimal sketch follows below). Docs

    Example - 3DGS
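
A minimal sketch of such a Function (a toy square op, not taken from 3DGS):

import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)    # stash tensors needed in backward
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output  # d(x^2)/dx, chained with the incoming gradient

x = torch.randn(3, requires_grad=True)
Square.apply(x).sum().backward()
print(torch.allclose(x.grad, 2 * x.detach()))  # True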


torch.roll

Circular shift along the given dimensions. Docs

  1. Move 1 step along 1 dimension:

    x = torch.tensor([[1,2,3],[4,5,6]])
    torch.roll(x, shifts=1, dims=0)
    

    It shifts by 1 step along dimension 0, wrapping around, i.e., [4,5,6] rolls into the first position, so the result is [[4,5,6],[1,2,3]] (roll returns a new tensor; x itself is unchanged).

  2. Move steps (2, 1) along 2 dimensions separately:

    torch.roll(x, shifts=(2,1), dims=(0,1))
    

    Dimension 0 is shifted by 2 (a full cycle for 2 rows, so no change) and dimension 1 is shifted by 1, so the result is [[3,1,2],[6,4,5]].


torch.max

torch.max(x, dim) compares the elements at corresponding positions along the dimension dim and returns a namedtuple (values, indices).

a = torch.randint(8, (2, 3, 3))
tensor([[[0, 3, 2],
         [6, 1, 5],
         [2, 7, 0]],

        [[4, 7, 4],
         [6, 1, 0],
         [0, 1, 6]]])

torch.max(a, 0) will compare: 0-4, 3-7, 2-4; 6-6, 1-1, 5-0; 2-0, 7-1, 0-6, so the values are [[4, 7, 4], [6, 1, 5], [2, 7, 6]].

torch.max(a, 1) will compare: 0-6-2, 3-1-7, 2-5-0; 4-6-0, 7-1-1, 4-0-6; so the values are [[6, 7, 5], [6, 7, 6]].
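
A quick check of the above with the same values hard-coded:

import torch

a = torch.tensor([[[0, 3, 2], [6, 1, 5], [2, 7, 0]],
                  [[4, 7, 4], [6, 1, 0], [0, 1, 6]]])
values, indices = torch.max(a, 0)  # torch.max(a, dim) returns (values, indices)
print(values)  # tensor([[4, 7, 4], [6, 1, 5], [2, 7, 6]])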


torch.diff

The back one minus the front one along a dimension: out[i] = input[i+1] - input[i].

a = torch.arange(12).reshape(2, 2, 3)
tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]]])
  • torch.diff(a): subtract along the last dim: [[[1-0, 2-1], [4-3, 5-4]], [[7-6, 8-7], [10-9, 11-10]]] = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]

  • torch.diff(a, n=2): apply the difference again to the first-order result: [[[0], [0]], [[0], [0]]]

  • torch.diff(a, append=a): the appended tensor must have the same number of dimensions as the input and match its shape except along the diff dimension (a 1-D sanity check follows below).
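
A 1-D sanity check of the above (with made-up values):

import torch

t = torch.tensor([1, 3, 6, 10])
print(torch.diff(t))                             # tensor([2, 3, 4])
print(torch.diff(t, n=2))                        # tensor([1, 1])
print(torch.diff(t, prepend=torch.tensor([0])))  # tensor([1, 2, 3, 4])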


Count #params

  • From Match-NeRF:

    for name, param in self.model.named_parameters():
        log.info(f'{name}: {param.requires_grad}')
    num_param = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
    num_total_param = sum(p.numel() for p in self.model.parameters())
    log.info('Number of total parameters: {}, tunable parameters: {}'.format(num_total_param, num_param))
    
  • (2023-12-21) From MVSNet:

    sum([p.data.nelement() for p in model.parameters()])
    

param_group

(2024-04-11)

  • Per-parameter options: an iterable of parameter groups. Docs

    • Practiced this in MatchNeRF exp1 before: different modules are trained with separate learning rates (a sketch follows below).

    Example in 3DGS
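
A minimal sketch of per-parameter options (the submodules and learning rates below are made up):

import torch
import torch.nn as nn

model = nn.ModuleDict({"encoder": nn.Linear(8, 16), "decoder": nn.Linear(16, 8)})

optimizer = torch.optim.Adam(
    [
        {"params": model["encoder"].parameters(), "lr": 1e-4},  # group-specific lr
        {"params": model["decoder"].parameters()},              # falls back to the default lr
    ],
    lr=1e-3,  # default for groups that don't set their own lr
)

for group in optimizer.param_groups:
    print(group["lr"])  # 0.0001 then 0.001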


torch.unbind

Docs

(2024-05-17)

Breaks one dimension apart: removes it and returns a tuple of all slices along it.

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
torch.unbind(a)  # returns a tuple: (tensor([1, 2, 3]), tensor([4, 5, 6]), tensor([7, 8, 9]))
  • torch.chunk gives a similar result (though each chunk keeps the split dimension), but it requires specifying the number of chunks:

    >>> torch.chunk(a, 3, dim=0)
    (tensor([[1, 2, 3]]), tensor([[4, 5, 6]]), tensor([[7, 8, 9]]))
    
  • In contrast, torch.split requires specifying the size of each chunk (the number of entries per chunk):

    >>> torch.split(a, 3, dim=0) # return a tuple
    (tensor([[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]),)
    
    >>> torch.split(a, 2, dim=0)
    (tensor([[1, 2, 3],
             [4, 5, 6]]),
     tensor([[7, 8, 9]]))
    
    >>> torch.split(a, 1, dim=0)
    (tensor([[1, 2, 3]]), tensor([[4, 5, 6]]), tensor([[7, 8, 9]]))