watch: PyTorch - 刘二 10 | CNN Basics


Background

Fully connected network: linear layers chained in series.

Input image in PyTorch: (C, H, W); see the ToTensor doc.
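
A minimal sketch (my addition, assuming torchvision and Pillow are installed) showing how ToTensor turns an H×W×C image into a (C, H, W) tensor:

import torch
from torchvision import transforms
from PIL import Image

# ToTensor converts an HxWxC PIL image (or ndarray) into a CHW float tensor in [0, 1]
img = Image.new("RGB", (100, 60))   # a blank 100x60 RGB image (width=100, height=60)
t = transforms.ToTensor()(img)
print(t.shape)                      # torch.Size([3, 60, 100]) -- (C, H, W)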

CNN = Feature Extraction + Classification

A CCD is essentially an array of photoresistors plus a lens system; each resistor only senses the light from one cone-shaped region, and a grayscale image is obtained from the functional relation between resistance and light intensity.

Input channels: RGB (3 channels for a color image).

Conv Layer

Each step takes a block of shape (C × kernel_h × kernel_w) and, through the Conv layer, produces (C′ × kernel_h′ × kernel_w′).

When applying a conv layer, every input channel is convolved (each channel with a different kernel), and the results are then added together (with different weights). Do these two steps n times and the output has n channels, so every channel of each subsequent layer contains information from all of the input's channels. A CNN's weights live in its convolution kernels.

  • (2023-10-22) Each input channel is multiplied by a different kernel, and then the channels are summed directly:

    import torch

    a = torch.arange(18.).reshape(1, 2, 3, 3)  # (bs, C, h, w): channel 0 is 0..8, channel 1 is 9..17
    n = 1  # number of output channels (how many times the convolution is repeated)
    conv_lyr = torch.nn.Conv2d(a.size(1), n, kernel_size=3, bias=False)
    print(conv_lyr.weight)    # shape (1, 2, 3, 3) = (out_chnls, in_chnls, kernel_h, kernel_w)

    # kernel of ones for channel 0, kernel of zeros for channel 1
    conv_lyr.weight = torch.nn.Parameter(
        torch.stack([torch.ones(3, 3), torch.zeros(3, 3)]).unsqueeze(0))
    print(conv_lyr.weight)
    out_a = conv_lyr(a)   # shape (1,1,1,1), tensor([[[[36.]]]])
    # i.e. (0 + 1 + 2 + ... + 8) + (0 + 0 + ... + 0) = 36
    
  • (2023-10-22) Conv2d and a fully connected (FC) layer perform the same mathematical operation:

    [Hand-drawn figure: left, an FC layer acting on pixel1–pixel4, each with R/G/B dims, where weighted dims are summed into dim1 of the output; right, a stride-1 Conv layer where weighted input channels are summed into the 1st output channel.]
    • If the convolution is 1×1 (stride=1), i.e. no neighboring pixels are summed, Conv is equivalent to FC: each pixel is projected into a space of a different dimensionality (see the sketch after this list).

      chnl1 × w’ + chnl2 × w’’ + chnl3 × w’’’ = out-chnl1

      dim1 × w’ + dim2 × w’’ + dim3 × w’’’ = out-dim1

    • (2023-12-01) I already forgot the original meaning of the left picture because I didn’t write descriptions.

      For an FC layer given 4 samples of 3 dimensions each, every dimension multiplied by a factor wₓ becomes a portion of an output dimension. This corresponds to the operations in a Conv2d layer: every channel is multiplied by a kernel, and all the weighted channels are then summed to form one of the output channels.

      The difference is that in FC the 4 samples aren't merged, whereas Conv2d merges 4 pixels into 1 pixel. Thus the number of samples stays the same through FC, while a conv layer reduces the number of pixels.

      Each feature-map channel shares a single common 2-D kernel across all spatial positions. A whole Conv2d layer therefore corresponds to a 4-D kernel tensor, since each channel uses a different kernel.
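
As a rough check of the 1×1-conv-equals-FC claim above (my own sketch, not from the lecture): a 1×1 Conv2d given the same weights as an nn.Linear applied over the channel dimension produces identical outputs.

import torch

x = torch.randn(1, 3, 4, 4)                      # (bs, C, H, W)
conv = torch.nn.Conv2d(3, 2, kernel_size=1, bias=False)
fc = torch.nn.Linear(3, 2, bias=False)
fc.weight.data = conv.weight.data.view(2, 3)     # share the same weights

out_conv = conv(x)                                         # (1, 2, 4, 4)
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)     # Linear applied per pixel over channels
print(torch.allclose(out_conv, out_fc, atol=1e-6))         # True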

import torch
in_channels, out_channels = 5, 10
width, height = 100, 100
kernel_size = 3
batch_size = 1

input = torch.randn(batch_size, in_channels, width, height) # (1,5,100,100)

conv_layer = torch.nn.Conv2d(in_channels,
                             out_channels,
                             kernel_size = kernel_size)

output = conv_layer(input)      # (1,10,98,98)
print(conv_layer.weight.shape)  # (10,5,3,3)

Padding keeps the output image size unchanged: a 3×3 kernel needs 1 ring of padding, a 5×5 kernel needs 2 rings, and so on.

import torch

input = [3, 4, 6, 5, 7,
         2, 4, 6, 8, 2,
         1, 6, 7, 8, 4,
         9, 7, 4, 6, 2,
         3, 7, 5, 4, 1]

input = torch.Tensor(input).view(1, 1, 5, 5)

conv_layer = torch.nn.Conv2d(1,1, kernel_size=3, padding=1, bias=False)

kernel = torch.Tensor([1, 2, 3,
                       4, 5, 6,
                       7, 8, 9]).view(1, 1, 3, 3)   # (out_chnls, in_chnls, h, w)
conv_layer.weight.data = kernel.data

output = conv_layer(input)
print(output)

stride is the step size by which the kernel moves; taking fewer steps shrinks the feature map.
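
For example (a quick sketch of my own), stride=2 roughly halves the spatial size; with a 3×3 kernel on a 5×5 input the output is 2×2, following ⌊(5 − 3)/2⌋ + 1 = 2:

import torch

x = torch.randn(1, 1, 5, 5)
conv_s2 = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, bias=False)
print(conv_s2(x).shape)   # torch.Size([1, 1, 2, 2])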

Max pooling: downsampling that takes the maximum within each h × w block; with kernel_size=2 the feature map shrinks to half its size. This operation has no parameters.

import torch

input = [3, 4, 6, 5, 7,
         2, 4, 6, 8, 2,
         1, 6, 7, 8, 4,
         9, 7, 4, 6, 2,
         3, 7, 5, 4, 1]

input = torch.Tensor(input).view(1, 1, 5, 5)

maxpooling_layer = torch.nn.MaxPool2d(kernel_size=2)

output = maxpooling_layer(input)
print(output)

>>> tensor([[[[4., 8.],
>>>           [9., 8.]]]])

Define the network:

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(kernel_size=2)
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)    # or flatten
        x = self.fc(x)
        return x    # no activation here: CrossEntropyLoss will be used as the criterion

model = Net()

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
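
As a quick sanity check (my addition, assuming the MNIST-style 1×28×28 input this network implies), the shapes explain the 320 in the Linear layer: conv1 gives 24×24, pooling 12×12, conv2 8×8, pooling 4×4, and 20 × 4 × 4 = 320. Continuing from the code above:

x = torch.randn(8, 1, 28, 28).to(device)   # dummy batch of 8 single-channel 28x28 images
print(model(x).shape)                       # torch.Size([8, 10])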