watch: PyTorch - 刘二 10 | CNN Basics


Background

Fully connected network: linear layers chained in series.

Input image in PyTorch: (C, H, W); see the ToTensor doc.
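
A minimal sketch (my addition, assuming torchvision and Pillow are installed) showing how ToTensor turns an H×W×C image into a (C, H, W) tensor:

import torch
from torchvision import transforms
from PIL import Image

# ToTensor converts an HxWxC PIL image (or ndarray) into a CHW float tensor in [0, 1]
img = Image.new("RGB", (100, 60))   # a blank 100x60 RGB image (width=100, height=60)
t = transforms.ToTensor()(img)
print(t.shape)                      # torch.Size([3, 60, 100]) -- (C, H, W)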

CNN = Feature Extraction + Classification

A CCD is essentially an array of photoresistors plus a lens system; each resistor only senses the light from one cone-shaped region, and a grayscale image is obtained from the functional relation between resistance and light intensity.

Input channels: RGB (3 channels for a color image).

Conv Layer

Each step takes a block of shape (C × kernel_h × kernel_w) and, through the Conv layer, produces (C′ × kernel_h′ × kernel_w′).

When applying a conv layer, every input channel is convolved (each channel with a different kernel), and the results are then added together (with different weights). Do these two steps n times and the output has n channels, so every channel of each subsequent layer contains information from all of the input's channels. A CNN's weights live in its convolution kernels.

  • (2023-10-22) Each input channel is multiplied by a different kernel, and then the channels are summed directly:

    import torch

    a = torch.arange(18.).reshape(1, 2, 3, 3)  # (bs, C, h, w): channel 0 is 0..8, channel 1 is 9..17
    n = 1  # number of output channels (how many times the convolution is repeated)
    conv_lyr = torch.nn.Conv2d(a.size(1), n, kernel_size=3, bias=False)
    print(conv_lyr.weight)    # shape (1, 2, 3, 3) = (out_chnls, in_chnls, kernel_h, kernel_w)

    # kernel of ones for channel 0, kernel of zeros for channel 1
    conv_lyr.weight = torch.nn.Parameter(
        torch.stack([torch.ones(3, 3), torch.zeros(3, 3)]).unsqueeze(0))
    print(conv_lyr.weight)
    out_a = conv_lyr(a)   # shape (1,1,1,1), tensor([[[[36.]]]])
    # i.e. (0 + 1 + 2 + ... + 8) + (0 + 0 + ... + 0) = 36
    
  • (2023-10-22) Conv2d and a fully connected (FC) layer perform the same mathematical operation:

    [Hand-drawn figure: left, an FC layer acting on pixel1–pixel4, each with R/G/B dims, where weighted dims are summed into dim1 of the output; right, a stride-1 Conv layer where weighted input channels are summed into the 1st output channel.]
    • If the convolution is 1×1 (stride=1), i.e. no neighboring pixels are summed, Conv is equivalent to FC: each pixel is projected into a space of a different dimensionality (see the sketch after this list).

      chnl1 × w’ + chnl2 × w’’ + chnl3 × w’’’ = out-chnl1

      dim1 × w’ + dim2 × w’’ + dim3 × w’’’ = out-dim1

    • (2023-12-01) I already forgot the original meaning of the left picture because I didn’t write descriptions.

      For an FC layer given 4 samples of 3 dimensions each, every dimension multiplied by a factor wₓ becomes a portion of an output dimension. This corresponds to the operations in a Conv2d layer: every channel is multiplied by a kernel, and all the weighted channels are then summed to form one of the output channels.

      The difference is that in FC the 4 samples aren't merged, whereas Conv2d merges 4 pixels into 1 pixel. Thus the number of samples stays the same through FC, while a conv layer reduces the number of pixels.

      Each feature-map channel shares a single common 2-D kernel across all spatial positions. A whole Conv2d layer therefore corresponds to a 4-D kernel tensor, since each channel uses a different kernel.
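
As a rough check of the 1×1-conv-equals-FC claim above (my own sketch, not from the lecture): a 1×1 Conv2d given the same weights as an nn.Linear applied over the channel dimension produces identical outputs.

import torch

x = torch.randn(1, 3, 4, 4)                      # (bs, C, H, W)
conv = torch.nn.Conv2d(3, 2, kernel_size=1, bias=False)
fc = torch.nn.Linear(3, 2, bias=False)
fc.weight.data = conv.weight.data.view(2, 3)     # share the same weights

out_conv = conv(x)                                         # (1, 2, 4, 4)
out_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)     # Linear applied per pixel over channels
print(torch.allclose(out_conv, out_fc, atol=1e-6))         # True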

import torch
in_channels, out_channels = 5, 10
width, height = 100, 100
kernel_size = 3
batch_size = 1

input = torch.randn(batch_size, in_channels, width, height) # (1,5,100,100)

conv_layer = torch.nn.Conv2d(in_channels,
                             out_channels,
                             kernel_size = kernel_size)

output = conv_layer(input)      # (1,10,98,98)
print(conv_layer.weight.shape)  # (10,5,3,3)

Padding keeps the output image size unchanged: a 3×3 kernel needs 1 ring of padding, a 5×5 kernel needs 2 rings, and so on.

import torch

input = [3, 4, 6, 5, 7,
         2, 4, 6, 8, 2,
         1, 6, 7, 8, 4,
         9, 7, 4, 6, 2,
         3, 7, 5, 4, 1]

input = torch.Tensor(input).view(1, 1, 5, 5)

conv_layer = torch.nn.Conv2d(1,1, kernel_size=3, padding=1, bias=False)

kernel = torch.Tensor([1, 2, 3,
                       4, 5, 6,
                       7, 8, 9]).view(1, 1, 3, 3)   # (out_chnls, in_chnls, h, w)
conv_layer.weight.data = kernel.data

output = conv_layer(input)
print(output)

stride is the step size by which the kernel moves; taking fewer steps shrinks the feature map.
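
For example (a quick sketch of my own), stride=2 roughly halves the spatial size; with a 3×3 kernel on a 5×5 input the output is 2×2, following ⌊(5 − 3)/2⌋ + 1 = 2:

import torch

x = torch.randn(1, 1, 5, 5)
conv_s2 = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, bias=False)
print(conv_s2(x).shape)   # torch.Size([1, 1, 2, 2])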

Max pooling: downsampling that takes the maximum within each h × w block; with kernel_size=2 the feature map shrinks to half its size. This operation has no parameters.

import torch

input = [3, 4, 6, 5, 7,
         2, 4, 6, 8, 2,
         1, 6, 7, 8, 4,
         9, 7, 4, 6, 2,
         3, 7, 5, 4, 1]

input = torch.Tensor(input).view(1, 1, 5, 5)

maxpooling_layer = torch.nn.MaxPool2d(kernel_size=2)

output = maxpooling_layer(input)
print(output)

>>> tensor([[[[4., 8.],
>>>           [9., 8.]]]])

Define the network:

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(kernel_size=2)
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = F.relu(self.pooling(self.conv1(x)))
        x = F.relu(self.pooling(self.conv2(x)))
        x = x.view(batch_size, -1)    # or flatten
        x = self.fc(x)
        return x    # no activation here: CrossEntropyLoss will be used as the criterion

model = Net()

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
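
As a quick sanity check (my addition, assuming the MNIST-style 1×28×28 input this network implies), the shapes explain the 320 in the Linear layer: conv1 gives 24×24, pooling 12×12, conv2 8×8, pooling 4×4, and 20 × 4 × 4 = 320. Continuing from the code above:

x = torch.randn(8, 1, 28, 28).to(device)   # dummy batch of 8 single-channel 28x28 images
print(model(x).shape)                       # torch.Size([8, 10])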