Background
Fully connected: linear layers in series.
Input image in PyTorch: (C, H, W); see the ToTensor doc.
CNN = Feature Extraction + Classification
CCD: photoresistors + a lens system. Each resistor detects only the light from one cone-shaped region; from the resistance-vs-light-intensity function we obtain a grayscale image.
Input channels: RGB.
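As a quick check of that (C, H, W) convention, a minimal sketch (the dummy image and its size are my own):

```python
import torch
from torchvision import transforms
from PIL import Image

# ToTensor converts a PIL image (H x W, RGB) into a float tensor
# of shape (C, H, W), scaled to [0, 1].
img = Image.new("RGB", (32, 24))   # PIL size is (W, H) = (32, 24)
x = transforms.ToTensor()(img)
print(x.shape, x.dtype)            # torch.Size([3, 24, 32]) torch.float32
```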
Conv Layer
Each step takes one block of shape (C × kernel_h × kernel_w); through the Conv layer it becomes (C′ × kernel_h′ × kernel_w′).
In a Conv layer, every input channel is convolved (each channel uses a different kernel), and the results are then summed (with different weights).
Repeat these two steps n times and the output has n channels. Hence every channel in subsequent layers contains information from all of the input's channels.
A CNN's weights live in its convolution kernels.
-
(2023-10-22) Each input channel is multiplied by a different kernel, then the channels are summed directly:
```python
import torch

a = torch.arange(18.).reshape(1, 2, 3, 3)  # (bs, C, h, w)
n = 1  # number of output channels (convolution repeat times)
conv_lyr = torch.nn.Conv2d(a.size(1), n, kernel_size=3, bias=False)
print(conv_lyr.weight)  # shape (1, 2, 3, 3)
# channel 0 gets an all-ones kernel, channel 1 an all-zeros kernel
conv_lyr.weight = torch.nn.Parameter(
    torch.stack([torch.ones(3, 3), torch.zeros(3, 3)]).unsqueeze(0))
print(conv_lyr.weight)
out_a = conv_lyr(a)  # (1, 1, 1, 1), tensor([[[[36.]]]])
# i.e. (0+1+2+...+8) + (0+0+...+0) = 36
```
-
(2023-10-22) Conv2d and FC perform the same mathematical operation:
-
With a 1x1 kernel (stride=1), i.e. no neighboring pixels are summed, Conv becomes equivalent to FC: each pixel is projected into a space of a different dimension.
Conv2d (per pixel): chnl1 × w′ + chnl2 × w′′ + chnl3 × w′′′ = out-chnl1
FC (per sample): dim1 × w′ + dim2 × w′′ + dim3 × w′′′ = out-dim1
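A quick check of this equivalence (a minimal sketch; the 3→5 channel sizes are arbitrary): a 1×1 Conv2d and a Linear layer sharing the same weights give identical outputs.

```python
import torch

x = torch.randn(1, 3, 4, 4)                       # (bs, C, H, W)
conv = torch.nn.Conv2d(3, 5, kernel_size=1, bias=False)
fc = torch.nn.Linear(3, 5, bias=False)
fc.weight = torch.nn.Parameter(conv.weight.view(5, 3))  # same weights

out_conv = conv(x)                                # (1, 5, 4, 4)
out_fc = fc(x.permute(0, 2, 3, 1))                # move C last for Linear
print(torch.allclose(out_conv, out_fc.permute(0, 3, 1, 2), atol=1e-6))  # True
```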
-
(2023-12-01) I've already forgotten the original meaning of the left picture because I didn't write a description.
For an FC layer given 4 samples of 3 dimensions each, every dimension multiplied by a factor wₓ becomes a portion of the output dimension. This corresponds to the operation in a Conv2d layer: every channel is multiplied by a kernel, and the weighted channels are summed to form one of the output channels.
The difference is that FC does not merge the 4 samples, whereas Conv2d merges, say, 4 pixels into 1 pixel. Thus the number of samples in FC stays constant, whereas a conv layer reduces the number of pixels.
Within a feature map, every position of a channel shares a common 2-D kernel. A Conv2d layer therefore holds a 4-D weight tensor, since each (input, output) channel pair uses a different 2-D kernel.
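That block-wise view can be made literal with im2col (a sketch; the shapes are made up): F.unfold flattens each (C × kh × kw) block into a column, and the convolution becomes a single FC-style matrix multiplication over those columns.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 4, 4)            # (bs, C, H, W)
w = torch.randn(3, 2, 3, 3)            # (out_C, in_C, kh, kw)

cols = F.unfold(x, kernel_size=3)      # (1, 2*3*3, 4): one column per block
out_fc = (w.view(3, -1) @ cols).view(1, 3, 2, 2)  # FC applied to every block

out_conv = F.conv2d(x, w)              # (1, 3, 2, 2)
print(torch.allclose(out_conv, out_fc, atol=1e-5))  # True
```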
-
Padding keeps the output image size unchanged: a 3x3 kernel needs 1 ring of padding, a 5x5 kernel needs 2 rings, and so on.
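A minimal shape check of this rule (channel counts and image size are arbitrary):

```python
import torch

x = torch.randn(1, 3, 28, 28)
out3 = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)(x)
out5 = torch.nn.Conv2d(3, 8, kernel_size=5, padding=2)(x)
print(out3.shape, out5.shape)  # both torch.Size([1, 8, 28, 28])
```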
stride: the step the kernel moves each time; (fewer steps) used to shrink the feature map.
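For example (assumed sizes), stride=2 halves each spatial dimension:

```python
import torch

x = torch.randn(1, 8, 28, 28)
down = torch.nn.Conv2d(8, 8, kernel_size=3, padding=1, stride=2)(x)
print(down.shape)  # torch.Size([1, 8, 14, 14])
```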
Max pooling downsamples: take the maximum within each h x w block; kernel_size=2 halves the spatial size. This operation has no parameters.
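A quick check (assumed sizes), including that pooling has no learnable parameters:

```python
import torch

pool = torch.nn.MaxPool2d(kernel_size=2)
x = torch.randn(1, 8, 28, 28)
print(pool(x).shape)            # torch.Size([1, 8, 14, 14])
print(list(pool.parameters()))  # [] -- no learnable weights
```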
Defining the network:
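A minimal sketch of such a network (all layer sizes are my assumption), following the Feature Extraction + Classification split from above:

```python
import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # feature extraction: conv -> pool -> conv -> pool
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pool = torch.nn.MaxPool2d(2)
        # classification: flatten, then FC
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):                          # x: (bs, 1, 28, 28)
        x = torch.relu(self.pool(self.conv1(x)))   # (bs, 10, 12, 12)
        x = torch.relu(self.pool(self.conv2(x)))   # (bs, 20, 4, 4)
        x = x.view(x.size(0), -1)                  # (bs, 320)
        return self.fc(x)

net = Net()
print(net(torch.randn(4, 1, 28, 28)).shape)        # torch.Size([4, 10])
```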