F.grid_sample
F.grid_sample(tensor, p, mode) interpolates tensor into a new tensor with the same spatial size as the grid p.
The grid p specifies the coordinates of sampling points on the input; a new pixel is interpolated from the input values around each sampling point, and all channels share the same sampling points.
The (x, y) coordinates in grid are already normalized to [-1, 1]; inside the function they are mapped back to [0, W] and [0, H] to index pixels.
Usage of grid_sample in PyTorch - CSDN
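The [-1, 1] convention can be checked with an identity grid (a minimal sketch; the tensor contents are arbitrary): with align_corners=True, linspace(-1, 1, W) lands exactly on the pixel centers, so sampling reproduces the input.

```python
import torch
import torch.nn.functional as F

inp = torch.arange(12, dtype=torch.float32).reshape(1, 1, 3, 4)  # (B, C, H, W)

# identity grid: (x, y) in [-1, 1], shape (B, H_out, W_out, 2)
gy, gx = torch.meshgrid(torch.linspace(-1, 1, 3),
                        torch.linspace(-1, 1, 4), indexing="ij")
grid = torch.stack([gx, gy], dim=-1).unsqueeze(0)

out = F.grid_sample(inp, grid, mode="bilinear", align_corners=True)
print(torch.allclose(out, inp))  # True: the identity grid reproduces the input
```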
- For 5D input, mode='bilinear' is actually trilinear interpolation: the 3D form of bilinear interpolation (per a comment by 卿卿小徐); C++ source code
- A 4D input corresponds to a batch of images, (B, C, H, W).
- align_corners=True treats the input in units of pixels, with each pixel represented by its center point; bilinear interpolation then uses the four corner points of a pixel. (2023-10-08: corner points? neighboring points?)
- align_corners=False treats the input as the set of pixel corner points. The input boundary (W-1, H-1) is then smaller than the original image boundary (W, H), so a grid sampling point may fall outside the input (e.g., a sampling point on the image edge, while bilinear interpolation needs the 4 surrounding points); the input therefore needs padding before interpolating with the neighbors.
- As shown below, black points are input datapoints. In the left figure, the datapoints coincide with the pixel corners of the image to be scaled, whereas in the right figure the datapoints form an independent image on which interpolation is performed.
- A numerical example of the difference: Docs - nn.Upsample
- (2024-03-09) In the left figure, the corners of the grid formed by the pixels and the sampling points are aligned, whereas in the right figure they are not.
- The position of a pixel's center depends on the pixel size, so for inputs of the same dimensions the sampling-point coordinates can differ. Using pixel corners as the reference (align_corners=False) is independent of the pixel size, and the sampling-point positions are relative. Docs
- Linear interpolation is a weighted average over 1D lengths; bilinear interpolation is a weighted average over 2D areas (two directions), where each vertex's weight is the area fraction of the rectangle diagonally opposite to it; trilinear interpolation is a weighted sum over 3D volumes.
- Bilinear interpolation: interpolate linearly along the x and y directions and predict the value at the target point. The result is a weighted sum of the 4 surrounding neighbors, where each neighbor's weight is the product, over the two directions, of the ratio of the distance from the target point to the opposite neighbor over the neighbor spacing (see the sketch below).
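As a sanity check on the weight formula (a minimal sketch; the corner values and target point are arbitrary), interpolating one point inside a 2x2 patch by hand matches F.grid_sample:

```python
import torch
import torch.nn.functional as F

# values at the four neighbors, indexed f[y, x]:
#   f00=1 --- f01=2
#     |         |
#   f10=3 --- f11=4
f = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

u, v = 0.25, 0.6  # fractional offsets of the target point from f00 (unit spacing)

# each neighbor's weight is the area fraction of the rectangle
# diagonally opposite to it
val = (f[0, 0] * (1 - u) * (1 - v) + f[0, 1] * u * (1 - v) +
       f[1, 0] * (1 - u) * v       + f[1, 1] * u * v)
print(val)  # tensor(2.4500)

# cross-check with F.grid_sample: with align_corners=True, [-1, 1] maps to the
# corner pixel centers, so (u, v) becomes (2u - 1, 2v - 1)
grid = torch.tensor([[[[2 * u - 1, 2 * v - 1]]]])  # (B, H_out, W_out, 2)
out = F.grid_sample(f.view(1, 1, 2, 2), grid, align_corners=True)
print(out)  # tensor([[[[2.4500]]]])
```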
(2023-12-18)
F.grid_sample() cannot sample images?

```python
import cv2
import torch
import torch.nn.functional as F

src = cv2.imread("dtu/Rectified/scan11_train/rect_011_3_r5000.png")  # ndarray, (h, w, 3), dtype uint8
# uv_src: a precomputed sampling grid (defined elsewhere)
F.grid_sample(torch.from_numpy(src).unsqueeze(0), uv_src.view(1, 192*h, w, 2))
```

And it will lead to:

```
RuntimeError: grid_sampler_2d_cpu not implemented for Byte
```

Therefore, the image needs to be normalized to [0, 1]:
```python
import PIL.Image
import torch
import torch.nn.functional as F
from torchvision import transforms
from matplotlib import pyplot as plt

h, w = 128, 160
y, x = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
normalized_y, normalized_x = y / h, x / w
# note: these values lie in [0, 1), which covers only the bottom-right
# quadrant of grid_sample's [-1, 1] coordinate range
normalized_xy = torch.stack([normalized_x, normalized_y], dim=-1)

src = PIL.Image.open("dtu/Rectified/scan11_train/rect_009_6_r5000.png")
src = transforms.ToTensor()(src)  # float tensor in [0, 1], (3, h, w)
samp = F.grid_sample(src.unsqueeze(0), normalized_xy.unsqueeze(0))

# revert to an image
samp_scaled = (samp[0] * 255).to(torch.uint8).permute(1, 2, 0)
plt.imshow(samp_scaled)
```
Example in GNT
train_imgs is a 4D tensor.
featmaps is a 5D tensor (N, C, D, H, W); its first dimension also varies, determined by the indexing tensor.
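A minimal sketch of sampling a 5D feature map (all shapes here are made up); with 5D input, mode="bilinear" performs trilinear interpolation:

```python
import torch
import torch.nn.functional as F

N, C, D, H, W = 2, 32, 8, 64, 80       # hypothetical feature-map shape
featmaps = torch.randn(N, C, D, H, W)

# one (x, y, z) sampling coordinate in [-1, 1] per output location
grid = torch.rand(N, 4, 16, 20, 3) * 2 - 1

out = F.grid_sample(featmaps, grid, mode="bilinear", align_corners=True)
print(out.shape)  # torch.Size([2, 32, 4, 16, 20])
```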
F.interpolate
F.interpolate(input, size, [scale_factor,] mode) rescales input to size, or scales each spatial dimension by scale_factor.
The interpolation algorithm is selected by mode. It can handle 3D, 4D, and 5D inputs; a 4D input has shape (B, C, H, W).
Docs
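A minimal usage sketch (shapes are arbitrary), showing both the size and the scale_factor forms:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 80)  # (B, C, H, W)

up = F.interpolate(x, size=(128, 160), mode="bilinear", align_corners=False)
print(up.shape)    # torch.Size([1, 3, 128, 160])

half = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
print(half.shape)  # torch.Size([1, 3, 32, 40])
```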
scipy zoom
(2024-05-06)
- If mode="constant", the bottom row is 0 (cval).
(2023-10-10)
nn.Upsample
This “layer” has no learnable parameters.
- So it is effectively equal to torch.nn.functional.interpolate, which can be used as a layer in a model (see the sketch after this list): Which function is better for upsampling: upsampling or interpolate?
- torch.nn.functional layers require passing the learnable parameters in from outside, since they don't contain nn.Parameter the way nn.Module does: Are torch.nn.Functional layers learnable?

  ```python
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  in_features, out_features = 8, 4  # example sizes

  class Model(nn.Module):
      def __init__(self):
          super(Model, self).__init__()
          self.weight = nn.Parameter(torch.randn(out_features, in_features))
          self.bias = nn.Parameter(torch.randn(out_features))

          # Corresponding to an nn.Module:
          self.linear = nn.Linear(in_features, out_features)

      def forward(self, input):
          return F.linear(input, self.weight, self.bias)
          # return self.linear(input)
  ```
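A quick check of the equivalence (a minimal sketch; shapes are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)

up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
a = up(x)
b = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
print(torch.equal(a, b))  # True: nn.Upsample's forward simply calls F.interpolate
```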
(2023-10-22)
Downsample
- A conv layer can perform even downsampling; see Context Cluster, which reduces points by fusing the neighbors covered by each kernel:

  ```python
  proj = nn.Conv2d(in_chans, embed_dim,
                   kernel_size=patch_size, stride=stride, padding=padding)
  ```

- Pooling can also be used for placing centers; see Context Cluster:

  ```python
  centers_proposal = nn.AdaptiveAvgPool2d((proposal_w, proposal_h))
  ```

- F.interpolate (the three options are compared in the sketch below)
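A minimal sketch comparing the three options on made-up shapes; all reach the same output size, but only the conv is learnable:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)

# strided conv: learnable fusion of the neighbors covered by each kernel
conv_down = nn.Conv2d(3, 3, kernel_size=2, stride=2)(x)

# adaptive pooling straight to a target size
pool_down = nn.AdaptiveAvgPool2d((16, 16))(x)

# parameter-free interpolation
interp_down = F.interpolate(x, scale_factor=0.5, mode="bilinear",
                            align_corners=False)

print(conv_down.shape, pool_down.shape, interp_down.shape)
# all torch.Size([1, 3, 16, 16])
```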
Trilinear Interp
(2024-02-28)
Perform 2-point (linear) interpolation along the 3 directions sequentially.
Each corner point's total weight is the product of 3 per-direction factors: u, v, w (or their complements (1-u), (1-v), (1-w)).
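A minimal sketch of this sequential formulation (names and values are my own): collapse the x axis, then y, then z.

```python
import torch

def trilinear(f, u, v, w):
    """f: (2, 2, 2) corner values indexed [z, y, x]; u, v, w in [0, 1]."""
    fx = f[..., 0] * (1 - u) + f[..., 1] * u    # interpolate along x -> (2, 2)
    fy = fx[..., 0] * (1 - v) + fx[..., 1] * v  # then along y -> (2,)
    return fy[0] * (1 - w) + fy[1] * w          # then along z -> scalar

f = torch.arange(8, dtype=torch.float32).reshape(2, 2, 2)
print(trilinear(f, 0.5, 0.5, 0.5))  # tensor(3.5000): the mean of the 8 corners
```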