memo: PyTorch | Pre-trained models

How to extract features of an image from a trained model - PyTorch Forum

How can I load my best model as a feature extractor/evaluator?

Load and call

torchvision.models | ResNet50 Docs

import torch
from torch import nn
from torchvision import models, transforms
from PIL import Image

resnet50 = models.resnet50(weights='DEFAULT')
resnet50.eval()

im_tensor = transforms.ToTensor()(Image.open('data/nerf_llff_data/fern/images_4/image000.png'))

# Add a batch dimension; the output is the (1, 1000) class logits.
output = resnet50(im_tensor[None, ...])

Or using the weights object:

from torchvision import models
from torchvision.io import read_image

im = read_image("data/nerf_llff_data/fern/images_4/image000.png")

# Step 1: Initialize model with the best available weights
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(im).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
category_id = prediction.argmax().item()
score = prediction[category_id].item()
category_name = weights.meta["categories"][category_id]
print(f"{category_name}: {100*score:.1f}%")

triceratops: 11.5%

But the prediction does not seem accurate, likely because ToTensor() alone neither resizes the image nor normalizes it with the ImageNet statistics that the weights expect.


Adjust image input

example

from torchvision.models import ResNet50_Weights

# Initialize the weight transforms
weights = ResNet50_Weights.DEFAULT
preprocess = weights.transforms()

# Apply them to the input image
img_transformed = preprocess(img)

(2023-07-23)

Inspect model’s modules

Use hooks to print each layer's name and the shape of its output.

import torch
from torch import nn, Tensor
from torchvision import models

class InspectModel(nn.Module):
  def __init__(self, model: nn.Module):
    super().__init__()
    self.model = model
    self.hook_handles = []

    # print layers and their outputs' shape
    for name, module in self.model.named_children():
      module.__name__ = name
      handle = module.register_forward_hook(
        lambda module, inputs, output: 
          print(f"{module.__name__}: {output.shape}; Op: {module._get_name()}"))
      self.hook_handles.append(handle)
  
  def forward(self, x: Tensor):
    self.model(x)
    for handle in self.hook_handles:
      handle.remove()
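A quick usage sketch (the AlexNet and MobileNet sections below use the same pattern):

inspect = InspectModel(models.resnet18(weights='DEFAULT'))
inspect(torch.ones(1, 3, 224, 224))  # prints e.g. "conv1: torch.Size([1, 64, 112, 112]); Op: Conv2d"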

Another ugly way is using a for loop:

for name, module in model._modules.items():
    print(name)

Extract & interpolate feature maps

Retrieve specified layers' feature maps and interpolate them to the same size.

import torch
from torch import nn, Tensor
from torchvision import models
from typing import Callable, Iterable, Optional, Tuple
import torch.nn.functional as F

class ExtractIntrplFeatures(nn.Module):
  def __init__(self, model: nn.Module,
               layer_names: Optional[Iterable[str]] = None,
               chnl_dim: int = 1) -> None:
    super().__init__()
    self.model = model
    self._features = {}

    # Default to all top-level sub-modules
    if layer_names is None:
      layer_names = list(model._modules)
    self.layer_names = list(layer_names)

    # Register a hook for each requested layer
    for layerName in self.layer_names:
      layer = dict(self.model.named_children())[layerName]
      layer.register_forward_hook(self.save_features(layerName, chnl_dim))

  def save_features(self, layerName, chnl_dim) -> Callable:
    def hook_func(module, inputs, output):
      # Move the channel dimension to position 1 if it isn't there already
      if chnl_dim != 1:
        perm_order = list(range(len(output.shape)))
        perm_order.remove(chnl_dim)
        perm_order = [0, chnl_dim] + perm_order[1:]
        output = output.permute(*perm_order)

      self._features[layerName] = output
    return hook_func

  def forward(self, x: Tensor, size: Optional[Tuple[int, int]] = None) -> Tensor:
    self.model(x)

    # Interpolate every map to `size`, defaulting to the first map's spatial size
    if size is None:
      size = list(self._features.values())[0].shape[-2:]

    for layerName in self._features:
      self._features[layerName] = F.interpolate(
        input=self._features[layerName],
        size=size,
        mode="bilinear",
        align_corners=True,
      )

    return torch.cat(list(self._features.values()), dim=1)
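A usage sketch (assuming a torchvision ResNet-18; the AlexNet/ResNet/MobileNet sections below follow the same pattern):

extractor = ExtractIntrplFeatures(models.resnet18(weights='DEFAULT'),
                                  layer_names=['relu', 'layer1'])
feat = extractor(torch.ones(1, 3, 224, 224))  # (1, 64+64, 112, 112)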

(2023-08-08)

Feature extraction

Get feature maps at certain layers through create_feature_extractor(): torchvision.models.feature_extraction — Torchvision 0.11.0 documentation. I guess it cannot be used for fine-tuning the pre-trained model.

import torch
from torchvision import models
from torchvision.models.feature_extraction import get_graph_node_names
from torchvision.models.feature_extraction import create_feature_extractor

model = models.resnet50(weights="DEFAULT")
train_nodes, eval_nodes = get_graph_node_names(model)
print(train_nodes)
featExtractor = create_feature_extractor(model, return_nodes={'layer4.2.relu_2': 'layer4_feat'})
inp = torch.ones(2, 3, 224, 224)
with torch.no_grad():
    out = featExtractor(inp)  # dict
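The returned dict is keyed by the names given in return_nodes; a quick shape check for this input:

print(out['layer4_feat'].shape)  # torch.Size([2, 2048, 7, 7])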

(2023-08-08)

.modules vs ._modules

  • convnext.modules is a method; printed without calling, it shows the bound-method object rather than the modules.

  • convnext.modules() is a generator. Docs - nn.Module();
    Docs - Modules

  • However, if I traverse it like for _ in convnext.modules(): print(_), it will repeatedly print all the modules in the model.

    And chatGPT answer:

    The modules() function also iterates through the sub-modules of each module, resulting in duplicate prints. To avoid this, you can use children(). Using children() will only give you the immediate sub-modules of the features module.

    
    import torchvision.models as models
    convnext = models.convnext_tiny(weights='DEFAULT')
    
    for module in convnext.features.children():
        print(module)
    
  • convnext._modules is an OrderedDict

  • list(convnext._modules) only has the keys (the names of the modules).
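A quick comparison of the three (a sketch):

import torchvision.models as models

convnext = models.convnext_tiny(weights='DEFAULT')
print(len(list(convnext.modules())))   # every module, recursively (including convnext itself)
print(len(list(convnext.children())))  # immediate sub-modules only: features, avgpool, classifier
print(list(convnext._modules))         # the same names, as OrderedDict keys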


Truncate pre-trained model

Question for chatGPT:

“Given a pre-trained multiple-layer neural network in PyTorch, how to run a part of it and stop at certain layer?”

Just extract features up to a certain layer, without performing classification or regression steps.

# 1. Load model
import torch.nn as nn
from torchvision import models
pre_model = models.alexnet(weights='DEFAULT')

# 2. Create a new model
sliced_model = nn.Sequential(*list(pre_model.features.children())[:5])
sliced_model.eval()

The new model will inherit the pre-trained weights.
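A quick sanity check (a sketch; the shape follows from AlexNet's first five feature modules):

import torch
dummy = torch.ones(1, 3, 224, 224)
print(sliced_model(dummy).shape)  # torch.Size([1, 192, 27, 27])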


AlexNet

Source code

AlexNet.features contains 13 modules

alexnet = models.alexnet(weights='DEFAULT')
alexnet_debug = InspectModel(alexnet.features)
dummy_input = torch.ones(1, 3, 224, 224)
alexnet_debug(dummy_input)

Get pre-logits vector

Retrieve the feature vector before it gets compressed to the 1,000 categories, i.e., remove the last Linear layer of the 'classifier' module in AlexNet and make a new model.

alexnet = models.alexnet(weights='DEFAULT')

# drop the last 'linear layer' in classifier module
new_classifier = nn.Sequential(*list(alexnet.classifier.children())[:-1])

alexnet.classifier = new_classifier
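After the swap, the model outputs the 4096-d pre-logits vector. A quick check:

dummy_input = torch.ones(1, 3, 224, 224)
print(alexnet(dummy_input).shape)  # torch.Size([1, 4096])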

Get feature maps

Make a new model that stops at a certain feature map

class AlexNetConv4(nn.Module):
  def __init__(self, original_alexnet):
    super().__init__()

    # stop at conv4
    layer_list = list(original_alexnet.features.children())[:-3]

    self.features = nn.Sequential(*layer_list) 

  def forward(self, x):
    x = self.features(x)
    return x

model = AlexNetConv4(models.alexnet(weights='DEFAULT'))
dummy_input = torch.ones(1, 3, 224, 224)
features = model(dummy_input)  # (1, 256, 13, 13)

Retrieve the feature maps after each ReLU:

alexnet = models.alexnet(weights='DEFAULT')
alexnet_features = ExtractIntrplFeatures(
    alexnet.features, layer_names=['1', '4', '7', '9', '11']
)
dummy_input = torch.ones(1, 3, 224, 224)
feat_alexnet = alexnet_features(dummy_input) # (1, 1152, 55, 55)

ResNet

Get feat map of resnet34

Pixel-NeRF obtains the feature map by copying the forward method up to layer4 and concatenating each layer's feature maps along the channel dimension.

class SpatialEncoder(nn.Module):
  def __init__(self, backbone):
    super().__init__()
    self.model = getattr(torchvision.models, backbone)(weights='DEFAULT')

    # Modifying model here doesn't affect forward method.
    self.model.avgpool = nn.Sequential()
    self.model.fc = nn.Sequential()


  def forward(self, img): # DTU img: (1, 3, H=300, W=400)
    ...
      x = self.model.conv1(x) # (B, 64, H/2, W/2)
      x = self.model.bn1(x)
      x = self.model.relu(x)

      latents = [x]       # store feature maps after different stages of convolution
      if self.num_layers > 1:
          if self.use_first_pool:
              x = self.model.maxpool(x)  # (B, 64, H/4, W/4)
          x = self.model.layer1(x)   # (B, 64, H/4, W/4)
          latents.append(x)

      if self.num_layers > 2:
          x = self.model.layer2(x)   # (B, 128, H/8, W/8)
          latents.append(x)

      if self.num_layers > 3:
          x = self.model.layer3(x)   # (B, 256, H/16, W/16)
          latents.append(x)

      if self.num_layers > 4:
          x = self.model.layer4(x)   # (B, 512, H/32, W/32)
          latents.append(x)

      self.latents = latents
      align_corners = None if self.index_interp == "nearest " else True   
      latent_sz = latents[0].shape[-2:]   # (H/2, W/2)

      # expand the feature maps to the original size
      for i in range(len(latents)):   
          latents[i] = F.interpolate(
              input=latents[i],
              size=latent_sz,     # (H/2, W/2)
              mode=self.upsample_interp,  # bilinear
              align_corners=align_corners,
          )
      self.latent = torch.cat(latents, dim=1)     # (B, 64+64+128+256, H/2, W/2)
    self.latent_scaling[0] = self.latent.shape[-1]  # W, 200
    self.latent_scaling[1] = self.latent.shape[-2]  # H, 150
    self.latent_scaling = self.latent_scaling / (self.latent_scaling - 1) * 2.0     
    # tensor([200., 150.]) -> tensor([2.0101, 2.0134])
    return self.latent  # (B, 512, H/2, W/2)

There is a flatten step in ResNet's forward method (Source code), so even though avgpool and fc are cancelled via self.model.fc = nn.Sequential(), the output of the modified model is still a vector, not a feature map (planes).

The solution is to create a model whose forward method doesn’t contain the torch.flatten operation.

from torch import nn
from torchvision import models, transforms
from PIL import Image

resnet34 = models.resnet34(weights='DEFAULT')
resnet_feat = nn.Sequential(*list(resnet34.children())[:-2])  # drop avgpool & fc

im_tensor = transforms.ToTensor()(Image.open('path/to/png'))[None, ...]
resnet_feat(im_tensor)  # (1, 512, H/32, W/32)

However, to keep the feature map at every level, the forward method has to be rewritten; module hooks can realize this.


(2023-07-23)

Hook feat maps

resnet34 = models.resnet34(weights='DEFAULT')
# save feature maps after: relu, layer1, layer2, layer3.
resnet_features = ExtractIntrplFeatures(
        resnet34, ["relu", *list(resnet34._modules)[4:-3]] )

im_tensor = transforms.ToTensor()(
  Image.open('data/nerf_llff_data/fern/images_4/image000.png'))
feat = resnet_features(im_tensor.unsqueeze(0))

The result is identical to pixel-NeRF's SpatialEncoder, whose definition I copied into an ipynb and instantiated:

pixelNeRFEncoder = SpatialEncoder()
feature_maps = pixelNeRFEncoder(im_tensor.unsqueeze(0))
torch.eq(feat, feature_maps).detach().numpy().all()

(2023-07-21)

MobileNet v3

MobileNetV3 contains 3 components: features, avgpool, and classifier.

The MobileNetV3.features starts with a Conv2dNormActivation layer, is followed by 15 InvertedResidual blocks, and ends with another Conv2dNormActivation layer.

An InvertedResidual module is an nn.Sequential of Conv2dNormActivation() layers (plus an optional SqueezeExcitation), corresponding to "expand", "depthwise", and "project". Source code
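A quick peek at one block (a sketch; index 4 is an arbitrary choice):

from torchvision import models

mbv3 = models.mobilenet_v3_large(weights='DEFAULT')
print(mbv3.features[4])  # this block includes a SqueezeExcitation; not all of them do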


Inspect mobilenet

mobilenetv3 = models.mobilenet_v3_large(weights='DEFAULT')
mobilenet_inspect = InspectModel(mobilenetv3.features) 
dummy_input = torch.ones(1, 3, 224, 224)
mobilenet_inspect(dummy_input)

Hook feature maps

mobilenetv3 = models.mobilenet_v3_large(weights='DEFAULT')
mobilenet_feats = ExtractIntrplFeatures(
    mobilenetv3.features, 
    layer_names=list(mobilenetv3.features._modules)
)
feat_mbnet = mobilenet_feats(dummy_input) # (1,1224,112,112)

faster-rcnn Obj Detect

Everything you need to know about TorchVision’s MobileNetV3 implementation - PyTorch blog

Docs | Faster RCNN Code

MobileNet feature maps -> Feature Pyramid Network -> Faster R-CNN detector for locating bounding boxes

high_res = models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights='DEFAULT')
high_res.eval()
out = high_res([im_tensor[0]])[0]  # input: a list of (C, H, W) tensors
out.keys()  # dict_keys(['boxes', 'labels', 'scores'])
low_res = models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(
        weights=models.detection.FasterRCNN_MobileNet_V3_Large_320_FPN_Weights.DEFAULT)
low_res.eval()
x = [im_tensor, torch.rand(3, 500, 400)]  # each element: a (C, H, W) tensor
predictions = low_res(x)

print(predictions[0].keys())

lraspp segment

Outputs are masks for the different classes. I guess these masks may not be suitable to serve as feature maps.

import matplotlib.pyplot as plt

lraspp = models.segmentation.lraspp_mobilenet_v3_large(weights='DEFAULT')
# input: (B, C, H, W); output['out']: (B, num_classes=21, H, W)
output = lraspp(im_tensor)

out = output['out'].detach().numpy()
plt.imshow(out[0][3])

deeplabv3 segment

deeplabv3 = models.segmentation.deeplabv3_mobilenet_v3_large(weights='DEFAULT')
deeplabv3(im_tensor)

(2023-07-22)

Replace forward of predefined model

How can I replace the forward method of a predefined torchvision model with my customized forward function?

import torchvision.models as models
import torch

def new_forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    return x


# define a resnet instance
resnet = models.resnet18()

# bind new_forward to the instance, overriding its forward method
bound_method = new_forward.__get__(resnet, resnet.__class__)
setattr(resnet, 'forward', bound_method)
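A usage sketch: the instance now returns layer4 feature maps instead of logits.

resnet.eval()
dummy = torch.ones(1, 3, 224, 224)
print(resnet(dummy).shape)  # torch.Size([1, 512, 7, 7])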

(2023-07-22)

Extract & interpolate feature maps

Question: I want to extract and interpolate all the feature maps after each stage of convolution to the same size from a PyTorch pre-trained model, for example, MobileNet V3. Please give the code.

Bing chat

Here is an example code that extracts and interpolates all the feature maps after each stage of convolution to the same size from a PyTorch pre-trained model, MobileNet V3.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.mobilenetv3 import mobilenet_v3_large

class FeatureExtractor(nn.Module):
    def __init__(self):
        super(FeatureExtractor, self).__init__()
        self.features = nn.ModuleList([
            nn.Sequential(
                m.conv,
                m.bn,
                nn.ReLU(inplace=True),
                m.hs,
                m.se,
                m.conv_pw,
                m.bn_pw,
                nn.ReLU(inplace=True),
                m.hs_pw,
            ) for m in mobilenet_v3_large().features[:-1]
        ])
        
    def forward(self, x):
        out = []
        for f in self.features:
            x = f(x)
            out.append(x)
        return out

model = FeatureExtractor()
x = torch.randn(1, 3, 224, 224)
out = model(x)

This code creates a FeatureExtractor class that extracts all the feature maps after each stage of convolution to the same size from a PyTorch pre-trained model, MobileNet V3. The forward method takes an input tensor x and returns a list of feature maps. The out variable contains all the feature maps after each stage of convolution to the same size.

I hope this helps! Let me know if you have any other questions.

Run it:

AttributeError: ‘Conv2dNormActivation’ object has no attribute ‘conv’

ChatGPT

To extract and interpolate all the feature maps after each stage of convolution in a PyTorch pre-trained model like MobileNetV3, you can use the intermediate_layers functionality to access intermediate outputs and the nn.Upsample module for interpolation. Here’s a step-by-step guide on how to achieve this:

  1. Load the pre-trained MobileNetV3 model.
  2. Modify the model to store intermediate feature maps during forward pass.
  3. Use the nn.Upsample module to interpolate the feature maps to a common size.

First, make sure you have PyTorch and torchvision installed: pip install torch torchvision

Now, let’s create the code:

import torch
import torch.nn as nn
import torchvision.models as models

class MobileNetV3WithIntermediateOutputs(nn.Module):
    def __init__(self, model_name='mobilenet_v3_large', pretrained=True):
        super(MobileNetV3WithIntermediateOutputs, self).__init__()
        self.model = models.__dict__[model_name](pretrained=pretrained)

        # Register hooks to save intermediate outputs
        self.intermediate_outputs = []
        self.hook_handles = []
        for module in self.model.modules():
            handle = module.register_forward_hook(self.hook_fn)
            self.hook_handles.append(handle)

    def hook_fn(self, module, input, output):
        self.intermediate_outputs.append(output)

    def forward(self, x):
        self.intermediate_outputs = []  # Clear previous intermediate outputs
        _ = self.model(x)
        return self.intermediate_outputs

def interpolate_to_common_size(feature_maps, target_size):
    _, _, H, W = feature_maps[-1].size()
    if (H, W) == target_size:
        return feature_maps

    interpolated_feature_maps = []
    for fmap in feature_maps:
        interpolated_feature_maps.append(nn.functional.interpolate(fmap, size=target_size, mode='bilinear', align_corners=False))
    return interpolated_feature_maps

# Example usage:
model_name = 'mobilenet_v3_large'
input_size = 224
target_size = (56, 56)  # Specify the target size you want to interpolate the feature maps to

# Load the MobileNetV3 model with intermediate outputs
model = MobileNetV3WithIntermediateOutputs(model_name=model_name, pretrained=True)

# Set the model to evaluation mode
model.eval()

# Create a random input tensor (batch size 1, 3 channels, height=input_size, width=input_size)
input_tensor = torch.rand(1, 3, input_size, input_size)

# Get the intermediate feature maps
intermediate_feature_maps = model(input_tensor)

# Interpolate the feature maps to the common size
interpolated_feature_maps = interpolate_to_common_size(intermediate_feature_maps, target_size)

# The 'interpolated_feature_maps' now contain all the intermediate feature maps, each resized to the 'target_size'.

In the code above, we create a MobileNetV3WithIntermediateOutputs class that inherits from nn.Module. This class will hold the pre-trained MobileNetV3 model and includes a hook function to store intermediate feature maps during forward pass.

The interpolate_to_common_size function takes a list of feature maps and resizes each one to the specified target_size using bilinear interpolation. If the size of the feature maps already matches the target size, they are left unchanged.

Finally, we load the MobileNetV3 model with intermediate outputs, perform a forward pass with a random input tensor, and then interpolate all the intermediate feature maps to the common size specified by target_size.

Please note that the example assumes you want to interpolate the feature maps to a target size of (56, 56). You can modify the target_size to your desired dimensions. Additionally, you can change the model_name parameter to use other versions of MobileNetV3, such as ‘mobilenet_v3_small’, etc.

Run it:

/home/zichen/Downloads/GNT-comments/temptest.ipynb Cell 4 in interpolate_to_common_size(feature_maps, target_size)
     25 def interpolate_to_common_size(feature_maps, target_size):
---> 26     _, _, H, W = feature_maps[-1].size()
     27     if (H, W) == target_size:
     28         return feature_maps

ValueError: not enough values to unpack (expected 4, got 2)

It used hooks, but it registered one for every module, and it didn't stop before torch.flatten in the forward pass, so the last element of feature_maps is a vector. Maybe that's my fault in prompting.
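A minimal fix under that reading of the intent (a sketch; my assumption: hook only the children of model.features so the classifier's flattened outputs never enter the list):

import torch
from torchvision import models

model = models.mobilenet_v3_large(weights='DEFAULT')
model.eval()

feature_maps = []
handles = [m.register_forward_hook(lambda mod, inp, out: feature_maps.append(out))
           for m in model.features.children()]
with torch.no_grad():
    model(torch.rand(1, 3, 224, 224))
for h in handles:
    h.remove()
print([tuple(f.shape) for f in feature_maps[:3]])  # every entry stays a 4-D feature map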


(2023-07-24)

ConvNeXt-tiny

Print the shape of the output after each layer:

convnext = models.convnext_tiny(weights='DEFAULT')
convnext_inspect = InspectModel(convnext)
dummy_input = torch.ones(1, 3, 754, 1008)
convnext_inspect(dummy_input)

Extract specified feature maps and interpolate them to specified size:

convnext_features = ExtractIntrplFeatures(convnext.features, ['0', '1'])
H_maps, W_maps = dummy_input.size()[-2:]
features = convnext_features(dummy_input, (H_maps, W_maps)) # (1,192,754,1008)