memo: Vis | Visual Metrics

1 Peak Signal-to-Noise Ratio

  • Given two monochrome images of size m×n, where I is the original image and K is its approximation, the Mean Squared Error (MSE) between them is:

    $$ MSE = \frac{1}{mn} \sum_{i=1}^m \sum_{j=1}^n\ [I(i,j) - K(i,j)]^2 $$

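    The peak signal-to-noise ratio (in dB) is then defined as: PSNR = 10 log₁₀ (MAXᵢ² / MSE)

    MAXᵢ is the maximum possible pixel value of the original image. When a pixel is represented with 8 bits, MAXᵢ=255.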

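  • Why is it defined this way?

  • Commonly used to measure the reconstruction quality of lossy compression codecs. (Wikipedia)

  • The smaller the MSE, the larger the PSNR. When the two images have no error (MSE = 0), PSNR tends to infinity.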
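The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `mse` and `psnr` and the default `max_value=255.0` (8-bit pixels) are assumptions made for this example.

```python
import numpy as np


def mse(original: np.ndarray, approx: np.ndarray) -> float:
    """Mean squared error between two images of identical shape."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    diff = original.astype(np.float64) - approx.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(original: np.ndarray, approx: np.ndarray, max_value: float = 255.0) -> float:
    """PSNR in dB; max_value is MAX_I (assumed 255 for 8-bit images)."""
    err = mse(original, approx)
    if err == 0.0:
        # Identical images: the ratio MAX_I^2 / MSE diverges.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))


# Usage: a flat 4x4 image where one pixel is off by 10, so MSE = 100 / 16 = 6.25.
I = np.full((4, 4), 100, dtype=np.uint8)
K = I.copy()
K[0, 0] = 110
print(mse(I, K), psnr(I, K))
```

Returning infinity for identical images mirrors the last bullet; some libraries instead raise a warning or cap the value.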


2 Structural SIMilarity Index

  • Used to predict the perceived quality of digital images.

SSIM ² measures the degree of distortion of an image, or the similarity between two images, in terms of 3 aspects: Luminance, Contrast, and Structure.

  1. Ratio of luminance l(X,Y) = (2μₓμᵧ+C₁)/(μₓ²+μᵧ²+C₁), where μₓ is the mean intensity of all pixels in image X: μₓ = (1/N) ∑ᵢ₌₁ᴺ Xᵢ, and C₁ prevents the denominator from being 0.
    By the AM-GM inequality (equivalently, (μₓ-μᵧ)² ≥ 0), l(X,Y) ≤ 1, and l(X,Y)=1 if and only if μₓ=μᵧ.

  2. Ratio of contrast c(X,Y) = (2σₓσᵧ+C₂)/(σₓ²+σᵧ²+C₂), where σₓ is the sample standard deviation (unbiased variance estimate, N-1 denominator) of the intensity of all pixels in image X: σₓ = (1/(N-1) ∑ᵢ₌₁ᴺ (Xᵢ-μₓ)²)¹ᐟ².
    c(X,Y)=1 if and only if σₓ=σᵧ.

  3. Structure is reflected by the correlation coefficient: s(X,Y) = (σₓᵧ+C₃)/(σₓσᵧ+C₃), where σₓᵧ is the covariance of the intensities of the 2 images: σₓᵧ = 1/(N-1) ∑ᵢ₌₁ᴺ (Xᵢ-μₓ)(Yᵢ-μᵧ).

    The intent of the paper may be to measure the information that remains aside from Luminance and Contrast, so each pixel is centered by the mean and divided by the stddev: (Xᵢ-μₓ)/σₓ and (Yᵢ-μᵧ)/σᵧ, i.e., standardization. The two standardized images are then compared pixel by pixel through their inner product.

    Thus, ignoring C₃, s(X,Y) is 1/(N-1) ∑ᵢ₌₁ᴺ [(Xᵢ-μₓ)/σₓ ⋅ (Yᵢ-μᵧ)/σᵧ] = σₓᵧ/(σₓσᵧ)
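The bounds used in items 1 to 3 can be spelled out as a short derivation. It assumes C₁, C₂ > 0 and non-negative means and standard deviations, so all denominators are positive.

$$ (\mu_x - \mu_y)^2 \ge 0 \iff \mu_x^2 + \mu_y^2 \ge 2\mu_x\mu_y \implies l(X,Y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \le 1, \quad \text{with equality iff } \mu_x = \mu_y $$

$$ (\sigma_x - \sigma_y)^2 \ge 0 \implies c(X,Y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \le 1, \quad \text{with equality iff } \sigma_x = \sigma_y $$

$$ |\sigma_{xy}| \le \sigma_x \sigma_y \ \text{(Cauchy-Schwarz)} \implies \frac{\sigma_{xy}}{\sigma_x\sigma_y} \in [-1, 1] $$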

The final expression is the product of the above 3 terms, each raised to a specified power (weight) α, β, γ:

SSIM(X,Y) = l(X,Y)ᵅ ⋅ c(X,Y)ᵝ ⋅ s(X,Y)ᵞ ∈ [-1,1]

where l(X,Y), c(X,Y), s(X,Y) ∈ [-1,1]. Since intensities and standard deviations are ≥ 0, the actual range of l and c is (0,1]. The 3 terms are all equal to 1 at the same time if and only if the images X and Y are the same.

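Setting α = β = γ = 1 gives SSIM(X,Y) = $\frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$, where C₁ = (K₁L)², C₂ = (K₂L)², C₃ = C₂/2, and L is the dynamic range of the pixel values (L = 2^b - 1 for b-bit pixels, e.g. 255 for 8 bits). The choice C₃ = C₂/2 makes the denominator of s cancel the numerator of c, because 2(σₓσᵧ + C₃) = 2σₓσᵧ + C₂. Based on the rule of thumb, K₁ = 0.01, K₂ = 0.03.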
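As a sanity check on the formula, here is a minimal NumPy sketch that computes l, c, and s from whole-image statistics (no sliding window yet) and compares their product with the closed form. The function name `global_ssim` and the default `data_range=255.0` (8-bit pixels) are assumptions for this example.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Return (l, c, s, ssim) computed from whole-image statistics."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()

    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0

    mu_x, mu_y = x.mean(), y.mean()
    # ddof=1 gives the N-1 denominator used in the definitions above.
    sigma_x, sigma_y = x.std(ddof=1), y.std(ddof=1)
    sigma_xy = np.sum((x - mu_x) * (y - mu_y)) / (x.size - 1)

    l = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    c = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    s = (sigma_xy + c3) / (sigma_x * sigma_y + c3)

    # Closed form for alpha = beta = gamma = 1 and C3 = C2 / 2.
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x ** 2 + sigma_y ** 2 + c2)
    )
    return l, c, s, ssim


# Usage: a random image and a noisy copy of it.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
Y = np.clip(X + rng.normal(0.0, 20.0, size=X.shape), 0, 255)
l, c, s, ssim = global_ssim(X, Y)
print(np.isclose(l * c * s, ssim))
```

The product l·c·s and the closed form agree up to floating-point rounding, which is the cancellation from C₃ = C₂/2 at work.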

In practice, SSIM is not computed over the entire image at once. Instead, the mean and stddev are computed in a local sliding window (an 11x11 kernel with stddev=1.5, normalized to sum=1), which is a circular-symmetric Gaussian weighting function.

Hence, the local mean and std-dev are confined within the kernel:

  • μₓ = ∑ᵢ wᵢ Xᵢ
  • σₓ = (∑ᵢ wᵢ (Xᵢ-μₓ)²)¹ᐟ²
  • σₓᵧ= ∑ᵢ wᵢ (Xᵢ-μₓ)(Yᵢ-μᵧ)

where wᵢ are the weights of the Gaussian kernel. The kernel smooths the image, so that fine details are smeared out and mainly the general features are compared.

At the end, the average of all local SSIM values is M-SSIM(X,Y) = (1/M) ∑ⱼ₌₁ᴹ SSIM(Xⱼ, Yⱼ), where Xⱼ and Yⱼ are the image contents in the j-th window and M is the number of windows.
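The windowed computation can be sketched with NumPy alone. This is a rough illustration under stated assumptions: the names `gaussian_window` and `mean_ssim` are made up for this example, only windows that fit fully inside the image are used (no padding), and `data_range=255.0` assumes 8-bit pixels. It materializes every patch, so it is only suitable for small images.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_window(size=11, sigma=1.5):
    """Circular-symmetric 2D Gaussian weights, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()


def mean_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03, size=11, sigma=1.5):
    """Mean of local SSIM values over all fully contained windows."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    w = gaussian_window(size, sigma)

    # All size x size patches: shape (H-size+1, W-size+1, size, size).
    px = sliding_window_view(x, (size, size))
    py = sliding_window_view(y, (size, size))

    # Weighted local statistics, one value per window position.
    mu_x = np.sum(w * px, axis=(-2, -1))
    mu_y = np.sum(w * py, axis=(-2, -1))
    dx = px - mu_x[..., None, None]
    dy = py - mu_y[..., None, None]
    var_x = np.sum(w * dx ** 2, axis=(-2, -1))
    var_y = np.sum(w * dy ** 2, axis=(-2, -1))
    cov_xy = np.sum(w * dx * dy, axis=(-2, -1))

    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return float(ssim_map.mean())


# Usage: compare a random image with a noisy copy.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
Y = np.clip(X + rng.normal(0.0, 20.0, size=X.shape), 0, 255)
print(mean_ssim(X, X), mean_ssim(X, Y))
```

Production implementations usually get the same local statistics by filtering the whole image with the Gaussian kernel, which avoids building the patch array.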

Used as loss

Since SSIM measures the similarity between 2 images, it can be used in supervised training. The SSIM dissimilarity, SSIMD = (1-SSIM)/2 ∈ [0,1], is therefore a kind of loss function: it is 0 when the two images are identical.

Two basic mechanisms of lossy compression (Wikipedia):

  1. Lossy transform codecs: the image/sound is sampled,
  2. Lossy predictive codecs:

3 LPIPS

The motivation of Learned Perceptual Image Patch Similarity (LPIPS) ³ is that similarity judgments between 2 images derived from deep neural network features align with human perception, whereas structure-based metrics often give the opposite judgment on over-smoothed images. (2023-02-18)

LPIPS compares the intermediate convolutional feature vectors at different levels.

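lpips.LPIPS(net="alex") feeds the two images into a neural network (VGG, AlexNet) and compares their features at multiple levels. The feature map (H×W×C) output by each layer is taken after activation and normalized, and the two maps are subtracted. The differences are combined at each pixel by a weighted sum, averaged spatially (divided by the number of pixels), and finally the per-layer differences are added up.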
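The aggregation step described above can be sketched in NumPy on precomputed feature maps. This is only an illustration of the arithmetic, not the lpips package itself: the function names are made up, the feature maps are random stand-ins for real network activations, and the per-channel weights are set to 1 instead of the learned values. It assumes the differences are squared before weighting.

```python
import numpy as np


def unit_normalize(feat, eps=1e-10):
    """Scale each pixel's C-dimensional feature vector to unit length.

    feat has shape (H, W, C).
    """
    norm = np.sqrt(np.sum(feat ** 2, axis=-1, keepdims=True))
    return feat / (norm + eps)


def lpips_like_distance(feats_a, feats_b, channel_weights):
    """Sum over layers of the spatially averaged, channel-weighted squared difference."""
    total = 0.0
    for fa, fb, w in zip(feats_a, feats_b, channel_weights):
        diff = (unit_normalize(fa) - unit_normalize(fb)) ** 2  # (H, W, C)
        per_pixel = np.sum(w * diff, axis=-1)                  # weighted sum over channels
        total += per_pixel.mean()                              # spatial average
    return float(total)


# Usage with hypothetical activations from two layers (non-negative, as after ReLU).
rng = np.random.default_rng(0)
shapes = [(16, 16, 64), (8, 8, 128)]
feats_a = [rng.random(s) for s in shapes]
feats_b = [rng.random(s) for s in shapes]
weights = [np.ones(s[-1]) for s in shapes]  # placeholder for learned weights
print(lpips_like_distance(feats_a, feats_b, weights))
```

With identical inputs the distance is exactly 0, and it grows as the normalized features diverge.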

Ref